Summary
MLXLanguageModel.toGenerateParameters hardcodes topP: 1.0 and repetitionPenalty: nil, leaves topK at its default of 0, and never reads GenerationOptions.sampling. As a result, callers can only tune temperature and maximumResponseTokens for MLX, even though the underlying MLXLMCommon.GenerateParameters already supports topP, topK, minP, repetitionPenalty, and repetitionContextSize. This is pure plumbing, not new capability.
Evidence
- AnyLanguageModel hardcodes them in MLXLanguageModel.swift L1217-L1230 (v0.8.0):
    temperature: Float(options.temperature ?? 0.6),
    topP: 1.0, // hardcoded
    repetitionPenalty: nil, // hardcoded
    repetitionContextSize: 20
    // topK not set (defaults to 0); options.sampling never read
- The backend already supports the full set: MLXLMCommon GenerateParameters (Evaluate.swift L54-L103) has temperature, topP, topK, minP, repetitionPenalty, repetitionContextSize, presencePenalty, and frequencyPenalty, and the sampler uses topP/topK/minP (L142-L151).
- CustomGenerationOptions currently exposes only kvCache / userInputProcessing / additionalContext, with no sampler fields.
Use case
Per-model tuning (deterministic vs creative) and, importantly, repetitionPenalty to curb the repetition loops MLX chat models are prone to. These are common knobs that a hosting app surfaces per agent/model.
Proposed approach
Because GenerationOptions.SamplingMode is opaque/not easily destructured by callers, the cleanest route is to extend the library's own CustomGenerationOptions with sampler fields and read them in the mapper:
public struct CustomGenerationOptions: AnyLanguageModel.CustomGenerationOptions, Codable {
    // existing: kvCache, userInputProcessing, additionalContext
    public var topP: Float?
    public var topK: Int?
    public var minP: Float?
    public var repetitionPenalty: Float?
    public var repetitionContextSize: Int?
}
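private func toGenerateParameters(_ options: GenerationOptions) -> MLXLMCommon.GenerateParameters {
    // Backend-specific options; nil when the caller supplied none.
    let custom = options[custom: MLXLanguageModel.self]
    return MLXLMCommon.GenerateParameters(
        maxTokens: options.maximumResponseTokens,
        maxKVSize: custom?.kvCache.maxSize,
        kvBits: custom?.kvCache.bits,
        kvGroupSize: custom?.kvCache.groupSize ?? 64,
        quantizedKVStart: custom?.kvCache.quantizedStart ?? 0,
        temperature: Float(options.temperature ?? 0.6),
        // Sampler fields fall back to today's hardcoded values when unset.
        topP: custom?.topP ?? 1.0,
        topK: custom?.topK ?? 0,
        // Assumption: the initializer accepts minP after topK (matching the field order
        // listed under Evidence) and 0 disables it; verify against Evaluate.swift.
        minP: custom?.minP ?? 0,
        repetitionPenalty: custom?.repetitionPenalty,
        repetitionContextSize: custom?.repetitionContextSize ?? 20
    )
}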
(Optionally also map GenerationOptions.sampling → topP/topK when present, for parity with the Foundation backend.)
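A rough sketch of how that optional mapping could resolve precedence, assuming explicit custom fields should win over values derived from GenerationOptions.sampling. Everything here is a hypothetical stand-in: SamplingHint is not the real SamplingMode (which is opaque), and SamplerOverrides / ResolvedSampler / resolveSampler are names invented for illustration. Treating greedy as temperature 0 is also an assumption about how the backend selects an argmax sampler.

```swift
// Hypothetical stand-in for whatever GenerationOptions.sampling exposes.
enum SamplingHint {
    case greedy
    case topK(Int)
    case nucleus(Float)  // probability threshold, i.e. top-p
}

// Hypothetical subset of the sampler fields proposed for CustomGenerationOptions.
struct SamplerOverrides {
    var topP: Float?
    var topK: Int?
}

// The values that would eventually be passed to GenerateParameters.
struct ResolvedSampler {
    var temperature: Float
    var topP: Float
    var topK: Int
}

func resolveSampler(
    temperature: Double?,
    sampling: SamplingHint?,
    custom: SamplerOverrides?
) -> ResolvedSampler {
    // Start from the defaults the mapper uses today.
    var resolved = ResolvedSampler(temperature: Float(temperature ?? 0.6), topP: 1.0, topK: 0)

    // Layer the cross-backend sampling hint on top of the defaults.
    switch sampling {
    case .some(.greedy):
        resolved.temperature = 0  // assumption: temperature 0 means argmax decoding
    case .some(.topK(let k)):
        resolved.topK = k
    case .some(.nucleus(let p)):
        resolved.topP = p
    case .none:
        break
    }

    // Explicit MLX-specific overrides take precedence over the hint.
    if let p = custom?.topP { resolved.topP = p }
    if let k = custom?.topK { resolved.topK = k }
    return resolved
}
```

With this ordering, a caller who sets only GenerationOptions.sampling gets comparable behavior across backends, while an MLX-specific override still has the last word.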
Acceptance
Setting topP / topK / repetitionPenalty via CustomGenerationOptions measurably changes MLX sampling output.
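To make "measurably changes" concrete, the toy below shows the effect each knob is expected to have on a fixed set of logits. It is a plain-Swift illustration of the usual top-k, top-p, and repetition-penalty definitions, not MLX's sampler; the function names (softmax, candidates, penalize) are made up for this sketch, and a real acceptance test would run against MLXLanguageModel with a fixed seed instead.

```swift
import Foundation

// Numerically stable softmax over raw logits.
func softmax(_ logits: [Double]) -> [Double] {
    let maxLogit = logits.max() ?? 0
    let exps = logits.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

/// Token ids that remain eligible after top-k, then top-p, filtering.
/// topK == 0 disables top-k; topP == 1.0 effectively disables top-p.
func candidates(logits: [Double], topK: Int, topP: Double) -> [Int] {
    let probs = softmax(logits)
    // Token ids ordered from most to least probable.
    var order = probs.indices.sorted { probs[$0] > probs[$1] }
    if topK > 0 { order = Array(order.prefix(topK)) }

    // Keep the smallest prefix whose cumulative probability reaches topP.
    var kept: [Int] = []
    var cumulative = 0.0
    for id in order {
        kept.append(id)
        cumulative += probs[id]
        if cumulative >= topP { break }
    }
    return kept
}

/// Common repetition penalty: shrink positive logits and push negative ones
/// further down for every token id seen in the recent context.
func penalize(logits: [Double], recent: [Int], penalty: Double) -> [Double] {
    var out = logits
    for id in Set(recent) {
        out[id] = out[id] < 0 ? out[id] * penalty : out[id] / penalty
    }
    return out
}

let logits: [Double] = [2, 1, 0, -1]
let unfiltered = candidates(logits: logits, topK: 0, topP: 1.0)  // all four tokens
let topTwo = candidates(logits: logits, topK: 2, topP: 1.0)      // two most likely tokens
let nucleus = candidates(logits: logits, topK: 0, topP: 0.5)     // smallest set covering 50%
let afterPenalty = candidates(
    logits: penalize(logits: logits, recent: [0], penalty: 4),
    topK: 1, topP: 1.0
)  // token 0 was just emitted, so it loses the top spot
```

Each knob changes the eligible set for the same logits, which is the kind of observable difference the acceptance check would look for in real MLX output.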