MLX: expose topP / topK / repetitionPenalty (and minP) — MLXLMCommon supports them but they're hardcoded #165

Description

@james-333i

Summary

MLXLanguageModel.toGenerateParameters hardcodes topP: 1.0 and repetitionPenalty: nil, leaves topK at its default 0, and never reads GenerationOptions.sampling. So callers can only tune temperature and maximumResponseTokens for MLX — even though the underlying MLXLMCommon.GenerateParameters fully supports topP, topK, minP, repetitionPenalty, and context sizes. This is pure plumbing, not new capability.

Evidence

  • AnyLanguageModel hardcodes them — MLXLanguageModel.swift L1217-L1230 (v0.8.0):
    temperature: Float(options.temperature ?? 0.6),
    topP: 1.0,             // hardcoded
    repetitionPenalty: nil, // hardcoded
    repetitionContextSize: 20
    // topK not set (defaults to 0); options.sampling never read
  • The backend already supports the full set — MLXLMCommon GenerateParameters, Evaluate.swift L54-L103: temperature, topP, topK, minP, repetitionPenalty, repetitionContextSize, presencePenalty, frequencyPenalty — and the sampler uses topP/topK/minP (L142-L151).
  • CustomGenerationOptions currently exposes only kvCache / userInputProcessing / additionalContext — no sampler fields.

Use case

Per-model tuning (deterministic vs creative) and, importantly, repetitionPenalty to curb the repetition loops MLX chat models are prone to. These are common knobs that a hosting app surfaces per agent/model.
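For readers unfamiliar with the knob, here is a minimal sketch of how a repetition penalty is commonly applied to logits. The helper name `applyRepetitionPenalty` and the sample values are hypothetical, and this is the widely used formulation rather than a copy of the MLXLMCommon code, which may differ in detail.

```swift
/// Applies a repetition penalty to the logits of recently generated tokens.
/// Hypothetical helper for illustration; not the MLXLMCommon implementation.
func applyRepetitionPenalty(
    logits: [Float],
    recentTokens: [Int],
    penalty: Float
) -> [Float] {
    var adjusted = logits
    // Each recent token is penalized once, however often it appeared.
    for token in Set(recentTokens) where adjusted.indices.contains(token) {
        let value = adjusted[token]
        // Positive logits shrink and negative logits grow more negative,
        // so a penalty above 1 always makes the token less likely.
        adjusted[token] = value < 0 ? value * penalty : value / penalty
    }
    return adjusted
}

let logits: [Float] = [2.0, -1.0, 0.5, 3.0]
let penalized = applyRepetitionPenalty(
    logits: logits,
    recentTokens: [0, 1, 0],  // stands in for the last repetitionContextSize tokens
    penalty: 2.0
)
print(penalized)  // tokens 0 and 1 are penalized; tokens 2 and 3 are unchanged
```

With repetitionPenalty hardcoded to nil, no adjustment of this kind can be enabled from the caller's side, which is why loops cannot be tuned away today.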

Proposed approach

Because GenerationOptions.SamplingMode is opaque/not easily destructured by callers, the cleanest route is to extend the library's own CustomGenerationOptions with sampler fields and read them in the mapper:

public struct CustomGenerationOptions: AnyLanguageModel.CustomGenerationOptions, Codable {
    // existing: kvCache, userInputProcessing, additionalContext
    public var topP: Float?
    public var topK: Int?
    public var minP: Float?
    public var repetitionPenalty: Float?
    public var repetitionContextSize: Int?
}

private func toGenerateParameters(_ options: GenerationOptions) -> MLXLMCommon.GenerateParameters {
    let custom = options[custom: MLXLanguageModel.self]
    return MLXLMCommon.GenerateParameters(
        maxTokens: options.maximumResponseTokens,
        maxKVSize: custom?.kvCache.maxSize,
        kvBits: custom?.kvCache.bits,
        kvGroupSize: custom?.kvCache.groupSize ?? 64,
        quantizedKVStart: custom?.kvCache.quantizedStart ?? 0,
        temperature: Float(options.temperature ?? 0.6),
        topP: custom?.topP ?? 1.0,
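        topK: custom?.topK ?? 0,
        // Note: custom?.minP is declared above but not forwarded here; pass it
        // as well if the GenerateParameters initializer accepts a minP argument.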
        repetitionPenalty: custom?.repetitionPenalty,
        repetitionContextSize: custom?.repetitionContextSize ?? 20
    )
}
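The fallback behaviour of the mapper above can be checked in isolation. The following is a sketch using stand-in types (`SamplerOverrides`, `ResolvedSampler`, and `resolve` are hypothetical names, not part of AnyLanguageModel or MLXLMCommon); it only illustrates that unset fields keep today's hardcoded values while set fields pass through.

```swift
// Stand-in types for illustration only; the real types live in
// AnyLanguageModel and MLXLMCommon and carry more fields.
struct SamplerOverrides {
    var topP: Float?
    var topK: Int?
    var repetitionPenalty: Float?
    var repetitionContextSize: Int?
}

struct ResolvedSampler: Equatable {
    var topP: Float
    var topK: Int
    var repetitionPenalty: Float?
    var repetitionContextSize: Int
}

// Mirrors the nil-coalescing pattern of the proposed mapper: every
// missing override falls back to the value that is hardcoded today.
func resolve(_ overrides: SamplerOverrides?) -> ResolvedSampler {
    ResolvedSampler(
        topP: overrides?.topP ?? 1.0,
        topK: overrides?.topK ?? 0,
        repetitionPenalty: overrides?.repetitionPenalty,
        repetitionContextSize: overrides?.repetitionContextSize ?? 20
    )
}

let defaults = resolve(nil)
let tuned = resolve(SamplerOverrides(topP: 0.9, repetitionPenalty: 1.1))
print(defaults)
print(tuned)
```

Because every new field is optional, callers that never set them should see no behaviour change, which keeps the change backward compatible.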

(Optionally also map GenerationOptions.sampling to topP/topK when present, for parity with the Foundation backend.)

Acceptance

Setting topP / topK / repetitionPenalty via CustomGenerationOptions measurably changes MLX sampling output.
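To make "measurably changes" concrete, here is a small sketch of what top-k and top-p filtering do to a candidate distribution. The function `filterCandidates` and the probabilities are hypothetical, and the real sampler may order or renormalize the filters differently; the point is only that non-default settings shrink the set of tokens that can be sampled.

```swift
/// Keeps the k most probable tokens (k == 0 disables that filter), then the
/// smallest prefix whose cumulative probability reaches topP, and renormalizes.
/// Hypothetical helper for illustration; not the MLXLMCommon sampler.
func filterCandidates(probs: [Double], topK: Int, topP: Double) -> [Int: Double] {
    // Rank token indices by probability, highest first.
    var ranked = probs.enumerated().sorted { $0.element > $1.element }
    if topK > 0 { ranked = Array(ranked.prefix(topK)) }

    var kept: [(offset: Int, element: Double)] = []
    var cumulative = 0.0
    for candidate in ranked {
        kept.append(candidate)
        cumulative += candidate.element
        if cumulative >= topP { break }
    }

    // Renormalize so the surviving probabilities sum to 1.
    let total = kept.reduce(0.0) { $0 + $1.element }
    return Dictionary(uniqueKeysWithValues: kept.map { ($0.offset, $0.element / total) })
}

let probs = [0.5, 0.3, 0.15, 0.05]
let unfiltered = filterCandidates(probs: probs, topK: 0, topP: 1.0)
let topKOnly = filterCandidates(probs: probs, topK: 2, topP: 1.0)
let topPOnly = filterCandidates(probs: probs, topK: 0, topP: 0.7)
print(unfiltered.keys.sorted(), topKOnly.keys.sorted(), topPOnly.keys.sorted())
```

An acceptance test against the real backend could follow the same idea: fix the seed and prompt, then compare outputs with and without the overrides.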
