MLX: expose `topP` / `topK` / `repetitionPenalty` (and `minP`) — `MLXLMCommon` supports them but they're hardcoded

### Summary
`MLXLanguageModel.toGenerateParameters` hardcodes `topP: 1.0`, `repetitionPenalty: nil`, and `repetitionContextSize: 20`, leaves `topK` at its default of `0`, and never reads `GenerationOptions.sampling`. As a result, callers can tune only `temperature` and `maximumResponseTokens` for MLX, even though the underlying `MLXLMCommon.GenerateParameters` already supports `topP`, `topK`, `minP`, `repetitionPenalty`, and `repetitionContextSize`. This is pure plumbing, not new capability.

### Evidence
- AnyLanguageModel hardcodes them — [MLXLanguageModel.swift L1217-L1230 (v0.8.0)](https://github.com/huggingface/AnyLanguageModel/blob/0.8.0/Sources/AnyLanguageModel/Models/MLXLanguageModel.swift#L1217-L1230):
  ```swift
  temperature: Float(options.temperature ?? 0.6),
  topP: 1.0,             // hardcoded
  repetitionPenalty: nil, // hardcoded
  repetitionContextSize: 20
  // topK not set (defaults to 0); options.sampling never read
  ```
- The backend already supports the full set — [MLXLMCommon `GenerateParameters`, Evaluate.swift L54-L103](https://github.com/ml-explore/mlx-swift-lm/blob/main/Libraries/MLXLMCommon/Evaluate.swift#L54-L103): `temperature`, `topP`, `topK`, `minP`, `repetitionPenalty`, `repetitionContextSize`, `presencePenalty`, `frequencyPenalty` — and the sampler uses `topP`/`topK`/`minP` ([L142-L151](https://github.com/ml-explore/mlx-swift-lm/blob/main/Libraries/MLXLMCommon/Evaluate.swift#L142-L151)).
- `CustomGenerationOptions` currently exposes only `kvCache` / `userInputProcessing` / `additionalContext` — no sampler fields.

### Use case
Per-model tuning (deterministic vs creative) and, importantly, `repetitionPenalty` to curb the repetition loops MLX chat models are prone to. These are common knobs that a hosting app surfaces per agent/model.
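To make the repetition knob concrete, here is a toy sketch of the commonly used repetition-penalty rule in plain Swift. It is not the MLX implementation: `applyRepetitionPenalty` and its parameters are hypothetical names, and the divide-if-positive / multiply-if-negative rule is an assumption about the usual formulation.

```swift
/// Toy repetition penalty: logits of tokens seen in the last `contextSize`
/// generated tokens are made less likely. Illustration only, not MLX code.
/// Assumes `contextSize >= 0` and `penalty > 0`.
func applyRepetitionPenalty(
    logits: [Float],
    history: [Int],
    penalty: Float,
    contextSize: Int
) -> [Float] {
    var adjusted = logits
    // Only the most recent `contextSize` token ids are penalized, each once.
    for token in Set(history.suffix(contextSize)) where adjusted.indices.contains(token) {
        let value = adjusted[token]
        // Assumed rule: shrink positive logits, push negative logits further down.
        adjusted[token] = value > 0 ? value / penalty : value * penalty
    }
    return adjusted
}

let logits: [Float] = [2.0, -1.0, 0.5, 3.0]
let history = [3, 3, 1]  // token 3 keeps repeating
let penalized = applyRepetitionPenalty(
    logits: logits, history: history, penalty: 2.0, contextSize: 2
)
// Tokens 3 and 1 fall inside the two-token window, so only their logits change.
```

With `penalty: 1.0` the logits are returned unchanged, which is roughly what `repetitionPenalty: nil` amounts to today.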

### Proposed approach
Because `GenerationOptions.SamplingMode` is opaque and callers cannot easily destructure it, the cleanest route is to extend the library's own `CustomGenerationOptions` with sampler fields and read them in the mapper:

```swift
public struct CustomGenerationOptions: AnyLanguageModel.CustomGenerationOptions, Codable {
    // existing: kvCache, userInputProcessing, additionalContext
    public var topP: Float?
    public var topK: Int?
    public var minP: Float?
    public var repetitionPenalty: Float?
    public var repetitionContextSize: Int?
}

private func toGenerateParameters(_ options: GenerationOptions) -> MLXLMCommon.GenerateParameters {
    let custom = options[custom: MLXLanguageModel.self]
    return MLXLMCommon.GenerateParameters(
        maxTokens: options.maximumResponseTokens,
        maxKVSize: custom?.kvCache.maxSize,
        kvBits: custom?.kvCache.bits,
        kvGroupSize: custom?.kvCache.groupSize ?? 64,
        quantizedKVStart: custom?.kvCache.quantizedStart ?? 0,
        temperature: Float(options.temperature ?? 0.6),
        topP: custom?.topP ?? 1.0,
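        topK: custom?.topK ?? 0,
        minP: custom?.minP ?? 0.0,  // assumes the initializer accepts `minP` here, per the linked `GenerateParameters`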
        repetitionPenalty: custom?.repetitionPenalty,
        repetitionContextSize: custom?.repetitionContextSize ?? 20
    )
}
```
(Optionally also map `GenerationOptions.sampling` → `topP`/`topK` when present, for parity with the Foundation backend.)

### Acceptance
Setting `topP` / `topK` / `repetitionPenalty` via `CustomGenerationOptions` measurably changes MLX sampling output.
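Because sampled output is stochastic, a cheaper first check may be a deterministic test of the mapping itself. The sketch below uses stand-in types (`SamplerOverrides`, `ResolvedSampling`, and `resolve` are hypothetical and not part of AnyLanguageModel or MLXLMCommon) to show the fallback behaviour the proposed mapper would need to satisfy. The `0.0` default for `minP` is an assumption.

```swift
// Stand-in for the proposed sampler fields on CustomGenerationOptions (hypothetical).
struct SamplerOverrides {
    var topP: Float? = nil
    var topK: Int? = nil
    var minP: Float? = nil
    var repetitionPenalty: Float? = nil
    var repetitionContextSize: Int? = nil
}

// Stand-in for the sampler-related subset of GenerateParameters (hypothetical).
struct ResolvedSampling: Equatable {
    var topP: Float
    var topK: Int
    var minP: Float
    var repetitionPenalty: Float?
    var repetitionContextSize: Int
}

/// Applies the same fallbacks as the proposed mapper: unset fields keep
/// today's hardcoded values, set fields pass through unchanged.
func resolve(_ overrides: SamplerOverrides?) -> ResolvedSampling {
    ResolvedSampling(
        topP: overrides?.topP ?? 1.0,
        topK: overrides?.topK ?? 0,
        minP: overrides?.minP ?? 0.0,  // assumed default
        repetitionPenalty: overrides?.repetitionPenalty,
        repetitionContextSize: overrides?.repetitionContextSize ?? 20
    )
}

// No custom options: behaviour should match the current hardcoded values.
let defaults = resolve(nil)

// Partial overrides: only the fields that were set should change.
let tuned = resolve(SamplerOverrides(topP: 0.9, topK: 40, repetitionPenalty: 1.1))
```

A test along these lines pins down backward compatibility (no options set means no behaviour change) separately from the end-to-end check that sampling output actually differs.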
