An experiment in instruction-following distillation: transferring knowledge from large language models (Teachers) into smaller, efficient models (Students) using LoRA-based fine-tuning and structured LLM-based evaluation.
The LLM-Distillation-Lab provides an end-to-end pipeline for distilling chat capabilities. By using a high-performance Teacher model to generate gold-standard responses and an LLM Judge to quantify performance, this lab allows researchers to:
- Minimize Model Size: Transition from multi-billion-parameter models to edge-capable 1.1B models.
- Maintain Quality: Use instruction-following distillation to preserve reasoning and style.
- Automate Evaluation: Eliminate human-in-the-loop bottlenecks using structured LLM judging.
Repository Structure:
- dataset/: Logic for teacher data generation and .jsonl storage.
- evaluation/: Scripts for LLM-based judging and result CSV generation.
- training/: LoRA distillation implementation using the Hugging Face Trainer.
- prompts/: Source instruction set for the experiment.
- models/: Storage for distilled adapters and checkpoints.
- logs/: Comprehensive timestamped execution logs for debugging.
The pipeline follows a structured sequence from data synthesis to post-training validation.
```mermaid
%%{init: {'theme': 'neutral'}}%%
graph TD
    subgraph "Phase 1: Data Generation"
        A[prompts.txt] --> B{Teacher Model}
        B -- "Qwen2.5-3B-Instruct" --> C[distillation_dataset.jsonl]
    end
    subgraph "Phase 2: Baseline Evaluation"
        C --> D{LLM Judge}
        D -- "M-Prometheus-3B" --> E[baseline_results.csv]
    end
    subgraph "Phase 3: Distillation (LoRA)"
        C --> F[Student Training]
        F -- "TinyLlama-1.1B + LoRA" --> G[Distilled Adapter]
    end
    subgraph "Phase 4: Post-Training Validation"
        G --> H{Final Evaluation}
        H -- "Compare: Teacher vs Base vs Distilled" --> I[Post-Training Summary]
    end
```
| Role | Model | Description |
|---|---|---|
| Teacher | Qwen2.5-3B-Instruct | High-quality reasoning and instruction following. |
| Student | TinyLlama-1.1B-3T | Lightweight base model for edge deployment. |
| Judge | M-Prometheus-3B | Specialized evaluator for scoring response quality. |
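For reference, these roles typically map to the following Hugging Face repository IDs. The IDs below are assumptions for illustration; the scripts pin the exact checkpoints they load:

```python
# Hypothetical model registry; the lab's scripts define the IDs they actually load.
MODELS = {
    "teacher": "Qwen/Qwen2.5-3B-Instruct",                             # assumed HF repo ID
    "student": "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",  # assumed 3T base checkpoint
    "judge":   "Unbabel/M-Prometheus-3B",                              # assumed judge checkpoint
}
```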
Core Technologies:
- Frameworks: PyTorch, Hugging Face Transformers.
- Efficient Tuning: PEFT (Parameter-Efficient Fine-Tuning) via LoRA.
- Data Handling: Pandas, Hugging Face Datasets.
- Optimization: Accelerate for local hardware utilization.
Ensure you have uv or pip installed. We recommend using a virtual environment.
```bash
# Initialize venv and install dependencies
python3 -m venv .venv
source .venv/bin/activate
uv pip install -U -r requirements.txt
```
The pipeline can be executed step-by-step or in a single command.
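For the single-command option, one straightforward approach (an assumption; the repo may ship its own runner) is to chain the four stage scripts:

```bash
# Run all four stages back to back; stop on the first failure
python dataset/generate_teacher_data.py && \
  python evaluation/baseline_evaluation.py && \
  python training/distill_student.py && \
  python evaluation/post_training_evaluation.py
```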
Synthesize teacher responses from source prompts in prompts/prompts.txt.
```bash
python dataset/generate_teacher_data.py
```
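Under the hood, this stage prompts the Teacher with each instruction and persists the pairs as .jsonl. A minimal sketch, assuming the Teacher is loaded via transformers and a simple prompt/response schema (the actual script defines the real schema and generation settings):

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"  # assumed Teacher checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

with open("prompts/prompts.txt") as f:
    prompts = [line.strip() for line in f if line.strip()]

with open("dataset/distillation_dataset.jsonl", "w") as out:  # assumed output path
    for prompt in prompts:
        # Apply the Teacher's chat template and generate a gold-standard response
        inputs = tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}],
            add_generation_prompt=True,
            return_tensors="pt",
        ).to(model.device)
        output_ids = model.generate(inputs, max_new_tokens=512, do_sample=False)
        response = tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True)
        out.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
```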
Score the base student model before training to establish a performance floor.
```bash
python evaluation/baseline_evaluation.py
```
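Conceptually, the Judge grades each answer against a rubric and the script extracts a numeric score. A hedged sketch, where the grading prompt, the 1-5 scale, and the CSV columns are assumptions (the real script uses the judge's own evaluation format):

```python
import re
import pandas as pd
from transformers import pipeline

# Assumed Judge checkpoint and a simplified rubric; the real script defines
# the actual Prometheus prompt format and scoring criteria.
judge = pipeline("text-generation", model="Unbabel/M-Prometheus-3B", device_map="auto")

def judge_response(instruction: str, response: str) -> int:
    grading_prompt = (
        "Evaluate the response to the instruction for helpfulness and "
        "instruction adherence on a 1-5 scale. End your verdict with 'Score: N'.\n\n"
        f"Instruction: {instruction}\nResponse: {response}"
    )
    verdict = judge(grading_prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
    match = re.search(r"Score:\s*([1-5])", verdict)
    return int(match.group(1)) if match else 0  # 0 flags an unparseable verdict

# Toy pair for illustration; the real script judges the base Student's answers.
pairs = [("Name three prime numbers.", "2, 3, and 5 are all prime.")]
rows = [{"prompt": p, "score": judge_response(p, r)} for p, r in pairs]
pd.DataFrame(rows).to_csv("evaluation/baseline_results.csv", index=False)
```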
Execute LoRA fine-tuning. The distilled adapter will be saved in models/.
```bash
python training/distill_student.py
```
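Conceptually, this step is supervised fine-tuning of the Student on the Teacher's responses with PEFT/LoRA adapters. A minimal sketch, where the rank, target modules, hyperparameters, and output path are illustrative assumptions (the script in training/ holds the real configuration):

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

STUDENT_ID = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(STUDENT_ID)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(STUDENT_ID)

# Inject low-rank adapters; only these weights are updated during training.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))

# The Teacher's responses are the supervision signal for the Student.
data = load_dataset("json", data_files="dataset/distillation_dataset.jsonl", split="train")
data = data.map(
    lambda ex: tokenizer(ex["prompt"] + "\n" + ex["response"], truncation=True, max_length=1024),
    remove_columns=data.column_names,
)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="models", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
model.save_pretrained("models/distilled_adapter")  # hypothetical adapter path
```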
Compare the distilled student against the teacher and baseline.
```bash
python evaluation/post_training_evaluation.py
```
Every run generates structured artifacts for analysis:
- Metrics: evaluation/summary.txt provides a quick overview of judge scores (see the comparison sketch after this list).
- Detailed Logs: Found in logs/ with DEBUG-level granularity.
- Model Weights: Saved as PEFT adapters in models/<model_id>/distilled_/.
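For a quick look at the gains, the result CSVs can also be compared directly. A sketch, where the `score` column and the post-training filename are assumptions:

```python
import pandas as pd

# Hypothetical comparison; the post-training script writes its own summary,
# and the exact filenames/columns may differ.
baseline = pd.read_csv("evaluation/baseline_results.csv")
post = pd.read_csv("evaluation/post_training_results.csv")  # assumed filename
print(f"Mean judge score: {baseline['score'].mean():.2f} -> {post['score'].mean():.2f}")
```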