LLM Distillation Lab

An experiment in instruction-following distillation: transferring knowledge from a large language model (the Teacher) into a smaller, more efficient model (the Student) using LoRA-based fine-tuning and structured, LLM-based evaluation.

Purpose

The LLM-Distillation-Lab provides an end-to-end pipeline for distilling chat capabilities. By using a high-performance Teacher model to generate gold-standard responses and an LLM Judge to quantify performance, the lab allows researchers to:

  • Minimize Model Size: Move from multi-billion-parameter models to an edge-capable 1.1B model.
  • Maintain Quality: Use instruction-following distillation to preserve reasoning and style.
  • Automate Evaluation: Replace human-in-the-loop bottlenecks with structured LLM judging.

Architecture Overview

  • dataset: Logic for teacher data generation and .jsonl storage.
  • evaluation: Scripts for LLM-based judging and result CSV generation.
  • training: LoRA distillation implementation using Hugging Face Trainer.
  • prompts: Source instruction set for the experiment.
  • models: Storage for distilled adapters and checkpoints.
  • logs: Comprehensive timestamped execution logs for debugging.

Distillation Pipeline

The pipeline follows a structured sequence from data synthesis to post-training validation.

%%{init: {'theme': 'neutral'}}%%
graph TD
    subgraph "Phase 1: Data Generation"
        A[prompts.txt] --> B{Teacher Model}
        B -- "Qwen2.5-3B-Instruct" --> C[distillation_dataset.jsonl]
    end

    subgraph "Phase 2: Baseline Evaluation"
        C --> D{LLM Judge}
        D -- "M-Prometheus-3B" --> E[baseline_results.csv]
    end

    subgraph "Phase 3: Distillation (LoRA)"
        C --> F[Student Training]
        F -- "TinyLlama-1.1B + LoRA" --> G[Distilled Adapter]
    end

    subgraph "Phase 4: Post-Training Validation"
        G --> H{Final Evaluation}
        H -- "Compare: Teacher vs Base vs Distilled" --> I[Post-Training Summary]
    end

Tech Stack and Models

Role       Model                  Description
Teacher    Qwen2.5-3B-Instruct    High-quality reasoning and instruction following.
Student    TinyLlama-1.1B-3T      Lightweight base model for edge deployment.
Judge      M-Prometheus-3B        Specialized evaluator for scoring response quality.

Core Technologies:

  • Frameworks: PyTorch, Hugging Face Transformers
  • Efficient Tuning: PEFT (Parameter-Efficient Fine-Tuning) via LoRA.
  • Data Handling: Pandas, Datasets
  • Optimization: Accelerate for local hardware utilization.
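
As a rough sketch of how the PEFT/LoRA pieces fit together, the student is wrapped with trainable low-rank adapters before training. The rank, alpha, target modules, and the exact TinyLlama checkpoint ID below are illustrative assumptions, not the repository's actual settings.

# Sketch only: wrap the TinyLlama student with LoRA adapters via PEFT.
# Rank, alpha, target modules, and the checkpoint ID are illustrative guesses.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
)
lora_cfg = LoraConfig(
    r=16,                                  # low-rank dimension of the update matrices
    lora_alpha=32,                         # scaling applied to the LoRA updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
student = get_peft_model(base, lora_cfg)
student.print_trainable_parameters()       # only the adapter weights are trainable

Because only the adapter parameters are updated, distillation stays feasible on local hardware.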

Getting Started

1. Setup Environment

Ensure you have uv or pip installed. We recommend using a virtual environment.

# Initialize venv and install dependencies
python3 -m venv .venv
source .venv/bin/activate
uv pip install -U -r requirements.txt

2. Execution Workflow

The pipeline can be executed step-by-step with the scripts below, or end-to-end by running them in sequence.

Step 1: Generate Distillation Data

Synthesize teacher responses from source prompts in prompts/prompts.txt.

python dataset/generate_teacher_data.py
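
The generation logic lives in dataset/generate_teacher_data.py; the sketch below only illustrates the idea. The output path and the "prompt"/"response" field names are assumptions, not the script's actual schema.

# Sketch only: query the teacher once per prompt and write prompt/response pairs to JSONL.
# The Qwen Hub ID, output path, and JSON field names are illustrative assumptions.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "Qwen/Qwen2.5-3B-Instruct"
tok = AutoTokenizer.from_pretrained(teacher_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id, device_map="auto")

with open("prompts/prompts.txt") as f:
    prompts = [line.strip() for line in f if line.strip()]

with open("dataset/distillation_dataset.jsonl", "w", encoding="utf-8") as out:
    for prompt in prompts:
        chat = [{"role": "user", "content": prompt}]
        input_ids = tok.apply_chat_template(
            chat, add_generation_prompt=True, return_tensors="pt"
        ).to(teacher.device)
        output = teacher.generate(input_ids, max_new_tokens=512)
        response = tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
        out.write(json.dumps({"prompt": prompt, "response": response}, ensure_ascii=False) + "\n")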

Step 2: Baseline Evaluation

Score the base student model before training to establish a performance floor.

python evaluation/baseline_evaluation.py
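
How the judge is prompted and how scores land in the CSV is defined in evaluation/baseline_evaluation.py; the following is a minimal sketch of the judging idea. The Hub ID, rubric wording, 1-5 scale, and regex-based parsing are assumptions.

# Sketch only: ask the judge model for a 1-5 score and parse the first digit it emits.
# The Hub ID, rubric wording, and parsing strategy are illustrative assumptions.
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

judge_id = "Unbabel/M-Prometheus-3B"
tok = AutoTokenizer.from_pretrained(judge_id)
judge = AutoModelForCausalLM.from_pretrained(judge_id, device_map="auto")

def judge_score(question: str, answer: str) -> int:
    rubric = (
        "Rate the following answer from 1 (poor) to 5 (excellent).\n"
        f"Question: {question}\nAnswer: {answer}\nScore:"
    )
    inputs = tok(rubric, return_tensors="pt").to(judge.device)
    output = judge.generate(**inputs, max_new_tokens=16)
    verdict = tok.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    match = re.search(r"[1-5]", verdict)
    return int(match.group()) if match else 0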

Step 3: Distill the Student

Execute LoRA fine-tuning. The distilled adapter will be saved in models/.

python training/distill_student.py
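
Under the hood, distillation here is supervised fine-tuning of the LoRA-wrapped student on the teacher's prompt/response pairs. The sketch below shows the general shape with the Hugging Face Trainer; the chat formatting, hyperparameters, and checkpoint ID are assumptions, not the values used by training/distill_student.py.

# Sketch only: SFT-style LoRA distillation of the student on the teacher's JSONL pairs.
# Formatting, hyperparameters, and the TinyLlama checkpoint ID are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

student_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tok = AutoTokenizer.from_pretrained(student_id)
tok.pad_token = tok.eos_token

base = AutoModelForCausalLM.from_pretrained(student_id)
student = get_peft_model(base, LoraConfig(r=16, lora_alpha=32,
                                          target_modules=["q_proj", "v_proj"],
                                          task_type="CAUSAL_LM"))

data = load_dataset("json", data_files="dataset/distillation_dataset.jsonl", split="train")

def tokenize(example):
    # One causal-LM sequence per pair: the student learns to reproduce the teacher's answer.
    text = f"<|user|>\n{example['prompt']}\n<|assistant|>\n{example['response']}{tok.eos_token}"
    return tok(text, truncation=True, max_length=1024)

trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="models/distilled", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=2e-4),
    train_dataset=data.map(tokenize, remove_columns=data.column_names),
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
trainer.save_model("models/distilled")  # writes the PEFT adapter, not full model weights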

Step 4: Post-Training Evaluation

Compare the distilled student against the teacher and baseline.

python evaluation/post_training_evaluation.py
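
The comparison ultimately reduces to aggregating the judge's scores per model. A minimal sketch with pandas follows; only baseline_results.csv is named by the pipeline above, so the other file names and the "score" column are illustrative assumptions.

# Sketch only: average the judge scores per model from the evaluation CSVs.
# File names other than baseline_results.csv and the "score" column are assumptions.
import pandas as pd

results = {
    "teacher":   "evaluation/teacher_results.csv",
    "baseline":  "evaluation/baseline_results.csv",
    "distilled": "evaluation/post_training_results.csv",
}

for name, path in results.items():
    mean_score = pd.read_csv(path)["score"].mean()
    print(f"{name:>10}: mean judge score = {mean_score:.2f}")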

Outputs and Monitoring

Every run generates structured artifacts for analysis:

  • Metrics: evaluation/summary.txt provides a quick overview of judge scores.
  • Detailed Logs: Found in logs/ with DEBUG level granularity.
  • Model Weights: Saved as PEFT adapters in models/<model_id>/distilled_/.
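
To reuse a distilled adapter outside the pipeline, it can be loaded on top of the base student with PEFT. The adapter directory and checkpoint ID below are illustrative; substitute the directory actually written under models/.

# Sketch only: load a saved LoRA adapter onto the base student for inference.
# The adapter directory and TinyLlama checkpoint ID are illustrative assumptions.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

distilled = PeftModel.from_pretrained(base, "models/tinyllama/distilled_adapter")

prompt = "Explain knowledge distillation in one sentence."
inputs = tok(prompt, return_tensors="pt").to(base.device)
output = distilled.generate(**inputs, max_new_tokens=64)
print(tok.decode(output[0], skip_special_tokens=True))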
