A complete YOLOv8 object detection pipeline implemented from scratch in PyTorch — training, evaluation, fine-tuning, ONNX export, and inference. Bonus: a Rust binary for fast ONNX inference on static images or live webcam feed.
## Table of Contents
- Description
- Features
- Project structure
- Installation
- Dataset format
- Usage
- Configuration files
- Contributing
- License
- Acknowledgments
- References
- Contact
## Description

This project is a full re-implementation of YOLOv8 in pure PyTorch — no Ultralytics dependency. It is designed to be readable, hackable, and easy to run on your own dataset. Every component (backbone, neck, head, loss, metrics, augmentations) is written from scratch and documented.
A companion Rust binary lets you run ONNX inference at native speed, either on a single image or in real-time from a webcam.
## Features

- Train from scratch on any dataset in YOLO format.
- Fine-tune a pre-trained model on a new set of classes in a few lines of config.
- Evaluate with full COCO-style metrics: mAP@0.5, mAP@0.5:0.95, Precision, Recall, F1, confusion matrix, PR curves.
- Export to ONNX (with optional FP16, graph simplification, and numerical verification).
- Inference on images from Python with a futuristic box renderer.
- Rust binary for fast ONNX inference on images or live webcam (`src/main.rs`).
- Rich augmentations: HSV jitter, affine transforms, MixUp, Cutout, blur, noise, grayscale.
- Cosine and linear LR schedulers with warm-up.
- Gradient accumulation, automatic checkpoint rotation, training history plots.
## Project structure

```
.
├── yolov8/
│   ├── model.py          # Backbone, Neck, Head, MyYolo
│   ├── lossfn.py         # TAL assigner + CIoU + DFL + BCE loss
│   ├── dataset.py        # YOLODataset with augmentations
│   ├── metrics.py        # NMS, mAP, MetricAccumulator
│   ├── metrics_eval.py   # Full evaluation suite (curves, CSV, confusion matrix)
│   ├── config.py         # Dataclasses + YAML loaders
│   ├── utils.py          # Logging, model summary, history plot
│   └── entrypoints/
│       ├── train.py      # Training loop
│       ├── evaluate.py   # Full evaluation
│       ├── infer.py      # Single-image inference
│       ├── export.py     # ONNX export
│       └── finetuning.py # Build a fine-tunable checkpoint
├── configs/
│   ├── train.yaml
│   ├── eval.yaml
│   ├── infer.yaml
│   ├── export.yaml
│   └── finetune.yaml
└── src/
    └── main.rs           # Rust ONNX inference binary
```
## Installation

### Quick install from GitHub

You can install the package directly from GitHub using either pip or uv. This gives you immediate access to all CLI tools (yltrain, yleval, ylinfer, ylft, ylexport) without downloading the full repository.
With pip (works in any Python environment, no extra tools needed):

```bash
pip install git+https://github.com/cacybernetic/YOLO8
```

With uv (faster, after installing uv):

```bash
uv pip install git+https://github.com/cacybernetic/YOLO8
```

After installation, you can run the commands directly (see Usage) — just make sure you have the required configuration YAML files (download them from the configs/ folder if needed).
Note for contributors: if you plan to modify the code or contribute, please follow the full local installation instructions below.
### Local installation (Linux/macOS)

1. Install uv (fast Python package manager):

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. Clone the repository:

   ```bash
   git clone https://github.com/cacybernetic/YOLO8
   cd YOLO8
   ```

3. Create a virtual environment with Python 3.10:

   ```bash
   uv venv --python 3.10
   source .venv/bin/activate
   ```

4. Install the package and its dependencies:

   ```bash
   uv pip install -e .
   ```

This reads all dependencies from pyproject.toml and also registers the command-line tools (yltrain, yleval, ylinfer, ylft, ylexport) so you can call them directly from your terminal.
Note — headless server (no display): if you are running on a server without a graphical interface, install these system libraries first:

```bash
sudo apt-get install libgl1-mesa-glx libglib2.0-0
```
### Local installation (Windows)

1. Download and install Python 3.10 from python.org.
2. Open a command prompt inside the project folder.
3. Install uv:

   ```bash
   pip install uv
   ```

4. Create and activate the virtual environment:

   ```bash
   uv venv --python 3.10
   .venv\Scripts\activate
   ```

5. Install the package and its dependencies:

   ```bash
   uv pip install -e .
   ```
### Rust binary (optional)

Only needed if you want to run the Rust ONNX inference binary. Skip this section if you only use the Python scripts.

1. Install Rust: rustup.rs
2. Build the release binary:

   ```bash
   cargo build --release
   ```
The binary will be compiled to target/release/yolov8rust (Linux/macOS) or target\release\yolov8rust.exe (Windows). It automatically downloads ONNX Runtime on the first build.
## Dataset format

Your dataset must follow the standard YOLO folder structure:
```
dataset/
├── train/
│   ├── images/   # .jpg, .png, .jpeg, ...
│   └── labels/   # one .txt per image
└── test/
    ├── images/
    └── labels/
```
Each .txt label file contains one object per line:
```
<class_id> <cx> <cy> <w> <h>
```
All values are normalized between 0 and 1. Example for a single bounding box of class 0:
```
0 0.512 0.348 0.230 0.415
```
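For illustration, here is a small Python sketch (a hypothetical helper, not part of the package) that converts one such normalized label line back to pixel-space corner coordinates:

```python
def yolo_label_to_pixels(line: str, img_w: int, img_h: int):
    """Convert one YOLO label line to (class_id, x1, y1, x2, y2) in pixels."""
    class_id, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x1 = (cx - w / 2) * img_w  # left edge
    y1 = (cy - h / 2) * img_h  # top edge
    x2 = (cx + w / 2) * img_w  # right edge
    y2 = (cy + h / 2) * img_h  # bottom edge
    return int(class_id), x1, y1, x2, y2

# The example box above on a 640x480 image:
print(yolo_label_to_pixels("0 0.512 0.348 0.230 0.415", 640, 480))
```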
## Usage

After installation, five commands are available in your terminal:

| Command | Role |
|---|---|
| `yltrain` | Train a model |
| `yleval` | Evaluate a model |
| `ylinfer` | Run inference on an image |
| `ylft` | Build a fine-tunable checkpoint |
| `ylexport` | Export to ONNX |
Each command takes a single --config argument pointing to its YAML file.
### Training

Edit configs/train.yaml to point to your dataset and set your number of classes, then run:

```bash
yltrain --config configs/train.yaml
```

Checkpoints are saved in the checkpoints/ folder. The best model is saved as checkpoints/best.pt. A training history plot (loss curves) is regenerated after each epoch at checkpoints/training_history.png.
### Evaluation

Edit configs/eval.yaml (dataset path, weights path, number of classes), then run:

```bash
yleval --config configs/eval.yaml
```

Results are written to the results/ folder (a quick way to inspect them follows the list):

- per_class.csv — per-class metrics (Precision, Recall, F1, AP@0.5, AP@0.5:0.95)
- global.csv — global metrics (mAP, losses, optimal confidence threshold, …)
- figures/ — PR curves, F1-confidence curve, confusion matrices
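For example, a few lines of Python are enough to rank classes by AP@0.5 from per_class.csv (the column names below are assumptions; check the generated header):

```python
import csv

# Hypothetical sketch: adjust the "class" and "AP@0.5" column names
# to match the actual header of results/per_class.csv.
with open("results/per_class.csv", newline="") as f:
    rows = list(csv.DictReader(f))

rows.sort(key=lambda r: float(r["AP@0.5"]), reverse=True)
for row in rows:
    print(f'{row["class"]:>20}  AP@0.5 = {float(row["AP@0.5"]):.3f}')
```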
### Fine-tuning

Step 1 — build the fine-tunable checkpoint:

Edit configs/finetune.yaml with the source weights, the old number of classes, and the new number of classes, then run:

```bash
ylft --config configs/finetune.yaml
```

This creates a new .pt file with the backbone and neck transferred from the source model, and the classification heads re-initialized for the new classes.
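The transfer step conceptually works like the following PyTorch sketch, which copies every tensor whose name and shape match and leaves the rest freshly initialized (an illustration of the general idea, not the repository's exact code):

```python
import torch
import torch.nn as nn

def transfer_compatible_weights(src_state: dict, dst_model: nn.Module) -> list[str]:
    """Copy name/shape-compatible tensors; return the re-initialized keys."""
    dst_state = dst_model.state_dict()
    compatible = {
        k: v for k, v in src_state.items()
        if k in dst_state and v.shape == dst_state[k].shape
    }
    dst_model.load_state_dict(compatible, strict=False)
    return [k for k in dst_state if k not in compatible]

# Toy demo: a 2-class head replacing an 80-class head keeps the shared layer.
old = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 80))
new = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
print(transfer_compatible_weights(old.state_dict(), new))  # ['1.weight', '1.bias']
```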
Step 2 — train as usual:

In configs/train.yaml, set pretrained_weights to the output of step 1 and num_classes to your new class count. Optionally set freeze_feature_layers: true to only train the detection head (recommended for small datasets):

```bash
yltrain --config configs/train.yaml
```

### Export to ONNX

Edit configs/export.yaml, then run:

```bash
ylexport --config configs/export.yaml
```

The exported .onnx file is numerically verified against the PyTorch model by default.
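Conceptually, that verification runs one input through both runtimes and compares the outputs, along the lines of this sketch (an illustration under assumptions: a single-output model with a 1x3x640x640 input; the real logic lives in the export entrypoint):

```python
import numpy as np
import onnxruntime as ort
import torch

def outputs_match(model: torch.nn.Module, onnx_path: str, atol: float = 1e-4) -> bool:
    """Compare PyTorch and ONNX Runtime outputs on one random input."""
    model.eval()
    x = torch.randn(1, 3, 640, 640)  # assumed export input shape
    with torch.no_grad():
        ref = model(x).numpy()       # assumes the model returns a single tensor
    sess = ort.InferenceSession(onnx_path)
    (out,) = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})
    return np.allclose(ref, out, atol=atol)
```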
### Inference (Python)

Edit configs/infer.yaml (weights, number of classes, class names), then run:

```bash
ylinfer --config configs/infer.yaml --image path/to/image.jpg
```

Useful options:

- `--save output.jpg` — save the annotated image to disk
- `--no-show` — disable the display window (useful on a server)
- `--conf 0.4` — override the confidence threshold
- `--iou 0.5` — override the NMS IoU threshold
### Standalone ONNX scripts (predict.py and live.py)

Two helper scripts at the project root let you run a pre-trained ONNX model without any dependency on the yolov8 package — handy for quick demos, deployment, or running the model on a machine where you only need onnxruntime and a couple of small libraries.
Both scripts share the same --model, --nc, --conf, --iou, and
--names options, and accept --log-level to control verbosity. If the
model has 80 classes, the standard COCO names are used automatically;
otherwise pass a --names classes.txt file (one class name per line).
#### predict.py (static images)

Runs on CPU only by default (uses GPU if onnxruntime-gpu is installed).
Pure numpy + Pillow for the image pipeline, no OpenCV or PyTorch needed.
```bash
python predict.py \
    --model weights/best.onnx \
    --nc 80 \
    --image samples/photo.jpg \
    --output result.jpg
```

Common options:

- `--conf 0.25` — minimum confidence threshold (default 0.25)
- `--iou 0.45` — NMS IoU threshold (default 0.45)
- `--show` — display the annotated image after inference
- `--names classes.txt` — file with one class name per line
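To see the bare ONNX Runtime call such a script is built around, here is a minimal sketch (it assumes a 640x640 input and skips the letterboxing, confidence filtering, and NMS that predict.py performs):

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# Minimal sketch only: proper preprocessing should letterbox (pad to
# preserve aspect ratio) instead of a plain resize, and the raw output
# still needs confidence filtering and NMS.
session = ort.InferenceSession("weights/best.onnx")
name = session.get_inputs()[0].name

img = Image.open("samples/photo.jpg").convert("RGB").resize((640, 640))
x = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)[None] / 255.0

(pred,) = session.run(None, {name: x})
print(pred.shape)  # typically (1, 4 + nc, 8400) for a 640x640 YOLOv8 export
```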
#### live.py (webcam or video)

Streams predictions in real time on a webcam feed or video file using OpenCV for capture and display. Shows a live FPS counter and detection count, and can record the annotated stream to disk.
```bash
# Webcam (index 0)
python live.py --model weights/best.onnx --nc 80 --source 0

# Video file
python live.py --model weights/best.onnx --nc 80 --source path/to/video.mp4

# Headless mode + save the annotated stream
python live.py --model weights/best.onnx --nc 80 \
    --source path/to/video.mp4 --output annotated.mp4 --no-show
```

`--source` accepts either an integer (webcam index) or a path to a video file.
Press q or ESC in the display window to quit.
Required Python packages for these two scripts: numpy, onnxruntime,
Pillow (for predict.py), opencv-python (for live.py). They are already
included in the project dependencies, so nothing more to install.
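For reference, the capture-and-quit skeleton of such a live script usually looks like the standard OpenCV loop below (a generic sketch, not live.py itself):

```python
import time
import cv2

# Standard OpenCV capture loop: read frames, run inference, overlay an
# FPS counter, and quit on 'q' or ESC.
cap = cv2.VideoCapture(0)  # integer index for a webcam, or a video path
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    start = time.perf_counter()
    # ... run ONNX inference on `frame` and draw the boxes here ...
    fps = 1.0 / max(time.perf_counter() - start, 1e-6)
    cv2.putText(frame, f"{fps:.1f} FPS", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("live", frame)
    if cv2.waitKey(1) & 0xFF in (ord("q"), 27):  # 'q' or ESC
        break
cap.release()
cv2.destroyAllWindows()
```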
### Rust binary

On a single image:

```bash
./target/release/yolov8rust \
    --model weights/best.onnx \
    --image photo.jpg \
    --output result.jpg \
    --nc 80 \
    --conf 0.25 \
    --iou 0.45
```

Live from webcam (uses the rustcv-based binary, compiled separately — see Cargo.toml at the root):

```bash
./target/release/yololivers \
    --model weights/best.onnx \
    --source 0 \
    --nc 80
```

`--source 0` opens the first webcam. Press q or ESC to quit.
## Configuration files

All behavior is controlled through YAML files in configs/. The most important fields:

| File | Key fields |
|---|---|
| `train.yaml` | `dataset_dir`, `num_classes`, `version` (n/s/m/l/x), `epochs`, `batch_size`, `device` |
| `eval.yaml` | `dataset_dir`, `num_classes`, `weights`, `split` (test or train) |
| `infer.yaml` | `weights`, `num_classes`, `class_names`, `conf_threshold` |
| `export.yaml` | `weights`, `num_classes`, `output_path`, `simplify`, `half` |
| `finetune.yaml` | `pretrained_weights`, `old_num_classes`, `new_num_classes`, `output_weights` |
Unknown keys in a YAML file are silently ignored, so you can keep comments and extra or experimental entries in your configs without breaking anything.
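One common way to implement this kind of tolerant loading is to keep only the keys a dataclass declares, as in this hypothetical sketch (the package's actual loaders live in yolov8/config.py):

```python
import dataclasses
import yaml  # PyYAML

@dataclasses.dataclass
class TrainConfig:
    # Hypothetical field subset; see the table above for the real keys.
    dataset_dir: str = "dataset"
    num_classes: int = 80
    epochs: int = 100

def load_config(path: str) -> TrainConfig:
    with open(path) as fh:
        raw = yaml.safe_load(fh) or {}
    known = {f.name for f in dataclasses.fields(TrainConfig)}
    # Dropping unknown keys here is what makes extra entries in the
    # YAML file harmless.
    return TrainConfig(**{k: v for k, v in raw.items() if k in known})
```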
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository and clone it locally.
2. Create a new branch for your feature:

   ```bash
   git checkout -b feature/my-feature
   ```

3. Commit your changes:

   ```bash
   git commit -m 'Add a new feature'
   ```

4. Push to the branch:

   ```bash
   git push origin feature/my-feature
   ```

5. Open a Pull Request.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

This project was built while learning the inner workings of YOLOv8. A huge thank-you to dtdo90 for the excellent educational repository dtdo90/yolov8_detection and the accompanying YouTube walkthrough, both of which served as the primary reference for understanding the architecture (backbone, neck, head).
Many implementation choices in this project, such as the structure of the Detect head and the integration of the DFL into the box regression, are directly inspired by that work.
If you find this project useful, please consider giving the dtdo90 repository a star as a token of appreciation for the educational content that made it possible.
## References

The implementation is based on the following papers and resources:
- TAL — Task-Aligned Assigner — Feng, C., Zhong, Y., Gao, Y., Scott, M. R., & Huang, W. (2021). TOOD: Task-Aligned One-stage Object Detection. ICCV 2021. arXiv:2108.07755
- DFL — Distribution Focal Loss — Li, X., Wang, W., Wu, L., Chen, S., Hu, X., Li, J., Tang, J., & Yang, J. (2020). Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. NeurIPS 2020. arXiv:2006.04388
- CIoU Loss — Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., & Ren, D. (2020). Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. AAAI 2020. arXiv:1911.08287
- Focal Loss — Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal Loss for Dense Object Detection. ICCV 2017. Used as the reference for the bias initialization of the new classification heads during fine-tuning (b = -log((1-π)/π) with π = 0.01, i.e. b ≈ -4.6). arXiv:1708.02002
- CSPNet — Wang, C.-Y., Liao, H.-Y. M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., & Yeh, I.-H. (2020). CSPNet: A New Backbone that can Enhance Learning Capability of CNN. CVPRW 2020. Foundation of the C2f blocks used in the backbone. arXiv:1911.11929
- PAN — Path Aggregation Network — Liu, S., Qi, L., Qin, H., Shi, J., & Jia, J. (2018). Path Aggregation Network for Instance Segmentation. CVPR 2018. Used as the basis for the multi-scale neck. arXiv:1803.01534
- SPPF — He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. The Fast variant (SPPF) used in this project is the standard YOLOv5/v8 design. arXiv:1406.4729
- COCO evaluation protocol — Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. ECCV 2014. Source of the 101-point AP interpolation and the IoU thresholds 0.5:0.05:0.95. arXiv:1405.0312
- Survey on detection metrics — Padilla, R., Netto, S. L., & da Silva, E. A. B. (2020). A Survey on Performance Metrics for Object-Detection Algorithms. IWSSIP 2020. DOI
- dtdo90/yolov8_detection — DT Do (2024). Implementation of the YOLOv8 detection model with an accompanying YouTube tutorial.
## Contact

For questions or suggestions:
- Author: DOCTOR MOKIRA — dr.mokira@gmail.com
- Maintainer: CONSOLE ART CYBERNETIC — ca.cybernetic@gmail.com
- GitHub: cacybernetic/YOLO8
