In school, I learned topics like differentiation and matrices, but I often wondered how they were actually used in the real world. Later in college, I took a course on Deep Learning where things started to make sense.
As part of the coursework, I built a feedforward neural network from scratch—without using any AI/ML libraries—to classify images from the Fashion MNIST (FMNIST) dataset into one of 10 categories. Through this project, I learned how core mathematical concepts like matrix multiplication and differentiation play a crucial role in machine learning. I also understood how gradient descent helps the model learn by adjusting its weights in the direction opposite to the gradient of the loss. It was a fun and insightful experience.
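The weight-update idea can be sketched on a toy one-parameter problem. This is a minimal illustration (not the project's actual training loop): minimizing f(w) = (w - 3)^2, whose derivative is 2(w - 3), by repeatedly stepping against the gradient.

```python
# Toy gradient descent: minimize f(w) = (w - 3)^2.
# The derivative is df/dw = 2 * (w - 3); we step in the opposite direction.
def gradient_descent(lr=0.1, steps=100):
    w = 0.0  # arbitrary starting weight
    for _ in range(steps):
        grad = 2 * (w - 3)  # gradient of the loss at the current weight
        w -= lr * grad      # move against the gradient
    return w

print(round(gradient_descent(), 4))  # converges to 3.0, the minimum
```

In a real network the same rule is applied to every weight at once, with the gradients obtained via backpropagation.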
I then extended my learning by implementing a Convolutional Neural Network (CNN), which further improved image classification performance.
More recently, I implemented the Transformer architecture from the paper "Attention is All You Need." This is a sequence-to-sequence text model that I trained on the English-Italian subset of the OPUS Books dataset for translation. It's fascinating to see how such models can be built from scratch.
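The building block at the heart of that architecture is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V from the paper. A minimal NumPy sketch (shapes here are illustrative, not the model's actual dimensions):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = softmax(scores)        # rows sum to 1
    return weights @ V               # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))  # 2 query positions, d_k = 4
K = rng.standard_normal((3, 4))  # 3 key positions
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

The full model wraps this in multi-head attention with learned projections, plus positional encodings, feedforward sublayers, and residual connections.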
This implementation was guided by the original paper and a helpful YouTube tutorial. In the working branch of this fork, you'll find a Kaggle notebook that:
- Builds the Transformer model
- Trains it for 100 iterations
- Saves the final trained model on CPU, making it easy to use for inference on a local machine
Resources:
- "Attention is All You Need" implementation (this repository)
- YouTube video with the full step-by-step implementation: https://www.youtube.com/watch?v=ISNdQcPhsts