
anup1005/Transformers-Attention_is_all_you_need


About This Project

In school, I learned topics like differentiation and matrices, but I often wondered how they were actually used in the real world. Later in college, I took a course on Deep Learning where things started to make sense.

As part of the coursework, I built a feedforward neural network from scratch—without using any AI/ML libraries—to classify images from the Fashion MNIST (FMNIST) dataset into one of 10 categories. Through this project, I learned how core mathematical concepts like matrix multiplication and differentiation play a crucial role in machine learning. I also understood how gradient descent helps the model learn by adjusting weights in the direction opposite to the gradient. It was a fun and insightful experience.
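The update rule described above can be sketched in a few lines of NumPy. This is a hypothetical minimal example (a single linear neuron with squared-error loss, not the actual FMNIST network): compute the gradient of the loss with respect to the weights via the chain rule, then step in the opposite direction.

```python
import numpy as np

# Toy illustration of gradient descent, not the FMNIST classifier itself.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # 4 samples, 3 features
y = rng.normal(size=(4, 1))   # regression targets
W = np.zeros((3, 1))          # weights to learn
lr = 0.1                      # learning rate

for _ in range(100):
    pred = X @ W                          # forward pass: matrix multiplication
    grad = 2 * X.T @ (pred - y) / len(X)  # dLoss/dW from the chain rule
    W -= lr * grad                        # move opposite to the gradient

loss = float(np.mean((X @ W - y) ** 2))   # lower than the initial loss
```

The same two ingredients, matrix multiplication in the forward pass and differentiation in the backward pass, scale up to the full multi-layer network.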

I then extended my learning by implementing a Convolutional Neural Network (CNN), which further improved image classification performance.

pytorch-transformer

More recently, I implemented the Transformer architecture from the paper "Attention is All You Need." This is a text-to-text model that I trained on a subset (English-Italian) of the OPUS Books dataset. It's fascinating to see how such models can be built from scratch.
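The core building block of that architecture is scaled dot-product attention from the paper: scores between queries and keys, scaled by the square root of the key dimension, softmaxed into weights, then applied to the values. A minimal NumPy sketch (shapes chosen purely for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # (queries, keys)
    weights = softmax(scores)                       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))   # 5 query positions, d_k = 8
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(Q, K, V)     # out has shape (5, 8)
```

Multi-head attention, as used in the full model, runs several of these in parallel on learned projections of Q, K, and V and concatenates the results.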

This implementation was guided by the original paper and a helpful YouTube tutorial. In the working branch of this fork, you'll find a Kaggle notebook that:

  • Builds the Transformer model
  • Trains it for 100 iterations
  • Saves the final trained model on CPU, making it easy to use for inference on a local machine
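The CPU-saving step in the last bullet can be sketched as follows. This is an assumption about the mechanics, not the notebook's exact code: moving the model to CPU before saving its `state_dict`, and loading with `map_location="cpu"`, lets the checkpoint be restored on a machine without a GPU. The file name and the stand-in model are illustrative.

```python
import torch

# Stand-in module for the trained Transformer; the real model is larger.
model = torch.nn.Linear(4, 2)

# Move parameters to CPU before saving so the checkpoint has no GPU tensors.
torch.save(model.to("cpu").state_dict(), "transformer_en_it.pt")

# On the local machine: force all tensors onto CPU while loading.
state = torch.load("transformer_en_it.pt", map_location="cpu")
model.load_state_dict(state)
```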

Attention is all you need implementation

YouTube video with full step-by-step implementation: https://www.youtube.com/watch?v=ISNdQcPhsts

About

PyTorch implementation of the Transformer architecture (“Attention Is All You Need”) for English–Italian text-to-text translation, featuring encoder–decoder layers, multi-head attention, and training on a subset of the OPUS Books dataset.
