A multilingual sentiment analysis web application that classifies input text as Positive, Neutral, or Negative using a transformer-based NLP model.
Built with Hugging Face Transformers and Gradio, and deployed on Hugging Face Spaces.
K. Siddhartha — AI / NLP Developer
🔗 GitHub: https://github.com/k-siddhartha-ai 🤗 Hugging Face: https://huggingface.co/Siddhartha001
This project demonstrates real-time sentiment analysis across multiple languages using a pretrained multilingual transformer model. Users can enter text in different languages and instantly receive:
- Sentiment label
- Confidence score
The goal of this project is to showcase practical deployment of multilingual NLP models using a lightweight web interface.
Model: cardiffnlp/twitter-xlm-roberta-base-sentiment
Architecture: XLM-RoBERTa (Transformer Encoder)
- Supports sentiment analysis across ~100 languages
- Strong multilingual generalization capability
- Pretrained on large-scale Twitter datasets
- Balanced trade-off between accuracy and inference speed
The model predicts one of three sentiment classes:
- Positive
- Neutral
- Negative
- Frontend: Gradio interface
- Inference layer: Hugging Face Transformers pipeline
- Model: XLM-RoBERTa multilingual transformer
- Deployment: Hugging Face Spaces (CPU)
Flow:
User Input → Tokenizer → Transformer Model → Sentiment Prediction → UI Output
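The last step of this flow, turning the model's raw class scores into a label plus confidence, is a softmax over the three class logits. A stdlib-only sketch with hypothetical logit values (the real app gets these scores from the Transformers pipeline; the class order shown is the one in the model's config):

```python
import math

# Assumed class order from the model's id2label config: 0=negative, 1=neutral, 2=positive
LABELS = ["Negative", "Neutral", "Positive"]

def to_prediction(logits):
    """Convert raw class logits into (label, confidence) via a softmax."""
    shifted = [x - max(logits) for x in logits]   # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, score = to_prediction([-1.2, 0.3, 2.5])    # hypothetical logits
# "Positive" with ~0.88 confidence
```

The confidence score shown in the UI is exactly this winning softmax probability.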
- 🌐 Multilingual sentiment detection (~100 languages)
- 📊 Confidence score for predictions
- ⚡ Real-time inference
- 🖥️ Interactive web interface
- ☁️ Cloud deployment via Hugging Face Spaces
- Python
- Hugging Face Transformers
- Gradio
- PyTorch (via Transformers)
Clone the repository and install dependencies:

    pip install -r requirements.txt

Run the app:

    python app.py

Then open the local Gradio URL shown in the terminal.
Hugging Face Space:
https://huggingface.co/spaces/Siddhartha001/multilingual-sentiment-analysis
- Model size: ~1 GB
- CPU inference latency: ~1–3 seconds per request
- Maximum input length: 512 tokens
- Model is trained primarily on Twitter data; performance may vary on long formal text.
- Very long inputs are truncated to 512 tokens.
- Sarcasm, slang, or mixed-language content may reduce accuracy.
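If silent truncation is a concern, over-long inputs can also be trimmed before they reach the model. A hypothetical stdlib guard that cuts on whitespace words as a rough proxy for tokens (subword tokenization usually yields *more* tokens than words, so the real cap still comes from the tokenizer's 512-token limit):

```python
MAX_TOKENS = 512

def rough_truncate(text: str, budget: int = MAX_TOKENS) -> str:
    """Crude pre-truncation on whitespace words.

    Only a first cut: the XLM-RoBERTa tokenizer splits words into
    subword pieces, so the model may still truncate further.
    """
    words = text.split()
    if len(words) <= budget:
        return text
    return " ".join(words[:budget])
```

This keeps the UI honest about what part of a long input was actually classified.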
- GPU acceleration for faster inference
- Language detection before prediction
- Batch input support
- Sentiment visualization UI
This project uses a pretrained model provided by CardiffNLP under its respective license.
