meller/alphafolddemo

AlphaFold Demo for Biology Students

An interactive Streamlit application that demonstrates AI-powered protein structure prediction in the spirit of AlphaFold. Predictions come from Meta's ESMFold model through its free API. Designed for biology students with no programming background!

What This Demo Does

  • Predicts 3D protein structures from amino acid sequences
  • Visualizes proteins in interactive 3D
  • Explains the difference between creating vs using AI models
  • Introduces modern bio-AI tools and resources
  • Encourages hands-on exploration ("vibe coding")

Quick Start (For Non-Programmers)

Prerequisites

You'll need:

  • A computer (Windows, Mac, or Linux)
  • Internet connection
  • 15 minutes for setup

Easy Installation (Recommended)

We've included automated setup scripts to make installation easier!

Windows:

  1. Double-click setup.bat
  2. Wait for installation to complete
  3. Double-click run.bat to start the app

Mac/Linux:

  1. Open Terminal in the project folder
  2. Run: bash setup.sh
  3. Run: bash run.sh to start the app

The scripts will automatically:

  • Create a virtual environment
  • Install all dependencies
  • Launch the app in your browser

Manual Installation Steps

Step 1: Install Python

Windows:

  1. Go to https://www.python.org/downloads/
  2. Download Python 3.10 or newer
  3. Run the installer
  4. IMPORTANT: Check the box "Add Python to PATH"
  5. Click "Install Now"

Mac:

  1. Open Terminal (search for "Terminal" in Spotlight)
  2. Install Homebrew if you don't have it:
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  3. Install Python:
    brew install python

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install python3 python3-pip python3-venv

Step 2: Download This Project

Option A: Using Git (if you have it)

git clone <your-repository-url>
cd alphafolddemo

Option B: Download ZIP

  1. Click the green "Code" button on GitHub
  2. Click "Download ZIP"
  3. Extract the ZIP file
  4. Open Terminal/Command Prompt and navigate to the folder:
    cd path/to/alphafolddemo

Step 3: Create Virtual Environment and Install Packages

A virtual environment keeps this project's packages separate from your system Python.

Windows:

# Create virtual environment
python -m venv venv-windows

# Activate it
venv-windows\Scripts\activate

# Install packages
pip install -r requirements.txt

Mac/Linux:

# Create virtual environment
python3 -m venv venv

# Activate it
source venv/bin/activate

# Install packages
pip install -r requirements.txt

You should see (venv-windows) (Windows) or (venv) (Mac/Linux) at the beginning of your command prompt, indicating the virtual environment is active.

This will install:

  • streamlit - The web app framework
  • py3Dmol - 3D molecule visualization
  • stmol - Streamlit integration for py3Dmol
  • requests - For making API calls

Step 4: Run the App

Make sure your virtual environment is activated (you should see (venv) on Mac/Linux or (venv-windows) on Windows in your prompt), then run:

streamlit run app.py

The app will automatically open in your web browser at http://localhost:8501

If it doesn't open automatically, copy that URL into your browser.

Important: Every time you open a new terminal to run the app, you need to activate the virtual environment first:

  • Windows: venv-windows\Scripts\activate
  • Mac/Linux: source venv/bin/activate

Using the Demo

  1. Choose a protein: Select from the dropdown menu or paste your own sequence
  2. Click "Predict Structure": Wait 30-60 seconds for the prediction
  3. Explore the 3D visualization: Rotate, zoom, and examine the structure
  4. Read the educational content: Check the sidebar and tabs for learning materials
  5. Download structures: Save PDB files for further analysis
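For the "further analysis" in step 5, a downloaded PDB file is plain text and can be read with a few lines of Python. The sketch below is not part of app.py. It assumes a standard fixed-column PDB file, uses the hypothetical function name `read_ca_atoms`, and extracts one alpha-carbon (CA) position per residue.

```python
def read_ca_atoms(pdb_text):
    """Return (residue_name, residue_number, (x, y, z)) for each CA atom.

    Relies on the fixed-column layout of PDB ATOM records.
    """
    atoms = []
    for line in pdb_text.splitlines():
        # Keep only ATOM records whose atom-name field (columns 13-16) is CA
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20].strip()   # columns 18-20
            res_num = int(line[22:26])       # columns 23-26
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res_name, res_num, xyz))
    return atoms


# Example use ("prediction.pdb" is a placeholder for the file you saved):
#   with open("prediction.pdb") as handle:
#       ca_atoms = read_ca_atoms(handle.read())
#   print(len(ca_atoms), "residues with a CA atom")
```

Since a predicted structure has one CA atom per residue, the length of the returned list should match the length of the sequence you submitted.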

What's Included

Files in This Project

  • app.py - The main Streamlit application
  • requirements.txt - Python package dependencies
  • PRESENTATION.md - Complete presentation slides (20+ slides)
  • README.md - This file!
  • setup.sh / setup.bat - Automated setup scripts (Mac/Linux & Windows)
  • run.sh / run.bat - Quick run scripts (Mac/Linux & Windows)
  • venv/ - Mac/Linux virtual environment folder (created after setup.sh)
  • venv-windows/ - Windows virtual environment folder (created after setup.bat)

Features

  • Interactive 3D protein visualization with py3Dmol
  • Pre-loaded example proteins (insulin, lysozyme, myoglobin)
  • Educational sidebar explaining:
    • What is AlphaFold?
    • Creating vs Using AI models
    • Other bio-AI tools (Benchling, PubMed, BioRender, etc.)
    • Career paths in computational biology
  • Three educational tabs:
    • About Proteins (biology basics)
    • About the AI (how AlphaFold works)
    • Vibe Coding Tips (getting started with programming)
  • Download predictions as PDB files
  • Free API usage (ESMFold from Meta)

Example Proteins Included

  1. Human Insulin - Hormone regulating blood sugar
  2. Human Lysozyme - Antibacterial enzyme
  3. Myoglobin - Oxygen storage protein
  4. Custom - Enter your own sequence!

Troubleshooting

"streamlit: command not found"

Make sure your virtual environment is activated first:

Windows:

venv-windows\Scripts\activate
streamlit run app.py

Mac/Linux:

source venv/bin/activate
streamlit run app.py

If that doesn't work, run Streamlit through Python directly:

Windows:

python -m streamlit run app.py

Mac/Linux:

python3 -m streamlit run app.py

"Module not found" errors

Make sure your virtual environment is activated, then reinstall:

# Activate venv first (see above)
pip install --upgrade -r requirements.txt

Virtual environment not activating

Windows (PowerShell): You may need to enable script execution:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Then try activating again:

venv-windows\Scripts\activate

Mac/Linux: Make sure you're using source:

source venv/bin/activate

Prediction fails or times out

  • Check your internet connection
  • Try a shorter protein sequence (< 200 amino acids)
  • Wait a few minutes and try again (API might be busy)
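Some failed predictions come from the input itself: stray characters, a pasted FASTA header line, or a very long sequence. The sketch below shows one way to check a sequence before sending it. The function name `clean_sequence` and the 400-residue threshold are illustrative assumptions (the threshold follows the guidance under Known Limitations), not code taken from app.py.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues
MAX_RECOMMENDED_LENGTH = 400  # assumption: follows this README's guidance


def clean_sequence(raw):
    """Strip FASTA headers and whitespace; return (sequence, problems)."""
    lines = [ln.strip() for ln in raw.splitlines()]
    lines = [ln for ln in lines if not ln.startswith(">")]  # drop header lines
    sequence = "".join("".join(lines).split()).upper()

    problems = []
    if not sequence:
        problems.append("sequence is empty")
    invalid = sorted(set(sequence) - VALID_AMINO_ACIDS)
    if invalid:
        problems.append("unexpected characters: " + ", ".join(invalid))
    if len(sequence) > MAX_RECOMMENDED_LENGTH:
        problems.append(
            f"{len(sequence)} residues; try {MAX_RECOMMENDED_LENGTH} or fewer"
        )
    return sequence, problems
```

An empty `problems` list means the sequence is reasonable to submit. Otherwise the messages can be shown to the user instead of waiting for the API call to fail.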

Port already in use

Change the port:

streamlit run app.py --server.port 8502

Browser doesn't open

Manually open: http://localhost:8501

Understanding the Code (For the Curious)

Main Components

# Simplified outline; "..." marks details that are left out here

# 1. Import libraries
import streamlit as st     # Web app framework
import requests            # API calls
import py3Dmol             # 3D visualization
from stmol import showmol  # Display molecules in Streamlit

# 2. Set up the page
st.set_page_config(...)    # Configure appearance

# 3. Create UI elements
st.header("...")           # Headers
st.text_area("...")        # Input boxes
st.button("...")           # Buttons

# 4. Make predictions
response = requests.post(api_url, data=sequence)
pdb_data = response.text   # PDB file contents returned by the API

# 5. Visualize results
view = py3Dmol.view()                              # Create viewer
view.addModel(pdb_data, "pdb")                     # Add structure (PDB format)
view.setStyle({"cartoon": {"color": "spectrum"}})  # Rainbow cartoon style
showmol(view)                                      # Display

How the Prediction Works

Your Sequence
    ↓
requests.post() sends to ESMFold API
    ↓
ESMFold AI model predicts structure
    ↓
Returns PDB file (3D coordinates)
    ↓
py3Dmol visualizes in browser
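In code, the flow above comes down to a single HTTP request. The sketch below is a minimal version under stated assumptions. The endpoint URL is the public ESM Atlas folding endpoint as commonly documented; it may differ from the `api_url` used in app.py and may change over time. `predict_structure` and its `post` parameter are hypothetical names introduced here.

```python
# Assumption: public ESM Atlas endpoint; check app.py for the URL it really uses
ESMFOLD_URL = "https://api.esmatlas.com/foldSequence/v1/pdb/"


def predict_structure(sequence, post=None, timeout=120):
    """Send an amino acid sequence to the folding API and return PDB text.

    `post` lets you pass a stand-in for requests.post (handy for testing
    without network access); by default the real requests.post is used.
    """
    if post is None:
        import requests  # imported lazily so the stand-in path needs no install
        post = requests.post
    response = post(ESMFOLD_URL, data=sequence, timeout=timeout)
    response.raise_for_status()  # raise an exception on HTTP errors (4xx/5xx)
    return response.text         # the PDB file contents as one string
```

A call such as `pdb_data = predict_structure("GIVEQCCTSICSLYQLENYCN")` would then return text that can be passed to `view.addModel(pdb_data, "pdb")`.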

Extending This Project

Easy Modifications (No Experience Needed)

  1. Add your own protein examples to the pre-loaded proteins in app.py

  2. Change colors:

    • Find view.setStyle({'cartoon': {'color': 'spectrum'}})
    • Change 'spectrum' to 'red', 'blue', 'green', etc.
  3. Add information:

    • Edit the sidebar expanders
    • Add your favorite proteins or diseases
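For the first modification (adding your own protein examples), the structure app.py uses for its examples is not shown in this README, so the following is only a sketch of one common pattern: a dictionary mapping display names to sequences that feeds a dropdown. `EXAMPLE_PROTEINS` and `add_example` are hypothetical names; the two sequences are the human insulin A and B chains.

```python
# Hypothetical structure; app.py may organize its examples differently
EXAMPLE_PROTEINS = {
    "Human Insulin (A chain)": "GIVEQCCTSICSLYQLENYCN",
    "Human Insulin (B chain)": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT",
}


def add_example(name, sequence, examples=None):
    """Add a protein after normalizing its sequence (upper case, no spaces)."""
    if examples is None:
        examples = EXAMPLE_PROTEINS
    cleaned = "".join(sequence.split()).upper()
    if not cleaned.isalpha():
        raise ValueError(f"{name}: sequence should contain only letters")
    examples[name] = cleaned
    return examples


# In a Streamlit app, the names could then populate a dropdown, for example:
#   choice = st.selectbox("Choose a protein", list(EXAMPLE_PROTEINS))
#   sequence = EXAMPLE_PROTEINS[choice]
```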

Intermediate Modifications (Some Python)

  1. Add protein properties:

    • Calculate molecular weight
    • Count amino acid types
    • Predict isoelectric point
  2. Batch predictions:

    • Upload a FASTA file with multiple sequences
    • Predict all structures
    • Download as a ZIP file
  3. Compare structures:

    • Predict two proteins
    • Visualize side-by-side
    • Calculate RMSD (structural similarity)
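Two of these ideas need only the Python standard library. The sketch below counts amino acids, estimates molecular weight from average residue masses, and splits FASTA text into individual records for batch prediction. All function names are illustrative. The masses are approximate average residue masses in daltons, and the weight estimate ignores modifications such as disulfide bonds.

```python
from collections import Counter

# Approximate average residue masses in daltons (amino acid minus one water)
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153


def amino_acid_counts(sequence):
    """Count how often each one-letter amino acid code occurs."""
    return Counter(sequence.upper())


def molecular_weight(sequence):
    """Sum residue masses and add one water for the free chain ends."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:  # finish the last record
        records.append((header, "".join(chunks)))
    return records
```

For a batch workflow, each `(header, sequence)` pair from `parse_fasta` could be sent to the prediction step in turn, with the header used as the output file name.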

Advanced Modifications (More Programming)

  1. Use real AlphaFold:

    • Set up Google Colab
    • Run full AlphaFold2
    • Get highest accuracy predictions
  2. Add analysis tools:

    • Ramachandran plots
    • Secondary structure prediction
    • Binding site identification
  3. Create database:

    • Store predictions in SQLite
    • Track prediction history
    • Compare multiple versions
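The database idea can be prototyped with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table layout and the function names `open_history`, `save_prediction`, and `list_predictions` are made up for illustration and are not part of the current app.

```python
import sqlite3
from datetime import datetime, timezone


def open_history(path="predictions.db"):
    """Open (or create) the history database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT NOT NULL,
               sequence TEXT NOT NULL,
               pdb TEXT NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )
    return conn


def save_prediction(conn, name, sequence, pdb_text):
    """Store one prediction and return its row id."""
    created = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO predictions (name, sequence, pdb, created_at) "
            "VALUES (?, ?, ?, ?)",
            (name, sequence, pdb_text, created),
        )
    return cur.lastrowid


def list_predictions(conn, name=None):
    """Return (id, name, created_at) rows, oldest first."""
    query = "SELECT id, name, created_at FROM predictions"
    params = ()
    if name is not None:  # optionally restrict to one protein
        query += " WHERE name = ?"
        params = (name,)
    return conn.execute(query + " ORDER BY id", params).fetchall()
```

Saving the same protein more than once keeps every version, so earlier and later predictions can be compared by their ids.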

Resources for Learning

  • Biological Background
  • AlphaFold Resources
  • Learning to Code
  • Visualization Tools
  • AI Tools for Biology

Using the Presentation

The PRESENTATION.md file contains a complete presentation (20+ slides) covering:

  1. Introduction to protein folding
  2. The AlphaFold breakthrough
  3. Creating vs using AI models
  4. Foundation models in biology
  5. Other bio-AI tools
  6. Career paths
  7. Vibe coding philosophy
  8. Hands-on exercises
  9. Real-world impact
  10. Getting started guide

How to Present

Option 1: Convert to PowerPoint

Use a markdown-to-slides tool (for example, Pandoc can convert a Markdown file to .pptx).

Option 2: Present from Markdown

  • Open in a markdown viewer
  • Use presentation mode in VS Code
  • Convert to PDF and present

Option 3: Create Custom Slides

  • Use the content as a script
  • Create slides in PowerPoint/Google Slides
  • Add images and animations

Suggested Presentation Flow

  1. Introduction (5 min)

    • Slides 1-3: Problem setup
  2. AlphaFold Breakthrough (10 min)

    • Slides 4-6: The solution
  3. Creating vs Using (10 min)

    • Slides 5-8: Key distinction
  4. Live Demo (10 min)

    • Slide 10: Use the app!
  5. Career Paths (10 min)

    • Slides 11-12: Future opportunities
  6. Hands-On (15 min)

    • Slide 13: Let students try it
  7. Q&A (10 min)

    • Slide 19: Discussion

Total: ~70 minutes (adjust as needed)

Common Use Cases

For Students

  • Learn structural biology: See how sequence determines structure
  • Explore proteins: Visualize your favorite proteins
  • Class projects: Use for presentations and reports
  • Research: Predict structures for your lab work

For Educators

  • Lecture demos: Show real AI in action
  • Lab exercises: Let students predict structures
  • Assignments: "Predict and analyze protein X"
  • Inspiration: Encourage computational thinking

For Researchers

  • Quick predictions: Fast structure estimation
  • Hypothesis generation: "What if this mutant..."
  • Preliminary analysis: Before expensive experiments
  • Visualization: Share structures with collaborators

Citation

If you use this demo in your work or teaching, please cite:

AlphaFold:

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure
prediction with AlphaFold. Nature 596, 583–589 (2021).
https://doi.org/10.1038/s41586-021-03819-2

ESMFold:

Lin, Z., Akin, H., Rao, R. et al. Evolutionary-scale prediction of atomic-level
protein structure with a language model. Science 379, 1123–1130 (2023).
https://doi.org/10.1126/science.ade2574

License

This demo is for educational purposes. The AlphaFold and ESMFold models have their own licenses:

  • AlphaFold: Apache 2.0 License
  • ESMFold: MIT License
  • This demo code: MIT License (free to use, modify, share)

Contributing

Want to improve this demo? Great! Here's how:

  1. Fork the repository
  2. Make your changes
  3. Test thoroughly
  4. Submit a pull request
  5. Describe what you changed and why

Ideas for contributions:

  • Add more example proteins
  • Improve visualization options
  • Add more educational content
  • Fix bugs or improve performance
  • Translate to other languages
  • Add accessibility features

Support

Getting Help

  • GitHub Issues: Report bugs or request features
  • Discussions: Ask questions and share ideas
  • Email: [Your email if you want to provide support]

Known Limitations

  • ESMFold is slightly less accurate than AlphaFold2
  • Long sequences (>400 aa) may be slow or fail
  • No support for protein complexes (use AlphaFold-Multimer instead)
  • Requires internet connection
  • API may have rate limits

Acknowledgments

  • DeepMind for AlphaFold
  • Meta AI for ESMFold and the free API
  • Streamlit for the web framework
  • 3Dmol.js for visualization
  • The open-source bioinformatics community

FAQ

Q: Is this real AlphaFold? A: This demo uses ESMFold, a similar but faster model from Meta. For the highest accuracy, use AlphaFold2.

Q: Can I use this for my research? A: Yes! But validate critical predictions experimentally.

Q: Do I need a GPU? A: No! The prediction happens on Meta's servers.

Q: Is it free? A: Yes! ESMFold's API is currently free to use.

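Q: How accurate is it? A: It depends on the protein. For many well-folded proteins the predictions come close to experimentally determined structures, but ESMFold is somewhat less accurate than AlphaFold2, so treat each prediction as a hypothesis rather than a measurement.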

Q: Can I predict protein complexes? A: Not with this demo. Use AlphaFold-Multimer for complexes.

Q: What's the maximum sequence length? A: Technically ~1000 amino acids, but shorter (<400) is recommended for speed.

Q: Can I run this offline? A: No, it requires internet for the API. You could set up local AlphaFold/ESMFold for offline use.

Q: Can I modify and share this? A: Yes! It's open source (MIT License).

What's Next?

After trying this demo, consider:

  1. Explore AlphaFold Database: Download pre-computed structures
  2. Learn PyMOL/ChimeraX: Professional visualization tools
  3. Try BioPython: Analyze sequences and structures programmatically
  4. Take a course: Rosalind, Coursera, or edX bioinformatics
  5. Join a lab: Get hands-on research experience
  6. Build something: Modify this app or create your own!

Version History

  • v1.0 (2024): Initial release
    • Basic prediction and visualization
    • Educational content
    • Example proteins

Contact

Questions? Suggestions? Found a bug?

  • GitHub: [Your GitHub]
  • Email: [Your email]
  • Twitter/X: [Your handle]

Remember: You don't need to understand everything to get started. Just start exploring!

The best way to learn is by doing. Pick a protein, predict its structure, and see what you discover.

Happy folding!
