MetaQuest is a comprehensive command-line bioinformatics toolkit for analyzing metagenomic datasets based on genome containment. The software processes Branchwater CSV files, downloads SRA metadata from NCBI, and provides advanced visualization and analysis capabilities including diversity analysis, interactive plotting, and taxonomic validation.
- Branchwater Integration: Process and analyze containment data from JGI Branchwater
- Intelligent SRA Management: Advanced downloading with resume capability, quality profiling, and statistical reporting
- Diversity Analysis: Calculate alpha/beta diversity metrics with statistical testing
- Interactive Visualizations: Create dynamic plots (PCA, heatmaps, diversity comparisons)
- Taxonomic Validation: Validate species names against NCBI taxonomy database
- Plugin Architecture: Extensible format handlers and visualization plugins
- Robust Implementation: Type hints, comprehensive testing (995 tests, 88%+ coverage), numerical stability
git clone https://github.com/FOI-Bioinformatics/MetaQuest.git
cd MetaQuest
make dev-install # Installs with all development dependencies# Traditional approach (still supported)
pip install -r requirements.txt
python setup.py installmake help # Show all available commands
make test # Run tests with coverage (995 tests passing, 88%+ coverage)
make lint # Run code quality checks
make check # Full quality validation
make clean # Clean build artifactsFirst, visit https://branchwater.jgi.doe.gov/ to search and download containment files for your genomes of interest. Save these CSV files to a designated folder.
Process the downloaded files to prepare them for the MetaQuest pipeline:
metaquest use_branchwater --branchwater-folder /path/to/branchwater/files --matches-folder matchesbranchwater-folder: The directory where Branchwater CSV files are located.matches-folder: The directory where the processed files will be saved.
You can extract basic metadata directly from Branchwater CSV files without downloading from NCBI:
metaquest extract_branchwater_metadata --branchwater-folder /path/to/branchwater/files --metadata-folder metadataAfter processing the Branchwater files, you can summarize the results:
metaquest parse_containment --matches-folder matches --parsed-containment-file parsed_containment.txt --summary-containment-file summary_containment.txt --step-size 0.05 --file-format branchwaterExample output: summary.txt and containment.txt
For more comprehensive metadata, you can download it from NCBI:
metaquest download_metadata --matches-folder matches --metadata-folder metadata --threshold 0.95 --email [EMAIL]matches_folder: Directory containing match files.metadata_folder: Directory where the metadata files will be saved.threshold: Only consider matches with containment above this threshold.
Once the metadata is downloaded, you can parse it to generate a more concise and readable format:
metaquest parse_metadata --metadata-folder metadata --metadata-table-file parsed_metadata.txtExample output: parsed_metadata.txt
This step helps in understanding the distribution of metadata attributes:
metaquest check_metadata_attributes --file-path parsed_metadata.txt --output-file parsed_metadata_overview.txtExample output: parsed_metadata_overview.txt
This step helps in understanding the distribution of genomes across different datasets:
metaquest count_metadata --summary-file parsed_containment.txt --metadata-file parsed_metadata.txt --metadata-column Sample_Scientific_Name --threshold 0.95 --output-file genome_counts.txtExample output: genome_counts.txt
To analyze a single sample from the summary, you can use the single_sample command:
metaquest single_sample --summary-file parsed_containment.txt --metadata-file parsed_metadata.txt --summary-column GCF_000008985.1 --metadata-column Sample_Scientific_Name --threshold 0.95Download SRA datasets with intelligent resume capability and bandwidth optimization:
# Intelligent download with resume capability
metaquest sra-download-intelligent \
--accessions-file accessions.txt \
--output-dir sra_downloads \
--max-parallel-downloads 4 \
--max-bandwidth-mbps 100 \
--resume
# Dry run to estimate download time and requirements
metaquest sra-download-intelligent \
--accessions-file accessions.txt \
--dry-runGenerate comprehensive quality profiles for downloaded SRA datasets:
# Profile multiple datasets with detailed reports
metaquest sra-profile-quality \
--accessions-file accessions.txt \
--fastq-dir sra_downloads \
--output-dir quality_profiles \
--detailed-reports
# Profile single dataset
metaquest sra-profile-quality \
--accession SRR123456 \
--fastq-dir sra_downloads \
--include-contaminationGenerate interactive HTML dashboards for SRA analysis:
# Comprehensive dashboard
metaquest sra-dashboard \
--accessions-file accessions.txt \
--output-dir dashboards \
--title "Project SRA Analysis" \
--dashboard-type full
# Quality analysis dashboard only
metaquest sra-dashboard \
--accessions-file accessions.txt \
--dashboard-type qualityPerform statistical comparisons between SRA dataset groups:
# Compare treatment vs control groups
metaquest sra-compare \
--groups-file comparison_groups.json \
--fastq-dir sra_downloads \
--statistical-tests \
--generate-reportExample groups file format:
{
"Treatment_Group": ["SRR123456", "SRR123457"],
"Control_Group": ["SRR789012", "SRR789013"]
}For additional SRA capabilities with technology detection:
# Get detailed dataset information before downloading
metaquest sra_info \
--accessions-file accessions.txt \
--email your.email@domain.com \
--output-report sra_analysis.csv
# Enhanced download with technology detection
metaquest sra_download \
--accessions-file accessions.txt \
--fastq-folder fastq \
--email your.email@domain.com \
--num-threads 8 \
--max-workers 4Plot the distribution of containment scores:
metaquest plot_containment --file-path parsed_containment.txt --column max_containment --plot-type rank --save-format png --threshold 0.05Available plot types: rank, histogram, box, violin
Visualize the distribution of metadata attributes:
metaquest plot_metadata_counts --file-path counts_Sample_Scientific_Name.txt --plot-type bar --save-format pngAvailable plot types: bar, pie, radar
Calculate comprehensive diversity metrics for your metagenomic datasets:
# Calculate alpha and beta diversity with PERMANOVA
metaquest diversity_analysis \
--abundance-file abundance_matrix.csv \
--metadata-file sample_metadata.csv \
--alpha-metrics shannon simpson chao1 \
--beta-metric bray_curtis \
--permanova-formula "treatment + site"Create dynamic, browser-based plots for data exploration:
# Interactive PCA plot
metaquest interactive_plot \
--data-file abundance_matrix.csv \
--metadata-file metadata.csv \
--plot-type pca \
--color-by treatment \
--output-file pca_plot.html
# Interactive heatmap
metaquest interactive_plot \
--data-file abundance_matrix.csv \
--plot-type heatmap \
--title "Species Abundance Heatmap"
# Diversity comparison plots
metaquest interactive_plot \
--data-file abundance_matrix.csv \
--metadata-file metadata.csv \
--plot-type diversity \
--color-by treatment_groupValidate species names against NCBI taxonomy database:
# Validate species from text file
metaquest validate_taxonomy \
--species-file species_list.txt \
--email your.email@domain.com \
--output-file validation_results.csv
# Validate from CSV with specific column
metaquest validate_taxonomy \
--species-file data.csv \
--species-column organism_name \
--email your.email@domain.comGenerate comprehensive taxonomic summaries at multiple levels:
metaquest taxonomic_summary \
--abundance-file abundance_matrix.csv \
--taxonomy-file validation_results.csv \
--levels phylum class order family genus \
--output-dir taxonomic_summariesFor comprehensive documentation including advanced features and technical details, see the docs/ directory:
- Enhanced SRA Features - Advanced SRA downloading with technology detection and statistics
- Branchwater Workflow - Detailed workflow guide for branchwater functionality
- Architecture - Technical architecture and design decisions
- CLAUDE.md - Development guidelines, testing strategies, and architectural patterns for contributors
- Test Coverage Report - Comprehensive test coverage metrics and methodology (199 tests, 88%+ coverage)
MetaQuest follows modern Python development practices with comprehensive testing and quality assurance.
- Test Coverage: 88%+ overall (from 53% baseline, 199 new tests added)
- CLI Commands: 100% coverage, including intelligent SRA commands at 86% ✅
- Data Layer: 93-99% coverage for all core modules (sra_metadata, sra_enhanced, taxonomy) ✅
- Core Processing: 92-99% coverage with comprehensive edge case testing ✅
- SRA Advanced Features: 95% coverage for reporting, quality profiling, and analytics ✅
- Visualization Plugins: Bar chart plugin at 99% coverage ✅
- Integration Tests: 12 end-to-end workflow tests ✅
- Performance Benchmarks: 25 tests with pytest-benchmark for regression detection ✅
- Code Quality: All linting checks passing ✅
Significant improvements have been implemented across the codebase:
- Intelligent SRA Package: Complete implementation of next-generation SRA capabilities including intelligent downloads with resume functionality, comprehensive quality profiling, and interactive dashboard generation
- Major Test Coverage Achievement: Improved from 53% to 88%+ with 199 new comprehensive tests across 8 files
- Extended test suites for critical modules (sra_reporting, sra_intelligent, sra_enhanced, sra_metadata, bar visualizer, taxonomy)
- Integration test suite with 12 end-to-end workflow tests
- Performance benchmarks with 25 tests using pytest-benchmark
- All modules now at 86-99% coverage
- Architecture Refinement: Orphan code removal and clean separation of concerns between data layer and advanced SRA features
- Quality Assurance: All linting violations resolved and formatting standards enforced
- Testing Best Practices: Comprehensive mocking patterns, edge case coverage, and realistic test data established
# Set up development environment
make dev-install
# Run quality checks before committing
make check # Format, lint, and type check
make test # Run tests with coverage report
make pipeline # Full integration test
# View available commands
make help- Comprehensive Test Suite: 995 tests covering CLI, data processing, visualization, and advanced SRA features
- Unit tests: 170+ tests per critical module with extended test files
- Integration tests: 12 end-to-end workflow tests (
tests/test_integration_simple.py) - Performance tests: 25 benchmarked tests (
tests/test_performance_simple.py)
- Numerical Stability: Edge cases in statistical computing properly handled (zero values, inf, nan, boundary conditions)
- Integration Testing: End-to-end workflow validation via
local_test.shand comprehensive integration suite - Mock Architecture: NCBI APIs (Entrez, Taxonomy), Plotly/Jinja2 templates, file operations, and subprocess calls systematically mocked
- Quality Assurance: Automated formatting, linting, and type checking with zero warnings
- Detailed Coverage Reports: See
FINAL_TEST_COVERAGE_REPORT.mdfor complete metrics and methodology
- Layered Architecture: CLI, Core, Data, Processing, Visualization layers
- Advanced SRA Package: Specialized module for intelligent SRA operations (
metaquest.sra) - Plugin System: Extensible format handlers and visualizers
- Command Registry: Modular CLI command architecture
- Modern Packaging: Uses
pyproject.tomlwith backward compatibility
We welcome contributions to MetaQuest. Whether you want to report a bug, suggest a feature, or contribute code, your input is valuable.
# 1. Fork and clone the repository
git clone https://github.com/YOUR-USERNAME/MetaQuest.git
cd MetaQuest
# 2. Set up development environment
make dev-install
# 3. Create a feature branch
git checkout -b feature/my-new-feature
# 4. Make your changes and test
make check # Ensure code quality
make test # Run test suite
make pipeline # Integration tests
# 5. Commit and push
git commit -m "Add new feature"
git push origin feature/my-new-feature
# 6. Create a Pull Request on GitHub- Code Quality: All contributions must pass
make check(formatting, linting, type checking) - Testing: New features should include tests with proper edge case handling
- Documentation: Update relevant documentation for new features
- Architecture: Follow existing patterns (see
CLAUDE.mdfor detailed guidance)
Help us improve MetaQuest by contributing to these areas:
- Remaining Visualization Modules: Add tests for
interactive.py,reporting.py, andplots.py(currently 0% coverage) - Processing Layer Enhancement: Optimize containment analysis algorithms and extend diversity analysis features
- Plugin Development: Add new format handlers or visualization plugins beyond bar charts
- Performance Optimization: Large-scale dataset handling and memory efficiency improvements
- Documentation: Expand workflow examples, API documentation, and testing guides
Note: Critical SRA modules, data layer, and CLI commands now have excellent coverage (86-99%). The focus has shifted to remaining visualization and processing modules.
For detailed development guidelines, architectural patterns, and comprehensive testing strategies, see CLAUDE.md.