Docker wrapper repository for the DNMB core package.

Only one thing is required from the user:
- one or more input GenBank files (
*.gb,*.gbk, or*.gbff)
- installs the DNMB core package inside the container image
- uses a shared host cache at
~/.dnmb-cache - downloads module databases and tool assets on first use when needed
- records module metadata in the shared cache so stale installs can be refreshed
- reuses the shared cache on later runs
The simplest way to use DNMBsuite is the published Docker image.
Pull the image:
docker pull ghcr.io/jaeyoonsung/dnmbsuite:latestOn Linux, create the shared cache first so the container can reuse it without leaving root-owned files behind:
mkdir -p "$HOME/.dnmb-cache"Run from a folder that already contains one or more GenBank files:
cd [/path/to/folder/with/genbank]
docker run --rm \
--user "$(id -u):$(id -g)" \
-v "$PWD:/data" \
-v "$HOME/.dnmb-cache:/opt/dnmb-cache" \
ghcr.io/jaeyoonsung/dnmbsuite:latestRun a single GenBank file explicitly:
docker run --rm \
--user "$(id -u):$(id -g)" \
-v /path/to/parent-dir:/data \
-v "$HOME/.dnmb-cache:/opt/dnmb-cache" \
ghcr.io/jaeyoonsung/dnmbsuite:latest \
/data/GCF_030369615.1.gbffRun selected modules only with direct Docker:
docker run --rm \
--user "$(id -u):$(id -g)" \
-e DNMB_MODULES=defensefinder,padloc,defensepredictor,iselement,prophage \
-e DNMB_MODULE_CPU=8 \
-v "$PWD:/data" \
-v "$HOME/.dnmb-cache:/opt/dnmb-cache" \
ghcr.io/jaeyoonsung/dnmbsuite:latestRun the anti-defense stack only with direct Docker:
docker run --rm \
--user "$(id -u):$(id -g)" \
-e DNMB_MODULES=defensefinder,dbapis,acrfinder \
-e DNMB_MODULE_CPU=8 \
-v "$PWD:/data" \
-v "$HOME/.dnmb-cache:/opt/dnmb-cache" \
ghcr.io/jaeyoonsung/dnmbsuite:latestRun while disabling selected modules with direct Docker:
docker run --rm \
--user "$(id -u):$(id -g)" \
-e DNMB_SKIP_MODULES=interproscan,eggnog \
-e DNMB_MODULE_CPU=8 \
-v "$PWD:/data" \
-v "$HOME/.dnmb-cache:/opt/dnmb-cache" \
ghcr.io/jaeyoonsung/dnmbsuite:latestWhat this does:
- uses the working directory as
/data - mounts
~/.dnmb-cacheto/opt/dnmb-cache - detects
*.gb,*.gbk, or*.gbffautomatically in folder mode - writes outputs back into the same host folder
- keeps raw InterProScan TSV outputs inside
dnmb_interproscan/ - on Linux, runs as your current host UID/GID in the direct
docker runexamples so output and cache files stay writable by you - on arm64 hosts, add
--platform linux/amd64to the directdocker runcommand because the published image is currentlyamd64 - lets you control module selection through
DNMB_MODULES,DNMB_SKIP_MODULES,DNMB_MODULE_CPU,DNMB_PROPHAGE_BACKEND,DNMB_CLEAN_PREVIOUS,DNMB_COMPARATIVE, andDNMB_COMPARATIVE_DATA_ROOT
If you prefer not to type the Docker command manually, use the bundled shell launcher.
Setup:
git clone https://github.com/JAEYOONSUNG/DNMBsuite.git
cd DNMBsuiteRun with default output location:
bash run-dnmb.sh /path/to/GCF_030369615.1.gbffRun with a custom output directory:
bash run-dnmb.sh /path/to/GCF_030369615.1.gbff /path/to/output-dirRun with selected modules only:
bash run-dnmb.sh /path/to/GCF_030369615.1.gbff --modules defensefinder,iselement,prophage
bash run-dnmb.sh /path/to/GCF_030369615.1.gbff --modules defensefinder,padloc,defensepredictor,iselement,prophageRun the anti-defense stack only:
bash run-dnmb.sh /path/to/GCF_030369615.1.gbff --modules defensefinder,dbapis,acrfinderRun while disabling selected modules:
bash run-dnmb.sh /path/to/GCF_030369615.1.gbff --skip-modules interproscan,eggnog --cpu 8Opt in to a startup DNMB refresh from GitHub:
bash run-dnmb.sh /path/to/GCF_030369615.1.gbff --dnmb-auto-update --dnmb-auto-update-branch masterWhat this launcher does:
- pulls
ghcr.io/jaeyoonsung/dnmbsuite:latestautomatically when missing - mounts the output directory to
/data - mounts
~/.dnmb-cacheto/opt/dnmb-cache - copies the input GenBank file into the output directory when needed
- runs the same built-in container launcher
- keeps the bundled DNMB package fixed unless you explicitly opt in to
DNMB_AUTO_UPDATE=1
In single-file mode, DNMBsuite stages only that file, runs the analysis, and writes outputs back next to the original input file.
Supported module names for --modules and --skip-modules:
interproscaneggnogcleandbcanrebasefinderdefensefinderdbapisacrfinderpadlocdefensepredictormeropspazygapmindiselementphispyvirsorter2pideprophage(deprecated alias — maps to the backend chosen bymodule_Prophage_backend, defaultphispy)
Special values for --modules:
allnonecore
CLEAN and PIDE are both GPU-heavy: CLEAN runs LayerNormNet
embeddings, and PIDE runs an ESM-650M protein language model. Both run
~50–100× faster on a CUDA GPU than on CPU.
- The direct Docker entrypoint and
run-dnmb.shboth probenvidia-smi -Land turnCLEAN/PIDEon only when a CUDA GPU is detected; otherwise they are skipped so a typical laptop run completes in minutes rather than hours. - When CUDA is detected,
run-dnmb.shalso adds--gpus alltodocker runso the container can reach the GPU, and exportsDNMB_CUDA=TRUEfor the R defaults to agree. - Force CPU-based execution only when you explicitly want the slow CPU path by
setting
DNMB_FORCE_CPU_HEAVY=1or selecting the modules directly withDNMB_MODULES=clean,pide/--modules clean,pide. - Force-disable them with
DNMB_CUDA=0,DNMB_SKIP_MODULES=clean,pide, or--skip-modules clean,pide.
Direct Docker example for forcing the CPU path:
docker run --rm \
--platform linux/amd64 \
--user "$(id -u):$(id -g)" \
-e DNMB_FORCE_CPU_HEAVY=1 \
-v "$PWD:/data" \
-v "$HOME/.dnmb-cache:/opt/dnmb-cache" \
ghcr.io/jaeyoonsung/dnmbsuite:latestOutputs are written into the same data/ directory that contains the input
GenBank file.
Example output structure:
data/
├── GCF_030369615.1.gbff
├── GCF_030369615_total.xlsx
├── dnmb_interproscan/
├── dnmb_module_acrfinder/
├── dnmb_module_clean/
├── dnmb_module_dbapis/
├── dnmb_module_defensefinder/
├── dnmb_module_padloc/
├── dnmb_module_defensepredictor/
├── dnmb_module_eggnog/
├── dnmb_module_gapmindaa/
├── dnmb_module_gapmindcarbon/
├── dnmb_module_iselement/
├── dnmb_module_merops/
├── dnmb_module_pazy/
├── dnmb_module_prophage/
├── dnmb_module_rebasefinder/
└── visualizations/
When anti-defense hits are found, DNMBsuite aggregates:
DefenseFinder --antidefensefinderdbAPISAcrFinder
into the final workbook and the anti-defense visualization set.
Typical anti-defense output paths are:
data/
├── dnmb_module_acrfinder/
├── dnmb_module_dbapis/
├── dnmb_module_defensefinder/
└── visualizations/
├── AntiDefense_overview.pdf
└── AntiDefenseFinder_overview.pdf
Notes:
AntiDefense_overview.pdfis the integrated anti-defense summary across the available anti-defense modules.AntiDefenseFinder_overview.pdfis the DefenseFinder-specific anti-defense plot.- If all anti-defense modules complete with
0hits for a genome, the anti-defense PDFs can be skipped because there is nothing to draw.
- the first run can take a long time because module assets may need to be downloaded into
~/.dnmb-cache - later runs are much faster because DNMB reuses the shared cache and matching previous outputs
clean_previous = TRUEremoves stale run artifacts but preserves matching reusable outputs when the input genome has not changed- DNMB core auto-update is disabled by default so a published image stays reproducible; opt in only when you intentionally want the container to track the latest GitHub branch
Build from the latest core on master:
docker build \
--build-arg DNMB_REF=master \
-t ghcr.io/jaeyoonsung/dnmbsuite:latest .Build from a fixed core commit or tag:
docker build \
--build-arg DNMB_REF=<dnmb-git-ref> \
-t ghcr.io/jaeyoonsung/dnmbsuite:<tag> .Once the image is published to GHCR:
docker pull ghcr.io/jaeyoonsung/dnmbsuite:latestIf the package is private, authenticate first:
echo <GITHUB_TOKEN> | docker login ghcr.io -u <github-user> --password-stdinFor release or manuscript runs, prefer a fixed image tag or digest and keep DNMB_AUTO_UPDATE=0.
Open an interactive R session inside the container:
docker compose run --rm dnmbshellInside R:
library(DNMB)
setwd("/data")
run_DNMB(clean_previous = TRUE)To also emit the full comparative suite across sibling genome folders at
the end of the per-genome run, pass comparative = TRUE:
library(DNMB)
setwd("/data/<focal-genome>")
run_DNMB(clean_previous = TRUE, comparative = TRUE)
# or point elsewhere:
# run_DNMB(clean_previous = TRUE, comparative = TRUE, comparative_data_root = "/data")run_DNMB(comparative = TRUE) renders the full suite end-to-end after
the per-genome pipeline finishes (it calls each plotter below against
dirname(getwd()), or comparative_data_root when supplied). The
individual plotters stay useful when you want a subset or custom colors.
Run the comparative suite directly with Docker, without opening R, after per-genome DNMB runs have already produced one genome folder per GenBank file:
data/
├── genome_1/
│ └── input_1.gbff
├── genome_2/
│ └── input_2.gbff
└── genome_3/
└── input_3.gbff
From the parent directory that contains data/:
docker run --rm \
--platform linux/amd64 \
--user "$(id -u):$(id -g)" \
-e DNMB_MODULE_CPU=8 \
-v "$PWD/data:/data" \
-v "$HOME/.dnmb-cache:/opt/dnmb-cache" \
ghcr.io/jaeyoonsung/dnmbsuite:latest \
comparative /dataThis writes comparative PDFs and count matrices under:
data/comparative/
To run a per-genome DNMB analysis and render the comparative suite at the end from a focal genome folder, use:
cd /path/to/data/genome_1
docker run --rm \
--platform linux/amd64 \
--user "$(id -u):$(id -g)" \
-e DNMB_COMPARATIVE=1 \
-e DNMB_COMPARATIVE_DATA_ROOT=/data \
-e DNMB_MODULE_CPU=8 \
-v "$(dirname "$PWD"):/data" \
-v "$HOME/.dnmb-cache:/opt/dnmb-cache" \
ghcr.io/jaeyoonsung/dnmbsuite:latest \
/data/$(basename "$PWD")After per-genome DNMB runs finish, render across-genome heatmaps for
defense-module families as well as enzyme/CAZyme modules. Each plotter
treats every GenBank-bearing subfolder of data_root as one genome,
auto-runs only the relevant module (db = "<Module>", not the full
pipeline) for genomes missing that output, and writes a PDF + count
matrix under <data_root>/comparative/.
library(DNMB)
data_root <- "/data" # parent directory with one subfolder per genome
# Defense-system heatmaps (purple palette)
dnmb_plot_comparative_defensefinder(data_root) # DefenseFinder
dnmb_plot_comparative_padloc(data_root) # PADLOC
dnmb_plot_comparative_defensepredictor(data_root) # DefensePredictor
dnmb_plot_comparative_rebasefinder(data_root) # REBASEfinder
# Enzyme / CAZyme heatmaps (module-specific palettes)
dnmb_plot_comparative_merops(data_root) # MEROPS family (C26, S8, …)
dnmb_plot_comparative_merops_catalytic(data_root) # MEROPS catalytic type (Cysteine, Serine, …)
dnmb_plot_comparative_dbcan(data_root) # dbCAN class (GH, GT, PL, …)
dnmb_plot_comparative_dbcan_family(data_root) # dbCAN family (GH13, GT2, …)
dnmb_plot_comparative_cgc(data_root) # CGC signature mix (CAZyme+TC+TF, …)
dnmb_plot_comparative_cgc_substrate(data_root) # CGC substrate (starch, melibiose, …)
dnmb_plot_comparative_pazy(data_root) # PAZy families
# Prophage heatmaps (purple palette)
dnmb_plot_comparative_phispy(data_root) # PhiSpy regions bucketed by size
dnmb_plot_comparative_virsorter2(data_root) # VirSorter2 max_score_group
dnmb_plot_comparative_pide(data_root) # PIDE regions bucketed by sizePass auto_run_missing = FALSE to render only what already exists.
- Put exactly the input GenBank files you want to analyze into
./data. - The first run can take a long time because module assets may need to be downloaded into
~/.dnmb-cache. - Later runs should be much faster because DNMB reuses the shared cache and matching previous outputs.
- If you want a clean rerun but still keep reusable outputs, keep
clean_previous = TRUE. The current DNMB logic preserves matching cached module and external annotation outputs when the input genome has not changed. - DNMB auto-update is off by default. Leave it off for reproducible runs; enable it only when you explicitly want to refresh the bundled DNMB package from GitHub.
- If older Linux runs left root-owned files in
~/.dnmb-cache, fix them before rerunning with:sudo chown -R "$(id -u):$(id -g)" "$HOME/.dnmb-cache" - If an older image fails only in PADLOC with
mkdir: cannot create directory '/opt/biotools/bin/../data': Permission denied, either pull a rebuilt image from this repo or add:-v "$HOME/.dnmb-cache/padloc-bootstrap:/opt/biotools/data"to the directdocker runcommand after creating that host folder once with:mkdir -p "$HOME/.dnmb-cache/padloc-bootstrap" - Advanced launcher environment variables:
DNMB_MODULES,DNMB_SKIP_MODULES,DNMB_MODULE_CPU,DNMB_PROPHAGE_BACKEND,DNMB_CLEAN_PREVIOUS,DNMB_FORCE_CPU_HEAVY,DNMB_COMPARATIVE,DNMB_COMPARATIVE_DATA_ROOT,DNMB_AUTO_UPDATE,DNMB_AUTO_UPDATE_BRANCH
Create the input directory first:
mkdir -p dataBuild and run the pipeline:
docker compose build
docker compose upOpen an interactive R shell instead:
docker compose run --rm dnmbshellUse a different core ref only if you explicitly want to override the default:
DNMB_REF=<dnmb-git-ref> docker compose buildOpt in to DNMB startup refresh only when desired:
DNMB_AUTO_UPDATE=1 DNMB_AUTO_UPDATE_BRANCH=master docker compose upUse a different output tag locally:
IMAGE_TAG=test docker compose buildDNMBsuite expects a shared host cache at:
~/.dnmb-cache
This directory is mounted to:
/opt/dnmb-cache
inside the container, and DNMB_CACHE_ROOT is set automatically.
This means:
- module databases are downloaded once and reused
- Docker and local DNMB runs can share the same cache
- users normally do not need to set
module_cache_rootmanually inside the container - the DNMBsuite container seeds the CLEAN Python environment into the mounted host cache when it is missing
https://github.com/JAEYOONSUNG/DNMB.git
By default DNMBsuite installs the DNMB core from this repository at image build time using:
master
Container startup does not mutate that installed core unless you explicitly set DNMB_AUTO_UPDATE=1.
If you need to override the build-time core ref, use DNMB_REF:
docker build --build-arg DNMB_REF=master -t ghcr.io/jaeyoonsung/dnmbsuite:latest .
docker build --build-arg DNMB_REF=<commit-sha> -t ghcr.io/jaeyoonsung/dnmbsuite:dev .This project is released under MIT licence, which allows for both personal and commercial use, modification, and distribution of our work, provided that proper credit is given.
We hope our resources will prove invaluable to your research in systems biology. For any questions or feedback, please don't hesitate to reach out through our GitHub issues or contact section.
If you use this piepline, please cite:
[DNMB] DNMB: Programmable domestication of thermophilic bacteria through removal of non-canonical defense systems.
Sung, J.Y., Lee, M.H., Park, J.S., Kim, H.B., Ganbat, D., Kim, D.G., Cho, H.W., Suh, M.K., Lee, J.S., Lee, S.J., Kim, S.B.*, and Lee, D.W.*.
*bioRxiv* 2026.03.21.173436. (2026)
Please, cite also the underlying algorithm/database if it was used for the search step of DNMB:
[EggNOG-mapper v2] eggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at
the metagenomic scale. Carlos P. Cantalapiedra, Ana Hernandez-Plaza,
Ivica Letunic, Peer Bork, Jaime Huerta-Cepas. 2021. Molecular Biology and Evolution
38(12):5825-5829. https://doi.org/10.1093/molbev/msab293
[CLEAN] Enzyme function prediction using contrastive learning. Tianhao Yu, Haiyang Cui, Jianan Canal Li,
Yunan Luo, Guangde Jiang, Huimin Zhao. 2023. Science 379(6639):1358-1363.
https://doi.org/10.1126/science.adf2465
[InterProScan] InterProScan 5: genome-scale protein function classification.
Philip Jones, David Binns, Hsin-Yu Chang, Matthew Fraser, Weizhong Li, Craig McAnulla,
Hamish McWilliam, John Maslen, Alex Mitchell, Gift Nuka, Sebastien Pesseat, Antony F. Quinn,
Amaia Sangrador-Vegas, Maxim Scheremetjew, Siew-Yit Yong, Rodrigo Lopez, Sarah Hunter.
2014. Bioinformatics 30(9):1236-1240. https://doi.org/10.1093/bioinformatics/btu031
[DefenseFinder] DefenseFinder: Systematic and quantitative view of the antiviral arsenal of prokaryotes.
Florian Tesson, Alexandre Herve, Ernest Mordret, Marie Touchon, Camille d'Humieres, Jean Cury,
Aude Bernheim. 2022. Nature Communications 13:2561. https://doi.org/10.1038/s41467-022-30269-9
[REBASE] REBASE-a database for DNA restriction and modification: enzymes, genes and genomes.
Richard J. Roberts, Tamas Vincze, Janos Posfai, Dana Macelis. 2010. Nucleic Acids Research
38(Database issue):D234-D236. https://doi.org/10.1093/nar/gkp874
[GapMindAA] GapMind: Automated annotation of amino acid biosynthesis.
Morgan N. Price, Adam M. Deutschbauer, Adam P. Arkin. 2020. mSystems 5(3):e00291-20.
https://doi.org/10.1128/mSystems.00291-20
[GapMindCarbon] Filling gaps in bacterial catabolic pathways with computation and high-throughput genetics.
Morgan N. Price, Adam M. Deutschbauer, Adam P. Arkin. 2022. PLoS Genetics 18(4):e1010156.
https://doi.org/10.1371/journal.pgen.1010156
[MEROPS] The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a
comparison with peptidases in the PANTHER database. Neil D. Rawlings, Alan J. Barrett,
Paul D. Thomas, Xiaosong Huang, Alex Bateman, Robert D. Finn. 2018. Nucleic Acids Research
46(D1):D624-D632. https://doi.org/10.1093/nar/gkx1134
[dbCAN] dbCAN3: automated carbohydrate-active enzyme and substrate annotation.
Jinfang Zheng, Qiwei Ge, Yuchen Yan, Xinpeng Zhang, Le Huang, Yanbin Yin. 2023.
Nucleic Acids Research 51(W1):W115-W121. https://doi.org/10.1093/nar/gkad328
[PAZy] Plastics degradation by hydrolytic enzymes: The plastics-active enzymes database-PAZy.
Patrick C. F. Buchholz, Golo Feuerriegel, Hongli Zhang, Pablo Perez-Garcia,
Lena-Luisa Nover, Jennifer Chow, Wolfgang R. Streit, Jurgen Pleiss. 2022.
Proteins 90(7):1443-1456. https://doi.org/10.1002/prot.26325
[ISelement] ISEScan: automated identification of insertion sequence elements in prokaryotic genomes.
Zhiqun Xie, Haixu Tang. 2017. Bioinformatics 33(21):3340-3347.
https://doi.org/10.1093/bioinformatics/btx433
ISfinder: the reference centre for bacterial insertion sequences.
Philippe Siguier, Jerome Perochon, Lucie Lestrade, Jacques Mahillon,
Michael Chandler. 2006. Nucleic Acids Research 34(Database issue):D32-D36.
https://doi.org/10.1093/nar/gkj014
[PhiSpy] PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity-
and composition-based strategies. Sajia Akhter, Rashedul Aziz,
Robert A. Edwards. 2012. Nucleic Acids Research 40(16):e126. https://doi.org/10.1093/nar/gks406
[VirSorter2] VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses.
Jiarong Guo, Ben Bolduc, Ahmed A Zayed, Arvind Varsani, Guillermo Dominguez-Huerta, Tom O Delmont,
Akbar Adjie Pratama, M Consuelo Gazitúa, Dean Vik, Matthew B Sullivan, Simon Roux. 2021. Microbiome 9:37.
https://doi.org/10.1186/s40168-020-00990-y
[PIDE] PIDE: a deep learning-based tool for prophage identification using genome-wide features.
https://github.com/BackofenLab/PIDE
[dbAPIS] dbAPIS: dbAPIS: a database of anti-prokaryotic immune system genes. Yuchen Yan , Jinfang Zheng ,
Xinpeng Zhang , Yanbin Yin. 2025. Nucleic Acids Research 52(D1):D419–D425, https://doi.org/10.1093/nar/gkad932