| title | Contributing |
|---|---|
| description | How to contribute to the Physical AI Toolchain |
| author | Microsoft Robotics-AI Team |
| ms.date | 2026-03-11 |
| ms.topic | how-to |
| keywords | |

Contributions are welcome across infrastructure code, deployment automation, documentation, training scripts, and ML workflows. Read the relevant sections below before making your contribution.
If you are new to the project, start with issues labeled `good first issue` or with documentation updates before making larger changes.
Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Follow the instructions provided by the bot. You only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any questions or comments.
- Read the Contributing Guide for prerequisites, workflow, and conventions
- Review the Prerequisites for required tools and Azure access
- Fork the repository and clone your fork locally
- Review the README for project overview and architecture
- Create a descriptive feature branch (for example, `feature/...` or `fix/...`) and follow Conventional Commits for commit messages
- Run validation before submitting
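The branch naming step above can be sketched as a small check. The helper name `branch_prefix_ok` and the example branch name are hypothetical and not part of the repository. The check covers only the `feature/` and `fix/` prefixes given as examples, so other prefixes may also be acceptable.

```shell
# Hypothetical helper: succeed only if the branch name starts with
# feature/ or fix/ and has at least one character after the slash.
branch_prefix_ok() {
  case "$1" in
    feature/?*|fix/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check an illustrative name before creating the branch.
branch="feature/update-contributing-docs"
if branch_prefix_ok "$branch"; then
  echo "ok: $branch"
else
  echo "rename: $branch" >&2
fi
```

Once the name passes, `git checkout -b "$branch"` creates the branch in your fork.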
Detailed documentation lives in docs/contributing/:
| Guide | Description |
|---|---|
| Contributing Guide | Main hub — prerequisites, workflow, commit messages, style |
| Prerequisites | Required tools, Azure access, NGC credentials, build commands |
| Contribution Workflow | Bug reports, feature requests, first contributions |
| Pull Request Process | PR workflow, reviewers, approval criteria |
| Infrastructure Style | Terraform conventions, shell scripts, copyright headers |
| Deployment Validation | Validation levels, testing templates, cost optimization |
| Cost Considerations | Component costs, budgeting, regional pricing |
| Security Review | Security checklist, credential handling, dependency updates |
| Accessibility | Accessibility scope, documentation and CLI output guidelines |
| Updating External Components | Process for updating reused externally-maintained components |
| Documentation Maintenance | Documentation update triggers, ownership, freshness policy |
| Deprecation Policy | Interface deprecation lifecycle, announcements, migration |
Search existing resources before asking:
- Search GitHub Issues for similar questions or problems
- Check GitHub Discussions for community Q&A
- Review the docs/ directory for troubleshooting guides
If you cannot find an answer, open a new discussion in the Q&A category. Provide context about what you are trying to accomplish, what you have tried, and any error messages. For bugs or feature requests, use GitHub Issues instead.
Run the setup script to configure your local development environment:
./setup-dev.sh
This installs npm dependencies for linting, spell checking, and link validation. See the Prerequisites guide for required tools and version requirements.
Reverse the changes made by setup-dev.sh and remove deployed Azure resources.
The setup script creates a virtual environment at .venv/ and syncs dependencies from pyproject.toml.
# Deactivate if currently active
command -v deactivate &>/dev/null && deactivate
# Remove the virtual environment
rm -rf .venv
The setup script clones IsaacLab for IntelliSense support.
# Remove IsaacLab clone
rm -rf external/IsaacLab
# Remove Node.js linting dependencies (if installed separately via npm install)
rm -rf node_modules
Free disk space by clearing uv and npm caches. This affects all projects using these tools, not just this repository.
# Clear uv download and build cache
uv cache clean
# Clear npm cache
npm cache clean --force
Remove all deployed Azure resources:
cd infrastructure/terraform
terraform destroy -var-file=terraform.tfvars
Warning
terraform destroy permanently deletes all deployed Azure resources including AKS clusters, storage accounts, Key Vault, and networking. Back up training data and model checkpoints before running this command.
For automation deployments:
cd infrastructure/terraform/automation
terraform destroy -var-file=terraform.tfvars
Verify no orphaned resources remain:
az group list --query "[?starts_with(name, 'your-prefix')].name" -o tsv
See Cost Considerations for component costs and cleanup timing.
Run these commands to validate changes before submitting a PR:
npm run lint:md # Markdownlint
npm run lint:links # Markdown link validation
npm run spell-check # cspell
npm run test:tf # Terraform module tests (no Azure credentials required)
For Terraform and shell script validation, see the Prerequisites guide.
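To see every failure in one pass instead of stopping at the first, the validation commands can be wrapped in a small runner. This is a minimal sketch: `run_checks` is a hypothetical helper, not a script shipped with the repository.

```shell
# Hypothetical helper: run each command in turn, keep going on failure,
# and print a summary. Returns non-zero if any command failed.
run_checks() {
  failed=0
  for cmd in "$@"; do
    echo "==> $cmd"
    if ! sh -c "$cmd"; then
      echo "FAILED: $cmd" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed check(s) failed"
  [ "$failed" -eq 0 ]
}

# Usage with the npm scripts listed above (uncomment inside the repository):
# run_checks "npm run lint:md" "npm run lint:links" "npm run spell-check" "npm run test:tf"
```

Because the function returns a non-zero status when anything fails, it can also gate a local pre-push hook.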
Reused externally-maintained components (Helm charts, container images, Terraform providers, Python packages, GitHub Actions) require periodic updates for security patches and compatibility. Dependabot automates updates for Python, Terraform, and GitHub Actions ecosystems. Helm charts and container images require manual updates.
See the Updating External Components guide for the full process including component inventory, vetting criteria, and breaking change handling.
Use structured titles to maintain consistency and enable automation.
| Format | Use Case | Example |
|---|---|---|
| `type(scope):` | Code changes | feat(ci): Add pytest workflow |
| `[Task]:` | Work items | [Task]: Achieve OpenSSF badge |
| `[Policy]:` | Governance | [Policy]: Define code of conduct |
| `[Docs]:` | Doc planning | [Docs]: Publish security policy |
| `[Infra]:` | Infrastructure | [Infra]: Sign release tags |

| Type | Description |
|---|---|
| `feat` | New feature or capability |
| `fix` | Bug fix |
| `docs` | Documentation only |
| `refactor` | Code change that neither fixes nor adds |
| `test` | Adding or correcting tests |
| `ci` | CI configuration changes |
| `chore` | Maintenance tasks |

| Scope | Area |
|---|---|
| `terraform` | Infrastructure as Code |
| `scripts` | Shell and Python scripts |
| `training` | ML training code |
| `workflows` | AzureML/Osmo workflows |
| `ci` | GitHub Actions |
| `deploy` | Deployment artifacts |
| `docs` | Documentation |
| `security` | Security-related changes |

feat(ci): Add CodeQL security scanning workflow
fix(terraform): Correct AKS node pool configuration
docs(deploy): Add VPN deployment documentation
refactor(scripts): Consolidate common functions
test(training): Add pytest fixtures
[Task]: Achieve code coverage target
[Policy]: Define input validation requirements
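The title formats above lend themselves to a quick local check before opening a PR. The sketch below is illustrative only: `title_ok` is a hypothetical helper, and it assumes that code-change titles always carry one of the listed scopes and that a description follows the colon. The project's actual automation may be stricter or looser.

```shell
# Hypothetical check for the title formats in the tables above.
# Accepts "type(scope): description" with the listed types and scopes,
# or one of the bracketed prefixes followed by a description.
title_ok() {
  types='feat|fix|docs|refactor|test|ci|chore'
  scopes='terraform|scripts|training|workflows|ci|deploy|docs|security'
  prefixes='Task|Policy|Docs|Infra'
  printf '%s\n' "$1" |
    grep -Eq "^(($types)\(($scopes)\)|\[($prefixes)]): .+"
}

# Example: one title from the list above and one that does not conform.
for t in "fix(terraform): Correct AKS node pool configuration" "Update stuff"; do
  if title_ok "$t"; then
    echo "valid:   $t"
  else
    echo "invalid: $t"
  fi
done
```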
This project uses release-please for automated version management. All commits to main must follow Conventional Commits format:
- `feat:` commits trigger a minor version bump
- `fix:` commits trigger a patch version bump
- `docs:`, `chore:`, `refactor:` commits appear in the changelog without a version bump
- Commits with a `BREAKING CHANGE:` footer trigger a major version bump
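The bump rules above can be expressed as a small lookup, which helps when predicting what a commit will do to the next release. This is a sketch under stated assumptions, not release-please's implementation: `bump_for_commit` is a hypothetical helper that handles only the cases listed above, for a single commit message.

```shell
# Hypothetical helper: print the version bump implied by one commit message,
# following the rules listed above. Prints major, minor, patch, or none.
bump_for_commit() {
  msg=$1
  subject=$(printf '%s\n' "$msg" | head -n 1)
  # A BREAKING CHANGE: footer takes precedence over the commit type.
  if printf '%s\n' "$msg" | grep -q '^BREAKING CHANGE:'; then
    echo major
    return
  fi
  case "$subject" in
    feat:*|"feat("*"):"*) echo minor ;;
    fix:*|"fix("*"):"*) echo patch ;;
    *) echo none ;;
  esac
}

# Example: a scoped feature commit.
bump_for_commit "feat(ci): Add pytest workflow"
```

When several commits land between releases, the largest bump among them determines the next version.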
After merging to main, release-please automatically creates a release PR with updated CHANGELOG.md and version bumps. Merging that PR creates a GitHub Release and git tag.
For commit message format details, see commit-message.instructions.md.
All release tags are required to be signed. Unsigned release tags are non-compliant with project policy.
This repository uses Sigstore gitsign with GitHub OIDC identity for keyless tag signing.
# Install gitsign
# https://docs.sigstore.dev/cosign/signing/gitsign/
# Configure git for keyless x509 signing
git config --global gpg.format x509
git config --global gpg.x509.program gitsign
git config --global tag.gpgSign true
# Create and push a signed tag
git tag -s v1.0.0 -m "Release v1.0.0"
git push origin v1.0.0
# Fetch tags and verify the signature
git fetch --tags
git tag -v v1.0.0
GitHub Actions validates signatures for pushed version tags (v*).
Important
Maintainer GPG key distribution is not required for this repository because release tags are signed using keyless Sigstore identities.
External interfaces follow a formal deprecation lifecycle before removal. The policy covers shell script arguments, environment variables, Terraform variables and outputs, configuration schemas, and workflow templates.
No external interface is removed without a deprecation notice in a prior release. See the Deprecation Policy for scope, deprecation periods, announcement channels, and migration guidance.
All contributions require appropriate tests. This policy supports code quality and the project's OpenSSF Best Practices goals.
- New features require accompanying unit tests.
- Bug fixes require regression tests that reproduce the fixed behavior.
- Refactoring changes must not reduce test coverage.
At least half of all bug fix PRs must include a regression test.
A regression test is required when:
- The bug affected user-facing functionality
- The fix changes control flow
- The bug could reasonably recur
A regression test may be omitted when:
- The bug was in documentation only
- The fix is purely cosmetic (whitespace, formatting)
- A test is technically impractical (requires external services that cannot be mocked)
| Test Type | Counts as Regression Test |
|---|---|
| Unit test verifying the fix | Yes |
| Integration test covering the scenario | Yes |
| Manual test documented in PR | Only if automated test is impractical |
| Informal local verification | No |
Optionally run the RL end-to-end suite to capture regressions. This is good practice for changes to submission scripts, workflow templates, MLflow wiring, checkpoint handling, or shared RL training assets. The end-to-end suite validates:
- Azure ML or OSMO job submission and lifecycle transitions
- MLflow metrics and parameter tracking for the completed run
- Checkpoint output upload for Azure ML runs
- Workflow task success for OSMO runs
Caution
These tests submit real GPU workloads and consume Azure ML, OSMO, Kubernetes, and MLflow resources. They are intentionally excluded from default pytest runs and must be invoked explicitly.
Requirements:
| Requirement | Details |
|---|---|
| Azure CLI | az must be installed and authenticated. The Azure ML CLI extension must also be available. |
| Azure subscription context | Set AZURE_SUBSCRIPTION_ID, or make sure az account show resolves to the subscription you want the test to use. |
| Azure workspace context | Set AZURE_RESOURCE_GROUP and AZUREML_WORKSPACE_NAME, or make sure terraform output -json or infrastructure/terraform/terraform.tfvars resolves them. |
| Azure ML compute target | For Azure ML validation, the compute target must resolve from AZUREML_COMPUTE or Terraform naming and its provisioning state must be Succeeded. |
| OSMO and Kubernetes access | For OSMO validation, osmo and kubectl must be installed and authenticated, and the target cluster must expose at least one reachable GPU node. Connect the VPN first for private clusters. |
| MLflow access | The Azure ML workspace used by the tests must expose a working MLflow tracking URI because both validation paths assert metrics and parameters after the run completes. |
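Before launching a run that consumes GPU resources, a quick preflight check can catch missing tools or context. The sketch below is illustrative only: `missing_tools` and `missing_vars` are hypothetical helpers. An unset variable is not necessarily an error, because the table above allows the values to be resolved from `az account show` or Terraform outputs, and `osmo` and `kubectl` matter only for the OSMO path.

```shell
# Hypothetical helper: print the name of every listed command not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Hypothetical helper: print the name of every listed variable that is
# unset or empty. eval is used for indirect lookup, so pass only trusted,
# fixed variable names.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Example: tools and variables named in the requirements table.
missing_tools az osmo kubectl uv
missing_vars AZURE_SUBSCRIPTION_ID AZURE_RESOURCE_GROUP AZUREML_WORKSPACE_NAME
```

Empty output from both calls means every listed tool and variable was found. The check does not verify authentication or compute provisioning state.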
Run these commands from the repository root:
# Azure ML submission path only
uv run pytest -vv -s -m e2e tests/e2e/test_e2e_training.py::test_aml_rl_training_e2e
# OSMO submission path only
uv run pytest -vv -s -m e2e tests/e2e/test_e2e_training.py::test_osmo_rl_training_e2e
# Full RL e2e suite
uv run pytest -vv -s -m e2e tests/e2e/test_e2e_training.py
When submitting a bug fix:
- Link to the issue being fixed
- Include a regression test, or document why one is omitted
- Describe what the test verifies
Reviewers verify regression tests are included. Compliance is tracked over time via PR labels (has-regression-test, regression-test-omitted).
Once a tests/ directory exists, run the full test suite:
pytest tests/
Run tests within the devcontainer:
uv run pytest tests/
Run tests with coverage reporting:
coverage run -m pytest tests/
coverage report -m
Tests mirror the source directory structure under tests/:
| Source Path | Test Path |
|---|---|
| `training/rl/utils/env.py` | `training/tests/test_env.py` |
| `training/rl/utils/metrics.py` | `training/tests/test_metrics.py` |
| `training/rl/cli_args.py` | `tests/unit/test_cli_args.py` |

| Marker | Description | Planned CI Behavior |
|---|---|---|
| (default) | Unit tests, fast, no external deps | Always runs |
| `slow` | Tests exceeding 5 seconds | Runs on main, optional on PRs |
| `integration` | Requires external services | Runs on main only |
| `gpu` | Requires CUDA runtime | Excluded from standard CI |

Skip categories selectively:
pytest tests/ -m "not slow and not gpu"
Coverage thresholds increase with each milestone:
| Milestone | Minimum Coverage |
|---|---|
| v0.4.0 | 40% |
| v0.5.0 | 60% |
| v0.6.0 | 80% |
These coverage levels are contribution targets for local test runs. CI enforcement of coverage thresholds is planned for a future milestone.
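Since CI does not yet enforce these thresholds, a local check can compare a coverage total against the milestone targets. This is a minimal sketch: `coverage_meets_target` is a hypothetical helper, and it expects the percentage as a whole number.

```shell
# Hypothetical helper: succeed if an integer coverage percentage meets the
# minimum for the given milestone from the table above.
# Returns 2 for a milestone that is not in the table.
coverage_meets_target() {
  milestone=$1
  percent=$2
  case "$milestone" in
    v0.4.0) minimum=40 ;;
    v0.5.0) minimum=60 ;;
    v0.6.0) minimum=80 ;;
    *) echo "unknown milestone: $milestone" >&2; return 2 ;;
  esac
  [ "$percent" -ge "$minimum" ]
}

# Example with an illustrative value of 63 percent.
if coverage_meets_target v0.5.0 63; then
  echo "coverage target met"
else
  echo "coverage below target" >&2
fi

# Assumption: with coverage.py 7.0 or later, the total can be read with
# `coverage report --format=total` after `coverage run -m pytest tests/`.
```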
Pytest and coverage are not yet centrally configured in pyproject.toml. When adding tests, follow standard pytest conventions (a tests/ directory with shared fixtures as needed) and align with existing tests in this repository.
Use BATS-core for shell script tests, Pester v5 for PowerShell tests, and the native terraform test framework for Terraform modules. When adding tests, include framework-specific details in the README for each area.
Documentation stays current through update triggers, ownership rules, and freshness reviews. See the Documentation Maintenance guide for the complete policy including review criteria, PR requirements, and release lifecycle.
This project uses a corporate-sponsored maintainer model. See GOVERNANCE.md for decision-making processes, roles, and how governance can change.
This project currently produces no user-facing applications or localizable content. All technical documentation is maintained in English.
If user-facing components are added in the future, follow W3C Internationalization guidelines and Unicode CLDR for locale data. Use BCP 47 language tags for locale identifiers.
This project adopts the Microsoft Open Source Code of Conduct. See CODE_OF_CONDUCT.md for details, or contact opencode@microsoft.com with questions.
Do not report security vulnerabilities through public GitHub issues. See SECURITY.md for reporting instructions.
For questions and community discussion, see SUPPORT.md.
By contributing, you agree that your contributions will be licensed under the MIT License.
This contributing guide is adapted for reference architecture contributions and Azure + NVIDIA robotics infrastructure.
Copyright (c) Microsoft Corporation. Licensed under the MIT License.
🤖 Crafted with precision by ✨Copilot following brilliant human instruction, then carefully refined by our team of discerning human reviewers.