This runbook provides a concrete AWS deployment path for production use.
If you do not have an AWS account yet, complete account onboarding first:
- docs/aws-account-onboarding.md
- docs/aws-owner-checklist.md
- docs/what-you-need-to-do.md
- docs/what-you-need-to-do.zh-CN.md (Chinese console version)
Use this architecture for the fastest production rollout if your team is in China but serves global users:
- Region: ap-southeast-1 (Singapore) as primary
- Compute:
  - API, worker, web: Amazon ECS Fargate
  - Ollama inference: ECS on EC2 GPU instances (g5.xlarge or higher)
- Data:
  - PostgreSQL: Amazon RDS for PostgreSQL (Multi-AZ)
  - Redis: Amazon ElastiCache for Redis
  - Vector DB: Qdrant on ECS EC2 (with EBS gp3), or managed Qdrant Cloud
- Edge and security:
  - ALB + ACM TLS certificates
  - CloudFront in front of ALB (optional but recommended)
  - AWS WAF for L7 protection
Why this is recommended:
- ECS is simpler to operate than EKS for this stack size
- Fargate removes host management for stateless services
- GPU workload is isolated to Ollama service where EC2 is required
If you must host inside mainland China (aws-cn), you need additional legal/compliance setup:
- Separate AWS China account (cn-north-1 or cn-northwest-1)
- Local business qualification and ICP filing for mainland-hosted public websites
- China-specific domain and compliance operations
For most teams, start in ap-southeast-1, then add a China deployment later if required.
- AWS CLI v2 configured (aws configure)
- Docker installed and logged in to ECR
- Domain managed in Route 53 (or external DNS)
- TLS certificates in ACM for your public domains
- Local .env values finalized for production
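The "logged in to ECR" prerequisite can be met roughly as follows. This is a minimal sketch, not part of the repository scripts; the function names and the account ID in the usage line are placeholders.

```shell
# Sketch: authenticate Docker to a private ECR registry.
# The account ID and region are supplied by the caller (placeholders here).

# Print the registry hostname for an account/region pair.
ecr_registry() {
  ecr_account_id="$1"; ecr_region="$2"
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$ecr_account_id" "$ecr_region"
}

# Pipe a short-lived token from the AWS CLI into `docker login`.
ecr_login() {
  ecr_host="$(ecr_registry "$1" "$2")"
  aws ecr get-login-password --region "$2" \
    | docker login --username AWS --password-stdin "$ecr_host"
}

# Usage (requires valid AWS credentials):
#   ecr_login 123456789012 ap-southeast-1
```

The token issued by `aws ecr get-login-password` expires, so CI jobs typically repeat the login at the start of each run.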
Repository root used by command examples in this runbook:
cd /Users/liweiguang/aiagent/complyra

Create one production environment file from .env.example and set at least:
- APP_ENV=prod
- APP_JWT_SECRET_KEY=<strong-random-secret>
- APP_COOKIE_SECURE=true
- APP_CORS_ORIGINS=https://<your-web-domain>
- APP_TRUSTED_HOSTS=<api-domain>,<web-domain>
- APP_DATABASE_URL=postgresql+psycopg://...
- APP_REDIS_URL=redis://...
- APP_QDRANT_URL=http://<qdrant-service>:6333
- APP_OLLAMA_BASE_URL=http://<ollama-service>:11434
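Before running the repository scripts, a quick local check can confirm that none of these keys is missing or empty. The sketch below only illustrates that kind of check; it is not the contents of any script in the repository, and `check_required_env` is a hypothetical name.

```shell
# Sketch: fail fast when a required key is missing or empty in an env file.
# REQUIRED_KEYS mirrors the minimum list in this runbook.
REQUIRED_KEYS="APP_ENV APP_JWT_SECRET_KEY APP_COOKIE_SECURE APP_CORS_ORIGINS APP_TRUSTED_HOSTS APP_DATABASE_URL APP_REDIS_URL APP_QDRANT_URL APP_OLLAMA_BASE_URL"

check_required_env() {
  env_file="$1"
  missing=0
  for key in $REQUIRED_KEYS; do
    # Accept only lines of the form KEY=<non-empty value>.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (the file name is an assumption):
#   check_required_env .env.prod
```

The check looks only at presence, not at whether a value is still a placeholder such as `<strong-random-secret>`.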
cd /Users/liweiguang/aiagent/complyra
./scripts/aws/00_preflight.sh
./scripts/aws/01_prepare_prod_env.sh
./scripts/aws/04_validate_env_prod.sh

cd /Users/liweiguang/aiagent/complyra
./scripts/aws/02_bootstrap_ecr.sh

cd /Users/liweiguang/aiagent/complyra
./scripts/aws/03_build_and_push.sh

Terraform now covers ALB, ECS services/task definitions, RDS, ElastiCache, and optional CloudWatch Synthetics.

cd /Users/liweiguang/aiagent/complyra
./scripts/aws/07_terraform_plan.sh

Then apply from infra/terraform after reviewing terraform.tfvars.
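For teams that prefer to run Terraform by hand, one common pattern is to save the plan to a file and apply exactly that file. The sketch below assumes the infra/terraform directory named in this runbook. The plan file name `tfplan` is an assumption, and nothing here is taken from 07_terraform_plan.sh.

```shell
# Sketch: plan to a file, review it, then apply exactly that plan.
TF_DIR="${TF_DIR:-infra/terraform}"

tf_plan() {
  # -chdir switches the working directory before the subcommand runs,
  # so tfplan is written inside TF_DIR.
  terraform -chdir="$TF_DIR" plan -out=tfplan
}

tf_apply() {
  # Applying a saved plan avoids drift between review and apply.
  terraform -chdir="$TF_DIR" apply tfplan
}

# Usage:
#   tf_plan && terraform -chdir="$TF_DIR" show tfplan   # review
#   tf_apply
```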
Option B: create manually in AWS console
- Create VPC with at least 2 AZs
- Public subnets: ALB/NAT
- Private subnets: ECS services, RDS, Redis
- Security groups:
  - ALB: 80/443 inbound from internet
  - API: allow from ALB SG
  - Worker: no public ingress
  - RDS: allow from API/worker SG
  - Redis: allow from API/worker SG
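If you script the security group rules instead of clicking through the console, they might look like the sketch below. The security group variables and the API container port (8000) are placeholders, and `sg_allow_from` is a hypothetical helper. Ports 5432 and 6379 are the PostgreSQL and Redis defaults and may differ in your setup.

```shell
# Sketch: allow traffic between security groups by referencing group IDs.
# With DRY_RUN=1 (the default) the command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

sg_allow_from() {
  target_sg="$1"; port="$2"; source_sg="$3"
  set -- aws ec2 authorize-security-group-ingress \
    --group-id "$target_sg" --protocol tcp --port "$port" \
    --source-group "$source_sg"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (all IDs are placeholders):
#   sg_allow_from "$API_SG"   8000 "$ALB_SG"      # API: from ALB only
#   sg_allow_from "$RDS_SG"   5432 "$API_SG"      # RDS: from API
#   sg_allow_from "$RDS_SG"   5432 "$WORKER_SG"   # RDS: from worker
#   sg_allow_from "$REDIS_SG" 6379 "$API_SG"      # Redis: from API
#   sg_allow_from "$REDIS_SG" 6379 "$WORKER_SG"   # Redis: from worker
```

Referencing a source security group rather than a CIDR range keeps the rules valid when task IP addresses change.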
When using Terraform full stack, RDS and ElastiCache are already created. Verify endpoints and connectivity, then run API health checks.
Option A (recommended): Managed Qdrant Cloud (lower ops overhead)
Option B: Self-host Qdrant on ECS EC2:
- Persistent EBS volume
- Daily snapshots/backup policy
- Private service endpoint only
For the current codebase, Ollama is required.
- Create ECS EC2 capacity provider with GPU instances (for example g5.xlarge)
- Run Ollama service in private subnet
- Ensure model availability at startup:
  - keep APP_OLLAMA_PREPULL=true
  - pre-pull model in warm-up task if cold start latency is unacceptable
  - optional script-based pre-pull: ./scripts/pull_ollama_model.sh qwen2.5:3b-instruct
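To confirm that a model is actually present after a pre-pull, you can query Ollama's model listing endpoint (GET /api/tags). This is a rough sketch: the helper names are hypothetical, and the match is a plain substring search on the response rather than a full JSON parse.

```shell
# Sketch: check whether a model is already present on an Ollama host.

# Read the /api/tags JSON on stdin; succeed if the quoted model name appears.
model_listed() {
  grep -Fq "\"$1\""
}

ollama_has_model() {
  base_url="$1"; model="$2"
  curl -fsS "$base_url/api/tags" | model_listed "$model"
}

# Usage (the host name is a placeholder):
#   ollama_has_model "http://<ollama-service>:11434" "qwen2.5:3b-instruct" \
#     || echo "model not yet pulled"
```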
Create separate tasks:
- complyra-api (FastAPI container)
- complyra-worker (RQ worker command)
- complyra-web (Nginx static web)

Suggested naming for task families and services:
- complyra-api
- complyra-worker
- complyra-web
Inject environment variables through AWS Secrets Manager or SSM Parameter Store, not plaintext task definition values.
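In an ECS container definition, this means listing each value under `secrets` with a `valueFrom` ARN, not under `environment`. The helper below is a hypothetical sketch that prints such a `secrets` array from NAME=ARN pairs. How the repository's own scripts build task definitions is not shown here.

```shell
# Sketch: emit the `secrets` array of an ECS container definition.
# Each argument is NAME=ARN; ECS resolves the ARN when the task starts.
secrets_json() {
  sep=""
  printf '['
  for pair in "$@"; do
    name="${pair%%=*}"     # text before the first "="
    arn="${pair#*=}"       # text after the first "="
    printf '%s{"name":"%s","valueFrom":"%s"}' "$sep" "$name" "$arn"
    sep=","
  done
  printf ']\n'
}

# Usage (the ARN is a placeholder):
#   secrets_json "APP_JWT_SECRET_KEY=arn:aws:secretsmanager:<region>:<account>:secret:complyra-jwt"
```

The task execution role needs permission to read the referenced secrets or parameters, otherwise the task fails to start.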
For release-based automation, register task definitions directly from release manifests:
cd /Users/liweiguang/aiagent/complyra
export ECS_TASK_EXECUTION_ROLE_ARN=arn:aws:iam::<ACCOUNT_ID>:role/complyra-ecs-exec
export ECS_TASK_ROLE_ARN=arn:aws:iam::<ACCOUNT_ID>:role/complyra-ecs-task
export APP_DATABASE_URL='postgresql+psycopg://...'
export APP_REDIS_URL='redis://...'
export APP_QDRANT_URL='http://...:6333'
export APP_OLLAMA_BASE_URL='http://...:11434'
export JWT_SECRET_ARN='arn:aws:secretsmanager:...:secret:complyra-jwt'
# Optional:
export SENTRY_DSN_ARN='arn:aws:secretsmanager:...:secret:complyra-sentry'
./scripts/aws/08_register_taskdefs_from_release.sh <release_tag>

- API service behind ALB target group
- Worker service without load balancer
- Web service behind ALB (or CloudFront origin)
- Configure health checks:
  - API: /api/health/live
  - Web: /healthz
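The same health check paths can be probed from a shell while a service is starting. The following is a minimal polling sketch; `wait_for_health` and its default attempt count and delay are assumptions, not values from the repository.

```shell
# Sketch: poll a health endpoint until curl reports success.
wait_for_health() {
  url="$1"; attempts="${2:-30}"; delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "unhealthy after ${attempts} attempts: ${url}" >&2
  return 1
}

# Usage (domains are placeholders):
#   wait_for_health "https://api.<domain>/api/health/live"
#   wait_for_health "https://app.<domain>/healthz" 10 3
```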
When using Terraform full stack, ECS services are created by Terraform apply.
- Request certificate in ACM for API/web domains
- Attach certificate to ALB HTTPS listener
- Route 53 records:
  - api.<domain> -> ALB
  - app.<domain> -> ALB or CloudFront
- CloudWatch logs for all ECS services
- Prometheus/Grafana deployment in private network or managed alternative
- Set APP_SENTRY_DSN for production exception tracking
- Prometheus alert rules are provided in ops/alert_rules.yml (error rate, p95 latency, ingest queue backlog)
- Optional CloudWatch Synthetics canary is provisioned via Terraform (enable_synthetics=true)
cd /Users/liweiguang/aiagent/complyra
./scripts/aws/05_smoke_test.sh https://api.<your-domain> <username> <password>

- Login works and cookie is secure over HTTPS
- Ingest job transitions: queued -> processing -> completed
- Approval flow works end-to-end
- Audit search/export works
- /api/health/ready returns all checks true
- Rollback plan validated (previous task definition revision)
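The readiness item in this checklist can be scripted in a rough way. The sketch below assumes the endpoint returns a JSON body made of named boolean checks, which this runbook does not confirm. It treats any `false` in the body as a failure and requires at least one `true`. `all_checks_true` is a hypothetical helper.

```shell
# Sketch: succeed only if the readiness body has no "false" and some "true".
# Assumption: the body is JSON made of boolean checks, e.g. {"db":true}.
all_checks_true() {
  body="$(cat)"
  case "$body" in
    *false*) return 1 ;;
    *true*)  return 0 ;;
    *)       return 1 ;;   # empty or unexpected body
  esac
}

# Usage (the domain is a placeholder):
#   curl -fsS "https://api.<your-domain>/api/health/ready" | all_checks_true
```

If the real response shape differs, a JSON-aware tool would be a safer choice than substring matching.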
After task definitions are ready, deploy all three ECS services and wait for stability:
cd /Users/liweiguang/aiagent/complyra
./scripts/aws/09_deploy_services_from_release.sh <cluster_name> <release_tag>

By default it deploys services named complyra-api, complyra-worker, and complyra-web.
Override names with environment variables when needed:
- API_SERVICE_NAME
- WORKER_SERVICE_NAME
- WEB_SERVICE_NAME
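The override behaviour described above follows the usual shell default-value pattern. The sketch below illustrates that pattern together with the AWS CLI stability waiter. It is not an excerpt from 09_deploy_services_from_release.sh, and the function names are hypothetical.

```shell
# Sketch: resolve service names, falling back to the documented defaults.
resolve_service_names() {
  printf '%s\n' \
    "${API_SERVICE_NAME:-complyra-api}" \
    "${WORKER_SERVICE_NAME:-complyra-worker}" \
    "${WEB_SERVICE_NAME:-complyra-web}"
}

# Block until all three services report a steady state.
wait_services_stable() {
  cluster="$1"
  # Word splitting is intentional: one service name per argument.
  aws ecs wait services-stable --cluster "$cluster" \
    --services $(resolve_service_names)
}

# Usage (the override value is a placeholder):
#   API_SERVICE_NAME=complyra-api-v2 wait_services_stable <cluster_name>
```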
./scripts/aws/03_build_and_push.sh writes release manifests to releases/<tag>.json.
Rollback preparation:
cd /Users/liweiguang/aiagent/complyra
./scripts/aws/06_prepare_rollback.sh <release_tag>

Blue/green deployment trigger (when CodeDeploy app + deployment group already exist):
cd /Users/liweiguang/aiagent/complyra
./scripts/aws/10_trigger_codedeploy_ecs.sh <codedeploy_app_name> <deployment_group_name> <task_definition_arn> [container_name] [container_port]

cd /Users/liweiguang/aiagent/complyra
./scripts/iac/01_conftest_check.sh

Use GitHub Actions or GitLab CI:
- Run backend compile and frontend build
- Build and tag images by git SHA
- Push to ECR
- Deploy new ECS task definition revision
- Run post-deploy smoke tests
- Auto-rollback on health check failure
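The deploy and auto-rollback steps of such a pipeline could be sketched as below. The sketch assumes the previous numeric task definition revision is the last known-good one, which may not hold if revisions were registered without being deployed. All function names are hypothetical.

```shell
# Sketch: deploy a task definition revision and roll back on failure.

# Given "family:revision", print "family:(revision - 1)".
previous_revision() {
  family="${1%:*}"
  revision="${1##*:}"
  if [ "$revision" -le 1 ]; then
    echo "no earlier revision for ${family}" >&2
    return 1
  fi
  printf '%s:%s\n' "$family" "$((revision - 1))"
}

# Point the service at a task definition and wait for it to stabilize.
deploy_taskdef() {
  cluster="$1"; service="$2"; taskdef="$3"
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --task-definition "$taskdef" >/dev/null &&
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}

# Deploy; if the service does not stabilize, redeploy the previous revision
# and still report failure so the pipeline run is marked as failed.
deploy_with_rollback() {
  if ! deploy_taskdef "$1" "$2" "$3"; then
    prev="$(previous_revision "$3")" || return 1
    deploy_taskdef "$1" "$2" "$prev"
    return 1
  fi
}

# Usage (cluster name and revision are placeholders):
#   deploy_with_rollback <cluster_name> complyra-api complyra-api:42
```

ECS also offers a deployment circuit breaker with automatic rollback, which may be a better fit than hand-written rollback logic.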
- Store secrets in Secrets Manager
- Enforce least-privilege IAM roles per task
- Enable WAF managed rule set on ALB/CloudFront
- Restrict database and Redis to private subnets
- Rotate JWT secrets and DB credentials periodically
- Keep metrics endpoint private or token-protected
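As an illustration of the rotation item, a new random secret can be generated and written to Secrets Manager as follows. This is a sketch under assumptions: the secret ID is a placeholder, and `gen_secret` and `rotate_secret` are hypothetical helpers, not repository scripts.

```shell
# Sketch: generate a random secret and store it as a new secret version.

# Print 32 random bytes as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

rotate_secret() {
  secret_id="$1"
  # Note: the value is passed as a command-line argument here and may be
  # visible to other local users via the process list.
  aws secretsmanager put-secret-value \
    --secret-id "$secret_id" --secret-string "$(gen_secret)" >/dev/null
}

# Usage (the secret name is a placeholder):
#   rotate_secret complyra-jwt
```

ECS injects secrets when a task starts, so running tasks keep the old value until the service is redeployed.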
If your team is in mainland China and users are global:
- Prefer ap-southeast-1 for lower latency from China compared with US/EU regions
- Use CloudFront and optimized TLS settings for better edge reachability
- Keep operations access through VPN or secure bastion hosts
If mainland users are your primary audience and strict low-latency is required:
- Plan a separate aws-cn deployment with legal/compliance readiness
- Keep architecture consistent so global and China stacks share the same code and runbooks
- Extend Terraform to include CodeDeploy blue/green resources
- Expand synthetic checks with additional flows (ingest, audit export)
- Add policy-as-code checks for Terraform/CDK infrastructure changes