Mastering Microservices: A Complete Guide to Modern Deployment and Release Patterns
In the rapidly evolving landscape of cloud-native applications, choosing the right deployment and release patterns can make the difference between seamless user experiences and catastrophic outages. This comprehensive guide explores modern deployment strategies, from traditional Blue-Green deployments to cutting-edge progressive delivery patterns with service mesh integration.
Blue-Green Deployment Strategy
Blue-Green deployment maintains two identical production environments, providing zero-downtime deployments with instant rollback capabilities. While resource-intensive, it remains the gold standard for mission-critical applications.
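Conceptually, the cutover is nothing more than repointing the stable Service at the other environment, which is also why rollback is near-instant. The sketch below models that switch in TypeScript; the names and shapes are illustrative, not part of any Kubernetes client API.
// Illustrative sketch of a Blue-Green cutover: the active Service selector is
// flipped from one color to the other; rollback is the same flip in reverse.
type Color = "blue" | "green";

interface ServiceSelector {
  app: string;
  color: Color;
}

// Returns the selector the active Service should use after cutover.
function cutoverSelector(current: ServiceSelector): ServiceSelector {
  const next: Color = current.color === "blue" ? "green" : "blue";
  return { ...current, color: next };
}

const active: ServiceSelector = { app: "user-service", color: "blue" };
console.log(cutoverSelector(active)); // { app: 'user-service', color: 'green' }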
Architecture Overview
graph TB
subgraph "Load Balancer"
LB[Load Balancer/Ingress]
end
subgraph "Blue Environment (Production)"
B1[App v1.0]
B2[App v1.0]
B3[App v1.0]
BDB[(Database)]
end
subgraph "Green Environment (Staging)"
G1[App v2.0]
G2[App v2.0]
G3[App v2.0]
GDB[(Database)]
end
subgraph "Shared Services"
Cache[(Redis Cache)]
Queue[Message Queue]
end
Users --> LB
LB -->|100% Traffic| B1
LB -->|100% Traffic| B2
LB -->|100% Traffic| B3
LB -.->|0% Traffic| G1
LB -.->|0% Traffic| G2
LB -.->|0% Traffic| G3
B1 --> BDB
B2 --> BDB
B3 --> BDB
G1 --> GDB
G2 --> GDB
G3 --> GDB
B1 --> Cache
B2 --> Cache
B3 --> Cache
G1 --> Cache
G2 --> Cache
G3 --> Cache
classDef blue fill:#e3f2fd
classDef green fill:#e8f5e8
classDef shared fill:#fff3e0
class B1,B2,B3,BDB blue
class G1,G2,G3,GDB green
class Cache,Queue shared
Kubernetes Implementation
Here’s a practical Blue-Green deployment using Kubernetes and Argo Rollouts:
# blue-green-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: user-service
namespace: production
spec:
replicas: 6
strategy:
blueGreen:
activeService: user-service-active
previewService: user-service-preview
autoPromotionEnabled: false
scaleDownDelaySeconds: 30
prePromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: user-service-preview
postPromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: user-service-active
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
spec:
containers:
- name: user-service
image: myregistry/user-service:v2.0.0
ports:
- containerPort: 8080
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
---
# Active service (Blue environment)
apiVersion: v1
kind: Service
metadata:
name: user-service-active
namespace: production
spec:
selector:
app: user-service
ports:
- port: 80
targetPort: 8080
type: ClusterIP
---
# Preview service (Green environment)
apiVersion: v1
kind: Service
metadata:
name: user-service-preview
namespace: production
spec:
selector:
app: user-service
ports:
- port: 80
targetPort: 8080
type: ClusterIP
---
# Analysis template for automated testing
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
name: success-rate
namespace: production
spec:
args:
- name: service-name
metrics:
- name: success-rate
successCondition: result[0] >= 0.95
provider:
prometheus:
address: http://prometheus.monitoring:9090
query: |
sum(rate(http_requests_total{service="{{args.service-name}}", status=~"2.."}[5m])) /
sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
Deployment Workflow
sequenceDiagram
participant Dev as Developer
participant Git as Git Repository
participant CI as CI/CD Pipeline
participant Argo as Argo Rollouts
participant K8s as Kubernetes
participant Monitor as Monitoring
Dev->>Git: Push v2.0.0 code
Git->>CI: Trigger build pipeline
CI->>CI: Run tests & security scans
CI->>CI: Build container image
CI->>Git: Update deployment manifest
Git->>Argo: Detect manifest change
Argo->>K8s: Deploy to Green environment
K8s->>Monitor: Start health checks
Monitor->>Argo: Health checks passing
Argo->>Argo: Run pre-promotion analysis
Argo->>Dev: Request manual approval
Dev->>Argo: Approve promotion
Argo->>K8s: Switch traffic to Green
K8s->>Monitor: Monitor post-deployment
alt Success
Monitor->>Argo: All metrics healthy
Argo->>K8s: Scale down Blue environment
else Failure
Monitor->>Argo: Metrics degraded
Argo->>K8s: Rollback to Blue
Argo->>Dev: Send failure notification
end
Canary Deployment with Traffic Splitting
Canary deployments reduce risk by gradually exposing new versions to increasing percentages of users while monitoring key metrics. This approach is ideal for high-traffic applications where Blue-Green would be cost-prohibitive.
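The core control loop is simple: observe the canary's metrics over an interval, then either step the traffic weight up, hold, or roll back. The TypeScript sketch below captures that decision logic under assumed thresholds; the Flagger configuration later in this section automates the same loop.
// Minimal sketch of a canary decision step. Thresholds and step sizes are
// illustrative assumptions, not defaults of any particular tool.
interface CanaryState {
  weight: number;   // percentage of traffic currently routed to the canary
  failures: number; // consecutive failed metric checks
}

type Decision = CanaryState | "promote" | "rollback";

function nextStep(
  state: CanaryState,
  successRate: number, // observed over the last interval, 0..1
  opts = { stepWeight: 10, maxWeight: 50, minSuccessRate: 0.99, failureThreshold: 5 }
): Decision {
  if (successRate < opts.minSuccessRate) {
    const failures = state.failures + 1;
    return failures >= opts.failureThreshold ? "rollback" : { ...state, failures };
  }
  if (state.weight >= opts.maxWeight) return "promote"; // held at max weight and healthy
  return { weight: Math.min(state.weight + opts.stepWeight, opts.maxWeight), failures: 0 };
}

console.log(nextStep({ weight: 10, failures: 0 }, 0.997)); // { weight: 20, failures: 0 }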
Service Mesh Architecture
graph TB
subgraph "Istio Service Mesh"
subgraph "Gateway"
IGW[Istio Gateway]
end
subgraph "Virtual Service"
VS[Traffic Splitting Rules]
end
subgraph "Destination Rules"
DR[Version Subsets]
end
subgraph "Production Pods"
subgraph "Version 1 (90%)"
V1P1[Pod v1]
V1P2[Pod v1]
V1P3[Pod v1]
V1P4[Pod v1]
V1P5[Pod v1]
V1P6[Pod v1]
end
subgraph "Version 2 Canary (10%)"
V2P1[Pod v2]
V2P2[Pod v2]
end
end
subgraph "Monitoring"
Prom[Prometheus]
Graf[Grafana]
Jaeger[Jaeger Tracing]
end
subgraph "Automation"
Flagger[Flagger Controller]
end
end
Users --> IGW
IGW --> VS
VS -->|90%| V1P1
VS -->|90%| V1P2
VS -->|90%| V1P3
VS -->|90%| V1P4
VS -->|90%| V1P5
VS -->|90%| V1P6
VS -->|10%| V2P1
VS -->|10%| V2P2
V1P1 --> Prom
V1P2 --> Prom
V2P1 --> Prom
V2P2 --> Prom
Flagger --> VS
Flagger --> Prom
Prom --> Graf
classDef v1 fill:#e3f2fd
classDef v2 fill:#ffebee
classDef control fill:#f3e5f5
class V1P1,V1P2,V1P3,V1P4,V1P5,V1P6 v1
class V2P1,V2P2 v2
class VS,DR,Flagger control
Istio Configuration
# Gateway configuration
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
name: user-service-gateway
namespace: production
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- api.example.com
---
# Destination rule defining version subsets
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: user-service-dr
namespace: production
spec:
host: user-service
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
trafficPolicy:
loadBalancer:
simple: LEAST_CONN
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
maxRequestsPerConnection: 2
---
# Virtual service with traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: user-service-vs
namespace: production
spec:
hosts:
- api.example.com
gateways:
- user-service-gateway
http:
- match:
- uri:
prefix: /api/users
route:
- destination:
host: user-service
subset: v1
weight: 90
- destination:
host: user-service
subset: v2
weight: 10
timeout: 30s
retries:
attempts: 3
perTryTimeout: 10s
Automated Canary with Flagger
# Flagger canary deployment
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: user-service
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: user-service
progressDeadlineSeconds: 60
service:
port: 80
targetPort: 8080
gateways:
- user-service-gateway
hosts:
- api.example.com
analysis:
interval: 1m
threshold: 5
maxWeight: 50
stepWeight: 10
metrics:
- name: request-success-rate
threshold: 99
interval: 1m
- name: request-duration
threshold: 500
interval: 1m
- name: error-rate
threshold: 1
interval: 1m
webhooks:
- name: load-test
url: http://load-tester.test/
timeout: 5s
metadata:
cmd: "hey -z 10m -q 10 -c 2 http://api.example.com/api/users"
- name: integration-test
url: http://integration-tester.test/
timeout: 30s
metadata:
type: bash
cmd: "curl -s http://api.example.com/api/users/health | grep OK"
provider: istio
Progressive Traffic Shifting
gantt
title Canary Deployment Timeline
dateFormat X
axisFormat %s
section Traffic Split
0% Canary (Baseline) :0, 300
10% Canary (Initial) :300, 600
20% Canary :600, 900
30% Canary :900, 1200
50% Canary (Half Split) :1200, 1500
100% Canary (Complete) :1500, 1800
section Health Checks
Readiness Probes :0, 1800
Success Rate Monitor :300, 1800
Latency Monitor :300, 1800
Error Rate Monitor :300, 1800
section Automated Actions
Initial Deployment :0, 300
Traffic Increment :300, 1500
Promotion Decision :1500, 1650
Cleanup Old Version :1650, 1800
Rolling Updates with Kubernetes
Rolling updates provide the most resource-efficient deployment strategy, gradually replacing old pods with new ones while maintaining service availability.
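The pace of a rolling update is governed by two knobs, maxSurge and maxUnavailable, which bound how many pods can exist above the desired count and how many can be missing at any moment. A quick sketch of that arithmetic, assuming absolute values rather than percentages:
// With replicas: 6, maxSurge: 2, maxUnavailable: 1 the controller keeps the total
// pod count at or below 8 and the available pod count at or above 5 during the rollout.
function rollingUpdateBounds(replicas: number, maxSurge: number, maxUnavailable: number) {
  return {
    maxTotalPods: replicas + maxSurge,           // upper bound while old and new overlap
    minAvailablePods: replicas - maxUnavailable, // lower bound of available pods
  };
}

console.log(rollingUpdateBounds(6, 2, 1)); // { maxTotalPods: 8, minAvailablePods: 5 }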
Rolling Update Flow
sequenceDiagram
participant User as Users
participant LB as Load Balancer
participant K8s as Kubernetes Controller
participant RS1 as ReplicaSet v1
participant RS2 as ReplicaSet v2
participant Pods1 as Pods v1
participant Pods2 as Pods v2
Note over User,Pods2: Initial State: 6 pods running v1
User->>LB: Traffic (100%)
LB->>Pods1: Route to 6 v1 pods
Note over K8s,Pods2: Rolling Update Initiated
K8s->>RS2: Create ReplicaSet v2
K8s->>RS2: Scale up to 2 pods (maxSurge=2)
RS2->>Pods2: Create 2 v2 pods
Note over Pods2: Wait for readiness probes
Pods2->>K8s: Ready signals
K8s->>RS1: Scale down by 1 pod
RS1->>Pods1: Terminate 1 v1 pod
Note over User,Pods2: Traffic: 5 v1 + 2 v2 pods
User->>LB: Traffic (100%)
LB->>Pods1: Route to 5 v1 pods
LB->>Pods2: Route to 2 v2 pods
K8s->>RS2: Scale up to 4 pods
RS2->>Pods2: Create 2 more v2 pods
Pods2->>K8s: Ready signals
K8s->>RS1: Scale down by 2 pods
RS1->>Pods1: Terminate 2 v1 pods
Note over User,Pods2: Traffic: 3 v1 + 4 v2 pods
K8s->>RS2: Scale up to 6 pods
RS2->>Pods2: Create 2 more v2 pods
Pods2->>K8s: Ready signals
K8s->>RS1: Scale down to 0
RS1->>Pods1: Terminate remaining v1 pods
Note over User,Pods2: Final State: 6 pods running v2
User->>LB: Traffic (100%)
LB->>Pods2: Route to 6 v2 pods
Advanced Rolling Update Configuration
# Deployment with rolling update strategy
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: production
labels:
app: user-service
version: v2.0.0
spec:
replicas: 6
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 2 # Allow 2 extra pods during update
maxUnavailable: 1 # Max 1 pod unavailable at a time
minReadySeconds: 30 # Pod must stay ready for 30s before it counts as available
progressDeadlineSeconds: 600 # 10min timeout for rollout
revisionHistoryLimit: 5 # Keep 5 previous versions
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
version: v2.0.0
spec:
containers:
- name: user-service
image: myregistry/user-service:v2.0.0
ports:
- containerPort: 8080
name: http
resources:
requests:
cpu: 200m
memory: 256Mi
limits:
cpu: 1000m
memory: 1Gi
env:
- name: ENV
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: database-credentials
key: url
# Comprehensive health checks
readinessProbe:
httpGet:
path: /health/ready
port: 8080
httpHeaders:
- name: Custom-Header
value: health-check
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 2
failureThreshold: 3
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 60
periodSeconds: 30
timeoutSeconds: 10
failureThreshold: 3
# Startup probe for slow-starting applications
startupProbe:
httpGet:
path: /health/startup
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 12 # 60 seconds total
# Graceful shutdown
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
# Security context
securityContext:
runAsNonRoot: true
runAsUser: 1000
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
# Pod security and scheduling
securityContext:
fsGroup: 1000
terminationGracePeriodSeconds: 30
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- user-service
topologyKey: kubernetes.io/hostname
---
# Service for rolling updates
apiVersion: v1
kind: Service
metadata:
name: user-service
namespace: production
labels:
app: user-service
spec:
selector:
app: user-service
ports:
- port: 80
targetPort: 8080
protocol: TCP
name: http
type: ClusterIP
---
# Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: user-service-pdb
namespace: production
spec:
minAvailable: 4 # Always keep at least 4 pods running
selector:
matchLabels:
app: user-service
Health Check Implementation
// Node.js health check endpoints
const express = require("express");
const app = express();
let isReady = false;
let isLive = true;
let startupComplete = false;
// Simulate application startup
setTimeout(() => {
startupComplete = true;
isReady = true;
}, 10000); // 10 second startup time
// Startup probe - for slow-starting applications
app.get("/health/startup", (req, res) => {
if (startupComplete) {
res.status(200).json({
status: "started",
timestamp: new Date().toISOString(),
});
} else {
res.status(503).json({
status: "starting",
timestamp: new Date().toISOString(),
});
}
});
// Readiness probe - ready to receive traffic
app.get("/health/ready", (req, res) => {
// Check dependencies (database, external services)
const checks = {
database: checkDatabase(),
redis: checkRedis(),
externalAPI: checkExternalAPI(),
};
const allHealthy = Object.values(checks).every(check => check);
if (isReady && allHealthy) {
res.status(200).json({
status: "ready",
checks,
timestamp: new Date().toISOString(),
});
} else {
res.status(503).json({
status: "not ready",
checks,
timestamp: new Date().toISOString(),
});
}
});
// Liveness probe - application is healthy
app.get("/health/live", (req, res) => {
if (isLive) {
res.status(200).json({
status: "alive",
uptime: process.uptime(),
memory: process.memoryUsage(),
timestamp: new Date().toISOString(),
});
} else {
res.status(503).json({
status: "unhealthy",
timestamp: new Date().toISOString(),
});
}
});
function checkDatabase() {
// Implement actual database connectivity check
return Math.random() > 0.1; // 90% success rate for demo
}
function checkRedis() {
// Implement actual Redis connectivity check
return Math.random() > 0.05; // 95% success rate for demo
}
function checkExternalAPI() {
// Implement actual external API check
return Math.random() > 0.15; // 85% success rate for demo
}
// Graceful shutdown handler
process.on("SIGTERM", () => {
console.log("SIGTERM received, shutting down gracefully");
isReady = false; // Stop accepting new requests
setTimeout(() => {
isLive = false; // Mark as unhealthy
process.exit(0);
}, 15000); // Wait 15 seconds for existing requests
});
app.listen(8080, () => {
console.log("Health check server running on port 8080");
});
Feature Flags and Progressive Delivery
Feature flags enable runtime control over feature availability, allowing teams to separate deployment from release and implement sophisticated rollout strategies.
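In practice this means the new code path ships to production dark and is only exercised when its flag evaluates true for a given user. A minimal, self-contained sketch (the flag rule and function names are hypothetical; a full evaluation service follows later in this section):
// Deployment vs. release: both code paths are deployed, but only flagged users hit the new one.
type User = { id: string };

// Stand-in for a real flag evaluation call (see the FeatureFlagService below).
async function isEnabled(flagKey: string, user: User): Promise<boolean> {
  return flagKey === "new-checkout-flow" && user.id.endsWith("7"); // placeholder rollout rule
}

async function checkout(user: User): Promise<string> {
  if (await isEnabled("new-checkout-flow", user)) {
    return "processed by new checkout flow"; // released to a slice of users
  }
  return "processed by legacy checkout flow"; // everyone else keeps the existing path
}

checkout({ id: "user-1337" }).then(console.log); // "processed by new checkout flow"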
Feature Flag Architecture
graph TB
subgraph "Client Applications"
Web[Web App]
Mobile[Mobile App]
API[API Gateway]
end
subgraph "Feature Flag Service"
FFS[Feature Flag Server]
Admin[Admin Dashboard]
SDK[Client SDKs]
end
subgraph "Configuration Store"
DB[(Flag Database)]
Cache[(Redis Cache)]
CDN[Edge Cache/CDN]
end
subgraph "User Context"
UserDB[(User Database)]
Segments[User Segments]
Analytics[Analytics Engine]
end
subgraph "Application Services"
UserSvc[User Service]
PaymentSvc[Payment Service]
OrderSvc[Order Service]
end
subgraph "Monitoring & Observability"
Metrics[Metrics Store]
Logs[Log Aggregation]
Alerts[Alert Manager]
end
Web --> SDK
Mobile --> SDK
API --> SDK
SDK --> FFS
Admin --> FFS
FFS --> DB
FFS --> Cache
SDK --> CDN
FFS --> UserDB
FFS --> Segments
FFS --> Analytics
UserSvc --> SDK
PaymentSvc --> SDK
OrderSvc --> SDK
FFS --> Metrics
FFS --> Logs
FFS --> Alerts
classDef client fill:#e3f2fd
classDef service fill:#e8f5e8
classDef data fill:#fff3e0
classDef monitor fill:#fce4ec
class Web,Mobile,API client
class FFS,Admin,SDK service
class DB,Cache,CDN,UserDB data
class Metrics,Logs,Alerts monitor
Progressive Rollout Implementation
// Feature flag service implementation
interface FeatureFlag {
key: string;
name: string;
description: string;
enabled: boolean;
rolloutPercentage: number;
userSegments: string[];
environmentRules: Record<string, any>;
constraints: Constraint[];
createdAt: Date;
updatedAt: Date;
}
interface User {
id: string;
email: string;
segments: string[];
attributes: Record<string, any>;
}
interface Constraint {
type: "user_id" | "segment" | "attribute" | "percentage";
operator: "in" | "not_in" | "equals" | "greater_than" | "less_than";
values: any[];
}
class FeatureFlagService {
private flags: Map<string, FeatureFlag> = new Map();
private cache: Map<string, boolean> = new Map();
constructor(
private database: Database,
private analytics: AnalyticsService,
private logger: Logger
) {}
async isEnabled(
flagKey: string,
user: User,
context?: any
): Promise<boolean> {
const cacheKey = `${flagKey}:${user.id}`;
// Check cache first
if (this.cache.has(cacheKey)) {
return this.cache.get(cacheKey)!;
}
const flag = await this.getFlag(flagKey);
if (!flag || !flag.enabled) {
this.cache.set(cacheKey, false);
return false;
}
const result = await this.evaluateFlag(flag, user, context);
// Cache result for 5 minutes
this.cache.set(cacheKey, result);
setTimeout(() => this.cache.delete(cacheKey), 5 * 60 * 1000);
// Track flag evaluation
this.analytics.track("feature_flag_evaluated", {
flagKey,
userId: user.id,
result,
timestamp: new Date(),
});
return result;
}
private async evaluateFlag(
flag: FeatureFlag,
user: User,
context?: any
): Promise<boolean> {
// Check constraints
for (const constraint of flag.constraints) {
if (!this.evaluateConstraint(constraint, user, context)) {
return false;
}
}
// Check user segments
if (flag.userSegments.length > 0) {
const hasSegment = flag.userSegments.some(segment =>
user.segments.includes(segment)
);
if (!hasSegment) {
return false;
}
}
// Check rollout percentage
if (flag.rolloutPercentage < 100) {
const userHash = this.hashUser(user.id, flag.key);
const userPercentile = userHash % 100;
return userPercentile < flag.rolloutPercentage;
}
return true;
}
private evaluateConstraint(
constraint: Constraint,
user: User,
context?: any
): boolean {
switch (constraint.type) {
case "user_id":
return this.evaluateOperator(
constraint.operator,
user.id,
constraint.values
);
case "segment":
return this.evaluateOperator(
constraint.operator,
user.segments,
constraint.values
);
case "attribute":
const attributeValue = user.attributes[constraint.values[0]];
return this.evaluateOperator(
constraint.operator,
attributeValue,
constraint.values.slice(1)
);
case "percentage":
const userHash = this.hashUser(user.id, "percentage");
const percentile = userHash % 100;
return percentile < constraint.values[0];
default:
return false;
}
}
private evaluateOperator(
operator: string,
userValue: any,
constraintValues: any[]
): boolean {
switch (operator) {
case "in":
return constraintValues.includes(userValue);
case "not_in":
return !constraintValues.includes(userValue);
case "equals":
return userValue === constraintValues[0];
case "greater_than":
return userValue > constraintValues[0];
case "less_than":
return userValue < constraintValues[0];
default:
return false;
}
}
private hashUser(userId: string, salt: string): number {
// Simple hash function for consistent user bucketing
let hash = 0;
const str = userId + salt;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = (hash << 5) - hash + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
async updateRolloutPercentage(
flagKey: string,
percentage: number
): Promise<void> {
const flag = await this.getFlag(flagKey);
if (!flag) throw new Error(`Flag ${flagKey} not found`);
flag.rolloutPercentage = percentage;
flag.updatedAt = new Date();
await this.database.updateFlag(flag);
// Clear cache to force re-evaluation
this.cache.clear();
this.logger.info(
`Updated rollout percentage for ${flagKey} to ${percentage}%`
);
}
private async getFlag(flagKey: string): Promise<FeatureFlag | null> {
if (this.flags.has(flagKey)) {
return this.flags.get(flagKey)!;
}
const flag = await this.database.getFlag(flagKey);
if (flag) {
this.flags.set(flagKey, flag);
}
return flag;
}
}
Feature Flag Rollout Strategy
gantt
title Progressive Feature Rollout Strategy
dateFormat X
axisFormat %d/%m
section Development
Feature Development :0, 7
Unit Testing :5, 10
Integration Testing :8, 12
section Internal Rollout
Team Testing (0.1%) :12, 15
Beta Users (1%) :15, 18
Power Users (5%) :18, 21
section Progressive Rollout
Early Adopters (10%) :21, 25
Segment A (25%) :25, 28
Segment B (50%) :28, 32
Full Rollout (100%) :32, 35
section Monitoring
Error Rate Monitoring :12, 40
Performance Monitoring :12, 40
User Feedback Collection :12, 40
Business Metrics Tracking :21, 40
GitOps Patterns with ArgoCD and Flux
GitOps treats Git repositories as the single source of truth for declarative infrastructure and application configuration, enabling automated, auditable deployments.
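At its core, every GitOps controller runs the same reconciliation loop: read the desired state from Git, read the actual state from the cluster, and compute the actions that converge one toward the other. The sketch below illustrates that idea with plain TypeScript objects; it is not ArgoCD or Flux code.
// Illustrative reconciliation: desired state (from Git) vs. actual state (from the cluster).
interface Workload {
  name: string;
  image: string;
  replicas: number;
}

function reconcile(desired: Workload[], actual: Workload[]): string[] {
  const actions: string[] = [];
  const actualByName = new Map(actual.map(w => [w.name, w]));

  for (const want of desired) {
    const have = actualByName.get(want.name);
    if (!have) {
      actions.push(`create ${want.name}`);
    } else if (have.image !== want.image || have.replicas !== want.replicas) {
      actions.push(`update ${want.name} -> ${want.image} x${want.replicas}`);
    }
    actualByName.delete(want.name);
  }
  // Anything running in the cluster but absent from Git gets pruned (cf. prune: true below).
  for (const orphan of actualByName.keys()) actions.push(`prune ${orphan}`);
  return actions;
}

console.log(
  reconcile(
    [{ name: "user-service", image: "user-service:v2.0.0", replicas: 6 }],
    [{ name: "user-service", image: "user-service:v1.0.0", replicas: 6 }]
  )
); // [ 'update user-service -> user-service:v2.0.0 x6' ]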
GitOps Architecture Overview
graph TB
subgraph "Git Repositories"
AppRepo[Application Code]
ConfigRepo[Configuration Repo]
InfraRepo[Infrastructure Repo]
EnvRepo[Environment Configs]
end
subgraph "CI/CD Pipeline"
CI[CI Pipeline]
Registry[Container Registry]
Scanner[Security Scanner]
end
subgraph "GitOps Controllers"
ArgoCD[ArgoCD Controller]
Flux[Flux Controller]
Tekton[Tekton Pipelines]
end
subgraph "Kubernetes Clusters"
subgraph "Development"
DevCluster[Dev Cluster]
DevApps[Applications]
end
subgraph "Staging"
StageCluster[Staging Cluster]
StageApps[Applications]
end
subgraph "Production"
ProdCluster[Prod Cluster]
ProdApps[Applications]
end
end
subgraph "Observability"
Prometheus[Prometheus]
Grafana[Grafana]
AlertManager[Alert Manager]
end
AppRepo --> CI
CI --> Scanner
CI --> Registry
CI --> ConfigRepo
ConfigRepo --> ArgoCD
ConfigRepo --> Flux
InfraRepo --> ArgoCD
EnvRepo --> ArgoCD
ArgoCD --> DevCluster
ArgoCD --> StageCluster
ArgoCD --> ProdCluster
Flux --> DevCluster
Flux --> StageCluster
DevApps --> Prometheus
StageApps --> Prometheus
ProdApps --> Prometheus
Prometheus --> Grafana
Prometheus --> AlertManager
classDef git fill:#f9f9f9
classDef ci fill:#e3f2fd
classDef gitops fill:#e8f5e8
classDef cluster fill:#fff3e0
classDef monitor fill:#fce4ec
class AppRepo,ConfigRepo,InfraRepo,EnvRepo git
class CI,Registry,Scanner ci
class ArgoCD,Flux,Tekton gitops
class DevCluster,StageCluster,ProdCluster cluster
class Prometheus,Grafana,AlertManager monitor
ArgoCD Application Configuration
# ArgoCD Application for multi-environment deployment
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: user-service
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/company/k8s-configs
targetRevision: HEAD
path: applications/user-service/overlays/production
kustomize:
images:
- myregistry/user-service:v2.0.0
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
- PrunePropagationPolicy=foreground
- PruneLast=true
- ServerSideApply=true
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
revisionHistoryLimit: 10
---
# ArgoCD AppProject for RBAC and resource restrictions
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
name: user-services
namespace: argocd
spec:
description: User service applications
sourceRepos:
- "https://github.com/company/k8s-configs"
- "https://charts.bitnami.com/bitnami"
destinations:
- namespace: "user-*"
server: https://kubernetes.default.svc
- namespace: "production"
server: https://kubernetes.default.svc
clusterResourceWhitelist:
- group: ""
kind: Namespace
- group: rbac.authorization.k8s.io
kind: ClusterRole
- group: rbac.authorization.k8s.io
kind: ClusterRoleBinding
namespaceResourceWhitelist:
- group: ""
kind: Service
- group: ""
kind: ConfigMap
- group: ""
kind: Secret
- group: apps
kind: Deployment
- group: apps
kind: ReplicaSet
- group: networking.k8s.io
kind: Ingress
roles:
- name: developer
description: Developers can sync and view applications
policies:
- p, proj:user-services:developer, applications, sync, user-services/*, allow
- p, proj:user-services:developer, applications, get, user-services/*, allow
- p, proj:user-services:developer, applications, action/*, user-services/*, allow
groups:
- company:developers
- name: admin
description: Admins have full access
policies:
- p, proj:user-services:admin, applications, *, user-services/*, allow
- p, proj:user-services:admin, repositories, *, *, allow
groups:
- company:platform-team
---
# Multi-source application for complex deployments
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: user-service-complete
namespace: argocd
spec:
project: user-services
sources:
- repoURL: https://github.com/company/k8s-configs
targetRevision: HEAD
path: applications/user-service/base
- repoURL: https://github.com/company/helm-charts
targetRevision: HEAD
path: user-service
helm:
valueFiles:
- $values/applications/user-service/values-production.yaml
- repoURL: https://github.com/company/k8s-configs
targetRevision: HEAD
path: .
ref: values
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
- ServerSideApply=true
Flux Configuration
# Flux GitRepository source
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
name: user-service-config
namespace: flux-system
spec:
interval: 1m
url: https://github.com/company/k8s-configs
ref:
branch: main
secretRef:
name: git-credentials
verify:
mode: head
secretRef:
name: git-gpg-keys
---
# Flux Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
name: user-service
namespace: flux-system
spec:
interval: 5m
path: "./applications/user-service/overlays/production"
prune: true
sourceRef:
kind: GitRepository
name: user-service-config
targetNamespace: production
healthChecks:
- apiVersion: apps/v1
kind: Deployment
name: user-service
namespace: production
dependsOn:
- name: infrastructure
postBuild:
substitute:
cluster_name: "production"
cluster_region: "us-west-2"
patches:
- patch: |
- op: replace
path: /spec/replicas
value: 6
target:
kind: Deployment
name: user-service
---
# Flux HelmRepository
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
name: bitnami
namespace: flux-system
spec:
interval: 1h
url: https://charts.bitnami.com/bitnami
---
# Flux HelmRelease
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
name: postgresql
namespace: production
spec:
interval: 5m
chart:
spec:
chart: postgresql
version: "12.x.x"
sourceRef:
kind: HelmRepository
name: bitnami
namespace: flux-system
values:
auth:
postgresPassword: ${postgres_password}
database: userservice
primary:
persistence:
enabled: true
size: 100Gi
storageClass: ssd
metrics:
enabled: true
serviceMonitor:
enabled: true
dependsOn:
- name: user-service
namespace: flux-system
GitOps Workflow
sequenceDiagram
participant Dev as Developer
participant AppRepo as App Repository
participant CI as CI Pipeline
participant ConfigRepo as Config Repository
participant ArgoCD as ArgoCD
participant K8s as Kubernetes
participant Monitor as Monitoring
Dev->>AppRepo: Push application code
AppRepo->>CI: Trigger build
CI->>CI: Run tests
CI->>CI: Build container image
CI->>CI: Security scanning
CI->>CI: Push to registry
CI->>ConfigRepo: Update image tag
ConfigRepo->>ArgoCD: Detect changes
ArgoCD->>ArgoCD: Compare desired vs actual state
ArgoCD->>K8s: Apply changes
K8s->>K8s: Rolling update
K8s->>Monitor: Send metrics
alt Deployment Success
Monitor->>ArgoCD: Healthy status
ArgoCD->>ConfigRepo: Update sync status
else Deployment Failure
Monitor->>ArgoCD: Unhealthy status
ArgoCD->>ArgoCD: Trigger rollback
ArgoCD->>K8s: Revert to previous version
ArgoCD->>Dev: Send failure notification
end
CI/CD Pipeline Best Practices
Modern CI/CD pipelines emphasize security, efficiency, and reliability through automation, comprehensive testing, and progressive deployment strategies.
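A useful mental model is that every stage ends with a gate, and a build is only promoted when all gates pass. The sketch below shows such a gate check with assumed thresholds; it mirrors the Quality and Security stages in the architecture that follows.
// Illustrative promotion gate: thresholds here are assumptions, not tool defaults.
interface BuildReport {
  coverage: number;                // 0..1 from unit tests
  criticalVulnerabilities: number; // from dependency/container scans
  qualityGatePassed: boolean;      // e.g. reported by SonarQube
}

function canPromote(report: BuildReport, minCoverage = 0.8): { ok: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (report.coverage < minCoverage) reasons.push(`coverage ${report.coverage} below ${minCoverage}`);
  if (report.criticalVulnerabilities > 0) reasons.push(`${report.criticalVulnerabilities} critical vulnerabilities`);
  if (!report.qualityGatePassed) reasons.push("quality gate failed");
  return { ok: reasons.length === 0, reasons };
}

console.log(canPromote({ coverage: 0.86, criticalVulnerabilities: 0, qualityGatePassed: true }));
// { ok: true, reasons: [] }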
Comprehensive Pipeline Architecture
graph TB
subgraph "Source Control"
Git[Git Repository]
PR[Pull Request]
Main[Main Branch]
end
subgraph "CI Pipeline"
Trigger[Webhook Trigger]
Checkout[Code Checkout]
subgraph "Build Stage"
Test[Unit Tests]
Lint[Code Linting]
Build[Application Build]
Package[Container Build]
end
subgraph "Security Stage"
SAST[Static Analysis]
Deps[Dependency Scan]
Secrets[Secret Scan]
Container[Container Scan]
end
subgraph "Quality Stage"
Coverage[Code Coverage]
SonarQube[Quality Gate]
Performance[Performance Tests]
end
end
subgraph "Artifact Management"
Registry[Container Registry]
Artifacts[Artifact Store]
Signing[Image Signing]
end
subgraph "CD Pipeline"
subgraph "Deployment Stages"
Dev[Development]
Integration[Integration Tests]
Staging[Staging Deploy]
E2E[E2E Tests]
Production[Production Deploy]
end
subgraph "Deployment Strategies"
BlueGreen[Blue-Green]
Canary[Canary]
Rolling[Rolling Update]
end
end
subgraph "Observability"
Logs[Centralized Logging]
Metrics[Metrics Collection]
Traces[Distributed Tracing]
Alerts[Alert Management]
end
Git --> Trigger
PR --> Trigger
Trigger --> Checkout
Checkout --> Test
Checkout --> Lint
Test --> Build
Lint --> Build
Build --> Package
Package --> SAST
Package --> Deps
Package --> Secrets
Package --> Container
SAST --> Coverage
Deps --> Coverage
Secrets --> Coverage
Container --> Coverage
Coverage --> SonarQube
SonarQube --> Performance
Performance --> Registry
Performance --> Artifacts
Registry --> Signing
Signing --> Dev
Dev --> Integration
Integration --> Staging
Staging --> E2E
E2E --> Production
Production --> BlueGreen
Production --> Canary
Production --> Rolling
Dev --> Logs
Staging --> Logs
Production --> Logs
Logs --> Metrics
Metrics --> Traces
Traces --> Alerts
classDef source fill:#f9f9f9
classDef ci fill:#e3f2fd
classDef security fill:#ffebee
classDef cd fill:#e8f5e8
classDef observe fill:#fce4ec
class Git,PR,Main source
class Test,Lint,Build,Package,SAST,Deps,Secrets,Container,Coverage,SonarQube,Performance ci
class SAST,Deps,Secrets,Container security
class Dev,Integration,Staging,E2E,Production,BlueGreen,Canary,Rolling cd
class Logs,Metrics,Traces,Alerts observe
GitHub Actions Pipeline
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
# Static analysis and testing
test:
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [18, 20]
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for SonarQube
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Run linting
run: npm run lint
- name: Run type checking
run: npm run type-check
- name: Run unit tests
run: npm run test:coverage
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
with:
file: ./coverage/lcov.info
flags: unittests
name: codecov-umbrella
- name: SonarQube Scan
uses: SonarSource/sonarqube-scan-action@master
env:
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
# Security scanning
security:
runs-on: ubuntu-latest
needs: test
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
scan-type: "fs"
scan-ref: "."
format: "sarif"
output: "trivy-results.sarif"
- name: Upload Trivy scan results
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: "trivy-results.sarif"
- name: Dependency Review
uses: actions/dependency-review-action@v3
- name: Secret Scan
uses: trufflesecurity/trufflehog@main
with:
path: ./
base: main
head: HEAD
# Build and push container image
build:
runs-on: ubuntu-latest
needs: [test, security]
if: github.ref == 'refs/heads/main'
outputs:
image: ${{ steps.image.outputs.image }}
digest: ${{ steps.build.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=sha,prefix=sha-
- name: Build and push
id: build
uses: docker/build-push-action@v5
with:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
provenance: true
sbom: true
- name: Set image output
id: image
run: echo "image=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${{ github.sha }}" >> $GITHUB_OUTPUT
- name: Install Cosign
uses: sigstore/cosign-installer@v3
- name: Sign container image
run: |
cosign sign --yes ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ steps.build.outputs.digest }}
# Deploy to development
deploy-dev:
runs-on: ubuntu-latest
needs: build
environment: development
steps:
- name: Checkout GitOps repo
uses: actions/checkout@v4
with:
repository: company/k8s-configs
token: ${{ secrets.GITOPS_TOKEN }}
path: gitops
- name: Update development image
run: |
cd gitops
yq eval '.images[0].newTag = "${{ github.sha }}"' -i applications/user-service/overlays/development/kustomization.yaml
git config user.name "GitHub Actions"
git config user.email "actions@github.com"
git add .
git commit -m "Update user-service dev image to ${{ github.sha }}"
git push
# Integration tests
integration-test:
runs-on: ubuntu-latest
needs: deploy-dev
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Wait for deployment
run: |
timeout 300 bash -c 'until curl -f http://dev.api.company.com/health; do sleep 5; done'
- name: Run integration tests
run: npm run test:integration
env:
API_BASE_URL: http://dev.api.company.com
# Deploy to staging
deploy-staging:
runs-on: ubuntu-latest
needs: integration-test
environment: staging
steps:
- name: Checkout GitOps repo
uses: actions/checkout@v4
with:
repository: company/k8s-configs
token: ${{ secrets.GITOPS_TOKEN }}
path: gitops
- name: Update staging image
run: |
cd gitops
yq eval '.images[0].newTag = "${{ github.sha }}"' -i applications/user-service/overlays/staging/kustomization.yaml
git config user.name "GitHub Actions"
git config user.email "actions@github.com"
git add .
git commit -m "Update user-service staging image to ${{ github.sha }}"
git push
# E2E tests
e2e-test:
runs-on: ubuntu-latest
needs: deploy-staging
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Install Playwright
run: npx playwright install
- name: Wait for deployment
run: |
timeout 300 bash -c 'until curl -f http://staging.api.company.com/health; do sleep 5; done'
- name: Run E2E tests
run: npm run test:e2e
env:
API_BASE_URL: http://staging.api.company.com
- name: Upload test results
uses: actions/upload-artifact@v3
if: failure()
with:
name: playwright-report
path: playwright-report/
# Deploy to production
deploy-production:
runs-on: ubuntu-latest
needs: e2e-test
environment: production
if: github.ref == 'refs/heads/main'
steps:
- name: Checkout GitOps repo
uses: actions/checkout@v4
with:
repository: company/k8s-configs
token: ${{ secrets.GITOPS_TOKEN }}
path: gitops
- name: Update production image
run: |
cd gitops
yq eval '.images[0].newTag = "${{ github.sha }}"' -i applications/user-service/overlays/production/kustomization.yaml
git config user.name "GitHub Actions"
git config user.email "actions@github.com"
git add .
git commit -m "Deploy user-service to production: ${{ github.sha }}"
git push
- name: Create GitHub Release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tag_name: v${{ github.run_number }}
release_name: Release v${{ github.run_number }}
body: |
Automated release of user-service
- **Commit**: ${{ github.sha }}
- **Image**: ${{ needs.build.outputs.image }}
- **Digest**: ${{ needs.build.outputs.digest }}
Deployed to production via GitOps.
draft: false
prerelease: false
# Smoke tests in production
smoke-test:
runs-on: ubuntu-latest
needs: deploy-production
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- name: Install dependencies
run: npm ci
- name: Wait for deployment
run: |
timeout 600 bash -c 'until curl -f https://api.company.com/health; do sleep 10; done'
- name: Run smoke tests
run: npm run test:smoke
env:
API_BASE_URL: https://api.company.com
- name: Notify success
if: success()
uses: 8398a7/action-slack@v3
with:
status: success
text: "✅ User service v${{ github.run_number }} deployed successfully to production!"
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
- name: Notify failure
if: failure()
uses: 8398a7/action-slack@v3
with:
status: failure
text: "❌ User service v${{ github.run_number }} deployment failed in production smoke tests!"
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
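The pipeline above delegates the actual checks to npm scripts such as test:smoke. As a rough sketch of what such a script might look like (the endpoint, timings, and exit behavior are assumptions, not part of the workflow definition):
// Hypothetical smoke test: poll the health endpoint until it responds, then exit 0/1.
const BASE_URL = process.env.API_BASE_URL ?? "https://api.company.com";

async function waitForHealthy(timeoutMs = 60_000, intervalMs = 5_000): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetch(`${BASE_URL}/health`);
      if (res.ok) return; // service reports healthy
    } catch {
      // network error: keep retrying until the deadline
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`service did not become healthy within ${timeoutMs}ms`);
}

waitForHealthy()
  .then(() => console.log("smoke test passed"))
  .catch(err => {
    console.error(err.message);
    process.exit(1);
  });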
Multi-Environment Strategies
Effective multi-environment strategies ensure consistent deployment processes while maintaining appropriate isolation and security boundaries between different stages of the software lifecycle.
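The pattern underneath all of the Kustomize files below is layering: a base configuration plus a small per-environment overlay that overrides only what differs. The same idea in TypeScript, with illustrative values:
// Base + overlay configuration: the overlay only specifies what differs per environment.
interface AppConfig {
  logLevel: "debug" | "info" | "warn";
  databasePoolSize: number;
  rateLimitEnabled: boolean;
}

const base: AppConfig = { logLevel: "info", databasePoolSize: 10, rateLimitEnabled: true };

const overlays: Record<string, Partial<AppConfig>> = {
  development: { logLevel: "debug", databasePoolSize: 5, rateLimitEnabled: false },
  production: { logLevel: "warn", databasePoolSize: 20 },
};

function configFor(env: string): AppConfig {
  return { ...base, ...(overlays[env] ?? {}) }; // overlay values win over base values
}

console.log(configFor("production"));
// { logLevel: 'warn', databasePoolSize: 20, rateLimitEnabled: true }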
Environment Architecture
graph TB
subgraph "Development Environments"
subgraph "Developer Workspaces"
Local[Local Development]
DevPod[Development Pods]
end
subgraph "Shared Development"
DevShared[Shared Dev Environment]
Feature[Feature Branches]
end
end
subgraph "Testing Environments"
subgraph "Automated Testing"
Integration[Integration Tests]
Performance[Performance Tests]
Security[Security Tests]
end
subgraph "Manual Testing"
QA[QA Environment]
UAT[User Acceptance Testing]
end
end
subgraph "Pre-Production"
Staging[Staging Environment]
LoadTest[Load Testing]
Rehearsal[Deployment Rehearsal]
end
subgraph "Production"
subgraph "Production Deployment"
Blue[Blue Environment]
Green[Green Environment]
end
subgraph "Production Traffic"
Canary[Canary Deployment]
MainTraffic[Main Traffic]
end
end
subgraph "Configuration Management"
ConfigRepo[Configuration Repository]
Secrets[Secret Management]
FeatureFlags[Feature Flags]
end
subgraph "Observability"
Monitoring[Monitoring Stack]
Logging[Centralized Logging]
Alerting[Alert Management]
end
Local --> DevShared
DevPod --> DevShared
DevShared --> Feature
Feature --> Integration
Integration --> Performance
Performance --> Security
Security --> QA
QA --> UAT
UAT --> Staging
Staging --> LoadTest
LoadTest --> Rehearsal
Rehearsal --> Blue
Rehearsal --> Green
Blue --> Canary
Green --> Canary
Canary --> MainTraffic
ConfigRepo --> DevShared
ConfigRepo --> QA
ConfigRepo --> Staging
ConfigRepo --> Blue
ConfigRepo --> Green
Secrets --> QA
Secrets --> Staging
Secrets --> Blue
Secrets --> Green
FeatureFlags --> QA
FeatureFlags --> Staging
FeatureFlags --> Blue
FeatureFlags --> Green
DevShared --> Monitoring
QA --> Monitoring
Staging --> Monitoring
Blue --> Monitoring
Green --> Monitoring
Monitoring --> Logging
Logging --> Alerting
classDef dev fill:#e3f2fd
classDef test fill:#fff3e0
classDef preprod fill:#f3e5f5
classDef prod fill:#e8f5e8
classDef config fill:#fce4ec
class Local,DevPod,DevShared,Feature dev
class Integration,Performance,Security,QA,UAT test
class Staging,LoadTest,Rehearsal preprod
class Blue,Green,Canary,MainTraffic prod
class ConfigRepo,Secrets,FeatureFlags,Monitoring,Logging,Alerting config
Environment Configuration Strategy
# Base configuration (kustomization.yaml)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
name: user-service-base
resources:
- deployment.yaml
- service.yaml
- configmap.yaml
- ingress.yaml
commonLabels:
app: user-service
component: api
images:
- name: user-service
newName: myregistry/user-service
newTag: latest
configMapGenerator:
- name: user-service-config
files:
- config/app.properties
- config/logging.properties
secretGenerator:
- name: user-service-secrets
env: secrets/.env
---
# Development overlay (overlays/development/kustomization.yaml)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
name: user-service-development
namespace: development
resources:
- ../../base
patchesStrategicMerge:
- deployment-patch.yaml
- ingress-patch.yaml
configMapGenerator:
- name: user-service-config
behavior: merge
literals:
- ENVIRONMENT=development
- LOG_LEVEL=debug
- DATABASE_POOL_SIZE=5
- CACHE_TTL=300
- RATE_LIMIT_ENABLED=false
secretGenerator:
- name: user-service-secrets
behavior: merge
literals:
- DATABASE_URL=postgresql://dev-db:5432/userservice_dev
- REDIS_URL=redis://dev-redis:6379
- JWT_SECRET=dev-secret-key
images:
- name: user-service
newTag: development
replicas:
- name: user-service
count: 2
---
# Staging overlay (overlays/staging/kustomization.yaml)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
name: user-service-staging
namespace: staging
resources:
- ../../base
patchesStrategicMerge:
- deployment-patch.yaml
- ingress-patch.yaml
- hpa-patch.yaml
configMapGenerator:
- name: user-service-config
behavior: merge
literals:
- ENVIRONMENT=staging
- LOG_LEVEL=info
- DATABASE_POOL_SIZE=10
- CACHE_TTL=600
- RATE_LIMIT_ENABLED=true
- METRICS_ENABLED=true
secretGenerator:
- name: user-service-secrets
behavior: merge
literals:
- DATABASE_URL=postgresql://staging-db:5432/userservice_staging
- REDIS_URL=redis://staging-redis:6379
images:
- name: user-service
newTag: staging
replicas:
- name: user-service
count: 4
---
# Production overlay (overlays/production/kustomization.yaml)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
name: user-service-production
namespace: production
resources:
- ../../base
patchesStrategicMerge:
- deployment-patch.yaml
- ingress-patch.yaml
- hpa-patch.yaml
- pdb-patch.yaml
- networkpolicy-patch.yaml
configMapGenerator:
- name: user-service-config
behavior: merge
literals:
- ENVIRONMENT=production
- LOG_LEVEL=warn
- DATABASE_POOL_SIZE=20
- CACHE_TTL=1800
- RATE_LIMIT_ENABLED=true
- METRICS_ENABLED=true
- TRACING_ENABLED=true
- SECURITY_HEADERS_ENABLED=true
secretGenerator:
- name: user-service-secrets
behavior: merge
literals:
- DATABASE_URL=postgresql://prod-db-cluster:5432/userservice_prod
- REDIS_URL=redis://prod-redis-cluster:6379
images:
- name: user-service
newTag: production
replicas:
- name: user-service
count: 8
# Production-specific patches
patches:
- target:
kind: Deployment
name: user-service
patch: |-
- op: add
path: /spec/template/spec/containers/0/resources
value:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 2000m
memory: 4Gi
- op: add
path: /spec/template/spec/affinity
value:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- user-service
topologyKey: kubernetes.io/hostname
Database Migration Patterns
Zero-downtime database migrations are crucial for maintaining service availability during deployments. The expand-and-contract pattern provides a systematic approach to schema evolution.
Expand-and-Contract Migration Flow
graph TB
subgraph "Phase 1: Expand"
subgraph "Database Schema"
OldSchema[Old Schema v1]
NewColumns[Add New Columns]
NewTables[Add New Tables]
NewIndexes[Add New Indexes]
end
subgraph "Application"
AppV1[Application v1]
DualWrite[Dual Write Logic]
end
end
subgraph "Phase 2: Migrate"
subgraph "Data Migration"
Backfill[Backfill Data]
Validation[Data Validation]
Consistency[Consistency Checks]
end
subgraph "Application Update"
AppV2[Application v2]
ReadNew[Read New Schema]
WriteNew[Write New Schema]
end
end
subgraph "Phase 3: Contract"
subgraph "Cleanup"
RemoveOld[Remove Old Columns]
DropTables[Drop Old Tables]
CleanupCode[Remove Migration Code]
end
subgraph "Final State"
FinalSchema[Final Schema v2]
AppV3[Application v3]
end
end
OldSchema --> NewColumns
NewColumns --> NewTables
NewTables --> NewIndexes
AppV1 --> DualWrite
DualWrite --> Backfill
Backfill --> Validation
Validation --> Consistency
Consistency --> AppV2
AppV2 --> ReadNew
ReadNew --> WriteNew
WriteNew --> RemoveOld
RemoveOld --> DropTables
DropTables --> CleanupCode
CleanupCode --> FinalSchema
FinalSchema --> AppV3
classDef expand fill:#e3f2fd
classDef migrate fill:#fff3e0
classDef contract fill:#e8f5e8
class OldSchema,NewColumns,NewTables,NewIndexes,AppV1,DualWrite expand
class Backfill,Validation,Consistency,AppV2,ReadNew,WriteNew migrate
class RemoveOld,DropTables,CleanupCode,FinalSchema,AppV3 contract
Migration Implementation Example
-- Phase 1: Expand - Add new columns and tables
-- Migration 001: Add new user profile columns
ALTER TABLE users
ADD COLUMN profile_data JSONB,
ADD COLUMN last_login_at TIMESTAMP WITH TIME ZONE,
ADD COLUMN created_by_id UUID;
-- Create new user_profiles table for normalized data
CREATE TABLE user_profiles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
first_name VARCHAR(100),
last_name VARCHAR(100),
bio TEXT,
avatar_url VARCHAR(500),
preferences JSONB DEFAULT '{}',
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
CONSTRAINT unique_user_profile UNIQUE(user_id)
);
-- Add indexes for performance
CREATE INDEX idx_user_profiles_user_id ON user_profiles(user_id);
CREATE INDEX idx_users_last_login_at ON users(last_login_at);
CREATE INDEX idx_users_profile_data_gin ON users USING GIN(profile_data);
-- Phase 2: Migrate - Backfill data and update application logic
-- Data migration script (run in batches)
DO $$
DECLARE
batch_size INT := 1000;
user_record RECORD;
BEGIN
LOOP
-- Process users in batches
FOR user_record IN
SELECT id, email, full_name, bio, avatar_url
FROM users
WHERE profile_data IS NULL
ORDER BY id
LIMIT batch_size -- no OFFSET: already-migrated rows drop out of the WHERE filter
LOOP
-- Migrate data to new structure
UPDATE users SET
profile_data = jsonb_build_object(
'full_name', user_record.full_name,
'bio', user_record.bio,
'avatar_url', user_record.avatar_url,
'migrated_at', NOW()
),
last_login_at = COALESCE(last_login_at, created_at)
WHERE id = user_record.id;
-- Create normalized profile record
INSERT INTO user_profiles (user_id, first_name, last_name, bio, avatar_url)
SELECT
user_record.id,
split_part(user_record.full_name, ' ', 1),
split_part(user_record.full_name, ' ', 2),
user_record.bio,
user_record.avatar_url
ON CONFLICT (user_id) DO NOTHING;
END LOOP;
-- Check if we processed all records
IF NOT FOUND THEN
EXIT;
END IF;
-- Add delay to avoid overwhelming the database
PERFORM pg_sleep(0.1);
END LOOP;
END $$;
-- Phase 3: Contract - Remove old columns and cleanup
-- Migration 003: Remove old columns (after application deployment)
ALTER TABLE users
DROP COLUMN full_name,
DROP COLUMN bio,
DROP COLUMN avatar_url;
-- Add constraints that were deferred
ALTER TABLE user_profiles
ADD CONSTRAINT check_names_not_empty
CHECK (length(trim(first_name)) > 0 AND length(trim(last_name)) > 0);
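The batched backfill above can also be driven from the application side, walking the table with an id cursor so repeated runs stay cheap and resumable. A sketch, assuming a generic db.query client; table and column names follow the SQL example:
// Batched backfill sketch: walk the table by id cursor, pausing briefly between batches.
interface Db {
  query(sql: string, params?: unknown[]): Promise<{ rows: any[] }>;
}

async function backfillProfiles(db: Db, batchSize = 1000): Promise<void> {
  let lastId = "00000000-0000-0000-0000-000000000000"; // UUID cursor
  for (;;) {
    const { rows } = await db.query(
      `SELECT id, full_name, bio, avatar_url FROM users
       WHERE profile_data IS NULL AND id > $1
       ORDER BY id LIMIT $2`,
      [lastId, batchSize]
    );
    if (rows.length === 0) break; // nothing left to migrate

    for (const u of rows) {
      await db.query(
        `UPDATE users
         SET profile_data = jsonb_build_object(
           'full_name', $2::text, 'bio', $3::text, 'avatar_url', $4::text)
         WHERE id = $1`,
        [u.id, u.full_name, u.bio, u.avatar_url]
      );
    }
    lastId = rows[rows.length - 1].id;
    await new Promise(resolve => setTimeout(resolve, 100)); // brief pause between batches
  }
}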
Application Code Evolution
// Phase 1: Dual Write Implementation
class UserService {
async updateUserProfile(
userId: string,
profileData: UserProfile
): Promise<User> {
const transaction = await this.db.transaction();
try {
// Write to old format (backward compatibility)
await transaction.query(
`
UPDATE users SET
full_name = $2,
bio = $3,
avatar_url = $4,
updated_at = NOW()
WHERE id = $1
`,
[userId, profileData.fullName, profileData.bio, profileData.avatarUrl]
);
// Write to new format (dual write)
await transaction.query(
`
UPDATE users SET
profile_data = $2,
updated_at = NOW()
WHERE id = $1
`,
[userId, JSON.stringify(profileData)]
);
// Upsert to new normalized table
await transaction.query(
`
INSERT INTO user_profiles (user_id, first_name, last_name, bio, avatar_url, preferences)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (user_id) DO UPDATE SET
first_name = EXCLUDED.first_name,
last_name = EXCLUDED.last_name,
bio = EXCLUDED.bio,
avatar_url = EXCLUDED.avatar_url,
preferences = EXCLUDED.preferences,
updated_at = NOW()
`,
[
userId,
profileData.firstName,
profileData.lastName,
profileData.bio,
profileData.avatarUrl,
JSON.stringify(profileData.preferences),
]
);
await transaction.commit();
return this.getUserById(userId);
} catch (error) {
await transaction.rollback();
throw error;
}
}
// Phase 2: Read from new format, fallback to old
async getUserProfile(userId: string): Promise<UserProfile | null> {
const user = await this.db.query(
`
SELECT
u.id,
u.email,
u.profile_data,
u.full_name, -- Fallback for unmigrated records
u.bio, -- Fallback for unmigrated records
u.avatar_url, -- Fallback for unmigrated records
up.first_name,
up.last_name,
up.bio as profile_bio,
up.avatar_url as profile_avatar,
up.preferences
FROM users u
LEFT JOIN user_profiles up ON u.id = up.user_id
WHERE u.id = $1
`,
[userId]
);
if (!user) return null;
// Prefer new format, fallback to old format
if (user.profile_data) {
const profileData = JSON.parse(user.profile_data);
return {
firstName: user.first_name || profileData.first_name,
lastName: user.last_name || profileData.last_name,
bio: user.profile_bio || profileData.bio,
avatarUrl: user.profile_avatar || profileData.avatar_url,
preferences: user.preferences || profileData.preferences || {},
};
} else {
// Fallback to old format
return {
firstName: user.full_name?.split(" ")[0] || "",
lastName: user.full_name?.split(" ")[1] || "",
bio: user.bio,
avatarUrl: user.avatar_url,
preferences: {},
};
}
}
// Phase 3: Clean implementation using only new schema
async updateUserProfileFinal(
userId: string,
profileData: UserProfile
): Promise<User> {
await this.db.query(
`
UPDATE user_profiles SET
first_name = $2,
last_name = $3,
bio = $4,
avatar_url = $5,
preferences = $6,
updated_at = NOW()
WHERE user_id = $1
`,
[
userId,
profileData.firstName,
profileData.lastName,
profileData.bio,
profileData.avatarUrl,
JSON.stringify(profileData.preferences),
]
);
return this.getUserById(userId);
}
}
Migration Monitoring
# Kubernetes CronJob for migration monitoring
apiVersion: batch/v1
kind: CronJob
metadata:
name: migration-monitor
namespace: production
spec:
schedule: "*/15 * * * *" # Every 15 minutes
jobTemplate:
spec:
template:
spec:
containers:
- name: migration-monitor
image: postgres:15
env:
- name: PGHOST
value: "production-db.example.com"
- name: PGUSER
valueFrom:
secretKeyRef:
name: db-credentials
key: username
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: db-credentials
key: password
- name: PGDATABASE
value: "userservice"
command:
- /bin/bash
- -c
- |
# Check migration progress
TOTAL_USERS=$(psql -t -c "SELECT COUNT(*) FROM users;")
MIGRATED_USERS=$(psql -t -c "SELECT COUNT(*) FROM users WHERE profile_data IS NOT NULL;")
PROFILE_RECORDS=$(psql -t -c "SELECT COUNT(*) FROM user_profiles;")
MIGRATION_PERCENTAGE=$(( (MIGRATED_USERS * 100) / TOTAL_USERS ))
echo "Migration Progress Report:"
echo "Total Users: $TOTAL_USERS"
echo "Migrated Users: $MIGRATED_USERS"
echo "Profile Records: $PROFILE_RECORDS"
echo "Migration Percentage: $MIGRATION_PERCENTAGE%"
# Check for data consistency
INCONSISTENT_RECORDS=$(psql -t -c "
SELECT COUNT(*) FROM users u
LEFT JOIN user_profiles up ON u.id = up.user_id
WHERE u.profile_data IS NOT NULL AND up.user_id IS NULL;
")
if [ "$INCONSISTENT_RECORDS" -gt 0 ]; then
echo "WARNING: Found $INCONSISTENT_RECORDS inconsistent records!"
# Send alert to monitoring system
curl -X POST "$SLACK_WEBHOOK" -d "{\"text\": \"Migration inconsistency detected: $INCONSISTENT_RECORDS records\"}"
fi
# Export metrics to Prometheus
cat << EOF > /tmp/migration-metrics.prom
# HELP user_migration_total Total number of users
# TYPE user_migration_total gauge
user_migration_total $TOTAL_USERS
# HELP user_migration_completed Number of migrated users
# TYPE user_migration_completed gauge
user_migration_completed $MIGRATED_USERS
# HELP user_migration_percentage Percentage of migration completion
# TYPE user_migration_percentage gauge
user_migration_percentage $MIGRATION_PERCENTAGE
# HELP user_migration_inconsistent Number of inconsistent records
# TYPE user_migration_inconsistent gauge
user_migration_inconsistent $INCONSISTENT_RECORDS
EOF
# Push metrics to Pushgateway
curl -X POST "http://pushgateway.monitoring:9091/metrics/job/migration-monitor" \
--data-binary @/tmp/migration-metrics.prom
restartPolicy: OnFailure
Conclusion
Modern deployment and release patterns have evolved to address the complex requirements of cloud-native applications: zero downtime, rapid iteration, risk mitigation, and operational simplicity. The strategies outlined in this guide provide a comprehensive toolkit for implementing robust deployment pipelines.
Key Takeaways
- Choose the Right Pattern: Blue-Green for instant rollback, Canary for risk mitigation, Rolling Updates for resource efficiency
- Embrace GitOps: Declarative configurations provide audit trails, consistency, and automated reconciliation
- Implement Progressive Delivery: Feature flags and gradual rollouts reduce blast radius and enable data-driven decisions
- Prioritize Observability: Comprehensive monitoring, logging, and alerting are essential for confident deployments
- Automate Everything: From testing to deployment to rollback, automation reduces human error and accelerates delivery
- Plan for Data: Database migrations require careful planning and execution to maintain zero downtime
Future Trends
As we move forward, emerging patterns like service mesh-native deployments, AI-driven canary analysis, and platform engineering approaches will further enhance deployment capabilities. The foundation built with these proven patterns will enable teams to adopt new technologies while maintaining reliability and operational excellence.
The journey to mastering deployment patterns is ongoing, but with these tools and techniques, teams can build resilient, scalable systems that deliver value to users while maintaining the agility needed in today’s competitive landscape.
This guide provides practical, production-ready examples for implementing modern deployment patterns. Remember to adapt these patterns to your specific requirements, constraints, and organizational context.