Building a Secure Service Mesh with SPIFFE/SPIRE - Complete Implementation Guide
In the era of microservices and distributed systems, securing service-to-service communication has become paramount. This guide provides a comprehensive implementation of a secure service mesh using SPIFFE (Secure Production Identity Framework For Everyone) and SPIRE (SPIFFE Runtime Environment), complete with detailed architecture diagrams and production-ready configurations.
Service Mesh Architecture Overview
A service mesh provides a dedicated infrastructure layer for managing service-to-service communication. When combined with SPIFFE/SPIRE, it creates a zero-trust security model where every workload has a cryptographically verifiable identity.
```mermaid
graph TB
    subgraph "Service Mesh Architecture"
        subgraph "Control Plane"
            SPIRE_Server[SPIRE Server]
            Policy_Engine[Policy Engine]
            Config_Manager[Configuration Manager]
            Cert_Authority[Certificate Authority]
            Telemetry[Telemetry Collector]
        end
        subgraph "Data Plane"
            subgraph "Service A"
                A_App[Application A]
                A_Proxy[Envoy Proxy]
                A_Agent[SPIRE Agent]
                A_Workload[Workload API]
            end
            subgraph "Service B"
                B_App[Application B]
                B_Proxy[Envoy Proxy]
                B_Agent[SPIRE Agent]
                B_Workload[Workload API]
            end
            subgraph "Service C"
                C_App[Application C]
                C_Proxy[Envoy Proxy]
                C_Agent[SPIRE Agent]
                C_Workload[Workload API]
            end
        end
        subgraph "Infrastructure"
            K8s[Kubernetes API]
            Registry[Service Registry]
            KV_Store[Key-Value Store]
        end
    end

    SPIRE_Server --> Cert_Authority
    SPIRE_Server --> Policy_Engine
    SPIRE_Server --> KV_Store
    A_Agent --> SPIRE_Server
    B_Agent --> SPIRE_Server
    C_Agent --> SPIRE_Server
    A_Agent --> A_Workload
    B_Agent --> B_Workload
    C_Agent --> C_Workload
    A_App --> A_Proxy
    B_App --> B_Proxy
    C_App --> C_Proxy
    A_Proxy -.-> B_Proxy
    B_Proxy -.-> C_Proxy
    A_Proxy -.-> C_Proxy
    Config_Manager --> Registry
    Registry --> K8s
    Telemetry --> A_Proxy
    Telemetry --> B_Proxy
    Telemetry --> C_Proxy

    style SPIRE_Server fill:#f96,stroke:#333,stroke-width:4px
    style A_Proxy fill:#9f9,stroke:#333,stroke-width:2px
    style B_Proxy fill:#9f9,stroke:#333,stroke-width:2px
    style C_Proxy fill:#9f9,stroke:#333,stroke-width:2px
```
Key Components
- SPIRE Server: Central authority for workload attestation and SVID issuance
- SPIRE Agent: Node-level component that attests workloads and manages SVIDs
- Envoy Proxy: Data plane proxy handling mTLS and traffic management
- Workload API: Unix domain socket for workload-to-SPIRE communication
- Policy Engine: Centralized policy management and enforcement
SPIFFE/SPIRE Identity Flow
Understanding how SPIFFE identities (SVIDs) are created, distributed, and verified is crucial for implementing a secure service mesh.
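Every participant in this flow is named by a SPIFFE ID (trust domain plus workload path). As a rough illustration of that structure, here is a minimal Python sketch; it checks only a few of the spec's validity rules, and real deployments should use an official SPIFFE library rather than hand-rolled parsing:

```python
from urllib.parse import urlparse


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, workload_path).

    Illustrative sketch only: a conformant validator enforces many more
    rules (allowed characters, no trailing slash, path segment rules).
    """
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError("SPIFFE IDs must use the spiffe:// scheme")
    if not parsed.netloc:
        raise ValueError("a SPIFFE ID must name a trust domain")
    if "@" in parsed.netloc or ":" in parsed.netloc:
        # Trust domains carry no userinfo or port component
        raise ValueError("trust domain must not contain userinfo or a port")
    return parsed.netloc, parsed.path


td, path = parse_spiffe_id(
    "spiffe://production.company.com/ns/payments/sa/processor")
# td == "production.company.com", path == "/ns/payments/sa/processor"
```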
```mermaid
sequenceDiagram
    participant W as Workload
    participant A as SPIRE Agent
    participant S as SPIRE Server
    participant CA as Certificate Authority
    participant R as Registration API

    Note over W,CA: Initial Workload Registration
    R->>S: Register workload entry
    S->>S: Store registration

    Note over W,CA: Workload Attestation & SVID Issuance
    W->>A: Connect to Workload API
    A->>A: Perform workload attestation
    A->>A: Verify workload selectors
    A->>S: Request SVID for workload
    S->>S: Verify agent identity
    S->>S: Check workload registration
    S->>CA: Generate key pair & CSR
    CA->>CA: Sign certificate
    CA->>S: Return X.509 SVID
    S->>A: Send SVID bundle
    A->>W: Provide SVID via Workload API
    W->>W: Configure TLS with SVID

    Note over W,CA: SVID Rotation
    loop Every 30 minutes
        A->>S: Check SVID expiration
        alt SVID expiring soon
            A->>S: Request SVID renewal
            S->>CA: Generate new SVID
            CA->>S: Return new SVID
            S->>A: Send updated SVID
            A->>W: Hot-reload new SVID
        end
    end

    Note over W,CA: Service-to-Service Communication
    W->>W: Initiate TLS connection
    W->>W: Present SVID
    W->>A: Validate peer SVID
    A->>A: Check trust bundle
    A->>W: Validation result
```
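The rotation loop above renews an SVID before it expires. A hedged sketch of the renewal decision in Python (the half-TTL threshold is an assumption for illustration; SPIRE's actual rotation policy is internal to the agent and configurable):

```python
from datetime import datetime, timedelta, timezone


def should_rotate(not_after: datetime, ttl: timedelta, now: datetime) -> bool:
    """Renew once less than half of the SVID's original TTL remains."""
    remaining = not_after - now
    return remaining < ttl / 2


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(hours=1)
# Expires at 12:30 -> exactly half the TTL left, not yet due for renewal
print(should_rotate(now + timedelta(minutes=30), ttl, now))  # False
# Expires at 12:20 -> 20 min left, below the half-TTL threshold: rotate
print(should_rotate(now + timedelta(minutes=20), ttl, now))  # True
```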
SPIFFE Identity Structure
```text
# SPIFFE ID Format
spiffe://trust-domain/path/to/workload

# Example Identities
spiffe://production.company.com/ns/default/sa/frontend
spiffe://production.company.com/ns/payments/sa/processor
spiffe://production.company.com/region/us-east/service/api-gateway
```
Network Policy Enforcement Flow
The service mesh enforces network policies at multiple levels, providing defense in depth:
```mermaid
graph TB
    subgraph "Policy Enforcement Layers"
        subgraph "Layer 1: Network Policies"
            NP_Ingress[Ingress Rules]
            NP_Egress[Egress Rules]
            NP_CIDR[CIDR Blocks]
        end
        subgraph "Layer 2: Service Mesh Policies"
            SM_Auth[Authentication Policy]
            SM_Authz[Authorization Policy]
            SM_Traffic[Traffic Policy]
        end
        subgraph "Layer 3: Application Policies"
            APP_RBAC[RBAC Rules]
            APP_Custom[Custom Logic]
            APP_Rate[Rate Limiting]
        end
    end

    subgraph "Enforcement Points"
        subgraph "Network Level"
            CNI[CNI Plugin]
            IPTables[iptables/nftables]
            eBPF[eBPF Programs]
        end
        subgraph "Proxy Level"
            Envoy[Envoy Proxy]
            WASM[WASM Filters]
            Lua[Lua Scripts]
        end
        subgraph "Application Level"
            SDK[Service Mesh SDK]
            Middleware[Middleware]
            Interceptors[gRPC Interceptors]
        end
    end

    NP_Ingress --> CNI
    NP_Egress --> IPTables
    NP_CIDR --> eBPF
    SM_Auth --> Envoy
    SM_Authz --> WASM
    SM_Traffic --> Lua
    APP_RBAC --> SDK
    APP_Custom --> Middleware
    APP_Rate --> Interceptors

    style SM_Auth fill:#f96,stroke:#333,stroke-width:2px
    style Envoy fill:#9f9,stroke:#333,stroke-width:2px
```
Policy Decision Flow
```mermaid
sequenceDiagram
    participant Client
    participant Envoy as Envoy Proxy
    participant OPA as Open Policy Agent
    participant SPIRE as SPIRE Agent
    participant Service

    Client->>Envoy: HTTPS Request with SVID
    Envoy->>Envoy: Validate TLS/SVID
    Envoy->>SPIRE: Verify SVID
    SPIRE->>Envoy: SVID Valid
    Envoy->>Envoy: Extract request context
    Note over Envoy: Method, Path, Headers, SPIFFE ID
    Envoy->>OPA: Authorization check
    Note over OPA: {
    Note over OPA: "subject": "spiffe://...",
    Note over OPA: "resource": "/api/users",
    Note over OPA: "action": "GET"
    Note over OPA: }
    OPA->>OPA: Evaluate policies
    OPA->>Envoy: Decision (Allow/Deny)
    alt Allowed
        Envoy->>Service: Forward request
        Service->>Envoy: Response
        Envoy->>Client: Response
    else Denied
        Envoy->>Client: 403 Forbidden
    end
```
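The decision handed to OPA in this flow is a default-deny match on (subject, resource, action). A toy Python model of that check, for intuition only; the rule format here is invented for illustration, and real policies live in Rego:

```python
# Hypothetical rule table; in production these rules come from Rego policies.
RULES = [
    {"subject": "spiffe://production.company.com/ns/production/sa/frontend",
     "resource": "/api/users", "action": "GET"},
]


def authorize(subject: str, resource: str, action: str) -> bool:
    """Default deny: allow only if an explicit rule matches all three fields."""
    return any(
        r["subject"] == subject
        and r["resource"] == resource
        and r["action"] == action
        for r in RULES
    )


print(authorize("spiffe://production.company.com/ns/production/sa/frontend",
                "/api/users", "GET"))     # True  -> forward request
print(authorize("spiffe://production.company.com/ns/production/sa/frontend",
                "/api/users", "DELETE"))  # False -> 403 Forbidden
```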
Implementation Guide
Prerequisites
Before implementing the secure service mesh, ensure you have:
- Kubernetes cluster (1.19+)
- Helm 3.x installed
- kubectl configured
- Storage class for persistent volumes
- Load balancer or ingress controller
Step 1: Install SPIRE
```bash
# Add SPIRE Helm repository
helm repo add spiffe https://spiffe.github.io/helm-charts
helm repo update

# Create SPIRE namespace
kubectl create namespace spire

# Install SPIRE with custom values
cat > spire-values.yaml << EOF
spire-server:
  image:
    tag: 1.8.0
  controllerManager:
    enabled: true
  notifier:
    k8sbundle:
      enabled: true
  dataStore:
    sql:
      databaseType: postgres
      connectionString: "postgresql://spire:password@postgres:5432/spire"
  trustDomain: production.company.com
  ca_subject:
    country: US
    organization: Company
    common_name: SPIRE CA
  persistence:
    enabled: true
    size: 10Gi
  nodeAttestor:
    k8sPsat:
      enabled: true

spire-agent:
  image:
    tag: 1.8.0
  workloadAttestors:
    k8s:
      enabled: true
    unix:
      enabled: true
  sockets:
    admin:
      enabled: true
EOF

helm install spire spiffe/spire \
  --namespace spire \
  --values spire-values.yaml
```
Step 2: Deploy Service Mesh Control Plane
```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  values:
    pilot:
      env:
        PILOT_ENABLE_WORKLOAD_ENTRY_AUTOREGISTRATION: true
        PILOT_ENABLE_CROSS_CLUSTER_WORKLOAD_ENTRY: true
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              disable_host_header_fallback: true
            outboundSidecar:
              disable_host_header_fallback: true
  meshConfig:
    defaultConfig:
      proxyStatsMatcher:
        inclusionRegexps:
        - ".*outlier_detection.*"
        - ".*circuit_breakers.*"
        - ".*upstream_rq_retry.*"
        - ".*upstream_rq_pending.*"
    trustDomain: production.company.com
    extensionProviders:
    - name: spire
      envoyExtAuthzGrpc:
        service: spire-server.spire.svc.cluster.local
        port: 8081
    defaultProviders:
      accessLogging:
      - otel
```
Step 3: Configure Workload Registration
```mermaid
graph LR
    subgraph "Registration Flow"
        K8s[Kubernetes Controller]
        Reg[Registration Controller]
        SPIRE[SPIRE Server]
        DB[(Registration DB)]
    end

    K8s -->|Watch Events| Reg
    Reg -->|Create Entry| SPIRE
    SPIRE -->|Store| DB

    style Reg fill:#9f9,stroke:#333,stroke-width:2px
```
```yaml
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: default-workloads
spec:
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      spiffe.io/enabled: "true"
  workloadSelectorTemplates:
  - "k8s:ns:{{ .PodMeta.Namespace }}"
  - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
  - "k8s:pod-name:{{ .PodMeta.Name }}"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: frontend
  namespace: production
  labels:
    spiffe.io/enabled: "true"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
        spiffe.io/enabled: "true"
    spec:
      serviceAccountName: frontend
      containers:
      - name: app
        image: frontend:latest
        env:
        - name: SPIFFE_ENDPOINT_SOCKET
          value: unix:///spiffe-workload-api/spire-agent.sock
        volumeMounts:
        - name: spiffe-workload-api
          mountPath: /spiffe-workload-api
          readOnly: true
      - name: envoy
        image: envoyproxy/envoy:v1.28-latest
        args:
        - -c
        - /etc/envoy/envoy.yaml
        volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy
        - name: spiffe-workload-api
          mountPath: /spiffe-workload-api
          readOnly: true
      volumes:
      - name: spiffe-workload-api
        csi:
          driver: "csi.spiffe.io"
          readOnly: true
      - name: envoy-config
        configMap:
          name: envoy-config
```
Step 4: Implement mTLS Configuration
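The Envoy configuration in the next step repeats an identical SDS stanza for every TLS certificate and validation context. If you generate envoy.yaml rather than hand-writing it, a small helper avoids that duplication. This is an illustrative sketch; the `spire_agent` cluster name is an assumption matching the cluster defined in the config below:

```python
def sds_config_stanza(name: str, cluster: str = "spire_agent") -> dict:
    """Build one SDS secret config pointing at the SPIRE Agent's SDS API."""
    return {
        "name": name,
        "sds_config": {
            "resource_api_version": "V3",
            "api_config_source": {
                "api_type": "GRPC",
                "transport_api_version": "V3",
                "grpc_services": [{"envoy_grpc": {"cluster_name": cluster}}],
            },
        },
    }


# One stanza for the workload's own SVID, one for the trust bundle
cert = sds_config_stanza("spiffe://production.company.com/ns/production/sa/frontend")
bundle = sds_config_stanza("spiffe://production.company.com")
```

Serializing these dicts with a YAML library yields the stanzas shown in the ConfigMap below.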
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-config
  namespace: production
data:
  envoy.yaml: |
    node:
      id: frontend
      cluster: frontend-cluster

    static_resources:
      listeners:
      - name: ingress
        address:
          socket_address:
            address: 0.0.0.0
            port_value: 8080
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: backend
                  domains: ["*"]
                  routes:
                  - match:
                      prefix: "/"
                    route:
                      cluster: backend_cluster
              http_filters:
              - name: envoy.filters.http.ext_authz
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
                  grpc_service:
                    envoy_grpc:
                      cluster_name: opa_cluster
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              common_tls_context:
                tls_certificate_sds_secret_configs:
                - name: "spiffe://production.company.com/ns/production/sa/frontend"
                  sds_config:
                    resource_api_version: V3
                    api_config_source:
                      api_type: GRPC
                      transport_api_version: V3
                      grpc_services:
                      - envoy_grpc:
                          cluster_name: spire_agent
                validation_context_sds_secret_config:
                  name: "spiffe://production.company.com"
                  sds_config:
                    resource_api_version: V3
                    api_config_source:
                      api_type: GRPC
                      transport_api_version: V3
                      grpc_services:
                      - envoy_grpc:
                          cluster_name: spire_agent

      clusters:
      - name: backend_cluster
        connect_timeout: 30s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        load_assignment:
          cluster_name: backend_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: backend-service
                    port_value: 8080
        transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
            common_tls_context:
              tls_certificate_sds_secret_configs:
              - name: "spiffe://production.company.com/ns/production/sa/frontend"
                sds_config:
                  resource_api_version: V3
                  api_config_source:
                    api_type: GRPC
                    transport_api_version: V3
                    grpc_services:
                    - envoy_grpc:
                        cluster_name: spire_agent
              validation_context_sds_secret_config:
                name: "spiffe://production.company.com"
                sds_config:
                  resource_api_version: V3
                  api_config_source:
                    api_type: GRPC
                    transport_api_version: V3
                    grpc_services:
                    - envoy_grpc:
                        cluster_name: spire_agent

      - name: spire_agent
        connect_timeout: 1s
        type: STATIC
        lb_policy: ROUND_ROBIN
        load_assignment:
          cluster_name: spire_agent
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  pipe:
                    path: /spiffe-workload-api/spire-agent.sock
```
Step 5: Deploy Policy Engine
```mermaid
graph TB
    subgraph "Policy Architecture"
        subgraph "Policy Sources"
            Git[Git Repository]
            API[Policy API]
            ConfigMap[K8s ConfigMap]
        end
        subgraph "Policy Engine"
            OPA[Open Policy Agent]
            Bundle[Bundle Server]
            Cache[Policy Cache]
        end
        subgraph "Enforcement Points"
            Envoy1[Envoy Proxy 1]
            Envoy2[Envoy Proxy 2]
            Envoy3[Envoy Proxy 3]
        end
    end

    Git --> Bundle
    API --> Bundle
    ConfigMap --> OPA
    Bundle --> Cache
    Cache --> OPA
    Envoy1 --> OPA
    Envoy2 --> OPA
    Envoy3 --> OPA

    style OPA fill:#f96,stroke:#333,stroke-width:2px
```
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: opa-policy
  namespace: production
data:
  policy.rego: |
    package envoy.authz

    import input.attributes.request.http as http_request
    import input.attributes.source.address as source_address

    default allow = false

    # Extract SPIFFE ID from certificate
    spiffe_id = id {
      [_, id] := split(http_request.headers["x-forwarded-client-cert"], "URI=")
    }

    # Allow health checks
    allow {
      http_request.path == "/health"
    }

    # Service-to-service authorization rules
    allow {
      http_request.method == "GET"
      http_request.path == "/api/users"
      spiffe_id == "spiffe://production.company.com/ns/production/sa/frontend"
    }

    allow {
      http_request.method == "POST"
      http_request.path == "/api/orders"
      spiffe_id == "spiffe://production.company.com/ns/production/sa/order-service"
    }

    # Rate limiting rules
    rate_limit[decision] {
      service := split(spiffe_id, "/")[4]
      limits := {
        "frontend": 1000,
        "backend": 500,
        "database": 100
      }
      decision := {
        "allowed": true,
        "headers": {
          "X-RateLimit-Limit": limits[service]
        }
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opa
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: opa
  template:
    metadata:
      labels:
        app: opa
    spec:
      containers:
      - name: opa
        image: openpolicyagent/opa:0.59.0-envoy
        ports:
        - containerPort: 9191
        args:
        - "run"
        - "--server"
        - "--config-file=/config/config.yaml"
        - "/policies"
        volumeMounts:
        - name: opa-policy
          mountPath: /policies
        - name: opa-config
          mountPath: /config
        livenessProbe:
          httpGet:
            path: /health
            port: 8181
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /health?bundle=true
            port: 8181
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: opa-policy
        configMap:
          name: opa-policy
      - name: opa-config
        configMap:
          name: opa-config
```
Advanced Security Features
Zero Trust Network Architecture
```mermaid
graph TB
    subgraph "Zero Trust Principles"
        subgraph "Never Trust"
            NT1[No Implicit Trust]
            NT2[Verify Every Request]
            NT3[Assume Breach]
        end
        subgraph "Always Verify"
            AV1[Identity Verification]
            AV2[Device Verification]
            AV3[Context Verification]
        end
        subgraph "Least Privilege"
            LP1[Minimal Access]
            LP2[Just-In-Time Access]
            LP3[Adaptive Access]
        end
    end

    subgraph "Implementation"
        subgraph "Identity"
            SPIFFE[SPIFFE IDs]
            mTLS[Mutual TLS]
            Tokens[JWT Tokens]
        end
        subgraph "Policy"
            RBAC[Role-Based Access]
            ABAC[Attribute-Based Access]
            Context[Contextual Policies]
        end
        subgraph "Monitoring"
            Audit[Audit Logs]
            Metrics[Security Metrics]
            Alerts[Real-time Alerts]
        end
    end

    NT1 --> SPIFFE
    NT2 --> mTLS
    NT3 --> Audit
    AV1 --> SPIFFE
    AV2 --> Context
    AV3 --> ABAC
    LP1 --> RBAC
    LP2 --> Context
    LP3 --> ABAC

    style SPIFFE fill:#f96,stroke:#333,stroke-width:2px
    style mTLS fill:#9f9,stroke:#333,stroke-width:2px
```
Secret Management Integration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-server-config
  namespace: spire
data:
  server.conf: |
    server {
      bind_address = "0.0.0.0"
      bind_port = "8081"
      trust_domain = "production.company.com"
      data_dir = "/run/spire/data"
      log_level = "INFO"

      ca_key_type = "rsa-2048"
      ca_ttl = "24h"

      jwt_issuer = "https://spire.production.company.com"
    }

    plugins {
      DataStore "sql" {
        plugin_data {
          database_type = "postgres"
          connection_string = "${SPIRE_DB_CONNECTION_STRING}"
        }
      }

      KeyManager "disk" {
        plugin_data {
          keys_path = "/run/spire/data/keys.json"
        }
      }

      UpstreamAuthority "vault" {
        plugin_data {
          vault_addr = "https://vault.production.company.com"
          pki_mount_point = "pki/spire"
          ca_cert_path = "/run/secrets/vault-ca.crt"

          token_auth {
            token = "${VAULT_TOKEN}"
          }

          # Or use AppRole auth
          # approle_auth {
          #   approle_id = "${VAULT_APPROLE_ID}"
          #   approle_secret_id = "${VAULT_APPROLE_SECRET_ID}"
          # }
        }
      }

      NodeAttestor "k8s_psat" {
        plugin_data {
          clusters = {
            "production" = {
              service_account_allow_list = ["spire:spire-agent"]
            }
          }
        }
      }
    }
```
Advanced Monitoring and Observability
```mermaid
graph LR
    subgraph "Data Collection"
        Envoy[Envoy Metrics]
        SPIRE[SPIRE Metrics]
        Apps[Application Metrics]
        Traces[Distributed Traces]
    end

    subgraph "Processing"
        Prometheus[Prometheus]
        Jaeger[Jaeger]
        FluentBit[Fluent Bit]
    end

    subgraph "Storage"
        TSDB[Time Series DB]
        TraceDB[Trace Storage]
        LogDB[Log Storage]
    end

    subgraph "Visualization"
        Grafana[Grafana]
        Kibana[Kibana]
        Custom[Custom Dashboards]
    end

    Envoy --> Prometheus
    SPIRE --> Prometheus
    Apps --> Prometheus
    Traces --> Jaeger
    Prometheus --> TSDB
    Jaeger --> TraceDB
    FluentBit --> LogDB
    TSDB --> Grafana
    TraceDB --> Grafana
    LogDB --> Kibana
    Grafana --> Custom
    Kibana --> Custom

    style Prometheus fill:#f96,stroke:#333,stroke-width:2px
    style Grafana fill:#9f9,stroke:#333,stroke-width:2px
```
Security Dashboard Configuration
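The first panel in the dashboard below divides the rate of TLS downstream connections by the rate of all downstream connections. The same arithmetic outside PromQL, as a quick sanity-check sketch:

```python
def mtls_adoption_pct(ssl_cx_per_sec: float, total_cx_per_sec: float) -> float:
    """Percentage of downstream connections negotiated over TLS."""
    if total_cx_per_sec == 0:
        return 0.0  # avoid division by zero when there is no traffic
    return ssl_cx_per_sec / total_cx_per_sec * 100.0


print(mtls_adoption_pct(45.0, 50.0))  # 90.0
```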
```json
{
  "dashboard": {
    "title": "Service Mesh Security Dashboard",
    "panels": [
      {
        "title": "mTLS Adoption Rate",
        "targets": [
          {
            "expr": "sum(rate(envoy_http_downstream_cx_ssl_total[5m])) / sum(rate(envoy_http_downstream_cx_total[5m])) * 100"
          }
        ]
      },
      {
        "title": "Authorization Denials",
        "targets": [
          {
            "expr": "sum(rate(envoy_http_ext_authz_denied[5m])) by (service)"
          }
        ]
      },
      {
        "title": "SVID Rotation Events",
        "targets": [
          {
            "expr": "sum(rate(spire_agent_svid_rotations_total[5m])) by (trust_domain)"
          }
        ]
      },
      {
        "title": "Policy Violations",
        "targets": [
          {
            "expr": "sum(rate(opa_decisions_total{decision=\"deny\"}[5m])) by (policy)"
          }
        ]
      }
    ]
  }
}
```
Production Deployment Considerations
High Availability Configuration
```mermaid
graph TB
    subgraph "HA Architecture"
        subgraph "Region 1"
            LB1[Load Balancer]
            SPIRE1[SPIRE Server 1]
            SPIRE2[SPIRE Server 2]
            DB1[(Primary DB)]
        end
        subgraph "Region 2"
            LB2[Load Balancer]
            SPIRE3[SPIRE Server 3]
            SPIRE4[SPIRE Server 4]
            DB2[(Replica DB)]
        end
        subgraph "Global"
            GLB[Global Load Balancer]
            GSLB[Global Service LB]
        end
    end

    GLB --> LB1
    GLB --> LB2
    LB1 --> SPIRE1
    LB1 --> SPIRE2
    LB2 --> SPIRE3
    LB2 --> SPIRE4
    SPIRE1 --> DB1
    SPIRE2 --> DB1
    SPIRE3 --> DB2
    SPIRE4 --> DB2
    DB1 -.->|Replication| DB2

    style GLB fill:#f96,stroke:#333,stroke-width:2px
    style DB1 fill:#9f9,stroke:#333,stroke-width:2px
```
Disaster Recovery Plan
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: spire-backup
  namespace: spire
spec:
  schedule: "0 */6 * * *"  # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:15-alpine
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            command:
            - /bin/sh
            - -c
            - |
              # Backup SPIRE database
              pg_dump -h postgres -U spire -d spire > /backup/spire-$(date +%Y%m%d-%H%M%S).sql

              # Backup SPIRE Server data
              kubectl exec -n spire spire-server-0 -- tar czf - /run/spire/data > /backup/spire-data-$(date +%Y%m%d-%H%M%S).tar.gz

              # Upload to S3
              aws s3 cp /backup/ s3://company-backups/spire/ --recursive

              # Cleanup old backups (keep last 30 days)
              find /backup -name "*.sql" -mtime +30 -delete
              find /backup -name "*.tar.gz" -mtime +30 -delete
            volumeMounts:
            - name: backup
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: backup-pvc
```
Performance Tuning
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-performance
  namespace: production
data:
  envoy.yaml: |
    static_resources:
      clusters:
      - name: service_cluster
        connect_timeout: 0.25s
        type: STRICT_DNS
        lb_policy: LEAST_REQUEST

        # Circuit breaker configuration
        circuit_breakers:
          thresholds:
          - priority: DEFAULT
            max_connections: 1000
            max_pending_requests: 1000
            max_requests: 1000
            max_retries: 3

        # Health checking
        health_checks:
        - timeout: 5s
          interval: 10s
          unhealthy_threshold: 2
          healthy_threshold: 2
          http_health_check:
            path: /health

        # Connection pooling
        upstream_connection_options:
          tcp_keepalive:
            keepalive_probes: 3
            keepalive_time: 10
            keepalive_interval: 5

        # HTTP/2 optimization
        typed_extension_protocol_options:
          envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
            "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
            explicit_http_config:
              http2_protocol_options:
                max_concurrent_streams: 100
                initial_stream_window_size: 65536
                initial_connection_window_size: 1048576
```
Troubleshooting Guide
Common Issues and Solutions
| Issue | Symptoms | Root Cause | Solution |
|---|---|---|---|
| SVID Not Issued | no identity issued | Workload not registered | Check workload registration and selectors |
| mTLS Handshake Failure | tls: bad certificate | Certificate validation failed | Verify trust bundle distribution |
| Policy Denial | 403 Forbidden | Authorization policy mismatch | Review OPA logs and policy rules |
| High Latency | Slow response times | Policy evaluation overhead | Optimize policy rules, enable caching |
| Memory Pressure | OOM kills | Large policy bundles | Implement policy sharding |
Debug Commands
```bash
# Check SPIRE Server health
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server healthcheck

# List registered workloads
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry show

# Debug workload attestation
kubectl exec -n production frontend-pod -- \
  /opt/spire/bin/spire-agent api fetch x509 \
  -socketPath /spiffe-workload-api/spire-agent.sock

# Check Envoy configuration
kubectl exec -n production frontend-pod -c envoy -- \
  curl -s localhost:15000/config_dump | jq .

# Validate OPA policies
kubectl exec -n production opa-pod -- \
  opa test /policies
```
Security Best Practices
Defense in Depth Strategy
```mermaid
graph TB
    subgraph "Security Layers"
        L1[Network Security]
        L2[Transport Security]
        L3[Application Security]
        L4[Data Security]
        L5[Operational Security]
    end

    subgraph "Controls"
        C1[Firewalls & Network Policies]
        C2[mTLS & Encryption]
        C3[Authentication & Authorization]
        C4[Encryption at Rest]
        C5[Audit & Monitoring]
    end

    L1 --> C1
    L2 --> C2
    L3 --> C3
    L4 --> C4
    L5 --> C5

    style L2 fill:#f96,stroke:#333,stroke-width:2px
    style C2 fill:#9f9,stroke:#333,stroke-width:2px
```
Security Checklist
- Enable mTLS for all service communication
- Implement strict workload identity verification
- Configure least-privilege authorization policies
- Enable comprehensive audit logging
- Implement rate limiting and circuit breaking
- Regular security scanning of container images
- Automated certificate rotation (< 24 hours)
- Network segmentation with policies
- Encrypted secrets management
- Regular security audits and penetration testing
Conclusion
Implementing a secure service mesh with SPIFFE/SPIRE provides a robust foundation for zero-trust security in microservices architectures. The combination of cryptographic workload identity, policy-based authorization, and comprehensive observability creates a defense-in-depth strategy that significantly enhances your security posture.
Key takeaways:
- Identity-First Security: Every workload has a cryptographically verifiable identity
- Policy as Code: Authorization rules are version-controlled and auditable
- Automated Security: Certificate rotation and policy updates happen automatically
- Observable Security: Rich metrics and logs provide security visibility
- Scalable Architecture: Designed for high availability and performance
By following this implementation guide and adapting it to your specific requirements, you can build a production-ready secure service mesh that provides both strong security guarantees and operational flexibility.