Introduction: From VMs to Kubernetes-Native Zero Trust
In my previous post on building a secure service mesh without Kubernetes, I demonstrated how to implement SPIFFE/SPIRE on traditional VMs. Today, we’re taking that knowledge to the cloud-native world with a comprehensive guide to deploying SPIFFE/SPIRE natively on Kubernetes.
While the core concepts remain the same—cryptographic workload identities, attestation, and zero-trust networking—the Kubernetes implementation brings unique advantages: dynamic workload management, native integration with Kubernetes primitives, and seamless scaling. This guide bridges the gap between traditional infrastructure and cloud-native security.
Why SPIFFE/SPIRE on Kubernetes?
Before diving into implementation, let’s understand why SPIFFE/SPIRE has become the de facto standard for workload identity in Kubernetes:
The Identity Challenge in Kubernetes
```mermaid
graph TD
    subgraph "Traditional Approach"
        A[Pod A] -->|IP-based trust| B[Pod B]
        B -->|Static credentials| C[Database]
        A -->|Hardcoded secrets| D[External API]
    end

    subgraph "SPIFFE/SPIRE Approach"
        E[Pod A<br/>SPIFFE ID: spiffe://domain/ns/prod/sa/frontend]
        F[Pod B<br/>SPIFFE ID: spiffe://domain/ns/prod/sa/backend]
        G[Database<br/>SPIFFE ID: spiffe://domain/ns/data/sa/postgres]
        H[External API]

        E -->|mTLS with SVID| F
        F -->|mTLS with SVID| G
        E -->|JWT SVID| H
    end

    style A fill:#ff9999
    style B fill:#ff9999
    style E fill:#99ff99
    style F fill:#99ff99
```

Key Benefits
- Dynamic Identity Management: Automatic identity issuance and rotation
- Platform Agnostic: Works across clouds, on-premises, and hybrid environments
- Kubernetes Native: Leverages Service Accounts, Namespaces, and other K8s primitives
- Zero Trust by Default: No implicit trust based on network location
- Attestation Flexibility: Multiple methods from K8s tokens to hardware TPMs
Architecture Overview
Let’s understand the SPIFFE/SPIRE architecture in a Kubernetes context:
```mermaid
graph TB
    subgraph "Control Plane Node"
        SS[SPIRE Server]
        ETCD[etcd/PostgreSQL]
        REG[Registration API]
        SS --> ETCD
        SS --> REG
    end

    subgraph "Worker Node 1"
        SA1[SPIRE Agent]
        CSI1[SPIFFE CSI Driver]
        WL1[Workload 1]
        WL2[Workload 2]

        SA1 --> CSI1
        CSI1 --> WL1
        CSI1 --> WL2
    end

    subgraph "Worker Node 2"
        SA2[SPIRE Agent]
        CSI2[SPIFFE CSI Driver]
        WL3[Workload 3]
        WL4[Workload 4]

        SA2 --> CSI2
        CSI2 --> WL3
        CSI2 --> WL4
    end

    SA1 -.->|Node Attestation| SS
    SA2 -.->|Node Attestation| SS
    WL1 -.->|Workload API| SA1
    WL2 -.->|Workload API| SA1
    WL3 -.->|Workload API| SA2
    WL4 -.->|Workload API| SA2
```

Core Components
- SPIRE Server: Central authority that issues SPIFFE IDs and manages trust bundles
- SPIRE Agent: Runs on each node, performs workload attestation
- SPIFFE CSI Driver: Mounts the Workload API socket into pods
- Registration Entries: Define which workloads get which identities
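To make that last point concrete, here is what a single entry looks like when created by hand with the spire-server CLI. The SPIFFE ID, parent ID, and selectors are illustrative placeholders (the parent ID in particular depends on your cluster name and node), and later in this guide the SPIRE Controller Manager will create equivalent entries for us automatically:

```bash
# Manually register a workload: any pod in namespace "default" running under
# service account "frontend", scoped to the agent named in -parentID.
# Values are placeholders; adjust the trust domain, cluster, and node.
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://prod.example.com/ns/default/sa/frontend \
    -parentID spiffe://prod.example.com/spire/agent/k8s_psat/demo-cluster/node-1 \
    -selector k8s:ns:default \
    -selector k8s:sa:frontend
```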
Prerequisites
Before we begin, ensure you have:
```bash
# Kubernetes cluster (1.19+)
kubectl version --short

# Helm 3
helm version --short

# cert-manager (for TLS certificates)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.3/cert-manager.yaml

# Verify cert-manager is ready
kubectl wait --for=condition=ready --timeout=300s -n cert-manager pod -l app.kubernetes.io/instance=cert-manager
```

Step 1: Install SPIRE Using Helm
First, let’s add the SPIFFE Helm repository and install SPIRE:
```bash
# Add SPIFFE Helm repository
helm repo add spiffe https://spiffe.github.io/helm-charts-hardened/
helm repo update

# Create namespace
kubectl create namespace spire-system

# Install SPIRE with production-ready configuration
cat <<EOF > spire-values.yaml
global:
  spire:
    # Your trust domain - change this!
    trustDomain: "prod.example.com"
    # Bundle endpoint for federation
    bundleEndpoint:
      address: "0.0.0.0"
      port: 8443

spire-server:
  # High availability configuration
  replicaCount: 1  # Increase for HA

  controllerManager:
    enabled: true

  nodeAttestor:
    k8sPsat:
      enabled: true

  dataStore:
    sql:
      databaseType: sqlite3
      connectionString: "/run/spire/data/datastore.sqlite3"
  # For production, use PostgreSQL:
  # dataStore:
  #   sql:
  #     databaseType: postgres
  #     connectionString: "dbname=spire user=spire host=postgres password=\${DBPASSWORD}"

  keyManager:
    disk:
      enabled: true

  upstreamAuthority:
    disk:
      enabled: true
      cert: "/run/spire/ca/ca.crt"
      key: "/run/spire/ca/ca.key"

  ca:
    subject:
      country: ["US"]
      organization: ["Example Corp"]
      commonName: "SPIRE Server CA"

spire-agent:
  # Run on all nodes
  nodeSelector: {}

  server:
    address: "spire-server.spire-system"
    port: 8081

  # Enable Workload API for all pods
  socketPath: "/run/spire/agent-sockets/spire-agent.sock"

  # Health checks
  healthChecks:
    enabled: true
    port: 9982

# SPIFFE CSI Driver
spiffe-csi-driver:
  enabled: true

# SPIFFE OIDC Discovery Provider
spiffe-oidc-discovery-provider:
  enabled: true
  config:
    domains:
      - "oidc-discovery.example.com"
EOF

# Install SPIRE
helm upgrade --install spire spiffe/spire \
  --namespace spire-system \
  --values spire-values.yaml \
  --wait
```

Step 2: Verify SPIRE Installation
Let’s verify that SPIRE is running correctly:
```bash
# Check pods
kubectl get pods -n spire-system

# Expected output:
# NAME                                  READY   STATUS    RESTARTS   AGE
# spire-server-0                        2/2     Running   0          2m
# spire-agent-xxxxx                     1/1     Running   0          2m
# spiffe-csi-driver-xxxxx               1/1     Running   0          2m
# spiffe-oidc-discovery-provider-xxx    1/1     Running   0          2m

# Check SPIRE Server health
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server healthcheck

# Check SPIRE Agent health on a node
kubectl exec -n spire-system -it $(kubectl get pods -n spire-system -l app=spire-agent -o jsonpath='{.items[0].metadata.name}') -- \
  /opt/spire/bin/spire-agent healthcheck
```

Step 3: Configure Workload Registration
Now let’s register workloads. We’ll use the SPIRE Controller Manager (enabled in the Helm values above) and its ClusterSPIFFEID CRD for automatic registration:
```yaml
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: default-workloads
spec:
  # SPIFFE ID template
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"

  # Pod selector
  podSelector:
    matchLabels:
      spiffe: "enabled"

  # Workload selector for the agent
  workloadSelectorTemplates:
    - "k8s:ns:{{ .PodMeta.Namespace }}"
    - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"

  # Optional: DNS names for the SVID
  dnsNameTemplates:
    - "{{ .PodMeta.Name }}.{{ .PodMeta.Namespace }}.svc.cluster.local"

  # TTL for the SVID (one hour)
  ttl: 1h
---
# More specific registration for critical workloads
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: database-workloads
spec:
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}/{{ .PodMeta.Name }}"

  # namespaceSelector is a standard label selector; select namespaces by name
  # via the automatic kubernetes.io/metadata.name label
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
          - "production"
          - "staging"

  podSelector:
    matchLabels:
      app: "postgresql"

  workloadSelectorTemplates:
    - "k8s:ns:{{ .PodMeta.Namespace }}"
    - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
    - "k8s:pod-name:{{ .PodMeta.Name }}"

  # Federates with these trust domains
  federatesWith:
    - "partner.example.com"
    - "cloud.example.com"
```

Apply the registration:

```bash
kubectl apply -f workload-registration.yaml
```

Step 4: Deploy a Sample Application with SPIFFE Identity
Let’s deploy a sample application that uses SPIFFE identities:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: frontend
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backend
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
        spiffe: enabled
    spec:
      serviceAccountName: frontend
      containers:
        - name: frontend
          image: spiffe/spire-examples:latest
          command: ["/opt/spire-examples/spiffe-workload"]
          env:
            - name: SPIFFE_ENDPOINT_SOCKET
              value: "unix:///spiffe-workload-api/spire-agent.sock"
          volumeMounts:
            - name: spiffe-workload-api
              mountPath: /spiffe-workload-api
              readOnly: true
      volumes:
        - name: spiffe-workload-api
          csi:
            driver: "csi.spiffe.io"
            readOnly: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
        spiffe: enabled
    spec:
      serviceAccountName: backend
      containers:
        - name: backend
          image: spiffe/spire-examples:latest
          command: ["/opt/spire-examples/spiffe-workload"]
          env:
            - name: SPIFFE_ENDPOINT_SOCKET
              value: "unix:///spiffe-workload-api/spire-agent.sock"
          volumeMounts:
            - name: spiffe-workload-api
              mountPath: /spiffe-workload-api
              readOnly: true
      volumes:
        - name: spiffe-workload-api
          csi:
            driver: "csi.spiffe.io"
            readOnly: true
```

Deploy the application:

```bash
kubectl apply -f sample-app.yaml

# Wait for pods to be ready
kubectl wait --for=condition=ready pod -l app=frontend --timeout=60s
kubectl wait --for=condition=ready pod -l app=backend --timeout=60s
```

Step 5: Verify Workload Identity
Let’s verify that our workloads have received their SPIFFE identities:
```bash
# Check frontend identity
kubectl exec -it $(kubectl get pod -l app=frontend -o jsonpath='{.items[0].metadata.name}') -- \
  /opt/spire-examples/spiffe-workload get-svid

# Expected output:
# SPIFFE ID:          spiffe://prod.example.com/ns/default/sa/frontend
# SVID Valid After:   2025-01-27 10:00:00 +0000 UTC
# SVID Valid Until:   2025-01-27 11:00:00 +0000 UTC
# CA #1 Valid After:  2025-01-27 00:00:00 +0000 UTC
# CA #1 Valid Until:  2026-01-27 00:00:00 +0000 UTC

# Check backend identity
kubectl exec -it $(kubectl get pod -l app=backend -o jsonpath='{.items[0].metadata.name}') -- \
  /opt/spire-examples/spiffe-workload get-svid
```

Step 6: Implement mTLS Between Workloads
Now let’s implement mutual TLS between our workloads using SPIFFE identities:
```go
// mtls-client.go - Frontend calling Backend
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	ctx := context.Background()

	// Create an X.509 source backed by the Workload API; it keeps the SVID
	// and trust bundle up to date as SPIRE rotates them
	source, err := workloadapi.NewX509Source(ctx,
		workloadapi.WithClientOptions(
			workloadapi.WithAddr("unix:///spiffe-workload-api/spire-agent.sock")))
	if err != nil {
		panic(err)
	}
	defer source.Close()

	// Create TLS config for mTLS; only accept the backend's SPIFFE ID
	backendID := spiffeid.RequireFromString("spiffe://prod.example.com/ns/default/sa/backend")
	tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(backendID))

	// Create HTTP client with SPIFFE TLS
	httpClient := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,
		},
	}

	// Make authenticated request
	resp, err := httpClient.Get("https://backend.default.svc.cluster.local:8443/api/data")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("Response: %s\n", body)
}
```

```go
// mtls-server.go - Backend server
package main

import (
	"context"
	"fmt"
	"net/http"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	ctx := context.Background()

	// Create an X.509 source backed by the Workload API
	source, err := workloadapi.NewX509Source(ctx,
		workloadapi.WithClientOptions(
			workloadapi.WithAddr("unix:///spiffe-workload-api/spire-agent.sock")))
	if err != nil {
		panic(err)
	}
	defer source.Close()

	// Create TLS config that only accepts the frontend
	frontendID := spiffeid.RequireFromString("spiffe://prod.example.com/ns/default/sa/frontend")
	tlsConfig := tlsconfig.MTLSServerConfig(source, source, tlsconfig.AuthorizeID(frontendID))

	// Create HTTPS server
	server := &http.Server{
		Addr:      ":8443",
		TLSConfig: tlsConfig,
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Extract peer identity from the verified client certificate
			if r.TLS != nil && len(r.TLS.PeerCertificates) > 0 {
				id, err := spiffeid.FromURI(r.TLS.PeerCertificates[0].URIs[0])
				if err == nil {
					fmt.Fprintf(w, "Hello %s! Here's your secure data.\n", id)
					return
				}
			}
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
		}),
	}

	fmt.Println("Server listening on :8443...")
	if err := server.ListenAndServeTLS("", ""); err != nil {
		panic(err)
	}
}
```

Step 7: Advanced Configuration
High Availability Setup
For production environments, configure SPIRE Server for high availability:
```yaml
spire-server:
  replicaCount: 3

  dataStore:
    sql:
      databaseType: postgres
      connectionString: "host=postgres-ha.data.svc.cluster.local dbname=spire user=spire sslmode=require"

  notifier:
    k8sbundle:
      enabled: true
      namespace: "spire-system"

  # Anti-affinity for spreading across nodes
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - spire-server
          topologyKey: kubernetes.io/hostname
```

Federation Configuration
To enable federation between trust domains:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-federation
  namespace: spire-system
data:
  federation.conf: |
    federates_with {
      "partner.example.com" {
        bundle_endpoint_address = "spire-bundle.partner.example.com"
        bundle_endpoint_port = 8443
        bundle_endpoint_spiffe_id = "spiffe://partner.example.com/spire/server"
      }
    }
```

Custom Node Attestor
For cloud environments, use platform-specific attestors:
```yaml
# AWS node attestor
spire-server:
  nodeAttestor:
    aws_iid:
      enabled: true
      access_key_id: "${AWS_ACCESS_KEY_ID}"
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
      skip_block_device: true

spire-agent:
  nodeAttestor:
    aws_iid:
      enabled: true
```

```yaml
# GCP node attestor
spire-server:
  nodeAttestor:
    gcp_iit:
      enabled: true
      projectid_allow_list: ["my-project-1", "my-project-2"]

spire-agent:
  nodeAttestor:
    gcp_iit:
      enabled: true
```

Step 8: Integration with Service Mesh
Istio Integration
To use SPIRE as Istio’s certificate provider:
```bash
# Install Istio with custom CA
istioctl install --set values.pilot.env.EXTERNAL_CA=ISTIOD_RA_KUBERNETES_API \
  --set values.global.caAddress=spire-server.spire-system.svc:8081 \
  --set values.global.meshID=spiffe://prod.example.com \
  --set values.pilot.env.PILOT_ENABLE_WORKLOAD_ENTRY_AUTOREGISTRATION=true
```

Linkerd Integration
For Linkerd, configure the identity issuer.
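The Certificate we will create references a spire-ca ClusterIssuer, which the SPIRE chart does not create for you. Here is a minimal sketch of such an issuer, assuming you have copied the SPIRE upstream CA certificate and key into a Secret named spire-ca-keypair in the cert-manager namespace (both the Secret name and that copy step are assumptions on my part):

```yaml
# Hypothetical cert-manager ClusterIssuer backed by the SPIRE upstream CA.
# Assumes a Secret "spire-ca-keypair" (tls.crt/tls.key) in the cert-manager namespace.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: spire-ca
spec:
  ca:
    secretName: spire-ca-keypair
```

With the issuer in place, request the identity issuer certificate: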
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h
  renewBefore: 25h
  issuerRef:
    name: spire-ca
    kind: ClusterIssuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
    - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
    - cert sign
    - crl sign
    - server auth
    - client auth
```

Step 9: Monitoring and Observability
Configure Prometheus to scrape metrics from the SPIRE Server and Agents.
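The scrape configuration below assumes SPIRE's Prometheus telemetry endpoint is enabled on port 9988. For reference, this is the telemetry block SPIRE itself expects in server.conf and agent.conf; the port here is an assumption and simply has to match the scrape targets, however you choose to template it into the Helm values:

```hcl
telemetry {
  Prometheus {
    host = "0.0.0.0"
    port = 9988
  }
}
```

With the endpoint exposed, configure the scrape jobs: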
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: 'spire-server'
        static_configs:
          - targets: ['spire-server.spire-system:9988']
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'spire_server_.*'
            action: keep

      - job_name: 'spire-agent'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names: ['spire-system']
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            regex: spire-agent
            action: keep
          - source_labels: [__meta_kubernetes_pod_ip]
            target_label: __address__
            replacement: '${1}:9988'
```

Step 10: Production Best Practices
Security Hardening
- Network Policies: Restrict SPIRE Server access
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: spire-server-ingress
  namespace: spire-system
spec:
  podSelector:
    matchLabels:
      app: spire-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: spire-system
        - podSelector:
            matchLabels:
              app: spire-agent
      ports:
        - protocol: TCP
          port: 8081
```

- RBAC Configuration: Limit SPIRE permissions
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spire-server-limited
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["authentication.k8s.io"]
    resources: ["tokenreviews"]
    verbs: ["create"]
```

Performance Tuning
```yaml
spire-server:
  resources:
    requests:
      memory: "512Mi"
      cpu: "100m"
    limits:
      memory: "2Gi"
      cpu: "1000m"

  # Cache configuration
  config:
    server:
      cache_size: 10000

spire-agent:
  resources:
    requests:
      memory: "128Mi"
      cpu: "50m"
    limits:
      memory: "512Mi"
      cpu: "500m"

  # Sync interval optimization
  config:
    agent:
      sync_interval: "10s"
```

Troubleshooting Common Issues
Issue 1: Workload Not Receiving SVID
```bash
# Check registration entries
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server entry show

# Check agent logs
kubectl logs -n spire-system $(kubectl get pods -n spire-system -l app=spire-agent -o jsonpath='{.items[0].metadata.name}')

# Verify CSI driver
kubectl get csidriver csi.spiffe.io
kubectl get csinodes
```

Issue 2: Federation Not Working
```bash
# Check bundle endpoint
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server bundle show -format spiffe

# Test bundle endpoint connectivity
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  curl -k https://localhost:8443/bundle
```

Issue 3: Performance Issues
```bash
# Check metrics
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  curl -s localhost:9988/metrics | grep spire_server_

# Analyze datastore performance
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server entry count
```

Migration from VM-Based SPIRE
If you’re migrating from my VM-based SPIFFE/SPIRE setup, here are the key differences:
- Node Attestation: Use Kubernetes PSAT instead of join tokens
- Workload Attestation: Kubernetes selectors instead of Unix selectors
- Registration: Automatic via CRDs instead of manual CLI
- Scaling: Horizontal pod autoscaling instead of manual node addition
- Storage: Consider managed databases instead of local SQLite
Migration checklist:
- Export existing registration entries (see the sketch after this checklist)
- Update SPIFFE ID structure for Kubernetes namespaces
- Migrate trust bundles if keeping the same trust domain
- Update workload code to use CSI driver paths
- Test federation with existing infrastructure
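Here is a minimal sketch of the export step, assuming the old VM-based server is reachable over SSH (the host name and output file are placeholders). Treat the dump as a reference for translation rather than something to import verbatim; each unix:* selector needs to become a k8s:* selector or, better, a ClusterSPIFFEID:

```bash
# Dump the registration entries from the existing VM-based SPIRE server
ssh spire-vm "spire-server entry show" > vm-entries.txt

# Review the entries and recreate them as ClusterSPIFFEID resources,
# swapping unix:uid/unix:path selectors for k8s:ns/k8s:sa selectors
less vm-entries.txt
```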
Conclusion and Next Steps
You now have a production-ready SPIFFE/SPIRE deployment on Kubernetes that provides:
- ✅ Cryptographic workload identities
- ✅ Automatic certificate rotation
- ✅ Zero-trust pod-to-pod communication
- ✅ Integration with Kubernetes primitives
- ✅ Foundation for service mesh adoption
In upcoming posts, we’ll explore:
- Deep dive into SPIRE Controller Manager and CRDs
- Multi-cluster federation patterns
- Advanced attestation with TPM and cloud providers
- Performance optimization for large-scale deployments
- Integration with Istio and Linkerd service meshes
The journey from traditional PKI to cloud-native workload identity is complete. Your Kubernetes cluster now has a robust, scalable, and secure identity foundation ready for zero-trust networking.
Additional Resources
Have questions or run into issues? Feel free to reach out or check the SPIFFE Slack community for help.