Introduction: Beyond Network Policies to Cryptographic Identity
Traditional Kubernetes security relies on network policies and service accounts, but these approaches fall short in truly zero-trust environments. Network policies depend on IP addresses that change as pods are rescheduled, while service account tokens were designed for authenticating to the Kubernetes API server, not for securing workload-to-workload communication.
Enter SPIFFE (Secure Production Identity Framework For Everyone) and its implementation SPIRE, which provide cryptographic identities to every workload. In this comprehensive guide, we’ll implement pod-to-pod mutual TLS (mTLS) using SPIFFE identities, moving from theory to production-ready code.
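Before diving in, it helps to see how simple a SPIFFE identity actually is: a URI of the form spiffe://<trust-domain>/<workload-path>. As a quick orientation, here is a minimal stdlib-only sketch of pulling those parts out of an ID string (`parseSPIFFEID` is a hypothetical helper for illustration; in real code you would use the `spiffeid` package from go-spiffe, which validates far more strictly):

```go
package main

import (
	"fmt"
	"net/url"
)

// parseSPIFFEID splits a SPIFFE ID into its trust domain and workload path.
// Hypothetical helper: go-spiffe's spiffeid package performs stricter
// validation (no ports, no query strings, no userinfo, etc.).
func parseSPIFFEID(id string) (trustDomain, path string, err error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "spiffe" || u.Host == "" {
		return "", "", fmt.Errorf("not a SPIFFE ID: %q", id)
	}
	return u.Host, u.Path, nil
}

func main() {
	td, p, err := parseSPIFFEID("spiffe://prod.example.com/ns/spiffe-demo/sa/frontend")
	if err != nil {
		panic(err)
	}
	fmt.Println(td) // prod.example.com
	fmt.Println(p)  // /ns/spiffe-demo/sa/frontend
}
```

The trust domain scopes the identity to one SPIRE installation; the path encodes the workload (here, namespace plus service account), which is exactly what the registration templates later in this post produce.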
The Problem with Traditional Pod Communication
Let’s visualize the security challenges in Kubernetes:
graph LR
subgraph "Traditional Approach"
A[Frontend Pod<br/>IP: 10.0.1.5] -->|Plain HTTP| B[Backend Pod<br/>IP: 10.0.2.8]
B -->|Plain HTTP| C[Database Pod<br/>IP: 10.0.3.2]
D[Attacker Pod<br/>IP: 10.0.1.9] -.->|Can intercept| B
D -.->|Can impersonate| A
end
subgraph "SPIFFE/SPIRE Approach"
E[Frontend Pod<br/>ID: spiffe://prod/frontend] -->|mTLS| F[Backend Pod<br/>ID: spiffe://prod/backend]
F -->|mTLS| G[Database Pod<br/>ID: spiffe://prod/db]
H[Attacker Pod<br/>No SPIFFE ID] -.->|Rejected| F
end
style D fill:#ff6666
style H fill:#ff6666
style E fill:#66ff66
style F fill:#66ff66
style G fill:#66ff66
Prerequisites and Setup
Before implementing mTLS, ensure you have SPIFFE/SPIRE installed (covered in my previous post). Additionally, we’ll need:
# Verify SPIRE is running
kubectl get pods -n spire-system
# Check SPIFFE CSI Driver
kubectl get csidriver csi.spiffe.io
# Create a demo namespace (baseline pod security: the demo containers
# compile the app as root at startup, which "restricted" would reject)
kubectl create namespace spiffe-demo
kubectl label namespace spiffe-demo pod-security.kubernetes.io/enforce=baseline
Understanding the SPIFFE Workload API
The Workload API is the interface between workloads and SPIRE:
sequenceDiagram
participant W as Workload
participant CSI as CSI Driver
participant SA as SPIRE Agent
participant SS as SPIRE Server
W->>CSI: Mount /spiffe-workload-api
CSI->>SA: Connect to Unix Socket
SA->>SS: Request SVID
SS-->>SA: Issue SVID
SA-->>W: Deliver SVID via API
W->>W: Use SVID for mTLS
Note over W,SS: SVIDs auto-rotate before expiry
Step 1: Deploy Workloads with SPIFFE CSI Driver
Let’s create two services that will communicate via mTLS:
Frontend Service
# frontend-deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: frontend
namespace: spiffe-demo
---
apiVersion: v1
kind: ConfigMap
metadata:
name: frontend-config
namespace: spiffe-demo
data:
main.go: |
package main
import (
"context"
"encoding/json"
"fmt"
"io"
"log"
"net/http"
"os"
"time"
"github.com/spiffe/go-spiffe/v2/spiffeid"
"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
"github.com/spiffe/go-spiffe/v2/workloadapi"
)
type Response struct {
Message string `json:"message"`
ClientID string `json:"client_id"`
ServerID string `json:"server_id"`
Timestamp time.Time `json:"timestamp"`
}
func main() {
ctx := context.Background()
// Create an X509Source backed by the SPIFFE CSI Driver socket.
// X509Source satisfies the SVID and bundle source interfaces that
// tlsconfig expects, and transparently picks up rotated SVIDs.
socketPath := "unix:///spiffe-workload-api/spire-agent.sock"
source, err := workloadapi.NewX509Source(ctx, workloadapi.WithClientOptions(workloadapi.WithAddr(socketPath)))
if err != nil {
log.Fatalf("Unable to create X509Source: %v", err)
}
defer source.Close()
// Get our own SPIFFE ID
svid, err := source.GetX509SVID()
if err != nil {
log.Fatalf("Failed to fetch SVID: %v", err)
}
myID := svid.ID.String()
log.Printf("Frontend service started with SPIFFE ID: %s", myID)
// Create HTTP client with mTLS, authorizing only the backend's SPIFFE ID
backendID := spiffeid.RequireFromString("spiffe://prod.example.com/ns/spiffe-demo/sa/backend")
tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(backendID))
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: tlsConfig,
},
}
// Serve frontend API
mux := http.NewServeMux()
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
// Call backend service
backendURL := os.Getenv("BACKEND_URL")
if backendURL == "" {
backendURL = "https://backend.spiffe-demo.svc.cluster.local:8443/data"
}
resp, err := httpClient.Get(backendURL)
if err != nil {
http.Error(w, fmt.Sprintf("Backend call failed: %v", err), http.StatusInternalServerError)
return
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
response := map[string]interface{}{
"frontend_id": myID,
"backend_response": json.RawMessage(body),
"timestamp": time.Now(),
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(response)
})
mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("healthy"))
})
log.Println("Frontend listening on :8080")
if err := http.ListenAndServe(":8080", mux); err != nil {
log.Fatalf("Failed to start server: %v", err)
}
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
namespace: spiffe-demo
spec:
replicas: 2
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spiffe: enabled
spec:
serviceAccountName: frontend
containers:
- name: frontend
image: golang:1.21-alpine
command: ["sh", "-c"]
args:
- |
apk add --no-cache git
# Copy the source next to go.mod so `go run` resolves module deps
mkdir -p /workspace && cp /app/main.go /workspace/ && cd /workspace
go mod init frontend
go mod tidy
go run main.go
env:
- name: BACKEND_URL
value: "https://backend.spiffe-demo.svc.cluster.local:8443/data"
ports:
- containerPort: 8080
name: http
volumeMounts:
- name: app-code
mountPath: /app
- name: spiffe-workload-api
mountPath: /spiffe-workload-api
readOnly: true
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 20
periodSeconds: 5
volumes:
- name: app-code
configMap:
name: frontend-config
- name: spiffe-workload-api
csi:
driver: "csi.spiffe.io"
readOnly: true
---
apiVersion: v1
kind: Service
metadata:
name: frontend
namespace: spiffe-demo
spec:
selector:
app: frontend
ports:
- port: 80
targetPort: 8080
name: http
Backend Service
# backend-deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: backend
namespace: spiffe-demo
---
apiVersion: v1
kind: ConfigMap
metadata:
name: backend-config
namespace: spiffe-demo
data:
main.go: |
package main
import (
"context"
"crypto/tls"
"encoding/json"
"log"
"net/http"
"time"
"github.com/spiffe/go-spiffe/v2/spiffeid"
"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
"github.com/spiffe/go-spiffe/v2/workloadapi"
)
type DataResponse struct {
Data string `json:"data"`
ServerID string `json:"server_id"`
ClientID string `json:"client_id"`
Timestamp time.Time `json:"timestamp"`
Metadata map[string]string `json:"metadata"`
}
func main() {
ctx := context.Background()
// Create an X509Source backed by the SPIFFE CSI Driver socket
socketPath := "unix:///spiffe-workload-api/spire-agent.sock"
source, err := workloadapi.NewX509Source(ctx, workloadapi.WithClientOptions(workloadapi.WithAddr(socketPath)))
if err != nil {
log.Fatalf("Unable to create X509Source: %v", err)
}
defer source.Close()
// Get our SPIFFE ID
svid, err := source.GetX509SVID()
if err != nil {
log.Fatalf("Failed to fetch SVID: %v", err)
}
myID := svid.ID.String()
log.Printf("Backend service started with SPIFFE ID: %s", myID)
// Create mTLS server config - only accept the frontend service
frontendID := spiffeid.RequireFromString("spiffe://prod.example.com/ns/spiffe-demo/sa/frontend")
tlsConfig := tlsconfig.MTLSServerConfig(source, source, tlsconfig.AuthorizeID(frontendID))
// Create HTTPS server
mux := http.NewServeMux()
mux.HandleFunc("/data", func(w http.ResponseWriter, r *http.Request) {
// Extract client identity from TLS connection
var clientID string
if r.TLS != nil && len(r.TLS.PeerCertificates) > 0 {
cert := r.TLS.PeerCertificates[0]
if len(cert.URIs) > 0 {
id, err := spiffeid.FromURI(cert.URIs[0])
if err == nil {
clientID = id.String()
}
}
}
log.Printf("Request from client: %s", clientID)
response := DataResponse{
Data: "Secure data from backend service",
ServerID: myID,
ClientID: clientID,
Timestamp: time.Now(),
Metadata: map[string]string{
"version": "1.0",
"environment": "production",
"tls_version": tls.VersionName(r.TLS.Version),
"cipher_suite": tls.CipherSuiteName(r.TLS.CipherSuite),
},
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(response)
})
mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
json.NewEncoder(w).Encode(map[string]string{"status": "healthy"})
})
server := &http.Server{
Addr: ":8443",
Handler: mux,
TLSConfig: tlsConfig,
}
log.Println("Backend listening on :8443 with mTLS")
// ListenAndServeTLS with empty cert/key paths because TLS config provides them
if err := server.ListenAndServeTLS("", ""); err != nil {
log.Fatalf("Failed to start server: %v", err)
}
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend
namespace: spiffe-demo
spec:
replicas: 3
selector:
matchLabels:
app: backend
template:
metadata:
labels:
app: backend
spiffe: enabled
spec:
serviceAccountName: backend
containers:
- name: backend
image: golang:1.21-alpine
command: ["sh", "-c"]
args:
- |
apk add --no-cache git
# Copy the source next to go.mod so `go run` resolves module deps
mkdir -p /workspace && cp /app/main.go /workspace/ && cd /workspace
go mod init backend
go mod tidy
go run main.go
ports:
- containerPort: 8443
name: https
volumeMounts:
- name: app-code
mountPath: /app
- name: spiffe-workload-api
mountPath: /spiffe-workload-api
readOnly: true
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
livenessProbe:
# httpGet probes fail here: the server demands a client certificate,
# which the kubelet cannot present, so the TLS handshake is rejected.
# A TCP check confirms the listener is up without completing mTLS.
tcpSocket:
port: 8443
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
tcpSocket:
port: 8443
initialDelaySeconds: 20
periodSeconds: 5
volumes:
- name: app-code
configMap:
name: backend-config
- name: spiffe-workload-api
csi:
driver: "csi.spiffe.io"
readOnly: true
---
apiVersion: v1
kind: Service
metadata:
name: backend
namespace: spiffe-demo
spec:
selector:
app: backend
ports:
- port: 8443
targetPort: 8443
name: https
Step 2: Register Workloads with SPIRE
Create ClusterSPIFFEID resources for automatic registration:
# workload-registration.yaml
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
name: frontend-workload
spec:
spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
podSelector:
matchLabels:
app: frontend
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: spiffe-demo
workloadSelectorTemplates:
- "k8s:ns:{{ .PodMeta.Namespace }}"
- "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
- "k8s:pod-label:app:frontend"
dnsNameTemplates:
- "{{ .PodMeta.Name }}.{{ .PodMeta.Namespace }}.svc.cluster.local"
- "frontend.{{ .PodMeta.Namespace }}.svc.cluster.local"
ttl: 1h
---
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
name: backend-workload
spec:
spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
podSelector:
matchLabels:
app: backend
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: spiffe-demo
workloadSelectorTemplates:
- "k8s:ns:{{ .PodMeta.Namespace }}"
- "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
- "k8s:pod-label:app:backend"
dnsNameTemplates:
- "{{ .PodMeta.Name }}.{{ .PodMeta.Namespace }}.svc.cluster.local"
- "backend.{{ .PodMeta.Namespace }}.svc.cluster.local"
ttl: 1h
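The spiffeIDTemplate above is a standard Go template that the SPIRE Controller Manager renders against each pod's metadata. A stdlib sketch of that expansion, using minimal stand-in types for the template context (the real controller exposes more fields than shown here):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Minimal stand-ins for the context the controller renders against.
type podMeta struct{ Namespace string }
type podSpec struct{ ServiceAccountName string }
type tmplCtx struct {
	TrustDomain string
	PodMeta     podMeta
	PodSpec     podSpec
}

// renderSPIFFEID expands the same template used in the ClusterSPIFFEID above.
func renderSPIFFEID(c tmplCtx) (string, error) {
	t, err := template.New("id").Parse(
		"spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}")
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, c); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	id, err := renderSPIFFEID(tmplCtx{
		TrustDomain: "prod.example.com",
		PodMeta:     podMeta{Namespace: "spiffe-demo"},
		PodSpec:     podSpec{ServiceAccountName: "frontend"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // spiffe://prod.example.com/ns/spiffe-demo/sa/frontend
}
```

This is why the Go services hard-code IDs like spiffe://prod.example.com/ns/spiffe-demo/sa/backend: the template deterministically maps namespace plus service account to an identity.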
Deploy everything:
# Apply workload registrations
kubectl apply -f workload-registration.yaml
# Deploy services
kubectl apply -f frontend-deployment.yaml
kubectl apply -f backend-deployment.yaml
# Wait for pods to be ready
kubectl wait --for=condition=ready pod -l app=frontend -n spiffe-demo --timeout=300s
kubectl wait --for=condition=ready pod -l app=backend -n spiffe-demo --timeout=300s
Step 3: Verify mTLS Communication
Test the secure communication:
# Port-forward to frontend
kubectl port-forward -n spiffe-demo svc/frontend 8080:80 &
# Test the frontend endpoint
curl http://localhost:8080 | jq .
# Expected output:
{
"frontend_id": "spiffe://prod.example.com/ns/spiffe-demo/sa/frontend",
"backend_response": {
"data": "Secure data from backend service",
"server_id": "spiffe://prod.example.com/ns/spiffe-demo/sa/backend",
"client_id": "spiffe://prod.example.com/ns/spiffe-demo/sa/frontend",
"timestamp": "2025-01-29T10:30:45Z",
"metadata": {
"version": "1.0",
"environment": "production",
"tls_version": "771",
"cipher_suite": "TLS_AES_128_GCM_SHA256"
}
},
"timestamp": "2025-01-29T10:30:45Z"
}
Step 4: Advanced mTLS Patterns
Pattern 1: Service-to-Service with Multiple Backends
// advanced-client.go - Load balancing across multiple backends
package main
import (
"context"
"crypto/tls"
"net/http"
"sync"
"time"
"github.com/spiffe/go-spiffe/v2/spiffeid"
"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
"github.com/spiffe/go-spiffe/v2/workloadapi"
)
type SPIFFEClient struct {
source *workloadapi.X509Source
httpClients map[string]*http.Client
mu sync.RWMutex
}
func NewSPIFFEClient(ctx context.Context) (*SPIFFEClient, error) {
// X509Source implements the SVID and bundle sources that tlsconfig needs
source, err := workloadapi.NewX509Source(ctx, workloadapi.WithClientOptions(
workloadapi.WithAddr("unix:///spiffe-workload-api/spire-agent.sock")))
if err != nil {
return nil, err
}
return &SPIFFEClient{
source: source,
httpClients: make(map[string]*http.Client),
}, nil
}
func (s *SPIFFEClient) GetHTTPClient(targetID string) (*http.Client, error) {
s.mu.RLock()
if client, ok := s.httpClients[targetID]; ok {
s.mu.RUnlock()
return client, nil
}
s.mu.RUnlock()
s.mu.Lock()
defer s.mu.Unlock()
// Double-check after acquiring write lock
if client, ok := s.httpClients[targetID]; ok {
return client, nil
}
// Create a new client authorized for the target SPIFFE ID; return an
// error (rather than panicking via RequireFromString) on a bad ID
id, err := spiffeid.FromString(targetID)
if err != nil {
return nil, err
}
tlsConfig := tlsconfig.MTLSClientConfig(s.source, s.source, tlsconfig.AuthorizeID(id))
client := &http.Client{
Transport: &http.Transport{
TLSClientConfig: tlsConfig,
MaxIdleConns: 10,
IdleConnTimeout: 30 * time.Second,
},
Timeout: 10 * time.Second,
}
s.httpClients[targetID] = client
return client, nil
}
// Load balancer implementation
type LoadBalancer struct {
spiffeClient *SPIFFEClient
backends []string
current int
mu sync.Mutex
}
func (lb *LoadBalancer) RoundRobinRequest(path string) (*http.Response, error) {
lb.mu.Lock()
backend := lb.backends[lb.current]
lb.current = (lb.current + 1) % len(lb.backends)
lb.mu.Unlock()
client, err := lb.spiffeClient.GetHTTPClient("spiffe://prod.example.com/ns/spiffe-demo/sa/backend")
if err != nil {
return nil, err
}
return client.Get(backend + path)
}
Pattern 2: JWT SVIDs for External Services
// jwt-svid-client.go - Using JWT SVIDs for external APIs
package main
import (
"context"
"fmt"
"net/http"
"github.com/spiffe/go-spiffe/v2/svid/jwtsvid"
"github.com/spiffe/go-spiffe/v2/workloadapi"
)
func callExternalAPI(ctx context.Context) error {
client, err := workloadapi.New(ctx, workloadapi.WithAddr("unix:///spiffe-workload-api/spire-agent.sock"))
if err != nil {
return err
}
defer client.Close()
// Fetch a JWT SVID scoped to the external API's audience
jwtSVID, err := client.FetchJWTSVID(ctx, jwtsvid.Params{
Audience: "https://api.external.com",
})
if err != nil {
return err
}
// Use JWT in Authorization header
req, _ := http.NewRequest("GET", "https://api.external.com/data", nil)
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", jwtSVID.Marshal()))
resp, err := http.DefaultClient.Do(req)
if err != nil {
return err
}
defer resp.Body.Close()
return nil
}
Pattern 3: SPIFFE Helper for Legacy Applications
For applications that can’t be modified to use the Workload API directly:
# spiffe-helper-deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: spiffe-helper-config
namespace: spiffe-demo
data:
helper.conf: |
agent_address = "/spiffe-workload-api/spire-agent.sock"
cmd = "/app/legacy-app"
cmd_args = ""
cert_dir = "/certs"
add_intermediates = true
renew_signal = "SIGHUP"
svid_file_name = "cert.pem"
svid_key_file_name = "key.pem"
svid_bundle_file_name = "ca.pem"
---
# ServiceAccount referenced by the legacy-app deployment
apiVersion: v1
kind: ServiceAccount
metadata:
name: legacy-app
namespace: spiffe-demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: legacy-app
namespace: spiffe-demo
spec:
replicas: 1
selector:
matchLabels:
app: legacy-app
template:
metadata:
labels:
app: legacy-app
spiffe: enabled
spec:
serviceAccountName: legacy-app
containers:
# spiffe-helper runs as a long-lived sidecar (not an init container,
# which would block pod startup since the helper never exits) and keeps
# the certificates in /certs fresh as SVIDs rotate
- name: spiffe-helper
image: ghcr.io/spiffe/spiffe-helper:latest
command: ["/opt/spiffe-helper"]
args: ["-config", "/config/helper.conf"]
volumeMounts:
- name: spiffe-workload-api
mountPath: /spiffe-workload-api
readOnly: true
- name: helper-config
mountPath: /config
- name: certs
mountPath: /certs
- name: legacy-app
image: nginx:alpine
volumeMounts:
- name: certs
mountPath: /etc/nginx/certs
readOnly: true
# Configure nginx to use certificates from /etc/nginx/certs/
volumes:
- name: spiffe-workload-api
csi:
driver: "csi.spiffe.io"
readOnly: true
- name: helper-config
configMap:
name: spiffe-helper-config
- name: certs
emptyDir: {}
Step 5: Production Considerations
Health Checks and Monitoring
// health-check.go - SVID health monitoring
package main
import (
"context"
"encoding/json"
"net/http"
"time"
"github.com/spiffe/go-spiffe/v2/workloadapi"
)
type HealthStatus struct {
Status string `json:"status"`
SPIFFEID string `json:"spiffe_id"`
CertificateExpiry time.Time `json:"certificate_expiry"`
TimeToRenewal string `json:"time_to_renewal"`
}
func healthCheckHandler(client *workloadapi.Client) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
ctx := context.Background()
x509Context, err := client.FetchX509Context(ctx)
if err != nil {
w.WriteHeader(http.StatusServiceUnavailable)
json.NewEncoder(w).Encode(map[string]string{
"status": "unhealthy",
"error": err.Error(),
})
return
}
svid := x509Context.DefaultSVID()
cert := svid.Certificates[0]
status := HealthStatus{
Status: "healthy",
SPIFFEID: svid.ID.String(),
CertificateExpiry: cert.NotAfter,
TimeToRenewal: time.Until(cert.NotAfter).String(),
}
// Warn if certificate expires soon
if time.Until(cert.NotAfter) < 30*time.Minute {
status.Status = "warning"
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(status)
}
}
Graceful SVID Rotation
// svid-rotation.go - Handle SVID rotation gracefully
package main
import (
"context"
"crypto/tls"
"log"
"net/http"
"sync"
"time"
"github.com/spiffe/go-spiffe/v2/workloadapi"
)
type RotatingTLSConfig struct {
client *workloadapi.Client
tlsConfig *tls.Config
mu sync.RWMutex
ctx context.Context
cancel context.CancelFunc
}
func NewRotatingTLSConfig(ctx context.Context, client *workloadapi.Client) (*RotatingTLSConfig, error) {
ctx, cancel := context.WithCancel(ctx)
rtc := &RotatingTLSConfig{
client: client,
ctx: ctx,
cancel: cancel,
}
// Initial TLS config
if err := rtc.updateTLSConfig(); err != nil {
cancel()
return nil, err
}
// Watch for SVID updates
go rtc.watchSVIDRotation()
return rtc, nil
}
func (rtc *RotatingTLSConfig) updateTLSConfig() error {
x509Context, err := rtc.client.FetchX509Context(rtc.ctx)
if err != nil {
return err
}
tlsConfig := &tls.Config{
GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
svid := x509Context.DefaultSVID()
cert := &tls.Certificate{
Certificate: [][]byte{},
PrivateKey: svid.PrivateKey,
}
for _, c := range svid.Certificates {
cert.Certificate = append(cert.Certificate, c.Raw)
}
return cert, nil
},
// NOTE: verifying client certs also requires populating ClientCAs from
// the SPIFFE trust bundle; in production prefer tlsconfig.MTLSServerConfig
// with an X509Source, which handles verification and rotation for you.
ClientAuth: tls.RequireAndVerifyClientCert,
GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
svid := x509Context.DefaultSVID()
cert := &tls.Certificate{
Certificate: [][]byte{},
PrivateKey: svid.PrivateKey,
}
for _, c := range svid.Certificates {
cert.Certificate = append(cert.Certificate, c.Raw)
}
return cert, nil
},
}
rtc.mu.Lock()
rtc.tlsConfig = tlsConfig
rtc.mu.Unlock()
log.Println("TLS configuration updated with new SVID")
return nil
}
func (rtc *RotatingTLSConfig) watchSVIDRotation() {
ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop()
for {
select {
case <-rtc.ctx.Done():
return
case <-ticker.C:
if err := rtc.updateTLSConfig(); err != nil {
log.Printf("Failed to update TLS config: %v", err)
}
}
}
}
func (rtc *RotatingTLSConfig) GetTLSConfig() *tls.Config {
rtc.mu.RLock()
defer rtc.mu.RUnlock()
return rtc.tlsConfig
}
Error Handling and Retries
// resilient-client.go - Production-grade error handling
package main
import (
"context"
"fmt"
"net/http"
"time"
"github.com/cenkalti/backoff/v4"
"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
"github.com/spiffe/go-spiffe/v2/workloadapi"
)
type ResilientSPIFFEClient struct {
workloadClient *workloadapi.Client
httpClient *http.Client
}
func (r *ResilientSPIFFEClient) CallWithRetry(ctx context.Context, url string) (*http.Response, error) {
operation := func() (*http.Response, error) {
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, backoff.Permanent(err)
}
resp, err := r.httpClient.Do(req)
if err != nil {
return nil, err // Temporary error, will retry
}
// Don't retry client errors
if resp.StatusCode >= 400 && resp.StatusCode < 500 {
return resp, backoff.Permanent(fmt.Errorf("client error: %d", resp.StatusCode))
}
// Retry server errors
if resp.StatusCode >= 500 {
resp.Body.Close()
return nil, fmt.Errorf("server error: %d", resp.StatusCode)
}
return resp, nil
}
// Configure exponential backoff
b := backoff.NewExponentialBackOff()
b.MaxElapsedTime = 30 * time.Second
return backoff.RetryWithData(operation, b)
}
Step 6: Observability and Debugging
mTLS Metrics with Prometheus
// metrics.go - Prometheus metrics for mTLS
package main
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
mtlsConnectionsTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "spiffe_mtls_connections_total",
Help: "Total number of mTLS connections established",
},
[]string{"source_id", "target_id", "status"},
)
svidRotationTotal = promauto.NewCounter(
prometheus.CounterOpts{
Name: "spiffe_svid_rotation_total",
Help: "Total number of SVID rotations",
},
)
svidExpirySeconds = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "spiffe_svid_expiry_seconds",
Help: "Time until SVID expiry in seconds",
},
)
)
Debugging mTLS Issues
# Check if workloads have SVIDs
kubectl exec -n spiffe-demo deployment/frontend -- \
ls -la /spiffe-workload-api/
# View SPIRE agent logs
kubectl logs -n spire-system -l app=spire-agent --tail=100
# Check registration entries
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
/opt/spire/bin/spire-server entry list -selector k8s:ns:spiffe-demo
# Verify the Workload API socket is mounted (busybox nc cannot probe unix sockets)
kubectl exec -n spiffe-demo deployment/frontend -- \
sh -c 'test -S /spiffe-workload-api/spire-agent.sock && echo "socket present"'
# Capture TLS handshake details
kubectl exec -n spiffe-demo deployment/frontend -- \
openssl s_client -connect backend:8443 -showcerts
Common Issues and Solutions
Issue 1: “Unable to create workload API client”
Symptoms:
Unable to create workload API client: workloadapi: unable to dial agent: dial unix /spiffe-workload-api/spire-agent.sock: connect: no such file or directory
Solution:
# Ensure CSI driver volume is mounted correctly
volumes:
- name: spiffe-workload-api
csi:
driver: "csi.spiffe.io"
readOnly: true
# Optional: specify node publish secret
# nodePublishSecretRef:
# name: spiffe-csi-driver-node-publish-secret
Issue 2: “x509: certificate signed by unknown authority”
Symptoms:
x509: certificate signed by unknown authority
Solution:
// Ensure you're using SPIFFE trust bundle, not system roots
tlsConfig := tlsconfig.MTLSClientConfig(
source, // X.509 source (workload API client)
source, // Bundle source (same client)
tlsconfig.AuthorizeID(serverID),
)
Issue 3: SVID Not Issued
Symptoms: Pods running but no SVID received
Solution:
# Check pod labels match ClusterSPIFFEID selector
kubectl get pod -n spiffe-demo -l app=frontend --show-labels
# Verify ClusterSPIFFEID is created
kubectl get clusterspiffeid
# Check SPIRE server for registration entries
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
/opt/spire/bin/spire-server entry list
Performance Optimization
Connection Pooling
// connection-pool.go - Reuse mTLS connections
transport := &http.Transport{
TLSClientConfig: tlsConfig,
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
// HTTP/2 support
ForceAttemptHTTP2: true,
}
client := &http.Client{
Transport: transport,
Timeout: 30 * time.Second,
}
SVID Caching
// svid-cache.go - Cache SVIDs to reduce Workload API calls
// (excerpt: imports and the update/watchUpdates methods are omitted;
// note that workloadapi.X509Source already caches and watches X.509 SVIDs)
type SVIDCache struct {
client *workloadapi.Client
x509Ctx *workloadapi.X509Context
jwtSVIDs map[string]*jwtsvid.SVID
mu sync.RWMutex
updateChan chan struct{}
}
func NewSVIDCache(ctx context.Context, client *workloadapi.Client) (*SVIDCache, error) {
cache := &SVIDCache{
client: client,
jwtSVIDs: make(map[string]*jwtsvid.SVID),
updateChan: make(chan struct{}, 1),
}
// Initial fetch
if err := cache.update(ctx); err != nil {
return nil, err
}
// Watch for updates
go cache.watchUpdates(ctx)
return cache, nil
}
Integration with Service Meshes
Using with Istio
# istio-integration.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: istio-spiffe
namespace: istio-system
data:
mesh: |
defaultConfig:
proxyStatsMatcher:
inclusionRegexps:
- ".*outlier_detection.*"
- ".*osconfig.*"
- ".*circuit_breakers.*"
proxyMetadata:
PILOT_ENABLE_WORKLOAD_ENTRY_AUTOREGISTRATION: "true"
BOOTSTRAP_XDS_AGENT: "true"
trustDomain: prod.example.com
# Use SPIRE as CA
caCertificates:
- spiffe://prod.example.com/spire/ca
Using with Linkerd
# linkerd-spiffe-identity.yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
name: spiffe-identity
namespace: spiffe-demo
spec:
targetRef:
kind: Service
name: backend
requiredAuthenticationRefs:
- group: policy.linkerd.io
kind: MeshTLSAuthentication
name: spiffe-mtls
---
apiVersion: linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
name: spiffe-mtls
namespace: spiffe-demo
spec:
identityRefs:
- kind: ServiceAccount
name: frontend
Conclusion
Implementing pod-to-pod mTLS with SPIFFE/SPIRE transforms Kubernetes security from network-based trust to cryptographic identity-based trust. We’ve covered:
- ✅ CSI driver integration for seamless SVID delivery
- ✅ mTLS implementation patterns for different scenarios
- ✅ Production considerations including rotation and monitoring
- ✅ Debugging techniques and common issues
- ✅ Performance optimization strategies
The combination of SPIFFE’s standardized identity format and SPIRE’s robust implementation provides a production-ready foundation for zero-trust networking in Kubernetes.
In the next post, we’ll explore high-availability SPIRE deployments, including multi-region federation and disaster recovery strategies.
Additional Resources
- SPIFFE Workload API Specification
- go-spiffe Library Documentation
- SPIFFE Helper for Legacy Apps
- CSI Driver Documentation
Have questions about implementing mTLS in your environment? Join the discussion in the SPIFFE Slack community or reach out directly.