Introduction: Beyond Single-Cluster Identity
In our previous guides, we’ve built robust SPIFFE/SPIRE deployments within single Kubernetes clusters. However, modern enterprises operate across multiple clusters, regions, and cloud providers. This creates the challenge of establishing trust relationships between workloads that span organizational boundaries while maintaining the cryptographic guarantees that make SPIFFE/SPIRE so powerful.
This comprehensive guide explores SPIFFE federation—the mechanism that enables secure, verifiable communication between workloads across different trust domains without compromising on zero-trust principles or requiring complex credential management.
Understanding SPIFFE Federation Architecture
Let’s visualize a multi-cluster federated SPIFFE deployment:
graph TB
    subgraph "Trust Domain: aws.company.com"
        subgraph "AWS Production Cluster"
            AWS_SPIRE_SERVER[SPIRE Server AWS]
            AWS_SPIRE_AGENT1[SPIRE Agent]
            AWS_WL1[Frontend Service]
            AWS_WL2[Auth Service]

            AWS_SPIRE_SERVER --> AWS_SPIRE_AGENT1
            AWS_SPIRE_AGENT1 --> AWS_WL1
            AWS_SPIRE_AGENT1 --> AWS_WL2
        end

        AWS_BUNDLE_EP[Bundle Endpoint<br/>8443]
        AWS_SPIRE_SERVER --> AWS_BUNDLE_EP
    end

    subgraph "Trust Domain: gcp.company.com"
        subgraph "GCP Data Cluster"
            GCP_SPIRE_SERVER[SPIRE Server GCP]
            GCP_SPIRE_AGENT1[SPIRE Agent]
            GCP_WL1[Data Service]
            GCP_WL2[ML Pipeline]

            GCP_SPIRE_SERVER --> GCP_SPIRE_AGENT1
            GCP_SPIRE_AGENT1 --> GCP_WL1
            GCP_SPIRE_AGENT1 --> GCP_WL2
        end

        GCP_BUNDLE_EP[Bundle Endpoint<br/>8443]
        GCP_SPIRE_SERVER --> GCP_BUNDLE_EP
    end

    subgraph "Trust Domain: onprem.company.com"
        subgraph "On-Premises Edge Cluster"
            ONPREM_SPIRE_SERVER[SPIRE Server OnPrem]
            ONPREM_SPIRE_AGENT1[SPIRE Agent]
            ONPREM_WL1[Legacy System]
            ONPREM_WL2[Edge Gateway]

            ONPREM_SPIRE_SERVER --> ONPREM_SPIRE_AGENT1
            ONPREM_SPIRE_AGENT1 --> ONPREM_WL1
            ONPREM_SPIRE_AGENT1 --> ONPREM_WL2
        end

        ONPREM_BUNDLE_EP[Bundle Endpoint<br/>8443]
        ONPREM_SPIRE_SERVER --> ONPREM_BUNDLE_EP
    end

    subgraph "Federation Relationships"
        AWS_BUNDLE_EP -.->|Trust Bundle Exchange| GCP_BUNDLE_EP
        GCP_BUNDLE_EP -.->|Trust Bundle Exchange| ONPREM_BUNDLE_EP
        ONPREM_BUNDLE_EP -.->|Trust Bundle Exchange| AWS_BUNDLE_EP
    end

    subgraph "Cross-Cluster Communication"
        AWS_WL1 -.->|mTLS with Federated SVID| GCP_WL1
        GCP_WL2 -.->|mTLS with Federated SVID| ONPREM_WL2
        ONPREM_WL1 -.->|mTLS with Federated SVID| AWS_WL2
    end

    style AWS_SPIRE_SERVER fill:#ff9999
    style GCP_SPIRE_SERVER fill:#99ff99
    style ONPREM_SPIRE_SERVER fill:#9999ff
    style AWS_BUNDLE_EP fill:#ffcccc
    style GCP_BUNDLE_EP fill:#ccffcc
    style ONPREM_BUNDLE_EP fill:#ccccff
Federation Benefits
- Cross-Cloud Zero Trust: Workloads authenticate across cloud boundaries without VPNs or complex networking
- Cryptographic Trust: Federation relationships are based on cryptographic verification, not network controls
- Scalable Identity: Each trust domain keeps its own SPIRE Server and signing keys, so identity management scales across distributed infrastructure without a single shared CA
- Regulatory Compliance: Meet data residency requirements while maintaining unified security
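- Disaster Recovery: Workloads that fail over to a federated cluster can still authenticate to their peers, because both sides already hold each other's trust bundles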
Federation Concepts and Components
Trust Domains
Each SPIRE deployment operates within a trust domain: the boundary within which a SPIRE Server has authority to mint and validate identities. Every SPIFFE ID begins with the name of the trust domain that issued it:
# Trust domain examples
spiffe://aws.company.com       # AWS production environment
spiffe://gcp.company.com       # GCP data processing environment
spiffe://onprem.company.com    # On-premises legacy systems
spiffe://edge.company.com      # Edge computing nodes
spiffe://partner.example.com   # External partner systems
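To make the boundary concrete, here is a minimal Go sketch that pulls the trust domain out of a SPIFFE ID using only the standard library. The helper name `trustDomainOf` is ours, and the checks are deliberately loose; production code should rely on a SPIFFE library's parser, which enforces the full ID syntax.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// trustDomainOf returns the trust domain of a SPIFFE ID, which is the
// authority component of the URI. This is a simplified check, not a full
// implementation of the SPIFFE ID validation rules.
func trustDomainOf(id string) (string, error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", err
	}
	if u.Scheme != "spiffe" {
		return "", errors.New("not a SPIFFE ID: scheme must be spiffe")
	}
	if u.Host == "" {
		return "", errors.New("SPIFFE ID has no trust domain")
	}
	return u.Host, nil
}

func main() {
	ids := []string{
		"spiffe://aws.company.com/ns/production/service/frontend",
		"spiffe://partner.example.com/gateway",
		"https://aws.company.com/not-a-spiffe-id",
	}
	for _, id := range ids {
		td, err := trustDomainOf(id)
		if err != nil {
			fmt.Printf("%s: rejected (%v)\n", id, err)
			continue
		}
		fmt.Printf("%s: trust domain %s\n", id, td)
	}
}
```

Two workloads belong to the same trust domain exactly when this value matches; anything else is a cross-domain call that needs federation.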
Trust Bundles
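A trust bundle contains the public key material of a trust domain: the X.509 CA certificates used to validate X.509-SVIDs and the public keys used to validate JWT-SVIDs. During federation, trust domains exchange their bundles so that each side can validate SVIDs issued by the other.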
Bundle Endpoints
Each SPIRE Server exposes a bundle endpoint that allows other trust domains to retrieve its current trust bundle. This enables automatic trust bundle updates when certificates rotate.
Setting Up Multi-Cluster Federation
Step 1: Configure Trust Domain Infrastructure
Let’s set up three clusters representing a typical enterprise scenario:
# aws-cluster-config.yaml - AWS Production ClusterapiVersion: v1kind: ConfigMapmetadata: name: spire-server-aws-config namespace: spire-systemdata: server.conf: | server { bind_address = "0.0.0.0" bind_port = "8081" socket_path = "/tmp/spire-server/private/api.sock" trust_domain = "aws.company.com" data_dir = "/run/spire/data" log_level = "INFO"
# Federation configuration federation { # Bundle endpoint configuration bundle_endpoint { address = "0.0.0.0" port = 8443
          # ACME (here Let's Encrypt) provisions the endpoint's Web PKI
          # certificate; it is not an access-control list
          acme {
            tos_accepted = true
            cache_dir = "/tmp/spire-server/private/acme"
            directory_url = "https://acme-v02.api.letsencrypt.org/directory"
          }
        }
# Federated trust domains federates_with { "gcp.company.com" { bundle_endpoint_url = "https://spire-bundle.gcp.company.com:8443" bundle_endpoint_profile { endpoint_spiffe_id = "spiffe://gcp.company.com/spire/server"
# Authentication method type = "https_spiffe"
# Custom CA for verification (optional) # tls_ca_cert_path = "/etc/ssl/certs/gcp-ca.pem" } }
"onprem.company.com" { bundle_endpoint_url = "https://spire-bundle.onprem.company.com:8443" bundle_endpoint_profile { type = "https_web"
# Web PKI verification # tls_ca_cert_path = "/etc/ssl/certs/ca-certificates.crt" } }
# Partner trust domain with restricted access "partner.example.com" { bundle_endpoint_url = "https://spire-bundle.partner.example.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://partner.example.com/spire/server" }
# Refresh interval for partner bundles refresh_hint = "3600s" } } }
# CA configuration for federation ca_subject = { country = ["US"], organization = ["Company Corp"], organizational_unit = ["AWS Production"], common_name = "SPIRE Server CA - AWS", }
# JWT issuer for cross-domain authentication jwt_issuer = "https://oidc.aws.company.com" }
plugins { NodeAttestor "aws_iid" { plugin_data { access_key_id = "${AWS_ACCESS_KEY_ID}" secret_access_key = "${AWS_SECRET_ACCESS_KEY}" account_ids_for_verification = ["123456789012"] instance_tag_requirements = { "Environment" = ["production"] "TrustDomain" = ["aws.company.com"] } } }
WorkloadAttestor "k8s" { plugin_data { skip_kubelet_verification = false kubelet_secure_port = 10250 } }
DataStore "sql" { plugin_data { database_type = "postgres" connection_string = "host=postgres-aws.data.svc.cluster.local dbname=spire user=spire sslmode=require" } }
KeyManager "aws_kms" { plugin_data { key_id = "arn:aws:kms:us-east-1:123456789012:key/aws-spire-key" region = "us-east-1" } }
UpstreamAuthority "aws_pca" { plugin_data { certificate_authority_arn = "arn:aws:acm-pca:us-east-1:123456789012:certificate-authority/aws-spire-ca" region = "us-east-1" validity_period_hours = 8760 } }
# Bundle endpoint notifier Notifier "k8sbundle" { plugin_data { webhook_label = "spiffe.io/webhook" config_map = "spire-bundle" config_map_key = "bundle.crt" namespace = "spire-system" } } }---# gcp-cluster-config.yaml - GCP Data ClusterapiVersion: v1kind: ConfigMapmetadata: name: spire-server-gcp-config namespace: spire-systemdata: server.conf: | server { bind_address = "0.0.0.0" bind_port = "8081" socket_path = "/tmp/spire-server/private/api.sock" trust_domain = "gcp.company.com" data_dir = "/run/spire/data" log_level = "INFO"
federation { bundle_endpoint { address = "0.0.0.0" port = 8443
# GCP-specific TLS configuration tls { cert_chain_path = "/etc/ssl/spire/bundle-endpoint.crt" private_key_path = "/etc/ssl/spire/bundle-endpoint.key" ca_cert_path = "/etc/ssl/spire/ca.crt" } }
federates_with { "aws.company.com" { bundle_endpoint_url = "https://spire-bundle.aws.company.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://aws.company.com/spire/server" } }
"onprem.company.com" { bundle_endpoint_url = "https://spire-bundle.onprem.company.com:8443" bundle_endpoint_profile { type = "https_web" } } } }
ca_subject = { country = ["US"], organization = ["Company Corp"], organizational_unit = ["GCP Data Processing"], common_name = "SPIRE Server CA - GCP", }
jwt_issuer = "https://oidc.gcp.company.com" }
plugins { NodeAttestor "gcp_iit" { plugin_data { projectid_whitelist = ["company-gcp-data"] service_account_whitelist = [ "spire-agent@company-gcp-data.iam.gserviceaccount.com" ] zone_whitelist = ["us-central1-a", "us-central1-b"] } }
WorkloadAttestor "k8s" { plugin_data { skip_kubelet_verification = false kubelet_secure_port = 10250 } }
DataStore "sql" { plugin_data { database_type = "postgres" connection_string = "host=postgres-gcp.data.svc.cluster.local dbname=spire user=spire sslmode=require" } }
KeyManager "gcp_kms" { plugin_data { key_name = "projects/company-gcp-data/locations/us-central1/keyRings/spire/cryptoKeys/spire-server" } }
UpstreamAuthority "gcp_cas" { plugin_data { ca_name = "projects/company-gcp-data/locations/us-central1/certificateAuthorities/spire-ca" validity_period_hours = 8760 } } }---# onprem-cluster-config.yaml - On-Premises Edge ClusterapiVersion: v1kind: ConfigMapmetadata: name: spire-server-onprem-config namespace: spire-systemdata: server.conf: | server { bind_address = "0.0.0.0" bind_port = "8081" socket_path = "/tmp/spire-server/private/api.sock" trust_domain = "onprem.company.com" data_dir = "/run/spire/data" log_level = "INFO"
federation { bundle_endpoint { address = "0.0.0.0" port = 8443
# On-premises certificate management tls { cert_chain_path = "/etc/ssl/spire/server.crt" private_key_path = "/etc/ssl/spire/server.key" ca_cert_path = "/etc/ssl/spire/ca.crt" }
# Access control for on-premises acl { authorized_keys = [ "/etc/ssl/spire/federation-client.pub" ] } }
federates_with { "aws.company.com" { bundle_endpoint_url = "https://spire-bundle.aws.company.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://aws.company.com/spire/server" } }
"gcp.company.com" { bundle_endpoint_url = "https://spire-bundle.gcp.company.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://gcp.company.com/spire/server" } } } }
ca_subject = { country = ["US"], organization = ["Company Corp"], organizational_unit = ["On-Premises Operations"], common_name = "SPIRE Server CA - OnPrem", }
jwt_issuer = "https://oidc.onprem.company.com" }
plugins { NodeAttestor "join_token" { plugin_data = {} }
WorkloadAttestor "k8s" { plugin_data { skip_kubelet_verification = false kubelet_secure_port = 10250 } }
WorkloadAttestor "unix" { plugin_data { discover_workload_path = true } }
DataStore "sql" { plugin_data { database_type = "postgres" connection_string = "host=postgres-onprem.data.svc.cluster.local dbname=spire user=spire sslmode=require" } }
KeyManager "disk" { plugin_data { keys_path = "/run/spire/data/keys.json" } }
UpstreamAuthority "disk" { plugin_data { cert_file_path = "/run/spire/ca/ca.crt" key_file_path = "/run/spire/ca/ca.key" } } }
Step 2: Configure Workload Registration for Federation
Create ClusterSPIFFEID resources that enable cross-cluster communication:
# AWS Frontend Service - can communicate with GCP data services
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: aws-frontend-service
  namespace: spire-system
spec:
  spiffeIDTemplate: "spiffe://aws.company.com/ns/{{ .PodMeta.Namespace }}/service/{{ .PodMeta.Labels.service }}"

  podSelector:
    matchLabels:
      component: frontend

  # namespaceSelector is a standard label selector; match namespaces by name
  # through the kubernetes.io/metadata.name label. distributed-app is the
  # namespace used by the example application in Step 3.
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values: ["production", "staging", "distributed-app"]

  # Pod labels are matched with the k8s:pod-label:<key>:<value> selector
  workloadSelectorTemplates:
    - "k8s:ns:{{ .PodMeta.Namespace }}"
    - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
    - "k8s:pod-label:service:{{ .PodMeta.Labels.service }}"

  # Enable federation with GCP and on-premises
  federatesWith:
    - "gcp.company.com"
    - "onprem.company.com"

  dnsNameTemplates:
    - "{{ .PodMeta.Labels.service }}.{{ .PodMeta.Namespace }}.svc.cluster.local"
    - "frontend.aws.company.com"

  ttl: "1h"
---
# GCP Data Service - can receive requests from AWS and send to on-premises
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: gcp-data-service
  namespace: spire-system
spec:
  spiffeIDTemplate: "spiffe://gcp.company.com/ns/{{ .PodMeta.Namespace }}/service/{{ .PodMeta.Labels.service }}/version/{{ .PodMeta.Labels.version }}"

  podSelector:
    matchLabels:
      component: data-processing

  workloadSelectorTemplates:
    - "k8s:ns:{{ .PodMeta.Namespace }}"
    - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
    - "k8s:pod-label:service:{{ .PodMeta.Labels.service }}"
    - "k8s:pod-label:version:{{ .PodMeta.Labels.version }}"

  # Federation with AWS (to receive requests) and on-premises (to access legacy data)
  federatesWith:
    - "aws.company.com"
    - "onprem.company.com"

  dnsNameTemplates:
    - "{{ .PodMeta.Labels.service }}.{{ .PodMeta.Namespace }}.svc.cluster.local"
    - "data.gcp.company.com"

  ttl: "1h"
---
# On-Premises Legacy System - bridge to cloud services
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: onprem-legacy-bridge
  namespace: spire-system
spec:
  spiffeIDTemplate: "spiffe://onprem.company.com/datacenter/{{ .PodMeta.Labels.datacenter }}/system/{{ .PodMeta.Labels.system }}"

  podSelector:
    matchLabels:
      component: legacy-bridge

  workloadSelectorTemplates:
    - "k8s:ns:{{ .PodMeta.Namespace }}"
    - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
    - "k8s:pod-label:datacenter:{{ .PodMeta.Labels.datacenter }}"
    - "k8s:pod-label:system:{{ .PodMeta.Labels.system }}"

  # Can communicate with cloud services
  federatesWith:
    - "aws.company.com"
    - "gcp.company.com"

  dnsNameTemplates:
    - "{{ .PodMeta.Labels.system }}.{{ .PodMeta.Namespace }}.svc.cluster.local"
    - "legacy.onprem.company.com"

  ttl: "2h" # Longer TTL for stable legacy systems
---
# Cross-domain API Gateway
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: cross-domain-gateway
  namespace: spire-system
spec:
  # Label keys containing a hyphen cannot use dot access in Go templates;
  # look them up with index instead
  spiffeIDTemplate: 'spiffe://{{ .TrustDomain }}/gateway/{{ index .PodMeta.Labels "gateway-type" }}/{{ .PodMeta.Labels.region }}'

  podSelector:
    matchLabels:
      component: api-gateway

  workloadSelectorTemplates:
    - "k8s:ns:{{ .PodMeta.Namespace }}"
    - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
    - 'k8s:pod-label:gateway-type:{{ index .PodMeta.Labels "gateway-type" }}'
    - "k8s:pod-label:region:{{ .PodMeta.Labels.region }}"

  # Gateway can communicate across all domains
  federatesWith:
    - "aws.company.com"
    - "gcp.company.com"
    - "onprem.company.com"
    - "partner.example.com"

  dnsNameTemplates:
    - "api-gateway.{{ .PodMeta.Namespace }}.svc.cluster.local"
    - "api.{{ .TrustDomain }}"

  ttl: "1h"
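The ID templates above use Go template syntax, so their output can be previewed with the standard `text/template` package. The sketch below is only an approximation: `templateData` and `podMeta` are hypothetical stand-ins that carry just the fields these templates reference, not the controller's actual data type.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// podMeta and templateData are illustrative stand-ins for the data a
// ClusterSPIFFEID template is rendered against.
type podMeta struct {
	Namespace string
	Labels    map[string]string
}

type templateData struct {
	TrustDomain string
	PodMeta     podMeta
}

// render executes a SPIFFE ID template against the given pod data.
// missingkey=error makes a reference to an absent label fail loudly.
func render(tmpl string, data templateData) (string, error) {
	t, err := template.New("spiffe-id").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := templateData{
		TrustDomain: "aws.company.com",
		PodMeta: podMeta{
			Namespace: "production",
			Labels: map[string]string{
				"service":      "frontend",
				"gateway-type": "public",
				"region":       "us-east-1",
			},
		},
	}

	templates := []string{
		"spiffe://aws.company.com/ns/{{ .PodMeta.Namespace }}/service/{{ .PodMeta.Labels.service }}",
		// A hyphenated label key needs index; dot access does not parse.
		`spiffe://{{ .TrustDomain }}/gateway/{{ index .PodMeta.Labels "gateway-type" }}/{{ .PodMeta.Labels.region }}`,
		"spiffe://{{ .TrustDomain }}/gateway/{{ .PodMeta.Labels.gateway-type }}",
	}
	for _, tmpl := range templates {
		id, err := render(tmpl, data)
		if err != nil {
			fmt.Println("template error:", err)
			continue
		}
		fmt.Println(id)
	}
}
```

Rendering a template locally like this is a cheap way to catch a misspelled label or an unparsable expression before any pod is left without an identity.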
Step 3: Deploy Cross-Cluster Application Example
Let’s deploy a distributed application that spans multiple clusters:
# aws-frontend-deployment.yaml - Deployed in AWS clusterapiVersion: v1kind: Namespacemetadata: name: distributed-app labels: federation: enabled---apiVersion: v1kind: ServiceAccountmetadata: name: frontend-service namespace: distributed-app---apiVersion: apps/v1kind: Deploymentmetadata: name: frontend namespace: distributed-appspec: replicas: 3 selector: matchLabels: app: frontend service: frontend template: metadata: labels: app: frontend service: frontend component: frontend version: v1 spec: serviceAccountName: frontend-service containers: - name: frontend image: company/frontend:v1.2.3 ports: - containerPort: 8080 env: # SPIFFE configuration - name: SPIFFE_ENDPOINT_SOCKET value: "unix:///run/spire/sockets/agent.sock" - name: TRUST_DOMAIN value: "aws.company.com" # Service endpoints in other clusters - name: DATA_SERVICE_URL value: "https://data.gcp.company.com:8443" - name: LEGACY_SERVICE_URL value: "https://legacy.onprem.company.com:8443" volumeMounts: - name: spire-agent-socket mountPath: /run/spire/sockets readOnly: true volumes: - name: spire-agent-socket hostPath: path: /run/spire/sockets type: DirectoryOrCreate---apiVersion: v1kind: Servicemetadata: name: frontend namespace: distributed-appspec: selector: app: frontend ports: - port: 8080 targetPort: 8080 name: http type: LoadBalancer
# gcp-data-service-deployment.yaml - Deployed in GCP clusterapiVersion: v1kind: Namespacemetadata: name: distributed-app labels: federation: enabled---apiVersion: v1kind: ServiceAccountmetadata: name: data-service namespace: distributed-app---apiVersion: apps/v1kind: Deploymentmetadata: name: data-service namespace: distributed-appspec: replicas: 2 selector: matchLabels: app: data-service service: data-processing template: metadata: labels: app: data-service service: data-processing component: data-processing version: v2 spec: serviceAccountName: data-service containers: - name: data-service image: company/data-service:v2.1.0 ports: - containerPort: 8443 env: - name: SPIFFE_ENDPOINT_SOCKET value: "unix:///run/spire/sockets/agent.sock" - name: TRUST_DOMAIN value: "gcp.company.com" # Allowed client trust domains - name: ALLOWED_CLIENT_TRUST_DOMAINS value: "aws.company.com,onprem.company.com" # Legacy system endpoint - name: LEGACY_DB_URL value: "https://database.onprem.company.com:5432" volumeMounts: - name: spire-agent-socket mountPath: /run/spire/sockets readOnly: true volumes: - name: spire-agent-socket hostPath: path: /run/spire/sockets type: DirectoryOrCreate---apiVersion: v1kind: Servicemetadata: name: data-service namespace: distributed-appspec: selector: app: data-service ports: - port: 8443 targetPort: 8443 name: https
# onprem-legacy-bridge-deployment.yaml - Deployed in on-premises clusterapiVersion: v1kind: Namespacemetadata: name: distributed-app labels: federation: enabled---apiVersion: v1kind: ServiceAccountmetadata: name: legacy-bridge namespace: distributed-app---apiVersion: apps/v1kind: Deploymentmetadata: name: legacy-bridge namespace: distributed-appspec: replicas: 1 selector: matchLabels: app: legacy-bridge system: mainframe-bridge template: metadata: labels: app: legacy-bridge system: mainframe-bridge component: legacy-bridge datacenter: primary spec: serviceAccountName: legacy-bridge containers: - name: legacy-bridge image: company/legacy-bridge:v1.0.5 ports: - containerPort: 8443 - containerPort: 5432 # Database proxy port env: - name: SPIFFE_ENDPOINT_SOCKET value: "unix:///run/spire/sockets/agent.sock" - name: TRUST_DOMAIN value: "onprem.company.com" # Cloud service endpoints that can access this bridge - name: ALLOWED_CLIENT_TRUST_DOMAINS value: "aws.company.com,gcp.company.com" # Legacy system configuration - name: MAINFRAME_HOST value: "mainframe.internal.company.com" - name: DATABASE_HOST value: "db-cluster.internal.company.com" volumeMounts: - name: spire-agent-socket mountPath: /run/spire/sockets readOnly: true # Mount legacy certificates for backward compatibility - name: legacy-certs mountPath: /etc/ssl/legacy readOnly: true volumes: - name: spire-agent-socket hostPath: path: /run/spire/sockets type: DirectoryOrCreate - name: legacy-certs secret: secretName: legacy-system-certs---apiVersion: v1kind: Servicemetadata: name: legacy-bridge namespace: distributed-appspec: selector: app: legacy-bridge ports: - port: 8443 targetPort: 8443 name: https - port: 5432 targetPort: 5432 name: database
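The data service and legacy bridge receive their permitted peers through the `ALLOWED_CLIENT_TRUST_DOMAINS` variable. How a service consumes that value is up to the application; one possible sketch, with hypothetical helper names and a default-deny stance when the variable is empty, looks like this:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAllowed turns a comma-separated list of trust domains into a set.
// Entries are trimmed and lowercased; empty entries are ignored.
func parseAllowed(raw string) map[string]struct{} {
	allowed := make(map[string]struct{})
	for _, entry := range strings.Split(raw, ",") {
		td := strings.ToLower(strings.TrimSpace(entry))
		if td != "" {
			allowed[td] = struct{}{}
		}
	}
	return allowed
}

// isAllowed reports whether a client's trust domain is in the set.
// An empty set denies everything.
func isAllowed(allowed map[string]struct{}, trustDomain string) bool {
	_, ok := allowed[strings.ToLower(trustDomain)]
	return ok
}

func main() {
	raw := os.Getenv("ALLOWED_CLIENT_TRUST_DOMAINS")
	if raw == "" {
		// Fallback used only so the sketch prints something when run locally.
		raw = "aws.company.com,onprem.company.com"
	}
	allowed := parseAllowed(raw)
	for _, td := range []string{"aws.company.com", "partner.example.com"} {
		fmt.Printf("%s allowed: %v\n", td, isAllowed(allowed, td))
	}
}
```

This check complements, and does not replace, the SPIFFE ID authorization done during the TLS handshake: the handshake proves who the peer is, and the allowlist decides whether that peer may call this service.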
Step 4: Implement Cross-Cluster mTLS Communication
Here’s how applications use federated SPIFFE identities for cross-cluster communication:
// frontend-service.go - AWS Frontend Service
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/svid/x509svid"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	ctx := context.Background()

	// An X509Source keeps this workload's SVID and all trust bundles
	// (local and federated) up to date from the Workload API. tlsconfig
	// needs it as both the SVID source and the bundle source.
	source, err := workloadapi.NewX509Source(ctx,
		workloadapi.WithClientOptions(
			workloadapi.WithAddr("unix:///run/spire/sockets/agent.sock")))
	if err != nil {
		panic(fmt.Sprintf("Failed to create X.509 source: %v", err))
	}
	defer source.Close()

	// Set up HTTP server for incoming requests
	go startHTTPServer(source)

	// Example: Call GCP data service
	callGCPDataService(source)

	// Example: Call on-premises legacy service
	callOnPremLegacyService(source)

	select {} // Keep running
}

func startHTTPServer(source *workloadapi.X509Source) {
	// Accept any peer whose SVID chains to a known bundle; the handler
	// below then decides per trust domain.
	tlsConfig := tlsconfig.MTLSServerConfig(source, source, tlsconfig.AuthorizeAny())

	server := &http.Server{
		Addr:      ":8080",
		TLSConfig: tlsConfig,
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if r.TLS == nil || len(r.TLS.PeerCertificates) == 0 {
				http.Error(w, "No valid client certificate", http.StatusUnauthorized)
				return
			}
			// Extract client identity from the leaf certificate
			clientID, err := x509svid.IDFromCert(r.TLS.PeerCertificates[0])
			if err != nil {
				http.Error(w, "No valid client certificate", http.StatusUnauthorized)
				return
			}
			fmt.Printf("Request from: %s\n", clientID)

			// Make decisions based on client trust domain
			switch clientID.TrustDomain().String() {
			case "gcp.company.com":
				w.Header().Set("Content-Type", "application/json")
				fmt.Fprintf(w, `{"message": "Hello from AWS frontend", "client": "%s"}`, clientID)
			case "onprem.company.com":
				w.Header().Set("Content-Type", "application/json")
				fmt.Fprintf(w, `{"message": "Legacy system acknowledged", "client": "%s"}`, clientID)
			default:
				http.Error(w, "Unauthorized trust domain", http.StatusForbidden)
			}
		}),
	}

	fmt.Println("Frontend server starting on :8080...")
	// Certificates come from TLSConfig, so the file arguments stay empty
	if err := server.ListenAndServeTLS("", ""); err != nil {
		panic(fmt.Sprintf("Server failed: %v", err))
	}
}

func callGCPDataService(source *workloadapi.X509Source) {
	// The expected server ID must match what the gcp-data-service
	// ClusterSPIFFEID template mints, including the /version/ segment
	// (the Step 3 deployment labels the pods version: v2).
	gcpDataID := spiffeid.RequireFromString(
		"spiffe://gcp.company.com/ns/distributed-app/service/data-processing/version/v2")
	tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(gcpDataID))

	httpClient := &http.Client{
		Transport: &http.Transport{TLSClientConfig: tlsConfig},
		Timeout:   30 * time.Second,
	}

	// Make authenticated request to GCP data service
	resp, err := httpClient.Get("https://data.gcp.company.com:8443/api/process-data")
	if err != nil {
		fmt.Printf("Failed to call GCP data service: %v\n", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("Failed to read GCP data service response: %v\n", err)
		return
	}
	fmt.Printf("GCP Data Service Response: %s\n", body)
}

func callOnPremLegacyService(source *workloadapi.X509Source) {
	// Create TLS config for calling on-premises legacy bridge
	onPremID := spiffeid.RequireFromString(
		"spiffe://onprem.company.com/datacenter/primary/system/mainframe-bridge")
	tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(onPremID))

	httpClient := &http.Client{
		Transport: &http.Transport{TLSClientConfig: tlsConfig},
		Timeout:   60 * time.Second, // Longer timeout for legacy systems
	}

	// Make authenticated request to on-premises legacy bridge
	resp, err := httpClient.Get("https://legacy.onprem.company.com:8443/api/legacy-data")
	if err != nil {
		fmt.Printf("Failed to call on-premises legacy service: %v\n", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("Failed to read on-premises legacy service response: %v\n", err)
		return
	}
	fmt.Printf("On-Premises Legacy Service Response: %s\n", body)
}
// data-service.go - GCP Data Service
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"net/http"
	"time"

	_ "github.com/lib/pq"
	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/svid/x509svid"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

type DataService struct {
	source   *workloadapi.X509Source
	database *sql.DB
}

func main() {
	ctx := context.Background()

	// The X509Source supplies this workload's SVID and the trust bundles
	// of every federated domain; tlsconfig uses it for both roles.
	source, err := workloadapi.NewX509Source(ctx,
		workloadapi.WithClientOptions(
			workloadapi.WithAddr("unix:///run/spire/sockets/agent.sock")))
	if err != nil {
		panic(fmt.Sprintf("Failed to create X.509 source: %v", err))
	}
	defer source.Close()

	// Open the database handle (simulated; sql.Open does not dial yet)
	db, err := sql.Open("postgres", "host=db-cluster.data.svc.cluster.local dbname=analytics user=dataservice sslmode=require")
	if err != nil {
		panic(fmt.Sprintf("Failed to connect to database: %v", err))
	}
	defer db.Close()

	service := &DataService{
		source:   source,
		database: db,
	}

	service.startServer()
}

func (ds *DataService) startServer() {
	// Only these federated client identities pass the TLS handshake
	allowedClients := []spiffeid.ID{
		spiffeid.RequireFromString("spiffe://aws.company.com/ns/distributed-app/service/frontend"),
		spiffeid.RequireFromString("spiffe://onprem.company.com/datacenter/primary/system/mainframe-bridge"),
	}

	tlsConfig := tlsconfig.MTLSServerConfig(ds.source, ds.source, tlsconfig.AuthorizeOneOf(allowedClients...))

	mux := http.NewServeMux()
	mux.HandleFunc("/api/process-data", ds.processDataHandler)
	mux.HandleFunc("/api/health", ds.healthHandler)

	server := &http.Server{
		Addr:      ":8443",
		TLSConfig: tlsConfig,
		Handler:   mux,
	}

	fmt.Println("GCP Data Service starting on :8443...")
	// Certificates come from TLSConfig, so the file arguments stay empty
	if err := server.ListenAndServeTLS("", ""); err != nil {
		panic(fmt.Sprintf("Server failed: %v", err))
	}
}

func (ds *DataService) processDataHandler(w http.ResponseWriter, r *http.Request) {
	// Extract and validate client identity
	clientID, err := ds.getClientIdentity(r)
	if err != nil {
		http.Error(w, "Invalid client identity", http.StatusUnauthorized)
		return
	}

	fmt.Printf("Processing data request from: %s\n", clientID)

	// Different processing based on client trust domain
	var response map[string]interface{}

	switch clientID.TrustDomain().String() {
	case "aws.company.com":
		// AWS frontend gets aggregated data
		response = map[string]interface{}{
			"type":      "aggregated",
			"data":      ds.getAggregatedData(),
			"source":    "gcp.company.com",
			"client":    clientID.String(),
			"timestamp": time.Now().Unix(),
		}

	case "onprem.company.com":
		// On-premises gets raw data for legacy processing
		response = map[string]interface{}{
			"type":      "raw",
			"data":      ds.getRawDataForLegacy(),
			"source":    "gcp.company.com",
			"client":    clientID.String(),
			"timestamp": time.Now().Unix(),
		}

	default:
		http.Error(w, "Unauthorized trust domain", http.StatusForbidden)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(response); err != nil {
		fmt.Printf("Failed to write response: %v\n", err)
	}
}

func (ds *DataService) healthHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"status":       "healthy",
		"trust_domain": "gcp.company.com",
		"service":      "data-processing",
	})
}

func (ds *DataService) getClientIdentity(r *http.Request) (spiffeid.ID, error) {
	if r.TLS == nil || len(r.TLS.PeerCertificates) == 0 {
		return spiffeid.ID{}, fmt.Errorf("no client certificate")
	}

	// IDFromCert checks that the leaf carries exactly one valid SPIFFE URI SAN
	return x509svid.IDFromCert(r.TLS.PeerCertificates[0])
}

func (ds *DataService) getAggregatedData() interface{} {
	// Simulate aggregated data processing
	return map[string]interface{}{
		"total_records": 15420,
		"categories":    []string{"analytics", "ml-training", "reporting"},
		"processed_at":  time.Now().Format(time.RFC3339),
	}
}

func (ds *DataService) getRawDataForLegacy() interface{} {
	// Simulate raw data for legacy systems
	return map[string]interface{}{
		"records": []map[string]interface{}{
			{"id": 1, "value": "legacy-compatible-data-1"},
			{"id": 2, "value": "legacy-compatible-data-2"},
		},
		"format": "legacy-v1",
	}
}
Advanced Federation Patterns
Hierarchical Trust Relationships
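SPIFFE federation is pairwise and not transitive: a trust domain validates only SVIDs from domains whose bundles it has explicitly obtained. The configuration below sketches how a hierarchical policy on top of that could be expressed. Fields such as trust_level, additional_validation, and transitive_trust are illustrative and are not part of the documented SPIRE server configuration, so a policy like this has to be enforced by separate tooling: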
apiVersion: v1kind: ConfigMapmetadata: name: spire-server-hierarchical-config namespace: spire-systemdata: server.conf: | server { bind_address = "0.0.0.0" bind_port = "8081" trust_domain = "root.company.com"
# Root trust domain configuration federation { bundle_endpoint { address = "0.0.0.0" port = 8443 }
# Direct trust relationships federates_with { # Production environments "aws.prod.company.com" { bundle_endpoint_url = "https://spire-bundle.aws.prod.company.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://aws.prod.company.com/spire/server" } trust_level = "high" }
"gcp.prod.company.com" { bundle_endpoint_url = "https://spire-bundle.gcp.prod.company.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://gcp.prod.company.com/spire/server" } trust_level = "high" }
# Staging environments (lower trust) "staging.company.com" { bundle_endpoint_url = "https://spire-bundle.staging.company.com:8443" bundle_endpoint_profile { type = "https_web" } trust_level = "medium" refresh_hint = "1800s" # More frequent refresh for staging }
# Partner environments (restricted trust) "partner.example.com" { bundle_endpoint_url = "https://spire-bundle.partner.example.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://partner.example.com/spire/server" } trust_level = "limited" refresh_hint = "3600s"
# Additional validation for partner domains additional_validation { required_san = ["spire-server.partner.example.com"] certificate_transparency = true } } }
# Transitive trust policies transitive_trust { # Allow production environments to trust each other transitively allow_transitive = ["aws.prod.company.com", "gcp.prod.company.com"]
# Block transitive trust for external partners block_transitive = ["partner.example.com"]
# Maximum trust chain length max_chain_length = 3 } } }
Conditional Federation
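Time-based and condition-based federation can be modeled as follows. The time_restrictions, ip_restrictions, emergency_access, and geographic_restrictions blocks are illustrative; the documented SPIRE server configuration has no such options, so these conditions have to be enforced by an external policy layer or by the workloads themselves: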
apiVersion: v1kind: ConfigMapmetadata: name: spire-server-conditional-config namespace: spire-systemdata: server.conf: | server { bind_address = "0.0.0.0" bind_port = "8081" trust_domain = "conditional.company.com"
federation { bundle_endpoint { address = "0.0.0.0" port = 8443 }
federates_with { # Business hours federation "partner.business.com" { bundle_endpoint_url = "https://spire-bundle.partner.business.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://partner.business.com/spire/server" }
# Time-based access control time_restrictions { allowed_hours = ["09:00-17:00"] timezone = "America/New_York" allowed_days = ["MON", "TUE", "WED", "THU", "FRI"] }
# IP-based restrictions for additional security ip_restrictions { allowed_cidrs = ["203.0.113.0/24", "198.51.100.0/24"] } }
# Emergency access federation "emergency.company.com" { bundle_endpoint_url = "https://spire-bundle.emergency.company.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://emergency.company.com/spire/server" }
# Emergency conditions emergency_access { # Only activate during incidents activation_conditions = ["incident_declared", "security_breach"]
# Automatic deactivation max_duration = "4h"
# Approval workflow requires_approval = true approvers = ["security-team", "incident-commander"] } }
# Geographic federation "eu.company.com" { bundle_endpoint_url = "https://spire-bundle.eu.company.com:8443" bundle_endpoint_profile { type = "https_spiffe" endpoint_spiffe_id = "spiffe://eu.company.com/spire/server" }
# Geographic restrictions for GDPR compliance geographic_restrictions { allowed_regions = ["eu-west-1", "eu-central-1"] data_residency_required = true } } } } }
Monitoring and Observability for Federation
Set up comprehensive monitoring for federated environments:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spire-federation-monitoring
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: spire-server
      federation: enabled
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
      # Filtering on __name__ must happen after the scrape, so this uses
      # metricRelabelings rather than relabelings.
      metricRelabelings:
        - sourceLabels: [__name__]
          regex: "spire_server_federation_.*|spire_server_bundle_.*"
          action: keep
---
# Federation-specific alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: spire-federation-alerts
  namespace: monitoring
spec:
  groups:
    - name: spire.federation
      rules:
        - alert: SPIREFederationBundleExpiry
          expr: |
            (spire_server_bundle_expiry_timestamp_seconds - time()) / 86400 < 7
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "SPIRE federation bundle expiring soon"
            description: "Bundle for trust domain {{ $labels.trust_domain }} expires in less than 7 days"

        - alert: SPIREFederationEndpointDown
          expr: |
            up{job="spire-federation-endpoints"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "SPIRE federation endpoint down"
            description: "Federation endpoint {{ $labels.instance }} is unreachable"

        - alert: SPIREFederationTrustBundleUpdateFailed
          expr: |
            increase(spire_server_federation_bundle_update_errors_total[5m]) > 0
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "SPIRE federation bundle update failed"
            description: "Failed to update trust bundle from {{ $labels.trust_domain }}"

        - alert: SPIRECrossDomainAuthenticationFailures
          expr: |
            rate(spire_server_federation_authentication_failures_total[5m]) > 0.1
          for: 3m
          labels:
            severity: warning
          annotations:
            summary: "High rate of cross-domain authentication failures"
            description: "{{ $value }} authentication failures per second between trust domains"

        - alert: SPIREFederationLatencyHigh
          expr: |
            histogram_quantile(0.99, rate(spire_server_federation_request_duration_seconds_bucket[5m])) > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High federation request latency"
            description: "99th percentile federation request latency is {{ $value }}s"
---
# Grafana dashboard for federation
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-federation-dashboard
  namespace: monitoring
data:
  dashboard.json: |
    {
      "dashboard": {
        "title": "SPIRE Federation Overview",
        "panels": [
          {
            "title": "Active Federation Relationships",
            "type": "stat",
            "targets": [
              {
                "expr": "count(spire_server_federation_relationship_active)",
                "legendFormat": "Active Federations"
              }
            ]
          },
          {
            "title": "Cross-Domain Request Rate",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(spire_server_federation_requests_total[5m])) by (source_trust_domain, target_trust_domain)",
                "legendFormat": "{{ source_trust_domain }} -> {{ target_trust_domain }}"
              }
            ]
          },
          {
            "title": "Trust Bundle Health",
            "type": "table",
            "targets": [
              {
                "expr": "spire_server_bundle_expiry_timestamp_seconds",
                "legendFormat": "Bundle Expiry"
              }
            ]
          },
          {
            "title": "Federation Errors",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(spire_server_federation_errors_total[5m])) by (trust_domain, error_type)",
                "legendFormat": "{{ trust_domain }} - {{ error_type }}"
              }
            ]
          }
        ]
      }
    }
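The threshold arithmetic in the `SPIREFederationBundleExpiry` expression is easy to check by hand. The sketch below mirrors it in plain Python; the function names are hypothetical, the timestamps are supplied by the caller rather than scraped, and the rule's `for: 1h` hold period is not modeled.

```python
SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_timestamp: float, now: float) -> float:
    """Mirror of (expiry_timestamp - time()) / 86400 from the alert rule."""
    return (expiry_timestamp - now) / SECONDS_PER_DAY


def bundle_expiry_alert_fires(
    expiry_timestamp: float, now: float, threshold_days: float = 7.0
) -> bool:
    """True when the bundle expires in strictly less than threshold_days."""
    return days_until_expiry(expiry_timestamp, now) < threshold_days
```

Because the comparison is strict, a bundle with exactly seven days left does not trigger the alert; it starts firing only once the remaining time drops below that.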
Security Considerations and Best Practices
Trust Domain Segmentation
apiVersion: v1
kind: ConfigMap
metadata:
  name: trust-domain-security-policy
  namespace: spire-system
data:
  security-policy.rego: |
    package spire.federation.security

    import future.keywords.contains
    import future.keywords.if
    import future.keywords.in

    # Default deny all cross-domain access
    default allow_federation = false

    # Production trust domains
    production_domains := {
      "aws.prod.company.com",
      "gcp.prod.company.com",
      "onprem.prod.company.com"
    }

    # Staging trust domains
    staging_domains := {
      "aws.staging.company.com",
      "gcp.staging.company.com"
    }

    # Partner trust domains
    partner_domains := {
      "partner.example.com",
      "vendor.supplier.com"
    }

    # Allow federation within production environments
    allow_federation {
      input.source_trust_domain in production_domains
      input.target_trust_domain in production_domains
      production_security_checks
    }

    # Allow limited staging to production access
    allow_federation {
      input.source_trust_domain in staging_domains
      input.target_trust_domain in production_domains
      staging_to_production_checks
    }

    # Partner access with strict controls
    allow_federation {
      input.source_trust_domain in partner_domains
      input.target_trust_domain in production_domains
      partner_access_checks
      business_hours_check
    }

    production_security_checks {
      # Require strong attestation
      input.attestation_strength == "high"

      # Require recent certificate
      time.now_ns() - input.certificate_issued_time < (24 * 60 * 60 * 1000000000) # 24 hours

      # Verify certificate chain
      input.certificate_chain_valid == true
    }

    staging_to_production_checks {
      # More restrictive for staging access
      input.attestation_strength == "high"
      input.purpose in ["testing", "development", "ci-cd"]

      # Time-based restrictions
      business_hours_check
    }

    partner_access_checks {
      # Very strict for partners
      input.attestation_strength == "high"
      input.partner_approval == true

      # Specific service restrictions
      input.target_service in allowed_partner_services

      # IP allowlist: test CIDR containment, not string equality
      net.cidr_contains(partner_allowed_ips[_], input.source_ip)
    }

    business_hours_check {
      # time.clock returns [hour, minute, second] in UTC
      clock := time.clock(time.now_ns())
      hour := clock[0]
      hour >= 9
      hour <= 17
    }

    allowed_partner_services := [
      "api-gateway",
      "webhook-receiver",
      "data-export"
    ]

    partner_allowed_ips := [
      "203.0.113.0/24",
      "198.51.100.0/24"
    ]
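Two of the checks in this policy are easy to get subtly wrong: matching a source address against CIDR ranges needs a containment test rather than string membership, and the business-hours window depends on extracting a whole UTC hour. The following Python sketch reproduces both checks so they can be exercised outside the policy engine. The function names are hypothetical; the CIDR ranges and the 9 to 17 window are taken from the policy above.

```python
import ipaddress
from datetime import datetime, timezone

# Same ranges as partner_allowed_ips in the policy.
PARTNER_ALLOWED_CIDRS = ["203.0.113.0/24", "198.51.100.0/24"]


def source_ip_allowed(source_ip: str, cidrs=PARTNER_ALLOWED_CIDRS) -> bool:
    """True when source_ip falls inside at least one allowed CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def within_business_hours(now: datetime) -> bool:
    """True when the UTC hour of a timezone-aware datetime is 9 through 17."""
    hour = now.astimezone(timezone.utc).hour
    return 9 <= hour <= 17
```

Note that `"203.0.113.42" in PARTNER_ALLOWED_CIDRS` is `False` even though the address belongs to the first range, which is why the containment test matters. The upper bound is inclusive on the hour, so the window stays open until 17:59:59 UTC.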
Certificate Rotation in Federated Environments
apiVersion: batch/v1
kind: CronJob
metadata:
  name: federation-cert-rotation
  namespace: spire-system
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: spire-server
          containers:
            - name: cert-rotator
              image: company/spire-cert-rotator:v1.0.0
              env:
                - name: TRUST_DOMAIN
                  value: "aws.company.com"
                - name: FEDERATED_DOMAINS
                  value: "gcp.company.com,onprem.company.com,partner.example.com"
                - name: ROTATION_THRESHOLD_DAYS
                  value: "30"
              command:
                - /bin/sh
                - -c
                - |
                  # Check certificate expiry across all federated domains
                  for domain in $(echo "$FEDERATED_DOMAINS" | tr ',' ' '); do
                    echo "Checking federation with $domain..."

                    # Get the federated bundle in PEM form so openssl can read it
                    kubectl exec -n spire-system spire-server-0 -c spire-server -- \
                      /opt/spire/bin/spire-server bundle list \
                      -id "spiffe://$domain" -format pem \
                      -socketPath /tmp/spire-server/private/api.sock \
                      > "/tmp/${domain}-bundle.pem"

                    # Check expiry (openssl reads the first certificate in the file)
                    expiry=$(openssl x509 -in "/tmp/${domain}-bundle.pem" -noout -enddate | cut -d= -f2)
                    expiry_epoch=$(date -d "$expiry" +%s)
                    current_epoch=$(date +%s)
                    days_until_expiry=$(( (expiry_epoch - current_epoch) / 86400 ))

                    if [ "$days_until_expiry" -lt "$ROTATION_THRESHOLD_DAYS" ]; then
                      echo "Certificate for $domain expires in $days_until_expiry days, triggering rotation..."

                      # Trigger bundle refresh
                      kubectl exec -n spire-system spire-server-0 -c spire-server -- \
                        /opt/spire/bin/spire-server federation refresh \
                        -id "spiffe://$domain" \
                        -socketPath /tmp/spire-server/private/api.sock

                      # Notify monitoring
                      curl -X POST http://alertmanager.monitoring.svc.cluster.local:9093/api/v2/alerts \
                        -H "Content-Type: application/json" \
                        -d "[{
                          \"labels\": {
                            \"alertname\": \"SPIREFederationCertRotation\",
                            \"trust_domain\": \"$domain\",
                            \"severity\": \"info\"
                          },
                          \"annotations\": {
                            \"summary\": \"Federation certificate rotated for $domain\"
                          }
                        }]"
                    else
                      echo "Certificate for $domain is valid for $days_until_expiry more days"
                    fi
                  done
          restartPolicy: OnFailure
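The expiry check in the job comes down to parsing the `notAfter=` line that `openssl x509 -noout -enddate` prints and comparing whole days against the threshold. Below is a minimal Python sketch of that calculation, assuming the usual `notAfter=Jun  1 12:00:00 2025 GMT` output format and an English locale for month names. The function names are hypothetical, and the 30-day default matches `ROTATION_THRESHOLD_DAYS` above.

```python
from datetime import datetime, timezone


def parse_openssl_enddate(line: str) -> datetime:
    """Parse 'notAfter=Jun  1 12:00:00 2025 GMT' into an aware UTC datetime."""
    value = line.strip().split("=", 1)[1]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")
    return parsed.replace(tzinfo=timezone.utc)


def whole_days_until(not_after: datetime, now: datetime) -> int:
    """Whole days remaining, truncated toward zero like shell arithmetic."""
    return int((not_after - now).total_seconds() / 86400)


def needs_rotation(not_after: datetime, now: datetime, threshold_days: int = 30) -> bool:
    """True when fewer than threshold_days whole days remain."""
    return whole_days_until(not_after, now) < threshold_days
```

Passing `now` explicitly keeps the logic testable. In the job itself the equivalent value comes from `date +%s`.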
Troubleshooting Federation Issues
Common Federation Problems and Solutions
# Federation troubleshooting commands

# 1. Check federation status
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server federation list

# 2. Verify bundle endpoint connectivity (requires curl in the server image)
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  curl -v https://spire-bundle.gcp.company.com:8443

# 3. Check trust bundle content
kubectl exec -n spire-system spire-server-0 -c spire-server -- \
  /opt/spire/bin/spire-server bundle list -id spiffe://gcp.company.com -format spiffe

# 4. Test cross-domain SVID validation
#    (run against an agent pod; replace spire-agent-xxx with a real pod name and
#    add -socketPath if the agent socket is not at the default location)
kubectl exec -n spire-system spire-agent-xxx -c spire-agent -- \
  /opt/spire/bin/spire-agent api validate jwt \
  -audience spiffe://aws.company.com/frontend \
  -svid "$(cat /tmp/test-svid.jwt)"

# 5. Check federation logs
kubectl logs -n spire-system spire-server-0 -c spire-server | grep -i federation

# 6. Verify network connectivity between clusters
kubectl run federation-test --rm -i --tty --image=curlimages/curl -- \
  curl -v https://spire-bundle.gcp.company.com:8443

# 7. Check DNS resolution
kubectl run dns-test --rm -i --tty --image=busybox -- \
  nslookup spire-bundle.gcp.company.com

# 8. Test certificate chain validation
openssl s_client -connect spire-bundle.gcp.company.com:8443 -servername spire-bundle.gcp.company.com

# 9. Verify SPIFFE ID in cross-domain communication
kubectl exec -n distributed-app frontend-xxx -- \
  openssl s_client -connect data.gcp.company.com:8443 -servername data.gcp.company.com -showcerts
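When inspecting a peer certificate from step 9, the SPIFFE ID appears as a URI in the Subject Alternative Name, and the part that matters for federation is its trust domain. A small Python sketch for checking an extracted ID against the set of federated trust domains follows. The function names and the example workload path are hypothetical, and the parsing is deliberately simplified rather than a full implementation of the SPIFFE ID specification.

```python
from urllib.parse import urlsplit


def trust_domain_of(spiffe_id: str) -> str:
    """Return the trust domain (authority component) of a SPIFFE ID."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.hostname:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parts.hostname


def is_from_trusted_domain(spiffe_id: str, trusted_domains: set) -> bool:
    """True when the ID's trust domain is one we federate with."""
    return trust_domain_of(spiffe_id) in trusted_domains


if __name__ == "__main__":
    federated = {"gcp.company.com", "onprem.company.com"}
    peer_id = "spiffe://gcp.company.com/data-service"  # hypothetical workload ID
    print(trust_domain_of(peer_id), is_from_trusted_domain(peer_id, federated))
```

An ID from a domain outside the federated set is rejected by the membership test, while anything that is not a `spiffe://` URI raises `ValueError`, so a malformed SAN is never mistaken for an untrusted but well-formed one.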
Conclusion
Multi-cluster SPIFFE federation transforms isolated identity silos into a unified, enterprise-scale zero-trust architecture. By implementing federation, organizations can:
- ✅ Enable Seamless Cross-Cloud Communication: Workloads authenticate across any infrastructure boundary
- ✅ Maintain Cryptographic Trust: Federation relationships are based on verifiable certificates, not network controls
- ✅ Scale Identity Management: Central policies with distributed enforcement across all environments
- ✅ Meet Compliance Requirements: Satisfy data residency and regulatory requirements while maintaining security
- ✅ Simplify Operations: Reduce VPN complexity and eliminate credential sprawl across environments
The patterns and examples in this guide provide a foundation for building production-grade federated identity systems that can scale from small multi-cluster deployments to global enterprise architectures spanning clouds, edge locations, and partner organizations.
In our next post, we’ll explore GitOps patterns for managing SPIFFE/SPIRE configurations, showing how to implement infrastructure-as-code practices for identity management at scale.
Additional Resources
- SPIFFE Federation Specification
- SPIRE Federation Guide
- Multi-Cluster Service Mesh with SPIRE
- SPIFFE Trust Domain Best Practices
Building a federated SPIFFE architecture for your organization? The SPIFFE community provides extensive support for enterprise federation deployments and complex multi-cloud scenarios.