Building a Secure Service Mesh Without Kubernetes Using SPIFFE, SPIRE, and Cilium
While Kubernetes has become the de facto standard for container orchestration, many organizations still run services on traditional virtual machines or have specific requirements that make Kubernetes adoption challenging. This guide demonstrates how to implement a secure, zero-trust service mesh on Linux VMs without Kubernetes, using industry-standard components like SPIFFE/SPIRE for identity management, Cilium for networking, and private DNS for service discovery.
Architecture Overview
Our architecture layers several independent protections to provide defense in depth:
graph TD
    User[User Terminal] -->|Commands| VM1[Control Plane VM]
    VM1 -->|Manages| VM2[Service Node 1]
    VM1 -->|Manages| VM3[Service Node 2]

    subgraph "Control Plane Components"
        SPIRE[SPIRE Server]
        DNS[CoreDNS]
        CiliumC[Cilium Controller]
    end

    subgraph "Service Node Components"
        AGENT1[SPIRE Agent]
        CILIUM1[Cilium Agent]
        SVC1[Service 1]

        AGENT2[SPIRE Agent]
        CILIUM2[Cilium Agent]
        SVC2[Service 2]
    end

    SPIRE -->|Issues Identity| AGENT1
    SPIRE -->|Issues Identity| AGENT2

    CiliumC -->|Network Policy| CILIUM1
    CiliumC -->|Network Policy| CILIUM2

    SVC1 -->|mTLS| SVC2
    SVC1 -->|DNS Lookup| DNS
    SVC2 -->|DNS Lookup| DNS
This architecture implements:
- Identity and Access Management: SPIFFE/SPIRE for cryptographic service identity
- Network Security: Cilium for policy enforcement and segmentation
- Secure Communication: mTLS for all service interactions
- Service Discovery: Private DNS for internal name resolution
Prerequisites
Before starting the implementation, ensure you have:
- Linux VMs: Ubuntu 20.04+ or similar modern distribution
- Hardware Requirements:
  - Minimum 4 cores and 8GB RAM per VM
  - 50GB available storage
  - x86_64 architecture
- Network Connectivity: All VMs must be able to communicate with each other
- Root Access: Administrative privileges on all VMs
Implementation Steps
Our implementation follows a modular approach, with each component fulfilling a specific security function within the mesh.
1. Setting Up Cilium for Service Mesh
Cilium provides networking, security, and observability capabilities for our service mesh. It will enforce network policies based on SPIFFE identities.
# Install system dependencies
sudo apt update && sudo apt install -y curl wget tar jq git build-essential \
    pkg-config libssl-dev linux-headers-$(uname -r)

# Install Cilium CLI
export CILIUM_VERSION="1.14.0"
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin

# Configure Cilium (standalone mode)
sudo mkdir -p /etc/cilium
cat << EOF | sudo tee /etc/cilium/config.yaml
---
cluster-name: standalone
cluster-id: 1
ipam:
  mode: "cluster-pool"
  operator:
    clusterPoolIPv4PodCIDR: "10.0.0.0/16"
tunnel: disabled
enableIPv4Masquerade: true
enableIdentityMark: true
endpointRoutes:
  enabled: true
EOF

# Initialize Cilium
# Note: the cilium CLI's install command is designed around a Kubernetes
# cluster, so check these flags against the documentation for your Cilium
# release before relying on them on plain VMs.
sudo cilium install --config /etc/cilium/config.yaml \
    --version ${CILIUM_VERSION#v} \
    --set enable-l7-proxy=true \
    --set enable-identity=true \
    --set enable-host-reachable-services=true

# Create systemd service
cat << EOF | sudo tee /etc/systemd/system/cilium.service
[Unit]
Description=Cilium Agent
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/cilium-agent --config-dir=/etc/cilium
Restart=always
User=root

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable cilium
sudo systemctl start cilium
2. Configuring Private DNS with CoreDNS
A private DNS server is essential for service discovery within our mesh. We’ll use CoreDNS for this purpose.
# Install CoreDNS
export COREDNS_VERSION="1.10.0"
wget https://github.com/coredns/coredns/releases/download/v${COREDNS_VERSION}/coredns_${COREDNS_VERSION}_linux_amd64.tgz
tar xzf coredns_${COREDNS_VERSION}_linux_amd64.tgz
sudo mv coredns /usr/local/bin/

# Create directories
sudo mkdir -p /etc/coredns/zones

# Create CoreDNS configuration
cat << EOF | sudo tee /etc/coredns/Corefile
internal.cluster.local {
    file /etc/coredns/zones/internal.cluster.local
    cache {
        success 3600
        denial 300
    }
    health :8091
    prometheus :9153
    errors
    log {
        class error
    }
    reload 10s
}

. {
    forward . /etc/resolv.conf
    cache 30
    errors
    log
}
EOF

# Create zone file
cat << EOF | sudo tee /etc/coredns/zones/internal.cluster.local
\$ORIGIN internal.cluster.local.
\$TTL 3600
@   IN  SOA ns.internal.cluster.local. admin.internal.cluster.local. (
        2023121501 ; serial
        7200       ; refresh
        3600       ; retry
        1209600    ; expire
        3600       ; minimum
)

@   IN  NS  ns.internal.cluster.local.
ns  IN  A   127.0.0.1

; Add service entries here
service1    IN  A   10.0.1.1
service2    IN  A   10.0.1.2
EOF

# Create systemd service
cat << EOF | sudo tee /etc/systemd/system/coredns.service
[Unit]
Description=CoreDNS DNS server
Documentation=https://coredns.io
After=network.target

[Service]
ExecStart=/usr/local/bin/coredns -conf /etc/coredns/Corefile
Restart=on-failure
User=root
AmbientCapabilities=CAP_NET_BIND_SERVICE
LimitNOFILE=1048576

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable coredns
sudo systemctl start coredns
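As the number of services grows, the "Add service entries here" part of the zone file is a natural candidate for generation rather than hand editing. The following is a minimal sketch, assuming a simple name-to-IPv4 inventory; `renderARecords` is a hypothetical helper, not part of CoreDNS, and it emits only bare A records without per-record TTLs.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strings"
)

// renderARecords is a hypothetical helper that turns a name-to-IPv4 map into
// zone-file A records, sorted by name so the output is stable between runs.
func renderARecords(services map[string]string) (string, error) {
	names := make([]string, 0, len(services))
	for name := range services {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		ip := net.ParseIP(services[name])
		if ip == nil || ip.To4() == nil {
			return "", fmt.Errorf("%s: %q is not an IPv4 address", name, services[name])
		}
		fmt.Fprintf(&b, "%s IN A %s\n", name, ip.To4())
	}
	return b.String(), nil
}

func main() {
	// Same illustrative addresses as the zone file above.
	records, err := renderARecords(map[string]string{
		"service1": "10.0.1.1",
		"service2": "10.0.1.2",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(records)
}
```

Because the Corefile above uses `reload 10s`, remember to bump the SOA serial whenever the generated records change so CoreDNS picks up the new zone contents.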
3. Installing SPIRE Server and Agents
SPIFFE/SPIRE provides identity management for our service mesh, enabling zero-trust authentication between services.
# Download and install SPIRE
# (confirm the archive name for your version on the SPIRE releases page)
export SPIRE_VERSION="1.8.0"
curl -s -N -L https://github.com/spiffe/spire/releases/download/v${SPIRE_VERSION}/spire-${SPIRE_VERSION}-linux-x86_64-glibc.tar.gz | tar xz
cd spire-${SPIRE_VERSION}

# Create directories
sudo mkdir -p /opt/spire/{bin,conf,data}
sudo cp -r bin/* /opt/spire/bin/

# Configure SPIRE Server (on control plane VM)
# Note: "plugins" is a top-level block, a sibling of "server"
cat << EOF | sudo tee /opt/spire/conf/server.conf
server {
    bind_address = "0.0.0.0"
    bind_port = "8081"
    trust_domain = "internal.cluster.local"
    data_dir = "/opt/spire/data"
    log_level = "DEBUG"
    ca_ttl = "168h"
    default_x509_svid_ttl = "24h"
}

plugins {
    DataStore "sql" {
        plugin_data {
            database_type = "sqlite3"
            connection_string = "/opt/spire/data/datastore.sqlite3"
        }
    }

    KeyManager "disk" {
        plugin_data {
            keys_path = "/opt/spire/data/keys.json"
        }
    }

    NodeAttestor "join_token" {
        plugin_data {}
    }
}
EOF

# Create SPIRE server service
cat << EOF | sudo tee /etc/systemd/system/spire-server.service
[Unit]
Description=SPIRE Server
After=network.target

[Service]
ExecStart=/opt/spire/bin/spire-server run -config /opt/spire/conf/server.conf
Restart=always
User=root
WorkingDirectory=/opt/spire

[Install]
WantedBy=multi-user.target
EOF

# Start SPIRE server
sudo systemctl daemon-reload
sudo systemctl enable spire-server
sudo systemctl start spire-server

# Generate a join token for agents. Tokens are single-use, so run this once
# per service VM. The -spiffeID flag gives the attested agent the ID that the
# registration entries in step 4 use as their parent. The command prints
# "Token: <value>", so keep only the second field.
export SPIRE_JOIN_TOKEN=$(sudo /opt/spire/bin/spire-server token generate \
    -spiffeID spiffe://internal.cluster.local/host -ttl 3600 | awk '{print $2}')
echo $SPIRE_JOIN_TOKEN # Save this for agent setup
Next, set up the SPIRE agent on each service VM:
# Configure SPIRE agent (on service VMs)
# Install the SPIRE binaries under /opt/spire on each service VM first, and
# set SPIRE_JOIN_TOKEN in this shell to the token generated on the control plane.
cat << EOF | sudo tee /opt/spire/conf/agent.conf
agent {
    data_dir = "/opt/spire/data/agent"
    log_level = "DEBUG"
    server_address = "CONTROL_PLANE_IP" # Replace with actual IP
    server_port = "8081"
    socket_path = "/tmp/spire-agent/public/api.sock"
    trust_domain = "internal.cluster.local"
    # The join token is an agent-level setting, not NodeAttestor plugin data
    join_token = "${SPIRE_JOIN_TOKEN}"
    # Assumed path: save the output of "spire-server bundle show" from the
    # control plane here so the agent can verify the server on first contact
    trust_bundle_path = "/opt/spire/conf/bootstrap.crt"
}

plugins {
    NodeAttestor "join_token" {
        plugin_data {}
    }

    KeyManager "disk" {
        plugin_data {
            directory = "/opt/spire/data/agent"
        }
    }

    WorkloadAttestor "unix" {
        plugin_data {}
    }
}
EOF

# Create SPIRE agent service
cat << EOF | sudo tee /etc/systemd/system/spire-agent.service
[Unit]
Description=SPIRE Agent
After=network.target

[Service]
ExecStart=/opt/spire/bin/spire-agent run -config /opt/spire/conf/agent.conf
Restart=always
User=root
WorkingDirectory=/opt/spire

[Install]
WantedBy=multi-user.target
EOF

# Start SPIRE agent
sudo systemctl daemon-reload
sudo systemctl enable spire-agent
sudo systemctl start spire-agent
4. Assigning SPIFFE IDs to Services
Next, we need to define SPIFFE identities for our services:
# On the control plane VM, create entries for services
sudo /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://internal.cluster.local/service1 \
    -selector unix:user:service1 \
    -parentID spiffe://internal.cluster.local/host

sudo /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://internal.cluster.local/service2 \
    -selector unix:user:service2 \
    -parentID spiffe://internal.cluster.local/host

# Create service users on the respective VMs
sudo useradd -r -s /bin/false service1 # On VM running service1
sudo useradd -r -s /bin/false service2 # On VM running service2
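To make the shape of these identifiers concrete, the sketch below splits a SPIFFE ID into its trust domain and path using only the Go standard library. `parseSPIFFEID` is a hypothetical helper written for illustration; it applies a simplified subset of the SPIFFE ID rules, and real services should rely on the `spiffeid` package from go-spiffe instead.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseSPIFFEID is a hypothetical helper (not part of SPIRE or go-spiffe).
// It performs a simplified structural check and returns the trust domain
// and the workload path.
func parseSPIFFEID(id string) (trustDomain, path string, err error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "spiffe" {
		return "", "", errors.New("scheme must be spiffe")
	}
	if u.Host == "" || u.Port() != "" || u.User != nil {
		return "", "", errors.New("trust domain must be a bare host name")
	}
	if u.RawQuery != "" || u.Fragment != "" {
		return "", "", errors.New("query and fragment are not allowed")
	}
	if strings.HasSuffix(u.Path, "/") {
		return "", "", errors.New("path must not end with a slash")
	}
	return u.Host, u.Path, nil
}

func main() {
	for _, id := range []string{
		"spiffe://internal.cluster.local/service1",
		"https://internal.cluster.local/service1",
	} {
		td, path, err := parseSPIFFEID(id)
		if err != nil {
			fmt.Printf("%s: invalid (%v)\n", id, err)
			continue
		}
		fmt.Printf("%s: trust domain=%s path=%s\n", id, td, path)
	}
}
```

The trust domain must match the `trust_domain` configured on the SPIRE server and agents; the path is what distinguishes service1 from service2.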
5. Configuring Cilium Network Policies
Now, we’ll set up Cilium network policies based on SPIFFE identities:
# Create policy directory
sudo mkdir -p /etc/cilium/policies

# Create basic network policy
cat << EOF | sudo tee /etc/cilium/policies/basic.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "secure-service-policy"
spec:
  endpointSelector:
    matchLabels:
      "spiffe.io/spiffeid": "spiffe://internal.cluster.local/service1"
  ingress:
  - fromEndpoints:
    - matchLabels:
        "spiffe.io/spiffeid": "spiffe://internal.cluster.local/service2"
  egress:
  - toEndpoints:
    - matchLabels:
        "spiffe.io/spiffeid": "spiffe://internal.cluster.local/service2"
EOF

# Apply policy
cilium policy import /etc/cilium/policies/basic.yaml
6. Configuring Services for mTLS
Finally, we need to configure our services to use mTLS with SPIFFE identities. This example uses the Go programming language:
package main

import (
	"context"
	"fmt"
	"net/http"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

// Workload API socket exposed by the SPIRE agent (socket_path in agent.conf)
const socketPath = "unix:///tmp/spire-agent/public/api.sock"

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Create an X.509 source that fetches SVIDs and trust bundles from the
	// Workload API and keeps them up to date as they rotate
	source, err := workloadapi.NewX509Source(ctx,
		workloadapi.WithClientOptions(workloadapi.WithAddr(socketPath)))
	if err != nil {
		panic(err)
	}
	defer source.Close()

	// Create SPIFFE ID for authorized service: only service2 may call this server
	authorizedID := spiffeid.RequireFromString("spiffe://internal.cluster.local/service2")

	// Create TLS configuration for server
	serverConfig := tlsconfig.MTLSServerConfig(source, source, tlsconfig.AuthorizeID(authorizedID))
	server := &http.Server{
		Addr:      ":8443",
		TLSConfig: serverConfig,
		Handler:   http.HandlerFunc(handler),
	}

	// Start the server; the certificate and key come from TLSConfig,
	// so the file arguments stay empty
	fmt.Println("Starting secure server on :8443")
	if err := server.ListenAndServeTLS("", ""); err != nil {
		panic(err)
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "Hello from secure service\n")
}
And for the client:
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

// Workload API socket exposed by the SPIRE agent (socket_path in agent.conf)
const socketPath = "unix:///tmp/spire-agent/public/api.sock"

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Create an X.509 source that fetches SVIDs and trust bundles from the
	// Workload API and keeps them up to date as they rotate
	source, err := workloadapi.NewX509Source(ctx,
		workloadapi.WithClientOptions(workloadapi.WithAddr(socketPath)))
	if err != nil {
		panic(err)
	}
	defer source.Close()

	// Create SPIFFE ID for server: the peer must present service1's identity
	serverID := spiffeid.RequireFromString("spiffe://internal.cluster.local/service1")

	// Create TLS configuration for client
	clientConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(serverID))
	httpClient := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: clientConfig,
		},
	}

	// Make the request
	resp, err := httpClient.Get("https://service1.internal.cluster.local:8443")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Read and print the response
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
Security Benefits and Considerations
This architecture provides multiple layers of security:
Identity and Access Management
- Every service gets a cryptographic identity via SPIFFE/SPIRE
- All service-to-service communication is authenticated
- Zero-trust model where identity is the foundation of security
Network Security
- All traffic between services is encrypted with mTLS
- Network policies are enforced based on identities, not just IPs
- L3/L4/L7 policy enforcement through Cilium
- Network microsegmentation between services
Service Discovery and DNS Security
- Internal service resolution through private DNS
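- Protection against DNS spoofing and man-in-the-middle attacks: a peer reached through a forged DNS answer still fails the mTLS handshake unless it presents the expected SPIFFE ID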
- Secure mapping between service names and actual endpoints
Best Practices
When implementing this architecture in production, follow these best practices:
Certificate Management
- Keep SVID lifetimes short (24-48 hours) so that SPIRE rotates certificates frequently
- Implement proper certificate revocation procedures
- Monitor certificate expirations and renewals
Network Policies
- Start with default-deny policies and gradually open required paths
- Regularly audit network policies
- Limit egress traffic to only necessary destinations
- Use specific L7 policies where appropriate
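The default-deny idea can be reasoned about as a plain allow-list keyed by caller and callee identity. The following is a toy model, not Cilium's policy engine: the `flow` and `policy` types are hypothetical, and the entries mirror the service1/service2 paths opened by the policy in step 5.

```go
package main

import "fmt"

// flow identifies a caller/callee pair by SPIFFE ID.
type flow struct{ from, to string }

// policy is a default-deny allow-list: any flow that is not listed is rejected.
type policy map[flow]bool

// allowed reports whether the policy explicitly permits the flow.
func (p policy) allowed(from, to string) bool {
	return p[flow{from, to}]
}

func main() {
	const (
		svc1 = "spiffe://internal.cluster.local/service1"
		svc2 = "spiffe://internal.cluster.local/service2"
		svc3 = "spiffe://internal.cluster.local/service3" // hypothetical third service
	)

	p := policy{
		{svc2, svc1}: true, // ingress to service1 from service2
		{svc1, svc2}: true, // egress from service1 to service2
	}

	fmt.Println("service2 -> service1:", p.allowed(svc2, svc1))
	fmt.Println("service3 -> service1:", p.allowed(svc3, svc1))
}
```

Starting from an empty map and adding one entry per required path is the programmatic equivalent of "gradually open required paths": nothing is reachable until someone writes it down.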
Monitoring and Alerting
- Set up monitoring for all components
- Create alerts for security-relevant events
- Monitor for unauthorized access attempts
- Track certificate renewal failures
Regular Maintenance
- Keep all components updated with security patches
- Perform regular security audits
- Test recovery procedures
- Document all configurations and procedures
Troubleshooting
If you encounter issues with your implementation, here are some common troubleshooting steps:
SPIRE Connectivity Issues
# Check SPIRE server health
sudo /opt/spire/bin/spire-server healthcheck

# Verify SPIRE agent connection
sudo /opt/spire/bin/spire-agent healthcheck

# Review agent logs
sudo journalctl -u spire-agent -f
DNS Resolution Problems
# Test DNS resolution
dig @localhost service1.internal.cluster.local

# Check CoreDNS logs
sudo journalctl -u coredns -f
Cilium Policy Enforcement Issues
# Check Cilium status
cilium status

# View applied policies
cilium policy get

# Monitor dropped packets
cilium monitor --type drop
Conclusion
Building a secure service mesh without Kubernetes is entirely possible with the right components. By combining SPIFFE/SPIRE for identity management, Cilium for networking and security, and CoreDNS for service discovery, you can create a robust, zero-trust architecture that provides many of the security benefits typically associated with service mesh implementations in Kubernetes.
This approach is particularly valuable for organizations with traditional VM-based infrastructure, regulated environments with specific security requirements, or specialized use cases where Kubernetes may not be the optimal solution.
The architecture described here provides a solid foundation that you can adapt and extend to meet your specific needs, ensuring secure communication between services in your environment.