
Building a Secure Service Mesh Without Kubernetes Using SPIFFE, SPIRE, and Cilium


While Kubernetes has become the de facto standard for container orchestration, many organizations still run services on traditional virtual machines or have specific requirements that make Kubernetes adoption challenging. This guide demonstrates how to implement a secure, zero-trust service mesh on Linux VMs without Kubernetes, using industry-standard components like SPIFFE/SPIRE for identity management, Cilium for networking, and private DNS for service discovery.

Architecture Overview

Our architecture provides defense in depth through multiple layers of protection:

graph TD
    User[User Terminal] -->|Commands| VM1[Control Plane VM]
    VM1 -->|Manages| VM2[Service Node 1]
    VM1 -->|Manages| VM3[Service Node 2]

    subgraph "Control Plane Components"
        SPIRE[SPIRE Server]
        DNS[CoreDNS]
        CiliumC[Cilium Controller]
    end

    subgraph "Service Node Components"
        AGENT1[SPIRE Agent]
        CILIUM1[Cilium Agent]
        SVC1[Service 1]

        AGENT2[SPIRE Agent]
        CILIUM2[Cilium Agent]
        SVC2[Service 2]
    end

    SPIRE -->|Issues Identity| AGENT1
    SPIRE -->|Issues Identity| AGENT2

    CiliumC -->|Network Policy| CILIUM1
    CiliumC -->|Network Policy| CILIUM2

    SVC1 -->|mTLS| SVC2
    SVC1 -->|DNS Lookup| DNS
    SVC2 -->|DNS Lookup| DNS

This architecture implements:

  1. Identity and Access Management: SPIFFE/SPIRE for cryptographic service identity
  2. Network Security: Cilium for policy enforcement and segmentation
  3. Secure Communication: mTLS for all service interactions
  4. Service Discovery: Private DNS for internal name resolution
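
Throughout the guide, every node and workload is named by a SPIFFE ID under a single trust domain that matches the private DNS zone configured later. The identities used in the examples are:

spiffe://internal.cluster.local/host        # node identity assigned to attested SPIRE agents
spiffe://internal.cluster.local/service1    # workload identity for service1
spiffe://internal.cluster.local/service2    # workload identity for service2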

Prerequisites

Before starting the implementation, ensure you have:

  - Two or more Linux VMs: one control plane node and at least one service node
  - Root (sudo) access on every VM
  - Network connectivity between the VMs and outbound internet access to download the components
  - Basic familiarity with systemd, DNS, and TLS concepts

Implementation Steps

Our implementation follows a modular approach, with each component fulfilling a specific security function within the mesh.

1. Setting Up Cilium for Service Mesh

Cilium provides networking, security, and observability capabilities for our service mesh. It will enforce network policies based on SPIFFE identities.

# Install system dependencies
sudo apt update && sudo apt install -y curl wget tar jq git build-essential \
    pkg-config libssl-dev linux-headers-$(uname -r)

# Install Cilium CLI
export CILIUM_VERSION="1.14.0"
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin

# Configure Cilium (standalone mode)
sudo mkdir -p /etc/cilium
cat << EOF | sudo tee /etc/cilium/config.yaml
---
cluster-name: standalone
cluster-id: 1
ipam:
  mode: "cluster-pool"
  operator:
    clusterPoolIPv4PodCIDR: "10.0.0.0/16"
tunnel: disabled
enableIPv4Masquerade: true
enableIdentityMark: true
endpointRoutes:
  enabled: true
EOF

# Initialize Cilium
sudo cilium install --config /etc/cilium/config.yaml \
    --version ${CILIUM_VERSION#v} \
    --set enable-l7-proxy=true \
    --set enable-identity=true \
    --set enable-host-reachable-services=true

# Create systemd service (this unit expects the cilium-agent binary at /usr/local/bin/cilium-agent)
cat << EOF | sudo tee /etc/systemd/system/cilium.service
[Unit]
Description=Cilium Agent
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/cilium-agent --config-dir=/etc/cilium
Restart=always
User=root

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable cilium
sudo systemctl start cilium
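
Before moving on, it is worth confirming the agent came up cleanly. A quick sanity check, assuming the cilium binaries installed above are on the PATH:

# Confirm the systemd unit is running
sudo systemctl status cilium --no-pager

# Check overall agent health and the list of managed endpoints
cilium status
cilium endpoint list

# Follow the agent logs if anything looks wrong
sudo journalctl -u cilium -f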

2. Configuring Private DNS with CoreDNS

A private DNS server is essential for service discovery within our mesh. We’ll use CoreDNS for this purpose.

# Install CoreDNS
export COREDNS_VERSION="1.10.0"
wget https://github.com/coredns/coredns/releases/download/v${COREDNS_VERSION}/coredns_${COREDNS_VERSION}_linux_amd64.tgz
tar xzf coredns_${COREDNS_VERSION}_linux_amd64.tgz
sudo mv coredns /usr/local/bin/

# Create directories
sudo mkdir -p /etc/coredns/zones

# Create CoreDNS configuration
cat << EOF | sudo tee /etc/coredns/Corefile
internal.cluster.local {
    file /etc/coredns/zones/internal.cluster.local
    cache {
        success 3600
        denial 300
    }
    health :8091
    prometheus :9153
    errors
    log {
        class error
    }
    reload 10s
}

. {
    forward . /etc/resolv.conf
    cache 30
    errors
    log
}
EOF

# Create zone file
cat << EOF | sudo tee /etc/coredns/zones/internal.cluster.local
\$ORIGIN internal.cluster.local.
\$TTL 3600
@       IN      SOA     ns.internal.cluster.local. admin.internal.cluster.local. (
                        2023121501 ; serial
                        7200       ; refresh
                        3600       ; retry
                        1209600    ; expire
                        3600       ; minimum
)

@       IN      NS      ns.internal.cluster.local.
ns      IN      A       127.0.0.1

; Add service entries here
service1 IN      A       10.0.1.1
service2 IN      A       10.0.1.2
EOF

# Create systemd service
cat << EOF | sudo tee /etc/systemd/system/coredns.service
[Unit]
Description=CoreDNS DNS server
Documentation=https://coredns.io
After=network.target

[Service]
ExecStart=/usr/local/bin/coredns -conf /etc/coredns/Corefile
Restart=on-failure
User=root
AmbientCapabilities=CAP_NET_BIND_SERVICE
LimitNOFILE=1048576

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable coredns
sudo systemctl start coredns
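
Each VM also needs to use this resolver for the internal zone. One way to do that on hosts running systemd-resolved is a drop-in that routes the internal domain to CoreDNS; the sketch below assumes CoreDNS is reachable from the service VMs at the control plane address (replace CONTROL_PLANE_IP) on the default DNS port:

# Route lookups for the internal zone to CoreDNS (systemd-resolved hosts)
sudo mkdir -p /etc/systemd/resolved.conf.d
cat << EOF | sudo tee /etc/systemd/resolved.conf.d/internal-zone.conf
[Resolve]
DNS=CONTROL_PLANE_IP
Domains=~internal.cluster.local
EOF
sudo systemctl restart systemd-resolved

# Verify resolution of a service entry
dig @CONTROL_PLANE_IP service1.internal.cluster.local +short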

3. Installing SPIRE Server and Agents

SPIFFE/SPIRE provides identity management for our service mesh, enabling zero-trust authentication between services.

# Download and install SPIRE
export SPIRE_VERSION="1.8.0"
curl -s -N -L https://github.com/spiffe/spire/releases/download/v${SPIRE_VERSION}/spire-${SPIRE_VERSION}-linux-x86_64-glibc.tar.gz | tar xz
cd spire-${SPIRE_VERSION}

# Create directories
sudo mkdir -p /opt/spire/{bin,conf,data}
sudo cp -r bin/* /opt/spire/bin/

# Configure SPIRE Server (on control plane VM)
cat << EOF | sudo tee /opt/spire/conf/server.conf
server {
    bind_address = "0.0.0.0"
    bind_port = "8081"
    trust_domain = "internal.cluster.local"
    data_dir = "/opt/spire/data"
    log_level = "DEBUG"
    ca_ttl = "168h"
    default_svid_ttl = "24h"

    plugins {
        DataStore "sql" {
            plugin_data {
                database_type = "sqlite3"
                connection_string = "/opt/spire/data/datastore.sqlite3"
            }
        }

        KeyManager "disk" {
            plugin_data {
                keys_path = "/opt/spire/data/keys.json"
            }
        }

        NodeAttestor "join_token" {
            plugin_data {}
        }
    }
}
EOF

# Create SPIRE server service
cat << EOF | sudo tee /etc/systemd/system/spire-server.service
[Unit]
Description=SPIRE Server
After=network.target

[Service]
ExecStart=/opt/spire/bin/spire-server run -config /opt/spire/conf/server.conf
Restart=always
User=root
WorkingDirectory=/opt/spire

[Install]
WantedBy=multi-user.target
EOF

# Start SPIRE server
sudo systemctl daemon-reload
sudo systemctl enable spire-server
sudo systemctl start spire-server

# Generate a join token for each agent (join tokens are single-use).
# The -spiffeID flag assigns the attested node the ID used as parentID in step 4.
export SPIRE_JOIN_TOKEN=$(sudo /opt/spire/bin/spire-server token generate \
    -spiffeID spiffe://internal.cluster.local/host -ttl 3600 | awk '{print $2}')
echo $SPIRE_JOIN_TOKEN  # Save this for agent setup
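
With the server running, confirm it is healthy before attesting any agents:

# Verify the server is up and serving its APIs
sudo /opt/spire/bin/spire-server healthcheck

# Tail the server logs if the healthcheck fails
sudo journalctl -u spire-server -n 50 --no-pager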

Next, set up the SPIRE agent on each service VM. Export the join token you saved (export SPIRE_JOIN_TOKEN=<token>) on the service VM first so it is substituted into the generated configuration:

# Configure SPIRE agent (on service VMs)
cat << EOF | sudo tee /opt/spire/conf/agent.conf
agent {
    data_dir = "/opt/spire/data/agent"
    log_level = "DEBUG"
    server_address = "CONTROL_PLANE_IP"  # Replace with actual IP
    server_port = "8081"
    socket_path = "/tmp/spire-agent/public/api.sock"
    trust_domain = "internal.cluster.local"

    plugins {
        NodeAttestor "join_token" {
            plugin_data {
                join_token = "${SPIRE_JOIN_TOKEN}"
            }
        }

        KeyManager "disk" {
            plugin_data {
                directory = "/opt/spire/data/agent"
            }
        }

        WorkloadAttestor "unix" {
            plugin_data {}
        }
    }
}
EOF

# Create SPIRE agent service
cat << EOF | sudo tee /etc/systemd/system/spire-agent.service
[Unit]
Description=SPIRE Agent
After=network.target

[Service]
ExecStart=/opt/spire/bin/spire-agent run -config /opt/spire/conf/agent.conf
Restart=always
User=root
WorkingDirectory=/opt/spire

[Install]
WantedBy=multi-user.target
EOF

# Start SPIRE agent
sudo systemctl daemon-reload
sudo systemctl enable spire-agent
sudo systemctl start spire-agent
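
Once the agent starts, it should attest to the server using the join token. You can confirm this from both sides:

# On the service VM: check the agent and its Workload API socket are healthy
sudo /opt/spire/bin/spire-agent healthcheck

# On the control plane VM: list attested agents; the new node should appear
sudo /opt/spire/bin/spire-server agent list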

4. Assigning SPIFFE IDs to Services

Next, we need to define SPIFFE identities for our services:

# On the control plane VM, create entries for services
sudo /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://internal.cluster.local/service1 \
    -selector unix:user:service1 \
    -parentID spiffe://internal.cluster.local/host

sudo /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://internal.cluster.local/service2 \
    -selector unix:user:service2 \
    -parentID spiffe://internal.cluster.local/host

# Create service users on the respective VMs
sudo useradd -r -s /bin/false service1  # On VM running service1
sudo useradd -r -s /bin/false service2  # On VM running service2
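
To verify the registrations, list the entries on the control plane and fetch an SVID as one of the service users on its VM; the unix workload attestor maps the calling process's user to the registered selector:

# On the control plane VM: confirm both registration entries exist
sudo /opt/spire/bin/spire-server entry show

# On the VM running service1: fetch an SVID as the service user
sudo -u service1 /opt/spire/bin/spire-agent api fetch x509 \
    -socketPath /tmp/spire-agent/public/api.sock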

5. Configuring Cilium Network Policies

Now, we’ll set up Cilium network policies based on SPIFFE identities:

# Create policy directory
sudo mkdir -p /etc/cilium/policies

# Create basic network policy
cat << EOF | sudo tee /etc/cilium/policies/basic.yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "secure-service-policy"
spec:
  endpointSelector:
    matchLabels:
      "spiffe.io/spiffeid": "spiffe://internal.cluster.local/service1"
  ingress:
  - fromEndpoints:
    - matchLabels:
        "spiffe.io/spiffeid": "spiffe://internal.cluster.local/service2"
  egress:
  - toEndpoints:
    - matchLabels:
        "spiffe.io/spiffeid": "spiffe://internal.cluster.local/service2"
EOF

# Apply policy
cilium policy import /etc/cilium/policies/basic.yaml
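
After importing, confirm the policy is loaded and watch for unexpected drops while exercising the services:

# Show the policies known to the agent
cilium policy get

# Watch dropped packets in real time
cilium monitor --type drop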

6. Configuring Services for mTLS

Finally, we need to configure our services to establish mTLS using their SPIFFE identities. This example uses Go with the go-spiffe v2 library, which fetches SVIDs and trust bundles from the local SPIRE agent's Workload API:

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"

    "github.com/spiffe/go-spiffe/v2/spiffeid"
    "github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

const socketPath = "unix:///tmp/spire-agent/public/api.sock"

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Create an X509Source backed by the SPIRE agent's Workload API.
    // It keeps the service's SVID and trust bundle up to date automatically.
    source, err := workloadapi.NewX509Source(ctx,
        workloadapi.WithClientOptions(workloadapi.WithAddr(socketPath)))
    if err != nil {
        log.Fatalf("unable to create X509Source: %v", err)
    }
    defer source.Close()

    // Only a client presenting the service2 SPIFFE ID is authorized to connect.
    authorizedID := spiffeid.RequireFromString("spiffe://internal.cluster.local/service2")

    // Build an mTLS server configuration from the source.
    serverConfig := tlsconfig.MTLSServerConfig(
        source, source, tlsconfig.AuthorizeID(authorizedID))
    server := &http.Server{
        Addr:      ":8443",
        TLSConfig: serverConfig,
        Handler:   http.HandlerFunc(handler),
    }

    // Start the server; the certificate comes from the TLS config,
    // so the cert and key file arguments are left empty.
    fmt.Println("Starting secure server on :8443")
    if err := server.ListenAndServeTLS("", ""); err != nil {
        log.Fatal(err)
    }
}

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello from secure service\n")
}

And for the client:

package main

import (
    "context"
    "fmt"
    "io"
    "log"
    "net/http"

    "github.com/spiffe/go-spiffe/v2/spiffeid"
    "github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

const socketPath = "unix:///tmp/spire-agent/public/api.sock"

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Create an X509Source backed by the SPIRE agent's Workload API.
    source, err := workloadapi.NewX509Source(ctx,
        workloadapi.WithClientOptions(workloadapi.WithAddr(socketPath)))
    if err != nil {
        log.Fatalf("unable to create X509Source: %v", err)
    }
    defer source.Close()

    // Expect the server to present the service1 SPIFFE ID.
    serverID := spiffeid.RequireFromString("spiffe://internal.cluster.local/service1")

    // Build an mTLS client configuration that authenticates the server by SPIFFE ID.
    clientConfig := tlsconfig.MTLSClientConfig(
        source, source, tlsconfig.AuthorizeID(serverID))
    httpClient := &http.Client{
        Transport: &http.Transport{
            TLSClientConfig: clientConfig,
        },
    }

    // Make the request over mTLS.
    resp, err := httpClient.Get("https://service1.internal.cluster.local:8443")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Read and print the response.
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(body))
}
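
To try these programs out, build each one in its own module and run it as the corresponding service user so the unix workload attestor maps the process to the right SPIFFE ID. The module path below is illustrative:

# Build the server (repeat analogously for the client)
mkdir server && cd server
go mod init example.com/secure-server   # module path is arbitrary
# save the server code above as main.go, then:
go mod tidy                             # pulls in github.com/spiffe/go-spiffe/v2
go build -o secure-server .

# Run as the registered service user so SPIRE issues the service1 SVID
sudo -u service1 ./secure-server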

Security Benefits and Considerations

This architecture provides multiple layers of security:

Identity and Access Management

  - Every service receives a cryptographic SPIFFE identity (an X.509 SVID) issued by SPIRE, rather than a shared secret or API key
  - Workload attestation (here the unix attestor keyed on dedicated service users) ensures identities are only released to the intended processes
  - SVIDs are short-lived (24 hours in this configuration) and rotated automatically by the SPIRE agent

Network Security

  - Cilium enforces identity-aware network policies, so only explicitly allowed service-to-service flows are permitted
  - Policy is applied per endpoint, giving fine-grained segmentation between services on the same host or subnet
  - Dropped traffic can be observed in real time with cilium monitor for auditing and debugging

Service Discovery and DNS Security

  - CoreDNS keeps name resolution for internal.cluster.local entirely inside the environment
  - Service names decouple clients from IP addresses, while trust is still anchored in SPIFFE IDs rather than DNS names
  - Queries for external names are forwarded to the host resolver, so internal records never need to be published externally

Best Practices

When implementing this architecture in production, follow these best practices:

Certificate Management

  - Keep SVID TTLs short (this guide uses 24 hours) and rely on SPIRE's automatic rotation rather than long-lived certificates
  - Protect the SPIRE server's data directory and signing keys; in production prefer an upstream CA or hardware-backed KeyManager over the disk plugin
  - Plan and test trust bundle rotation before you need it

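To see what a workload actually receives, you can write its SVID to disk and inspect the certificate's lifetime and SPIFFE ID; the output directory below is just where spire-agent writes the fetched documents:

# Fetch the current SVID for service1 and write it to /tmp/svid
sudo -u service1 /opt/spire/bin/spire-agent api fetch x509 \
    -socketPath /tmp/spire-agent/public/api.sock -write /tmp/svid

# Inspect validity period and the SPIFFE ID in the URI SAN
openssl x509 -in /tmp/svid/svid.0.pem -noout -dates
openssl x509 -in /tmp/svid/svid.0.pem -noout -text | grep -A1 "Subject Alternative Name"
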
Network Policies

  - Start from a default-deny posture and add explicit allow rules for each required service-to-service flow
  - Keep policy files under version control and review changes like any other code
  - Validate new policies against live traffic with cilium monitor before relying on them

Monitoring and Alerting

  - Scrape the CoreDNS Prometheus endpoint and watch the health endpoint configured in the Corefile
  - Alert on SPIRE server and agent errors in the systemd journal, especially attestation and SVID renewal failures
  - Track policy drops from Cilium to catch both misconfigurations and unexpected traffic

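The Corefile above already exposes health and metrics endpoints; a quick check that they respond on the ports configured earlier:

# CoreDNS liveness endpoint
curl -s http://127.0.0.1:8091/health

# CoreDNS Prometheus metrics, ready to be scraped
curl -s http://127.0.0.1:9153/metrics | head
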
Regular Maintenance

  - Keep SPIRE, Cilium, and CoreDNS up to date; all three ship security fixes regularly
  - Periodically review SPIRE registration entries and DNS zone records, removing services that no longer exist
  - Issue a fresh join token per node during provisioning; tokens are single-use and should have short TTLs

Troubleshooting

If you encounter issues with your implementation, here are some common troubleshooting steps:

SPIRE Connectivity Issues

# Check SPIRE server health
sudo /opt/spire/bin/spire-server healthcheck

# Verify SPIRE agent connection
sudo /opt/spire/bin/spire-agent healthcheck

# Review agent logs
sudo journalctl -u spire-agent -f

DNS Resolution Problems

# Test DNS resolution
dig @localhost service1.internal.cluster.local

# Check CoreDNS logs
sudo journalctl -u coredns -f

Cilium Policy Enforcement Issues

# Check Cilium status
cilium status

# View applied policies
cilium policy get

# Monitor dropped packets
cilium monitor --type drop

Conclusion

Building a secure service mesh without Kubernetes is entirely possible with the right components. By combining SPIFFE/SPIRE for identity management, Cilium for networking and security, and CoreDNS for service discovery, you can create a robust, zero-trust architecture that provides many of the security benefits typically associated with service mesh implementations in Kubernetes.

This approach is particularly valuable for organizations with traditional VM-based infrastructure, regulated environments with specific security requirements, or specialized use cases where Kubernetes may not be the optimal solution.

The architecture described here provides a solid foundation that you can adapt and extend to meet your specific needs, ensuring secure communication between services in your environment.
