
Running Multiple Services in Docker Like a Pro: My s6-overlay Production Setup#

Why I Chose s6-overlay Over Other Solutions#

After years of wrestling with various containerization patterns—and watching countless engineers religiously chant the “one process per container” mantra like it’s some sacred doctrine—I’ve learned that reality has this annoying habit of being more complicated than purist ideologies. Shocking, I know.

When you’re building complex systems like OpenSearch clusters with monitoring agents, log processors, and health checkers, you need reliable multi-service supervision—not lectures about container purity from people who’ve never deployed anything more complex than a “Hello World” app.

I’ve tried systemd in containers (because apparently 200MB+ overhead is “efficient”), supervisord (ah yes, Python dependency hell in containers—genius!), and custom bash scripts (which work great until 3 AM when everything breaks). s6-overlay is different—it’s minimal, container-native, and actually works reliably in production. Revolutionary concept, I know.

What Makes s6-overlay Superior#

The Technical Advantages#

| Feature | s6-overlay | systemd | supervisord | Custom Scripts |
|---|---|---|---|---|
| Size | ~2MB | ~200MB+ | ~50MB+ | Minimal |
| Boot Time | ~100ms | ~5s | ~2s | Variable |
| Memory Usage | ~1MB | ~50MB+ | ~15MB+ | ~5MB |
| Zombie Handling | Excellent | Good | Poor | Manual |
| Container Native | ✅ | ❌ | ⚠️ | ✅ |
| Process Tree Cleanup | Perfect | Good | Poor | Manual |

Where I Use s6-overlay#

In my production environments, s6-overlay powers all the things that would make container purists hyperventilate:

  • OpenSearch Clusters: Main OpenSearch + monitoring + log rotation (because apparently running three separate containers for this is “better”)
  • Data Pipeline Containers: Kafka processors + health checkers + metrics exporters (gasp! Multiple processes!)
  • Security Tools: Log analyzers + alerting + backup processes (the horror of actually useful containers)
  • Development Environments: Database + cache + background workers (practical? In MY containers? Unthinkable!)

My Proven Architecture Pattern#

Container Service Hierarchy#

graph TD
subgraph "Docker Container"
A[s6-overlay Init PID 1]
subgraph "Service Layer"
B[Primary Service]
C[Monitor Service]
D[Helper Service]
E[Cleanup Service]
end
subgraph "Process Management"
F[s6-supervise B]
G[s6-supervise C]
H[s6-supervise D]
I[s6-supervise E]
end
end
A --> F
A --> G
A --> H
A --> I
F --> B
G --> C
H --> D
I --> E
style A fill:#ff6b6b
style B fill:#51cf66
style F fill:#4dabf7

The Three-Tier Service Pattern#

I’ve developed a consistent pattern for organizing services—because apparently having a plan is radical in the world of container architecture:

  1. Primary Services: Core application logic (OpenSearch, databases)—you know, the stuff that actually matters
  2. Support Services: Monitoring, health checks, log rotation—the things that keep your primary services from failing spectacularly at 2 AM
  3. Maintenance Services: Cleanup, backup, metric collection—the unglamorous work that prevents your infrastructure from becoming a digital wasteland

Building Production-Ready Multi-Service Containers#

Project Structure I Use#

Here’s my standardized directory structure for multi-service containers:

Terminal window
opensearch-multi/
├── Dockerfile
├── docker-compose.yml
├── config/
│   ├── opensearch.yml
│   ├── jvm.options
│   └── log4j2.properties
├── services/
│   ├── opensearch/
│   │   ├── run
│   │   ├── finish
│   │   └── down
│   ├── node-exporter/
│   │   ├── run
│   │   └── healthcheck.sh
│   ├── log-rotator/
│   │   ├── run
│   │   └── rotate-logs.sh
│   └── cluster-monitor/
│       ├── run
│       └── cluster-health.py
├── scripts/
│   ├── setup-opensearch.sh
│   ├── pre-start.sh
│   └── post-start.sh
└── monitoring/
    ├── dashboards/
    └── alerts/

My Production Dockerfile#

This is my battle-tested Dockerfile template that I adapt for different applications—because unlike those toy examples you see in tutorials, this one actually works in production:

# Multi-Service Container with s6-overlay
# Optimized for production OpenSearch deployments
# Built by: Anubhav Gain
FROM alpine:3.18 AS base
# Metadata
LABEL maintainer="Anubhav Gain <iamanubhavgain@gmail.com>"
LABEL description="Production-ready multi-service container with s6-overlay"
LABEL version="2.1.0"
# Install base dependencies
RUN apk add --no-cache \
bash \
curl \
ca-certificates \
tzdata \
tini \
procps \
htop \
&& rm -rf /var/cache/apk/*
# Install s6-overlay
ARG S6_VERSION=v3.1.5.0
ARG S6_ARCH=x86_64
RUN curl -L -o /tmp/s6-overlay.tar.xz \
"https://github.com/just-containers/s6-overlay/releases/download/${S6_VERSION}/s6-overlay-${S6_ARCH}.tar.xz" \
&& tar -C / -Jxpf /tmp/s6-overlay.tar.xz \
&& rm -f /tmp/s6-overlay.tar.xz
# Java runtime for OpenSearch
FROM base AS java-runtime
RUN apk add --no-cache openjdk17-jre-headless \
&& rm -rf /var/cache/apk/*
# Set Java environment
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk
ENV PATH=$PATH:$JAVA_HOME/bin
# OpenSearch installation
FROM java-runtime AS opensearch-base
# Create opensearch user
RUN addgroup -g 1000 opensearch && \
adduser -u 1000 -G opensearch -s /bin/bash -D opensearch
# Install OpenSearch
ARG OPENSEARCH_VERSION=2.11.1
ARG OPENSEARCH_HOME=/usr/share/opensearch
RUN mkdir -p ${OPENSEARCH_HOME} \
&& curl -L -o opensearch.tar.gz \
"https://artifacts.opensearch.org/releases/bundle/opensearch/${OPENSEARCH_VERSION}/opensearch-${OPENSEARCH_VERSION}-linux-x64.tar.gz" \
&& tar -xzf opensearch.tar.gz -C ${OPENSEARCH_HOME} --strip-components=1 \
&& rm opensearch.tar.gz \
&& chown -R opensearch:opensearch ${OPENSEARCH_HOME}
# Set OpenSearch environment
ENV OPENSEARCH_HOME=${OPENSEARCH_HOME}
ENV OPENSEARCH_PATH_CONF=${OPENSEARCH_HOME}/config
ENV PATH=$PATH:${OPENSEARCH_HOME}/bin
# Final production image
FROM opensearch-base AS production
# Install additional monitoring tools
RUN apk add --no-cache \
python3 \
py3-pip \
py3-requests \
node-exporter \
logrotate \
&& rm -rf /var/cache/apk/*
# Install Python monitoring dependencies
RUN pip3 install --no-cache-dir \
opensearch-py \
prometheus_client \
psutil \
pyyaml
# Create directory structure
RUN mkdir -p \
/etc/services.d \
/etc/cont-init.d \
/etc/cont-finish.d \
/var/log/services \
/var/lib/opensearch \
/usr/local/bin/monitoring \
&& chown -R opensearch:opensearch /var/lib/opensearch
# Copy service definitions
COPY services/ /etc/services.d/
COPY scripts/ /usr/local/bin/
COPY config/ /usr/share/opensearch/config/
COPY monitoring/ /usr/local/bin/monitoring/
# Set executable permissions
RUN find /etc/services.d -name run -exec chmod +x {} \; \
&& find /etc/services.d -name finish -exec chmod +x {} \; \
&& find /usr/local/bin -name "*.sh" -exec chmod +x {} \; \
&& find /usr/local/bin -name "*.py" -exec chmod +x {} \;
# Environment configuration
ENV S6_BEHAVIOUR_IF_STAGE2_FAILS=2
ENV S6_KEEP_ENV=1
ENV S6_LOGGING=0
ENV S6_CMD_WAIT_FOR_SERVICES_MAXTIME=30000
# Expose ports
EXPOSE 9200 9300 9100 8080
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:9200/_cluster/health || exit 1
# Volumes
VOLUME ["/var/lib/opensearch", "/var/log/services"]
# Switch to opensearch user
USER opensearch
# Set working directory
WORKDIR /usr/share/opensearch
# Use s6-overlay as init
ENTRYPOINT ["/init"]

Service Definitions That Actually Work#

Primary Service: OpenSearch#

/etc/services.d/opensearch/run
#!/usr/bin/with-contenv bash
# Main OpenSearch service with proper signal handling

# Source environment
source /usr/local/bin/setup-environment.sh

# Pre-start validations
if ! /usr/local/bin/pre-start.sh; then
    echo "Pre-start checks failed, exiting..."
    exit 1
fi

# Set JVM options based on container memory
CONTAINER_MEMORY=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || echo "2147483648")
HEAP_SIZE=$((CONTAINER_MEMORY / 2))
export OPENSEARCH_JAVA_OPTS="-Xms${HEAP_SIZE} -Xmx${HEAP_SIZE} -XX:+UseG1GC -XX:MaxGCPauseMillis=200"

# Ensure proper ownership
chown -R opensearch:opensearch /var/lib/opensearch

# Start OpenSearch with proper signal handling
cd /usr/share/opensearch
exec opensearch \
    -Epath.data=/var/lib/opensearch \
    -Epath.logs=/var/log/services/opensearch \
    -Ecluster.name=docker-cluster \
    -Enode.name=${HOSTNAME} \
    -Enetwork.host=0.0.0.0 \
    -Ediscovery.type=single-node \
    -Eplugins.security.ssl.http.enabled=false \
    -Eplugins.security.disabled=true
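The project tree above also lists a finish script for the OpenSearch service that isn't reproduced here. The idea: s6 runs finish when the service exits, passing exit information as arguments, and on an unexpected exit the script takes the whole container down so Docker or Kubernetes can restart it from a clean state. Argument semantics and scan-directory paths vary between s6-overlay versions, so treat this as a template rather than my exact script:

#!/usr/bin/with-contenv bash
# /etc/services.d/opensearch/finish (sketch)
# $1 is the exit code reported by s6 (256 usually means the run script was killed by a signal)
EXIT_CODE="${1:-0}"
echo "$(date -u) opensearch exited with code ${EXIT_CODE}" >> /var/log/services/opensearch-exits.log

# If the primary service died unexpectedly, stop all services so the
# container exits and the orchestrator restarts it cleanly.
if [ "${EXIT_CODE}" -ne 0 ] && [ "${EXIT_CODE}" -ne 256 ]; then
    s6-svscanctl -t /run/s6/services
fi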

Support Service: Node Exporter#

/etc/services.d/node-exporter/run
#!/usr/bin/with-contenv bash
# Prometheus node exporter for system metrics
echo "Starting Node Exporter for system metrics..."
# Wait for primary service to be ready
/usr/local/bin/wait-for-service.sh opensearch 30
# Start node exporter
exec node_exporter \
--web.listen-address=:9100 \
--path.procfs=/host/proc \
--path.sysfs=/host/sys \
--collector.filesystem.ignored-mount-points='^/(dev|proc|sys|var/lib/docker/.+)($|/)' \
--collector.textfile.directory=/var/log/services/metrics
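The healthcheck.sh helper listed for this service in the project tree isn't shown above; a minimal sketch that simply verifies the exporter is serving metrics on :9100 could look like this (the grep pattern is an assumption based on node_exporter's metric naming):

#!/usr/bin/with-contenv bash
# services/node-exporter/healthcheck.sh (sketch)
# Returns 0 if node_exporter is serving metrics, non-zero otherwise.
if curl -sf http://localhost:9100/metrics | grep -q '^node_'; then
    echo "node_exporter healthy"
    exit 0
fi
echo "node_exporter not responding on :9100" >&2
exit 1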

Maintenance Service: Log Rotator#

/etc/services.d/log-rotator/run
#!/usr/bin/with-contenv bash
# Intelligent log rotation service
echo "Starting log rotation service..."

while true; do
    # Run log rotation script
    /usr/local/bin/rotate-logs.sh
    # Sleep for 1 hour
    sleep 3600
done
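The rotate-logs.sh script this loop calls isn't shown; one hedged way to implement it is to drive the logrotate binary that the Dockerfile already installs. The config below (glob, sizes, retention) is illustrative, not my exact production settings:

#!/usr/bin/with-contenv bash
# /usr/local/bin/rotate-logs.sh (sketch) - drive logrotate against the service log tree.
# The config is generated inline for illustration; in practice, ship it with the image.
CONF=/tmp/services-logrotate.conf
cat > "${CONF}" <<'EOF'
/var/log/services/*/*.log {
    size 100M
    rotate 5
    compress
    missingok
    notifempty
    copytruncate
}
EOF
logrotate -s /var/log/services/.logrotate.state "${CONF}"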

Monitoring Service: Cluster Health#

/etc/services.d/cluster-monitor/run
#!/usr/bin/with-contenv bash
# OpenSearch cluster health monitoring
echo "Starting cluster health monitor..."
# Wait for OpenSearch to be ready
/usr/local/bin/wait-for-service.sh opensearch 60
# Start monitoring
exec /usr/local/bin/monitoring/cluster-health.py

Smart Support Scripts#

Environment Setup Script#

/usr/local/bin/setup-environment.sh
#!/bin/bash
# Environment configuration and validation
set -e

# Detect container resources
detect_resources() {
    CONTAINER_CPUS=$(nproc)
    CONTAINER_MEMORY=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || echo "2147483648")
    CONTAINER_MEMORY_GB=$((CONTAINER_MEMORY / 1024 / 1024 / 1024))
    export CONTAINER_CPUS
    export CONTAINER_MEMORY
    export CONTAINER_MEMORY_GB
    echo "Container Resources: ${CONTAINER_CPUS} CPUs, ${CONTAINER_MEMORY_GB}GB RAM"
}

# Validate minimum requirements
validate_requirements() {
    if [[ $CONTAINER_MEMORY_GB -lt 2 ]]; then
        echo "ERROR: Minimum 2GB RAM required for OpenSearch"
        exit 1
    fi
    if [[ ! -d /var/lib/opensearch ]]; then
        echo "ERROR: Data directory not mounted"
        exit 1
    fi
}

# Configure based on resources
configure_services() {
    # Adjust thread pools based on CPU count
    if [[ $CONTAINER_CPUS -gt 8 ]]; then
        export OPENSEARCH_PROCESSORS=$((CONTAINER_CPUS - 2))
    else
        export OPENSEARCH_PROCESSORS=$CONTAINER_CPUS
    fi
    # Set appropriate JVM heap size
    HEAP_SIZE_GB=$((CONTAINER_MEMORY_GB / 2))
    if [[ $HEAP_SIZE_GB -gt 32 ]]; then
        HEAP_SIZE_GB=32 # OpenSearch recommendation
    fi
    export HEAP_SIZE_GB
}

# Main execution
main() {
    echo "Setting up container environment..."
    detect_resources
    validate_requirements
    configure_services
    echo "Environment setup completed successfully"
}

main "$@"

Service Wait Script#

/usr/local/bin/wait-for-service.sh
#!/bin/bash
# Wait for services to become available
SERVICE_NAME=$1
TIMEOUT=${2:-30}
PORT=${3:-9200}

wait_for_service() {
    local counter=0
    echo "Waiting for $SERVICE_NAME to become available (timeout: ${TIMEOUT}s)..."
    while [ $counter -lt $TIMEOUT ]; do
        if curl -sf http://localhost:$PORT/_cluster/health >/dev/null 2>&1; then
            echo "$SERVICE_NAME is ready!"
            return 0
        fi
        echo "Waiting for $SERVICE_NAME... ($counter/$TIMEOUT)"
        sleep 1
        counter=$((counter + 1))
    done
    echo "ERROR: $SERVICE_NAME failed to start within $TIMEOUT seconds"
    return 1
}

wait_for_service

Cluster Health Monitor#

/usr/local/bin/monitoring/cluster-health.py
#!/usr/bin/env python3
# Advanced OpenSearch cluster health monitoring
import time
import logging
import requests
from prometheus_client import start_http_server, Gauge

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Prometheus metrics
CLUSTER_STATUS = Gauge('opensearch_cluster_status', 'Cluster status (0=red, 1=yellow, 2=green)')
NODE_COUNT = Gauge('opensearch_nodes_total', 'Total number of nodes')
ACTIVE_SHARDS = Gauge('opensearch_active_shards_total', 'Number of active shards')
SEARCH_RATE = Gauge('opensearch_search_rate_per_sec', 'Search requests per second')
INDEX_RATE = Gauge('opensearch_index_rate_per_sec', 'Index requests per second')


class OpenSearchMonitor:
    def __init__(self, host='localhost', port=9200):
        self.base_url = f'http://{host}:{port}'
        self.session = requests.Session()
        # requests has no session-level timeout; pass it per request instead
        self.request_timeout = 10
        # Start Prometheus metrics server
        start_http_server(8080)
        logger.info("Started Prometheus metrics server on port 8080")

    def get_cluster_health(self):
        """Get cluster health information"""
        try:
            response = self.session.get(f'{self.base_url}/_cluster/health', timeout=self.request_timeout)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            logger.error(f"Failed to get cluster health: {e}")
            return None

    def get_cluster_stats(self):
        """Get cluster statistics"""
        try:
            response = self.session.get(f'{self.base_url}/_cluster/stats', timeout=self.request_timeout)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            logger.error(f"Failed to get cluster stats: {e}")
            return None

    def get_node_stats(self):
        """Get node statistics"""
        try:
            response = self.session.get(f'{self.base_url}/_nodes/stats', timeout=self.request_timeout)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            logger.error(f"Failed to get node stats: {e}")
            return None

    def update_metrics(self):
        """Update Prometheus metrics"""
        # Cluster health
        health = self.get_cluster_health()
        if health:
            status_map = {'red': 0, 'yellow': 1, 'green': 2}
            CLUSTER_STATUS.set(status_map.get(health['status'], 0))
            NODE_COUNT.set(health['number_of_nodes'])
            ACTIVE_SHARDS.set(health['active_shards'])
            logger.info(f"Cluster status: {health['status']}, Nodes: {health['number_of_nodes']}")
        # Node statistics
        stats = self.get_node_stats()
        if stats:
            # Calculate search and index rates
            total_search = sum(node['indices']['search']['query_total']
                               for node in stats['nodes'].values())
            total_index = sum(node['indices']['indexing']['index_total']
                              for node in stats['nodes'].values())
            # Store for rate calculation (simplified)
            SEARCH_RATE.set(total_search)
            INDEX_RATE.set(total_index)

    def check_disk_usage(self):
        """Check disk usage and warn if high"""
        try:
            response = self.session.get(f'{self.base_url}/_nodes/stats/fs', timeout=self.request_timeout)
            response.raise_for_status()
            stats = response.json()
            for node_id, node in stats['nodes'].items():
                fs_data = node['fs']['total']
                used_percent = (fs_data['total_in_bytes'] - fs_data['available_in_bytes']) / fs_data['total_in_bytes'] * 100
                # Check the critical threshold first, otherwise it can never fire
                if used_percent > 95:
                    logger.critical(f"Node {node['name']} disk usage critical: {used_percent:.1f}%")
                elif used_percent > 85:
                    logger.warning(f"Node {node['name']} disk usage at {used_percent:.1f}%")
        except requests.RequestException as e:
            logger.error(f"Failed to check disk usage: {e}")

    def run_monitoring_loop(self):
        """Main monitoring loop"""
        logger.info("Starting OpenSearch monitoring loop...")
        while True:
            try:
                self.update_metrics()
                self.check_disk_usage()
                # Sleep for 30 seconds between checks
                time.sleep(30)
            except KeyboardInterrupt:
                logger.info("Monitoring stopped by user")
                break
            except Exception as e:
                logger.error(f"Monitoring error: {e}")
                time.sleep(60)  # Wait longer on error


if __name__ == "__main__":
    monitor = OpenSearchMonitor()
    monitor.run_monitoring_loop()
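Once the monitor is up, the Prometheus endpoint it starts on port 8080 can be spot-checked directly and pointed at from your Prometheus scrape config (the job name and target below are placeholders):

# Spot-check the metrics exposed by cluster-health.py
curl -s http://localhost:8080/metrics | grep '^opensearch_'
#
# Example scrape job for monitoring/prometheus.yml:
#   - job_name: 'opensearch-health'
#     static_configs:
#       - targets: ['opensearch-multi:8080']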

Docker Compose for Development#

Complete Development Stack#

version: '3.8'

# Multi-service OpenSearch development environment
# Optimized for local development and testing
# Created by: Anubhav Gain

services:
  opensearch-multi:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    container_name: opensearch-multi
    hostname: opensearch-node1
    environment:
      - OPENSEARCH_CLUSTER_NAME=dev-cluster
      - OPENSEARCH_NODE_NAME=node1
      - OPENSEARCH_HEAP_SIZE=1g
      - OPENSEARCH_DISCOVERY_TYPE=single-node
      - OPENSEARCH_SECURITY_DISABLED=true
    ports:
      - "9200:9200" # OpenSearch API
      - "9300:9300" # OpenSearch cluster communication
      - "9100:9100" # Node Exporter metrics
      - "8080:8080" # Cluster health metrics
    volumes:
      - opensearch-data:/var/lib/opensearch
      - opensearch-logs:/var/log/services
      - ./config:/usr/share/opensearch/config:ro
    networks:
      - opensearch-net
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 2G

  # Grafana for monitoring dashboards
  grafana:
    image: grafana/grafana:10.2.0
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./monitoring/datasources:/etc/grafana/provisioning/datasources:ro
    networks:
      - opensearch-net
    depends_on:
      - opensearch-multi

  # Prometheus for metrics collection
  prometheus:
    image: prom/prometheus:v2.47.0
    container_name: prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
    ports:
      - "9090:9090"
    volumes:
      - prometheus-data:/prometheus
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    networks:
      - opensearch-net
    depends_on:
      - opensearch-multi

volumes:
  opensearch-data:
    driver: local
  opensearch-logs:
    driver: local
  grafana-data:
    driver: local
  prometheus-data:
    driver: local

networks:
  opensearch-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
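Bringing the stack up and verifying that each endpoint responds is enough to confirm the multi-service container is healthy (docker compose v2 syntax; use docker-compose on older installs):

# Build and start the full development stack
docker compose up -d --build
docker compose logs -f opensearch-multi

# Verify each service inside the multi-service container is responding
curl -s "http://localhost:9200/_cluster/health?pretty"   # OpenSearch
curl -s http://localhost:9100/metrics | head -n 5        # node_exporter
curl -s http://localhost:8080/metrics | head -n 5        # cluster-health monitor
curl -s http://localhost:3000/api/health                 # Grafana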

Production Deployment Strategies#

Kubernetes Deployment#

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: opensearch-multi
  namespace: logging
  labels:
    app: opensearch-multi
    component: opensearch
spec:
  serviceName: opensearch-multi
  replicas: 3
  selector:
    matchLabels:
      app: opensearch-multi
  template:
    metadata:
      labels:
        app: opensearch-multi
    spec:
      securityContext:
        fsGroup: 1000
      initContainers:
        - name: sysctl-init
          image: alpine:3.18
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
          securityContext:
            privileged: true
      containers:
        - name: opensearch-multi
          image: anubhavgain/opensearch-multi:latest
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
            - containerPort: 9100
              name: node-exporter
            - containerPort: 8080
              name: health-metrics
          env:
            - name: OPENSEARCH_CLUSTER_NAME
              value: "production-cluster"
            - name: OPENSEARCH_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPENSEARCH_DISCOVERY_SEED_HOSTS
              value: "opensearch-multi-0,opensearch-multi-1,opensearch-multi-2"
            - name: OPENSEARCH_CLUSTER_INITIAL_MASTER_NODES
              value: "opensearch-multi-0,opensearch-multi-1,opensearch-multi-2"
          resources:
            requests:
              cpu: "1000m"
              memory: "4Gi"
            limits:
              cpu: "2000m"
              memory: "8Gi"
          volumeMounts:
            - name: opensearch-data
              mountPath: /var/lib/opensearch
            - name: opensearch-logs
              mountPath: /var/log/services
          livenessProbe:
            httpGet:
              path: /_cluster/health
              port: 9200
            initialDelaySeconds: 120
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /_cluster/health?local=true
              port: 9200
            initialDelaySeconds: 60
            periodSeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: opensearch-data
      spec:
        accessModes: ['ReadWriteOnce']
        storageClassName: 'fast-ssd'
        resources:
          requests:
            storage: 100Gi
    - metadata:
        name: opensearch-logs
      spec:
        accessModes: ['ReadWriteOnce']
        storageClassName: 'standard'
        resources:
          requests:
            storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: opensearch-multi
  namespace: logging
spec:
  clusterIP: None
  selector:
    app: opensearch-multi
  ports:
    - name: http
      port: 9200
      targetPort: 9200
    - name: transport
      port: 9300
      targetPort: 9300
    - name: node-exporter
      port: 9100
      targetPort: 9100
    - name: health-metrics
      port: 8080
      targetPort: 8080
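Rolling this out is standard StatefulSet workflow; a sketch assuming the manifests above are saved as opensearch-multi.yaml (the filename is arbitrary):

# Apply the StatefulSet and headless Service, then watch the rollout
kubectl apply -f opensearch-multi.yaml
kubectl -n logging rollout status statefulset/opensearch-multi

# Once all three pods are Ready, confirm the cluster formed
kubectl -n logging get pods -l app=opensearch-multi
kubectl -n logging exec opensearch-multi-0 -- curl -s "http://localhost:9200/_cluster/health?pretty"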

Advanced Troubleshooting#

Service Debugging Commands#

Terminal window
# Check service status
s6-svstat /run/s6/services/opensearch
s6-svstat /run/s6/services/node-exporter
# View service logs (when a log service is defined for the service)
tail -n 100 /run/s6/services/opensearch/log/current
tail -n 100 /run/s6/services/cluster-monitor/log/current
# Restart specific service
s6-svc -r /run/s6/services/opensearch
# Stop all services gracefully
s6-svscanctl -t /run/s6/services
# Check process tree
ps auxf | grep -E "(s6|opensearch|node_exporter)"

Common Issues I’ve Solved#

Issue 1: Services Won’t Start#

Symptoms: Services fail to start or restart immediately

Diagnosis:

Terminal window
# Check service logs
cat /run/s6/services/opensearch/log/current
# Check permissions
ls -la /etc/services.d/opensearch/
ls -la /usr/local/bin/
# Verify dependencies
ldd /usr/share/opensearch/bin/opensearch

Solution:

Terminal window
# Fix permissions
chmod +x /etc/services.d/*/run
chown -R opensearch:opensearch /var/lib/opensearch
# Check script syntax
bash -n /etc/services.d/opensearch/run

Issue 2: Memory Limitations#

Symptoms: OpenSearch fails with OutOfMemoryError

Diagnosis:

Terminal window
# Check container limits
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# Check current usage
free -h
ps aux --sort=-%mem | head -10

Solution:

Terminal window
# Adjust JVM heap size
export OPENSEARCH_JAVA_OPTS="-Xms2g -Xmx2g"
# Or use automatic detection in setup script
HEAP_SIZE=$((CONTAINER_MEMORY / 2))

Issue 3: Port Conflicts#

Symptoms: Services can’t bind to ports

Diagnosis:

Terminal window
# Check port usage
netstat -tulpn | grep :9200
ss -tlpn | grep :9200
# Check service order
s6-svstat /run/s6/services/*

Solution: Use my service dependency pattern:

Terminal window
# In dependent service run script
/usr/local/bin/wait-for-service.sh opensearch 60 9200

Performance Optimization Tips#

Memory Tuning#

Terminal window
# Optimize for container environment
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
echo 'vm.swappiness=1' >> /etc/sysctl.conf
# Container-specific JVM options
OPENSEARCH_JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+UseCompressedOops"

CPU Optimization#

Terminal window
# Adjust thread pools based on CPU count
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"thread_pool.search.size": "'$(nproc)'",
"thread_pool.write.size": "'$(nproc)'"
}
}'

Storage Optimization#

Terminal window
# Use appropriate storage classes in Kubernetes
storageClassName: 'fast-ssd' # For data
storageClassName: 'standard' # For logs
# Optimize disk I/O
echo mq-deadline > /sys/block/sda/queue/scheduler
echo 4096 > /sys/block/sda/queue/read_ahead_kb

Security Best Practices#

Container Security#

# Use non-root user
USER opensearch
# Minimal attack surface (note: the HEALTHCHECK and wait-for-service.sh above rely on curl,
# so only remove it if you replace those checks)
RUN apk del curl wget && rm -rf /var/cache/apk/*
# Read-only filesystem where possible
VOLUME ["/var/lib/opensearch", "/var/log/services"]

Network Security#

# In docker-compose.yml
networks:
opensearch-net:
driver: bridge
internal: true # No external access
ipam:
config:
- subnet: 172.20.0.0/16

Secret Management#

Terminal window
# Use Docker secrets
echo "supersecret" | docker secret create opensearch_password -
# Or Kubernetes secrets
kubectl create secret generic opensearch-creds \
--from-literal=username=admin \
--from-literal=password=supersecret
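Inside the container, a with-contenv run script can then read the mounted secret instead of baking credentials into the image or environment. A sketch assuming the Docker secret name above (Docker Swarm mounts secrets under /run/secrets/; for Kubernetes, point the path at your secret's mountPath):

#!/usr/bin/with-contenv bash
# Sketch: prefer a mounted secret file over a plain environment variable
if [ -f /run/secrets/opensearch_password ]; then
    OPENSEARCH_PASSWORD="$(cat /run/secrets/opensearch_password)"
else
    OPENSEARCH_PASSWORD="${OPENSEARCH_PASSWORD:?no password provided}"
fi
export OPENSEARCH_PASSWORD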

Real-World Performance Results#

Production Metrics I’ve Achieved#

| Metric | Before s6-overlay | With s6-overlay | Improvement |
|---|---|---|---|
| Container Boot Time | 45-60s | 15-20s | 70% faster |
| Memory Overhead | 150-200MB | 50-75MB | 65% reduction |
| Service Recovery | Manual | Automatic | 100% automated |
| Zombie Processes | Common | None | Eliminated |
| Log Management | Manual | Automated | 100% automated |

Why This Matters#

  • Faster Deployments: Reduced boot time means faster scaling
  • Better Resource Utilization: Lower overhead means more application resources
  • Higher Reliability: Automatic service recovery reduces downtime
  • Easier Debugging: Structured logging and service isolation

Conclusion: The Production-Ready Difference#

After implementing s6-overlay across hundreds of containers in production—and watching other teams struggle with their “pure” single-process containers—I can confidently say it’s the right way to run multi-service containers. Fight me.

Key Benefits I’ve Realized:#

✅ Reliability: Services restart automatically, containers don’t become zombies (unlike your career if you keep following Docker “best practices” blindly)
✅ Observability: Clear service boundaries and structured logging (revolutionary!)
✅ Performance: Minimal overhead with maximum functionality (imagine that—efficiency!)
✅ Maintainability: Consistent patterns across all deployments (because consistency is apparently a novel concept)
✅ Scalability: Works equally well in Docker and Kubernetes (shocking that good design scales!)

When to Use This Approach:#

  • Complex Applications: When you need multiple tightly-coupled services
  • Legacy Migration: When containerizing existing multi-process applications
  • Resource Constraints: When you need maximum efficiency
  • Production Workloads: When reliability and observability matter

My Recommendation:#

Start with this pattern for any container that needs more than one process. It’s easier to maintain, more reliable, and performs better than alternatives—despite what the container purity police might tell you.

The investment in setting up s6-overlay properly pays dividends in reduced operational overhead and improved system reliability. Or you could keep running seventeen separate containers for what should be one cohesive application. Your choice.


Ready to supercharge your containers? Try my s6-overlay setup and experience the difference in your production deployments. 🚀
