Introduction
Firecracker’s lightweight virtualization architecture provides excellent baseline performance, but production environments often require fine-tuning to achieve optimal efficiency. This comprehensive guide explores advanced performance optimization techniques for Firecracker microVMs, covering everything from kernel configuration to resource management.
Understanding performance optimization in Firecracker involves multiple layers: host system tuning, kernel optimization, memory management, I/O acceleration, and network performance. Each layer contributes to the overall efficiency of microVM workloads.
Performance Architecture Overview
graph TB
    subgraph "Host System Layer"
        CPU[CPU Optimization]
        MEMORY[Memory Management]
        KERNEL[Kernel Tuning]
        SCHEDULER[Process Scheduling]
    end

    subgraph "Firecracker VMM Layer"
        FC_PROC[FC Process Optimization]
        RATE_LIM[Rate Limiting]
        DEVICE_EMU[Device Emulation]
        API_OPT[API Optimization]
    end

    subgraph "Guest System Layer"
        GUEST_KERNEL[Guest Kernel Optimization]
        GUEST_MEM[Guest Memory Management]
        GUEST_IO[Guest I/O Optimization]
        GUEST_NET[Guest Network Stack]
    end

    subgraph "Application Layer"
        APP_OPT[Application Optimization]
        WORKLOAD[Workload Tuning]
        MONITORING[Performance Monitoring]
    end

    CPU --> FC_PROC
    MEMORY --> FC_PROC
    KERNEL --> FC_PROC
    SCHEDULER --> FC_PROC

    FC_PROC --> GUEST_KERNEL
    RATE_LIM --> GUEST_IO
    DEVICE_EMU --> GUEST_IO
    API_OPT --> GUEST_KERNEL

    GUEST_KERNEL --> APP_OPT
    GUEST_MEM --> APP_OPT
    GUEST_IO --> APP_OPT
    GUEST_NET --> APP_OPT

    APP_OPT --> MONITORING
    WORKLOAD --> MONITORING

Performance Optimization Principles
Minimize Overhead: Reduce unnecessary layers and processing
Maximize Parallelism: Leverage multi-core systems effectively
Optimize Memory Access: Improve cache efficiency and reduce page faults
Accelerate I/O: Minimize storage and network latency
Resource Isolation: Prevent noisy neighbor effects
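The isolation principle in particular benefits from explicit host configuration. The snippet below is a minimal sketch, not a complete recipe: the PID, cgroup path, CPU list, and memory limit are placeholder assumptions chosen for illustration. It pins one running Firecracker process to dedicated cores and caps its memory through cgroup v2.

#!/bin/bash
# Hypothetical example: isolate a single Firecracker VM on a cgroup v2 host.
FC_PID=12345                       # placeholder: PID of the Firecracker process
CGROUP=/sys/fs/cgroup/fc-vm0       # placeholder: dedicated cgroup for this VM

# Move the VMM into its own cgroup and cap its memory (VM size plus VMM overhead)
sudo mkdir -p "$CGROUP"
echo "$FC_PID" | sudo tee "$CGROUP/cgroup.procs" > /dev/null
echo $((768 * 1024 * 1024)) | sudo tee "$CGROUP/memory.max" > /dev/null

# Pin the VMM and all of its vCPU threads to two dedicated host CPUs
sudo taskset -acp 2,3 "$FC_PID"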
Host System Optimization
CPU Configuration and Tuning
#!/bin/bash
# Host CPU optimization for Firecracker
echo "=== CPU Optimization ==="

# Check CPU information
echo "CPU Information:"
lscpu | grep -E "(Architecture|CPU\(s\)|Thread|Core|Socket|Vendor|Model name|CPU MHz|Cache|Flags)"

# Enable the performance CPU governor
echo "Setting CPU governor to performance..."
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    [ -f "$cpu" ] && echo performance | sudo tee "$cpu" > /dev/null
done

# Disable Turbo Boost for more consistent clock speeds (Intel P-State only)
echo "Disabling Turbo Boost..."
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo 2>/dev/null || echo "Intel P-State not available"

# Set CPU affinity for optimal performance
set_cpu_affinity() {
    local vm_id=$1
    local cpu_list=$2

    # Find the Firecracker process for this VM
    fc_pid=$(pgrep -f "firecracker.*${vm_id}")

    if [ -n "$fc_pid" ]; then
        echo "Setting CPU affinity for VM $vm_id (PID: $fc_pid) to CPUs: $cpu_list"
        taskset -cp "$cpu_list" "$fc_pid"

        # Raise scheduling priority
        sudo renice -10 -p "$fc_pid"

        # Real-time scheduling (use with caution)
        # sudo chrt -r -p 50 "$fc_pid"
    else
        echo "Firecracker process for VM $vm_id not found"
    fi
}

# Optimize interrupt handling
echo "Optimizing interrupt handling..."
# Confine default IRQ handling to CPU 1 (mask 0x2) so interrupts stay off the cores reserved for VMs
echo 2 | sudo tee /proc/irq/default_smp_affinity

# Disable kernel watchdogs to avoid periodic interruptions
echo "Disabling kernel watchdogs..."
echo 0 | sudo tee /proc/sys/kernel/watchdog
echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog

# Configure CPU isolation (add to kernel command line)
cat << 'EOF'
To isolate CPUs for Firecracker workloads, add to GRUB_CMDLINE_LINUX_DEFAULT:
isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7
Then update GRUB and reboot:
sudo update-grub
sudo reboot
EOF
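# Sketch of a post-reboot check (assumes CPUs 2-7 were isolated as described above;
# paths and socket names are placeholders):
#   cat /sys/devices/system/cpu/isolated        # should print 2-7
#   taskset -c 2,3 ./firecracker --api-sock /tmp/fc-vm0.sock   # launch a VM pinned to isolated cores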
echo "CPU optimization complete!"Memory Optimization
#!/usr/bin/env python3
import os
import subprocess
import mmap
from pathlib import Path

class MemoryOptimizer:
    """Optimize memory settings for Firecracker performance"""

    def __init__(self):
        self.hugepage_sizes = ['1048576', '2048']  # 1GB and 2MB hugepages
        self.transparent_hugepages = Path('/sys/kernel/mm/transparent_hugepage')

    def configure_hugepages(self, num_1gb_pages=4, num_2mb_pages=512):
        """Configure hugepages for better memory performance"""

        print("=== Configuring Hugepages ===")

        # Configure 1GB hugepages
        hugepages_1g = Path('/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages')
        if hugepages_1g.exists():
            try:
                with open(hugepages_1g, 'w') as f:
                    f.write(str(num_1gb_pages))
                print(f"✓ Configured {num_1gb_pages} x 1GB hugepages")
            except PermissionError:
                print("✗ Need root permissions to configure 1GB hugepages")

        # Configure 2MB hugepages
        hugepages_2m = Path('/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages')
        if hugepages_2m.exists():
            try:
                with open(hugepages_2m, 'w') as f:
                    f.write(str(num_2mb_pages))
                print(f"✓ Configured {num_2mb_pages} x 2MB hugepages")
            except PermissionError:
                print("✗ Need root permissions to configure 2MB hugepages")

        # Mount hugepages filesystem
        self._mount_hugepages()

    def _mount_hugepages(self):
        """Mount hugepages filesystems"""

        mount_points = [
            ('/mnt/huge-1G', '1GB', 'pagesize=1G'),
            ('/mnt/huge-2M', '2MB', 'pagesize=2M')
        ]

        for mount_point, size, option in mount_points:
            os.makedirs(mount_point, exist_ok=True)

            # Check if already mounted
            result = subprocess.run(['mount'], capture_output=True, text=True)
            if mount_point not in result.stdout:
                try:
                    subprocess.run([
                        'sudo', 'mount', '-t', 'hugetlbfs',
                        '-o', option, 'none', mount_point
                    ], check=True)
                    print(f"✓ Mounted {size} hugepages at {mount_point}")
                except subprocess.CalledProcessError:
                    print(f"✗ Failed to mount {size} hugepages")

    def configure_transparent_hugepages(self):
        """Configure transparent hugepages"""

        print("=== Configuring Transparent Hugepages ===")

        # Disable transparent hugepages for consistent performance
        thp_enabled = self.transparent_hugepages / 'enabled'
        thp_defrag = self.transparent_hugepages / 'defrag'

        settings = [
            (thp_enabled, 'never', 'THP enabled'),
            (thp_defrag, 'never', 'THP defrag')
        ]

        for setting_file, value, description in settings:
            if setting_file.exists():
                try:
                    with open(setting_file, 'w') as f:
                        f.write(value)
                    print(f"✓ Set {description} to {value}")
                except PermissionError:
                    print(f"✗ Need root permissions to configure {description}")

    def configure_memory_overcommit(self):
        """Configure memory overcommit settings"""

        print("=== Configuring Memory Overcommit ===")

        # Conservative overcommit for predictable performance
        overcommit_settings = [
            ('/proc/sys/vm/overcommit_memory', '2', 'Strict overcommit'),
            ('/proc/sys/vm/overcommit_ratio', '80', 'Overcommit ratio'),
            ('/proc/sys/vm/swappiness', '1', 'Swappiness'),
            ('/proc/sys/vm/vfs_cache_pressure', '50', 'VFS cache pressure')
        ]

        for setting_file, value, description in overcommit_settings:
            try:
                with open(setting_file, 'w') as f:
                    f.write(value)
                print(f"✓ Set {description} to {value}")
            except PermissionError:
                print(f"✗ Need root permissions to configure {description}")

    def optimize_numa_settings(self):
        """Optimize NUMA settings for Firecracker"""

        print("=== Optimizing NUMA Settings ===")

        # Check NUMA topology
        try:
            result = subprocess.run(['numactl', '--hardware'],
                                    capture_output=True, text=True, check=True)
            print("NUMA topology:")
            print(result.stdout)
        except (subprocess.CalledProcessError, FileNotFoundError):
            print("NUMA tools not available")
            return

        # Configure zone reclaim
        numa_settings = [
            ('/proc/sys/vm/zone_reclaim_mode', '0', 'Zone reclaim mode')
        ]

        for setting_file, value, description in numa_settings:
            try:
                with open(setting_file, 'w') as f:
                    f.write(value)
                print(f"✓ Set {description} to {value}")
            except PermissionError:
                print(f"✗ Need root permissions to configure {description}")

    def create_memory_pools(self, pool_size_mb=1024, num_pools=4):
        """Create pre-allocated memory pools for Firecracker VMs"""

        print("=== Creating Memory Pools ===")
        print(f"Creating {num_pools} memory pools of {pool_size_mb}MB each")

        pools = []
        pool_size = pool_size_mb * 1024 * 1024  # Convert to bytes

        for i in range(num_pools):
            try:
                # Create anonymous memory mapping
                pool = mmap.mmap(-1, pool_size, mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

                # Touch all pages to ensure allocation
                for offset in range(0, pool_size, 4096):
                    pool[offset] = 0

                pools.append(pool)
                print(f"✓ Created memory pool {i+1}")

            except Exception as e:
                print(f"✗ Failed to create memory pool {i+1}: {e}")

        return pools

    def run_optimization(self):
        """Run all memory optimizations"""

        self.configure_hugepages()
        self.configure_transparent_hugepages()
        self.configure_memory_overcommit()
        self.optimize_numa_settings()

        print("\n=== Memory Optimization Summary ===")
        self.print_memory_info()

    def print_memory_info(self):
        """Print current memory configuration"""

        # Memory information
        with open('/proc/meminfo', 'r') as f:
            meminfo = f.read()

        print("\nMemory Information:")
        for line in meminfo.split('\n'):
            if any(keyword in line for keyword in ['MemTotal', 'MemFree', 'HugePages', 'Hugepagesize']):
                print(f"  {line}")

        # Hugepages information
        hugepages_info = []
        hugepages_dir = Path('/sys/kernel/mm/hugepages')

        if hugepages_dir.exists():
            for size_dir in hugepages_dir.iterdir():
                if size_dir.is_dir():
                    try:
                        with open(size_dir / 'nr_hugepages', 'r') as f:
                            nr_pages = f.read().strip()
                        with open(size_dir / 'free_hugepages', 'r') as f:
                            free_pages = f.read().strip()

                        size = size_dir.name.replace('hugepages-', '').replace('kB', '')
                        hugepages_info.append(f"  {size}: {nr_pages} total, {free_pages} free")
                    except:
                        pass

        if hugepages_info:
            print("\nHugepages:")
            for info in hugepages_info:
                print(info)

if __name__ == '__main__':
    optimizer = MemoryOptimizer()
    optimizer.run_optimization()

Kernel Optimization
#!/bin/bash
# Kernel optimization for Firecracker
echo "=== Kernel Optimization ==="

# Kernel tuning parameters
echo "Applying kernel tuning parameters..."
kernel_settings=(
    # Disable audit subsystem
    "kernel.audit=0"
    # Disable printk rate limiting
    "kernel.printk_ratelimit=0"
    # Optimize scheduling
    "kernel.sched_migration_cost_ns=5000000"
    "kernel.sched_autogroup_enabled=0"
    # Optimize memory management
    "vm.dirty_ratio=15"
    "vm.dirty_background_ratio=5"
    "vm.dirty_expire_centisecs=12000"
    "vm.dirty_writeback_centisecs=1200"
    # Optimize network stack
    "net.core.rmem_max=67108864"
    "net.core.wmem_max=67108864"
    "net.core.rmem_default=262144"
    "net.core.wmem_default=262144"
    "net.core.netdev_max_backlog=5000"
    # Optimize file system limits
    "fs.file-max=2097152"
    "fs.nr_open=1048576"
)

# Apply kernel settings
for setting in "${kernel_settings[@]}"; do
    key=$(echo "$setting" | cut -d'=' -f1)
    value=$(echo "$setting" | cut -d'=' -f2)

    echo "Setting $key = $value"
    echo "$value" | sudo tee "/proc/sys/${key//./\/}" > /dev/null 2>&1 || echo "Failed to set $key"
done

# Make settings persistent
sudo tee -a /etc/sysctl.conf << 'EOF'
# Firecracker optimizations
kernel.audit=0
kernel.printk_ratelimit=0
kernel.sched_migration_cost_ns=5000000
kernel.sched_autogroup_enabled=0
vm.dirty_ratio=15
vm.dirty_background_ratio=5
vm.dirty_expire_centisecs=12000
vm.dirty_writeback_centisecs=1200
net.core.rmem_max=67108864
net.core.wmem_max=67108864
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.netdev_max_backlog=5000
fs.file-max=2097152
fs.nr_open=1048576
EOF
echo "Kernel optimization complete!"
# Create optimized kernel configuration
create_optimized_kernel_config() {
    cat > /tmp/firecracker_kernel_config << 'EOF'
# Firecracker optimized kernel configuration

# Minimal required features
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_SMP=y
CONFIG_HYPERVISOR_GUEST=y

# Virtualization support
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_SPINLOCKS=y
CONFIG_KVM_GUEST=y

# Essential drivers (Firecracker exposes virtio devices over MMIO, not PCI)
CONFIG_VIRTIO=y
CONFIG_VIRTIO_MMIO=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_VIRTIO_VSOCKETS=y

# Memory management
CONFIG_TRANSPARENT_HUGEPAGE=n
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

# Scheduler optimizations
CONFIG_PREEMPT_NONE=y
CONFIG_NO_HZ=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y

# Disable unnecessary features
CONFIG_SUSPEND=n
CONFIG_HIBERNATION=n
CONFIG_ACPI=n
CONFIG_PCI=n
CONFIG_USB=n
CONFIG_SOUND=n
CONFIG_DRM=n
CONFIG_WIRELESS=n
CONFIG_BLUETOOTH=n

# Security
CONFIG_SECURITY=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y

# Networking (minimal)
CONFIG_NET=y
CONFIG_INET=y
CONFIG_NETFILTER=y

# File systems
CONFIG_EXT4_FS=y
CONFIG_PROC_FS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_9P_FS=y
CONFIG_9P_VIRTIO=y

# Disable debug features for performance
CONFIG_DEBUG_KERNEL=n
CONFIG_SLUB_DEBUG=n
CONFIG_DEBUG_INFO=n
CONFIG_FRAME_POINTER=n
CONFIG_STACK_TRACER=n
CONFIG_FUNCTION_TRACER=n
EOF

    echo "Optimized kernel config created at /tmp/firecracker_kernel_config"
}
create_optimized_kernel_config

Firecracker VMM Optimization
Process and Resource Configuration
#!/usr/bin/env python3import jsonimport psutilimport subprocessfrom pathlib import Path
class FirecrackerVMMOptimizer: """Optimize Firecracker VMM process for performance"""
def __init__(self): self.firecracker_processes = [] self.optimization_configs = {}
def find_firecracker_processes(self): """Find all running Firecracker processes"""
processes = [] for proc in psutil.process_iter(['pid', 'name', 'cmdline']): try: if proc.info['name'] == 'firecracker': processes.append({ 'pid': proc.info['pid'], 'cmdline': ' '.join(proc.info['cmdline']) }) except (psutil.NoSuchProcess, psutil.AccessDenied): continue
self.firecracker_processes = processes return processes
def optimize_process_scheduling(self, pid, priority=-10, scheduler='normal'): """Optimize process scheduling parameters"""
try: proc = psutil.Process(pid)
# Set process priority proc.nice(priority) print(f"✓ Set nice level to {priority} for PID {pid}")
# Set I/O priority try: subprocess.run(['sudo', 'ionice', '-c', '1', '-n', '2', '-p', str(pid)], check=True) print(f"✓ Set I/O priority to real-time class for PID {pid}") except subprocess.CalledProcessError: print(f"⚠ Failed to set I/O priority for PID {pid}")
# Set CPU affinity (optional) if scheduler == 'isolated': # Bind to specific CPUs (adjust based on your system) isolated_cpus = [2, 3, 4, 5] # Example isolated CPUs proc.cpu_affinity(isolated_cpus) print(f"✓ Set CPU affinity to {isolated_cpus} for PID {pid}")
except (psutil.NoSuchProcess, psutil.AccessDenied) as e: print(f"✗ Failed to optimize PID {pid}: {e}")
def configure_firecracker_limits(self, vm_id): """Configure optimal resource limits for Firecracker VM"""
config = { "machine-config": { "vcpu_count": 2, "mem_size_mib": 512, # Enable memory balloon for dynamic allocation "balloon": { "amount_mib": 256, "deflate_on_oom": True, "stats_polling_interval_s": 1 } }, "cpu-config": { # Use CPU template for better performance "cpu_template": "T2" }, # Configure rate limiters for predictable performance "network-interfaces": [ { "iface_id": "eth0", "guest_mac": "AA:FC:00:00:00:01", "host_dev_name": "tap0", "rx_rate_limiter": { "bandwidth": { "size": 125000000, # 1Gbps "refill_time": 100 }, "ops": { "size": 50000, "refill_time": 100 } }, "tx_rate_limiter": { "bandwidth": { "size": 125000000, # 1Gbps "refill_time": 100 }, "ops": { "size": 50000, "refill_time": 100 } } } ], "drives": [ { "drive_id": "rootfs", "path_on_host": f"/var/lib/firecracker/{vm_id}/rootfs.ext4", "is_root_device": True, "is_read_only": False, "rate_limiter": { "bandwidth": { "size": 104857600, # 100MB/s "refill_time": 100 }, "ops": { "size": 10000, "refill_time": 100 } } } ], # Enable metrics for monitoring "metrics": { "metrics_path": f"/tmp/firecracker-metrics-{vm_id}.json" }, # Optimize logging "logger": { "level": "Warn", # Reduce logging overhead "log_path": f"/var/log/firecracker-{vm_id}.log", "show_level": False, "show_log_origin": False } }
self.optimization_configs[vm_id] = config return config
def generate_optimized_config(self, vm_id, output_path=None): """Generate optimized Firecracker configuration file"""
config = self.configure_firecracker_limits(vm_id)
if output_path is None: output_path = f"/tmp/firecracker-optimized-{vm_id}.json"
with open(output_path, 'w') as f: json.dump(config, f, indent=2)
print(f"✓ Generated optimized config for VM {vm_id}: {output_path}") return output_path
def apply_runtime_optimizations(self, socket_path, optimizations): """Apply runtime optimizations via Firecracker API"""
import requests_unixsocket session = requests_unixsocket.Session() base_url = f'http+unix://{socket_path.replace("/", "%2F")}'
# Apply CPU optimizations if 'cpu_template' in optimizations: response = session.put( f'{base_url}/cpu-config', json={'cpu_template': optimizations['cpu_template']} ) if response.status_code == 204: print("✓ Applied CPU template optimization") else: print(f"✗ Failed to apply CPU template: {response.status_code}")
# Apply memory balloon configuration if 'balloon' in optimizations: response = session.put( f'{base_url}/balloon', json=optimizations['balloon'] ) if response.status_code == 204: print("✓ Applied memory balloon optimization") else: print(f"✗ Failed to apply balloon config: {response.status_code}")
# Configure rate limiters for existing interfaces if 'network_rate_limits' in optimizations: for iface_id, limits in optimizations['network_rate_limits'].items(): response = session.patch( f'{base_url}/network-interfaces/{iface_id}', json=limits ) if response.status_code == 204: print(f"✓ Applied network rate limits for {iface_id}") else: print(f"✗ Failed to apply network rate limits for {iface_id}")
def monitor_performance_metrics(self, metrics_path, duration=60): """Monitor Firecracker performance metrics"""
import time
print(f"Monitoring performance for {duration} seconds...")
start_time = time.time() metrics_history = []
while time.time() - start_time < duration: try: if Path(metrics_path).exists(): with open(metrics_path, 'r') as f: metrics = json.load(f) metrics['timestamp'] = time.time() metrics_history.append(metrics)
time.sleep(1)
except (json.JSONDecodeError, FileNotFoundError): time.sleep(1) continue
# Analyze metrics self._analyze_metrics(metrics_history) return metrics_history
def _analyze_metrics(self, metrics_history): """Analyze collected performance metrics"""
if not metrics_history: print("No metrics collected") return
print("\n=== Performance Analysis ===")
# Network metrics if 'net' in metrics_history[-1]: net_metrics = metrics_history[-1]['net'] print(f"Network Events: RX={net_metrics.get('rx_queue_event_count', 0)}, " f"TX={net_metrics.get('tx_queue_event_count', 0)}")
# Block I/O metrics if 'block' in metrics_history[-1]: block_metrics = metrics_history[-1]['block'] print(f"Block I/O: Reads={block_metrics.get('read_count', 0)}, " f"Writes={block_metrics.get('write_count', 0)}")
# vCPU metrics if 'vcpu' in metrics_history[-1]: for vcpu_id, vcpu_data in metrics_history[-1]['vcpu'].items(): exits = vcpu_data.get('exit_io_in', 0) + vcpu_data.get('exit_io_out', 0) print(f"{vcpu_id}: VM exits={exits}")
def run_optimization_suite(self): """Run complete optimization suite"""
print("=== Firecracker VMM Optimization Suite ===")
# Find running Firecracker processes processes = self.find_firecracker_processes() print(f"Found {len(processes)} Firecracker processes")
# Optimize each process for proc in processes: print(f"\nOptimizing PID {proc['pid']}...") self.optimize_process_scheduling(proc['pid'])
# Generate optimized configurations for i in range(3): # Generate configs for example VMs vm_id = f"vm{i:03d}" config_path = self.generate_optimized_config(vm_id) print(f"Generated config for {vm_id}: {config_path}")
if __name__ == '__main__':
    optimizer = FirecrackerVMMOptimizer()
    optimizer.run_optimization_suite()

I/O Performance Optimization
Storage Optimization
#!/bin/bash
# Storage I/O optimization for Firecracker
echo "=== Storage I/O Optimization ==="

# Configure I/O scheduler
configure_io_scheduler() {
    local device=$1
    local scheduler=${2:-mq-deadline}

    echo "Configuring I/O scheduler for $device to $scheduler"

    if [ -f "/sys/block/$device/queue/scheduler" ]; then
        echo "$scheduler" | sudo tee "/sys/block/$device/queue/scheduler"
        echo "✓ Set $device scheduler to $scheduler"
    else
        echo "✗ Device $device not found"
    fi
}

# Optimize block device settings
optimize_block_device() {
    local device=$1

    echo "Optimizing block device settings for $device"

    # Set the block-layer request queue depth
    echo 32 | sudo tee "/sys/block/$device/queue/nr_requests"

    # Set the NCQ (Native Command Queuing) depth for SATA/SCSI devices, if present
    [ -f "/sys/block/$device/device/queue_depth" ] && \
        echo 31 | sudo tee "/sys/block/$device/device/queue_depth"

    # Optimize read-ahead
    echo 512 | sudo tee "/sys/block/$device/queue/read_ahead_kb"

    # Enable write caching (if safe)
    echo "write back" | sudo tee "/sys/block/$device/queue/write_cache" 2>/dev/null || true

    # Mark the device as non-rotational (SSDs only)
    echo 0 | sudo tee "/sys/block/$device/queue/rotational"

    echo "✓ Optimized $device settings"
}

# Configure storage devices
for device in sda sdb nvme0n1 nvme1n1; do
    if [ -d "/sys/block/$device" ]; then
        configure_io_scheduler "$device" "mq-deadline"
        optimize_block_device "$device"
    fi
done

echo "Storage optimization complete!"

Advanced I/O Configuration
#!/usr/bin/env python3import osimport jsonimport subprocessfrom pathlib import Path
class IOOptimizer: """Advanced I/O optimization for Firecracker"""
def __init__(self): self.block_devices = self._discover_block_devices() self.nvme_devices = self._discover_nvme_devices()
def _discover_block_devices(self): """Discover available block devices"""
devices = [] sys_block = Path('/sys/block')
for device_path in sys_block.iterdir(): if device_path.is_dir() and not device_path.name.startswith('loop'): devices.append(device_path.name)
return devices
def _discover_nvme_devices(self): """Discover NVMe devices"""
devices = [] for device in self.block_devices: if device.startswith('nvme'): devices.append(device)
return devices
def optimize_nvme_devices(self): """Optimize NVMe devices for Firecracker workloads"""
print("=== NVMe Optimization ===")
for device in self.nvme_devices: device_path = f'/dev/{device}'
# Set optimal queue depth self._set_nvme_queue_depth(device, 32)
# Configure interrupt coalescing self._configure_nvme_interrupts(device)
# Enable write caching self._enable_write_cache(device_path)
print(f"✓ Optimized NVMe device {device}")
def _set_nvme_queue_depth(self, device, depth=32): """Set NVMe queue depth"""
queue_file = f'/sys/block/{device}/queue/nr_requests'
try: with open(queue_file, 'w') as f: f.write(str(depth)) print(f" ✓ Set queue depth to {depth} for {device}") except (IOError, OSError) as e: print(f" ✗ Failed to set queue depth for {device}: {e}")
def _configure_nvme_interrupts(self, device): """Configure NVMe interrupt settings"""
# Find NVMe PCI device try: result = subprocess.run([ 'lspci', '-D', '-d', '::0108' # NVMe controller class ], capture_output=True, text=True, check=True)
for line in result.stdout.strip().split('\n'): if line: pci_id = line.split()[0] self._optimize_pci_device_interrupts(pci_id)
except subprocess.CalledProcessError: print(f" ⚠ Could not find PCI info for {device}")
def _optimize_pci_device_interrupts(self, pci_id): """Optimize PCI device interrupt settings"""
# This is a simplified example - actual implementation would be more complex irq_path = f'/proc/irq'
try: # Find device IRQs result = subprocess.run([ 'grep', '-l', pci_id.replace(':', ''), f'{irq_path}/*/actions' ], capture_output=True, text=True, check=False)
for irq_file in result.stdout.strip().split('\n'): if irq_file: irq_num = irq_file.split('/')[3] # Set interrupt affinity (simplified) smp_affinity_file = f'{irq_path}/{irq_num}/smp_affinity' if os.path.exists(smp_affinity_file): with open(smp_affinity_file, 'w') as f: f.write('f') # Use all CPUs print(f" ✓ Set interrupt affinity for IRQ {irq_num}")
except Exception as e: print(f" ⚠ Could not optimize interrupts for {pci_id}: {e}")
def _enable_write_cache(self, device_path): """Enable write cache for device"""
try: subprocess.run([ 'hdparm', '-W', '1', device_path ], check=True, capture_output=True) print(f" ✓ Enabled write cache for {device_path}") except (subprocess.CalledProcessError, FileNotFoundError): print(f" ⚠ Could not enable write cache for {device_path}")
def create_optimized_filesystem(self, device_path, fs_type='ext4', mount_point=None): """Create optimized filesystem for Firecracker images"""
print(f"Creating optimized {fs_type} filesystem on {device_path}")
if fs_type == 'ext4': # Create ext4 with optimizations cmd = [ 'mkfs.ext4', '-F', # Force creation '-O', '^has_journal', # Disable journaling for performance '-E', 'lazy_itable_init=0,lazy_journal_init=0', '-m', '1', # Reduce reserved blocks '-b', '4096', # 4KB block size device_path ] elif fs_type == 'xfs': # Create XFS with optimizations cmd = [ 'mkfs.xfs', '-f', # Force creation '-d', 'agcount=8', # Allocation groups '-l', 'size=64m', # Log size '-b', 'size=4096', # Block size device_path ] else: print(f"Unsupported filesystem type: {fs_type}") return False
try: subprocess.run(cmd, check=True, capture_output=True) print(f"✓ Created {fs_type} filesystem on {device_path}")
if mount_point: self._mount_with_optimizations(device_path, mount_point, fs_type)
return True
except subprocess.CalledProcessError as e: print(f"✗ Failed to create filesystem: {e}") return False
def _mount_with_optimizations(self, device_path, mount_point, fs_type): """Mount filesystem with performance optimizations"""
os.makedirs(mount_point, exist_ok=True)
if fs_type == 'ext4': mount_options = [ 'noatime', # Don't update access times 'nodiratime', # Don't update directory access times 'data=writeback', # Fastest journaling mode 'barrier=0', # Disable barriers (use only with UPS/reliable power) 'commit=60' # Commit every 60 seconds ] elif fs_type == 'xfs': mount_options = [ 'noatime', 'nodiratime', 'allocsize=64m', # Allocation size 'largeio', # Large I/O operations 'inode64' # 64-bit inodes ] else: mount_options = ['noatime', 'nodiratime']
try: subprocess.run([ 'mount', '-o', ','.join(mount_options), device_path, mount_point ], check=True) print(f"✓ Mounted {device_path} at {mount_point} with optimizations") except subprocess.CalledProcessError as e: print(f"✗ Failed to mount {device_path}: {e}")
def benchmark_storage(self, device_path, test_size='1G'): """Benchmark storage performance"""
print(f"=== Benchmarking {device_path} ===")
# Sequential write test print("Running sequential write test...") try: result = subprocess.run([ 'dd', 'if=/dev/zero', f'of={device_path}', f'bs=1M', f'count={test_size[:-1]}', 'oflag=direct', 'conv=fdatasync' ], capture_output=True, text=True, check=True)
# Parse dd output for throughput for line in result.stderr.split('\n'): if 'bytes' in line and 'copied' in line: print(f" Sequential write: {line}")
except subprocess.CalledProcessError as e: print(f" ✗ Sequential write test failed: {e}")
# Sequential read test print("Running sequential read test...") try: result = subprocess.run([ 'dd', f'if={device_path}', 'of=/dev/null', f'bs=1M', f'count={test_size[:-1]}', 'iflag=direct' ], capture_output=True, text=True, check=True)
for line in result.stderr.split('\n'): if 'bytes' in line and 'copied' in line: print(f" Sequential read: {line}")
except subprocess.CalledProcessError as e: print(f" ✗ Sequential read test failed: {e}")
# Random I/O test with fio (if available) self._run_fio_benchmark(device_path)
def _run_fio_benchmark(self, device_path): """Run fio benchmark if available"""
try: # Check if fio is available subprocess.run(['which', 'fio'], check=True, capture_output=True)
# Create fio job file fio_job = f'''[global]ioengine=libaiodirect=1group_reportingtime_basedruntime=30filename={device_path}
[random-read]stonewallrw=randreadbs=4kiodepth=32numjobs=1
[random-write]stonewallrw=randwritebs=4kiodepth=32numjobs=1'''
with open('/tmp/firecracker_fio.job', 'w') as f: f.write(fio_job)
print("Running fio benchmark...") result = subprocess.run([ 'fio', '/tmp/firecracker_fio.job' ], capture_output=True, text=True, check=True)
# Parse fio output for IOPS and bandwidth for line in result.stdout.split('\n'): if 'IOPS=' in line or 'BW=' in line: print(f" {line.strip()}")
except (subprocess.CalledProcessError, FileNotFoundError): print(" ⚠ fio not available for detailed benchmarking")
def run_io_optimization(self): """Run complete I/O optimization suite"""
print("=== I/O Optimization Suite ===")
# Optimize NVMe devices if self.nvme_devices: self.optimize_nvme_devices() else: print("No NVMe devices found")
# Print optimization recommendations self._print_recommendations()
def _print_recommendations(self): """Print I/O optimization recommendations"""
print("\n=== I/O Optimization Recommendations ===") print("1. Use NVMe SSDs for best performance") print("2. Configure RAID 0 for multiple drives (if redundancy not required)") print("3. Use ext4 without journaling for maximum performance") print("4. Mount with noatime and nodiratime options") print("5. Use O_DIRECT for database workloads") print("6. Configure appropriate I/O scheduler (mq-deadline for SSDs)") print("7. Set optimal queue depth (32-128 for NVMe)") print("8. Align partitions to 4K boundaries")
if __name__ == '__main__':
    optimizer = IOOptimizer()
    optimizer.run_io_optimization()

Network Performance Optimization
Network Stack Tuning
#!/bin/bash
# Network performance optimization for Firecracker
echo "=== Network Performance Optimization ==="

# Configure network buffer sizes
echo "Configuring network buffers..."
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728
sudo sysctl -w net.core.rmem_default=262144
sudo sysctl -w net.core.wmem_default=262144

# TCP buffer optimization
echo "Optimizing TCP buffers..."
sudo sysctl -w net.ipv4.tcp_rmem="4096 65536 134217728"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"
sudo sysctl -w net.ipv4.tcp_mem="786432 1048576 134217728"

# Configure TCP congestion control
echo "Setting TCP congestion control..."
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
sudo sysctl -w net.core.default_qdisc=fq

# Network device queue optimization
echo "Optimizing network device queues..."
sudo sysctl -w net.core.netdev_max_backlog=30000
sudo sysctl -w net.core.netdev_budget=600

# Trim TCP overhead: disable timestamps, keep SACK and window scaling enabled
echo "Tuning TCP features..."
sudo sysctl -w net.ipv4.tcp_timestamps=0
sudo sysctl -w net.ipv4.tcp_sack=1
sudo sysctl -w net.ipv4.tcp_window_scaling=1

# Configure interrupt coalescing for network interfaces
configure_network_interrupts() {
    for interface in $(ls /sys/class/net/ | grep -v lo); do
        if [ -d "/sys/class/net/$interface/device" ]; then
            echo "Configuring interrupts for $interface"

            # Set interrupt coalescing (if supported)
            ethtool -C "$interface" rx-usecs 10 tx-usecs 10 2>/dev/null || true

            # Set ring buffer sizes (if supported)
            ethtool -G "$interface" rx 4096 tx 4096 2>/dev/null || true

            # Enable receive hashing
            ethtool -K "$interface" rxhash on 2>/dev/null || true

            echo "  ✓ Configured $interface"
        fi
    done
}
configure_network_interrupts
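# Sketch: the applied settings can be inspected per interface (the interface name
# below is a placeholder):
#   ethtool -c eth0   # current interrupt coalescing parameters
#   ethtool -g eth0   # current ring buffer sizes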
echo "Network optimization complete!"Virtual Network Optimization
#!/usr/bin/env python3import subprocessimport jsonfrom pathlib import Path
class NetworkOptimizer: """Optimize virtual networking for Firecracker"""
def __init__(self): self.tap_interfaces = [] self.bridge_interfaces = []
def create_optimized_tap_interface(self, tap_name, mtu=9000): """Create optimized TAP interface"""
print(f"Creating optimized TAP interface: {tap_name}")
try: # Create TAP interface subprocess.run([ 'sudo', 'ip', 'tuntap', 'add', 'dev', tap_name, 'mode', 'tap' ], check=True)
# Set optimal MTU subprocess.run([ 'sudo', 'ip', 'link', 'set', 'dev', tap_name, 'mtu', str(mtu) ], check=True)
# Enable multiqueue (if supported) subprocess.run([ 'sudo', 'ip', 'link', 'set', 'dev', tap_name, 'type', 'tap', 'multi_queue' ], check=False) # May not be supported
# Optimize queue length subprocess.run([ 'sudo', 'ip', 'link', 'set', 'dev', tap_name, 'txqueuelen', '1000' ], check=True)
# Bring interface up subprocess.run([ 'sudo', 'ip', 'link', 'set', 'dev', tap_name, 'up' ], check=True)
print(f"✓ Created optimized TAP interface {tap_name}") self.tap_interfaces.append(tap_name)
return True
except subprocess.CalledProcessError as e: print(f"✗ Failed to create TAP interface {tap_name}: {e}") return False
def create_optimized_bridge(self, bridge_name, interfaces=None): """Create optimized bridge interface"""
print(f"Creating optimized bridge: {bridge_name}")
try: # Create bridge subprocess.run([ 'sudo', 'ip', 'link', 'add', 'name', bridge_name, 'type', 'bridge' ], check=True)
# Configure bridge parameters for performance bridge_settings = [ ('stp_state', '0'), # Disable STP ('forward_delay', '0'), # No forwarding delay ('hello_time', '100'), # Shorter hello time ('max_age', '1200'), # Shorter max age ('ageing_time', '30000'), # 5 minute aging ]
bridge_path = f'/sys/class/net/{bridge_name}/bridge' for setting, value in bridge_settings: setting_file = f'{bridge_path}/{setting}' if Path(setting_file).exists(): with open(setting_file, 'w') as f: f.write(value) print(f" ✓ Set {setting} = {value}")
# Set optimal MTU subprocess.run([ 'sudo', 'ip', 'link', 'set', 'dev', bridge_name, 'mtu', '9000' ], check=True)
# Add interfaces to bridge if interfaces: for interface in interfaces: subprocess.run([ 'sudo', 'ip', 'link', 'set', 'dev', interface, 'master', bridge_name ], check=True) print(f" ✓ Added {interface} to bridge {bridge_name}")
# Bring bridge up subprocess.run([ 'sudo', 'ip', 'link', 'set', 'dev', bridge_name, 'up' ], check=True)
print(f"✓ Created optimized bridge {bridge_name}") self.bridge_interfaces.append(bridge_name)
return True
except subprocess.CalledProcessError as e: print(f"✗ Failed to create bridge {bridge_name}: {e}") return False
def optimize_existing_interfaces(self): """Optimize existing network interfaces"""
print("Optimizing existing network interfaces...")
# Get all network interfaces result = subprocess.run(['ip', 'link', 'show'], capture_output=True, text=True, check=True)
interfaces = [] for line in result.stdout.split('\n'): if ': <' in line and not line.strip().startswith('lo:'): interface = line.split(':')[1].strip().split('@')[0] interfaces.append(interface)
for interface in interfaces: self._optimize_interface(interface)
def _optimize_interface(self, interface): """Optimize individual network interface"""
print(f"Optimizing interface: {interface}")
try: # Check if interface supports ethtool result = subprocess.run(['sudo', 'ethtool', interface], capture_output=True, text=True, check=False)
if result.returncode == 0: # Optimize interrupt coalescing subprocess.run([ 'sudo', 'ethtool', '-C', interface, 'rx-usecs', '10', 'tx-usecs', '10' ], check=False)
# Optimize ring buffers subprocess.run([ 'sudo', 'ethtool', '-G', interface, 'rx', '4096', 'tx', '4096' ], check=False)
# Enable offloading features offload_features = [ 'rx', 'tx', 'sg', 'tso', 'gso', 'gro', 'lro' ]
for feature in offload_features: subprocess.run([ 'sudo', 'ethtool', '-K', interface, feature, 'on' ], check=False)
print(f" ✓ Optimized {interface}") else: print(f" ⚠ Cannot optimize {interface} (ethtool not supported)")
except Exception as e: print(f" ✗ Failed to optimize {interface}: {e}")
def configure_tc_qdisc(self, interface, bandwidth_mbps=1000): """Configure traffic control queuing discipline"""
print(f"Configuring TC qdisc for {interface}")
try: # Remove existing qdisc subprocess.run([ 'sudo', 'tc', 'qdisc', 'del', 'dev', interface, 'root' ], check=False)
# Add FQ (Fair Queue) qdisc for better performance subprocess.run([ 'sudo', 'tc', 'qdisc', 'add', 'dev', interface, 'root', 'handle', '1:', 'fq' ], check=True)
# Add rate limiting (optional) if bandwidth_mbps: rate = f'{bandwidth_mbps}mbit' subprocess.run([ 'sudo', 'tc', 'qdisc', 'add', 'dev', interface, 'parent', '1:', 'handle', '10:', 'tbf', 'rate', rate, 'burst', '32kbit', 'latency', '50ms' ], check=True)
print(f" ✓ Configured TC qdisc for {interface}")
except subprocess.CalledProcessError as e: print(f" ✗ Failed to configure TC qdisc for {interface}: {e}")
def benchmark_network_performance(self, target_ip, duration=10): """Benchmark network performance"""
print(f"=== Network Performance Benchmark ===") print(f"Target: {target_ip}, Duration: {duration}s")
# Test with iperf3 (if available) try: subprocess.run(['which', 'iperf3'], check=True, capture_output=True)
print("Running iperf3 TCP test...") result = subprocess.run([ 'iperf3', '-c', target_ip, '-t', str(duration), '-P', '4' ], capture_output=True, text=True, check=True)
# Parse iperf3 output for line in result.stdout.split('\n'): if 'SUM' in line and 'Mbits/sec' in line: print(f" TCP throughput: {line.split()[-2]} Mbits/sec")
print("Running iperf3 UDP test...") result = subprocess.run([ 'iperf3', '-c', target_ip, '-u', '-b', '1G', '-t', str(duration) ], capture_output=True, text=True, check=True)
for line in result.stdout.split('\n'): if 'Mbits/sec' in line and 'datagrams' in line: parts = line.split() for i, part in enumerate(parts): if 'Mbits/sec' in part: print(f" UDP throughput: {parts[i-1]} Mbits/sec") break
except (subprocess.CalledProcessError, FileNotFoundError): print(" ⚠ iperf3 not available for benchmarking")
# Fallback to ping test print("Running ping latency test...") result = subprocess.run([ 'ping', '-c', '10', target_ip ], capture_output=True, text=True, check=True)
for line in result.stdout.split('\n'): if 'avg' in line and 'ms' in line: print(f" Ping latency: {line}")
def generate_network_config(self, vm_id, tap_name=None, bridge_name=None): """Generate optimized network configuration for Firecracker"""
if tap_name is None: tap_name = f'tap-{vm_id}' if bridge_name is None: bridge_name = f'br-{vm_id}'
config = { "network-interfaces": [ { "iface_id": "eth0", "guest_mac": f"AA:FC:{vm_id[-2:]:0>2}:00:00:01", "host_dev_name": tap_name, "rx_rate_limiter": { "bandwidth": { "size": 125000000, # 1 Gbps "refill_time": 100 }, "ops": { "size": 100000, "refill_time": 100 } }, "tx_rate_limiter": { "bandwidth": { "size": 125000000, # 1 Gbps "refill_time": 100 }, "ops": { "size": 100000, "refill_time": 100 } } } ] }
return config
def run_network_optimization(self): """Run complete network optimization suite"""
print("=== Network Optimization Suite ===")
# Optimize existing interfaces self.optimize_existing_interfaces()
# Create test TAP interface self.create_optimized_tap_interface('tap-test', mtu=9000)
# Create test bridge self.create_optimized_bridge('br-test', ['tap-test'])
# Print recommendations self._print_recommendations()
def _print_recommendations(self): """Print network optimization recommendations"""
print("\n=== Network Optimization Recommendations ===") print("1. Use SR-IOV for best performance with physical NICs") print("2. Enable jumbo frames (MTU 9000) where possible") print("3. Use multiple queues for high-throughput workloads") print("4. Disable unnecessary protocol features (timestamps, SACK)") print("5. Use BBR congestion control for better throughput") print("6. Configure interrupt coalescing for lower CPU usage") print("7. Pin network interrupts to specific CPUs") print("8. Use DPDK for ultra-low latency applications")
if __name__ == '__main__':
    optimizer = NetworkOptimizer()
    optimizer.run_network_optimization()

Guest System Optimization
Guest Kernel Configuration
#!/bin/bash
# Guest kernel optimization for Firecracker
echo "=== Guest Kernel Optimization ==="

# Create optimized guest kernel config
cat > /tmp/guest_kernel_config << 'EOF'
# Minimal guest kernel configuration for Firecracker

# Basic architecture
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_SMP=y

# Virtualization features
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_SPINLOCKS=y
CONFIG_KVM_GUEST=y
CONFIG_HYPERVISOR_GUEST=y

# Essential virtio drivers (Firecracker uses virtio over MMIO, not PCI)
CONFIG_VIRTIO=y
CONFIG_VIRTIO_MMIO=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_VIRTIO_VSOCKETS=y

# Memory management optimizations
CONFIG_TRANSPARENT_HUGEPAGE=n
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTREMOVE=y

# CPU optimizations
CONFIG_PREEMPT_NONE=y
CONFIG_NO_HZ=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS=y

# I/O optimizations
CONFIG_BLOCK=y
CONFIG_BLK_DEV_LOOP=n
CONFIG_BLK_DEV_RAM=n

# Network optimizations
CONFIG_NET=y
CONFIG_INET=y
CONFIG_TCP_CONG_BBR=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_NET_SCH_FQ=y

# Security features (minimal)
CONFIG_SECURITY=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y

# Disable unnecessary features
CONFIG_SUSPEND=n
CONFIG_HIBERNATION=n
CONFIG_ACPI=n
CONFIG_PCI=n
CONFIG_USB=n
CONFIG_SOUND=n
CONFIG_DRM=n
CONFIG_WIRELESS=n
CONFIG_BLUETOOTH=n
CONFIG_MODULES=n

# Minimal debugging
CONFIG_DEBUG_KERNEL=n
CONFIG_SLUB_DEBUG=n
CONFIG_DEBUG_INFO=n
CONFIG_FRAME_POINTER=n
CONFIG_STACK_TRACER=n
CONFIG_FUNCTION_TRACER=n

# File systems (minimal)
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=n
CONFIG_EXT4_FS_SECURITY=n
CONFIG_PROC_FS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y

# 9P filesystem over virtio (for host/guest file sharing)
CONFIG_9P_FS=y
CONFIG_9P_VIRTIO=y

# Console support
CONFIG_TTY=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
EOF
echo "Guest kernel config created at /tmp/guest_kernel_config"
# Create guest init script for optimization
cat > /tmp/guest_init_optimization.sh << 'EOF'
#!/bin/sh

# Guest system optimization script
echo "Running guest optimization..."

# Disable unnecessary services
for service in rsyslog cron anacron; do
    systemctl stop $service 2>/dev/null || true
    systemctl disable $service 2>/dev/null || true
done

# Optimize memory settings
echo 1 > /proc/sys/vm/swappiness
echo 10 > /proc/sys/vm/dirty_ratio
echo 5 > /proc/sys/vm/dirty_background_ratio

# Optimize network settings
echo 1 > /proc/sys/net/ipv4/tcp_window_scaling
echo 1 > /proc/sys/net/ipv4/tcp_timestamps
echo bbr > /proc/sys/net/ipv4/tcp_congestion_control

# Optimize I/O scheduler for virtio devices
for device in $(ls /sys/block/ | grep vd); do
    echo mq-deadline > /sys/block/$device/queue/scheduler
    echo 512 > /sys/block/$device/queue/read_ahead_kb
done

# Mount tmpfs for frequently accessed directories
mount -t tmpfs tmpfs /tmp -o size=256M,noatime
mount -t tmpfs tmpfs /var/tmp -o size=64M,noatime

echo "Guest optimization complete"
EOF

chmod +x /tmp/guest_init_optimization.sh
echo "Guest init optimization script created"

Application-Level Optimization
#!/usr/bin/env python3import osimport jsonimport psutilfrom pathlib import Path
class GuestApplicationOptimizer: """Optimize applications running in Firecracker guests"""
def __init__(self): self.optimization_profiles = { 'web_server': self._web_server_profile(), 'database': self._database_profile(), 'compute': self._compute_profile(), 'microservice': self._microservice_profile() }
def _web_server_profile(self): """Optimization profile for web servers""" return { 'name': 'Web Server', 'memory': { 'limit_mb': 512, 'balloon_size_mb': 256, 'prealloc': True }, 'cpu': { 'vcpus': 2, 'shares': 1024, 'quota_us': 100000 # 100% of 1 CPU }, 'network': { 'bandwidth_mbps': 100, 'burst_mb': 10, 'latency_ms': 1 }, 'io': { 'read_iops': 5000, 'write_iops': 2000, 'read_bps': 104857600, # 100 MB/s 'write_bps': 52428800 # 50 MB/s }, 'applications': { 'nginx': { 'worker_processes': 'auto', 'worker_connections': 1024, 'keepalive_timeout': 65, 'gzip': 'on', 'sendfile': 'on', 'tcp_nopush': 'on', 'tcp_nodelay': 'on' } } }
def _database_profile(self): """Optimization profile for databases""" return { 'name': 'Database', 'memory': { 'limit_mb': 1024, 'balloon_size_mb': 512, 'prealloc': True, 'hugepages': True }, 'cpu': { 'vcpus': 2, 'shares': 2048, 'quota_us': 200000 # 200% of 1 CPU }, 'network': { 'bandwidth_mbps': 1000, 'burst_mb': 50, 'latency_ms': 0.5 }, 'io': { 'read_iops': 20000, 'write_iops': 10000, 'read_bps': 524288000, # 500 MB/s 'write_bps': 262144000 # 250 MB/s }, 'applications': { 'postgresql': { 'shared_buffers': '256MB', 'effective_cache_size': '768MB', 'maintenance_work_mem': '64MB', 'checkpoint_completion_target': 0.9, 'wal_buffers': '16MB', 'default_statistics_target': 100, 'random_page_cost': 1.1, 'effective_io_concurrency': 200 } } }
def _compute_profile(self): """Optimization profile for compute workloads""" return { 'name': 'Compute', 'memory': { 'limit_mb': 2048, 'balloon_size_mb': 1024, 'prealloc': True, 'hugepages': True }, 'cpu': { 'vcpus': 4, 'shares': 4096, 'quota_us': 400000 # 400% of 1 CPU }, 'network': { 'bandwidth_mbps': 10, 'burst_mb': 1, 'latency_ms': 10 }, 'io': { 'read_iops': 1000, 'write_iops': 500, 'read_bps': 10485760, # 10 MB/s 'write_bps': 5242880 # 5 MB/s } }
def _microservice_profile(self): """Optimization profile for microservices""" return { 'name': 'Microservice', 'memory': { 'limit_mb': 256, 'balloon_size_mb': 128, 'prealloc': False }, 'cpu': { 'vcpus': 1, 'shares': 512, 'quota_us': 50000 # 50% of 1 CPU }, 'network': { 'bandwidth_mbps': 50, 'burst_mb': 5, 'latency_ms': 2 }, 'io': { 'read_iops': 2000, 'write_iops': 1000, 'read_bps': 20971520, # 20 MB/s 'write_bps': 10485760 # 10 MB/s } }
def generate_firecracker_config(self, profile_name, vm_id): """Generate Firecracker configuration from profile"""
if profile_name not in self.optimization_profiles: raise ValueError(f"Unknown profile: {profile_name}")
profile = self.optimization_profiles[profile_name]
config = { "boot-source": { "kernel_image_path": f"/var/lib/firecracker/{vm_id}/vmlinux.bin", "boot_args": "console=ttyS0 reboot=k panic=1 pci=off nomodules ro" }, "drives": [ { "drive_id": "rootfs", "path_on_host": f"/var/lib/firecracker/{vm_id}/rootfs.ext4", "is_root_device": True, "is_read_only": False, "rate_limiter": { "bandwidth": { "size": profile['io']['read_bps'], "refill_time": 100 }, "ops": { "size": profile['io']['read_iops'], "refill_time": 100 } } } ], "network-interfaces": [ { "iface_id": "eth0", "guest_mac": f"AA:FC:{vm_id[-2:]:0>2}:00:00:01", "host_dev_name": f"tap-{vm_id}", "rx_rate_limiter": { "bandwidth": { "size": profile['network']['bandwidth_mbps'] * 125000, # Convert to bytes/s "refill_time": 100 }, "ops": { "size": 1000, "refill_time": 100 } }, "tx_rate_limiter": { "bandwidth": { "size": profile['network']['bandwidth_mbps'] * 125000, "refill_time": 100 }, "ops": { "size": 1000, "refill_time": 100 } } } ], "machine-config": { "vcpu_count": profile['cpu']['vcpus'], "mem_size_mib": profile['memory']['limit_mb'] }, "balloon": { "amount_mib": profile['memory']['balloon_size_mb'], "deflate_on_oom": True, "stats_polling_interval_s": 1 }, "cpu-config": { "cpu_template": "T2" }, "logger": { "level": "Warn", "log_path": f"/var/log/firecracker-{vm_id}.log" }, "metrics": { "metrics_path": f"/tmp/firecracker-metrics-{vm_id}.json" } }
return config
def generate_application_config(self, profile_name, app_name): """Generate application-specific configuration"""
if profile_name not in self.optimization_profiles: raise ValueError(f"Unknown profile: {profile_name}")
profile = self.optimization_profiles[profile_name]
if app_name not in profile.get('applications', {}): return None
app_config = profile['applications'][app_name]
# Generate configuration based on application if app_name == 'nginx': return self._generate_nginx_config(app_config) elif app_name == 'postgresql': return self._generate_postgresql_config(app_config) else: return app_config
def _generate_nginx_config(self, config): """Generate optimized nginx configuration"""
nginx_conf = f"""user nginx;worker_processes {config['worker_processes']};error_log /var/log/nginx/error.log warn;pid /var/run/nginx.pid;
events {{ worker_connections {config['worker_connections']}; use epoll; multi_accept on;}}
http {{ include /etc/nginx/mime.types; default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile {config['sendfile']}; tcp_nopush {config['tcp_nopush']}; tcp_nodelay {config['tcp_nodelay']}; keepalive_timeout {config['keepalive_timeout']};
gzip {config['gzip']}; gzip_vary on; gzip_proxied any; gzip_comp_level 6; gzip_types text/plain text/css text/xml text/javascript application/javascript application/xml+rss application/json;
include /etc/nginx/conf.d/*.conf;}}""" return nginx_conf
def _generate_postgresql_config(self, config): """Generate optimized PostgreSQL configuration"""
pg_conf = f"""# PostgreSQL optimized configuration
# Memory settingsshared_buffers = {config['shared_buffers']}effective_cache_size = {config['effective_cache_size']}maintenance_work_mem = {config['maintenance_work_mem']}work_mem = 4MB
# Checkpoint settingscheckpoint_completion_target = {config['checkpoint_completion_target']}wal_buffers = {config['wal_buffers']}
# Query plannerdefault_statistics_target = {config['default_statistics_target']}random_page_cost = {config['random_page_cost']}effective_io_concurrency = {config['effective_io_concurrency']}
# Connection settingsmax_connections = 100shared_preload_libraries = 'pg_stat_statements'
# Logginglog_destination = 'stderr'logging_collector = onlog_directory = 'pg_log'log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'log_statement = 'none'log_min_duration_statement = 1000
# Performance monitoringtrack_activities = ontrack_counts = ontrack_io_timing = ontrack_functions = pl""" return pg_conf
def save_config(self, config, file_path): """Save configuration to file"""
with open(file_path, 'w') as f: if isinstance(config, dict): json.dump(config, f, indent=2) else: f.write(config)
print(f"✓ Saved configuration to {file_path}")
def run_optimization_suite(self): """Run complete application optimization suite"""
print("=== Application Optimization Suite ===")
for profile_name in self.optimization_profiles: print(f"\nGenerating configuration for profile: {profile_name}")
# Generate Firecracker config vm_id = f"{profile_name.replace('_', '-')}-001" fc_config = self.generate_firecracker_config(profile_name, vm_id) self.save_config(fc_config, f"/tmp/firecracker-{profile_name}.json")
# Generate application configs profile = self.optimization_profiles[profile_name] for app_name in profile.get('applications', {}): app_config = self.generate_application_config(profile_name, app_name) if app_config: if app_name == 'nginx': self.save_config(app_config, f"/tmp/{app_name}-{profile_name}.conf") elif app_name == 'postgresql': self.save_config(app_config, f"/tmp/postgresql-{profile_name}.conf")
print("\n=== Optimization Complete ===") print("Configuration files generated in /tmp/")
if __name__ == '__main__':
    optimizer = GuestApplicationOptimizer()
    optimizer.run_optimization_suite()

Performance Monitoring and Analysis
Comprehensive Monitoring System
#!/usr/bin/env python3
import os
import json
import time
import psutil
import subprocess
from datetime import datetime
from pathlib import Path
import threading
from collections import defaultdict, deque
class FirecrackerPerformanceMonitor: """Comprehensive performance monitoring for Firecracker"""
def __init__(self, collection_interval=5): self.collection_interval = collection_interval self.monitoring = False self.metrics_history = defaultdict(lambda: deque(maxlen=1000)) self.firecracker_processes = {} self.vm_metrics = defaultdict(dict)
def discover_firecracker_processes(self): """Discover running Firecracker processes"""
processes = {} for proc in psutil.process_iter(['pid', 'name', 'cmdline', 'create_time']): try: if proc.info['name'] == 'firecracker': vm_id = self._extract_vm_id(proc.info['cmdline']) processes[vm_id] = { 'pid': proc.info['pid'], 'process': proc, 'start_time': proc.info['create_time'] } except (psutil.NoSuchProcess, psutil.AccessDenied): continue
self.firecracker_processes = processes return processes
def _extract_vm_id(self, cmdline): """Extract VM ID from command line"""
# Try to extract from socket path or config file for arg in cmdline: if 'socket' in arg or 'api-sock' in arg: # Extract from socket path parts = arg.split('/') for part in parts: if 'vm' in part or 'firecracker' in part: return part
# Fallback to PID-based ID return f"vm-{cmdline[0].split('/')[-1]}"
def collect_system_metrics(self): """Collect system-level performance metrics"""
timestamp = time.time()
# CPU metrics cpu_percent = psutil.cpu_percent(interval=None, percpu=True) cpu_freq = psutil.cpu_freq()
# Memory metrics memory = psutil.virtual_memory() swap = psutil.swap_memory()
# Disk I/O metrics disk_io = psutil.disk_io_counters()
# Network I/O metrics net_io = psutil.net_io_counters()
# Load averages load_avg = os.getloadavg() if hasattr(os, 'getloadavg') else (0, 0, 0)
system_metrics = { 'timestamp': timestamp, 'cpu': { 'percent_per_core': cpu_percent, 'percent_total': sum(cpu_percent) / len(cpu_percent), 'frequency_mhz': cpu_freq.current if cpu_freq else 0, 'load_avg': load_avg }, 'memory': { 'total_bytes': memory.total, 'available_bytes': memory.available, 'percent': memory.percent, 'used_bytes': memory.used, 'free_bytes': memory.free, 'buffers_bytes': getattr(memory, 'buffers', 0), 'cached_bytes': getattr(memory, 'cached', 0) }, 'swap': { 'total_bytes': swap.total, 'used_bytes': swap.used, 'free_bytes': swap.free, 'percent': swap.percent }, 'disk_io': { 'read_bytes': disk_io.read_bytes if disk_io else 0, 'write_bytes': disk_io.write_bytes if disk_io else 0, 'read_count': disk_io.read_count if disk_io else 0, 'write_count': disk_io.write_count if disk_io else 0, 'read_time': disk_io.read_time if disk_io else 0, 'write_time': disk_io.write_time if disk_io else 0 }, 'network_io': { 'bytes_sent': net_io.bytes_sent if net_io else 0, 'bytes_recv': net_io.bytes_recv if net_io else 0, 'packets_sent': net_io.packets_sent if net_io else 0, 'packets_recv': net_io.packets_recv if net_io else 0, 'errin': net_io.errin if net_io else 0, 'errout': net_io.errout if net_io else 0, 'dropin': net_io.dropin if net_io else 0, 'dropout': net_io.dropout if net_io else 0 } }
self.metrics_history['system'].append(system_metrics) return system_metrics
def collect_firecracker_metrics(self): """Collect Firecracker-specific metrics"""
timestamp = time.time()
for vm_id, proc_info in self.firecracker_processes.items(): try: process = proc_info['process']
# Process-level metrics cpu_percent = process.cpu_percent() memory_info = process.memory_info() io_counters = process.io_counters() num_fds = process.num_fds()
vm_metrics = { 'timestamp': timestamp, 'vm_id': vm_id, 'pid': proc_info['pid'], 'cpu_percent': cpu_percent, 'memory': { 'rss_bytes': memory_info.rss, 'vms_bytes': memory_info.vms, 'shared_bytes': getattr(memory_info, 'shared', 0), 'text_bytes': getattr(memory_info, 'text', 0), 'data_bytes': getattr(memory_info, 'data', 0) }, 'io': { 'read_count': io_counters.read_count, 'write_count': io_counters.write_count, 'read_bytes': io_counters.read_bytes, 'write_bytes': io_counters.write_bytes }, 'file_descriptors': num_fds, 'uptime_seconds': time.time() - proc_info['start_time'] }
# Try to collect Firecracker-specific metrics from API api_metrics = self._collect_firecracker_api_metrics(vm_id) if api_metrics: vm_metrics['firecracker_api'] = api_metrics
self.metrics_history[vm_id].append(vm_metrics)
except (psutil.NoSuchProcess, psutil.AccessDenied): # Process might have terminated continue
def _collect_firecracker_api_metrics(self, vm_id): """Collect metrics from Firecracker API"""
socket_path = f"/tmp/firecracker-{vm_id}.socket" if not Path(socket_path).exists(): return None
try: import requests_unixsocket session = requests_unixsocket.Session() base_url = f'http+unix://{socket_path.replace("/", "%2F")}'
response = session.get(f'{base_url}/metrics') if response.status_code == 200: return response.json() except Exception: pass
return None
    def calculate_performance_stats(self, vm_id, window_minutes=5):
        """Calculate performance statistics over a time window"""
        if vm_id not in self.metrics_history:
            return None

        metrics = self.metrics_history[vm_id]
        if len(metrics) < 2:
            return None

        # Filter to time window
        current_time = time.time()
        window_start = current_time - (window_minutes * 60)

        windowed_metrics = [m for m in metrics if m['timestamp'] >= window_start]
        if len(windowed_metrics) < 2:
            return None

        # Calculate statistics
        cpu_values = [m['cpu_percent'] for m in windowed_metrics]
        memory_values = [m['memory']['rss_bytes'] for m in windowed_metrics]

        # I/O rates (calculated from differences between consecutive samples)
        io_read_rates = []
        io_write_rates = []

        for i in range(1, len(windowed_metrics)):
            prev = windowed_metrics[i - 1]
            curr = windowed_metrics[i]
            time_diff = curr['timestamp'] - prev['timestamp']

            if time_diff > 0:
                read_rate = (curr['io']['read_bytes'] - prev['io']['read_bytes']) / time_diff
                write_rate = (curr['io']['write_bytes'] - prev['io']['write_bytes']) / time_diff
                io_read_rates.append(read_rate)
                io_write_rates.append(write_rate)

        stats = {
            'vm_id': vm_id,
            'window_minutes': window_minutes,
            'sample_count': len(windowed_metrics),
            'cpu': {
                'min_percent': min(cpu_values),
                'max_percent': max(cpu_values),
                'avg_percent': sum(cpu_values) / len(cpu_values),
                'current_percent': cpu_values[-1]
            },
            'memory': {
                'min_bytes': min(memory_values),
                'max_bytes': max(memory_values),
                'avg_bytes': sum(memory_values) / len(memory_values),
                'current_bytes': memory_values[-1]
            },
            'io': {
                'avg_read_bps': sum(io_read_rates) / len(io_read_rates) if io_read_rates else 0,
                'avg_write_bps': sum(io_write_rates) / len(io_write_rates) if io_write_rates else 0,
                'max_read_bps': max(io_read_rates) if io_read_rates else 0,
                'max_write_bps': max(io_write_rates) if io_write_rates else 0
            }
        }

        return stats
    def detect_performance_anomalies(self, vm_id, thresholds=None):
        """Detect performance anomalies against configurable thresholds"""
        if thresholds is None:
            thresholds = {
                'cpu_percent': 80,
                'memory_mb': 512,
                'io_read_mbps': 100,
                'io_write_mbps': 50
            }

        stats = self.calculate_performance_stats(vm_id, window_minutes=5)
        if not stats:
            return []

        anomalies = []

        # CPU anomalies
        if stats['cpu']['avg_percent'] > thresholds['cpu_percent']:
            anomalies.append({
                'type': 'high_cpu',
                'severity': 'warning' if stats['cpu']['avg_percent'] < 90 else 'critical',
                'value': stats['cpu']['avg_percent'],
                'threshold': thresholds['cpu_percent'],
                'message': f"High CPU usage: {stats['cpu']['avg_percent']:.1f}%"
            })

        # Memory anomalies
        memory_mb = stats['memory']['current_bytes'] / (1024 * 1024)
        if memory_mb > thresholds['memory_mb']:
            anomalies.append({
                'type': 'high_memory',
                'severity': 'warning',
                'value': memory_mb,
                'threshold': thresholds['memory_mb'],
                'message': f"High memory usage: {memory_mb:.1f}MB"
            })

        # I/O anomalies
        read_mbps = stats['io']['max_read_bps'] / (1024 * 1024)
        write_mbps = stats['io']['max_write_bps'] / (1024 * 1024)

        if read_mbps > thresholds['io_read_mbps']:
            anomalies.append({
                'type': 'high_io_read',
                'severity': 'info',
                'value': read_mbps,
                'threshold': thresholds['io_read_mbps'],
                'message': f"High I/O read rate: {read_mbps:.1f}MB/s"
            })

        if write_mbps > thresholds['io_write_mbps']:
            anomalies.append({
                'type': 'high_io_write',
                'severity': 'info',
                'value': write_mbps,
                'threshold': thresholds['io_write_mbps'],
                'message': f"High I/O write rate: {write_mbps:.1f}MB/s"
            })

        return anomalies
    def generate_performance_report(self, output_file=None):
        """Generate comprehensive performance report"""
        current_time = datetime.now()

        report = {
            'timestamp': current_time.isoformat(),
            'system_overview': {},
            'vm_summary': {},
            'performance_analysis': {},
            'anomalies': {},
            'recommendations': []
        }

        # System overview
        if self.metrics_history['system']:
            latest_system = self.metrics_history['system'][-1]
            report['system_overview'] = {
                'cpu_usage_percent': latest_system['cpu']['percent_total'],
                'memory_usage_percent': latest_system['memory']['percent'],
                'load_average': latest_system['cpu']['load_avg'],
                'total_vms': len(self.firecracker_processes)
            }

        # VM summary and analysis
        for vm_id in self.firecracker_processes:
            stats = self.calculate_performance_stats(vm_id)
            if stats:
                report['vm_summary'][vm_id] = stats

            # Detect anomalies
            anomalies = self.detect_performance_anomalies(vm_id)
            if anomalies:
                report['anomalies'][vm_id] = anomalies

        # Generate recommendations
        report['recommendations'] = self._generate_recommendations(report)

        # Save report
        if output_file is None:
            output_file = f"firecracker_performance_report_{int(time.time())}.json"

        with open(output_file, 'w') as f:
            json.dump(report, f, indent=2)

        print(f"Performance report saved to {output_file}")
        return report
    def _generate_recommendations(self, report):
        """Generate performance optimization recommendations"""
        recommendations = []

        # System-level recommendations
        system = report['system_overview']
        if system.get('cpu_usage_percent', 0) > 80:
            recommendations.append({
                'category': 'system',
                'type': 'cpu',
                'priority': 'high',
                'message': 'High system CPU usage detected. Consider CPU isolation or reducing VM density.'
            })

        if system.get('memory_usage_percent', 0) > 85:
            recommendations.append({
                'category': 'system',
                'type': 'memory',
                'priority': 'high',
                'message': 'High system memory usage. Consider enabling memory ballooning or reducing VM memory allocation.'
            })

        # VM-specific recommendations
        for vm_id, anomalies in report['anomalies'].items():
            for anomaly in anomalies:
                if anomaly['type'] == 'high_cpu':
                    recommendations.append({
                        'category': 'vm',
                        'vm_id': vm_id,
                        'type': 'cpu',
                        'priority': 'medium',
                        'message': f'VM {vm_id} has high CPU usage. Consider CPU limits or optimization.'
                    })
                elif anomaly['type'] == 'high_memory':
                    recommendations.append({
                        'category': 'vm',
                        'vm_id': vm_id,
                        'type': 'memory',
                        'priority': 'medium',
                        'message': f'VM {vm_id} has high memory usage. Consider memory ballooning.'
                    })

        return recommendations
    def start_monitoring(self):
        """Start performance monitoring in a background thread"""
        if self.monitoring:
            print("Monitoring already started")
            return

        self.monitoring = True

        def monitoring_loop():
            print(f"Started performance monitoring (interval: {self.collection_interval}s)")

            while self.monitoring:
                try:
                    # Discover processes
                    self.discover_firecracker_processes()

                    # Collect metrics
                    self.collect_system_metrics()
                    self.collect_firecracker_metrics()

                    time.sleep(self.collection_interval)

                except Exception as e:
                    print(f"Error in monitoring loop: {e}")
                    time.sleep(self.collection_interval)

            print("Performance monitoring stopped")

        self.monitoring_thread = threading.Thread(target=monitoring_loop, daemon=True)
        self.monitoring_thread.start()
    def stop_monitoring(self):
        """Stop performance monitoring"""
        self.monitoring = False
        if hasattr(self, 'monitoring_thread'):
            self.monitoring_thread.join(timeout=10)
    def print_live_dashboard(self):
        """Print live performance dashboard"""
        os.system('clear')  # Clear screen

        print("=" * 80)
        print(f"Firecracker Performance Dashboard - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        print("=" * 80)

        # System overview
        if self.metrics_history['system']:
            latest = self.metrics_history['system'][-1]
            print(f"System CPU: {latest['cpu']['percent_total']:.1f}% | "
                  f"Memory: {latest['memory']['percent']:.1f}% | "
                  f"Load: {latest['cpu']['load_avg'][0]:.2f}")
            print()

        # VM status
        print(f"Running VMs: {len(self.firecracker_processes)}")
        print("-" * 80)

        for vm_id in self.firecracker_processes:
            if vm_id in self.metrics_history and self.metrics_history[vm_id]:
                latest = self.metrics_history[vm_id][-1]
                memory_mb = latest['memory']['rss_bytes'] / (1024 * 1024)
                uptime = int(latest['uptime_seconds'])

                print(f"{vm_id:15} | CPU: {latest['cpu_percent']:5.1f}% | "
                      f"Memory: {memory_mb:6.1f}MB | Uptime: {uptime:5d}s")

        print("-" * 80)
if __name__ == '__main__':
    import os  # used by print_live_dashboard() to clear the screen

    monitor = FirecrackerPerformanceMonitor(collection_interval=5)

    try:
        monitor.start_monitoring()

        # Live dashboard
        while True:
            monitor.print_live_dashboard()
            time.sleep(5)

    except KeyboardInterrupt:
        print("\nStopping monitoring...")
        monitor.stop_monitoring()

        # Generate final report
        report = monitor.generate_performance_report()
        print("Final performance report generated")
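
The __main__ block above drives an interactive dashboard. For unattended use, a minimal sketch like the following shows how the same monitor can be driven programmatically: tighten the default anomaly thresholds for a latency-sensitive VM and surface only high-priority recommendations from a generated report. The vm_id, threshold values, and sleep duration are illustrative assumptions, not part of the class.

# Illustrative, unattended usage of FirecrackerPerformanceMonitor.
# 'vm-001' and the threshold values below are examples, not defaults.
import time

monitor = FirecrackerPerformanceMonitor(collection_interval=5)
monitor.start_monitoring()
time.sleep(300)  # let the 5-minute statistics window fill with samples

custom_thresholds = {
    'cpu_percent': 60,      # alert earlier than the 80% default
    'memory_mb': 256,
    'io_read_mbps': 50,
    'io_write_mbps': 25
}

for anomaly in monitor.detect_performance_anomalies('vm-001', thresholds=custom_thresholds):
    print(f"[{anomaly['severity'].upper()}] {anomaly['message']}")

report = monitor.generate_performance_report()
for rec in report['recommendations']:
    if rec['priority'] == 'high':
        print(f"ACTION NEEDED ({rec['category']}/{rec['type']}): {rec['message']}")

monitor.stop_monitoring()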
Conclusion

Optimizing Firecracker performance requires a holistic approach spanning multiple layers of the stack. Key optimization areas include:
- 🖥️ Host System: CPU isolation, memory management, kernel tuning
- ⚙️ Firecracker VMM: Process optimization, resource configuration, API tuning
- 💾 Storage: NVMe optimization, filesystem tuning, I/O scheduling
- 🌐 Network: Buffer optimization, interrupt handling, congestion control
- 🔧 Guest System: Minimal kernels, application-specific tuning
- 📊 Monitoring: Comprehensive metrics, anomaly detection, performance analysis
By implementing these optimizations systematically and monitoring performance continuously, you can achieve maximum efficiency from your Firecracker deployment while maintaining security and isolation guarantees.