Netflix eBPF Infrastructure Observability: Detecting Noisy Neighbors at Scale
Netflix’s Compute and Performance Engineering teams use eBPF to detect “noisy neighbors” in multi-tenant systems. The approach monitors process scheduling continuously with minimal performance impact, which addresses a persistent problem in large-scale distributed environments.
The Multi-Tenant Challenge: Noisy Neighbors
graph TB
    subgraph "Multi-Tenant System Challenges"
        subgraph "Container 1 - Normal Workload"
            C1[CPU: 20%] --> C1P[Performance: Good]
            C1M[Memory: 512MB] --> C1P
        end

        subgraph "Container 2 - Noisy Neighbor"
            C2[CPU: 95%] --> C2I[High CPU Usage]
            C2M[Memory: 2GB] --> C2I
            C2I --> Impact[Performance Impact]
        end

        subgraph "Container 3 - Affected"
            C3[CPU: 15%] --> C3P[Performance: Degraded]
            C3L[Latency: High] --> C3P
        end

        Impact --> C3P
        Impact --> C1P
    end

    style C2I fill:#ffcdd2
    style Impact fill:#ffcdd2
    style C3P fill:#fff3e0
What Are Noisy Neighbors?
In multi-tenant systems, noisy neighbors are processes or containers that consume excessive resources, negatively impacting the performance of other workloads running on the same physical host. These can manifest as:
- CPU-intensive processes that monopolize CPU cycles
- Memory-hungry applications that cause memory pressure
- I/O-bound workloads that saturate disk or network resources
- Scheduler interference that disrupts process execution timing
Traditional Detection Challenges
Conventional approaches to noisy neighbor detection face several limitations:
sequenceDiagram
    participant Problem as Performance Issue
    participant Alert as Alert System
    participant Engineer as Engineer
    participant Tools as Analysis Tools
    participant Resolution as Resolution

    Problem->>Alert: Performance degradation detected
    Alert->>Engineer: Alert triggered
    Engineer->>Tools: Deploy perf/profiling tools

    rect rgb(255, 205, 210)
        Note over Tools: High overhead analysis
        Note over Tools: Requires application restart
        Note over Tools: Post-incident deployment
    end

    Tools->>Engineer: Analysis results
    Engineer->>Resolution: Apply fixes

    Note over Problem,Resolution: Problem already impacted users
Key Limitations
- Reactive Nature: Tools are typically deployed after performance issues have already occurred
- High Overhead: Analysis tools like perf introduce significant performance overhead
- Expertise Requirements: Require specialized engineering knowledge to operate effectively
- System Disruption: Often require application restarts or recompilation for instrumentation
Netflix’s eBPF Solution
Netflix’s approach uses eBPF to instrument the Linux kernel for continuous, low-overhead monitoring of the scheduler subsystem.
Architecture Overview
graph TB
    subgraph "Kernel Space"
        subgraph "eBPF Hooks"
            H1[sched_wakeup] --> Metrics[Process Latency Calculation]
            H2[sched_wakeup_new] --> Metrics
            H3[sched_switch] --> Metrics
        end

        subgraph "Scheduler Events"
            E1[Process Ready] --> H1
            E2[New Process] --> H2
            E3[Context Switch] --> H3
        end
    end

    subgraph "User Space"
        subgraph "Data Processing"
            RB[Ring Buffer] --> Go[Go Application]
            Go --> Atlas[Atlas Metrics Backend]
        end

        subgraph "Monitoring"
            Atlas --> Dashboards[Monitoring Dashboards]
            Atlas --> Alerts[Alert Systems]
        end
    end

    Metrics --> RB

    style H1 fill:#e1f5fe
    style H2 fill:#e1f5fe
    style H3 fill:#e1f5fe
    style Go fill:#c8e6c9
Core eBPF Implementation
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

// Process latency tracking structure.
// Note: the compiler pads this struct (4 bytes after cgroup_id, 3 at the end),
// so user space must decode it with the same 64-byte layout.
struct process_event {
    __u32 pid;
    __u32 tgid;
    __u32 cgroup_id;
    __u64 wakeup_time;
    __u64 schedule_time;
    __u64 latency_ns;
    __u32 preempted_by_pid;
    __u8 throttled;
    char comm[16];
};

// Maps for tracking process states
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, __u32);
    __type(value, __u64);
} wakeup_times SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 4 * 1024 * 1024);
} events SEC(".maps");

// Track CPU quotas for cgroup throttling detection
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10000);
    __type(key, __u32);
    __type(value, __u64);
} cgroup_quotas SEC(".maps");

// Helper function to get cgroup ID for container association.
// Defined before the hooks so it is declared at its first use.
// kn->id is 64 bits wide on recent kernels; it is truncated to 32 bits here.
static __always_inline __u32 get_cgroup_id(struct task_struct *task) {
    struct cgroup *cgrp = BPF_CORE_READ(task, cgroups, subsys[0], cgroup);
    return (__u32)BPF_CORE_READ(cgrp, kn, id);
}

// Helper function to check if cgroup is being throttled
static __always_inline __u8 check_cgroup_throttling(__u32 cgroup_id) {
    __u64 *quota = bpf_map_lookup_elem(&cgroup_quotas, &cgroup_id);
    if (!quota) {
        return 0;
    }

    // Simplified throttling check - in practice, this would
    // examine CPU quota vs usage statistics
    return *quota > 0 ? 1 : 0;
}

// Hook: Process becomes ready to run
SEC("tp_btf/sched_wakeup")
int handle_sched_wakeup(u64 *ctx) {
    struct task_struct *task = (struct task_struct *)ctx[0];
    __u32 pid = BPF_CORE_READ(task, pid);
    __u64 now = bpf_ktime_get_ns();

    // Store wakeup timestamp
    bpf_map_update_elem(&wakeup_times, &pid, &now, BPF_ANY);

    return 0;
}

// Hook: New process becomes ready to run
SEC("tp_btf/sched_wakeup_new")
int handle_sched_wakeup_new(u64 *ctx) {
    struct task_struct *task = (struct task_struct *)ctx[0];
    __u32 pid = BPF_CORE_READ(task, pid);
    __u64 now = bpf_ktime_get_ns();

    // Store wakeup timestamp for new process
    bpf_map_update_elem(&wakeup_times, &pid, &now, BPF_ANY);

    return 0;
}

// Hook: Process is assigned CPU time.
// Tracepoint arguments are (preempt, prev, next), so prev is ctx[1].
SEC("tp_btf/sched_switch")
int handle_sched_switch(u64 *ctx) {
    struct task_struct *prev = (struct task_struct *)ctx[1];
    struct task_struct *next = (struct task_struct *)ctx[2];

    __u32 next_pid = BPF_CORE_READ(next, pid);
    __u32 next_tgid = BPF_CORE_READ(next, tgid);
    __u32 prev_pid = BPF_CORE_READ(prev, pid);
    __u64 now = bpf_ktime_get_ns();

    // Look up wakeup time for the process being scheduled
    __u64 *wakeup_time = bpf_map_lookup_elem(&wakeup_times, &next_pid);
    if (!wakeup_time) {
        return 0;
    }

    // Calculate run queue latency
    __u64 latency = now - *wakeup_time;

    // Get cgroup information for container association
    __u32 cgroup_id = get_cgroup_id(next);

    // Check if process is being throttled due to CPU quota
    __u8 throttled = check_cgroup_throttling(cgroup_id);

    // Create event for user space processing
    struct process_event *event = bpf_ringbuf_reserve(&events, sizeof(*event), 0);
    if (!event) {
        goto cleanup;
    }

    event->pid = next_pid;
    event->tgid = next_tgid;
    event->cgroup_id = cgroup_id;
    event->wakeup_time = *wakeup_time;
    event->schedule_time = now;
    event->latency_ns = latency;
    // prev is the task leaving the CPU, i.e. the one that ran while next waited
    event->preempted_by_pid = prev_pid;
    event->throttled = throttled;

    // Copy process name (comm is an array, so use the string-read variant)
    BPF_CORE_READ_STR_INTO(&event->comm, next, comm);

    bpf_ringbuf_submit(event, 0);

cleanup:
    // Clean up wakeup time tracking
    bpf_map_delete_elem(&wakeup_times, &next_pid);
    return 0;
}

char _license[] SEC("license") = "GPL";
User-Space Processing Application
package main

import (
	"bytes"
	"context"
	"encoding/binary"
	"errors"
	"fmt"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/ringbuf"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Metrics, defined here with the Prometheus client as a stand-in for
// Atlas (Netflix's metrics backend)
var (
	processLatencyHistogram = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "scheduler_process_latency_microseconds",
			Help:    "Process run queue latency in microseconds",
			Buckets: []float64{1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000},
		},
		[]string{"container_id", "throttled"},
	)

	preemptionCounter = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "scheduler_preemptions_total",
			Help: "Total number of process preemptions",
		},
		[]string{"preempted_container", "preempting_container"},
	)

	noisyNeighborAlerts = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "noisy_neighbor_alerts_total",
			Help: "Total noisy neighbor alerts generated",
		},
		[]string{"container_id", "alert_type"},
	)
)

// Process event structure matching the eBPF struct byte for byte.
// The blank fields reproduce the C compiler's alignment padding;
// binary.Read skips them.
type ProcessEvent struct {
	PID            uint32
	TGID           uint32
	CgroupID       uint32
	_              uint32 // padding before the first 64-bit field
	WakeupTime     uint64
	ScheduleTime   uint64
	LatencyNS      uint64
	PreemptedByPID uint32
	Throttled      uint8
	Comm           [16]int8
	_              [3]byte // tail padding
}

// Container performance tracking
type ContainerMetrics struct {
	ID             string
	LatencyP99     time.Duration
	LatencyAvg     time.Duration
	PreemptionRate float64
	ThrottlingRate float64
	LastUpdate     time.Time
}

// schedulerObjects is assumed to be generated (for example by bpf2go)
// from the eBPF program above; it is not defined in this listing.
type SchedulerMonitor struct {
	objs           *schedulerObjects
	links          []link.Link
	reader         *ringbuf.Reader
	containerCache map[uint32]*ContainerMetrics

	// Noisy neighbor detection thresholds
	latencyThreshold    time.Duration
	preemptionThreshold float64
}

func NewSchedulerMonitor() (*SchedulerMonitor, error) {
	// Load eBPF program
	spec, err := ebpf.LoadCollectionSpec("scheduler_monitor.o")
	if err != nil {
		return nil, fmt.Errorf("loading eBPF spec: %w", err)
	}

	objs := &schedulerObjects{}
	if err := spec.LoadAndAssign(objs, nil); err != nil {
		return nil, fmt.Errorf("loading eBPF objects: %w", err)
	}

	// Set up ring buffer reader
	reader, err := ringbuf.NewReader(objs.Events)
	if err != nil {
		objs.Close()
		return nil, fmt.Errorf("creating ring buffer reader: %w", err)
	}

	monitor := &SchedulerMonitor{
		objs:                objs,
		reader:              reader,
		containerCache:      make(map[uint32]*ContainerMetrics),
		latencyThreshold:    500 * time.Microsecond, // 500μs threshold
		preemptionThreshold: 10.0,                   // 10 preemptions/sec
	}

	return monitor, nil
}

func (sm *SchedulerMonitor) AttachPrograms() error {
	// The programs are declared with SEC("tp_btf/..."), so they attach as
	// BTF-enabled raw tracepoints via link.AttachTracing, not link.Tracepoint.
	progs := []struct {
		name string
		prog *ebpf.Program
	}{
		{"sched_wakeup", sm.objs.HandleSchedWakeup},
		{"sched_wakeup_new", sm.objs.HandleSchedWakeupNew},
		{"sched_switch", sm.objs.HandleSchedSwitch},
	}

	for _, p := range progs {
		l, err := link.AttachTracing(link.TracingOptions{Program: p.prog})
		if err != nil {
			return fmt.Errorf("attaching %s: %w", p.name, err)
		}
		sm.links = append(sm.links, l)
	}

	log.Println("Successfully attached eBPF programs to scheduler tracepoints")
	return nil
}

func (sm *SchedulerMonitor) ProcessEvents(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}

		// Read blocks until a record arrives or the reader is closed.
		record, err := sm.reader.Read()
		if err != nil {
			if errors.Is(err, ringbuf.ErrClosed) {
				return // reader closed during shutdown
			}
			log.Printf("Error reading from ring buffer: %v", err)
			continue
		}

		sm.handleProcessEvent(record.RawSample)
	}
}

func (sm *SchedulerMonitor) handleProcessEvent(data []byte) {
	if len(data) < binary.Size(ProcessEvent{}) {
		return
	}

	var event ProcessEvent
	err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &event)
	if err != nil {
		log.Printf("Error parsing event: %v", err)
		return
	}

	// Convert latency to microseconds
	latencyMicros := float64(event.LatencyNS) / 1000.0

	// Get container ID from cgroup
	containerID := fmt.Sprintf("container_%d", event.CgroupID)

	// Update Prometheus metrics
	throttledLabel := "false"
	if event.Throttled == 1 {
		throttledLabel = "true"
	}

	processLatencyHistogram.WithLabelValues(containerID, throttledLabel).Observe(latencyMicros)

	// Update container metrics cache
	sm.updateContainerMetrics(event.CgroupID, time.Duration(event.LatencyNS), event.Throttled == 1)

	// Detect noisy neighbors
	sm.detectNoisyNeighbors(event)

	// Log high latency events
	if time.Duration(event.LatencyNS) > sm.latencyThreshold {
		log.Printf("High latency detected: PID=%d, Container=%s, Latency=%v, Throttled=%v",
			event.PID, containerID, time.Duration(event.LatencyNS), event.Throttled == 1)
	}
}

func (sm *SchedulerMonitor) updateContainerMetrics(cgroupID uint32, latency time.Duration, throttled bool) {
	metrics, exists := sm.containerCache[cgroupID]
	if !exists {
		metrics = &ContainerMetrics{
			ID: fmt.Sprintf("container_%d", cgroupID),
		}
		sm.containerCache[cgroupID] = metrics
	}

	// Simplified statistics: LatencyAvg is an exponential moving average
	// (weight 0.5), and LatencyP99 holds the largest latency seen so far
	// rather than a true 99th percentile.
	metrics.LatencyAvg = (metrics.LatencyAvg + latency) / 2
	if latency > metrics.LatencyP99 {
		metrics.LatencyP99 = latency
	}

	// Fold every event into the throttling rate so it can decay again
	// once throttling stops.
	sample := 0.0
	if throttled {
		sample = 1.0
	}
	metrics.ThrottlingRate = (metrics.ThrottlingRate + sample) / 2

	// PreemptionRate is not maintained in this simplified listing.
	metrics.LastUpdate = time.Now()
}

func (sm *SchedulerMonitor) detectNoisyNeighbors(event ProcessEvent) {
	containerID := fmt.Sprintf("container_%d", event.CgroupID)
	latency := time.Duration(event.LatencyNS)

	// High latency alert
	if latency > sm.latencyThreshold && event.Throttled == 0 {
		noisyNeighborAlerts.WithLabelValues(containerID, "high_latency").Inc()

		log.Printf("NOISY NEIGHBOR ALERT: Container %s experiencing high latency (%v) - possible external interference",
			containerID, latency)
	}

	// Excessive preemption alert
	if event.PreemptedByPID != 0 {
		preemptingContainer := sm.getContainerForPID(event.PreemptedByPID)
		preemptionCounter.WithLabelValues(containerID, preemptingContainer).Inc()

		// Check if preemption rate is too high (simplified check)
		metrics := sm.containerCache[event.CgroupID]
		if metrics != nil && metrics.PreemptionRate > sm.preemptionThreshold {
			noisyNeighborAlerts.WithLabelValues(preemptingContainer, "excessive_preemption").Inc()

			log.Printf("NOISY NEIGHBOR ALERT: Container %s causing excessive preemptions to %s",
				preemptingContainer, containerID)
		}
	}

	// Throttling correlation alert
	if event.Throttled == 1 && latency > sm.latencyThreshold {
		log.Printf("PERFORMANCE ALERT: Container %s hitting CPU quota limits (throttled latency: %v)",
			containerID, latency)
	}
}

func (sm *SchedulerMonitor) getContainerForPID(pid uint32) string {
	// Simplified - in practice, would maintain PID->Container mapping
	return fmt.Sprintf("unknown_container_%d", pid)
}

func (sm *SchedulerMonitor) Close() {
	for _, l := range sm.links {
		l.Close()
	}
	sm.reader.Close()
	sm.objs.Close()
}

func main() {
	// Set up signal handling
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

	// Initialize scheduler monitor
	monitor, err := NewSchedulerMonitor()
	if err != nil {
		log.Fatalf("Failed to create scheduler monitor: %v", err)
	}
	defer monitor.Close()

	// Attach eBPF programs
	if err := monitor.AttachPrograms(); err != nil {
		log.Fatalf("Failed to attach eBPF programs: %v", err)
	}

	log.Println("Netflix Scheduler Monitor started - detecting noisy neighbors...")

	// Start processing events
	go monitor.ProcessEvents(ctx)

	// Wait for signal
	<-sigChan
	log.Println("Shutting down...")
	cancel()
}
Key Performance Metric: Process Latency
The cornerstone of Netflix’s noisy neighbor detection is process latency, specifically run queue latency:
Run Queue Latency: The time processes spend in the scheduling queue before being dispatched to the CPU.
Latency Calculation Process
sequenceDiagram
    participant Process as Process
    participant Scheduler as Linux Scheduler
    participant eBPF as eBPF Program
    participant Metrics as Metrics System

    Process->>Scheduler: Process becomes ready (sched_wakeup)
    Scheduler->>eBPF: Hook triggered
    eBPF->>eBPF: Store timestamp T1

    Note over Process,Scheduler: Process waits in run queue

    Scheduler->>Process: CPU assigned (sched_switch)
    Scheduler->>eBPF: Hook triggered
    eBPF->>eBPF: Calculate latency: T2 - T1
    eBPF->>Metrics: Submit latency data

    rect rgb(200, 230, 201)
        Note over eBPF,Metrics: Process latency = T2 - T1
    end
Beyond Simple Latency: Context-Aware Analysis
Netflix’s solution goes beyond simple latency measurement by incorporating contextual information:
Container Association via cgroups
// Extract cgroup information for container correlation
static __u32 get_container_id(struct task_struct *task) {
    struct cgroup *cgrp = BPF_CORE_READ(task, cgroups, subsys[0], cgroup);
    return BPF_CORE_READ(cgrp, kn, id);
}
Preemption Tracking
// Track which process caused preemption
struct preemption_event {
    __u32 preempted_pid;         // Process that was preempted
    __u32 preempting_pid;        // Process that caused preemption
    __u32 preempted_container;   // Container being preempted
    __u32 preempting_container;  // Container causing preemption
    __u64 timestamp;
};
CPU Quota Correlation
The system distinguishes between latency caused by:
- Noisy neighbors: External processes consuming resources
- CPU quota limits: Container hitting its allocated CPU limits
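// Check if latency is due to throttling vs. external interference.
// Assumes a kernel built with CONFIG_FAIR_GROUP_SCHED and CONFIG_CFS_BANDWIDTH,
// where task->se.cfs_rq points at the run queue of the task's group.
static __u8 is_throttled_latency(struct task_struct *task) {
    struct cfs_rq *cfs_rq = BPF_CORE_READ(task, se.cfs_rq);
    if (!cfs_rq) {
        return 0;
    }

    // Check if CFS throttling is active
    return BPF_CORE_READ(cfs_rq, throttled) ? 1 : 0;
}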
Performance Impact and Optimization
Overhead Analysis
Netflix conducted extensive performance testing to ensure their eBPF monitoring didn’t become a performance bottleneck itself:
graph TB
    subgraph "Performance Metrics"
        subgraph "Hook Overhead"
            H1[sched_wakeup: <100ns] --> Total[Total: <600ns]
            H2[sched_wakeup_new: <50ns] --> Total
            H3[sched_switch: <450ns] --> Total
        end

        subgraph "System Impact"
            Total --> Impact[CPU Overhead: <0.1%]
            Memory[Memory: 2-4MB] --> Impact
            Network[Network: Minimal] --> Impact
        end

        subgraph "Comparison"
            Impact --> Better[10x better than perf]
            Better --> Continuous[Enables continuous monitoring]
        end
    end

    style Total fill:#c8e6c9
    style Impact fill:#c8e6c9
    style Continuous fill:#c8e6c9
Key Optimizations Implemented
1. Efficient Data Structures
// Optimized hash map for process tracking
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);  // LRU eviction
    __uint(max_entries, 65536);           // Sized for workload
    __type(key, __u32);
    __type(value, __u64);
} wakeup_times SEC(".maps");
2. Ring Buffer Communication
// High-performance ring buffer for event streaming
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 4 * 1024 * 1024);  // 4MB buffer
} events SEC(".maps");
3. Sampling and Rate Limiting
// Rate limiting for high-frequency events.
// Fixed one-second window; not safe for concurrent use without a mutex.
type RateLimiter struct {
	events    int64
	lastReset time.Time
	limit     int64
}

func (rl *RateLimiter) Allow() bool {
	now := time.Now()
	if now.Sub(rl.lastReset) > time.Second {
		rl.events = 0
		rl.lastReset = now
	}

	if rl.events < rl.limit {
		rl.events++
		return true
	}
	return false
}
Performance Measurement Tool: bpftop
Netflix developed bpftop to measure eBPF program overhead in real-time:
# Example bpftop output showing scheduler monitor overhead
$ sudo bpftop
PID    COMM           TYPE        PROG            RUNTIME(us)  EVENTS  AVG_RUNTIME(ns)
12345  scheduler_mon  tracepoint  handle_sched_*  150.2        1,234   121.7
Key metrics tracked:
- Runtime per event: <600ns per scheduler hook
- Total CPU usage: <0.1% system-wide
- Memory footprint: 2-4MB for maps and buffers
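The per-event figure in the example output is cumulative runtime divided by event count (150.2 µs over 1,234 events is about 121.7 ns). When BPF statistics are enabled, the kernel reports a cumulative `run_time_ns` and `run_cnt` per program, so averages over an interval come from the difference between two snapshots. The sketch below shows that arithmetic; `progStats`, `avgRuntimeNS`, and `cpuPercent` are hypothetical names, and the numbers are taken from the example above rather than from a real system.

```go
package main

import "fmt"

// progStats is one snapshot of a program's cumulative counters.
type progStats struct {
	RunTimeNS uint64 // total time spent in the program, in ns
	RunCount  uint64 // number of times the program ran
}

// avgRuntimeNS returns the mean runtime per event between two snapshots.
func avgRuntimeNS(prev, cur progStats) float64 {
	if cur.RunCount <= prev.RunCount || cur.RunTimeNS < prev.RunTimeNS {
		return 0 // no new events, or counters were reset
	}
	return float64(cur.RunTimeNS-prev.RunTimeNS) / float64(cur.RunCount-prev.RunCount)
}

// cpuPercent returns the share of one CPU consumed during the interval.
func cpuPercent(prev, cur progStats, intervalNS uint64) float64 {
	if intervalNS == 0 || cur.RunTimeNS < prev.RunTimeNS {
		return 0
	}
	return float64(cur.RunTimeNS-prev.RunTimeNS) / float64(intervalNS) * 100
}

func main() {
	prev := progStats{}
	cur := progStats{RunTimeNS: 150_200, RunCount: 1_234} // 150.2 µs, 1,234 events

	fmt.Printf("avg runtime: %.1f ns\n", avgRuntimeNS(prev, cur))          // avg runtime: 121.7 ns
	fmt.Printf("cpu: %.3f%%\n", cpuPercent(prev, cur, 1_000_000_000)) // cpu: 0.015%
}
```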
Advanced Noisy Neighbor Detection
Multi-Dimensional Analysis
Netflix’s system performs sophisticated analysis by correlating multiple signals:
// Comprehensive noisy neighbor scoring
type NoisyNeighborScore struct {
	ContainerID     string
	LatencyScore    float64 // Based on run queue latency
	PreemptionScore float64 // Based on preemption frequency
	ThrottlingScore float64 // Based on CPU quota hits
	ResourceScore   float64 // Based on resource consumption
	OverallScore    float64 // Weighted combination
	Confidence      float64 // Statistical confidence
}

// NoisyNeighborDetector, its thresholds, and calculateConfidence are not
// shown in this excerpt. The function needs the "math" package.
func (detector *NoisyNeighborDetector) CalculateScore(containerID string, metrics *ContainerMetrics) *NoisyNeighborScore {
	score := &NoisyNeighborScore{ContainerID: containerID}

	// Latency scoring (0-100 scale)
	if metrics.LatencyP99 > detector.thresholds.LatencyHigh {
		score.LatencyScore = 100
	} else if metrics.LatencyP99 > detector.thresholds.LatencyMedium {
		score.LatencyScore = 50
	} else {
		score.LatencyScore = 0
	}

	// Preemption frequency scoring
	score.PreemptionScore = math.Min(metrics.PreemptionRate*10, 100)

	// Resource consumption scoring. Assumes ContainerMetrics carries a
	// CPUUsage field (fraction of CPU, 0-1), which the struct shown
	// earlier does not define.
	score.ResourceScore = math.Min(metrics.CPUUsage*100, 100)

	// Combine scores with weights (0.4 + 0.3 + 0.3 = 1.0).
	// ThrottlingScore is declared above but not used in this combination.
	score.OverallScore = (score.LatencyScore * 0.4) +
		(score.PreemptionScore * 0.3) +
		(score.ResourceScore * 0.3)

	// Calculate confidence based on data quality
	score.Confidence = detector.calculateConfidence(metrics)

	return score
}
Temporal Pattern Analysis
// Detect patterns over time to reduce false positives
type TemporalAnalyzer struct {
	windowSize time.Duration
	patterns   map[string]*PatternHistory
}

type PatternHistory struct {
	Timestamps  []time.Time
	Scores      []float64
	TrendSlope  float64
	Seasonality map[time.Duration]float64
}

// pruneOldData, calculateTrend, and isSustainedPattern are not shown in
// this excerpt.
func (ta *TemporalAnalyzer) AnalyzePattern(containerID string, score *NoisyNeighborScore) bool {
	history := ta.patterns[containerID]
	if history == nil {
		history = &PatternHistory{}
		ta.patterns[containerID] = history
	}

	// Add current data point
	history.Timestamps = append(history.Timestamps, time.Now())
	history.Scores = append(history.Scores, score.OverallScore)

	// Keep only recent history
	ta.pruneOldData(history)

	// Calculate trend
	history.TrendSlope = ta.calculateTrend(history.Scores)

	// Detect if this is a sustained pattern vs. temporary spike
	return ta.isSustainedPattern(history)
}
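The trend and sustained-pattern helpers are left undefined above. One plausible reading, offered here as an assumption rather than the original implementation, is a least-squares slope over the score history plus a check that the most recent scores all stay above a threshold. `trendSlope` and `isSustained` are hypothetical names.

```go
package main

import "fmt"

// trendSlope fits a least-squares line through (i, scores[i]) and returns
// its slope: positive when scores rise over time, negative when they fall.
func trendSlope(scores []float64) float64 {
	n := float64(len(scores))
	if n < 2 {
		return 0
	}
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range scores {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	den := n*sumXX - sumX*sumX
	if den == 0 {
		return 0
	}
	return (n*sumXY - sumX*sumY) / den
}

// isSustained reports whether the last minCount scores are all at or
// above threshold, which filters out single-sample spikes.
func isSustained(scores []float64, threshold float64, minCount int) bool {
	if minCount <= 0 || len(scores) < minCount {
		return false
	}
	for _, s := range scores[len(scores)-minCount:] {
		if s < threshold {
			return false
		}
	}
	return true
}

func main() {
	rising := []float64{10, 20, 30, 40}
	fmt.Println(trendSlope(rising)) // 10

	spike := []float64{5, 90, 5, 5}
	steady := []float64{5, 80, 85, 90}
	fmt.Println(isSustained(spike, 70, 3), isSustained(steady, 70, 3)) // false true
}
```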
Alert Generation and Response
// Intelligent alerting system.
// Alert, AlertSeverity, calculateSeverity, and sendAlert are not shown in
// this excerpt.
type AlertManager struct {
	alertThresholds map[string]AlertThreshold
	cooldownPeriods map[string]time.Time
	escalationRules []EscalationRule
}

type AlertThreshold struct {
	ScoreThreshold    float64
	ConfidenceMin     float64
	SustainedDuration time.Duration
}

type EscalationRule struct {
	Condition func(*NoisyNeighborScore) bool
	Action    string
	Severity  AlertSeverity
	Targets   []string
}

func (am *AlertManager) ProcessAlert(score *NoisyNeighborScore, pattern *PatternHistory) {
	// Check if we're in cooldown period
	if lastAlert, exists := am.cooldownPeriods[score.ContainerID]; exists {
		if time.Since(lastAlert) < 5*time.Minute {
			return // Skip due to cooldown
		}
	}

	// Determine alert severity
	severity := am.calculateSeverity(score, pattern)

	// Generate alert based on severity
	alert := &Alert{
		ContainerID:     score.ContainerID,
		Severity:        severity,
		Score:           score,
		Pattern:         pattern,
		Timestamp:       time.Now(),
		Recommendations: am.generateRecommendations(score),
	}

	// Send alert through appropriate channels
	am.sendAlert(alert)

	// Update cooldown
	am.cooldownPeriods[score.ContainerID] = time.Now()
}

func (am *AlertManager) generateRecommendations(score *NoisyNeighborScore) []string {
	var recommendations []string

	if score.LatencyScore > 70 {
		recommendations = append(recommendations,
			"Consider increasing CPU limits or moving to dedicated nodes")
	}

	if score.PreemptionScore > 80 {
		recommendations = append(recommendations,
			"Investigate high-priority processes causing excessive preemption")
	}

	if score.ResourceScore > 90 {
		recommendations = append(recommendations,
			"Container may need resource limit adjustment or optimization")
	}

	return recommendations
}
Integration with Netflix Infrastructure
Atlas Metrics Integration
// Atlas metrics client for Netflix.
// Metric and AtlasPayload are not shown in this excerpt; the endpoint path
// and bearer-token header are illustrative. Requires the bytes,
// encoding/json, fmt, net/http, and time packages.
type AtlasMetricsClient struct {
	baseURL    string
	apiKey     string
	httpClient *http.Client
}

func (client *AtlasMetricsClient) SendMetrics(metrics []Metric) error {
	payload := AtlasPayload{
		Metrics:   metrics,
		Timestamp: time.Now().Unix(),
		Source:    "ebpf-scheduler-monitor",
	}

	jsonData, err := json.Marshal(payload)
	if err != nil {
		return fmt.Errorf("marshaling metrics: %w", err)
	}

	req, err := http.NewRequest("POST", client.baseURL+"/api/v1/metrics", bytes.NewBuffer(jsonData))
	if err != nil {
		return fmt.Errorf("creating request: %w", err)
	}

	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+client.apiKey)

	resp, err := client.httpClient.Do(req)
	if err != nil {
		return fmt.Errorf("sending request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("atlas API error: %d", resp.StatusCode)
	}

	return nil
}
Kubernetes Integration
# Netflix scheduler monitor deployment
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netflix-scheduler-monitor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: scheduler-monitor
  template:
    metadata:
      labels:
        app: scheduler-monitor
    spec:
      hostNetwork: true
      hostPID: true
      serviceAccountName: scheduler-monitor
      containers:
        - name: monitor
          image: netflix/scheduler-monitor:latest
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN", "BPF"]
          env:
            - name: ATLAS_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: atlas-credentials
                  key: endpoint
            - name: ATLAS_API_KEY
              valueFrom:
                secretKeyRef:
                  name: atlas-credentials
                  key: api-key
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
          volumeMounts:
            - name: debugfs
              mountPath: /sys/kernel/debug
            - name: tracefs
              mountPath: /sys/kernel/tracing
            - name: bpf-maps
              mountPath: /sys/fs/bpf
      volumes:
        - name: debugfs
          hostPath:
            path: /sys/kernel/debug
        - name: tracefs
          hostPath:
            path: /sys/kernel/tracing
        - name: bpf-maps
          hostPath:
            path: /sys/fs/bpf
      tolerations:
        - operator: Exists
          effect: NoSchedule
Production Results and Impact
Performance Improvements
graph TB
    subgraph "Before eBPF Monitoring"
        B1[Reactive Problem Detection] --> B2[Manual Investigation]
        B2 --> B3[perf Tool Deployment]
        B3 --> B4[High Overhead Analysis]
        B4 --> B5[Post-Incident Resolution]

        style B1 fill:#ffcdd2
        style B4 fill:#ffcdd2
        style B5 fill:#ffcdd2
    end

    subgraph "After eBPF Monitoring"
        A1[Proactive Detection] --> A2[Automated Analysis]
        A2 --> A3[Real-time Insights]
        A3 --> A4[Low Overhead Monitoring]
        A4 --> A5[Preventive Actions]

        style A1 fill:#c8e6c9
        style A3 fill:#c8e6c9
        style A4 fill:#c8e6c9
        style A5 fill:#c8e6c9
    end
Key Metrics and Improvements
| Metric | Before eBPF | After eBPF | Improvement |
|---|---|---|---|
| Detection Time | Hours-Days | Seconds-Minutes | 100-1000x faster |
| Analysis Overhead | 5-15% CPU | <0.1% CPU | 50-150x reduction |
| Coverage | Reactive only | Continuous | 24/7 monitoring |
| False Positives | High | Low | Context-aware filtering |
| Resolution Time | Hours | Minutes | 10-30x faster |
Business Impact
- Improved SLA Performance: Reduced latency spikes by 40%
- Operational Efficiency: 75% reduction in manual investigation time
- Infrastructure Optimization: Better resource allocation decisions
- Cost Savings: Reduced over-provisioning through accurate capacity planning
Conclusion
Netflix’s eBPF-based infrastructure observability represents a paradigm shift in how large-scale systems approach performance monitoring and noisy neighbor detection.
Key Innovations
- Continuous Monitoring: 24/7 observability without reactive deployment
- Minimal Overhead: <0.1% CPU impact enables production deployment
- Context-Aware Analysis: Distinguishes between quota limits and external interference
- Real-Time Detection: Immediate identification of performance issues
- Scalable Architecture: Handles Netflix’s massive multi-tenant infrastructure
Strategic Advantages
- Proactive Problem Resolution: Address issues before user impact
- Data-Driven Optimization: Make informed infrastructure decisions
- Operational Excellence: Reduce manual investigation and response time
- Cost Efficiency: Optimize resource allocation and reduce waste
Future Implications
This approach demonstrates the transformative potential of eBPF for:
- Enterprise Monitoring: Extending beyond Netflix to other large-scale deployments
- Cloud Provider Services: Enhanced multi-tenant isolation and monitoring
- Container Orchestration: Better Kubernetes and container performance insights
- Performance Engineering: New methodologies for system optimization
Netflix’s success with eBPF infrastructure observability provides a blueprint for organizations seeking to achieve similar levels of operational excellence and performance optimization in their own multi-tenant environments.
Based on the original article by Sergio De Simone on InfoQ