Security Correlation System Architecture: Database Design and Deployment Patterns
Modern security operations depend on correlation systems that can ingest, analyze, and correlate vast amounts of security data in real time. This guide provides a comprehensive overview of system architecture patterns, database design principles, and deployment strategies for building effective security correlation platforms.
System Architecture Overview
High-Level Architecture
The security correlation system follows a multi-tier architecture designed for scalability, reliability, and real-time processing:
graph TB
subgraph "Data Sources"
A1[Network Logs]
A2[Endpoint Logs]
A3[Application Logs]
A4[Threat Intelligence]
A5[Vulnerability Scanners]
end
subgraph "Ingestion Layer"
B1[Log Collectors]
B2[API Gateways]
B3[Message Queues]
end
subgraph "Processing Layer"
C1[Stream Processors]
C2[Correlation Engine]
C3[ML/AI Analytics]
C4[Rule Engine]
end
subgraph "Storage Layer"
D1[Time Series DB]
D2[Graph Database]
D3[Document Store]
D4[Data Warehouse]
end
subgraph "Analysis Layer"
E1[Real-time Analytics]
E2[Historical Analysis]
E3[Threat Hunting]
E4[Incident Response]
end
subgraph "Presentation Layer"
F1[Dashboards]
F2[Alerting System]
F3[Reports]
F4[APIs]
end
A1 --> B1
A2 --> B1
A3 --> B2
A4 --> B2
A5 --> B3
B1 --> C1
B2 --> C1
B3 --> C2
C1 --> D1
C2 --> D2
C3 --> D3
C4 --> D4
D1 --> E1
D2 --> E2
D3 --> E3
D4 --> E4
E1 --> F1
E2 --> F2
E3 --> F3
E4 --> F4
Component Interaction Flow
sequenceDiagram
participant DS as Data Source
participant IC as Ingestion Controller
participant SP as Stream Processor
participant CE as Correlation Engine
participant DB as Database
participant AL as Alert System
participant UI as User Interface
DS->>IC: Send raw logs
IC->>SP: Forward for processing
SP->>SP: Parse and normalize
SP->>CE: Send processed events
CE->>CE: Apply correlation rules
CE->>DB: Store events and correlations
CE->>AL: Generate alerts
AL->>UI: Display alerts
UI->>DB: Query historical data
DB->>UI: Return analysis results
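In the sequence above, the stream processor's "parse and normalize" step is where raw records are mapped onto the common event shape used by the correlation engine and storage layer. A minimal sketch, assuming JSON-formatted log lines and illustrative field names (src_ip, dst_ip, and so on are examples, not a fixed schema):

import json
import uuid
from datetime import datetime, timezone

def normalize_event(raw_line: str, source_id: str) -> dict:
    """Parse a raw JSON log line into the normalized event shape used downstream."""
    record = json.loads(raw_line)
    return {
        "event_id": str(uuid.uuid4()),
        "event_time": record.get("timestamp", datetime.now(timezone.utc).isoformat()),
        "source_ip": record.get("src_ip"),
        "dest_ip": record.get("dst_ip"),
        "source_port": record.get("src_port"),
        "dest_port": record.get("dst_port"),
        "protocol": record.get("proto"),
        "event_type": record.get("event_type", "unknown"),
        "raw_data": raw_line,
        "normalized_data": record,
        "source_id": source_id,
    }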
Database Schema Design
Core Entity Relationships
erDiagram
EVENT ||--o{ EVENT_ATTRIBUTE : contains
EVENT ||--o{ CORRELATION : participates_in
EVENT }|--|| SOURCE : originates_from
EVENT }|--|| EVENT_TYPE : classified_as
CORRELATION }o--|| CORRELATION_RULE : follows
CORRELATION ||--o{ ALERT : generates
ALERT }|--|| SEVERITY : has
ALERT ||--o{ INCIDENT : escalates_to
INCIDENT ||--o{ RESPONSE_ACTION : triggers
INCIDENT }|--|| USER : assigned_to
USER ||--o{ ROLE : has
ROLE ||--o{ PERMISSION : grants
THREAT_INTEL ||--o{ IOC : contains
IOC ||--o{ EVENT : matches
ASSET ||--o{ EVENT : generates
ASSET }|--|| ASSET_TYPE : categorized_as
EVENT {
uuid event_id PK
timestamp event_time
string source_ip
string dest_ip
int source_port
int dest_port
string protocol
string event_type
text raw_data
json normalized_data
uuid source_id FK
uuid event_type_id FK
timestamp created_at
timestamp updated_at
}
EVENT_ATTRIBUTE {
uuid attribute_id PK
uuid event_id FK
string attribute_name
string attribute_value
string data_type
timestamp created_at
}
SOURCE {
uuid source_id PK
string source_name
string source_type
string ip_address
json configuration
boolean is_active
timestamp last_seen
timestamp created_at
}
EVENT_TYPE {
uuid event_type_id PK
string type_name
string category
text description
json schema
int severity_default
timestamp created_at
}
CORRELATION {
uuid correlation_id PK
string correlation_name
uuid rule_id FK
timestamp start_time
timestamp end_time
int event_count
float confidence_score
json metadata
timestamp created_at
}
CORRELATION_RULE {
uuid rule_id PK
string rule_name
text rule_definition
string rule_type
int time_window
int threshold
boolean is_active
timestamp created_at
timestamp updated_at
}
ALERT {
uuid alert_id PK
uuid correlation_id FK
uuid severity_id FK
string alert_title
text alert_description
string status
json details
timestamp triggered_at
timestamp acknowledged_at
uuid acknowledged_by FK
}
SEVERITY {
uuid severity_id PK
string severity_name
int severity_level
string color_code
text description
}
INCIDENT {
uuid incident_id PK
string incident_number
string title
text description
uuid assigned_to FK
string status
string priority
timestamp created_at
timestamp resolved_at
}
USER {
uuid user_id PK
string username
string email
string full_name
boolean is_active
timestamp last_login
timestamp created_at
}
ROLE {
uuid role_id PK
string role_name
text description
timestamp created_at
}
PERMISSION {
uuid permission_id PK
string permission_name
string resource
string action
text description
}
THREAT_INTEL {
uuid intel_id PK
string source_name
string intel_type
float confidence_score
timestamp valid_from
timestamp valid_until
json metadata
timestamp created_at
}
IOC {
uuid ioc_id PK
uuid intel_id FK
string ioc_type
string ioc_value
string status
float confidence_score
timestamp first_seen
timestamp last_seen
}
ASSET {
uuid asset_id PK
string asset_name
uuid asset_type_id FK
string ip_address
string mac_address
string hostname
string operating_system
json metadata
boolean is_critical
timestamp created_at
timestamp updated_at
}
ASSET_TYPE {
uuid asset_type_id PK
string type_name
text description
json attributes_schema
timestamp created_at
}
RESPONSE_ACTION {
uuid action_id PK
uuid incident_id FK
string action_type
text description
string status
uuid performed_by FK
timestamp performed_at
json results
}
Database Implementation SQL
-- Events table with partitioning for performance
CREATE TABLE events (
    event_id UUID NOT NULL DEFAULT gen_random_uuid(),
    event_time TIMESTAMPTZ NOT NULL,
    source_ip INET,
    dest_ip INET,
    source_port INTEGER,
    dest_port INTEGER,
    protocol VARCHAR(20),
    event_type VARCHAR(100) NOT NULL,
    raw_data TEXT,
    normalized_data JSONB,
    source_id UUID REFERENCES sources(source_id),
    event_type_id UUID REFERENCES event_types(event_type_id),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    -- On a partitioned table the primary key must include the partition key
    PRIMARY KEY (event_id, event_time)
) PARTITION BY RANGE (event_time);
-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Indexes for performance
CREATE INDEX idx_events_time ON events (event_time);
CREATE INDEX idx_events_source_ip ON events USING HASH (source_ip);
CREATE INDEX idx_events_dest_ip ON events USING HASH (dest_ip);
CREATE INDEX idx_events_type ON events (event_type);
CREATE INDEX idx_events_normalized_data ON events USING GIN (normalized_data);
-- Correlation rules table
CREATE TABLE correlation_rules (
rule_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
rule_name VARCHAR(255) NOT NULL UNIQUE,
rule_definition TEXT NOT NULL,
rule_type VARCHAR(50) NOT NULL,
time_window INTEGER NOT NULL, -- in seconds
threshold INTEGER DEFAULT 1,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Alerts table with status tracking
CREATE TABLE alerts (
alert_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
correlation_id UUID REFERENCES correlations(correlation_id),
severity_id UUID REFERENCES severities(severity_id),
alert_title VARCHAR(255) NOT NULL,
alert_description TEXT,
status VARCHAR(50) DEFAULT 'open',
details JSONB,
triggered_at TIMESTAMPTZ DEFAULT NOW(),
acknowledged_at TIMESTAMPTZ,
acknowledged_by UUID REFERENCES users(user_id)
);
-- Function to update updated_at timestamp
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Trigger for automatic timestamp updates
CREATE TRIGGER update_events_updated_at BEFORE UPDATE ON events
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
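For ingest-heavy workloads, single-row inserts into the events table quickly become the bottleneck, so events are usually written in batches. A hedged sketch using psycopg2's execute_values against the schema above (the DSN and the choice of columns are assumptions for illustration):

from typing import Dict, List

import psycopg2
from psycopg2.extras import Json, execute_values

def bulk_insert_events(dsn: str, events: List[Dict]) -> None:
    """Insert a batch of normalized events into the partitioned events table."""
    rows = [
        (
            e["event_time"], e.get("source_ip"), e.get("dest_ip"),
            e.get("source_port"), e.get("dest_port"), e.get("protocol"),
            e["event_type"], e.get("raw_data"), Json(e.get("normalized_data", {})),
        )
        for e in events
    ]
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO events (event_time, source_ip, dest_ip, source_port, dest_port,"
            " protocol, event_type, raw_data, normalized_data) VALUES %s",
            rows,
            page_size=500,
        )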
Deployment Architecture
Cloud-Native Deployment
graph TB
subgraph "Load Balancer"
LB[Application Load Balancer]
end
subgraph "API Gateway Cluster"
AG1[API Gateway 1]
AG2[API Gateway 2]
AG3[API Gateway 3]
end
subgraph "Application Tier"
subgraph "Correlation Service"
CS1[Correlation Service 1]
CS2[Correlation Service 2]
CS3[Correlation Service 3]
end
subgraph "Analytics Service"
AS1[Analytics Service 1]
AS2[Analytics Service 2]
end
subgraph "Alert Service"
ALS1[Alert Service 1]
ALS2[Alert Service 2]
end
end
subgraph "Message Queue Cluster"
MQ1[Message Queue 1]
MQ2[Message Queue 2]
MQ3[Message Queue 3]
end
subgraph "Database Cluster"
subgraph "Primary DB"
DB1[PostgreSQL Primary]
end
subgraph "Read Replicas"
DB2[PostgreSQL Replica 1]
DB3[PostgreSQL Replica 2]
end
subgraph "Time Series DB"
TS1[InfluxDB Cluster]
end
subgraph "Search Engine"
ES1[Elasticsearch Cluster]
end
end
subgraph "Caching Layer"
RC1[Redis Cluster 1]
RC2[Redis Cluster 2]
end
subgraph "Monitoring"
MON1[Prometheus]
MON2[Grafana]
MON3[Alert Manager]
end
LB --> AG1
LB --> AG2
LB --> AG3
AG1 --> CS1
AG2 --> CS2
AG3 --> CS3
CS1 --> MQ1
CS2 --> MQ2
CS3 --> MQ3
CS1 --> RC1
AS1 --> RC2
CS1 --> DB1
AS1 --> DB2
ALS1 --> DB3
CS1 --> TS1
AS1 --> ES1
CS1 --> MON1
AS1 --> MON1
ALS1 --> MON1
Container Orchestration
# docker-compose.yml for development environment
version: "3.8"

services:
  # Load Balancer
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - correlation-api
      - analytics-api

  # Correlation Service
  correlation-api:
    build:
      context: ./correlation-service
      dockerfile: Dockerfile
    ports:
      - "8001:8000"
    environment:
      - DATABASE_URL=postgresql://user:password@postgres:5432/correlation_db
      - REDIS_URL=redis://redis:6379/0
      - MESSAGE_QUEUE_URL=amqp://rabbitmq:5672
    depends_on:
      - postgres
      - redis
      - rabbitmq
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 1G
          cpus: "0.5"

  # Analytics Service
  analytics-api:
    build:
      context: ./analytics-service
      dockerfile: Dockerfile
    ports:
      - "8002:8000"
    environment:
      - DATABASE_URL=postgresql://user:password@postgres:5432/correlation_db
      - ELASTICSEARCH_URL=http://elasticsearch:9200
    depends_on:
      - postgres
      - elasticsearch

  # Database
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=correlation_db
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"

  # Redis Cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  # Message Queue
  rabbitmq:
    image: rabbitmq:3-management-alpine
    environment:
      - RABBITMQ_DEFAULT_USER=admin
      - RABBITMQ_DEFAULT_PASS=password
    ports:
      - "5672:5672"
      - "15672:15672"
    volumes:
      - rabbitmq_data:/var/lib/rabbitmq

  # Elasticsearch
  elasticsearch:
    image: elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

  # InfluxDB for time series data
  # (2.x images are initialized via DOCKER_INFLUXDB_INIT_* variables; org/bucket names here are examples)
  influxdb:
    image: influxdb:2.7-alpine
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=password
      - DOCKER_INFLUXDB_INIT_ORG=security
      - DOCKER_INFLUXDB_INIT_BUCKET=metrics
    ports:
      - "8086:8086"
    volumes:
      - influxdb_data:/var/lib/influxdb2

  # Monitoring
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  postgres_data:
  redis_data:
  rabbitmq_data:
  elasticsearch_data:
  influxdb_data:
  prometheus_data:
  grafana_data:
Kubernetes Deployment
# correlation-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: correlation-service
  labels:
    app: correlation-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: correlation-service
  template:
    metadata:
      labels:
        app: correlation-service
    spec:
      containers:
        - name: correlation-service
          image: correlation-service:latest
          ports:
            - containerPort: 8000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: database-url
            - name: REDIS_URL
              value: "redis://redis-service:6379/0"
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: correlation-service
spec:
  selector:
    app: correlation-service
  ports:
    - port: 80
      targetPort: 8000
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: correlation-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: correlation.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: correlation-service
                port:
                  number: 80
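The liveness and readiness probes above assume the service exposes /health and /ready on port 8000. A minimal sketch of those handlers for a Flask-based correlation service; the TCP reachability checks and the DB_HOST/REDIS_HOST variables are assumptions standing in for real dependency health checks:

import os
import socket

from flask import Flask, jsonify

app = Flask(__name__)

def _tcp_check(host: str, port: int, timeout: float = 1.0) -> None:
    """Raise OSError if a TCP connection to the dependency cannot be established."""
    with socket.create_connection((host, port), timeout=timeout):
        pass

@app.route("/health")
def health():
    # Liveness: the process is up and able to serve requests.
    return jsonify({"status": "ok"}), 200

@app.route("/ready")
def ready():
    # Readiness: only report ready once downstream dependencies answer.
    try:
        _tcp_check(os.environ.get("DB_HOST", "postgres"), 5432)
        _tcp_check(os.environ.get("REDIS_HOST", "redis"), 6379)
    except OSError:
        return jsonify({"status": "not ready"}), 503
    return jsonify({"status": "ready"}), 200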
Use Case Implementation
Real-Time Threat Detection
graph LR
subgraph "Data Ingestion"
A1[Firewall Logs]
A2[IDS/IPS Alerts]
A3[Endpoint Logs]
A4[DNS Logs]
end
subgraph "Correlation Rules"
B1[Brute Force Detection]
B2[Lateral Movement]
B3[Data Exfiltration]
B4[Malware Communication]
end
subgraph "Analysis Engine"
C1[Pattern Matching]
C2[Statistical Analysis]
C3[Machine Learning]
C4[Threat Intelligence]
end
subgraph "Response Actions"
D1[Alert Generation]
D2[Incident Creation]
D3[Automated Response]
D4[Threat Hunting]
end
A1 --> B1
A2 --> B2
A3 --> B3
A4 --> B4
B1 --> C1
B2 --> C2
B3 --> C3
B4 --> C4
C1 --> D1
C2 --> D2
C3 --> D3
C4 --> D4
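As a concrete example of the "Brute Force Detection" rule above, a sliding-window counter over failed authentication events can flag a source IP that exceeds a threshold within the rule's time window. A simplified in-memory sketch; the threshold, window, event_type value, and the assumption that event_time is a Unix timestamp are all illustrative, and a production engine would keep this state in Redis or the correlation store:

import time
from collections import defaultdict, deque
from typing import Optional

class BruteForceDetector:
    """Flag source IPs with too many failed logins inside a sliding time window."""

    def __init__(self, threshold: int = 10, window_seconds: int = 300):
        self.threshold = threshold
        self.window = window_seconds
        self.failures = defaultdict(deque)  # source_ip -> timestamps of recent failures

    def process(self, event: dict) -> Optional[dict]:
        if event.get("event_type") != "auth_failure":
            return None
        src = event.get("source_ip")
        if not src:
            return None
        # event_time is assumed to be a Unix timestamp in seconds
        now = event.get("event_time", time.time())
        history = self.failures[src]
        history.append(now)
        # Drop failures that have aged out of the window
        while history and now - history[0] > self.window:
            history.popleft()
        if len(history) >= self.threshold:
            return {
                "correlation_name": "brute_force_detected",
                "source_ip": src,
                "event_count": len(history),
                "confidence_score": min(1.0, len(history) / (2 * self.threshold)),
            }
        return None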
User Behavior Analytics
stateDiagram-v2
[*] --> Normal_Behavior
Normal_Behavior --> Anomaly_Detected: Threshold Exceeded
Anomaly_Detected --> Risk_Assessment: Analyze Context
Risk_Assessment --> Low_Risk: Score < 30
Risk_Assessment --> Medium_Risk: Score 30-70
Risk_Assessment --> High_Risk: Score > 70
Low_Risk --> Normal_Behavior: Log Event
Medium_Risk --> Alert_Generated: Notify SOC
High_Risk --> Incident_Created: Escalate
Alert_Generated --> Investigation: Analyst Review
Incident_Created --> Response_Initiated: Immediate Action
Investigation --> False_Positive: No Threat
Investigation --> Confirmed_Threat: Threat Validated
False_Positive --> Normal_Behavior: Update Baseline
Confirmed_Threat --> Response_Initiated: Take Action
Response_Initiated --> [*]: Case Closed
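The thresholds in the state diagram map directly onto a scoring function. A minimal sketch of routing an anomaly by risk score; the score bands mirror the diagram, while the returned action names are illustrative:

def route_anomaly(risk_score: float) -> str:
    """Map a risk score to the next state from the UBA state diagram."""
    if risk_score < 30:
        return "log_event"          # Low risk: return to normal baseline
    if risk_score <= 70:
        return "alert_generated"    # Medium risk: notify the SOC for review
    return "incident_created"       # High risk: escalate and initiate response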
Performance Optimization
Database Optimization
-- Partitioning strategy for events table
CREATE TABLE events_y2024m01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Indexes for common query patterns
CREATE INDEX CONCURRENTLY idx_events_source_ip_time
ON events (source_ip, event_time DESC);
CREATE INDEX CONCURRENTLY idx_events_dest_ip_time
ON events (dest_ip, event_time DESC);
-- Materialized views for aggregated data
CREATE MATERIALIZED VIEW hourly_event_summary AS
SELECT
date_trunc('hour', event_time) as hour,
event_type,
COUNT(*) as event_count,
COUNT(DISTINCT source_ip) as unique_sources
FROM events
WHERE event_time >= NOW() - INTERVAL '7 days'
GROUP BY date_trunc('hour', event_time), event_type;
-- Unique index required so the view can be refreshed with CONCURRENTLY
CREATE UNIQUE INDEX idx_hourly_event_summary
ON hourly_event_summary (hour, event_type);
-- Refresh materialized view periodically
CREATE OR REPLACE FUNCTION refresh_hourly_summary()
RETURNS void AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY hourly_event_summary;
END;
$$ LANGUAGE plpgsql;
-- Schedule refresh every hour (requires the pg_cron extension)
SELECT cron.schedule('refresh-hourly-summary', '0 * * * *',
    'SELECT refresh_hourly_summary();');
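Monthly partitions must exist before events for that month arrive, so partition creation is usually automated rather than done by hand. A hedged sketch that creates the next month's partition ahead of time; the table and naming convention follow the example above, and the DSN is an assumption:

from datetime import date
from typing import Optional

import psycopg2

def ensure_next_month_partition(dsn: str, today: Optional[date] = None) -> None:
    """Create next month's events partition if it does not already exist."""
    today = today or date.today()
    # First day of next month and of the month after (the partition bounds)
    start = date(today.year + (today.month == 12), today.month % 12 + 1, 1)
    end = date(start.year + (start.month == 12), start.month % 12 + 1, 1)
    partition = f"events_y{start.year}m{start.month:02d}"
    sql = (
        f"CREATE TABLE IF NOT EXISTS {partition} PARTITION OF events "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql)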
Caching Strategy
# Redis caching for correlation rules
import redis
import json
from datetime import timedelta

class CorrelationRuleCache:
    def __init__(self, redis_url):
        self.redis_client = redis.from_url(redis_url)
        self.cache_ttl = timedelta(hours=1)

    def get_active_rules(self):
        """Get active correlation rules from cache"""
        cached_rules = self.redis_client.get("active_correlation_rules")
        if cached_rules:
            return json.loads(cached_rules)
        # Fetch from database if not in cache
        rules = self._fetch_rules_from_db()
        self.redis_client.setex(
            "active_correlation_rules",
            self.cache_ttl,
            json.dumps(rules)
        )
        return rules

    def invalidate_cache(self):
        """Invalidate correlation rules cache"""
        self.redis_client.delete("active_correlation_rules")

    def _fetch_rules_from_db(self):
        # Database query to fetch active rules
        pass
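Typical usage pairs the read-through cache with explicit invalidation whenever a rule changes, so the correlation engine picks up edits immediately instead of waiting for the TTL to expire. A hypothetical wiring, reusing the Redis URL from the compose file above:

cache = CorrelationRuleCache("redis://redis:6379/0")
rules = cache.get_active_rules()   # served from Redis after the first call

# After an analyst creates or edits a rule via the API:
cache.invalidate_cache()           # the next read repopulates from the database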
Stream Processing Optimization
# Apache Kafka consumer for real-time event processing
from kafka import KafkaConsumer
import json
import concurrent.futures
from typing import List, Dict

class EventStreamProcessor:
    def __init__(self, kafka_brokers: List[str], topic: str):
        self.consumer = KafkaConsumer(
            topic,
            bootstrap_servers=kafka_brokers,
            value_deserializer=lambda x: json.loads(x.decode('utf-8')),
            group_id='correlation-processor',
            auto_offset_reset='latest',
            max_poll_records=1000,
            session_timeout_ms=30000,
            heartbeat_interval_ms=10000
        )
        self.thread_pool = concurrent.futures.ThreadPoolExecutor(max_workers=10)

    def process_events(self):
        """Process events from Kafka stream"""
        batch = []
        batch_size = 100
        for message in self.consumer:
            batch.append(message.value)
            if len(batch) >= batch_size:
                # Process batch asynchronously
                self.thread_pool.submit(self._process_batch, batch.copy())
                batch.clear()

    def _process_batch(self, events: List[Dict]):
        """Process a batch of events"""
        # Normalize events
        normalized_events = [self._normalize_event(event) for event in events]
        # Store in database
        self._store_events(normalized_events)
        # Apply correlation rules
        self._apply_correlation_rules(normalized_events)

    def _normalize_event(self, event: Dict) -> Dict:
        """Normalize event data"""
        # Event normalization logic
        pass

    def _store_events(self, events: List[Dict]):
        """Bulk insert events to database"""
        # Bulk database insertion
        pass

    def _apply_correlation_rules(self, events: List[Dict]):
        """Apply correlation rules to events"""
        # Correlation logic
        pass
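Wiring the processor up is a single blocking call; note that the loop above only flushes full batches, so a small trailing batch stays buffered until more events arrive. The broker addresses and topic name below are hypothetical:

processor = EventStreamProcessor(
    kafka_brokers=["kafka-1:9092", "kafka-2:9092"],
    topic="security-events",
)
processor.process_events()  # blocks, consuming and correlating events in batches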
Security Considerations
Data Protection
# Encryption for sensitive data
from cryptography.fernet import Fernet
import base64
import os

class DataEncryption:
    def __init__(self):
        # Generate or load encryption key
        self.key = self._get_or_create_key()
        self.cipher_suite = Fernet(self.key)

    def encrypt_sensitive_data(self, data: str) -> str:
        """Encrypt sensitive data before storing"""
        encrypted_data = self.cipher_suite.encrypt(data.encode())
        return base64.b64encode(encrypted_data).decode()

    def decrypt_sensitive_data(self, encrypted_data: str) -> str:
        """Decrypt sensitive data when retrieving"""
        encrypted_bytes = base64.b64decode(encrypted_data.encode())
        decrypted_data = self.cipher_suite.decrypt(encrypted_bytes)
        return decrypted_data.decode()

    def _get_or_create_key(self) -> bytes:
        """Get encryption key from environment or generate new one"""
        key_env = os.environ.get('ENCRYPTION_KEY')
        if key_env:
            # Fernet keys are already URL-safe base64 encoded; use them as-is
            return key_env.encode()
        else:
            # Generate new key (store securely)
            return Fernet.generate_key()
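Usage is symmetric; whatever key the service generates at startup must be persisted (for example in a secrets manager), or previously encrypted values become unreadable:

crypto = DataEncryption()
token = crypto.encrypt_sensitive_data("10.0.0.5 attempted SSH as root")
assert crypto.decrypt_sensitive_data(token) == "10.0.0.5 attempted SSH as root"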
Access Control
# Role-based access control
from functools import wraps
from flask import Flask, request, jsonify, g
import jwt

app = Flask(__name__)
app.config['SECRET_KEY'] = 'change-me'  # load from a secret store in production

def require_permission(permission: str):
    """Decorator to check user permissions"""
    def decorator(f):
        @wraps(f)
        def decorated_function(*args, **kwargs):
            token = request.headers.get('Authorization', '').replace('Bearer ', '')
            try:
                payload = jwt.decode(token, app.config['SECRET_KEY'], algorithms=['HS256'])
                user_permissions = payload.get('permissions', [])
                if permission not in user_permissions:
                    return jsonify({'error': 'Insufficient permissions'}), 403
                g.current_user = payload
                return f(*args, **kwargs)
            except jwt.InvalidTokenError:
                return jsonify({'error': 'Invalid token'}), 401
        return decorated_function
    return decorator

# Usage example
@app.route('/api/correlations', methods=['GET'])
@require_permission('view_correlations')
def get_correlations():
    """Get correlation data with permission check"""
    correlations = []  # fetched from the correlation store in a real service
    return jsonify(correlations)
Audit Logging
# Comprehensive audit logging
import logging
import json
from datetime import datetime
from flask import request, g

class AuditLogger:
    def __init__(self):
        self.logger = logging.getLogger('audit')
        handler = logging.FileHandler('/var/log/security-correlation/audit.log')
        formatter = logging.Formatter('%(asctime)s - %(message)s')
        handler.setFormatter(formatter)
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    def log_access(self, resource: str, action: str, result: str):
        """Log access attempts"""
        audit_record = {
            'timestamp': datetime.utcnow().isoformat(),
            'user_id': getattr(g, 'current_user', {}).get('user_id'),
            'username': getattr(g, 'current_user', {}).get('username'),
            'resource': resource,
            'action': action,
            'result': result,
            'ip_address': request.remote_addr,
            'user_agent': request.user_agent.string
        }
        self.logger.info(json.dumps(audit_record))

    def log_data_access(self, table: str, operation: str, record_count: int):
        """Log database access"""
        audit_record = {
            'timestamp': datetime.utcnow().isoformat(),
            'user_id': getattr(g, 'current_user', {}).get('user_id'),
            'operation_type': 'database_access',
            'table': table,
            'operation': operation,
            'record_count': record_count
        }
        self.logger.info(json.dumps(audit_record))
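In a Flask service the audit logger is typically invoked from a request hook so every API call is captured without per-endpoint boilerplate. A hedged sketch that reuses the app object from the access-control example above; mapping the request path and method to resource and action is an assumption:

audit = AuditLogger()

@app.after_request
def audit_api_access(response):
    # Record every API call with its outcome; request and g are populated by Flask.
    audit.log_access(
        resource=request.path,
        action=request.method,
        result=str(response.status_code),
    )
    return response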
Conclusion
Building an effective security correlation system requires careful consideration of architecture, database design, deployment strategies, and security measures. The patterns and implementations provided in this guide offer a solid foundation for creating scalable, secure, and efficient correlation platforms.
Key takeaways:
- Design for scalability from the beginning
- Implement proper data modeling for security events
- Use appropriate deployment patterns for your environment
- Prioritize security and audit capabilities
- Optimize for performance at the database and application levels
- Plan for monitoring and observability
By following these architectural patterns and best practices, organizations can build robust security correlation systems that effectively detect, analyze, and respond to security threats in real time.