Wazuh Anomaly Detection Use Cases: Advanced Security Monitoring
This guide explores advanced anomaly detection use cases in Wazuh, showing how to combine custom rules, integration scripts, and active responses to identify unusual patterns, behaviors, and potential security threats that traditional signature-based detection might miss.
Overview
Anomaly detection in Wazuh goes beyond simple rule matching, employing statistical analysis, behavioral profiling, and machine learning to identify:
- Unusual user behaviors
- Abnormal system activities
- Network traffic anomalies
- Application behavior deviations
- Data exfiltration attempts
- Insider threats
- Zero-day attack patterns
Core Anomaly Detection Components
1. Statistical Analysis Engine
Wazuh uses statistical analysis to establish baselines and detect deviations:
<group name="anomaly,statistical">
  <!-- Unusual login times detection -->
  <rule id="100001" level="10">
    <if_sid>5501</if_sid>
    <time>21:00 - 06:00</time>
    <description>Successful login during non-business hours</description>
    <options>no_full_log</options>
    <group>anomaly,authentication</group>
  </rule>

  <!-- Abnormal number of failed logins -->
  <rule id="100002" level="12" frequency="10" timeframe="120">
    <if_matched_sid>5503</if_matched_sid>
    <description>High number of failed login attempts - possible brute force</description>
    <mitre>
      <id>T1110</id>
    </mitre>
    <group>anomaly,authentication,brute_force</group>
  </rule>

  <!-- Unusual process execution frequency -->
  <rule id="100003" level="8" frequency="50" timeframe="60">
    <if_sid>5901</if_sid>
    <same_field>process.name</same_field>
    <description>Abnormally high process execution rate detected</description>
    <group>anomaly,process</group>
  </rule>
</group>
2. Behavioral Pattern Detection
Implement behavioral analysis for user and system activities:
#!/usr/bin/env python3
import json
import sys
import time
import numpy as np
from collections import defaultdict
from datetime import datetime, timedelta


class BehavioralAnomalyDetector:
    def __init__(self):
        self.user_baselines = defaultdict(lambda: {
            'login_times': [],
            'commands': defaultdict(int),
            'files_accessed': set(),
            'network_connections': defaultdict(int),
            'data_volume': []
        })
        self.window_size = 7          # days
        self.anomaly_threshold = 2.5  # standard deviations

    def process_alert(self, alert):
        """Process incoming alert for anomaly detection"""
        alert_data = json.loads(alert)

        # Extract relevant fields
        user = alert_data.get('data', {}).get('srcuser', 'unknown')
        timestamp = alert_data.get('timestamp', '')
        rule_id = alert_data.get('rule', {}).get('id', '')

        # Detect anomalies based on rule type
        anomalies = []

        if rule_id in ['5501', '5715']:  # Login events
            anomaly = self.detect_login_anomaly(user, timestamp)
            if anomaly:
                anomalies.append(anomaly)

        elif rule_id.startswith('59'):  # Process execution
            command = alert_data.get('data', {}).get('command', '')
            anomaly = self.detect_command_anomaly(user, command)
            if anomaly:
                anomalies.append(anomaly)

        elif rule_id in ['550', '551', '552']:  # File access
            file_path = alert_data.get('data', {}).get('path', '')
            anomaly = self.detect_file_access_anomaly(user, file_path)
            if anomaly:
                anomalies.append(anomaly)

        elif rule_id.startswith('86'):  # Network activity
            dest_ip = alert_data.get('data', {}).get('dstip', '')
            bytes_sent = alert_data.get('data', {}).get('bytes', 0)
            anomaly = self.detect_network_anomaly(user, dest_ip, bytes_sent)
            if anomaly:
                anomalies.append(anomaly)

        return anomalies

    def detect_login_anomaly(self, user, timestamp):
        """Detect unusual login patterns"""
        try:
            login_hour = datetime.fromisoformat(timestamp).hour
            baseline = self.user_baselines[user]['login_times']

            if len(baseline) > 20:  # Need sufficient data
                mean_hour = np.mean(baseline)
                std_hour = np.std(baseline)

                if abs(login_hour - mean_hour) > self.anomaly_threshold * std_hour:
                    return {
                        'type': 'unusual_login_time',
                        'user': user,
                        'severity': 'high',
                        'details': f'Login at {login_hour}:00 deviates from normal pattern',
                        'baseline_mean': mean_hour,
                        'baseline_std': std_hour
                    }

            # Update baseline
            self.user_baselines[user]['login_times'].append(login_hour)

        except Exception as e:
            sys.stderr.write(f"Error in login anomaly detection: {e}\n")

        return None

    def detect_command_anomaly(self, user, command):
        """Detect unusual command execution"""
        command_base = command.split()[0] if command else 'unknown'
        user_commands = self.user_baselines[user]['commands']

        # Check if command is rare for this user
        total_commands = sum(user_commands.values())
        command_frequency = user_commands.get(command_base, 0)

        if total_commands > 100:  # Need sufficient data
            expected_frequency = 1.0 / len(user_commands) if user_commands else 0
            actual_frequency = command_frequency / total_commands

            if actual_frequency < expected_frequency * 0.1:  # Rare command
                anomaly = {
                    'type': 'rare_command_execution',
                    'user': user,
                    'severity': 'medium',
                    'details': f'Unusual command executed: {command_base}',
                    'frequency': actual_frequency,
                    'expected': expected_frequency
                }

                # Check for suspicious patterns
                suspicious_patterns = [
                    'wget', 'curl', 'nc', 'ncat', 'ssh-keygen',
                    'base64', 'xxd', 'dd', 'tcpdump', 'nmap'
                ]

                if any(pattern in command_base.lower() for pattern in suspicious_patterns):
                    anomaly['severity'] = 'high'
                    anomaly['details'] += ' - Potentially suspicious command'

                return anomaly

        # Update baseline
        user_commands[command_base] += 1
        return None

    def detect_file_access_anomaly(self, user, file_path):
        """Detect unusual file access patterns"""
        user_files = self.user_baselines[user]['files_accessed']

        # Check for sensitive file access
        sensitive_paths = [
            '/etc/passwd', '/etc/shadow', '/etc/sudoers',
            '/.ssh/', '/var/log/', '/etc/ssl/',
            '.key', '.pem', '.crt', '.p12'
        ]

        for sensitive in sensitive_paths:
            if sensitive in file_path and file_path not in user_files:
                return {
                    'type': 'sensitive_file_access',
                    'user': user,
                    'severity': 'high',
                    'details': f'First time access to sensitive file: {file_path}',
                    'file': file_path
                }

        # Check for unusual directory traversal
        if '../' in file_path or file_path.count('/') > 10:
            return {
                'type': 'directory_traversal_attempt',
                'user': user,
                'severity': 'high',
                'details': f'Suspicious file path pattern: {file_path}',
                'file': file_path
            }

        # Update baseline
        user_files.add(file_path)
        return None

    def detect_network_anomaly(self, user, dest_ip, bytes_sent):
        """Detect unusual network behavior"""
        user_network = self.user_baselines[user]['network_connections']
        user_data = self.user_baselines[user]['data_volume']

        # Check for new destination
        if dest_ip and dest_ip not in user_network:
            # Check if it's a suspicious destination
            if self.is_suspicious_destination(dest_ip):
                return {
                    'type': 'suspicious_network_destination',
                    'user': user,
                    'severity': 'critical',
                    'details': f'Connection to suspicious IP: {dest_ip}',
                    'destination': dest_ip
                }

        # Check for data exfiltration
        if bytes_sent > 0:
            user_data.append(bytes_sent)

            if len(user_data) > 20:
                mean_bytes = np.mean(user_data)
                std_bytes = np.std(user_data)

                if bytes_sent > mean_bytes + (self.anomaly_threshold * std_bytes):
                    return {
                        'type': 'potential_data_exfiltration',
                        'user': user,
                        'severity': 'critical',
                        'details': f'Unusually large data transfer: {bytes_sent} bytes',
                        'bytes_sent': bytes_sent,
                        'baseline_mean': mean_bytes,
                        'baseline_std': std_bytes
                    }

        # Update baseline
        user_network[dest_ip] += 1
        return None

    def is_suspicious_destination(self, ip):
        """Check if IP is in suspicious ranges"""
        suspicious_ranges = [
            '10.0.0.0/8',      # Private range (unusual for external)
            '172.16.0.0/12',   # Private range
            '192.168.0.0/16',  # Private range
            # Add known malicious IPs/ranges
        ]

        # Implement IP range checking logic
        # This is a simplified example
        return any(
            ip.startswith(cidr.split('/')[0].rsplit('.', 1)[0])
            for cidr in suspicious_ranges
        )


if __name__ == "__main__":
    # Read alert from stdin
    alert = sys.stdin.read()

    detector = BehavioralAnomalyDetector()
    anomalies = detector.process_alert(alert)

    # Output anomalies for Wazuh active response
    for anomaly in anomalies:
        print(json.dumps(anomaly))
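Before wiring the script into the manager, it is worth exercising it locally. The sketch below feeds it hand-built alerts (the user, rule ID, and timestamps are illustrative, not real Wazuh output) so you can see the login baseline being built and then violated:

# Minimal local test of the detector above; the alerts are fabricated samples.
import json

def make_alert(hour):
    return json.dumps({
        'timestamp': f'2024-05-14T{hour:02d}:15:00',
        'rule': {'id': '5501'},
        'data': {'srcuser': 'jdoe'}
    })

detector = BehavioralAnomalyDetector()

# Build a baseline of daytime logins first (the detector needs >20 samples)
for _ in range(25):
    detector.process_alert(make_alert(9))

# A late-night login now deviates from the learned pattern
print(detector.process_alert(make_alert(23)))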
Use Case 1: User Behavior Analytics (UBA)
Configuration
<group name="uba,anomaly">
  <!-- Detect privilege escalation attempts -->
  <rule id="100010" level="12">
    <if_sid>5303</if_sid>
    <match>sudo</match>
    <different_user />
    <description>User executing sudo commands for the first time</description>
    <mitre>
      <id>T1548</id>
    </mitre>
    <group>privilege_escalation,anomaly</group>
  </rule>

  <!-- Detect lateral movement -->
  <rule id="100011" level="10" frequency="5" timeframe="300">
    <if_sid>5706</if_sid>
    <different_fields>dst_ip</different_fields>
    <description>User accessing multiple systems in short time - possible lateral movement</description>
    <mitre>
      <id>T1021</id>
    </mitre>
    <group>lateral_movement,anomaly</group>
  </rule>

  <!-- Detect data staging -->
  <rule id="100012" level="11">
    <decoded_as>file_integrity</decoded_as>
    <field name="file_path">tmp|temp|staging</field>
    <field name="file_size" compare="greater">104857600</field> <!-- 100MB -->
    <description>Large file created in temporary directory - possible data staging</description>
    <mitre>
      <id>T1074</id>
    </mitre>
    <group>data_staging,anomaly</group>
  </rule>
</group>
Implementation Script
#!/usr/bin/env python3
import json
import sqlite3
from datetime import datetime, timedelta
from collections import Counter
import statistics


class UserBehaviorAnalyzer:
    def __init__(self, db_path='/var/ossec/logs/uba.db'):
        self.conn = sqlite3.connect(db_path)
        self.init_database()

    def init_database(self):
        """Initialize UBA database"""
        cursor = self.conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS user_activities (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp DATETIME,
                user TEXT,
                action TEXT,
                resource TEXT,
                risk_score INTEGER,
                anomaly_type TEXT
            )
        ''')

        cursor.execute('''
            CREATE TABLE IF NOT EXISTS user_profiles (
                user TEXT PRIMARY KEY,
                normal_hours TEXT,
                common_resources TEXT,
                risk_level INTEGER,
                last_updated DATETIME
            )
        ''')
        self.conn.commit()

    def analyze_user_behavior(self, user, action, resource, timestamp):
        """Analyze user behavior and calculate risk score"""
        risk_score = 0
        anomalies = []

        # Get user profile
        cursor = self.conn.cursor()
        cursor.execute(
            'SELECT * FROM user_profiles WHERE user = ?', (user,)
        )
        profile = cursor.fetchone()

        # Check time-based anomalies
        current_hour = datetime.fromisoformat(timestamp).hour
        if profile and profile[1]:  # normal_hours
            normal_hours = json.loads(profile[1])
            if current_hour not in normal_hours:
                risk_score += 30
                anomalies.append('after_hours_activity')

        # Check resource access anomalies
        if profile and profile[2]:  # common_resources
            common_resources = json.loads(profile[2])
            if resource not in common_resources:
                risk_score += 20
                anomalies.append('unusual_resource_access')

        # Check for suspicious actions
        suspicious_actions = {
            'password_change': 25,
            'user_creation': 30,
            'permission_change': 35,
            'data_download': 20,
            'config_modification': 40
        }

        for suspicious, score in suspicious_actions.items():
            if suspicious in action.lower():
                risk_score += score
                anomalies.append(f'suspicious_action_{suspicious}')

        # Check frequency anomalies
        frequency_risk = self.check_frequency_anomaly(user, action)
        risk_score += frequency_risk

        # Store activity
        cursor.execute('''
            INSERT INTO user_activities
            (timestamp, user, action, resource, risk_score, anomaly_type)
            VALUES (?, ?, ?, ?, ?, ?)
        ''', (
            timestamp, user, action, resource, risk_score, ','.join(anomalies)
        ))
        self.conn.commit()

        # Update user profile
        self.update_user_profile(user, timestamp, current_hour, resource)

        return {
            'user': user,
            'risk_score': risk_score,
            'anomalies': anomalies,
            'action': action,
            'resource': resource,
            'timestamp': timestamp,
            'severity': self.calculate_severity(risk_score)
        }

    def check_frequency_anomaly(self, user, action):
        """Check for frequency-based anomalies"""
        cursor = self.conn.cursor()

        # Get recent activities
        one_hour_ago = datetime.now() - timedelta(hours=1)
        cursor.execute('''
            SELECT COUNT(*) FROM user_activities
            WHERE user = ? AND action = ? AND timestamp > ?
        ''', (user, action, one_hour_ago.isoformat()))

        recent_count = cursor.fetchone()[0]

        # Get historical average
        cursor.execute('''
            SELECT COUNT(*) FROM user_activities
            WHERE user = ? AND action = ?
            GROUP BY DATE(timestamp)
        ''', (user, action))

        daily_counts = [row[0] for row in cursor.fetchall()]

        if daily_counts:
            avg_hourly = statistics.mean(daily_counts) / 24
            if recent_count > avg_hourly * 3:  # 3x normal rate
                return 40  # High risk

        return 0

    def update_user_profile(self, user, timestamp, hour, resource):
        """Update user behavior profile"""
        cursor = self.conn.cursor()

        # Get existing profile
        cursor.execute(
            'SELECT normal_hours, common_resources FROM user_profiles WHERE user = ?',
            (user,)
        )
        result = cursor.fetchone()

        if result:
            normal_hours = set(json.loads(result[0]))
            common_resources = set(json.loads(result[1]))
        else:
            normal_hours = set()
            common_resources = set()

        # Update with new data
        normal_hours.add(hour)
        common_resources.add(resource)

        # Calculate risk level based on history
        cursor.execute('''
            SELECT AVG(risk_score) FROM user_activities
            WHERE user = ? AND timestamp > ?
        ''', (user, (datetime.now() - timedelta(days=7)).isoformat()))

        avg_risk = cursor.fetchone()[0] or 0
        risk_level = int(avg_risk / 20)  # 0-5 scale

        # Update profile
        cursor.execute('''
            INSERT OR REPLACE INTO user_profiles
            (user, normal_hours, common_resources, risk_level, last_updated)
            VALUES (?, ?, ?, ?, ?)
        ''', (
            user,
            json.dumps(list(normal_hours)),
            json.dumps(list(common_resources)),
            risk_level,
            timestamp
        ))
        self.conn.commit()

    def calculate_severity(self, risk_score):
        """Calculate severity level from risk score"""
        if risk_score >= 80:
            return 'critical'
        elif risk_score >= 60:
            return 'high'
        elif risk_score >= 40:
            return 'medium'
        elif risk_score >= 20:
            return 'low'
        else:
            return 'info'
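A quick local run might look like the following sketch; the event values are invented for illustration, and ':memory:' keeps the test away from the production database path under /var/ossec/logs/:

# Illustrative usage only; ':memory:' avoids touching the real UBA database
analyzer = UserBehaviorAnalyzer(db_path=':memory:')

result = analyzer.analyze_user_behavior(
    user='jdoe',
    action='data_download',
    resource='/srv/reports/q4_financials.xlsx',
    timestamp='2024-05-14T22:30:00'
)

print(result['risk_score'], result['severity'], result['anomalies'])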
Use Case 2: Network Traffic Anomaly Detection
DGA (Domain Generation Algorithm) Detection
#!/usr/bin/env python3
import json
import math
import re
import sys
from collections import Counter


class DGADetector:
    def __init__(self):
        self.tld_list = [
            'com', 'net', 'org', 'info', 'biz', 'co.uk',
            'de', 'fr', 'ru', 'cn', 'jp', 'in'
        ]
        self.english_freq = {
            'a': 0.0817, 'b': 0.0149, 'c': 0.0278, 'd': 0.0425,
            'e': 0.1270, 'f': 0.0223, 'g': 0.0202, 'h': 0.0609,
            'i': 0.0697, 'j': 0.0015, 'k': 0.0077, 'l': 0.0403,
            'm': 0.0241, 'n': 0.0675, 'o': 0.0751, 'p': 0.0193,
            'q': 0.0010, 'r': 0.0599, 's': 0.0633, 't': 0.0906,
            'u': 0.0276, 'v': 0.0098, 'w': 0.0236, 'x': 0.0015,
            'y': 0.0197, 'z': 0.0007
        }

    def analyze_domain(self, domain):
        """Analyze domain for DGA characteristics"""
        # Remove TLD
        domain_parts = domain.split('.')
        if len(domain_parts) < 2:
            return None

        sld = domain_parts[-2]  # Second level domain

        features = {
            'domain': domain,
            'length': len(sld),
            'entropy': self.calculate_entropy(sld),
            'consonant_ratio': self.calculate_consonant_ratio(sld),
            'number_ratio': self.calculate_number_ratio(sld),
            'bigram_score': self.calculate_bigram_score(sld),
            'lexical_diversity': self.calculate_lexical_diversity(sld),
            'suspicious_tld': self.check_suspicious_tld(domain_parts[-1])
        }

        # Calculate DGA probability
        dga_score = self.calculate_dga_score(features)
        features['dga_score'] = dga_score
        features['is_dga'] = dga_score > 0.7

        return features

    def calculate_entropy(self, text):
        """Calculate Shannon entropy"""
        if not text:
            return 0

        prob = [float(text.count(c)) / len(text) for c in dict.fromkeys(text)]
        entropy = -sum([p * math.log(p) / math.log(2.0) for p in prob if p > 0])
        return entropy

    def calculate_consonant_ratio(self, text):
        """Calculate ratio of consonants to total characters"""
        vowels = 'aeiouAEIOU'
        consonants = sum(1 for c in text if c.isalpha() and c not in vowels)
        return consonants / len(text) if text else 0

    def calculate_number_ratio(self, text):
        """Calculate ratio of numbers to total characters"""
        numbers = sum(1 for c in text if c.isdigit())
        return numbers / len(text) if text else 0

    def calculate_bigram_score(self, text):
        """Calculate bigram frequency score"""
        if len(text) < 2:
            return 0

        bigrams = [text[i:i+2] for i in range(len(text) - 1)]

        # Common English bigrams
        common_bigrams = [
            'th', 'he', 'in', 'en', 'nt', 're', 'er', 'an',
            'ti', 'es', 'on', 'at', 'se', 'nd', 'or', 'ar'
        ]

        score = sum(1 for bg in bigrams if bg.lower() in common_bigrams)
        return score / len(bigrams)

    def calculate_lexical_diversity(self, text):
        """Calculate character diversity"""
        if not text:
            return 0
        return len(set(text)) / len(text)

    def check_suspicious_tld(self, tld):
        """Check if TLD is commonly used in DGA"""
        suspicious_tlds = ['tk', 'ml', 'ga', 'cf', 'click', 'download']
        return tld.lower() in suspicious_tlds

    def calculate_dga_score(self, features):
        """Calculate overall DGA probability score"""
        score = 0

        # High entropy indicates randomness
        if features['entropy'] > 3.5:
            score += 0.3
        elif features['entropy'] > 3.0:
            score += 0.2

        # High consonant ratio
        if features['consonant_ratio'] > 0.65:
            score += 0.2

        # Contains numbers
        if features['number_ratio'] > 0.1:
            score += 0.15

        # Low bigram score (uncommon letter combinations)
        if features['bigram_score'] < 0.1:
            score += 0.2

        # High lexical diversity
        if features['lexical_diversity'] > 0.8:
            score += 0.1

        # Suspicious TLD
        if features['suspicious_tld']:
            score += 0.15

        # Long domain name
        if features['length'] > 20:
            score += 0.1
        elif features['length'] > 15:
            score += 0.05

        return min(score, 1.0)  # Cap at 1.0


def main():
    # Read DNS query log from Wazuh
    alert = json.loads(sys.stdin.read())

    if 'data' in alert and 'dns' in alert['data']:
        domain = alert['data']['dns'].get('query', '')

        detector = DGADetector()
        result = detector.analyze_domain(domain)

        if result and result['is_dga']:
            # Generate Wazuh alert
            anomaly_alert = {
                'integration': 'dga_detector',
                'anomaly_type': 'suspicious_domain',
                'severity': 'high',
                'domain': domain,
                'dga_score': result['dga_score'],
                'features': result,
                'description': f'Possible DGA domain detected: {domain}',
                'mitre_attack': ['T1568.002']  # Dynamic Resolution: DGA
            }

            print(json.dumps(anomaly_alert))


if __name__ == '__main__':
    main()
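Because the scoring is heuristic, it is worth spot-checking it against a few benign domains and an obviously random-looking one before wiring it into an integration. The domains below are illustrative examples, not known indicators:

# Illustrative spot-check of the heuristic scoring above
detector = DGADetector()

for candidate in ['wikipedia.org', 'mail.example.com', 'xj4k9q2zvh8w3n1.tk']:
    result = detector.analyze_domain(candidate)
    if result:
        print(candidate, round(result['dga_score'], 2), result['is_dga'])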
Use Case 3: Application Behavior Anomaly Detection
Web Application Anomaly Detection
<group name="web,anomaly,webapp">
  <!-- Unusual request size -->
  <rule id="100020" level="7">
    <if_sid>31100</if_sid>
    <field name="http.request.length" compare="greater">10000</field>
    <description>Unusually large HTTP request detected</description>
    <group>web_anomaly</group>
  </rule>

  <!-- Unusual user agent -->
  <rule id="100021" level="8">
    <if_sid>31100</if_sid>
    <field name="http.user_agent">python|curl|wget|nikto|sqlmap</field>
    <description>Suspicious user agent detected - possible automated tool</description>
    <group>web_anomaly,recon</group>
  </rule>

  <!-- Parameter pollution -->
  <rule id="100022" level="9">
    <if_sid>31100</if_sid>
    <regex>\?.*([^&=]+=[^&]*&)\1{2,}</regex>
    <description>HTTP parameter pollution attempt detected</description>
    <group>web_anomaly,attack</group>
  </rule>

  <!-- Unusual response time -->
  <rule id="100023" level="6">
    <if_sid>31100</if_sid>
    <field name="http.response_time" compare="greater">5000</field>
    <description>Slow HTTP response - possible DoS or resource exhaustion</description>
    <group>web_anomaly,performance</group>
  </rule>
</group>
API Behavior Monitoring
#!/usr/bin/env python3
import json
import time
from collections import defaultdict, deque
from datetime import datetime, timedelta
import numpy as np


class APIAnomalyDetector:
    def __init__(self):
        self.endpoint_stats = defaultdict(lambda: {
            'response_times': deque(maxlen=1000),
            'status_codes': defaultdict(int),
            'request_rates': deque(maxlen=60),  # Per minute for last hour
            'unique_ips': set(),
            'payload_sizes': deque(maxlen=1000),
            'error_rate': deque(maxlen=100)
        })

        self.user_patterns = defaultdict(lambda: {
            'endpoints': defaultdict(int),
            'methods': defaultdict(int),  # HTTP method counts per user
            'request_times': deque(maxlen=1000),
            'auth_failures': 0,
            'data_volume': 0
        })

    def analyze_api_request(self, log_data):
        """Analyze API request for anomalies"""
        anomalies = []

        # Extract fields
        endpoint = log_data.get('endpoint', '')
        method = log_data.get('method', '')
        status = int(log_data.get('status', 0))
        response_time = int(log_data.get('response_time', 0))
        user = log_data.get('user', 'anonymous')
        ip = log_data.get('source_ip', '')
        payload_size = int(log_data.get('payload_size', 0))
        timestamp = log_data.get('timestamp', datetime.now().isoformat())

        # Update statistics
        stats = self.endpoint_stats[endpoint]
        stats['response_times'].append(response_time)
        stats['status_codes'][status] += 1
        stats['unique_ips'].add(ip)
        stats['payload_sizes'].append(payload_size)
        self.user_patterns[user]['methods'][method] += 1

        # Check response time anomaly
        if len(stats['response_times']) > 100:
            mean_rt = np.mean(stats['response_times'])
            std_rt = np.std(stats['response_times'])

            if response_time > mean_rt + (3 * std_rt):
                anomalies.append({
                    'type': 'slow_response',
                    'severity': 'medium',
                    'details': f'Response time {response_time}ms exceeds normal range',
                    'baseline_mean': mean_rt,
                    'baseline_std': std_rt
                })

        # Check error rate anomaly
        if status >= 400:
            stats['error_rate'].append(1)
        else:
            stats['error_rate'].append(0)

        if len(stats['error_rate']) == 100:
            error_rate = sum(stats['error_rate']) / 100
            if error_rate > 0.1:  # >10% error rate
                anomalies.append({
                    'type': 'high_error_rate',
                    'severity': 'high',
                    'details': f'Error rate {error_rate:.2%} exceeds threshold',
                    'endpoint': endpoint
                })

        # Check request pattern anomalies
        user_anomalies = self.check_user_pattern_anomalies(user, endpoint, timestamp)
        anomalies.extend(user_anomalies)

        # Check payload size anomaly
        if payload_size > 0 and len(stats['payload_sizes']) > 50:
            mean_size = np.mean(stats['payload_sizes'])
            std_size = np.std(stats['payload_sizes'])

            if payload_size > mean_size + (3 * std_size):
                anomalies.append({
                    'type': 'large_payload',
                    'severity': 'medium',
                    'details': f'Payload size {payload_size} bytes exceeds normal',
                    'baseline_mean': mean_size
                })

        # Check for API abuse patterns
        abuse_patterns = self.check_api_abuse_patterns(endpoint, method, user, ip)
        anomalies.extend(abuse_patterns)

        return anomalies

    def check_user_pattern_anomalies(self, user, endpoint, timestamp):
        """Check for anomalies in user behavior patterns"""
        anomalies = []
        user_data = self.user_patterns[user]

        # Update user data
        user_data['endpoints'][endpoint] += 1
        user_data['request_times'].append(timestamp)

        # Check for endpoint scanning
        if len(user_data['endpoints']) > 50:
            anomalies.append({
                'type': 'endpoint_scanning',
                'severity': 'high',
                'details': f'User {user} accessed {len(user_data["endpoints"])} different endpoints',
                'user': user
            })

        # Check request frequency
        if len(user_data['request_times']) > 10:
            recent_requests = [
                t for t in user_data['request_times']
                if datetime.fromisoformat(t) > datetime.now() - timedelta(minutes=1)
            ]

            if len(recent_requests) > 100:  # >100 requests per minute
                anomalies.append({
                    'type': 'high_request_rate',
                    'severity': 'high',
                    'details': f'User {user} made {len(recent_requests)} requests in last minute',
                    'user': user
                })

        return anomalies

    def check_api_abuse_patterns(self, endpoint, method, user, ip):
        """Check for API abuse patterns"""
        anomalies = []

        # Check for dangerous endpoints
        dangerous_patterns = [
            '/admin/', '/config/', '/debug/', '/internal/',
            '/api/v1/users/delete', '/api/v1/data/export'
        ]

        for pattern in dangerous_patterns:
            if pattern in endpoint:
                anomalies.append({
                    'type': 'dangerous_endpoint_access',
                    'severity': 'critical',
                    'details': f'Access to dangerous endpoint: {endpoint}',
                    'user': user,
                    'endpoint': endpoint
                })
                break

        # Check for method anomalies
        if method in ['DELETE', 'PUT'] and '/api/v1/' in endpoint:
            # Check if user typically uses these methods
            user_methods = self.user_patterns[user]['methods']
            if user_methods[method] < 5:  # Rarely uses these methods
                anomalies.append({
                    'type': 'unusual_method',
                    'severity': 'medium',
                    'details': f'Unusual {method} request from user {user}',
                    'user': user,
                    'method': method
                })

        return anomalies
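As with the other detectors, the class can be exercised directly with a dictionary shaped like the fields it reads; the request below is a fabricated example, and the endpoint deliberately hits one of the dangerous patterns:

# Fabricated API log record used only to exercise the detector above
from datetime import datetime

detector = APIAnomalyDetector()

request = {
    'endpoint': '/api/v1/data/export',
    'method': 'GET',
    'status': 200,
    'response_time': 180,
    'user': 'svc_reporting',
    'source_ip': '10.0.4.21',
    'payload_size': 2048,
    'timestamp': datetime.now().isoformat()
}

for finding in detector.analyze_api_request(request):
    print(finding['type'], finding['severity'])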
Use Case 4: Insider Threat Detection
Configuration for Insider Threat Monitoring
<group name="insider_threat,anomaly">
  <!-- After hours data access -->
  <rule id="100030" level="8">
    <if_sid>550,551,552</if_sid>
    <weekday>weekdays</weekday>
    <time>20:00 - 06:00</time>
    <field name="file.path">confidential|sensitive|secret</field>
    <description>Sensitive file accessed after business hours</description>
    <group>insider_threat,data_theft</group>
  </rule>

  <!-- Mass file download -->
  <rule id="100031" level="10" frequency="50" timeframe="300">
    <if_sid>550</if_sid>
    <same_user />
    <description>Mass file download detected - possible data exfiltration</description>
    <mitre>
      <id>T1567</id>
    </mitre>
    <group>insider_threat,exfiltration</group>
  </rule>

  <!-- USB device usage anomaly -->
  <rule id="100032" level="9">
    <decoded_as>usb_monitor</decoded_as>
    <field name="action">mount</field>
    <field name="device_type">storage</field>
    <weekday>weekdays</weekday>
    <time>18:00 - 08:00</time>
    <description>USB storage device connected after hours</description>
    <group>insider_threat,removable_media</group>
  </rule>

  <!-- Email to personal account -->
  <rule id="100033" level="9">
    <if_sid>3600</if_sid>
    <field name="mail.to">gmail.com|yahoo.com|hotmail.com|outlook.com</field>
    <field name="mail.attachments" compare="greater">0</field>
    <description>Email with attachments sent to personal account</description>
    <group>insider_threat,data_leak</group>
  </rule>
</group>
Insider Threat Scoring System
#!/usr/bin/env python3
import json
import sqlite3
from datetime import datetime, timedelta
from collections import defaultdict
import networkx as nx


class InsiderThreatScorer:
    def __init__(self, db_path='/var/ossec/logs/insider_threat.db'):
        self.conn = sqlite3.connect(db_path)
        self.init_database()
        self.risk_weights = {
            'after_hours_access': 15,
            'sensitive_file_access': 20,
            'mass_download': 30,
            'unusual_application': 10,
            'privilege_escalation': 35,
            'data_staging': 25,
            'external_transfer': 40,
            'policy_violation': 20,
            'anomalous_behavior': 15
        }

    def init_database(self):
        """Initialize threat scoring database"""
        cursor = self.conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS threat_scores (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp DATETIME,
                user TEXT,
                indicator TEXT,
                score INTEGER,
                details TEXT,
                cumulative_score INTEGER
            )
        ''')

        cursor.execute('''
            CREATE TABLE IF NOT EXISTS user_risk_profiles (
                user TEXT PRIMARY KEY,
                current_score INTEGER,
                peak_score INTEGER,
                indicators_count INTEGER,
                first_seen DATETIME,
                last_updated DATETIME,
                risk_level TEXT
            )
        ''')
        self.conn.commit()

    def process_event(self, event):
        """Process security event and update threat score"""
        user = event.get('user', 'unknown')
        timestamp = event.get('timestamp', datetime.now().isoformat())
        indicators = self.extract_indicators(event)

        total_score = 0
        threat_details = []

        for indicator in indicators:
            score = self.risk_weights.get(indicator['type'], 10)

            # Apply contextual multipliers
            score = self.apply_context_multipliers(score, indicator, event)

            total_score += score
            threat_details.append({
                'indicator': indicator['type'],
                'score': score,
                'details': indicator.get('details', '')
            })

            # Store indicator
            self.store_indicator(user, timestamp, indicator['type'], score,
                                 indicator.get('details', ''))

        # Update user risk profile
        self.update_risk_profile(user, total_score, timestamp)

        # Check if threshold exceeded
        risk_assessment = self.assess_risk(user)

        return {
            'user': user,
            'event_score': total_score,
            'cumulative_score': risk_assessment['cumulative_score'],
            'risk_level': risk_assessment['risk_level'],
            'indicators': threat_details,
            'recommended_action': risk_assessment['recommended_action']
        }

    def extract_indicators(self, event):
        """Extract threat indicators from event"""
        indicators = []

        # Check time-based anomalies
        event_time = datetime.fromisoformat(event['timestamp'])
        if event_time.hour < 6 or event_time.hour > 20:
            if 'file' in event and 'sensitive' in event.get('file', {}).get('path', '').lower():
                indicators.append({
                    'type': 'after_hours_access',
                    'details': f'Accessed sensitive file at {event_time.hour}:00'
                })

        # Check for mass operations
        if event.get('operation_count', 0) > 50:
            indicators.append({
                'type': 'mass_download',
                'details': f'{event["operation_count"]} files accessed'
            })

        # Check for data staging
        if 'file' in event:
            path = event['file'].get('path', '')
            if any(staging in path.lower() for staging in ['temp', 'tmp', 'staging', 'export']):
                if event['file'].get('size', 0) > 100_000_000:  # 100MB
                    indicators.append({
                        'type': 'data_staging',
                        'details': f'Large file in staging area: {path}'
                    })

        # Check for external transfer
        if 'network' in event:
            dest_ip = event['network'].get('dest_ip', '')
            if not self.is_internal_ip(dest_ip):
                if event['network'].get('bytes_out', 0) > 50_000_000:  # 50MB
                    indicators.append({
                        'type': 'external_transfer',
                        'details': f'Large external transfer to {dest_ip}'
                    })

        # Check for privilege escalation
        if event.get('action') == 'privilege_escalation':
            indicators.append({
                'type': 'privilege_escalation',
                'details': event.get('details', '')
            })

        return indicators

    def apply_context_multipliers(self, base_score, indicator, event):
        """Apply contextual multipliers to risk score"""
        multiplier = 1.0

        # User role multiplier
        user_role = event.get('user_role', '')
        if user_role in ['admin', 'root', 'administrator']:
            multiplier *= 1.5
        elif user_role in ['developer', 'engineer']:
            multiplier *= 1.3

        # Repeat behavior multiplier
        if self.is_repeat_behavior(event['user'], indicator['type']):
            multiplier *= 0.7  # Lower score for consistent behavior
        else:
            multiplier *= 1.5  # Higher score for new behavior

        # Time-based multiplier
        if indicator['type'] in ['after_hours_access', 'external_transfer']:
            event_time = datetime.fromisoformat(event['timestamp'])
            if event_time.weekday() >= 5:  # Weekend
                multiplier *= 1.5

        return int(base_score * multiplier)

    def is_repeat_behavior(self, user, indicator_type):
        """Check if this is repeat behavior for the user"""
        cursor = self.conn.cursor()
        cursor.execute('''
            SELECT COUNT(*) FROM threat_scores
            WHERE user = ? AND indicator = ? AND timestamp > ?
        ''', (user, indicator_type,
              (datetime.now() - timedelta(days=30)).isoformat()))

        count = cursor.fetchone()[0]
        return count > 5

    def update_risk_profile(self, user, new_score, timestamp):
        """Update user's risk profile"""
        cursor = self.conn.cursor()

        # Get current profile
        cursor.execute(
            'SELECT current_score, peak_score, indicators_count FROM user_risk_profiles WHERE user = ?',
            (user,)
        )
        result = cursor.fetchone()

        if result:
            current_score = result[0] + new_score
            peak_score = max(result[1], current_score)
            indicators_count = result[2] + 1
        else:
            current_score = new_score
            peak_score = new_score
            indicators_count = 1

        # Calculate risk level
        if current_score >= 100:
            risk_level = 'critical'
        elif current_score >= 75:
            risk_level = 'high'
        elif current_score >= 50:
            risk_level = 'medium'
        elif current_score >= 25:
            risk_level = 'low'
        else:
            risk_level = 'minimal'

        # Update profile
        cursor.execute('''
            INSERT OR REPLACE INTO user_risk_profiles
            (user, current_score, peak_score, indicators_count, first_seen, last_updated, risk_level)
            VALUES (?, ?, ?, ?,
                    COALESCE((SELECT first_seen FROM user_risk_profiles WHERE user = ?), ?),
                    ?, ?)
        ''', (user, current_score, peak_score, indicators_count,
              user, timestamp, timestamp, risk_level))

        self.conn.commit()

    def assess_risk(self, user):
        """Assess user's current risk level"""
        cursor = self.conn.cursor()
        cursor.execute(
            'SELECT current_score, risk_level, indicators_count FROM user_risk_profiles WHERE user = ?',
            (user,)
        )
        result = cursor.fetchone()

        if not result:
            return {
                'cumulative_score': 0,
                'risk_level': 'minimal',
                'recommended_action': 'continue_monitoring'
            }

        cumulative_score, risk_level, indicators_count = result

        # Determine recommended action
        if risk_level == 'critical':
            action = 'immediate_investigation'
        elif risk_level == 'high':
            action = 'priority_review'
        elif risk_level == 'medium':
            action = 'enhanced_monitoring'
        else:
            action = 'continue_monitoring'

        # Check for rapid score increase
        cursor.execute('''
            SELECT SUM(score) FROM threat_scores
            WHERE user = ? AND timestamp > ?
        ''', (user, (datetime.now() - timedelta(hours=1)).isoformat()))

        recent_score = cursor.fetchone()[0] or 0
        if recent_score > 50:
            action = 'immediate_investigation'

        return {
            'cumulative_score': cumulative_score,
            'risk_level': risk_level,
            'indicators_count': indicators_count,
            'recommended_action': action,
            'recent_activity_score': recent_score
        }

    def is_internal_ip(self, ip):
        """Check if IP is internal"""
        internal_prefixes = [
            '10.',
            '172.16.', '172.17.', '172.18.', '172.19.', '172.20.', '172.21.',
            '172.22.', '172.23.', '172.24.', '172.25.', '172.26.', '172.27.',
            '172.28.', '172.29.', '172.30.', '172.31.',
            '192.168.'
        ]
        return any(ip.startswith(prefix) for prefix in internal_prefixes)

    def store_indicator(self, user, timestamp, indicator_type, score, details):
        """Store threat indicator in database"""
        cursor = self.conn.cursor()

        # Get cumulative score
        cursor.execute(
            'SELECT current_score FROM user_risk_profiles WHERE user = ?', (user,)
        )
        result = cursor.fetchone()
        cumulative = result[0] if result else score

        cursor.execute('''
            INSERT INTO threat_scores
            (timestamp, user, indicator, score, details, cumulative_score)
            VALUES (?, ?, ?, ?, ?, ?)
        ''', (timestamp, user, indicator_type, score, details, cumulative))

        self.conn.commit()
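A contrived event like the one below shows how the scorer is driven. Every field value is invented for illustration, and ':memory:' keeps the example away from the production database path; real events would be assembled from Wazuh alerts:

# Invented event for illustration; real events would come from Wazuh alerts
scorer = InsiderThreatScorer(db_path=':memory:')

event = {
    'user': 'jdoe',
    'user_role': 'developer',
    'timestamp': '2024-05-11T23:10:00',  # a Saturday night
    'file': {'path': '/tmp/export/customer_dump.zip', 'size': 250_000_000},
    'network': {'dest_ip': '203.0.113.45', 'bytes_out': 120_000_000}
}

assessment = scorer.process_event(event)
print(assessment['event_score'], assessment['risk_level'],
      assessment['recommended_action'])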
Use Case 5: Machine Learning Integration
Anomaly Detection with Isolation Forest
#!/usr/bin/env python3
import json
import sys
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
import pandas as pd
from datetime import datetime, timedelta


class MLAnomalyDetector:
    def __init__(self, model_path='/var/ossec/models/'):
        self.model_path = model_path
        self.models = {}
        self.scalers = {}
        self.feature_extractors = {
            'network': self.extract_network_features,
            'authentication': self.extract_auth_features,
            'file_access': self.extract_file_features,
            'process': self.extract_process_features
        }
        self.load_models()

    def load_models(self):
        """Load pre-trained models"""
        for model_type in ['network', 'authentication', 'file_access', 'process']:
            try:
                self.models[model_type] = joblib.load(f'{self.model_path}/{model_type}_model.pkl')
                self.scalers[model_type] = joblib.load(f'{self.model_path}/{model_type}_scaler.pkl')
            except Exception:
                # Initialize new model if not found
                self.models[model_type] = IsolationForest(
                    contamination=0.1,
                    random_state=42,
                    n_estimators=100
                )
                self.scalers[model_type] = StandardScaler()

    def extract_network_features(self, event):
        """Extract features for network anomaly detection"""
        event_time = datetime.fromisoformat(event.get('timestamp', datetime.now().isoformat()))
        features = {
            'src_port': int(event.get('src_port', 0)),
            'dst_port': int(event.get('dst_port', 0)),
            'packet_size': int(event.get('packet_size', 0)),
            'duration': int(event.get('duration', 0)),
            'bytes_in': int(event.get('bytes_in', 0)),
            'bytes_out': int(event.get('bytes_out', 0)),
            'packets_in': int(event.get('packets_in', 0)),
            'packets_out': int(event.get('packets_out', 0)),
            'protocol_tcp': 1 if event.get('protocol') == 'tcp' else 0,
            'protocol_udp': 1 if event.get('protocol') == 'udp' else 0,
            'hour': event_time.hour,
            'is_weekend': 1 if event_time.weekday() >= 5 else 0
        }
        return features

    def extract_auth_features(self, event):
        """Extract features for authentication anomaly detection"""
        event_time = datetime.fromisoformat(event.get('timestamp', datetime.now().isoformat()))
        features = {
            'hour': event_time.hour,
            'day_of_week': event_time.weekday(),
            'is_weekend': 1 if event_time.weekday() >= 5 else 0,
            'auth_success': 1 if event.get('outcome') == 'success' else 0,
            'source_ip_octets': self.ip_to_features(event.get('source_ip', '0.0.0.0')),
            'auth_type_password': 1 if event.get('auth_type') == 'password' else 0,
            'auth_type_key': 1 if event.get('auth_type') == 'publickey' else 0,
            'auth_type_kerberos': 1 if event.get('auth_type') == 'kerberos' else 0
        }

        # Flatten IP octets
        ip_features = features.pop('source_ip_octets')
        for i, octet in enumerate(ip_features):
            features[f'ip_octet_{i}'] = octet

        return features

    def extract_file_features(self, event):
        """Extract features for file access anomaly detection"""
        file_path = event.get('file_path', '')
        event_time = datetime.fromisoformat(event.get('timestamp', datetime.now().isoformat()))
        features = {
            'hour': event_time.hour,
            'file_size': int(event.get('file_size', 0)),
            'operation_read': 1 if event.get('operation') == 'read' else 0,
            'operation_write': 1 if event.get('operation') == 'write' else 0,
            'operation_delete': 1 if event.get('operation') == 'delete' else 0,
            'is_hidden': 1 if file_path.startswith('.') or '/.' in file_path else 0,
            'is_system': 1 if any(sys_path in file_path for sys_path in ['/etc/', '/sys/', '/proc/']) else 0,
            'path_depth': file_path.count('/'),
            'extension_executable': 1 if any(file_path.endswith(ext) for ext in ['.exe', '.sh', '.bat', '.cmd']) else 0,
            'extension_config': 1 if any(file_path.endswith(ext) for ext in ['.conf', '.cfg', '.ini', '.yaml']) else 0
        }
        return features

    def extract_process_features(self, event):
        """Extract features for process anomaly detection"""
        event_time = datetime.fromisoformat(event.get('timestamp', datetime.now().isoformat()))
        features = {
            'hour': event_time.hour,
            'cpu_usage': float(event.get('cpu_usage', 0)),
            'memory_usage': float(event.get('memory_usage', 0)),
            'thread_count': int(event.get('thread_count', 1)),
            'ppid': int(event.get('ppid', 0)),
            'nice_value': int(event.get('nice', 0)),
            'is_system_process': 1 if int(event.get('uid', 1000)) < 1000 else 0,
            'has_network': 1 if event.get('network_connections', 0) > 0 else 0,
            'child_count': int(event.get('child_processes', 0)),
            'file_descriptors': int(event.get('open_files', 0))
        }
        return features

    def ip_to_features(self, ip):
        """Convert IP address to numerical features"""
        try:
            octets = [int(x) for x in ip.split('.')]
            return octets + [0] * (4 - len(octets))
        except Exception:
            return [0, 0, 0, 0]

    def detect_anomaly(self, event_type, event_data):
        """Detect anomaly using appropriate model"""
        if event_type not in self.models:
            return None

        # Extract features
        feature_extractor = self.feature_extractors.get(event_type)
        if not feature_extractor:
            return None

        features = feature_extractor(event_data)

        # Convert to array
        feature_array = np.array(list(features.values())).reshape(1, -1)

        # Scale features
        try:
            scaled_features = self.scalers[event_type].transform(feature_array)
        except Exception:
            # Fit scaler if not fitted
            scaled_features = self.scalers[event_type].fit_transform(feature_array)

        # Predict
        prediction = self.models[event_type].predict(scaled_features)[0]
        anomaly_score = self.models[event_type].score_samples(scaled_features)[0]

        if prediction == -1:  # Anomaly detected
            return {
                'is_anomaly': True,
                'anomaly_score': float(-anomaly_score),  # Convert to positive score
                'event_type': event_type,
                'features': features,
                'severity': self.calculate_severity(anomaly_score),
                'confidence': self.calculate_confidence(anomaly_score)
            }

        return {
            'is_anomaly': False,
            'anomaly_score': float(-anomaly_score),
            'event_type': event_type
        }

    def calculate_severity(self, anomaly_score):
        """Calculate severity based on anomaly score"""
        if anomaly_score < -0.5:
            return 'critical'
        elif anomaly_score < -0.3:
            return 'high'
        elif anomaly_score < -0.1:
            return 'medium'
        else:
            return 'low'

    def calculate_confidence(self, anomaly_score):
        """Calculate confidence level"""
        # Map anomaly score to confidence percentage
        confidence = min(100, max(0, (abs(anomaly_score) + 0.5) * 100))
        return round(confidence, 2)

    def update_model(self, event_type, new_data, labels=None):
        """Update model with new data (online learning)"""
        if event_type not in self.models:
            return

        # Extract features for all new data
        feature_extractor = self.feature_extractors.get(event_type)
        features_list = []

        for event in new_data:
            features = feature_extractor(event)
            features_list.append(list(features.values()))

        # Convert to array
        X = np.array(features_list)

        # Update scaler
        self.scalers[event_type].partial_fit(X)

        # Scale features
        X_scaled = self.scalers[event_type].transform(X)

        # Retrain model (in practice, you might want to use incremental learning)
        if labels is None:
            # Unsupervised retraining
            self.models[event_type].fit(X_scaled)
        else:
            # Semi-supervised if labels are provided
            # IsolationForest doesn't support this directly,
            # but you could use this for model selection
            pass

        # Save updated model
        joblib.dump(self.models[event_type], f'{self.model_path}/{event_type}_model.pkl')
        joblib.dump(self.scalers[event_type], f'{self.model_path}/{event_type}_scaler.pkl')


def main():
    # Read event from Wazuh
    event = json.loads(sys.stdin.read())

    # Determine event type
    rule_id = event.get('rule', {}).get('id', '')

    if rule_id.startswith('5'):
        event_type = 'authentication'
    elif rule_id.startswith('6'):
        event_type = 'network'
    elif rule_id.startswith('7'):
        event_type = 'file_access'
    elif rule_id.startswith('8'):
        event_type = 'process'
    else:
        event_type = None

    if event_type:
        detector = MLAnomalyDetector()
        result = detector.detect_anomaly(event_type, event.get('data', {}))

        if result and result['is_anomaly']:
            # Generate alert
            anomaly_alert = {
                'integration': 'ml_anomaly_detector',
                'anomaly': result,
                'original_event': event,
                'timestamp': datetime.now().isoformat(),
                'description': f'ML-detected anomaly in {event_type} behavior'
            }

            print(json.dumps(anomaly_alert))


if __name__ == '__main__':
    main()
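The detector expects pre-trained models under /var/ossec/models/, so an initial offline fit is needed before the integration produces meaningful scores. A minimal bootstrap might look like the sketch below; historical_events is a placeholder for data you would export from the Wazuh indexer or archive files, not a real dataset:

# One-off bootstrap sketch: fit an Isolation Forest per event type on
# historical feature vectors, then persist it where MLAnomalyDetector looks.
# `historical_events` is a placeholder for data exported from your archives.
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

def bootstrap_model(event_type, historical_events, model_path='/var/ossec/models'):
    extractor = MLAnomalyDetector().feature_extractors[event_type]
    X = np.array([list(extractor(e).values()) for e in historical_events])

    scaler = StandardScaler().fit(X)
    model = IsolationForest(contamination=0.1, n_estimators=100,
                            random_state=42).fit(scaler.transform(X))

    joblib.dump(model, f'{model_path}/{event_type}_model.pkl')
    joblib.dump(scaler, f'{model_path}/{event_type}_scaler.pkl')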
Integration and Deployment
Wazuh Manager Configuration
<ossec_config>
  <!-- Enable anomaly detection integrations -->
  <integration>
    <name>behavioral_anomaly</name>
    <hook_url>file:///var/ossec/integrations/behavioral_anomaly.py</hook_url>
    <rule_id>5501,5503,5901,550,551,552,86001-86010</rule_id>
    <alert_format>json</alert_format>
    <options>{"threshold": 2.5, "window_days": 7}</options>
  </integration>

  <integration>
    <name>dga_detector</name>
    <hook_url>file:///var/ossec/integrations/dga_detector.py</hook_url>
    <rule_id>34001-34100</rule_id>
    <alert_format>json</alert_format>
  </integration>

  <integration>
    <name>ml_anomaly_detector</name>
    <hook_url>file:///var/ossec/integrations/ml_anomaly_detector.py</hook_url>
    <rule_id>all</rule_id>
    <alert_format>json</alert_format>
    <options>{"model_path": "/var/ossec/models/"}</options>
  </integration>

  <!-- Active response for anomalies -->
  <active-response>
    <disabled>no</disabled>
    <command>anomaly-response</command>
    <location>local</location>
    <rules_id>100001-100100</rules_id>
    <timeout>300</timeout>
  </active-response>
</ossec_config>
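Two deployment details are worth checking against your Wazuh version: recent releases expect custom integration scripts to be placed in /var/ossec/integrations with a custom- name prefix, and the integrator delivers the triggering alert as a file path in the script's first command-line argument rather than on standard input. A thin wrapper along the lines of the sketch below can bridge that convention to the stdin-based detectors in this guide; the module name and output log path are illustrative assumptions, and the log file could then be collected with a <localfile> block:

#!/usr/bin/env python3
# Wrapper sketch: read the alert file passed by the integrator, run a
# detector from this guide, and append findings to a log Wazuh can monitor.
# The import assumes the earlier script was saved as behavioral_anomaly.py.
import json
import sys

from behavioral_anomaly import BehavioralAnomalyDetector  # assumed module name

def run():
    with open(sys.argv[1]) as alert_file:
        alert = alert_file.read()

    findings = BehavioralAnomalyDetector().process_alert(alert)

    if findings:
        with open('/var/ossec/logs/anomaly_findings.json', 'a') as out:
            for finding in findings:
                out.write(json.dumps(finding) + '\n')

if __name__ == '__main__':
    run()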
Active Response Script
#!/bin/bash
ACTION=$1
USER=$2
IP=$3
ALERT_ID=$4
RULE_ID=$5

LOG_FILE="/var/ossec/logs/active-responses.log"

DATE=$(date +"%Y-%m-%d %H:%M:%S")
echo "[$DATE] Anomaly response triggered: Action=$ACTION User=$USER IP=$IP Alert=$ALERT_ID Rule=$RULE_ID" >> $LOG_FILE

case $RULE_ID in
    # High-risk anomalies - immediate action
    10003[0-9]|10004[0-9])
        if [ "$ACTION" = "add" ]; then
            # Block user account
            usermod -L "$USER" 2>/dev/null
            echo "[$DATE] User $USER account locked due to high-risk anomaly" >> $LOG_FILE

            # Kill user sessions
            pkill -KILL -u "$USER"

            # Send alert to security team
            /var/ossec/integrations/send_alert.py "Critical anomaly detected for user $USER"
        fi
        ;;

    # Medium-risk anomalies - enhanced monitoring
    10002[0-9])
        if [ "$ACTION" = "add" ]; then
            # Enable detailed logging for user
            echo "$USER" >> /var/ossec/logs/enhanced_monitoring.list

            # Increase audit logging
            auditctl -a always,exit -F arch=b64 -F uid="$USER" -S all -k anomaly_monitor
        fi
        ;;

    # Network anomalies - firewall rules
    10001[0-9])
        if [ "$ACTION" = "add" ] && [ "$IP" != "" ]; then
            # Add temporary firewall rule
            iptables -I INPUT -s "$IP" -j DROP
            echo "[$DATE] Blocked IP $IP due to network anomaly" >> $LOG_FILE

            # Schedule removal after timeout
            echo "iptables -D INPUT -s $IP -j DROP" | at now + 5 hours
        fi
        ;;
esac

exit 0
Monitoring and Tuning
Performance Monitoring
#!/usr/bin/env python3
import json
import sqlite3
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns


class AnomalyPerformanceMonitor:
    def __init__(self):
        self.conn = sqlite3.connect('/var/ossec/logs/anomaly_metrics.db')
        self.init_database()

    def init_database(self):
        cursor = self.conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS anomaly_metrics (
                timestamp DATETIME,
                detector_type TEXT,
                true_positives INTEGER,
                false_positives INTEGER,
                false_negatives INTEGER,
                processing_time REAL,
                memory_usage INTEGER
            )
        ''')
        self.conn.commit()

    def generate_performance_report(self):
        """Generate performance metrics report"""
        # Query metrics
        cursor = self.conn.cursor()
        cursor.execute('''
            SELECT
                detector_type,
                SUM(true_positives) as tp,
                SUM(false_positives) as fp,
                SUM(false_negatives) as fn,
                AVG(processing_time) as avg_time,
                MAX(memory_usage) as max_memory
            FROM anomaly_metrics
            WHERE timestamp > ?
            GROUP BY detector_type
        ''', ((datetime.now() - timedelta(days=7)).isoformat(),))

        results = cursor.fetchall()

        report = {
            'generated_at': datetime.now().isoformat(),
            'period': 'last_7_days',
            'detectors': {}
        }

        for row in results:
            detector_type, tp, fp, fn, avg_time, max_memory = row

            # Calculate metrics
            precision = tp / (tp + fp) if (tp + fp) > 0 else 0
            recall = tp / (tp + fn) if (tp + fn) > 0 else 0
            f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

            report['detectors'][detector_type] = {
                'precision': round(precision, 3),
                'recall': round(recall, 3),
                'f1_score': round(f1_score, 3),
                'avg_processing_time_ms': round(avg_time, 2),
                'max_memory_mb': round(max_memory / 1024 / 1024, 2),
                'total_alerts': tp + fp
            }

        return report

    def plot_anomaly_trends(self):
        """Generate visualization of anomaly trends"""
        cursor = self.conn.cursor()
        cursor.execute('''
            SELECT
                DATE(timestamp) as date,
                detector_type,
                SUM(true_positives + false_positives) as total_anomalies
            FROM anomaly_metrics
            WHERE timestamp > ?
            GROUP BY DATE(timestamp), detector_type
            ORDER BY date
        ''', ((datetime.now() - timedelta(days=30)).isoformat(),))

        data = cursor.fetchall()

        # Create plot
        plt.figure(figsize=(12, 6))

        # Process data by detector type
        detectors = {}
        for date, detector, count in data:
            if detector not in detectors:
                detectors[detector] = {'dates': [], 'counts': []}
            detectors[detector]['dates'].append(date)
            detectors[detector]['counts'].append(count)

        # Plot lines for each detector
        for detector, values in detectors.items():
            plt.plot(values['dates'], values['counts'], label=detector, marker='o')

        plt.xlabel('Date')
        plt.ylabel('Anomaly Count')
        plt.title('Anomaly Detection Trends (Last 30 Days)')
        plt.legend()
        plt.xticks(rotation=45)
        plt.tight_layout()

        # Save plot
        plt.savefig('/var/ossec/reports/anomaly_trends.png')
        plt.close()
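A small periodic job, for example driven by cron, can then emit the weekly report and refresh the trend chart. The sketch assumes analysts record true/false positive counts into anomaly_metrics.db during alert triage, since the monitor only reads what has been logged there:

# Sketch of a periodic reporting job; assumes anomaly_metrics.db is populated
# with triage outcomes (true/false positives) for each detector type.
import json

monitor = AnomalyPerformanceMonitor()

report = monitor.generate_performance_report()
print(json.dumps(report, indent=2))

monitor.plot_anomaly_trends()  # Writes /var/ossec/reports/anomaly_trends.png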
Best Practices
1. Baseline Establishment
- Allow 2-4 weeks for initial baseline creation
- Regularly update baselines to adapt to legitimate changes (see the refresh sketch after this list)
- Separate baselines for different user groups and time periods
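A baseline refresh can be scheduled as a simple job against the UBA database from Use Case 1; the sketch below is one way to do it under that assumption, rebuilding each user's normal working hours from recent activity only so that lapsed habits stop counting as "normal":

# Sketch of a scheduled baseline refresh, assuming the uba.db schema from
# Use Case 1: rebuild each user's normal_hours from the last 28 days only.
import json
import sqlite3
from datetime import datetime, timedelta

def refresh_login_baselines(db_path='/var/ossec/logs/uba.db', days=28):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cutoff = (datetime.now() - timedelta(days=days)).isoformat()

    cursor.execute(
        'SELECT user, timestamp FROM user_activities WHERE timestamp > ?',
        (cutoff,)
    )

    hours_by_user = {}
    for user, ts in cursor.fetchall():
        hours_by_user.setdefault(user, set()).add(datetime.fromisoformat(ts).hour)

    for user, hours in hours_by_user.items():
        cursor.execute(
            'UPDATE user_profiles SET normal_hours = ?, last_updated = ? WHERE user = ?',
            (json.dumps(sorted(hours)), datetime.now().isoformat(), user)
        )
    conn.commit()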
2. False Positive Reduction
# Whitelist management
import re

class AnomalyWhitelist:
    def __init__(self):
        self.whitelist = {
            'users': ['backup_user', 'monitoring_user'],
            'processes': ['backup.sh', 'health_check.py'],
            'ips': ['10.0.0.10', '10.0.0.11'],  # Monitoring servers
            'patterns': [
                r'^/var/log/.*\.log$',  # Log file access
                r'^/tmp/systemd-.*'     # System temporary files
            ]
        }

    def is_whitelisted(self, event_type, value):
        # Exact matches (users, processes, IPs)
        if value in self.whitelist.get(event_type, []):
            return True
        # File paths are matched against the regex patterns instead
        if event_type == 'patterns':
            return any(re.match(pattern, value) for pattern in self.whitelist['patterns'])
        return False
3. Tuning Recommendations
- Start with higher thresholds and gradually lower them
- Use feedback loops to improve detection accuracy
- Implement time-based and context-aware thresholds (a sketch follows this list)
- Regular review of anomaly patterns and adjustments
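One simple way to make thresholds context-aware is to adjust the standard-deviation cutoff used by the behavioral detector based on time of day and account role. The multipliers and role names below are illustrative starting points to tune, not recommended values:

# Illustrative context-aware threshold helper; multipliers and roles are
# assumptions to be tuned against your own environment.
from datetime import datetime

BASE_THRESHOLD = 2.5  # standard deviations, as in the behavioral detector

def context_threshold(user_role='user', timestamp=None):
    now = timestamp or datetime.now()
    threshold = BASE_THRESHOLD

    # Looser during weekday business hours, stricter at night and on weekends
    if 8 <= now.hour < 18 and now.weekday() < 5:
        threshold *= 1.2
    else:
        threshold *= 0.8

    # Privileged accounts warrant stricter scrutiny
    if user_role in ('admin', 'root'):
        threshold *= 0.8

    return threshold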
4. Integration Guidelines
- Test integrations in isolated environments first
- Implement gradual rollout with monitoring
- Maintain separate configurations for different environments
- Document all custom rules and modifications
Conclusion
Wazuh’s anomaly detection capabilities provide a powerful layer of security beyond traditional signature-based detection. The use cases in this guide cover:
- Behavioral Analysis: Detect deviations from normal user and system behavior
- Statistical Anomalies: Identify outliers in system metrics and patterns
- Machine Learning: Leverage AI for advanced threat detection
- Insider Threats: Monitor and score internal security risks
- Zero-Day Protection: Detect unknown threats through behavioral patterns
The key to successful anomaly detection is continuous tuning, regular baseline updates, and integration with existing security workflows. These implementations provide a foundation for building a comprehensive anomaly detection system tailored to your organization’s specific needs.