Understanding Wazuh Data Analysis and Rule Engine

Wazuh’s ability to detect security incidents relies on its sophisticated data analysis capabilities and flexible rule engine. This article provides a technical deep dive into how Wazuh processes logs, applies decoders, and matches events against defined rules to generate meaningful security alerts.

1. Data Analysis in Wazuh

The data analysis process in Wazuh follows a structured flow from raw log collection to alert generation. The diagram below illustrates the complete process:

graph TD
    A[Log Collection]
    B[Preprocessing & Normalization]
    C[Decoder Selection]
    D[JSON Decoder]
    E[Dynamic Fields Decoder]
    F[Sibling Decoders]
    G[Custom Decoders]
    H[Normalized Log Data]
    I[Rule Matching Engine]
    J[Default Rules]
    K[Custom Rules]
    L[Classification Rules]
    M[Alert Generation]
    N[Output/Notification]

    A --> B
    B --> C
    C --> D
    C --> E
    C --> F
    C --> G
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I
    I --> J
    I --> K
    I --> L
    J --> M
    K --> M
    L --> M
    M --> N

Explanation:

Log Collection & Preprocessing: Logs from various sources are collected and preprocessed.
Decoder Selection: Depending on the log format, Wazuh selects the appropriate decoder—whether it’s a JSON decoder, one that handles dynamic fields, sibling decoders for related log events, or custom decoders defined by the user.
Normalized Data: Once decoded, logs become normalized and are passed to the rule matching engine.
Rule Matching: The engine evaluates the log data against different types of rules (default, custom, and classification).
Alert Generation: Matching rules trigger alerts which then flow to the output and notification subsystems.

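Wazuh selects a decoder by testing each decoder’s matching conditions, such as program_name and prematch, against the incoming event. The decoder that matches extracts named fields from the log, so the data is parsed and normalized before rule evaluation. Events that no decoder matches still reach the rule engine, but without extracted fields.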
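Decoder selection can be pictured as a search over candidate decoders, each guarded by a cheap precondition. The following is a minimal Python sketch of that idea, not Wazuh's implementation: the `Decoder` class, the two sample decoders, and their patterns are hypothetical, and real decoders are written in XML and applied by the Wazuh manager.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Decoder:
    """Hypothetical stand-in for a decoder definition."""
    name: str
    prematch: re.Pattern  # cheap test that decides whether this decoder applies
    regex: re.Pattern     # named groups become normalized fields


# Illustrative decoders only; names and patterns are assumptions.
DECODERS = [
    Decoder(
        name="sshd-failed",
        prematch=re.compile(r"^Failed password"),
        regex=re.compile(r"for (?:invalid user )?(?P<dstuser>\S+) from (?P<srcip>\S+)"),
    ),
    Decoder(
        name="sudo-command",
        prematch=re.compile(r"COMMAND="),
        regex=re.compile(r"^\s*(?P<srcuser>\S+) : .*COMMAND=(?P<command>.+)$"),
    ),
]


def decode(message: str) -> dict:
    """Return normalized fields for one log message."""
    stripped = message.strip()
    # In this sketch, JSON payloads are tried first: every key becomes a field.
    if stripped.startswith("{"):
        try:
            return {"decoder": "json", **json.loads(stripped)}
        except json.JSONDecodeError:
            pass  # not valid JSON, fall through to the pattern decoders
    # Otherwise the first decoder whose prematch succeeds is applied.
    for decoder in DECODERS:
        if decoder.prematch.search(message):
            found = decoder.regex.search(message)
            fields = found.groupdict() if found else {}
            return {"decoder": decoder.name, **fields}
    # No decoder applies: the event carries no extracted fields.
    return {"decoder": None}
```

Calling `decode("Failed password for root from 203.0.113.7 port 22 ssh2")` yields the `dstuser` and `srcip` fields that later rules can test. The prematch step keeps the more expensive field-extraction regex from running on logs it could never match.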

2. Rules Architecture

Rules in Wazuh follow a hierarchical structure and are defined using XML syntax. The following diagram illustrates the relationships between different rule types and their implementation:

graph TD
    A[Rule Definition]
    B[Default Rules]
    C[Custom Rules]
    D[Classification Rules]
    E[XML Syntax Structure]
    F[<rules> Element]
    G[<rule> Elements]
    H[Regex / PCRE2 Matching]
    I[Alert/Action Definition]

    A --> B
    A --> C
    A --> D
    A --> E
    E --> F
    F --> G
    G --> H
    B --> I
    C --> I
    D --> I

Explanation:

Rule Definition: Rules in Wazuh are defined using XML.
Types of Rules:
- Default Rules: Predefined by Wazuh for common events.
- Custom Rules: User-defined rules that extend or override defaults.
- Classification Rules: Group rules contextually to enhance event correlation.
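XML Structure: In Wazuh rule files, <rule> elements are grouped under a <group name="..."> element. There is no literal <rules> element; the diagram uses that label as shorthand for the enclosing container.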
Matching Logic: Each rule uses regex or PCRE2 patterns to match specific log events, ultimately triggering defined alerts or actions.

Rule definitions include attributes such as id and level (a severity from 0 to 16), a description, and criteria for matching log events. Because rules are plain XML, security teams can tune detection to their own requirements by adding child rules or overriding existing ones.
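To make the rule attributes concrete, here is a small Python sketch that reads a rule file and lists each rule's id, level, and description. The rule shown (id 100100) is hypothetical, and the sketch assumes a file with a single `<group>` root so that a standard XML parser can read it; it is an inspection aid, not how Wazuh itself loads rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file content, used only for illustration.
RULE_FILE = """
<group name="local,json,">
  <rule id="100100" level="3">
    <decoded_as>json</decoded_as>
    <field name="event">login</field>
    <description>Login event reported in a JSON log</description>
  </rule>
</group>
"""


def summarize_rules(xml_text: str) -> list:
    """Return one dict per <rule> with its id, level, description, and groups."""
    root = ET.fromstring(xml_text)
    # The group name is a comma-separated list, usually with a trailing comma.
    groups = [g for g in root.get("name", "").split(",") if g]
    summary = []
    for rule in root.iter("rule"):
        summary.append({
            "id": int(rule.get("id")),
            "level": int(rule.get("level")),
            "description": rule.findtext("description", default=""),
            "groups": groups,
        })
    return summary


for entry in summarize_rules(RULE_FILE):
    print(entry["id"], entry["level"], entry["description"])
```

A listing like this is a quick way to review custom rule ids and levels before loading them into the manager.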

3. Ruleset Configuration

The Wazuh ruleset encompasses both decoders and rules in a unified XML configuration framework. The following diagram shows how these components interact:

graph TD
    A[Ruleset XML Configuration]
    B[Decoders Section]
    C[Rules Section]
    D[JSON Decoders]
    E[Dynamic Fields]
    F[Sibling Decoders]
    G[Custom Decoders]
    H[Default Rules]
    I[Custom Rules]
    J[Classification Rules]
    K[XML Elements: <ruleset>, <decoders>, <rules>]
    L[Alert/Response Mechanism]

    A --> K
    K --> B
    K --> C
    B --> D
    B --> E
    B --> F
    B --> G
    C --> H
    C --> I
    C --> J
    H --> L
    I --> L
    J --> L

Explanation:

Ruleset Configuration: The Wazuh ruleset is a collection of XML files. Decoders and rules are kept in separate files, and the <ruleset> block of the manager configuration lists the directories they are loaded from.
Decoders Section:
- Contains various decoder types such as JSON decoders, dynamic fields, sibling decoders, and custom decoders.
Rules Section:
- Hosts the default, custom, and classification rules which determine how incoming log data is processed.
Integration:
- The combined XML elements create a cohesive ruleset that governs the alert/response mechanism of the system.

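The manager’s main configuration file, /var/ossec/etc/ossec.conf, contains a <ruleset> block that lists the directories from which decoders and rules are loaded. The default ruleset shipped with Wazuh lives under /var/ossec/ruleset and is overwritten on upgrade, so administrators should extend it by adding custom rules in /var/ossec/etc/rules and custom decoders in /var/ossec/etc/decoders rather than editing the defaults.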
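The directory layout can be checked by reading the `<ruleset>` block directly. The sketch below parses a trimmed, hand-written excerpt of ossec.conf and resolves the relative entries against the install root. The excerpt is an assumption about a typical configuration (a real file has more entries and may hold several `<ossec_config>` blocks), and `ruleset_dirs` is a hypothetical helper.

```python
import posixpath
import xml.etree.ElementTree as ET

# Trimmed excerpt for illustration; a real ossec.conf contains much more.
OSSEC_CONF_EXCERPT = """
<ossec_config>
  <ruleset>
    <decoder_dir>ruleset/decoders</decoder_dir>
    <rule_dir>ruleset/rules</rule_dir>
    <decoder_dir>etc/decoders</decoder_dir>
    <rule_dir>etc/rules</rule_dir>
  </ruleset>
</ossec_config>
"""


def ruleset_dirs(conf_xml: str, install_root: str = "/var/ossec") -> dict:
    """Return absolute decoder and rule directories in the order they are listed."""
    root = ET.fromstring(conf_xml)
    result = {"decoders": [], "rules": []}
    for entry in root.find("ruleset"):
        path = posixpath.join(install_root, entry.text.strip())
        if entry.tag == "decoder_dir":
            result["decoders"].append(path)
        elif entry.tag == "rule_dir":
            result["rules"].append(path)
    return result


print(ruleset_dirs(OSSEC_CONF_EXCERPT))
```

In this excerpt the default directories are listed before the custom ones, which matches the split between shipped and user-defined content described above.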

4. Rule Engine Execution Flow

The rule engine is the heart of Wazuh’s analysis capabilities. The following diagram provides a detailed view of the execution flow from raw log ingestion to alert generation:

graph TD
    A[Incoming Log Data]
    B[Decoder Matching Process]
    C{Determine Decoder Type}
    D[Apply JSON Decoder]
    E[Apply Dynamic Fields Decoder]
    F[Apply Sibling Decoder]
    G[Apply Custom Decoder]
    H[Generate Normalized Log Structure]
    I[Enter Rule Engine]
    J{Evaluate Against Rules}
    K[Match with Default Rules]
    L[Match with Custom Rules]
    M[Match with Classification Rules]
    N{Rule Matched?}
    O[Trigger Alert]
    P[Log Archive / No Action]
    Q[Post-Processing & Notification]

    A --> B
    B --> C
    C -- JSON --> D
    C -- Dynamic --> E
    C -- Sibling --> F
    C -- Custom --> G
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I
    I --> J
    J --> K
    J --> L
    J --> M
    K --> N
    L --> N
    M --> N
    N -- Yes --> O
    N -- No --> P
    O --> Q

Explanation:

Log Data Ingestion: Raw logs are received and sent to the decoder matching process.
Decoder Matching:
- A decision point determines which decoder to apply (JSON, dynamic, sibling, or custom) based on the log’s format and content.
Normalization:
- Decoders process the log and output a normalized structure.
Rule Evaluation:
- The normalized data enters the rule engine, where it is evaluated against various rules (default, custom, classification).
- The matching process may involve complex regex or PCRE2 patterns as defined in the XML configuration.
Alert Handling:
- If a rule match is found, an alert is triggered and then further processed for notifications; if no match is found, the log may be archived or ignored.

The rule engine’s ability to process high volumes of logs in real time depends on efficient decoder selection and rule evaluation. Wazuh keeps the cost down through hierarchical evaluation: a child rule that names a parent with if_sid is only tested after that parent has matched, so most rules are never evaluated for a given event.
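Hierarchical evaluation can be sketched as a depth-first walk over a rule tree in which children are tested only after their parent matches. This is a simplified Python model under stated assumptions: the `Rule` class, the rule ids (100200 to 100202), and the patterns are hypothetical, and the real engine supports many more conditions than a single regex.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical, simplified rule: one regex plus child rules."""
    rule_id: int
    level: int
    pattern: str
    children: List["Rule"] = field(default_factory=list)


def evaluate(rules: List[Rule], message: str) -> Optional[Rule]:
    """Return the deepest matching rule, or None if nothing matches."""
    for rule in rules:
        if re.search(rule.pattern, message):
            # Children are only considered once the parent has matched.
            deeper = evaluate(rule.children, message)
            return deeper if deeper is not None else rule
    return None


# Illustrative tree: a generic sshd rule, a failed-password child,
# and a grandchild that narrows the match to the root account.
RULE_TREE = [
    Rule(100200, 0, r"sshd", children=[
        Rule(100201, 5, r"Failed password", children=[
            Rule(100202, 7, r"for root from"),
        ]),
    ]),
]

hit = evaluate(RULE_TREE, "sshd[812]: Failed password for root from 203.0.113.7")
print(hit.rule_id if hit else "no match")
```

A log line that does not mention sshd is rejected after one regex test, without touching the two nested rules, which is the saving the hierarchy provides.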

Implementation Example: Creating Custom Rules

To illustrate how the rule engine works in practice, let’s look at a simple example of creating a custom rule to detect failed SSH login attempts:

<group name="local,syslog,sshd,">
  <!-- Child of stock rule 5716 (sshd: authentication failed) -->
  <rule id="100001" level="5">
    <if_sid>5716</if_sid>
    <match>^Failed password for root from</match>
    <description>Failed SSH login attempt for root user</description>
  </rule>

  <!-- Fires when rule 100001 matches 5 times within 120 seconds from one source IP -->
  <rule id="100002" level="10" frequency="5" timeframe="120">
    <if_matched_sid>100001</if_matched_sid>
    <same_source_ip />
    <description>Multiple failed SSH login attempts from same source (possible brute force)</description>
  </rule>
</group>

In this example:

Rule 100001 is triggered when a failed password attempt for the root user is detected
Rule 100002 correlates multiple occurrences (5 within 120 seconds) from the same source IP, indicating a potential brute force attack

When log data containing “Failed password for root from” is processed, the rule engine will match it against rule 100001, generating a level 5 alert. If this occurs 5 times within 2 minutes from the same IP, rule 100002 will trigger a higher severity (level 10) alert.
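The frequency and timeframe correlation in rule 100002 behaves like a sliding window kept per source IP. The Python sketch below models that idea only; the `FrequencyRule` class is hypothetical, and Wazuh's exact counting and reset behavior may differ from this simplification.

```python
from collections import defaultdict, deque


class FrequencyRule:
    """Hypothetical model of a frequency/timeframe rule keyed by source IP."""

    def __init__(self, frequency: int, timeframe: int):
        self.frequency = frequency   # matches required before firing
        self.timeframe = timeframe   # window length in seconds
        self._seen = defaultdict(deque)

    def observe(self, srcip: str, timestamp: float) -> bool:
        """Record one match of the parent rule; return True if the rule fires."""
        window = self._seen[srcip]
        window.append(timestamp)
        # Discard matches that fall outside the timeframe.
        while window and timestamp - window[0] > self.timeframe:
            window.popleft()
        if len(window) >= self.frequency:
            window.clear()  # start counting again after firing
            return True
        return False


rule = FrequencyRule(frequency=5, timeframe=120)
for second in (0, 10, 20, 30, 40):
    fired = rule.observe("203.0.113.7", second)
print("brute force suspected" if fired else "below threshold")
```

Five failures spread over several minutes never fill the window, and failures from different addresses are counted separately, which mirrors the role of same_source_ip in the XML rule.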

Conclusion

Wazuh’s data analysis and rule engine form the foundation of its security monitoring capabilities. The flexible decoder system normalizes logs from diverse sources, while the rule engine provides powerful pattern matching and correlation features that enable security teams to detect and respond to threats effectively.

Understanding the flow of data through the Wazuh analysis pipeline—from log collection to alert generation—helps administrators optimize their security monitoring setup and develop custom rules tailored to their specific security requirements.

For organizations seeking to enhance their security posture, mastering Wazuh’s rule engine and decoder system is essential for creating a robust, customized security monitoring solution that can adapt to evolving threats.