Day 27 - From Automated to Automatic - Event-Driven Infrastructure Management with Ansible#

Daniel Bodky

Overview#

Automation is a universal truth and recurring theme in the DevOps world. From provisioning infrastructure to testing code to deploying to production, many parts of the DevOps lifecycle are already automated. One popular technology for managing infrastructure and configuration in an automated way is Ansible, but are we fully utilizing its capabilities yet?

This presentation will give a broad overview of Ansible, its architecture, and its use cases before exploring a relatively new feature, Event-Driven Ansible (EDA). By analyzing applications of Event-Driven Ansible, participants will see that automated management is nice, but automatic management is awesome, not just regarding DevOps principles, but also in terms of reaction times, the human tendency for minor mistakes, and toil for operators.

Participants will get first-hand insights into Ansible, its strengths, weaknesses, and the potential of event-driven automation within the DevOps world.

NOTE
The content below is a copy of the lab repository’s README for convenience.

Event-Driven Ansible Lab#

This is a lab designed to demonstrate Ansible and how Event-Driven Ansible (EDA) builds on top of its capabilities.

The setup is done with Ansible, too. It will install Ansible, EDA, Prometheus, and Alertmanager on a VM to demonstrate some of the capabilities of EDA.

Prerequisites#

To follow along with this lab in its entirety, you will need three VMs:

NOTE
If you want to skip Ansible basics and go straight to EDA, you’ll need just the eda-controller.example.com VM and can skip the others.

VM name	OS
eda-controller.example.com	CentOS/Rocky 8.9
company.example.com	CentOS/Rocky 8.9
webshop.example.com	Ubuntu 22.04

You’ll need to be able to SSH to each of these VMs as root using SSH keys.

Lab Setup#

Clone the repository and create a Python virtual environment#

git clone https://github.com/mocdaniel/lab-event-driven-ansible.git
cd lab-event-driven-ansible
python3 -m venv .venv
source .venv/bin/activate

Install Ansible and other dependencies#

pip install -r requirements.txt

Create the inventory file#

Create the inventory file hosts.yml with the following content, replacing each <ip-address> with the address of the respective VM:

---
webservers:
  hosts:
    webshop.example.com:
      ansible_host: <ip-address>
      webserver: apache2
    company.example.com:
      ansible_host: <ip-address>
      webserver: httpd
eda_controller:
  hosts:
    eda-controller.example.com:
      ansible_host: <ip-address>
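Since the lab expects SSH access as root, the login user can be set once for all hosts instead of per host. The following is a minimal sketch of an optional addition to the same inventory file. `ansible_user` is a standard Ansible connection variable, but whether the lab needs it is an assumption: the repository may already configure the remote user elsewhere.

```yaml
# Optional, hypothetical addition to hosts.yml (not taken from the lab repository).
# Variables under all.vars apply to every host in every group.
all:
  vars:
    ansible_user: root
```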

Install Needed Roles and Collections#

ansible-galaxy install -r requirements.yml

Run the Setup Playbook#

After you have created the inventory file and filled in the IP addresses, you can run the setup playbook:

ansible-playbook playbooks/setup.yml

Due to a known bug with Python on macOS, you need to run export NO_PROXY="*" on macOS before running the playbook.

Demos#

Lab 1: Ansible Basics#

Ansible from the CLI via `ansible`#

The first example installs a webserver on all hosts in the webservers group. The package to install is defined per host by the webserver variable in the inventory file hosts.yml (see above).

ansible \
  webservers \
  -m package \
  -a 'name="{{ webserver }}"' \
  --one-line

Afterwards, we can start the webserver on all hosts in the webservers group.

ansible \
  webservers \
  -m service \
  -a 'name="{{ webserver }}" state=started' \
  --one-line

Now check whether the web servers are running on the respective hosts.

HINT
Most Ansible modules are idempotent: try running the commands again and see how the output differs.
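The two ad hoc commands above can also be expressed as a playbook, which is easier to keep under version control and to rerun. The following is a minimal sketch; the file is hypothetical and not part of the lab repository, and it relies only on the webserver host variable from the inventory.

```yaml
---
# Hypothetical playbook equivalent of the two ad hoc commands.
- name: Install and start the webserver
  hosts: webservers
  tasks:
    - name: Install the webserver package named in the inventory
      ansible.builtin.package:
        name: "{{ webserver }}"
        state: present

    - name: Start the webserver service
      ansible.builtin.service:
        name: "{{ webserver }}"
        state: started
```

Running such a playbook a second time should report the tasks as ok rather than changed, which is the same idempotent behavior the hint describes.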

Ansible from the CLI via `ansible-playbook`#

The second example uses the following playbook to gather and display information about all hosts in the webservers group by means of the example role from the lab repository.

---
- name: Example role
  hosts: webservers
  gather_facts: false
  vars:
    greeting: "Hello World!"
  pre_tasks:
    - name: Say Hello
      ansible.builtin.debug:
        msg: "{{ greeting }}"
  roles:
    - role: example
  post_tasks:
    - name: Say goodbye
      ansible.builtin.debug:
        msg: Goodbye!

ansible-playbook \
  playbooks/example.yml
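The example role itself lives in the lab repository and is not reproduced here. Purely to illustrate what runs between pre_tasks and post_tasks, a role that gathers and displays host information might contain tasks like the ones below. The file path is the conventional location for a role's tasks, and the task contents are assumptions rather than the repository's actual code.

```yaml
---
# roles/example/tasks/main.yml (hypothetical sketch, not the repository's file)
# The play sets gather_facts: false, so the role collects facts explicitly.
- name: Gather facts about the host
  ansible.builtin.setup:

- name: Show the distribution and version of each host
  ansible.builtin.debug:
    msg: >-
      {{ inventory_hostname }} runs
      {{ ansible_facts['distribution'] }}
      {{ ansible_facts['distribution_version'] }}
```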

Lab 2: Event-Driven Ansible#

Receive Generic Events via Webhook#

If you followed the setup instructions for the EDA lab, you should already have a running EDA instance on the eda-controller.example.com VM.

If you open /etc/edacontroller/rulebook.yml on the VM, you’ll see the following rulebook:

---
- name: Listen to webhook events
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Debug event output
      condition: 1 == 1
      action:
        debug:
          msg: "{{ event }}"

- name: Listen to Alertmanager alerts
  hosts: all
  sources:
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 9000
        data_alerts_path: alerts
        data_host_path: labels.instance
        data_path_separator: .
  rules:
    - name: Restart MySQL server
      condition: event.alert.labels.alertname == 'MySQL not running' and event.alert.status == 'firing'
      action:
        run_module:
          name: ansible.builtin.service
          module_args:
            name: mysql
            state: restarted
    - name: Debug event output
      condition: 1 == 1
      action:
        debug:
          msg: "{{ event }}"

For this part of the lab, the first ruleset is the one we’re interested in: it listens for generic webhook events on port 5000 and prints each event, including its payload and metadata, to the logs.

To test this, we can use the curl command to send a POST request to the webhook’s /endpoint path from the VM itself:

curl \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"foo": "bar"}' \
  http://localhost:5000/endpoint

If you now check the logs of the EDA controller, you should see output similar to the following:

journalctl -fu eda-controller

Jan 11 16:35:29 eda-controller ansible-rulebook[56882]: {'payload': {'foo': 'bar'}, 'meta': {'endpoint': 'endpoint',
'headers': {'Host': 'localhost:5000', 'User-Agent': 'curl/7.76.1', 'Accept': '*/*', 'Content-Length': '21',
'Content-Type': 'application/x-www-form-urlencoded'}, 'source': {'name': 'ansible.eda.webhook', 'type': 'ansible.eda.webhook'},
'received_at': '2024-01-11T15:35:29.798401Z', 'uuid': '6ebf8dd2-60a2-455a-9383-97b81f535366'}}

A rule that always evaluates to true is not very useful, so let’s change the ruleset: it should print the value of foo if the foo key is present in the event’s payload, and no foo :( otherwise:

---
- name: Listen to webhook events
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Foo
      condition: event.payload.foo is defined
      action:
        debug:
          msg: "{{ event.payload.foo }}"
    - name: No foo
      condition: 1 == 1
      action:
        debug:
          msg: "no foo :("

Send the same curl request again and check the logs. You should now see a line saying bar.

Let’s also try a curl request with a different payload:

curl \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"bar": "baz"}' \
  http://localhost:5000/endpoint

This time, the output should be no foo :(.
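Conditions are not limited to checking whether a key exists. As a minimal sketch, a rule can also compare the payload value and the endpoint recorded in the event’s metadata. The rule name, the compared values, and the message are illustrative assumptions and not part of the lab.

```yaml
---
- name: Listen to webhook events
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    # Hypothetical rule: matches only {"foo": "bar"} sent to /endpoint
    - name: Foo is bar
      condition: event.payload.foo == "bar" and event.meta.endpoint == "endpoint"
      action:
        debug:
          msg: "foo is bar"
```

With such a rule, the first curl request from this section would match, while the second one (with the bar key) would not.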

Restarting Services Automatically with EDA#

The last lab is more of a demo: it shows how you can use EDA to automatically react to events observed by Prometheus and Alertmanager.

For this demo, the second ruleset in our rulebook is the one we’re interested in:

- name: Listen to Alertmanager alerts
  hosts: all
  sources:
    - ansible.eda.alertmanager:
        host: 0.0.0.0
        port: 9000
        data_alerts_path: alerts
        data_host_path: labels.instance
        data_path_separator: .
  rules:
    - name: Restart MySQL server
      condition: event.alert.labels.alertname == 'MySQL not running' and event.alert.status == 'firing'
      action:
        run_playbook:
          name: ./playbook.yml
    - name: Debug event output
      condition: 1 == 1
      action:
        debug:
          msg: "{{ event }}"
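The run_playbook action above points at ./playbook.yml, which is not reproduced in this README. A minimal sketch of what such a playbook might look like follows, assuming the service is named mysql as in the systemctl commands used in this demo. The lab's actual file may differ.

```yaml
---
# Hypothetical playbook.yml; the lab's actual file may differ.
- name: Restart MySQL
  hosts: all
  gather_facts: false
  tasks:
    - name: Restart the MySQL service
      ansible.builtin.service:
        name: mysql
        state: restarted
```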

With this rule, we can restart our MySQL server if it’s not running! But how do we get the event to trigger? With Prometheus and Alertmanager!

When you ran the setup playbook, it installed Prometheus and Alertmanager on the eda-controller.example.com VM. You can access the Prometheus UI at http://<eda-controller-ip>:9090 and the Alertmanager UI at http://<eda-controller-ip>:9093.

It also installed a Prometheus exporter for the MySQL database that runs on the server.
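The alert that the rulebook matches on has to be defined as a Prometheus alerting rule. The lab's rule file is not shown here, so the following is only a sketch under two assumptions: the exporter is the Prometheus mysqld_exporter, which exposes the mysql_up metric, and the alert name is chosen to match the alertname in the rulebook's condition. The for duration, severity label, and summary text are likewise illustrative.

```yaml
# Hypothetical Prometheus alerting rule file (not taken from the lab repository).
groups:
  - name: mysql
    rules:
      - alert: MySQL not running
        # mysql_up is 0 when the exporter cannot reach the database
        expr: mysql_up == 0
        # Assumed waiting period before the alert starts firing
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "MySQL is down on {{ $labels.instance }}"
```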

With this setup, we can now shut down our MySQL server and see what happens. Make sure to watch the output of the EDA controller’s logs:

systemctl stop mysql
journalctl -fu edacontroller

Within 30-90 seconds, you should see EDA running our playbook and restarting the MySQL server. You can track that process by watching the Prometheus/Alertmanager UIs for firing alerts.
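For a firing alert to reach EDA, Alertmanager has to forward it to the port the ansible.eda.alertmanager source listens on (9000 in the rulebook). A minimal sketch of such a webhook receiver follows. The receiver name and the /alerts path are assumptions, and the lab's actual Alertmanager configuration may differ.

```yaml
# Hypothetical alertmanager.yml fragment (not taken from the lab repository).
route:
  # Send every alert to the EDA receiver
  receiver: eda
receivers:
  - name: eda
    webhook_configs:
      # Port 9000 matches the ansible.eda.alertmanager source in the rulebook
      - url: http://localhost:9000/alerts
        send_resolved: true
```

The delay before EDA reacts typically comes from the Prometheus scrape interval, the alerting rule's waiting period, and Alertmanager's grouping delay added together.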

Once you see the playbook being executed in the logs, you can check the MySQL state once more:

systemctl status mysql

MySQL should be up and running again!
