Controlling the style of dumped YAML using PyYAML

I had a list of Python dictionaries I wanted to output as YAML, but I wanted to control the style of the output.

Here’s the data:

items = [
    {
        "date": "2020-11-28",
        "body": "[Datasette 0.52](https://docs.datasette.io/en/stable/changelog.html#v0-52) - `--config` is now `--setting`, new `database_actions` plugin hook, `datasette publish cloudrun --apt-get-install` option and several bug fixes.",
    },
    {
        "date": "2020-10-31",
        "body": "[Datasette 0.51](https://docs.datasette.io/en/stable/changelog.html#v0-51) - A new visual design, plugin hooks for adding navigation options, better handling of binary data, URL building utility methods and better support for running Datasette behind a proxy. [Annotated release notes](https://simonwillison.net/2020/Nov/1/datasette-0-51/).",
    },
]

By default, the YAML output by import yaml; print(yaml.dump(items)) looks like this:

- body: '[Datasette 0.52](https://docs.datasette.io/en/stable/changelog.html#v0-52)
    - `--config` is now `--setting`, new `database_actions` plugin hook, `datasette
    publish cloudrun --apt-get-install` option and several bug fixes.'
  date: '2020-11-28'
- body: '[Datasette 0.51](https://docs.datasette.io/en/stable/changelog.html#v0-51)
    - A new visual design, plugin hooks for adding navigation options, better handling
    of binary data, URL building utility methods and better support for running Datasette
    behind a proxy. [Annotated release notes](https://simonwillison.net/2020/Nov/1/datasette-0-51/).'
  date: '2020-10-31'

I wanted the date key listed first, and I wanted the body key to use YAML's >- multi-line syntax rather than a single-quoted string.
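As a quick illustration of what that syntax means: > introduces a folded block scalar, where wrapped lines are joined with single spaces when the document is parsed, and the trailing - strips the final newline. The sketch below uses a made-up two-line document (not the data from this post) to show how PyYAML reads it:

```python
import yaml

# Hypothetical sample document, used only to illustrate the syntax.
# ">-" folds the wrapped lines into one string and strips the final newline.
strip_doc = """\
body: >-
  first line
  second line
"""

# Plain ">" folds the same way but keeps a single trailing newline.
keep_doc = """\
body: >
  first line
  second line
"""

parsed = yaml.safe_load(strip_doc)
parsed_keep = yaml.safe_load(keep_doc)

print(repr(parsed["body"]))       # 'first line second line'
print(repr(parsed_keep["body"]))  # 'first line second line\n'
```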

I ended up combining these two recipes from Stack Overflow. First I registered new representers with PyYAML:

import yaml
from collections import OrderedDict

# Marker subclass: strings wrapped in literal() get the block style below
class literal(str):
    pass

def literal_presenter(dumper, data):
    # style=">" requests a folded block scalar, which is emitted as ">-"
    # when the string has no trailing newline
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=">")


yaml.add_representer(literal, literal_presenter)

def represent_ordereddict(dumper, data):
    # Build the mapping node by hand so keys keep their insertion order
    value = []

    for item_key, item_value in data.items():
        node_key = dumper.represent_data(item_key)
        node_value = dumper.represent_data(item_value)

        value.append((node_key, node_value))

    return yaml.nodes.MappingNode("tag:yaml.org,2002:map", value)

yaml.add_representer(OrderedDict, represent_ordereddict)

Then I used the following Python code to output my YAML in the desired key order:

print(yaml.dump([OrderedDict([
    ("date", item["date"]),
    ("body", literal(item["body"]))
]) for item in items], width=100))

The result was:

- date: '2020-11-28'
  body: >-
    [Datasette 0.52](https://docs.datasette.io/en/stable/changelog.html#v0-52) - `--config` is now `--setting`,
    new `database_actions` plugin hook, `datasette publish cloudrun --apt-get-install` option and several
    bug fixes.
- date: '2020-10-31'
  body: >-
    [Datasette 0.51](https://docs.datasette.io/en/stable/changelog.html#v0-51) - A new visual design,
    plugin hooks for adding navigation options, better handling of binary data, URL building utility methods
    and better support for running Datasette behind a proxy. [Annotated release notes](https://simonwillison.net/2020/Nov/1/datasette-0-51/).
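If key order is the only concern, there may be a simpler route than the OrderedDict representer: PyYAML 5.1 and later accept a sort_keys argument on yaml.dump, and regular dictionaries keep insertion order on Python 3.7+. This is a sketch of that alternative rather than what I used above, and the sample item is shortened, hypothetical data:

```python
import yaml

# Hypothetical, shortened sample data (not the full items list from above).
sample_items = [{"date": "2020-11-28", "body": "Datasette 0.52 release notes"}]

# sort_keys=False (PyYAML 5.1+) keeps the dict's insertion order instead of
# sorting keys alphabetically, so "date" is written before "body".
output = yaml.dump(sample_items, sort_keys=False)
print(output)
```

The body value would still need a wrapper type such as literal to be written in folded style.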

Using > as the scalar style caused the width=100 argument to be respected. When I tried | as the style instead, the long lines were not wrapped.
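The difference between the two styles can be seen side by side in a small sketch. It assumes two hypothetical marker classes, folded and verbatim, that are not part of the code above, and uses a short made-up string with a narrow width so the effect is easy to see. A literal (|) scalar preserves line breaks exactly, so the emitter cannot insert new ones to satisfy width, while a folded (>) scalar can be re-wrapped at spaces:

```python
import yaml


class folded(str):
    """Hypothetical marker type: dump as a folded (>) block scalar."""


class verbatim(str):
    """Hypothetical marker type: dump as a literal (|) block scalar."""


def represent_folded(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=">")


def represent_verbatim(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(folded, represent_folded)
yaml.add_representer(verbatim, represent_verbatim)

# Made-up sample text, short enough to read but longer than width=20.
text = "one two three four five six seven eight nine ten eleven twelve"

# default_flow_style=False forces block-style mappings on older PyYAML too.
folded_output = yaml.dump({"body": folded(text)}, width=20, default_flow_style=False)
literal_output = yaml.dump({"body": verbatim(text)}, width=20, default_flow_style=False)

print(folded_output)   # wrapped across several lines under "body: >-"
print(literal_output)  # one long line under "body: |-"
```

Both outputs load back to the same string. Only the on-disk layout differs.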