Mocking a Textract LimitExceededException with boto

For s3-ocr issue #21 I needed to write a test that simulates what happens when Amazon Textract returns a “LimitExceededException”. When using boto this error presents itself as an exception:

botocore.errorfactory.LimitExceededException: An error occurred (LimitExceededException) when calling the StartDocumentTextDetection operation: Open jobs exceed maximum concurrent job limit

I use moto to simulate AWS in that test suite, but moto does not yet have a mechanism for simulating Textract errors like this one.

I ended up turning to Python mocks, here provided by the mocker fixture from pytest-mock. Here’s the test I came up with:

# Imports assumed from the surrounding test module
import boto3
from click.testing import CliRunner
from s3_ocr.cli import cli

def test_limit_exceeded_automatic_retry(s3, mocker):
    mocked = mocker.patch("s3_ocr.cli.start_document_text_extraction")
    # It's going to fail the first time, then succeed
    mocked.side_effect = [
        boto3.client("textract").exceptions.LimitExceededException(
            error_response={},
            operation_name="StartDocumentTextExtraction",
        ),
        {"JobId": "123"},
    ]
    runner = CliRunner()
    result = runner.invoke(cli, ["start", "my-bucket", "--all"])
    assert result.exit_code == 0
    assert result.output == (
        "Found 0 files with .s3-ocr.json out of 1 PDFs\n"
        "An error occurred (Unknown) when calling the StartDocumentTextExtraction operation: Unknown - retrying...\n"
        "Starting OCR for blah.pdf, Job ID: 123\n"
    )

Here I’m patching the function identified by the string "s3_ocr.cli.start_document_text_extraction". This is a new function that I wrote specifically to make this mock easier to apply - it lives in s3_ocr/cli.py and looks like this:

def start_document_text_extraction(textract, **kwargs):
    # Wrapper function to make this easier to mock in tests
    return textract.start_document_text_detection(**kwargs)

The most confusing thing about working with Python mocks is figuring out the string to use to mock the right piece of code. I like this pattern of refactoring the code under test to make it as simple to mock as possible.

The code I am testing here implements automatic retries. As such, I needed the API method I am simulating to fail the first time and then succeed the second time.
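
To make the shape of that behaviour concrete, here is a minimal sketch of a retry loop. It is not the actual s3-ocr implementation: call_with_retry, LimitExceeded and FlakyStart are hypothetical names invented for this illustration, and the real code catches the Textract exception rather than a stand-in.

```python
import time


class LimitExceeded(Exception):
    """Hypothetical stand-in for Textract's LimitExceededException."""


def call_with_retry(func, retries=3, delay=0.0, log=print):
    """Call func(), retrying on LimitExceeded up to `retries` extra times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except LimitExceeded as ex:
            if attempt == retries:
                raise  # out of retries, let the caller see the error
            log(f"{ex} - retrying...")
            time.sleep(delay)


class FlakyStart:
    """Fails on the first call, then returns a job dictionary."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            raise LimitExceeded("Open jobs exceed maximum concurrent job limit")
        return {"JobId": "123"}


messages = []
flaky = FlakyStart()
result = call_with_retry(flaky, log=messages.append)
```

A test for code like this needs the underlying call to fail once and then succeed, which is exactly what FlakyStart does by hand here.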

Originally I did this with a side_effect() function (see below), but then @szotten on Twitter pointed out that you can instead set mock.side_effect to a list, and each call to the mock will consume the next item in turn:

mocked.side_effect = [
    boto3.client("textract").exceptions.LimitExceededException(
        error_response={},
        operation_name="StartDocumentTextExtraction",
    ),
    {"JobId": "123"},
]

Any exception objects in that list will be raised by the mocked function; any other kind of object will be returned.
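
The same behaviour can be seen without AWS in the picture at all. This is a small sketch using only unittest.mock from the standard library; FakeLimitExceeded is a hypothetical stand-in for the Textract exception. It also shows that the list is consumed rather than repeated: a call made after the last item has been used raises StopIteration.

```python
from unittest.mock import Mock


class FakeLimitExceeded(Exception):
    """Hypothetical stand-in for the Textract exception."""


mocked = Mock()
# First call raises the exception, second call returns the dictionary
mocked.side_effect = [FakeLimitExceeded("too many jobs"), {"JobId": "123"}]

outcomes = []
for _ in range(3):
    try:
        outcomes.append(mocked())
    except FakeLimitExceeded as ex:
        outcomes.append(f"raised: {ex}")
    except StopIteration:
        # The list has been used up, so the mock has nothing left to give
        outcomes.append("exhausted")
```

After this runs, outcomes holds one entry per call, in order: the raised exception, the returned dictionary, and the exhaustion marker.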

The hardest thing to figure out was how to simulate the exception. The original error message indicated botocore.errorfactory.LimitExceededException but that’s not actually a class you can import and raise.

Instead, I used boto3.client("textract").exceptions.LimitExceededException.

Figuring out that it needed an error_response and operation_name was tricky too. I eventually tracked down the botocore ClientError constructor, which showed me what I needed to provide:

class ClientError(Exception):
    MSG_TEMPLATE = (
        'An error occurred ({error_code}) when calling the {operation_name} '
        'operation{retry_info}: {error_message}'
    )

    def __init__(self, error_response, operation_name):
        retry_info = self._get_retry_info(error_response)
        error = error_response.get('Error', {})
        msg = self.MSG_TEMPLATE.format(
            error_code=error.get('Code', 'Unknown'),
            error_message=error.get('Message', 'Unknown'),
            operation_name=operation_name,
            retry_info=retry_info,
        )
        super().__init__(msg)
        self.response = error_response
        self.operation_name = operation_name
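
That constructor also explains why the test output above says "(Unknown)": an empty error_response has no Error key, so both the code and the message fall back to their defaults. Here is a rough sketch that shows the effect without needing botocore installed. SketchClientError is a hypothetical stand-in that copies the logic shown above, with retry_info simplified to an empty string.

```python
class SketchClientError(Exception):
    """Hypothetical stand-in mirroring the botocore constructor shown above."""

    MSG_TEMPLATE = (
        "An error occurred ({error_code}) when calling the {operation_name} "
        "operation{retry_info}: {error_message}"
    )

    def __init__(self, error_response, operation_name):
        error = error_response.get("Error", {})
        msg = self.MSG_TEMPLATE.format(
            error_code=error.get("Code", "Unknown"),
            error_message=error.get("Message", "Unknown"),
            operation_name=operation_name,
            retry_info="",  # simplification: the real class computes this
        )
        super().__init__(msg)
        self.response = error_response
        self.operation_name = operation_name


# An empty error_response, as used in the test
empty = SketchClientError({}, "StartDocumentTextExtraction")

# A populated error_response, shaped like the real Textract error
detailed = SketchClientError(
    {
        "Error": {
            "Code": "LimitExceededException",
            "Message": "Open jobs exceed maximum concurrent job limit",
        }
    },
    "StartDocumentTextDetection",
)
```

Passing a populated Error dictionary like the second one to the real exception class should produce a message closer to what Textract actually returns, if the test needs to assert on it.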

Using a side effect function

Prior to the tip about setting .side_effect to a list I used a side effect function instead, with a nonlocal variable to change its behaviour the second time it was called.

# This lives inside the test function, so nonlocal can rebind should_fail
should_fail = True

def side_effect(*args, **kwargs):
    nonlocal should_fail
    if should_fail:
        # First call: flip the flag and simulate the Textract error
        should_fail = False
        raise boto3.client("textract").exceptions.LimitExceededException(
            error_response={},
            operation_name="StartDocumentTextExtraction",
        )
    else:
        return {"JobId": "123"}

mocked.side_effect = side_effect