Mocking a Textract LimitExceededException with boto


For s3-ocr issue #21 I needed to write a test that simulates what happens when Amazon Textract returns a “LimitExceededException”. When using boto3 this error is raised as an exception:

botocore.errorfactory.LimitExceededException: An error occurred (LimitExceededException) when calling the StartDocumentTextDetection operation: Open jobs exceed maximum concurrent job limit

I use moto to simulate AWS in that test suite, but moto does not yet have a mechanism for simulating Textract errors like this one.

I ended up turning to Python mocks, here provided by the mocker fixture from pytest-mock. Here’s the test I came up with:

import boto3
from click.testing import CliRunner
from s3_ocr.cli import cli


def test_limit_exceeded_automatic_retry(s3, mocker):
    mocked = mocker.patch("s3_ocr.cli.start_document_text_extraction")
    # It's going to fail the first time, then succeed
    mocked.side_effect = [
        boto3.client("textract").exceptions.LimitExceededException(
            error_response={},
            operation_name="StartDocumentTextExtraction",
        ),
        {"JobId": "123"},
    ]
    runner = CliRunner()
    result = runner.invoke(cli, ["start", "my-bucket", "--all"])
    assert result.exit_code == 0
    assert result.output == (
        "Found 0 files with .s3-ocr.json out of 1 PDFs\n"
        "An error occurred (Unknown) when calling the StartDocumentTextExtraction operation: Unknown - retrying...\n"
        "Starting OCR for blah.pdf, Job ID: 123\n"
    )

Here I’m patching the function identified by the string "s3_ocr.cli.start_document_text_extraction". This is a new function that I wrote specifically to make this mock easier to apply - it lives in s3_ocr/cli.py and looks like this:

def start_document_text_extraction(textract, **kwargs):
    # Wrapper function to make this easier to mock in tests
    return textract.start_document_text_detection(**kwargs)

The most confusing thing about working with Python mocks is figuring out the string to use to mock the right piece of code. I like this pattern of refactoring the code under test to make it as simple to mock as possible.
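
As a rough illustration of how the patch string maps onto code, here is a minimal sketch using only the standard library. The module name fake_cli and the functions start_job and run are hypothetical stand-ins (they are not part of s3-ocr); the module is built in memory so the example is self-contained.

```python
import sys
import types
from unittest import mock

# Hypothetical in-memory module standing in for something like s3_ocr/cli.py
fake_cli = types.ModuleType("fake_cli")
sys.modules["fake_cli"] = fake_cli


def start_job(client, **kwargs):
    # Thin wrapper: the only place that touches the client directly
    return client.start_document_text_detection(**kwargs)


def run(client):
    # Looks start_job up on the module at call time, so a patch is picked up
    return fake_cli.start_job(client, Bucket="my-bucket")


fake_cli.start_job = start_job
fake_cli.run = run

# The patch string is "<module path>.<attribute name>"
with mock.patch("fake_cli.start_job") as mocked:
    mocked.return_value = {"JobId": "123"}
    patched_result = fake_cli.run(client=None)
    mocked.assert_called_once_with(None, Bucket="my-bucket")

# Leaving the with-block restores the original function
restored = fake_cli.start_job is start_job
```

Because the wrapper is a module-level name, the string to patch is simply the module path plus the function name, with no need to reach inside a client object.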

The code I am testing here implements automatic retries. As such, I needed the API method I am simulating to fail the first time and then succeed the second time.
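
The retry behaviour being tested can be sketched roughly as follows. This is not the s3-ocr implementation: LimitExceeded, call_with_retry and its parameters are hypothetical names chosen for illustration, and the real code may count attempts or wait between them differently.

```python
import time
from unittest import mock


class LimitExceeded(Exception):
    """Hypothetical stand-in for Textract's LimitExceededException."""


def call_with_retry(func, retries=3, delay=0.0, sleep=time.sleep, **kwargs):
    """Call func(**kwargs), retrying up to `retries` times on LimitExceeded."""
    for attempt in range(retries + 1):
        try:
            return func(**kwargs)
        except LimitExceeded as ex:
            if attempt == retries:
                # Out of retries: let the error propagate
                raise
            print(f"{ex} - retrying...")
            sleep(delay)


# Fails on the first call, succeeds on the second
flaky = mock.Mock(
    side_effect=[LimitExceeded("Open jobs exceed limit"), {"JobId": "123"}]
)
result = call_with_retry(flaky, Document="blah.pdf")
```

Passing sleep in as a parameter is one way to keep a test like this fast; a test could hand in a no-op instead of time.sleep.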

Originally I had done this with a side_effect() function - see below - but then @szotten on Twitter pointed out that you can instead set mock.side_effect to a list and it will work through those items in turn, one per call:

mocked.side_effect = [
    boto3.client("textract").exceptions.LimitExceededException(
        error_response={},
        operation_name="StartDocumentTextExtraction",
    ),
    {"JobId": "123"},
]

Any exception objects in that list will be raised by the mocked function; any other kind of object will be returned.
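
A small standard-library sketch of that behaviour, using ValueError as a stand-in for the Textract exception. It also shows what happens once the list runs out: the mock raises StopIteration rather than starting again from the first item.

```python
from unittest import mock

fake = mock.Mock(
    side_effect=[ValueError("first call fails"), {"JobId": "123"}]
)

outcomes = []
for _ in range(3):
    try:
        # Each call consumes the next item in the side_effect list
        outcomes.append(fake())
    except ValueError as ex:
        outcomes.append(f"raised ValueError: {ex}")
    except StopIteration:
        outcomes.append("raised StopIteration: list exhausted")
```

So the list needs one entry for every call the code under test is expected to make.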

The hardest thing to figure out was how to simulate the exception. The original error message indicated botocore.errorfactory.LimitExceededException but that’s not actually a class you can import and raise.

Instead, I used boto3.client("textract").exceptions.LimitExceededException.

Figuring out that it needed an error_response and operation_name was tricky too. I eventually tracked down the botocore ClientError constructor, which showed me what I needed to provide:

class ClientError(Exception):
    MSG_TEMPLATE = (
        'An error occurred ({error_code}) when calling the {operation_name} '
        'operation{retry_info}: {error_message}'
    )

    def __init__(self, error_response, operation_name):
        retry_info = self._get_retry_info(error_response)
        error = error_response.get('Error', {})
        msg = self.MSG_TEMPLATE.format(
            error_code=error.get('Code', 'Unknown'),
            error_message=error.get('Message', 'Unknown'),
            operation_name=operation_name,
            retry_info=retry_info,
        )
        super().__init__(msg)
        self.response = error_response
        self.operation_name = operation_name
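
That constructor also explains why the test output above says "(Unknown)": an empty error_response falls back to the defaults. The sketch below copies only the message-building logic from the quoted constructor into a hypothetical helper, build_message, so it runs without botocore. It assumes retry_info is an empty string, which is not something the quoted snippet shows.

```python
# Copied from the constructor quoted above; not botocore itself
MSG_TEMPLATE = (
    "An error occurred ({error_code}) when calling the {operation_name} "
    "operation{retry_info}: {error_message}"
)


def build_message(error_response, operation_name):
    # Hypothetical helper mirroring how ClientError formats its message
    error = error_response.get("Error", {})
    return MSG_TEMPLATE.format(
        error_code=error.get("Code", "Unknown"),
        error_message=error.get("Message", "Unknown"),
        operation_name=operation_name,
        retry_info="",  # assumption: no retry metadata in the response
    )


# An empty error_response produces the "Unknown" placeholders
empty_message = build_message({}, "StartDocumentTextExtraction")

# Supplying Error.Code and Error.Message gives a more realistic message
realistic_message = build_message(
    {
        "Error": {
            "Code": "LimitExceededException",
            "Message": "Open jobs exceed maximum concurrent job limit",
        }
    },
    "StartDocumentTextDetection",
)
```

Passing a populated error_response to the real exception class in a test should, by the same logic, make the mocked error read much more like the one Textract actually returns.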

Using a side effect function

Prior to the tip about setting .side_effect to a list I used a side effect function instead, with a nonlocal variable to change its behaviour the second time it was called:

    # Inside the test function, after the mocker.patch() call;
    # nonlocal only works within an enclosing function
    should_fail = True

    def side_effect(*args, **kwargs):
        nonlocal should_fail
        if should_fail:
            should_fail = False
            raise boto3.client("textract").exceptions.LimitExceededException(
                error_response={},
                operation_name="StartDocumentTextExtraction",
            )
        else:
            return {"JobId": "123"}

    mocked.side_effect = side_effect

https://mranv.pages.dev/posts/mocking-a-textract-limitexceededexception-with-boto/
Author: Anubhav Gain
Published at: 2024-05-18
License: CC BY-NC-SA 4.0