
Refactor: Request Parsing/Response Serialization #9073

@bpandola

Summary

Over the coming weeks and months, Moto's request parsing/response serialization pipeline will be unified and normalized across all services.

When complete, Moto will have a robust set of core methods for parsing requests and serializing responses applied consistently throughout the project. Backends will be largely free of serialization/parsing concerns, allowing their code to be more focused on an individual service's implementation details. Whole categories of defects will have been eliminated and various patterns within the codebase will have been greatly simplified, ultimately making the project more stable and easier to maintain and contribute to.

Important

For those who only interact with Moto via the AWS API boundary (using a client like Boto3, or other SDKs), this work should go virtually unnoticed.

Warning

If you are poking directly at Moto's internals, be aware that this work will include a fair bit of restructuring/renaming of existing modules/methods/classes/attributes etc.

Individual service backends will be slowly integrated into the new request/response pipeline, starting with the services that will benefit the most (e.g. Query protocol-based services). Additional features will be added as needs arise; defects will be addressed as they are encountered.

All related Issues and Pull Requests will be labeled to provide an at-a-glance view of the work in progress: here

Details

Intent

Leverage the AWS service definitions in Botocore to create generalized request parsers and response serializers for Moto.

Motivation

From its inception, the core Moto logic delegated some request parsing and most response serialization duties to each individual backend service implementation. With so many mocked services implemented incrementally over the years by numerous contributors, a lot of cruft has built up around request parsing/response serialization.

The base request parsing code grew haphazardly over time, as it attempted to handle the nuances of each AWS wire protocol. Service-specific parameter parsing code was more ad hoc, resulting in numerous specialized methods peppered throughout the various backends, often duplicating similar methods in other backends.

Serialization is currently the responsibility of each individual service, with some dumping JSON, some returning XML data, and some doing both. Backend models are often polluted with serialization concerns based on the expected response format (e.g. timestamps stored as strings, because JSON doesn't have a built-in date type; or boolean values stored as strings to match the XML representation), which unnecessarily complicates the service logic and is prone to error (e.g. what does the conditional if obj.boolean_value: evaluate to when boolean_value is the string "false"?).
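To make that last pitfall concrete: Python treats any non-empty string as truthy, so a boolean stored as the XML-style string "false" still passes a plain truthiness check. The Queue class and is_fifo() helper below are hypothetical, used only to illustrate the problem.

```python
class Queue:
    """Hypothetical backend model; fifo_queue may hold a str or a bool."""

    def __init__(self, fifo_queue):
        self.fifo_queue = fifo_queue


def is_fifo(queue):
    # Plain truthiness check, as backend logic would naturally write it.
    return bool(queue.fifo_queue)


# A non-empty string is always truthy, so the XML-style "false" reads as True.
string_backed = Queue(fifo_queue="false")
# A native bool behaves as intended.
bool_backed = Queue(fifo_queue=False)

print(is_fifo(string_backed))  # True (wrong)
print(is_fifo(bool_backed))    # False (correct)
```

Storing the native bool and leaving the "true"/"false" rendering to the serializer removes this class of bug entirely.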

Because higher-level abstractions don't currently exist for request parsing or response serialization, novel approaches taken for a particular service backend don't propagate. Helpers like _get_bool_param() or even the AWSServiceSpec class haven't been applied consistently, despite their usefulness.

Generalized request parsers and response serializers based on the AWS service definitions solve these problems. They allow service backends to receive typed parameters without the need for any specialized code. Action methods can simply return Python objects with native data types to be converted into properly formatted responses further up the call stack. A bug fix for, say, boolean attributes serialized via XML will automatically propagate to all boolean attributes across all services that utilize XML.
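As a rough sketch of the idea (not Moto's actual parser), the snippet below coerces flat Query-protocol parameters, which arrive as strings, into native Python types using a simplified, hand-written stand-in for a Botocore input shape. The EXAMPLE_INPUT_SHAPE dictionary, its member names, and the parse_query_params() helper are hypothetical; a real parser would walk the Botocore service model, including nested structures and lists.

```python
from datetime import datetime, timezone

# Hypothetical, simplified stand-in for a Botocore operation input shape:
# member name -> Botocore primitive type name.
EXAMPLE_INPUT_SHAPE = {
    "QueueName": "string",
    "MaxResults": "integer",
    "DryRun": "boolean",
    "StartTime": "timestamp",
}


def _parse_timestamp(value):
    # Python < 3.11 fromisoformat() rejects a trailing "Z", so normalize it first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


# One coercion function per primitive type, shared by every operation.
COERCERS = {
    "string": str,
    "integer": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda value: value.lower() == "true",
    "timestamp": _parse_timestamp,
}


def parse_query_params(raw_params, input_shape):
    """Coerce raw string parameters into native Python types per the shape."""
    parsed = {}
    for name, raw_value in raw_params.items():
        type_name = input_shape.get(name)
        if type_name is None:
            continue  # Unknown parameters are ignored in this sketch.
        parsed[name] = COERCERS[type_name](raw_value)
    return parsed


params = parse_query_params(
    {
        "QueueName": "my-queue",
        "MaxResults": "10",
        "DryRun": "false",
        "StartTime": "2022-09-27T18:00:00.000Z",
    },
    EXAMPLE_INPUT_SHAPE,
)
```

Because the coercion table is keyed by type rather than by service, fixing the "boolean" entry once fixes every boolean parameter that flows through it, which is the propagation effect described above.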

The resulting code will be more robust, more concise, and more maintainable because it will all be in one place instead of spread across every backend module.

Consequences

  1. Action methods will have access to typed input parameters. No more calling _get_bool_param() or _get_int_param(). All input parameters will be automatically coerced to the appropriate Python data type based on the AWS service definitions.
  2. Action methods can return simple objects. Dumping JSON, building XML trees, content negotiation, and even HTTP status codes for exceptions become serialization concerns handled further up the call stack.
  3. Backend models won't need formatted attributes. The AWS service definitions contain type info as well as formatting instructions (e.g. for timestamps), so models can safely use native Python data types, e.g. bool and datetime instead of the strings "true" and "2022-09-27T18:00:00.000".
  4. Backend models won't need formatting methods. No more to_dict(), to_json(), or to_xml() methods. As long as the backend models contain attributes that match the AWS service definitions, they will be properly serialized.
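A minimal sketch of consequences 3 and 4, assuming a hypothetical output shape and a hypothetical serialize() helper (neither is Moto's real API): the values stay native Python types, and the protocol-specific formatting is decided only at serialization time. The timestamp formats follow Botocore's defaults for these protocols (ISO 8601 for Query/XML, epoch seconds for JSON), and the bare "Result" root element is a simplification of a real Query response envelope.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical output shape: member name -> Botocore primitive type name.
EXAMPLE_OUTPUT_SHAPE = {
    "QueueName": "string",
    "FifoQueue": "boolean",
    "CreatedTimestamp": "timestamp",
}


def _xml_text(value, type_name):
    """Render a native value as element text for an XML response."""
    if type_name == "boolean":
        return "true" if value else "false"
    if type_name == "timestamp":
        utc = value.astimezone(timezone.utc)
        return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return str(value)


def _json_value(value, type_name):
    """Render a native value for a JSON response."""
    if type_name == "timestamp":
        return value.timestamp()  # Epoch seconds.
    return value  # bool, str, int map directly onto JSON types.


def serialize(values, shape, protocol):
    if protocol == "query":
        root = ET.Element("Result")
        for name, type_name in shape.items():
            ET.SubElement(root, name).text = _xml_text(values[name], type_name)
        return ET.tostring(root, encoding="unicode")
    if protocol == "json":
        return json.dumps(
            {name: _json_value(values[name], type_name) for name, type_name in shape.items()}
        )
    raise ValueError(f"unsupported protocol: {protocol}")


# The backend keeps native types; no "true" strings or preformatted dates.
NATIVE_VALUES = {
    "QueueName": "my-queue",
    "FifoQueue": False,
    "CreatedTimestamp": datetime(2022, 9, 27, 18, 0, tzinfo=timezone.utc),
}

xml_body = serialize(NATIVE_VALUES, EXAMPLE_OUTPUT_SHAPE, "query")
json_body = serialize(NATIVE_VALUES, EXAMPLE_OUTPUT_SHAPE, "json")
```

The same model values produce both wire formats, so the model needs no to_xml() or to_json() methods of its own.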

These positive consequences will come at the cost of slightly more difficult troubleshooting when things aren't working as expected. The removal of so much existing code, particularly the XML templates for the Query protocol-based services, will make things more abstract and less explicit. For example, if an expected attribute isn't showing up in a response, it might be more work to trace through the generalized serializer code than it would have been to just go right to the XML template.

Implementation

A core architectural component of Moto is being refactored, so the implementation takes into consideration the need for a measured rollout and the need to coexist with existing code.

  1. Single input/output for parsing/serialization.

    a. The request parsers will take in an HTTP request object and return a typed input parameter object for the specific API operation.
    b. The response serializers will take in any object (including exceptions) and return an HTTP response (status code, headers, body), formatted based on the service definition and appropriate AWS protocol.

  2. Explicit hooks for the new parsers/serializers.

    a. Access to the new request parsing will be "opt-in" based on explicitly setting an attribute on a service backend's Response class.

    b. The following new base classes will facilitate integration of existing services into the new response serialization flow:

    ActionResult - a thin wrapper over any object, dict, etc. returned from a responses.py method.
    ServiceException - a base class mapping Moto service exceptions to Botocore error models.

    Code hooks in the BaseResponse class will pass instances of these classes directly to the appropriate response serializer. Any ActionResult returned or any raised exception derived from ServiceException will trigger the new serialization flow, allowing for gradual integration one backend service (or even one method or exception) at a time, without interfering with the existing flow.

  3. Convention over configuration. In the simplest case, where backend classes closely align with the Botocore model, everything should just magically work.

  4. Extensive customization points. The responses.py files will be the request/response boundary. To the extent that some backends will want to transform AWS model input into something easier to work with or need to transform a backend model into something more fit for serialization, that will be supported via configuration encapsulated in a service's Response class.
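The opt-in hook described in item 2 can be sketched roughly as follows. The class names ActionResult and ServiceException come from this issue, but their bodies here, along with dispatch(), serialize_response(), and the example actions, are hypothetical simplifications rather than Moto's actual BaseResponse code. The point is the routing: a wrapped return value or a ServiceException goes to the new serializer, while anything else passes through the legacy flow untouched.

```python
class ActionResult:
    """Hypothetical sketch: thin wrapper marking a value for the new serializer."""

    def __init__(self, result):
        self.result = result


class ServiceException(Exception):
    """Hypothetical sketch: base class carrying the error code and HTTP status."""

    code = "InternalFailure"
    status_code = 500

    def __init__(self, message):
        super().__init__(message)
        self.message = message


class ResourceNotFoundException(ServiceException):
    # Hypothetical service error; a real one would mirror a Botocore error shape.
    code = "ResourceNotFoundException"
    status_code = 400


def serialize_response(obj):
    """Stand-in for a protocol-specific serializer returning (status, headers, body)."""
    if isinstance(obj, ServiceException):
        return obj.status_code, {}, {"Error": {"Code": obj.code, "Message": obj.message}}
    return 200, {}, obj.result


def dispatch(action):
    """Route new-style results and errors to the serializer; pass legacy ones through."""
    try:
        result = action()
    except ServiceException as exc:
        return serialize_response(exc)
    # Any other exception propagates to the existing error handling.
    if isinstance(result, ActionResult):
        return serialize_response(result)
    return result  # Legacy flow: the method built its own response.


def describe_widget():
    return ActionResult({"WidgetName": "example"})


def describe_missing_widget():
    raise ResourceNotFoundException("Widget not found")


def legacy_describe_widget():
    return 200, {}, "<DescribeWidgetResponse>...</DescribeWidgetResponse>"
```

Because the check is per return value and per exception type, a single method or a single exception class can be migrated without touching its neighbors.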

More concrete implementation details will obviously be available by perusing the code. Anything stated here, even if not yet committed, is planned. Implementation details are subject to change as needs/issues are identified during the rollout.

Migration

Although there is consistency in Moto with respect to how backends are implemented, i.e. the models.py and responses.py files being present, there is less consistency with respect to which of those files handles things like input validation, pagination, exceptions, etc. Sometimes these concerns are shared across both files.

Importantly, regardless of whether a service backend employs an Anemic Domain Model, a Rich Domain Model, an N-Tier Architecture with Data Transfer Objects, etc., etc., the generalized implementation of the parsers/serializers combined with the various extension points should provide the tools necessary to make things work.
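One way convention and customization can combine is sketched below, under stated assumptions: the serializer looks up each output member on the model by its shape name, then by a snake_case variant, and a per-member override (standing in for configuration on a service's Response class) covers anything the model doesn't store directly. The to_snake_case() and extract_members() helpers, the override mechanism, and the Queue model are all hypothetical.

```python
# Hypothetical list of members in an operation's output shape.
OUTPUT_MEMBERS = ["QueueName", "QueueUrl", "DelaySeconds"]


def to_snake_case(name):
    # "QueueUrl" -> "queue_url" (simple version; does not handle acronyms).
    return "".join(
        "_" + char.lower() if char.isupper() and index else char.lower()
        for index, char in enumerate(name)
    )


def extract_members(obj, members, overrides=None):
    """Collect output members from a model by attribute-name convention."""
    overrides = overrides or {}
    result = {}
    for member in members:
        if member in overrides:
            result[member] = overrides[member](obj)  # Customization point.
        elif hasattr(obj, member):
            result[member] = getattr(obj, member)
        elif hasattr(obj, to_snake_case(member)):
            result[member] = getattr(obj, to_snake_case(member))
        # Members the model doesn't provide are omitted from the response.
    return result


class Queue:
    """Hypothetical backend model using ordinary snake_case attributes."""

    def __init__(self, name, region, account_id):
        self.queue_name = name
        self.region = region
        self.account_id = account_id
        self.delay_seconds = 0


queue = Queue("my-queue", "us-east-1", "123456789012")

# QueueUrl isn't stored on the model, so a Response-level override derives it.
overrides = {
    "QueueUrl": lambda q: f"https://sqs.{q.region}.amazonaws.com/{q.account_id}/{q.queue_name}",
}
response_data = extract_members(queue, OUTPUT_MEMBERS, overrides)
```

The model itself is left alone, which fits the migration guidance below: the adaptation lives at the responses.py boundary rather than in a rewritten model.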

When migrating backends to the new request/response flow, there's going to be a trade-off between handling everything in responses.py vs. actually "fixing" the model. As a general rule of thumb, we should err on the side of a minimal viable changeset. If the migration exposes an obvious bug in the backend logic, we should consider fixing that in a separate PR, without the added noise of the parsing/serialization changes.

Impact

Given Moto's test coverage and the nature of this refactor, impacts to end-users should be minimal. There is a chance that some behavior (correct or incorrect) may be encoded in a current XML template, not covered by existing tests, and missed during the migration; but that should be rare.

Questions, Concerns, Want to Help?

Drop a comment here to start a discussion.
