@igorgatis

This PR fixes issue #19862.

Currently, the code generated for python-pydantic-v1 tries to decode the response as UTF-8. This fails when the response contains a byte sequence that is not valid UTF-8, like so:

  File ".../.venv/lib/python3.11/site-packages/my_api/api_client.py", line 219, in __call_api
    e.body = e.body.decode('utf-8')
             ^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 78: invalid continuation byte

This change tries to decode using the charset declared in the response instead. If decoding still fails, the content is left unchanged.
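A minimal sketch of the behaviour described above, not the actual generated code: `decode_body` is a hypothetical helper name, and it assumes the charset has already been extracted from the response. It attempts the decode and returns the original bytes untouched if the charset is wrong or unknown.

```python
def decode_body(body: bytes, charset: str = "utf-8"):
    """Decode body with the given charset; return it unchanged on failure.

    Hypothetical helper for illustration only.
    """
    try:
        return body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are not valid in this charset.
        # LookupError: Python does not know a codec by this name.
        return body


# 0xe7 is "ç" in Latin-1 but starts an invalid sequence in UTF-8 here.
raw = b"fa\xe7ade"
as_latin1 = decode_body(raw, "latin-1")   # decoded to str
as_utf8 = decode_body(raw, "utf-8")       # decoding fails, bytes returned
```

With this shape the caller never sees a UnicodeDecodeError; it has to be prepared for the result to be either str or bytes.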

PR checklist

  • Read the contribution guidelines.
  • Pull Request title clearly describes the work in the pull request and Pull Request description provides details about how to validate the work. Missing information here may result in delayed response from the community.
  • Run the following to build the project and update samples:
    ./mvnw clean package 
    ./bin/generate-samples.sh ./bin/configs/*.yaml
    ./bin/utils/export_docs_generators.sh
    
    (For Windows users, please run the script in Git BASH)
    Commit all changed files.
    This is important, as CI jobs will verify all generator outputs of your HEAD commit as it would merge with master.
    These must match the expectations made by your contribution.
    You may regenerate an individual generator by passing the relevant config(s) as an argument to the script, for example ./bin/generate-samples.sh bin/configs/java*.
    IMPORTANT: Do NOT purge/delete any folders/files (e.g. tests) when regenerating the samples as manually written tests may be removed.
  • File the PR against the correct branch: master (upcoming 7.x.0 minor release - breaking changes with fallbacks), 8.0.x (breaking changes without fallbacks)
  • If your PR is targeting a particular programming language, @mention the technical committee members, so they are more likely to review the pull request.

@igorgatis
Author

@fa0311 @multani

if content_type is not None:
    match = re.search(r"charset=([a-zA-Z\-\d]+)[\s;]?", content_type)
    return match.group(1) if match else "utf-8"

Member

Thanks for the PR.

Instead of parsing the charset with a regex here, what about using the function that usually comes with the Python HTTP libraries, e.g. https://stackoverflow.com/questions/14592762/a-good-way-to-get-the-charset-encoding-of-an-http-response-in-python ?

(That way we don't need to reinvent the wheel, though you may need to add another method in rest.py to return the charset.)
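One possible way to follow this suggestion with only the standard library, as a sketch under assumptions: `charset_from_content_type` is a hypothetical name (nothing like it is confirmed to exist in rest.py), and it assumes the caller already has the raw Content-Type header value as a string. It relies on `email.message.Message.get_content_charset()`, which parses the header parameters instead of a hand-written regex.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper for illustration; falls back to `default` when the
    header is missing or carries no charset parameter.
    """
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns failobj
    # when no charset parameter is present.
    return msg.get_content_charset(failobj=default)


charset = charset_from_content_type("text/html; charset=ISO-8859-1")
```

Note that the returned name is lower-cased, which is harmless for `bytes.decode` since Python codec lookup is case-insensitive.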

@wing328 wing328 modified the milestones: 7.10.0, 7.11.0 Nov 18, 2024
@wing328 wing328 modified the milestones: 7.11.0, 7.12.0 Jan 20, 2025
@igorgatis igorgatis closed this by deleting the head repository Jan 24, 2025