Skip to content

metadata output not exactly in utf8 encoding... #33

@sdm7g

Description

@sdm7g

metadata output seems to be in ascii with other unicode characters encoded as numerical character entities. Legal for default utf8 encoding, as ascii is a subset, but this is not what I, and I think most people want or expect.
( This may be the same issue reported as #32 . This was also reported to me by Columbia.edu and I was able to reproduce it on both my and their OAI feeds. )

I initially tried adding encoding="UTF-8" to etree.tostring call in metadata.py but this worked under python3.x, but failed under python2.x .

adding encoding="unicode" appears to be the correct fix that seems to work under both python2.x and python3.x .

Under python2.x , encoding="UTF-8" returns a <type "str"> that contains unicode characters, which then may give an error when coercing to <type "unicode"> . encoding="unicode" returns <type "unicode"> .

See: https://github.com/sdm7g/oai-harvest/blob/fix-pyoai/oaiharvest/metadata.py#L51-L53

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions