GitHub blocks downloading the dbt schema #410

@ekini

The tap consistently fails under some conditions with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/tap_dbt/client.py", line 29, in load_openapi
    return yaml.safe_load(response.text)
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/composer.py", line 58, in compose_document
    self.get_event()
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/parser.py", line 118, in get_event
    self.current_event = self.state()
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/parser.py", line 193, in parse_document_end
    token = self.peek_token()
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/scanner.py", line 129, in peek_token
    self.fetch_more_tokens()
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/scanner.py", line 223, in fetch_more_tokens
    return self.fetch_value()
  File "/meltano/.meltano/extractors/tap-dbt/venv/lib/python3.10/site-packages/yaml/scanner.py", line 577, in fetch_value
    raise ScannerError(None, None,
yaml.scanner.ScannerError: mapping values are not allowed here
  in "<unicode string>", line 9, column 25:
            background-color: #f1f1f1;

Making the same request by hand shows what the tap actually receives:

>>> requests.get(OPENAPI_URL, timeout=10)
<Response [403]>
>>> requests.get(OPENAPI_URL, timeout=10).text
'\r\n<!DOCTYPE html>\r\n<html>\r\n  <head>\r\n    <meta content="origin" name="referrer">\r\n    <title>Forbidden &middot; GitHub</title>\r\n    <style type="text/css" media="screen">\r\n      body {\r\n        background-color: #f1f1f1;\r\n        margin: 0;\r\n      }\r\n      body,\r\n      input,\r\n      button {\r\n        font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;\r\n      }\r\n      .container { margin: 30px auto 40px auto; width: 800px; text-align: center; }\r\n      a { color: #4183c4; text-decoration: none; font-weight: bold; }\r\n      a:hover { text-decoration: underline; }\r\n      h1, h2, h3 { color: #666; }\r\n      ul { list-style: none; padding: 25px 0; }\r\n      li {\r\n        display: inline;\r\n        margin: 10px 50px 10px 0px;\r\n      }\r\n      .logo { display: inline-block; margin-top: 35px; }\r\n      .logo-img-2x { display: none; }\r\n      @media\r\n      only screen and (-webkit-min-device-pixel-ratio: 2),\r\n      only screen and (   min--moz-device-pixel-ratio: 2),\r\n  only screen and (     -o-min-device-pixel-ratio: 2/1),\r\n      only screen and (        min-device-pixel-ratio: 2),\r\n      only screen and (                min-resolution: 192dpi),\r\n      only screen and (                min-resolution: 2dppx) {\r\n    .logo-img-1x { display: none; }\r\n        .logo-img-2x { display: inline-block; }\r\n      }\r\n    </style>\r\n  </head>\r\n  <body>\r\n\r\n    <div class="container">\r\n      <h1>Access to this site has been restricted.</h1>\r\n\r\n      <p>\r\n <br>\r\n        If you believe this is an error,\r\n        please contact <a href="https://support.github.com/">Support</a>.\r\n     </p>\r\n\r\n      <div id="s">\r\n        <a href="https://githubstatus.com/">GitHub Status</a> &mdash;\r\n        <a href="https://twitter.com/githubstatus">@githubstatus</a>\r\n      </div>\r\n    </div>\r\n  </body>\r\n</html>\r\n'

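That is definitely not a YAML file. It is GitHub's HTML "Forbidden" page, and the YAML scanner fails on the CSS line `background-color: #f1f1f1;` quoted in the traceback. Note also the 403 response code.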
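To illustrate how the confusing ScannerError could be turned into a clear failure, here is a minimal sketch that validates the response before it reaches the YAML parser. The names `SchemaDownloadError` and `check_schema_response` are hypothetical and not part of tap-dbt. The function takes plain values so it can be tried without network access.

```python
class SchemaDownloadError(RuntimeError):
    """Raised when the schema URL returns something other than the schema."""


def check_schema_response(status_code: int, content_type: str, body: str) -> str:
    """Return the body unchanged if it plausibly is the schema, else raise.

    The three arguments map to `response.status_code`,
    `response.headers.get("Content-Type", "")` and `response.text`
    on a `requests` response.
    """
    if status_code != 200:
        raise SchemaDownloadError(
            f"schema download failed with HTTP {status_code}"
        )
    # An HTML page (such as the "Forbidden" page above) is never valid input
    # for the YAML parser, so reject it before parsing.
    looks_like_html = body.lstrip().lower().startswith(("<!doctype html", "<html"))
    if "html" in content_type.lower() or looks_like_html:
        raise SchemaDownloadError("schema download returned an HTML page")
    return body
```

With a check like this in front of `yaml.safe_load`, the 403 shown above would be reported as an HTTP error instead of a YAML syntax error.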

This most likely happens because the schema is downloaded with the default User-Agent header set by the requests library: https://github.com/MeltanoLabs/tap-dbt/blob/main/tap_dbt/client.py#L28

Maybe it should at least use the User-Agent set in the tap config. Even better, the schema could be committed to the repo, since depending on GitHub availability reduces the reliability of the whole system.
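Both suggestions can be sketched together: send an explicit User-Agent, and fall back to a copy of the schema stored locally when the download fails. This is only an illustration under stated assumptions. It uses the standard library's `urllib.request` in place of requests so that it has no third-party dependency, and the User-Agent string, the fallback path, and the function names are hypothetical rather than taken from tap-dbt.

```python
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable

# Hypothetical value; in the tap this would come from the user_agent config.
DEFAULT_USER_AGENT = "tap-dbt-example/0.0.0"


def build_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def load_schema_text(
    url: str,
    fallback_path: Path,
    user_agent: str = DEFAULT_USER_AGENT,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 10.0,
) -> str:
    """Download the schema text, or read the local copy if the download fails."""
    request = build_request(url, user_agent)
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError:
        # urlopen raises HTTPError for responses such as 403; HTTPError,
        # URLError and socket timeouts are all subclasses of OSError.
        return fallback_path.read_text(encoding="utf-8")
```

With requests, the equivalent of the download step would be `requests.get(url, headers={"User-Agent": user_agent}, timeout=10)` followed by `response.raise_for_status()`. The `opener` parameter exists only so the fallback path can be exercised without a real network call.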

Labels: bug, good first issue

Status: Done