Description
It sounds a bit like an oxymoron to me, but when apache-beam is fully resolved with the gcp extras, httplib2 is forced onto a version that cannot call Google, and any pipeline using Google services fails (I haven't checked other services).
I have traced the problem all the way back, so let me explain my findings.
A quick way to reproduce this is to install all the dependencies with pipenv, which makes sure to resolve sub-dependencies:
pipenv install apache-beam[gcp]
Then run:
python -c 'from google.cloud import bigquery;client=bigquery.Client(); list(client.list_projects())'
The error is the same when running a pipeline, but I kept the reproduction simple. It throws an error like this one:
/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/cloud/iterator.py", line 218, in _items_iter
    for page in self._page_iter(increment=False):
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/cloud/iterator.py", line 247, in _page_iter
    page = self._next_page()
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/cloud/iterator.py", line 347, in _next_page
    response = self._get_next_page_response()
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/cloud/iterator.py", line 396, in _get_next_page_response
    query_params=params)
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/cloud/_http.py", line 299, in api_request
    headers=headers, target_object=_target_object)
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/cloud/_http.py", line 193, in _make_request
    return self._do_request(method, url, headers, data, target_object)
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/cloud/_http.py", line 223, in _do_request
    body=data)
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google_auth_httplib2.py", line 187, in request
    self._request, method, uri, request_headers)
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/auth/credentials.py", line 122, in before_request
    self.refresh(request)
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/oauth2/credentials.py", line 136, in refresh
    self._client_secret))
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/oauth2/_client.py", line 237, in refresh_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google/oauth2/_client.py", line 106, in _token_endpoint_request
    method='POST', url=token_uri, headers=headers, body=body)
  File "/home/javier/.local/share/virtualenvs/bqssltest-obub2LuN/lib/python2.7/site-packages/google_auth_httplib2.py", line 119, in __call__
    raise exceptions.TransportError(exc)
google.auth.exceptions.TransportError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)
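Since the failure depends on which versions actually ended up in the environment, it can help to print them before going further. The following is only a minimal sketch: it assumes Python 3.8+ for importlib.metadata (the traceback above comes from Python 2.7, where something like pkg_resources would be needed instead), and both the installed_versions helper and the package list are my own illustration, using names that appear in this report.

```python
from importlib import metadata  # standard library since Python 3.8

# Distributions named in this report; adjust the list as needed.
PACKAGES = ["apache-beam", "httplib2", "oauth2client", "googledatastore"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print("%s: %s" % (name, version or "not installed"))
```

Running this inside the virtualenv shows at a glance whether httplib2 landed on an old release.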
I think the reason this problem hasn't been reported before is that people are ignoring pip's output, which clearly states that there are some dependency issues:
(bqssltest) javier@ffukn897:~/projects/spinoffs/bqssltest$ pip install 'apache-beam[gcp]==2.7.0'
...
google-gax 0.15.16 has requirement future<0.17dev,>=0.16.0, but you'll have future 0.17.1 which is incompatible.
gapic-google-cloud-pubsub-v1 0.15.4 has requirement oauth2client<4.0dev,>=2.0.0, but you'll have oauth2client 4.1.3 which is incompatible.
googledatastore 7.0.1 has requirement httplib2<0.10,>=0.9.1, but you'll have httplib2 0.11.3 which is incompatible.
googledatastore 7.0.1 has requirement oauth2client<4.0.0,>=2.0.1, but you'll have oauth2client 4.1.3 which is incompatible.
proto-google-cloud-pubsub-v1 0.15.4 has requirement oauth2client<4.0dev,>=2.0.0, but you'll have oauth2client 4.1.3 which is incompatible.
proto-google-cloud-datastore-v1 0.90.4 has requirement oauth2client<4.0dev,>=2.0.0, but you'll have oauth2client 4.1.3 which is incompatible.
...
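Because these warnings are easy to miss in a long install log, here is a rough sketch of how they could be pulled out of captured pip output. It assumes each warning sits on a single line and uses exactly the wording shown in this report; other pip versions may phrase the message differently. The parse_conflicts helper and the CONFLICT pattern are hypothetical names of mine, not part of pip.

```python
import re

# Matches the warning wording shown in this report, one warning per line.
CONFLICT = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) has requirement (?P<requirement>.+?), "
    r"but you'll have (?P<installed>\S+) (?P<installed_version>\S+) "
    r"which is incompatible\.$"
)


def parse_conflicts(pip_output):
    """Return one dict per dependency-conflict warning found in pip_output."""
    conflicts = []
    for line in pip_output.splitlines():
        match = CONFLICT.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


# Two of the warnings quoted in this report, used as sample input.
SAMPLE = """\
google-gax 0.15.16 has requirement future<0.17dev,>=0.16.0, but you'll have future 0.17.1 which is incompatible.
googledatastore 7.0.1 has requirement httplib2<0.10,>=0.9.1, but you'll have httplib2 0.11.3 which is incompatible.
"""

for conflict in parse_conflicts(SAMPLE):
    print("%(package)s %(version)s wants %(requirement)s, "
          "got %(installed)s %(installed_version)s" % conflict)
```

Filtering the result on the installed name httplib2 singles out the googledatastore warning that matters here.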
These warnings are caused by the version pinning in the GCP requirements. Specifically, googledatastore==7.0.1 has a direct requirement on httplib2 [required: >=0.9.1,<0.10, installed: 0.9.2]. apache-beam also pins httplib2 directly, but that pin doesn't cause the problem because it only asks for <=0.11.3.
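To make the interaction between the two pins concrete, here is a minimal sketch that checks candidate httplib2 versions against both ranges quoted above. The satisfies helper and its simplified dotted-integer parsing are my own illustration, not what pip or pipenv actually run, and it does not implement full PEP 440 (no dev or pre-release handling).

```python
import operator

# Comparison operators supported by this simplified check (longest symbols first).
_OPERATORS = [
    ("<=", operator.le),
    (">=", operator.ge),
    ("==", operator.eq),
    ("<", operator.lt),
    (">", operator.gt),
]


def parse_version(text):
    """Turn '0.11.3' into (0, 11, 3), dropping trailing zeros so 0.10 == 0.10.0."""
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, specifier):
    """Return True if version meets every comma-separated clause in specifier."""
    installed = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                if not compare(installed, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


DATASTORE_PIN = ">=0.9.1,<0.10"  # httplib2 range required by googledatastore 7.0.1
BEAM_PIN = "<=0.11.3"            # httplib2 range required by apache-beam

for candidate in ("0.9.2", "0.11.3"):
    print(candidate, satisfies(candidate, DATASTORE_PIN), satisfies(candidate, BEAM_PIN))
```

In this sketch 0.9.2 passes both ranges while 0.11.3 passes only Beam's, which is consistent with a strict resolver such as pipenv settling on a 0.9.x release.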
I have no idea why googledatastore is pinned to that version. It seems that someone is aware of the problem with datastore, since googledatastore==7.0.2 has been released with just that constraint removed.
The only thing missing here is to upgrade this line to use 7.0.2: https://github.com/apache/beam/blob/master/sdks/python/setup.py#L143
Can anyone do it and release a minor version? From previous experience, I know a PR from a long-running collaborator gets merged much faster than one from someone random on the internet.
Imported from Jira BEAM-6149. Original Jira may contain additional context.
Reported by: txomon.