Description
Short description
The dataset the300w_lp cannot be loaded due to Google Drive changes.
Environment information
- Operating System: macOS
- Python version: 3.11.9
- tensorflow-datasets/tfds-nightly version: tensorflow-datasets==4.9.5
- tensorflow/tf-nightly version: tensorflow==2.15.1
- Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes
Reproduction instructions
import tensorflow_datasets as tfds
tfds.load("the300w_lp", with_info=True)
Link to logs
https://gist.github.com/Inokinoki/36ee1c47cf4ee2b0bef4754900189335
Expected behavior
The dataset should load correctly.
Additional context
I investigated the issue. It seems that Google Drive now redirects to a warning page for files it cannot scan for viruses:

curl -L "https://drive.google.com/uc?export=download&id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k"
<!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/><style nonce="Cnthv5s43ZEpklfe8-kwQA">.goog-link-button{position:relative;color:#15c;text-decoration:underline;cursor:pointer}.goog-link-button-disabled{color:#ccc;text-decoration:none;cursor:default}body{color:#222;font:normal 13px/1.4 arial,sans-serif;margin:0}.grecaptcha-badge{visibility:hidden}.uc-main{padding-top:50px;text-align:center}#uc-dl-icon{display:inline-block;margin-top:16px;padding-right:1em;vertical-align:top}#uc-text{display:inline-block;max-width:68ex;text-align:left}.uc-error-caption,.uc-warning-caption{color:#222;font-size:16px}#uc-download-link{text-decoration:none}.uc-name-size a{color:#15c;text-decoration:none}.uc-name-size a:visited{color:#61c;text-decoration:none}.uc-name-size a:active{color:#d14836;text-decoration:none}.uc-footer{color:#777;font-size:11px;padding-bottom:5ex;padding-top:5ex;text-align:center}.uc-footer a{color:#15c}.uc-footer a:visited{color:#61c}.uc-footer a:active{color:#d14836}.uc-footer-divider{color:#ccc;width:100%}.goog-inline-block{position:relative;display:-moz-inline-box;display:inline-block}* html .goog-inline-block{display:inline}*:first-child+html .goog-inline-block{display:inline}sentinel{}</style><link rel="icon" href="//ssl.gstatic.com/docs/doclist/images/drive_2022q3_32dp.png"/></head><body><div class="uc-main"><div id="uc-dl-icon" class="image-container"><div class="drive-sprite-aux-download-file"></div></div><div id="uc-text"><p class="uc-warning-caption">Google Drive can't scan this file for viruses.</p><p class="uc-warning-subcaption"><span class="uc-name-size"><a href="/open?id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k">300W-LP.zip</a> (2.6G)</span> is too large for Google to scan for viruses. 
Would you still like to download this file?</p><form id="download-form" action="https://drive.usercontent.google.com/download" method="get"><input type="submit" id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" value="Download anyway"/><input type="hidden" name="id" value="0B7OEHD3T4eCkVGs0TkhUWFN6N1k"><input type="hidden" name="export" value="download"><input type="hidden" name="confirm" value="t"><input type="hidden" name="uuid" value="4fcfdc71-ca23-4264-8c6a-1322c7b1c73e"></form></div></div><div class="uc-footer"><hr class="uc-footer-divider"></div></body></html>
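Using the new URL from the form in that page (https://drive.usercontent.google.com/download) with the same id, export=download and confirm=t can resolve this issue.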
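
To make the proposed workaround concrete, here is a minimal sketch of how an old-style link could be rewritten to the new form. The endpoint and the id, export and confirm parameters are taken from the form in the warning page above. The helper name build_confirmed_url is hypothetical and not part of tensorflow-datasets. Dropping the uuid field is an assumption on my side, since it looks like a per-request value.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint copied from the `action` attribute of the form in the warning page.
DRIVE_DOWNLOAD_ENDPOINT = "https://drive.usercontent.google.com/download"


def build_confirmed_url(old_url: str) -> str:
    """Rewrite a drive.google.com/uc link to the confirm=t form.

    Hypothetical helper for illustration only; not a tensorflow-datasets API.
    """
    # Pull the file id out of the old link's query string.
    query = parse_qs(urlparse(old_url).query)
    file_id = query["id"][0]
    # Same hidden fields as the warning-page form, minus `uuid`
    # (assumed to be optional).
    params = {"id": file_id, "export": "download", "confirm": "t"}
    return f"{DRIVE_DOWNLOAD_ENDPOINT}?{urlencode(params)}"


old = "https://drive.google.com/uc?export=download&id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k"
print(build_confirmed_url(old))
```

The sketch only builds the URL and does not download anything, so it does not check whether Google Drive accepts the request without the uuid field.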