
Failed to download and load the300w_lp dataset through the current Google Drive URL #5525

Open
@Inokinoki

Description

/!\ PLEASE INCLUDE THE FULL STACKTRACE AND CODE SNIPPET

Short description

Dataset the300w_lp cannot be loaded due to Google Drive changes.

Environment information

  • Operating System: macOS

  • Python version: 3.11.9

  • tensorflow-datasets/tfds-nightly version: tensorflow-datasets==4.9.5

  • tensorflow/tf-nightly version: tensorflow==2.15.1

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)?

Yes

Reproduction instructions

import tensorflow_datasets as tfds
tfds.load("the300w_lp", with_info=True)

If you share a colab, make sure to update the permissions to share it.

Link to logs
If applicable, https://gist.github.com/Inokinoki/36ee1c47cf4ee2b0bef4754900189335

Expected behavior
Load the dataset correctly.

Additional context
I investigated the issue. It seems that Google Drive now answers the download URL with a virus-scan warning page, instead of the file itself, for files it cannot scan:

curl -L "https://drive.google.com/uc?export=download&id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k"         
<!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/><style nonce="Cnthv5s43ZEpklfe8-kwQA">.goog-link-button{position:relative;color:#15c;text-decoration:underline;cursor:pointer}.goog-link-button-disabled{color:#ccc;text-decoration:none;cursor:default}body{color:#222;font:normal 13px/1.4 arial,sans-serif;margin:0}.grecaptcha-badge{visibility:hidden}.uc-main{padding-top:50px;text-align:center}#uc-dl-icon{display:inline-block;margin-top:16px;padding-right:1em;vertical-align:top}#uc-text{display:inline-block;max-width:68ex;text-align:left}.uc-error-caption,.uc-warning-caption{color:#222;font-size:16px}#uc-download-link{text-decoration:none}.uc-name-size a{color:#15c;text-decoration:none}.uc-name-size a:visited{color:#61c;text-decoration:none}.uc-name-size a:active{color:#d14836;text-decoration:none}.uc-footer{color:#777;font-size:11px;padding-bottom:5ex;padding-top:5ex;text-align:center}.uc-footer a{color:#15c}.uc-footer a:visited{color:#61c}.uc-footer a:active{color:#d14836}.uc-footer-divider{color:#ccc;width:100%}.goog-inline-block{position:relative;display:-moz-inline-box;display:inline-block}* html .goog-inline-block{display:inline}*:first-child+html .goog-inline-block{display:inline}sentinel{}</style><link rel="icon" href="//ssl.gstatic.com/docs/doclist/images/drive_2022q3_32dp.png"/></head><body><div class="uc-main"><div id="uc-dl-icon" class="image-container"><div class="drive-sprite-aux-download-file"></div></div><div id="uc-text"><p class="uc-warning-caption">Google Drive can't scan this file for viruses.</p><p class="uc-warning-subcaption"><span class="uc-name-size"><a href="/open?id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k">300W-LP.zip</a> (2.6G)</span> is too large for Google to scan for viruses. 
Would you still like to download this file?</p><form id="download-form" action="https://drive.usercontent.google.com/download" method="get"><input type="submit" id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" value="Download anyway"/><input type="hidden" name="id" value="0B7OEHD3T4eCkVGs0TkhUWFN6N1k"><input type="hidden" name="export" value="download"><input type="hidden" name="confirm" value="t"><input type="hidden" name="uuid" value="4fcfdc71-ca23-4264-8c6a-1322c7b1c73e"></form></div></div><div class="uc-footer"><hr class="uc-footer-divider"></div></body></html>
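The warning page in the curl output carries everything needed to build the real download link: a form with id="download-form", its action URL, and a few hidden inputs. As a rough sketch (not what tfds does internally), the following Python reads those values out of such a page using only the standard library. The class name DownloadFormParser and the function confirmed_url_from_warning_page are hypothetical names chosen for illustration, and the sketch assumes the page layout stays as shown above.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class DownloadFormParser(HTMLParser):
    """Collects the action URL and hidden inputs of the form with id="download-form".

    Hypothetical helper: it assumes the warning page keeps the layout shown in
    the curl output above.
    """

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "download-form":
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("type") == "hidden":
            # Hidden inputs carry id, export, confirm and uuid.
            self.fields[attrs.get("name")] = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def confirmed_url_from_warning_page(html):
    """Return the confirmed download URL, or None if no download form is found."""
    parser = DownloadFormParser()
    parser.feed(html)
    if parser.action is None:
        return None
    return parser.action + "?" + urlencode(parser.fields)
```

A caller could fetch the original URL, pass the response body to confirmed_url_from_warning_page, and retry with the returned URL whenever the result is not None.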

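Using the new URL from the form above (https://drive.usercontent.google.com/download, with the same id and export parameters plus confirm=t) can resolve this issue.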
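To make the proposed fix concrete, here is a minimal sketch that rewrites a legacy drive.google.com/uc link into the new form. The function name with_confirm and the constant NEW_ENDPOINT are hypothetical. Leaving out the uuid parameter seen in the form is an assumption that has not been verified against Google Drive, and Drive's behaviour may change again.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint taken from the form action in the warning page above.
NEW_ENDPOINT = "https://drive.usercontent.google.com/download"


def with_confirm(legacy_url):
    """Rewrite a legacy Google Drive download URL to the confirm=t form.

    Hypothetical helper: it keeps the file id, sets export=download and
    confirm=t, and omits the per-request uuid (an untested assumption).
    """
    query = parse_qs(urlparse(legacy_url).query)
    file_id = query["id"][0]
    params = {"id": file_id, "export": "download", "confirm": "t"}
    return NEW_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(with_confirm(
        "https://drive.google.com/uc?export=download&id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k"
    ))
```

The resulting URL can be tried with curl -L in the same way as the original command.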

Metadata

Labels: bug (Something isn't working)
