
Google Docs/Sheets/Slides not working in the V2 SDK Google Drive source connector  #74

Open
@ninalopatina


Describe the bug
Google Docs, Sheets, and Slides files fail with a KeyError in the V2 SDK Google Drive source connector.

To Reproduce

Ingesting from Google Drive, partitioning via the Unstructured API, embedding via OpenAI, and writing to AstraDB:

runner = GoogleDriveRunner(
    processor_config=ProcessorConfig(
        verbose=True,
        output_dir=os.environ['GOOGLE_DRIVE_OUTPUT'],
        num_processes=2,
    ),
    read_config=ReadConfig(),
    partition_config=PartitionConfig(
        partition_by_api=True,
        api_key=os.getenv("UNSTRUCTURED_API_KEY")
    ),
    connector_config=SimpleGoogleDriveConfig(
        access_config=GoogleDriveAccessConfig(
            service_account_key=os.getenv("GOOGLE_DRIVE_ACCOUNT_KEY")
        ),
        recursive=True,
        drive_id=os.getenv("GOOGLE_DRIVE_FOLDER_ID"),
    ),
    chunking_config=ChunkingConfig(chunk_elements=True),
    embedding_config=EmbeddingConfig(
        provider="langchain-openai",
        api_key=os.getenv("OPENAI_API_KEY"),
    ),
    writer=get_writer(),
    writer_kwargs={},
)

Expected behavior
As in V1, I expect Google Docs/Sheets/Slides files to be parsed.

Screenshots

KeyError Traceback (most recent call last)
in <cell line: 1>()
33 stager_config=WeaviateUploadStagerConfig(),
34 uploader_config=WeaviateUploaderConfig(),
---> 35 ).run()

7 frames
/usr/local/lib/python3.10/dist-packages/unstructured/ingest/v2/processes/connectors/google_drive.py in map_file_data(f)
131 file_id = f["id"]
132 filename = f.pop("name")
--> 133 url = f.pop("webContentLink")
134 version = f.pop("version", None)
135 permissions = f.pop("permissions", None)

KeyError: 'webContentLink'
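
A likely cause, offered as a sketch rather than a confirmed diagnosis: the Google Drive API only returns webContentLink for files with binary content, so Google-native Docs, Sheets, and Slides come back without that key and the unconditional f.pop("webContentLink") raises. The snippet below shows one defensive way to read the link. The helper names pick_file_url and is_google_native are hypothetical and not part of the connector, the sample URLs are illustrative, and falling back to webViewLink assumes that field is included in the fields requested from the API.

```python
from typing import Any, Dict, Optional

# MIME-type prefix shared by Google-native items (Docs, Sheets, Slides, and
# also folders), none of which carry binary content in Drive.
GOOGLE_NATIVE_PREFIX = "application/vnd.google-apps."


def is_google_native(f: Dict[str, Any]) -> bool:
    """Hypothetical helper: True when the item's mimeType is Google-native."""
    return str(f.get("mimeType", "")).startswith(GOOGLE_NATIVE_PREFIX)


def pick_file_url(f: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: read a link without raising on a missing key.

    Prefers 'webContentLink', falls back to 'webViewLink', and returns None
    when neither is present. Both keys are removed from the dict, mirroring
    the pop-based style seen in the traceback.
    """
    content_link = f.pop("webContentLink", None)
    view_link = f.pop("webViewLink", None)
    return content_link or view_link


# Illustrative file records shaped like Drive API responses (values made up).
google_doc = {
    "id": "doc-1",
    "name": "Notes",
    "mimeType": "application/vnd.google-apps.document",
    "webViewLink": "https://docs.google.com/document/d/doc-1/edit",
}
pdf_file = {
    "id": "pdf-1",
    "name": "report.pdf",
    "mimeType": "application/pdf",
    "webContentLink": "https://drive.google.com/uc?id=pdf-1&export=download",
}

for record in (google_doc, pdf_file):
    native = is_google_native(record)
    url = pick_file_url(record)
    print(record["name"], "native" if native else "binary", url)
```

Even with the KeyError avoided, Google-native files still need to be exported to a concrete format before they can be partitioned, so a guard like this would only address the metadata mapping step.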

Environment Info
This is not specific to my environment; it also happens for anyone else who runs this snippet.



Labels: bug
