Description
Describe the bug
Google Docs, Sheets, and Slides files fail to ingest with the V2 SDK Google Drive source connector. The run raises KeyError: 'webContentLink'.
To Reproduce
Ingesting from Google Drive, partitioning via the Unstructured API, embedding via OpenAI, and writing to AstraDB:
runner = GoogleDriveRunner(
    processor_config=ProcessorConfig(
        verbose=True,
        output_dir=os.environ['GOOGLE_DRIVE_OUTPUT'],
        num_processes=2,
    ),
    read_config=ReadConfig(),
    partition_config=PartitionConfig(
        partition_by_api=True,
        api_key=os.getenv("UNSTRUCTURED_API_KEY")
    ),
    connector_config=SimpleGoogleDriveConfig(
        access_config=GoogleDriveAccessConfig(
            service_account_key=os.getenv("GOOGLE_DRIVE_ACCOUNT_KEY")
        ),
        recursive=True,
        drive_id=os.getenv("GOOGLE_DRIVE_FOLDER_ID"),
    ),
    chunking_config=ChunkingConfig(chunk_elements=True),
    embedding_config=EmbeddingConfig(
        provider="langchain-openai",
        api_key=os.getenv("OPENAI_API_KEY"),
    ),
    writer=get_writer(),
    writer_kwargs={},
)
Expected behavior
As in V1, I expect Google Docs, Sheets, and Slides files to be parsed.
Screenshots
KeyError Traceback (most recent call last)
in <cell line: 1>()
33 stager_config=WeaviateUploadStagerConfig(),
34 uploader_config=WeaviateUploaderConfig(),
---> 35 ).run()
7 frames
/usr/local/lib/python3.10/dist-packages/unstructured/ingest/v2/processes/connectors/google_drive.py in map_file_data(f)
131 file_id = f["id"]
132 filename = f.pop("name")
--> 133 url = f.pop("webContentLink")
134 version = f.pop("version", None)
135 permissions = f.pop("permissions", None)
KeyError: 'webContentLink'
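A possible cause, as far as I can tell: the Drive API only populates webContentLink for files with binary content stored in Drive, so records for Google Docs, Sheets, and Slides do not carry that key and the unconditional f.pop("webContentLink") fails. Below is a minimal sketch of a more tolerant mapping. The function name map_file_data_safe, the needs_export flag, and the sample records are hypothetical and only for illustration; this is not the connector's actual code.

```python
# Hypothetical, simplified stand-in for the connector's map_file_data.
# It is a sketch for discussion, not the library's implementation.
GOOGLE_NATIVE_MIME_TYPES = {
    "application/vnd.google-apps.document",      # Google Docs
    "application/vnd.google-apps.spreadsheet",   # Google Sheets
    "application/vnd.google-apps.presentation",  # Google Slides
}


def map_file_data_safe(f: dict) -> dict:
    """Map a Drive file record without assuming webContentLink exists."""
    f = dict(f)  # work on a copy so the caller's record is not mutated
    file_id = f["id"]
    filename = f.pop("name")
    # Google-native files have no webContentLink, so fall back to
    # webViewLink if it was requested, otherwise None.
    url = f.pop("webContentLink", None) or f.get("webViewLink")
    return {
        "id": file_id,
        "filename": filename,
        "url": url,
        "version": f.pop("version", None),
        "permissions": f.pop("permissions", None),
        # Assumption: native files must be exported rather than downloaded.
        "needs_export": f.get("mimeType") in GOOGLE_NATIVE_MIME_TYPES,
    }


# Made-up sample records shaped like Drive API file resources.
binary_file = {
    "id": "1",
    "name": "report.pdf",
    "mimeType": "application/pdf",
    "webContentLink": "https://example.com/download/1",
}
google_doc = {
    "id": "2",
    "name": "notes",
    "mimeType": "application/vnd.google-apps.document",
    "webViewLink": "https://example.com/view/2",
}

for record in (binary_file, google_doc):
    print(map_file_data_safe(record))
```

With a mapping like this, the Google Doc record no longer raises KeyError, and the needs_export flag could be used downstream to choose an export call instead of a direct download.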
Environment Info
This is not specific to my environment; anyone else who runs this snippet hits the same error.