This repository was archived by the owner on Mar 1, 2024. It is now read-only.

[Feature Request]: I'd like to specify the appropriate Reader for each file found while using SharePointReader #933

@ferdinandosimonetti

Feature Description

Hi, I'm currently obtaining my test Documents by scanning a local directory:

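from llama_index import SimpleDirectoryReader, download_loader

# Attach the file name to each Document's metadata.
def filename_fn(filename):
    return {"file_name": filename}

# Fetch a dedicated reader class for each file format.
DocxReader = download_loader("DocxReader")
PptxReader = download_loader("PptxReader")
PandasExcelReader = download_loader("PandasExcelReader")
PDFReader = download_loader("PDFReader")

# mytime() and docpath are defined elsewhere in my script (a timing helper and the local directory path).
mytime("start multiple file types read")
dir_reader = SimpleDirectoryReader(
    docpath,
    file_metadata=filename_fn,
    filename_as_id=True,
    # Map each file extension to the reader that should parse it.
    file_extractor={
        ".docx": DocxReader(),
        ".pptx": PptxReader(),
        ".xlsx": PandasExcelReader(),
        ".pdf": PDFReader(),
    },
)
documents = dir_reader.load_data()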

However, the real documents are stored in a SharePoint site and directory (which, unfortunately, I can't test right now).
I was wondering whether there's a way to use SharePointReader while retaining the ability to customize the Document id/metadata, as well as the specific Reader for each file format.
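
To make the request concrete, here is a minimal sketch of the behaviour I have in mind, written as plain Python over a list of already-downloaded file names. It is not SharePointReader's API: load_with_extractors, the fake_* loaders and the dict-shaped documents are all hypothetical stand-ins, and only the parameter names (file_extractor, file_metadata, filename_as_id) are borrowed from SimpleDirectoryReader.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a reader: takes a path, returns a list of text chunks.
Loader = Callable[[Path], List[str]]


def load_with_extractors(
    paths: List[str],
    file_extractor: Dict[str, Loader],
    file_metadata: Optional[Callable[[str], dict]] = None,
    filename_as_id: bool = False,
) -> List[dict]:
    """Route each file to the loader registered for its extension."""
    docs = []
    for path in map(Path, paths):
        # Extensions are matched case-insensitively.
        loader = file_extractor.get(path.suffix.lower())
        if loader is None:
            continue  # no loader registered for this extension: skip the file
        for i, text in enumerate(loader(path)):
            metadata = file_metadata(path.name) if file_metadata else {}
            doc = {"text": text, "metadata": metadata}
            if filename_as_id:
                # Derive a stable id from the file name and the chunk index.
                doc["id"] = f"{path.name}_part_{i}"
            docs.append(doc)
    return docs


# Fake loaders that only echo the file name, so the sketch runs without real files.
def fake_docx_loader(path: Path) -> List[str]:
    return [f"docx text from {path.name}"]


def fake_pptx_loader(path: Path) -> List[str]:
    return [f"pptx text from {path.name}"]


docs = load_with_extractors(
    ["report.docx", "slides.PPTX", "notes.txt"],
    file_extractor={".docx": fake_docx_loader, ".pptx": fake_pptx_loader},
    file_metadata=lambda filename: {"file_name": filename},
    filename_as_id=True,
)
```

In this sketch notes.txt is skipped because no loader is registered for .txt; a real implementation might prefer to fall back to a default reader instead.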

Reason

There's no mention in SharePointReader's README of additional parameters like file_extractor, file_metadata, or filename_as_id.

Value of Feature

Being able to specify a (more) appropriate Reader for each file format could lead to better content interpretation afterwards, I suppose.
