Fix #288 #307
Pull request overview
This pull request addresses #288 by refactoring the indexing API’s upload/update flows to consistently attach the original uploaded filename to indexed documents (so /list_files can return it), and by adding/adjusting tests to cover the behavior.
Changes:
- Added the `_apply_uploaded_file_metadata` helper to bind processed chunks to the API file ID and store the uploaded filename in metadata.
- Updated `/v1/files` (upload) and `/v1/files/{fileId}` (update) to use the helper for consistent ID/metadata handling.
- Enhanced test coverage by adding unit/integration tests and switching result-metadata fixtures to use `DocumentMetadata`.
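The helper's described behavior (rebind chunk IDs to the API file ID, keep each chunk's suffix, record the uploaded filename) can be illustrated with a minimal sketch. This is not the PR's `_apply_uploaded_file_metadata`: the `Chunk` dataclass, the `+` separator between a source ID and its chunk suffix, the `filename` metadata key, and the function name `bind_chunks_to_upload` are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for a processed chunk (the real mmore type may differ)."""

    id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def bind_chunks_to_upload(
    chunks: list[Chunk], file_id: str, filename: str, sep: str = "+"
) -> list[Chunk]:
    """Rebind chunk IDs to the API file ID, keep chunk suffixes, store the filename.

    ASSUMPTION: chunk IDs look like "<source_id><sep><suffix>"; the real format may differ.
    """
    for chunk in chunks:
        # Split on the last separator so a separator inside the source ID is harmless.
        _, found, suffix = chunk.id.rpartition(sep)
        # Keep the suffix when present; otherwise the chunk takes the bare file ID.
        chunk.id = f"{file_id}{sep}{suffix}" if found else file_id
        # ASSUMPTION: "filename" is the metadata key a /list_files-style endpoint reads.
        chunk.metadata["filename"] = filename
    return chunks


if __name__ == "__main__":
    chunks = [Chunk("tmp-upload-1+0", "first"), Chunk("tmp-upload-1+1", "second")]
    bind_chunks_to_upload(chunks, "mytest2", "mmore.pdf")
    print([c.id for c in chunks])  # ['mytest2+0', 'mytest2+1']
```

Splitting on the last separator rather than the first is a deliberate choice in this sketch: it keeps the suffix intact even if a temporary source ID happens to contain the separator character.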
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/mmore/run_index_api.py` | Introduces the metadata helper and applies it in the upload/update endpoints to persist the filename and preserve chunk suffixes. |
| `tests/test_live_retriever_api.py` | Adds tests validating filename persistence in `/list_files` and updates metadata fixtures to use `DocumentMetadata`. |
JCHAVEROT left a comment:
Hi @fabnemEPFL, looks good to me!
I tested this by sending two HTTP POSTs, the first to the API running from the master branch and the second to the API running from your branch (both APIs pointing to the same DB). The results below clearly show that your code fixes the issue!
```shell
curl -s 'http://localhost:8000/list_files?collection_name=my_docs' | jq
```

```json
[
  {
    "id": "mytest1",
    "filename": "Unknown"
  },
  {
    "id": "mytest2",
    "filename": "mmore.pdf"
  }
]
```

It's also great that you reused `DocumentMetadata`, introduced in a recent PR, in some tests 👍
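The reviewer's output suggests that files indexed without a stored filename are reported as `"Unknown"`. A minimal sketch of how a listing step could derive that response from per-chunk records follows. It is not mmore's `/list_files` implementation: the record shape, the `+` chunk-suffix separator, the `filename` metadata key, and the `list_files` function below are hypothetical assumptions.

```python
from collections.abc import Iterable
from typing import Any

UNKNOWN_FILENAME = "Unknown"


def list_files(records: Iterable[dict[str, Any]], sep: str = "+") -> list[dict[str, str]]:
    """Collapse per-chunk records into one {"id", "filename"} entry per file.

    ASSUMPTIONS: each record has an "id" like "<file_id><sep><suffix>" and an
    optional "metadata" dict whose "filename" key holds the uploaded filename.
    """
    files: dict[str, str] = {}
    for record in records:
        # Strip the chunk suffix; an ID without a separator is already a file ID.
        file_id = record["id"].rpartition(sep)[0] or record["id"]
        filename = record.get("metadata", {}).get("filename", UNKNOWN_FILENAME)
        # Record the file on first sight, and let a real filename replace the fallback.
        if files.get(file_id, UNKNOWN_FILENAME) == UNKNOWN_FILENAME:
            files[file_id] = filename
    return [{"id": fid, "filename": name} for fid, name in files.items()]
```

Feeding it one chunk of `mytest1` with no filename and two chunks of `mytest2` carrying `"filename": "mmore.pdf"` yields a list with the same shape as the reviewer's output.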
Co-authored-by: Copilot <copilot@github.com>
This pull request refactors the handling of uploaded file metadata in the indexing API to ensure that the original filename is preserved and associated with each document, and improves the test coverage for this behavior. The main changes include extracting metadata processing into a helper function, updating the upload and update endpoints to use this helper, and adding new and updated tests to verify correct metadata handling.
Metadata handling improvements:
- Added the `_apply_uploaded_file_metadata` helper function in `run_index_api.py` to consistently bind processed chunks to the API file ID and persist the original filename in document metadata.
- Updated the `upload_file` and `update_file` endpoints to use `_apply_uploaded_file_metadata`, ensuring consistent metadata assignment and filename preservation for uploaded documents.

Testing enhancements:
- Added the `test_apply_uploaded_file_metadata_preserves_chunk_suffix` unit test to verify that the helper function correctly updates document IDs and stores the filename.
- Added the `test_uploaded_file_has_filename_in_list_files` integration test to ensure that the `/list_files` API returns the correct filename for uploaded files.
- Switched test fixtures to the `DocumentMetadata` class for document metadata, improving type safety and clarity.

Imports and code cleanup:
- Updated imports in `test_live_retriever_api.py` to reflect the new helper function and `DocumentMetadata` usage.