[QUO-899][QUO-898] Support document metadata in Q-SDK#81
QUO-899 Modify the SDK documents arg to accept a union of a list of strings and a list of doc types
A list of doc types should include page_content and metadata properties; a list of strings is what the arg currently accepts.
QUO-898 Add documents metadata to detections endpoint
Allow users to pass metadata about their documents (e.g. as a dictionary). For example, Tavily mentioned they want to report where documents are being retrieved from.
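A minimal sketch of the argument shape described above, assuming plain dictionaries are used for the doc type. The names `DocumentDict` and `Documents` are hypothetical and only illustrate the union; the description itself specifies just the `page_content` and `metadata` keys.

```python
from typing import Any, Dict, List, TypedDict, Union


class _DocumentRequired(TypedDict):
    # Required key: the document text.
    page_content: str


class DocumentDict(_DocumentRequired, total=False):
    # Optional key: free-form metadata, e.g. where the document was retrieved from.
    metadata: Dict[str, Any]


# Hypothetical alias: either a list of strings or a list of dict-style documents.
Documents = Union[List[str], List[DocumentDict]]

# What the arg accepts today.
plain_docs: Documents = ["First retrieved passage.", "Second retrieved passage."]

# What this PR adds: documents that carry metadata.
docs_with_metadata: Documents = [
    {
        "page_content": "First retrieved passage.",
        "metadata": {"source": "https://example.com/article"},
    },
    {"page_content": "Second retrieved passage."},  # metadata is optional
]
```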
@crekhari I have a suggestion down below. Also can you make the equivalent PR in quotient-typescript since we have that now? Or ask @waldnzwrld for help
quotientai/client.py
Outdated
| self.logger.error(f"Documents must be a list of strings or dictionaries with 'page_content' and optional 'metadata' keys. Metadata keys must be strings") | ||
| return None | ||
| else: | ||
| self.logger.error(f"Documents must be a list of strings or dictionaries with 'page_content' and optional 'metadata' keys, got {type(doc)}") |
Something we need to keep in mind: we should not log these errors if there is nothing that the user can do about them, as that will be very confusing and noisy for people to see.
My understanding is that these log when we get back invalid data from the API and there's a mismatch with what the client expects (?)
I would suggest we don't log anything here, and instead raise an exception with information about what was returned from the API and what the client expected, and suggest what the user should do about it.
Talked in the office: user errors should be surfaced as quickly and "loudly" as possible. Logging to stdout is not as loud as raising exceptions. We want this to be loud because it would suck if the user doesn't see the stdout error message because it's cluttered with all their other logs, and is then left wondering why logs are not being saved.
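To make the difference concrete, here is a rough sketch contrasting the two behaviours. `QuotientAIError` is defined locally as a stand-in so the snippet runs on its own, and both function names are hypothetical rather than taken from the SDK.

```python
import logging

logger = logging.getLogger("sdk_sketch")


class QuotientAIError(Exception):
    """Stand-in error class, defined here only so the sketch is self-contained."""


def check_documents_quietly(documents):
    # Log and return None: the call "succeeds" and the message can be
    # buried among the user's other log output.
    if not isinstance(documents, list):
        logger.error("Documents must be a list, got %s", type(documents).__name__)
        return None
    return documents


def check_documents_loudly(documents):
    # Raise: execution stops at the call site, so the mistake cannot be missed.
    if not isinstance(documents, list):
        raise QuotientAIError(
            f"Documents must be a list, got {type(documents).__name__}"
        )
    return documents
```

With the quiet version, a caller who passes a bare string gets `None` back and may only notice later that nothing was saved; with the loud version, the same call fails immediately.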
| from dataclasses import dataclass | ||
| from datetime import datetime | ||
| import time | ||
| from pydantic import BaseModel |
Why are we importing pydantic here? All other classes in the SDK are instantiated without it.
@waldnzwrld had a quick chat in the office: it'd be good to use pydantic for client-side validation over the dataclasses. Dataclasses are useful but don't do type validation.
@crekhari can you make a separate PR to replace stuff with pydantic entirely?
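The point about dataclasses can be seen with the standard library alone. The `Document` class below is hypothetical and exists only for illustration:

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Document:
    # Hypothetical class; the annotations are hints, not runtime checks.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# The annotation says `str`, but the dataclass stores whatever it is given.
doc = Document(page_content=["not", "a", "string"])
print(type(doc.page_content).__name__)  # list
```

A pydantic `BaseModel` with the same fields would be expected to reject this input with a `ValidationError` at construction time, which is the client-side validation being asked for.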
It'd be better for us to replace everything in one go rather than doing it piecemeal.
Wholeheartedly agree that if we're going to do it, we should just go for it.
If nobody has opened an issue for it, I will
Thx for opening the issue. I'll assign it to myself for now, but it's a lowish priority for me rn.
waldnzwrld
left a comment
Could you please update the tests in tests/resources/logs and tests/client to make sure your changes are behaving as expected?
Will do!
| @@ -1,2 +1,3 @@ | |||
| [pytest] | |||
| addopts = -s | |||
| asyncio_default_fixture_loop_scope = function | |||
I get a Deprecation warning when running tests (poetry run pytest --cov=quotientai tests/ --cov-report term-missing):
PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
Setting it explicitly makes the warning go away. Setting the event loop scope to function means a new event loop will be created for each test function that uses an async fixture.
quotientai/client.py
Outdated
| except Exception as _: | ||
| self._logger.error(f"Documents must be a list of strings or dictionaries with 'page_content' and optional 'metadata' keys. Metadata keys must be strings") | ||
| return None | ||
| else: | ||
| self._logger.error(f"Documents must be a list of strings or dictionaries with 'page_content' and optional 'metadata' keys, got {type(doc)}") |
I think this still needs to be addressed? #81 (comment)
Co-authored-by: Freddie Vargus <freddie@quotientai.co>
waldnzwrld
left a comment
Non-blocking, but it is good to make sure that we have a happy-case path in the tests as well.
| mock_random.return_value = 0.6 | ||
| assert logger._should_sample() is False | ||
|
|
||
| def test_log_with_invalid_document_dict(self): |
We should add a happy-case test (a test where the document data is valid) for both the sync and async clients.
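A rough sketch of what such happy-case tests could look like. `validate_documents`, `FakeSyncLogger`, and `FakeAsyncLogger` are hypothetical stand-ins, not the SDK's real classes, so that the example runs without the SDK or any pytest plugins.

```python
import asyncio


def validate_documents(documents):
    # Hypothetical validator standing in for the SDK's internal check.
    for doc in documents:
        if isinstance(doc, str):
            continue
        if not (isinstance(doc, dict) and isinstance(doc.get("page_content"), str)):
            raise ValueError(f"Invalid document: {doc!r}")
    return documents


class FakeSyncLogger:
    def log(self, documents):
        return validate_documents(documents)


class FakeAsyncLogger:
    async def log(self, documents):
        return validate_documents(documents)


STRING_DOCS = ["plain string document"]
DICT_DOCS = [{"page_content": "text", "metadata": {"source": "web"}}]


def test_log_with_valid_documents_sync():
    for docs in (STRING_DOCS, DICT_DOCS):
        assert FakeSyncLogger().log(docs) == docs


def test_log_with_valid_documents_async():
    # asyncio.run drives the coroutine directly, keeping the sketch plugin-free.
    for docs in (STRING_DOCS, DICT_DOCS):
        assert asyncio.run(FakeAsyncLogger().log(docs)) == docs
```

In the real test suite the fakes would be replaced by the sync and async clients, with the HTTP layer mocked as the existing tests already do.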
freddiev4
left a comment
LGTM, one request below and then I think it's good to 🚢
quotientai/async_client.py
Outdated
| raise QuotientAIError("Documents must be a list of strings or dictionaries with 'page_content' and optional 'metadata' keys. Metadata keys must be strings") | ||
| else: | ||
| raise QuotientAIError(f"Documents must be a list of strings or dictionaries with 'page_content' and optional 'metadata' keys, got {type(doc)}") |
Can we make these more actionable? Like in TS
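One possible reading of "more actionable", sketched under assumptions: each message names the offending index, says what was received, and shows the expected shape. The wording is illustrative and is not copied from the TypeScript SDK; `QuotientAIError` is redefined locally so the snippet runs on its own.

```python
class QuotientAIError(Exception):
    """Stand-in for the SDK's error class, defined here so the sketch runs."""


EXPECTED = (
    "a string or a dict like "
    "{'page_content': '...', 'metadata': {'source': '...'}}"
)


def validate_documents(documents):
    # Hypothetical validator: every error points at the exact item to fix.
    for i, doc in enumerate(documents):
        if isinstance(doc, str):
            continue
        if not isinstance(doc, dict):
            raise QuotientAIError(
                f"documents[{i}] has type {type(doc).__name__}; expected {EXPECTED}."
            )
        if not isinstance(doc.get("page_content"), str):
            raise QuotientAIError(
                f"documents[{i}] is missing a string 'page_content' key; "
                f"expected {EXPECTED}."
            )
        metadata = doc.get("metadata", {})
        if not isinstance(metadata, dict) or not all(
            isinstance(key, str) for key in metadata
        ):
            raise QuotientAIError(
                f"documents[{i}]['metadata'] must be a dict with string keys; "
                "convert non-string keys with str() before logging."
            )
    return documents
```

Compared with the current message, the user learns which element failed and what to change, rather than only the general contract.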
After this gets merged, can you cut a release @crekhari? Do you know what the steps are?
Example code with document metadata:
Validation errors are shown via logger.