Description
Problem
When the user runs /learn on a folder and at least one item in the folder cannot be read, the whole command fails without indicating which file(s) could not be learned. This is a bad user experience. Example traceback, triggered by a notebook file that is not valid JSON:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 20, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "/opt/conda/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 16 column 2 (char 16)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/base.py", line 39, in process_message
    await self._process_message(message)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/learn.py", line 118, in _process_message
    await self.learn_dir(load_path, args.chunk_size, args.chunk_overlap)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/learn.py", line 150, in learn_dir
    doc_chunks = await dask_client.compute(delayed)
  File "/opt/conda/lib/python3.10/site-packages/distributed/client.py", line 330, in _result
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/directory.py", line 46, in split_document
    return splitter.split_documents([document])
  File "/opt/conda/lib/python3.10/site-packages/langchain/text_splitter.py", line 161, in split_documents
    return self.create_documents(texts, metadatas=metadatas)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 31, in create_documents
    for chunk in self.split_text(text, metadata):
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 22, in split_text
    return splitter.split_text(text)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 48, in split_text
    nb = nbformat.reads(text, as_version=4)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/__init__.py", line 89, in reads
    nb = reader.reads(s, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 76, in reads
    nb_dict = parse_json(s, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 26, in parse_json
    raise NotJSONError(message) from e
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Here is ...
Proposed Solution
When a single file fails to be embedded, skip it and continue with the remaining files instead of aborting the whole run.
Produce output indicating which file(s) failed.
Capture errors for affected files in a log file. Make the log file available in the root directory and have Jupyternaut tell the user about it.
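To make the three points above concrete, here is a rough sketch of the behavior I have in mind. It is not the actual jupyter_ai code: `learn_folder`, `summarize`, `parse_as_json`, and the log file name `learn_errors.log` are all hypothetical names, and the loop is sequential rather than going through dask as `learn_dir` does. The per-file step is passed in as a callable, so a JSON parse stands in here for the real split/embed step.

```python
import json
import traceback
from pathlib import Path


def learn_folder(folder, process_file, log_name="learn_errors.log"):
    """Run process_file on every file under folder, skipping files that raise.

    Hypothetical helper: returns (learned, failed, log_path), where failed
    maps each failing path to its formatted traceback.
    """
    folder = Path(folder)
    log_path = folder / log_name
    learned, failed = [], {}

    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if path == log_path:
            continue  # do not try to learn a log left over from an earlier run
        try:
            process_file(path)
        except Exception:
            # Record the failure and keep going with the other files.
            failed[path] = traceback.format_exc()
        else:
            learned.append(path)

    if failed:
        # Write one entry per failed file into the folder being learned.
        with log_path.open("w", encoding="utf-8") as log:
            for path, tb in failed.items():
                log.write(f"{path}\n{tb}\n")

    return learned, failed, log_path


def summarize(learned, failed, log_path):
    """Build the message the assistant could show to the user."""
    lines = [f"Learned {len(learned)} file(s)."]
    if failed:
        lines.append(f"Could not learn {len(failed)} file(s):")
        lines.extend(f"  {path}" for path in failed)
        lines.append(f"Details were written to {log_path}")
    return "\n".join(lines)


def parse_as_json(path):
    """Stand-in for the real per-file step; raises on a non-JSON notebook."""
    json.loads(path.read_text(encoding="utf-8"))
```

Calling `learn_folder(some_dir, parse_as_json)` and passing the result to `summarize` would then list the unreadable files by name and point to the log, rather than surfacing a single traceback for the whole folder.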
Additional context
See #422 for a similar issue concerning the /generate command.