When running /learn on a directory, capture and skip files that couldn't be learned #423

Open
@JasonWeill

Problem

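When the user chooses to learn a folder and at least one item in the folder cannot be read, the whole /learn command fails without indicating which file(s) could not be learned. This is a bad user experience. In the traceback below, the failure comes from a file that is parsed as a notebook but does not contain valid JSON: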

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 20, in parse_json
    nb_dict = json.loads(s, **kwargs)
  File "/opt/conda/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 16 column 2 (char 16)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/base.py", line 39, in process_message
    await self._process_message(message)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/learn.py", line 118, in _process_message
    await self.learn_dir(load_path, args.chunk_size, args.chunk_overlap)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/learn.py", line 150, in learn_dir
    doc_chunks = await dask_client.compute(delayed)
  File "/opt/conda/lib/python3.10/site-packages/distributed/client.py", line 330, in _result
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/directory.py", line 46, in split_document
    return splitter.split_documents([document])
  File "/opt/conda/lib/python3.10/site-packages/langchain/text_splitter.py", line 161, in split_documents
    return self.create_documents(texts, metadatas=metadatas)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 31, in create_documents
    for chunk in self.split_text(text, metadata):
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 22, in split_text
    return splitter.split_text(text)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/document_loaders/splitter.py", line 48, in split_text
    nb = nbformat.reads(text, as_version=4)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/__init__.py", line 89, in reads
    nb = reader.reads(s, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 76, in reads
    nb_dict = parse_json(s, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/reader.py", line 26, in parse_json
    raise NotJSONError(message) from e
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Here is ...

Proposed Solution

When a single file fails to be embedded, skip it and continue with the remaining files.

Produce output indicating which file(s) failed.

Capture errors for affected files in a log file. Make the log file available in the root directory and have Jupyternaut tell the user about it.
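
To make the three points above concrete, here is a minimal sketch of per-file error capture. It is not the jupyter_ai implementation: `learn_files`, `summarize`, `split_json_file`, and the log file name `learn_errors.log` are hypothetical names chosen for illustration, and the sketch processes files sequentially, whereas the real `learn_dir` submits work through Dask and would need the equivalent handling per task.

```python
import json
import tempfile
import traceback
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def learn_files(
    paths: Iterable[Path],
    split_file: Callable[[Path], List[str]],
    log_path: Path,
) -> Tuple[List[str], List[Path]]:
    """Split each file on its own; log failures instead of aborting the run.

    Hypothetical helper: `split_file` stands in for whatever loads and
    chunks a single document.
    """
    chunks: List[str] = []
    failed: List[Path] = []
    with open(log_path, "w", encoding="utf-8") as log:
        for path in paths:
            try:
                chunks.extend(split_file(path))
            except Exception:
                # Record the file and its full traceback, then move on.
                failed.append(path)
                log.write(f"Could not learn {path}\n{traceback.format_exc()}\n")
    return chunks, failed


def summarize(failed: List[Path], log_path: Path) -> str:
    """Build the message the chat assistant could show to the user."""
    if not failed:
        return "Learned all files."
    names = ", ".join(str(p) for p in failed)
    return f"Skipped {len(failed)} file(s): {names}. Details are in {log_path}."


def split_json_file(path: Path) -> List[str]:
    """Stand-in splitter (assumption): raises on files that are not valid JSON."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return [json.dumps(data)]


def demo() -> Tuple[List[str], List[str], str]:
    """Run the sketch on one valid and one invalid file in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "good.json").write_text('{"cells": []}', encoding="utf-8")
        (root / "bad.json").write_text("Here is ...", encoding="utf-8")
        log_path = root / "learn_errors.log"
        chunks, failed = learn_files(
            sorted(root.glob("*.json")), split_json_file, log_path
        )
        print(summarize(failed, log_path))
        return chunks, [p.name for p in failed], log_path.read_text(encoding="utf-8")
```

Calling `demo()` learns `good.json`, skips `bad.json`, and leaves the traceback for the skipped file in the log. Catching `Exception` per file is deliberately broad here, since the goal is that no single unreadable file stops the rest of the directory from being learned.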

Additional context

See #422 for a similar issue concerning the /generate command.

Labels: enhancement (New feature or request), scope:RAG (Issues concerning RAG, e.g. /learn and /ask), scope:chat-ux (Issues concerning the chat user experience)
