Nbformat/nbformat_minor not well extracted with HTTP handler #727

Open
@LetMeR00t

🐛 Bug

I'm building a connector between Jupyter (using papermill) and another product named "Cortex" from the StrangeBee project.
During development I ran into an issue while testing the HTTP handler. I'm trying to execute a notebook located on a JupyterHub instance, which has a "demo" user for whom a "cortex_job" server is configured.

import papermill as pm

pm.execute_notebook(
    "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/notebook1.ipynb?token=SECRET",
    "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/Folder1/notebook2.ipynb?token=SECRET",
    parameters = dict(var1 = "toto")
)

The notebook is retrieved successfully, but I then get this error message:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[1], line 3
      1 import papermill as pm
----> 3 pm.execute_notebook(
      4     "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/notebook1.ipynb?token=SECRET",
      5     "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/Folder1/notebook2.ipynb?token=SECRET",
      6     parameters = dict(var1 = "toto")
      7 )

File /usr/local/lib/python3.10/dist-packages/papermill/execute.py:89, in execute_notebook(input_path, output_path, parameters, engine_name, request_save_on_cell_execute, prepare_only, kernel_name, language, progress_bar, log_output, stdout_file, stderr_file, start_timeout, report_mode, cwd, **engine_kwargs)
     86 if cwd is not None:
     87     logger.info("Working directory: {}".format(get_pretty_path(cwd)))
---> 89 nb = load_notebook_node(input_path)
     91 # Parameterize the Notebook.
     92 if parameters:

File /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:512, in load_notebook_node(notebook_path)
    502 def load_notebook_node(notebook_path):
    503     """Returns a notebook object with papermill metadata loaded from the specified path.
    504 
    505     Args:
   (...)
    510 
    511     """
--> 512     nb = nbformat.reads(papermill_io.read(notebook_path), as_version=4)
    513     nb_upgraded = nbformat.v4.upgrade(nb)
    514     if nb_upgraded is not None:

File /usr/local/lib/python3.10/dist-packages/nbformat/__init__.py:91, in reads(s, as_version, capture_validation_error, **kwargs)
     89 nb = reader.reads(s, **kwargs)
     90 if as_version is not NO_CONVERT:
---> 91     nb = convert(nb, as_version)
     92 try:
     93     validate(nb)

File /usr/local/lib/python3.10/dist-packages/nbformat/converter.py:62, in convert(nb, to_version)
     60 except AttributeError as e:
     61     msg = f"Notebook could not be converted from version {version} to version {step_version} because it's missing a key: {e}"
---> 62     raise ValidationError(msg) from None
     64 # Recursively convert until target version is reached.
     65 return convert(converted, to_version)

ValidationError: Notebook could not be converted from version 1 to version 2 because it's missing a key: cells

Looking at the code, the HTTP handler returns the whole response body as-is:

[image: screenshot of the HTTP handler code]

Which gives:

{
   "name":"notebook1.ipynb",
   "path":"notebook1.ipynb",
   "last_modified":"2023-07-12T11:43:37.265003Z",
   "created":"2023-07-12T11:43:37.265003Z",
   "content":{
      "cells":[
         {
            "cell_type":"markdown",
            "id":"e0882b67",
            "metadata":{
               
            },
            "source":"# My title\n\n## My subtitle\n\nHello world!"
         },
         {
            "cell_type":"code",
            "execution_count":1,
            "id":"e92789a6",
            "metadata":{
               "tags":[
                  "parameters"
               ],
               "trusted":true
            },
            "outputs":[
               
            ],
            "source":"var1 = 3\nvar2 = 5"
         },
         {
            "cell_type":"code",
            "execution_count":2,
            "id":"d49d5a2b",
            "metadata":{
               "trusted":true
            },
            "outputs":[
               {
                  "name":"stdout",
                  "output_type":"stream",
                  "text":"var1 is 3, var2 is 5\n"
               }
            ],
            "source":"print(\"var1 is {0}, var2 is {1}\".format(var1,var2))"
         }
      ],
      "metadata":{
         "celltoolbar":"Tags",
         "kernelspec":{
            "display_name":"Python 3 (ipykernel)",
            "language":"python",
            "name":"python3"
         },
         "language_info":{
            "codemirror_mode":{
               "name":"ipython",
               "version":3
            },
            "file_extension":".py",
            "mimetype":"text/x-python",
            "name":"python",
            "nbconvert_exporter":"python",
            "pygments_lexer":"ipython3",
            "version":"3.10.6"
         }
      },
      "nbformat":4,
      "nbformat_minor":5
   },
   "format":"json",
   "mimetype":"None",
   "size":1188,
   "writable":true,
   "type":"notebook"
}

As you can see, "nbformat" is set to 4 in the response, but the notebook is treated as version 1 (the default value).

This version comes from the following code in the nbformat library, which reads the notebook:

[image: screenshot of the nbformat version lookup]

The version is read from a top-level "nbformat" key rather than from "content.nbformat". Since the HTTP response has no top-level "nbformat" key, the default applies, which causes the issue.
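The mismatch can be reproduced without a server. This is a minimal sketch, assuming the version lookup simply reads the top-level "nbformat" key and falls back to 1 when it is missing. `read_version` is a hypothetical stand-in written for this illustration, not nbformat's actual function:

```python
import json


def read_version(doc):
    """Hypothetical stand-in: read the version from the top level, defaulting to (1, 0)."""
    return doc.get("nbformat", 1), doc.get("nbformat_minor", 0)


# Trimmed-down version of the HTTP response above: the notebook sits under "content".
api_response = json.loads("""
{
  "name": "notebook1.ipynb",
  "type": "notebook",
  "content": {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
}
""")

wrapped_version = read_version(api_response)           # looks at the envelope
inner_version = read_version(api_response["content"])  # looks at the notebook itself

print(wrapped_version)  # (1, 0)
print(inner_version)    # (4, 5)
```

Reading the envelope yields the default version, which matches the "version 1 to version 2" conversion attempt in the traceback.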

Do you know whether this is a bug on your side, or perhaps in the nbformat library? I tested it with a LocalHandler and it works fine, as the output is:

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e0882b67",
   "metadata": {},
   "source": [
    "# My title\n",
    "\n",
    "## My subtitle\n",
    "\n",
    "Hello world!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e92789a6",
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "var1 = 3\n",
    "var2 = 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d49d5a2b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "var1 is 3, var2 is 5\n"
     ]
    }
   ],
   "source": [
    "print(\"var1 is {0}, var2 is {1}\".format(var1,var2))"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

A solution could be to parse the JSON response in the HTTP handler and return only its "content" node.
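A rough sketch of that idea, using only the standard library. The helper name `extract_notebook_json` is hypothetical and not part of papermill, and the check on `"type": "notebook"` is an assumption based on the response shown above:

```python
import json


def extract_notebook_json(response_text):
    """Return the notebook as a JSON string.

    If the text looks like a Contents API envelope (a dict with
    "type": "notebook" and a dict under "content"), return only the
    "content" node. Otherwise return the text unchanged.
    """
    payload = json.loads(response_text)
    if isinstance(payload, dict):
        content = payload.get("content")
        if payload.get("type") == "notebook" and isinstance(content, dict):
            return json.dumps(content)
    return response_text


# Example: an envelope is unwrapped, a plain notebook is left alone.
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
envelope = {"name": "notebook1.ipynb", "type": "notebook", "content": notebook}

unwrapped = json.loads(extract_notebook_json(json.dumps(envelope)))
untouched = json.loads(extract_notebook_json(json.dumps(notebook)))
print(unwrapped["nbformat"], untouched["nbformat"])  # 4 4
```

Leaving non-envelope responses untouched would keep the handler usable for URLs that serve a raw .ipynb file directly.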

Thank you
