Commit 1eb5355

CristianLara authored and facebook-github-bot committed
Fix images for Docusaurus (#3512)
Summary:

# Images by path

In the old tutorials we included images as base64 attachments, but in the new ones we specify them by file path. Relative file paths in the notebook were breaking during the conversion process because the tutorial MDX file ends up in a different filesystem location while the image file was left behind. We fix this by copying the image file to the tutorials docs location and updating the relative file path accordingly. A half-baked implementation of this was already present in the script but was being bypassed.

# Images as base64 attachments

The old tutorials included images using base64 attachments, which were not properly supported by the conversion script. These image attachments are stored in the notebook cell's "attachments" field in base64 format with their associated mime_type, and are referenced in the Markdown by attachment name.

The pattern we search for in the Markdown is `![alt_text](attachment:attachment_name title)` with three groups:

- group 1 = alt_text (optional)
- group 2 = attachment_name
- group 3 = title (optional)

To represent this in Markdown, we replace the attachment reference with the base64-encoded string as `![{alt_text}](data:{mime_type};base64,{img_as_base64})`.

This fix won't automatically propagate to the broken old tutorials. In a separate commit I'll fix them by updating the pre-built tutorial MDX stored in the `docusaurus-versions` branch.

Pull Request resolved: #3512
Reviewed By: mpolson64
Differential Revision: D71322665
Pulled By: CristianLara
fbshipit-source-id: f25fe4c1fb057539f79e40eeebef3fa040b3f84d
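The attachment substitution described above can also be written with `re.sub` and a replacement callback. This is a sketch for illustration, not the commit's implementation: `ATTACHMENT_PATTERN` and `replace_attachments` are hypothetical names, and the regex is copied from `handle_image_attachments` in the diff. Because `re.sub` assembles its output from the original string, match offsets never drift; the loop in `handle_image_attachments` applies spans computed on the unmodified string to an already modified one, so a cell with more than one attachment can end up with misplaced replacements. The sketch also strips the captured name, since the lazy `(.*?)` group keeps the space that separates the name from an optional title.

```python
import re

# Same pattern as handle_image_attachments in the diff.
ATTACHMENT_PATTERN = re.compile(
    r"""!\[([^\]]*)\]\(attachment:(.*?)(?=\"|\))(\".*\")?\)"""
)


def replace_attachments(markdown: str, attachments: dict[str, dict[str, str]]) -> str:
    """Hypothetical helper: inline every attachment reference as a data URI."""

    def to_data_uri(match: re.Match) -> str:
        alt_text, attachment_name, _title = match.groups()
        # With a title present, the lazy group also captures the separating space.
        entry = attachments[attachment_name.strip()]
        # Each attachment maps a single MIME type to its base64 payload.
        mime_type, data = next(iter(entry.items()))
        return f"![{alt_text}](data:{mime_type};base64,{data})"

    # re.sub builds the result from the original string, so each match's offsets
    # stay valid even after earlier references are replaced by longer text.
    return ATTACHMENT_PATTERN.sub(to_data_uri, markdown)


md = "![a](attachment:one.png) text ![b](attachment:two.png)"
atts = {"one.png": {"image/png": "AAAA"}, "two.png": {"image/jpeg": "BBBB"}}
print(replace_attachments(md, atts))
# ![a](data:image/png;base64,AAAA) text ![b](data:image/jpeg;base64,BBBB)
```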
1 parent 06b93fd commit 1eb5355

3 files changed, +85 -16 lines changed


scripts/convert_ipynb_to_mdx.py

+84 -15
@@ -196,10 +196,53 @@ def create_buttons(
     return f'<LinkButtons\n githubUrl="{github_url}"\n colabUrl="{colab_url}"\n/>\n\n'
 
 
-def handle_images_found_in_markdown(
+def handle_image_attachments(
+    markdown: str,
+    attachments: dict[str, dict[str, str]],
+) -> str:
+    """
+    Image attachments are stored in the notebook cell's "attachments" field in base64
+    format with their associated mime_type and referenced in the markdown via
+    attachment name.
+
+    The pattern we search for in the Markdown is
+    `![alt_text](attachment:attachment_name title)` with three groups:
+
+    - group 1 = alt_text (optional)
+    - group 2 = attachment_name
+    - group 3 = title (optional)
+
+    To represent this in MD we replace the attachment reference with the base64 encoded
+    string as `![{alt_text}](data:{mime_type};base64,{img_as_base64})`
+
+    Args:
+        markdown (str): The markdown content containing image attachments.
+        attachments (Dict[str, Dict[str, str]]): A dictionary of attachments with their
+            corresponding MIME types and base64 encoded data.
+
+    Returns:
+        str: The markdown content with images converted to base64 format.
+    """
+    markdown_image_pattern = re.compile(
+        r"""!\[([^\]]*)\]\(attachment:(.*?)(?=\"|\))(\".*\")?\)"""
+    )
+    searches = re.finditer(markdown_image_pattern, markdown)
+    for search in searches:
+        alt_text, attachment_name, _ = search.groups()
+        mime_type, base64 = next(iter(attachments[attachment_name].items()))
+        start, end = search.span()
+        markdown = (
+            markdown[:start]
+            + generate_img_base64_md(base64, mime_type, alt_text)
+            + markdown[end:]
+        )
+    return markdown
+
+
+def handle_image_paths_found_in_markdown(
     markdown: str,
     new_img_dir: Path,
-    lib_dir: Path,
+    nb_path: Path,
 ) -> str:
     """
     Update image paths in the Markdown, and copy the image to the docs location.
@@ -210,6 +253,9 @@ def handle_images_found_in_markdown(
     - group 1 = path/to/image.png
     - group 2 = "title"
 
+    We explicitly exclude matching if the path starts with `attachment:` as this
+    indicates that the image is embedded as a base64 attachment not a file path.
+
     The first group (the path to the image from the original notebook) will be replaced
     with ``assets/img/{name}`` where the name is `image.png` from the example above. The
     original image will also be copied to the new location
@@ -219,12 +265,15 @@ def handle_images_found_in_markdown(
         markdown (str): Markdown where we look for Markdown flavored images.
         new_img_dir (Path): Path where images are copied to for display in the
             MDX file.
-        lib_dir (Path): The location for the Bean Machine repo.
+        lib_dir (Path): The location for the repo.
+        nb_path (Path): The location for the notebook.
 
     Returns:
         str: The original Markdown with new paths for images.
     """
-    markdown_image_pattern = re.compile(r"""!\[[^\]]*\]\((.*?)(?=\"|\))(\".*\")?\)""")
+    markdown_image_pattern = re.compile(
+        r"""!\[[^\]]*\]\((?!attachment:)(.*?)(?=\"|\))(\".*\")?\)"""
+    )
     searches = list(re.finditer(markdown_image_pattern, markdown))
 
     # Return the given Markdown if no images are found.
@@ -250,11 +299,11 @@ def handle_images_found_in_markdown(
 
         # Copy the original image to the new location.
         if old_path.exists():
+            # resolves if an absolute path is used
             old_img_path = old_path
         else:
-            # Here we assume the original image exists in the same directory as the
-            # notebook, which should be in the tutorials folder of the library.
-            old_img_path = (lib_dir / "tutorials" / old_path).resolve()
+            # fall back to path relative to the notebook
+            old_img_path = (nb_path.parent / old_path).resolve()
         new_img_path = str(new_img_dir / name)
         shutil.copy(str(old_img_path), new_img_path)
 
@@ -359,7 +408,7 @@ def get_source(cell: NotebookNode) -> str:
 def handle_markdown_cell(
     cell: NotebookNode,
     new_img_dir: Path,
-    lib_dir: Path,
+    nb_path: Path,
 ) -> str:
     """
     Handle the given Jupyter Markdown cell and convert it to MDX.
@@ -368,17 +417,17 @@ def handle_markdown_cell(
         cell (NotebookNode): Jupyter Markdown cell object.
         new_img_dir (Path): Path where images are copied to for display in the
             Markdown cell.
-        lib_dir (Path): The location for the Bean Machine library.
+        lib_dir (Path): The location for the library.
+        nb_path (Path): The location for the notebook.
 
     Returns:
         str: Transformed Markdown object suitable for inclusion in MDX.
     """
     markdown = get_source(cell)
 
-    # Update image paths in the Markdown and copy them to the Markdown tutorials folder.
-    # Skip - Our images are base64 encoded, so we don't need to copy them to the docs
-    # folder.
-    # markdown = handle_images_found_in_markdown(markdown, new_img_dir, lib_dir)
+    # Handle the different ways images are included in the Markdown.
+    markdown = handle_image_paths_found_in_markdown(markdown, new_img_dir, nb_path)
+    markdown = handle_image_attachments(markdown, cell.get("attachments", {}))
 
     markdown = sanitize_mdx(markdown)
     mdx = mdformat.text(markdown, options={"wrap": 88}, extensions={"myst"})
@@ -411,6 +460,26 @@ def handle_cell_input(cell: NotebookNode, language: str) -> str:
     return f"```{language}\n{cell_source}\n```\n\n"
 
 
+def generate_img_base64_md(
+    img_as_base64: int | str | NotebookNode,
+    mime_type: int | str | NotebookNode,
+    alt_text: str = "",
+) -> str:
+    """
+    Generate a markdown image tag from a base64 encoded image.
+
+    Args:
+        img_as_base64 (int | str | NotebookNode): The base64 encoded image data.
+        mime_type (int | str | NotebookNode): The MIME type of the image.
+        alt_text (str, optional): The alternative text for the image. Defaults to an
+            empty string.
+
+    Returns:
+        str: A markdown formatted image tag.
+    """
+    return f"![{alt_text}](data:{mime_type};base64,{img_as_base64})"
+
+
 def handle_image(
     values: list[dict[str, int | str | NotebookNode]],
 ) -> list[tuple[int, str]]:
@@ -431,7 +500,7 @@ def handle_image(
         index = value["index"]
         mime_type = value["mime_type"]
         img = value["data"]
-        output.append((index, f"![](data:image/{mime_type};base64,{img})\n\n"))
+        output.append((index, f"{generate_img_base64_md(img, mime_type)}\n\n"))
     return output
 
 
@@ -880,7 +949,7 @@ def transform_notebook(path: Path, nb_metadata: object) -> str:
 
         # Handle a Markdown cell.
         if cell_type == "markdown":
-            mdx += handle_markdown_cell(cell, img_folder, LIB_DIR)
+            mdx += handle_markdown_cell(cell, img_folder, path)
 
         # Handle a code cell.
         if cell_type == "code":
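The two changes to `handle_image_paths_found_in_markdown` can be exercised in isolation. This is an illustrative sketch, not code from the commit: `find_image_paths` and `resolve_image_path` are hypothetical helpers, the two regexes are copied from the removed and added lines of the diff, and the `/repo/...` notebook location is made up. It shows that the old pattern captured `attachment:image.png` as if it were a file path, while the `(?!attachment:)` lookahead leaves that reference to `handle_image_attachments`, and that a relative reference is now resolved against the notebook's own directory instead of a fixed `tutorials` folder.

```python
import re
from pathlib import Path

# Regexes copied from the diff: the removed pattern and its replacement, which
# adds the negative lookahead (?!attachment:).
OLD_PATH_PATTERN = re.compile(r"""!\[[^\]]*\]\((.*?)(?=\"|\))(\".*\")?\)""")
NEW_PATH_PATTERN = re.compile(
    r"""!\[[^\]]*\]\((?!attachment:)(.*?)(?=\"|\))(\".*\")?\)"""
)


def find_image_paths(markdown: str, pattern: re.Pattern) -> list[str]:
    """Hypothetical helper: the path captured by group 1 of every match."""
    return [m.group(1) for m in pattern.finditer(markdown)]


def resolve_image_path(image_ref: str, nb_path: Path) -> Path:
    """Hypothetical helper mirroring the if/else lookup in the diff."""
    old_path = Path(image_ref)
    if old_path.exists():
        # Absolute paths (or ones that exist relative to the current working
        # directory) are used as-is.
        return old_path
    # Otherwise treat the reference as relative to the notebook's directory.
    return (nb_path.parent / old_path).resolve()


md = (
    "![Scheduler state machine](scheduler_state_machine.png)\n\n"
    "![diagram](attachment:image.png)"
)
print(find_image_paths(md, OLD_PATH_PATTERN))
# ['scheduler_state_machine.png', 'attachment:image.png']
print(find_image_paths(md, NEW_PATH_PATTERN))
# ['scheduler_state_machine.png']

# Hypothetical checkout location; the image does not need to exist for the lookup.
nb = Path("/repo/tutorials/closed_loop/closed_loop.ipynb")
print(resolve_image_path("scheduler_state_machine.png", nb))
```

Note that `Path.exists()` checks a relative path against the current working directory, so the notebook-relative fallback applies only when the reference is not already found there.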

tutorials/closed_loop/closed_loop.ipynb

+1 -1
@@ -460,7 +460,7 @@
 "\n",
 "Internally, Ax uses a class named `Scheduler` to orchestrate the trial deployment, polling, data fetching, and candidate generation.\n",
 "\n",
-"![Scheduler state machine](../../assets/scheduler_state_machine.png)\n",
+"![Scheduler state machine](scheduler_state_machine.png)\n",
 "\n",
 "The `OrchestrationConfig` provides users with control over various orchestration settings:\n",
 "* `parallelism` defines the maximum number of trials that may be run at once. If your external system supports multiple evaluations in parallel, increasing this number can significantly decrease experimentation time. However, it is important to note that as parallelism increases, optimiztion performance often decreases. This is because adaptive experimentation methods rely on previously observed data for candidate generation -- the more tirals that have been observed prior to generation of a new candidate, the more accurate Ax's model will be for generation of that candidate.\n",
