Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Website #2653

Open
wants to merge 99 commits into
base: main
Choose a base branch
from
Open

Upgrade Website #2653

wants to merge 99 commits into from

Conversation

CristianLara
Copy link
Contributor

@CristianLara CristianLara commented Dec 16, 2024

Motivation

We are already using most of the correct tooling but it has become very out of date and is tied together with scripts that are difficult to maintain. Upgrade our tooling by updating to latest versions and dramatically simplifying our setup/deploy scripts.

Includes:

  • Upgrade Docusaurus from V1 to V3
  • Utilize new MDX support in Docusaurus by converting our tutorials to MDX (previously converted to html)
  • Tutorials link to Colab for easy execution
  • Website versioning is now handled natively through Docusaurus
  • New website versions are only created for major and minor releases (not patches)
  • Build and Deploy workflows have been dramatically simplified
  • Change Sphinx theme to ReadTheDocs
  • API Reference is still built using Sphinx but is now hosted by ReadTheDocs. ReadTheDocs also handles versioning.
  • Legacy website will be frozen and available at archive.botorch.org
    • Served by Vercel from the archive branch
  • Website now supports dark mode
Before After
image image
image image
image image
image image

The commits for most of the major changes are grouped into themed PRs on my fork if that makes the review easier: https://github.com/CristianLara/botorch/pulls?q=is%3Apr+is%3Aclosed

Considerations

  • Our tutorials are natively embedded and versioned in the website using Docusaurus, but our sphinx-generated API reference is not. ReadTheDocs will create new version of the sphinx docs for every new release tag in our github repo.
  • The main branch will now contain copies of the website documentation for every version. This includes the converted tutorial MDX but not the original ipynb files, those are accessible linked to from github using release tags.
  • Continue using same integrations for Algolia Search and Google Analytics
  • The number and type of github workflows we have remain the same, but the complexity of the workflows and the supporting scripts has been simplified.

Test Plan

I moved the old website directory from `website/` to `website-old/` and ran `npx @docusaurus/migrate migrate website-old website`

Various things are still broken, fixes coming in next commits
The migration script completely ignored our previous landing page. Here we migrate it ourselves manually and drop most of our custom styling in favor of the new docusaurus styling.
These are not yet configured for this fork
I noticed this hidden docusaurus folder is new after the upgrade
`yarn upgrade @docusaurus/core@latest ││ @docusaurus/preset-classic@latest`

And then manually upgrading other dependencies.

Docusaurus v3 by default interprets all markdown files as mdx. Our md docs are not compatible with mdx yet and require some hand editing to fix. I don
t think we're planning to use mdx in our manually handwritten md docs so I've added an option to the docusaurus config to treat .md files as pure markdown.
This controls the theme for codeblocks. The specified themes are closer to the colors used in the old website
The old `<!--DOCUSAURUS_CODE_TABS-->` comments are deprecated, use the new `<Tabs>` component and convert this .md file to .mdx to utilize it.
Remove unused tags and components, improve styles
Based on old footer
Created this project under my personal algolia account specifically for the version of this site that will be hosted under my personal website.
Docusaurus has moved from mathjax to katex. We also make some small tweaks to some of the math in our md to match updated synatx for katex. I spot checked each of our md docs pages to find offenders.
This was heavily modified as a result of the docusaurus upgrade
Dramatically simplify the process of deploying and versioning. We remove the shell script and perform the fewer steps from within github actions.

A big change here is the move to using docusaurus's built-in versioning system. Before, we were abusing docusaurus's system to take a snapshot of the entire website. For now we are only handling our md docs, but in future commits we will handle versioning tutorials and sphinx.
This is necessary to host from my personal website.
[Actions] Update publish website workflow
The Ax and Botorch repos have this same script but it's named slightly differently T_T
Provides a consistent structure such that notebooks are always nested within a directory regardless of whether the notebook has additional data/assets.
This is a direct copy/paste from the Ax fork. https://github.com/CristianLara/Ax/blob/15242657d3fe19f5655566c881c4156a0b64591e/scripts/convert_ipynb_to_mdx.py

It doesn't fully work with the tutorials in this repo yet, fixes in the next commits
The automated migration script left this behind
The automated migration script didn't update this reference
KaTeX requires these to be wrapped by $$ to be parsed as math.

We also update our regex for unescaping braces in math to support math wrapped by double dollar signs and containing newlines.
<br> tags are normal in ipynb but fail mdx parsing.
- \begin{}...\end{} blocks don't necessarily have to be {align} blocks. Update the regex to handle any kind. But make sure that the end and begin blocks are of the same type to handle nested blocks.
- Make sure $$ symbols are not escaped and include line breaks before and after. I encountered a few examples in our notebooks that didn't match this formatting expected by mdx.
This page is not autogenerated so it's the only file in the docs/tutorials/ directory that we track in github.

I manually converted this to MDX from our old page which was a react component because files in the docs/ directory must be either md or mdx.
github-actions bot and others added 8 commits December 16, 2024 18:10
The tutorial structure was modified to have each one in a subdirectory. This provides a more consistent file structure when there are tutorials that contain supporting data such as plots.
These were created using the deploy on release workflow in github to test the versioned deploy functionality.
sphinx-rtd-theme v3.0.0 supports sphinx v8

This comment was marked as resolved.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Dec 16, 2024
@facebook-github-bot
Copy link
Contributor

@CristianLara has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link

codecov bot commented Dec 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.98%. Comparing base (aef25d7) to head (9315052).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2653   +/-   ##
=======================================
  Coverage   99.98%   99.98%           
=======================================
  Files         200      200           
  Lines       18157    18157           
=======================================
  Hits        18155    18155           
  Misses          2        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This comment was marked as outdated.

@facebook-github-bot
Copy link
Contributor

@CristianLara has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

.github/workflows/publish_website.yml Outdated Show resolved Hide resolved

git add --all
git commit -m "Create version ${{ inputs.new_version }} of site in Docusaurus"
git push --force origin HEAD:main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this bot pushing to? main branch? If so, that'll get blocked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct it pushes to the main branch... and you're also right that the GH bot respects branch protection rules. Thanks for pointing this out, I didn't have branch protections in my fork.

There's a discussion about this with some workarounds here: https://github.com/orgs/community/discussions/25305#discussioncomment-10728028

Copy link
Contributor

@saitcakmak saitcakmak Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to be pushing to the main branch to publish the website? Previously, we would've simply pushed the updated website to gh-pages branch, which would automatically trigger the deploy action there. Would changing this to push to gh-pages work here (gh-pages is not protected)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The blessed Docusaurus way of versioning is to make versioned copies of the docs directly in the main branch. Running yarn build then builds all the versions together such that all versions are properly aware of each other.

Previously we were handling our own versioning by always building a single version of the website (two actually if latest and vX.Y.Z for the different path handling), merging with the previously built versions in gh-pages, and performing quite a bit of surgery so that previous versions can point to the new one.

I was initially concerned that storing all versions in the main branch would result in a lot of bloat and disk space, but I don't think this will be an issue because:

  1. We aren't storing duplicate ipynb files. We're essentially converting them to markdown files which are very small. We still provide download links to the ipynb files by linking directly to the tagged/release version stored in github history.
  2. Going forward we will no longer create new versions of the website for patch releases, only major and minor releases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some concrete numbers: in commit f13759a I have two test versions of the botorch website and each copy of the versioned docs is around 5MB

(venv) ➜  website git:(f13759a0b) du -hs versioned_docs/*                          git:(f13759a0b|⚑3) 
4.9M    versioned_docs/version-v0.13.0
4.9M    versioned_docs/version-v0.14.0

(venv) ➜  website git:(f13759a0b) du -hs versioned_sidebars/*                      git:(f13759a0b|⚑3) 
8.0K    versioned_sidebars/version-v0.13.0-sidebars.json
8.0K    versioned_sidebars/version-v0.14.0-sidebars.json

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that merging a PR to main in OSS without importing to fbcode is either not possible or will cause problems?

Yes, I believe this will cause issues with ShipIt since the repos won't be in sync. Unless there is a way to exclude some of the files/directories in the GitHub repo, but I'm not sure that's possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I've read through the ShipIt documentation and understand the issue now.

Something we can do is have a branch docusaurus-versions that is only for keeping track of docusaurus versions. It would be a fork of main but include autogenerated docusaurus versioned files. The behavior would be:

  1. We always build the website from the docusaurus-versions branch
  2. Before building the website we always sync docusaurus-versions with main
  3. For new releases our github workflow will create the new docusaurus version in the docusaurus-versions branch

The main branch would then only be responsible for tracking the latest documentation, and our github workflows can continue to automate the release process from the docusaurus-versions branch without touching main.

I'll make a PR in my fork to test this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented my proposal above. There is now a release branch docusaurus-versions in my fork which is used by our workflows to deploy the website and keep track of docusaurus versions. If this approach is accepted I'll create this branch for this repo before merging.

47a42f2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example of the updated workflow running to deploy a test version in my Ax fork using the release branch: https://github.com/CristianLara/Ax/actions/runs/12397040263/job/34606475835

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the bot to commit to a separate docusaurus-versions branch seems reasonable to me.

Re the file size issues: I think I mixed together parts of the old setup with the new setup in my head. If we're only adding a commit for the new website versions but not for the nightly builds, this shouldn't be an issue.

Comment on lines 44 to 45
git config --global user.name "github-actions[bot]"
git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this bot require access to some secrets? How does it get permissions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@CristianLara CristianLara Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the solution I choose for #2653 (comment) I may use a different approach for auth though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the GH actions bot in this way for creating a commit is an example provided by the official actions/checkout documentation: https://github.com/actions/checkout?tab=readme-ov-file#push-a-commit-to-a-pr-using-the-built-in-token

Comment on lines 59 to 62
uses: JamesIves/github-pages-deploy-action@releases/v4
with:
branch: gh-pages # The branch the action should deploy to.
folder: website/build # The folder the action should deploy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a third party action necessary for deployment? Is this to avoid directly pushing to the gh-pages branch (though we seem to be pushing somewhere above)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not necessary. There's actually a more official one that I can switch to: https://github.com/actions/deploy-pages

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same action that's triggered when we push to gh-pages branch, right? If so, that'd be my preferred solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, when we push to the gh-pages branch github will build an artifact from that repo and deploy it using the https://github.com/actions/deploy-pages action

Here's a recent gh-pages deploy in this repo that shows this action being used: https://github.com/pytorch/botorch/actions/runs/12371594841/job/34528079206

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directly deploying the build artifact (instead of committing it to gh-pages branch) is also what the OSS team recommended after reviewing our workflow.

I'll update this to to use https://github.com/actions/upload-artifact + https://github.com/actions/deploy-pages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in 99196eb. See the PR for test plan details: CristianLara#10.

tl;dr We will now deploy to github pages directly via the official github actions by directly deploying the build output. The gh-pages branch is no longer used at all and can be deleted.

Copy link
Contributor Author

@CristianLara CristianLara Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to modify my fork's GH Pages settings to build from github actions instead of build from branch, will need to make the same settings change in this main repo after merging. [edit] Added this step to my release plan

.readthedocs.yaml Show resolved Hide resolved
scripts/convert_ipynb_to_mdx.py Outdated Show resolved Hide resolved
).resolve()
print(f"{path.stem}")
mdx = transform_notebook(path, metadata)
print("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we still need 1000 lines of custom conversion code for the notebooks :( We can't use a third party tool like nbconvert to avoid the need for maintaining our own scripts for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docusaurus only natively accepts MD and MDX for docs. There's a hacky way to force html into the page and that's what we were previously doing but it's much less flexible and doesn't get picked up by docusaurus's sidebar or versioning system.

There is an MD output option in nbconvert but we would lose the interactive plots and html/react features of MDX. It might be possible to shave down some of this script by pre-processing using nbconvert but I think the loss of ipynb metadata might actually make some parts more difficult to parse/convert.

This file is mostly a copy of a script written by the BeanMachine team so there may be pieces of it that are not relevant to our tutorials and can trim. Here's the original: https://github.com/facebookresearch/beanmachine/blob/main/website/scripts/convert_ipynb_to_mdx.py

And here's the PR where I originally added support for it in the Ax repo first with the most context: CristianLara/Ax#13

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely the most complex part of the new website and I also would prefer to avoid it if there is a better alternative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is mostly a copy of a script written by the BeanMachine team so there may be pieces of it that are not relevant to our tutorials and can trim

Yeah at the very least we'd want to cut this down to only the parts that are essential to our setup.

CristianLara and others added 8 commits December 16, 2024 18:27
This was introduced for testing purposed. We can still manually trigger the workflows from the github UI
This is the preferred method of deploying a website to gh-pages since it won't blow up the webiste build git history. These are also the exact actions and steps that github itself uses when we push to the gh-pages branch.
[Actions] Deploy website using artifact instead of commit
This is not used in our repositories. Remove it to simplify the script.
The botorch tutorials are currently configured to output plots as images but our conversion script supports plotly. Let's enable plotly support to keep the conversion consistent between the botorch and ax repos since they use the same script.

We can update the tutorials to output plotly in the future.
For a variety of reasons we cannot have the github actions bot commit changes directly to the `main`
branch, but we still want the process of creating new docusaurus versions to be automated in our
release process.

The approach here is to have a release branch called `docusaurus-versions` which will contain all
of the docusaurus data for versioning. The behavior is:
    1. We always build the website from the docusaurus-versions branch
    2. Before building the website we always sync docusaurus-versions with main
    3. For new releases our github workflow will create the new docusaurus version in the docusaurus-versions branch

The main branch would then only be responsible for tracking the latest documentation, and our
github workflows can continue to automate the release process from the docusaurus-versions branch without touching main.
Docusaurus versions will be tracked in the release branch `docusaurus-versions`
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only 3.10?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to pick one version to build the website with. Do you prefer a different version, like 3.12?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, might as well use a newer version, o/w we'd have to update it soon anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came from the existing actions which all use 3.10. I've updated this action and all the others to use 3.12 instead: 9315052

python: "3.10"
jobs:
post_install:
# Install latest botorch if not on a released version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we set ALLOW_LATEST_GPYTORCH_LINOP = true here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sait had the same question 🙂 answered here: #2653 (comment)

The gist is that readthedocs does not have a way for us to programmatically set an env variable to be consumed by the install step in their CI. My solution here was to install pinned deps first then conditionally install the latest versions as a post_install step.

fetch-depth: 0
fetch-tags: true
ref: ${{ github.sha }}
- name: Check if major or minor version changed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would that not be the case? This workflow is executed upon creating a GitHub release, so this should involve updating at least the minor version? Or is the idea that there may be some post-release update bugfix or the like?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the patch versions. v.. The idea is to not release website for the patch versions (primarily for Ax, since we'll follow semver there after 1.0), since these are not intended to include any new features, but only minimal bug / documentation fixes.

DOCS_DIR = LIB_DIR.joinpath("docs")
TUTORIALS_DIR = DOCS_DIR.joinpath("tutorials")

ORGANIZATION = "cristianlara"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to change this to ORGANIZATION = "pytorch" before merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants