Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add black formatter as pre-commit step #189

Merged
merged 5 commits into from
Feb 19, 2024

Conversation

maxjakob
Copy link
Contributor

@maxjakob maxjakob commented Feb 9, 2024

We want to ensure consistent formatting of all notebooks and code. Hence we check on each commit and in CI that everything was formatted using black.

I added a make target to apply the pre-commit hooks to all files so that users can fix formatting issues locally.

Copy link

gitnotebooks bot commented Feb 9, 2024

Found 12 changed notebooks. Review the changes at https://gitnotebooks.com/elastic/elasticsearch-labs/pull/189

@maxjakob
Copy link
Contributor Author

maxjakob commented Feb 9, 2024

I was really happy that we have nbtest to make sure the notebooks are still doing the same as before 💚

Copy link
Contributor Author

maxjakob commented Feb 9, 2024

Commented on notebook notebooks/document-chunking/with-index-pipelines.ipynb Cell 13 Line 8

# Setup the pipeline
CHUNK_SIZE = 400

client.ingest.put_pipeline(
    id="chunk_text_to_passages",
    processors=[
        {
            "script": {

@miguelgrinberg check it out, this is how GitNotebooks works. Let me know what you think. It's enabled for now but we can also remove it again if we decide against it

Respond and view the context here.

@maxjakob maxjakob force-pushed the formatter-pre-commit branch 2 times, most recently from 85ca59f to 28b83a9 Compare February 13, 2024 10:30
@maxjakob maxjakob force-pushed the formatter-pre-commit branch from 28b83a9 to 3121724 Compare February 13, 2024 10:49
@maxjakob
Copy link
Contributor Author

@miguelgrinberg We are running out of disk space in CI because of all the pip installs in the notebooks (not because of data in ES as I previously thought). This makes the virtual env directory get larger with every nbtest run. As a consequence, Elasticsearch stops accepting data and notebooks that run later start failing.

I also noticed that there is a bunch of unnecessary software on the GitHub runner taking up space, like the Android SDK (8GB) and other stuff. Somebody made an action that can remove these extras. This frees up the 8GB in ~1 minutes and resolves the issue.

But the maintainers admit that "This action is a hack, really", so I'm interested in your opinion on whether to keep it. The only alternative I see right now is to wipe the pip cache in between nbtest runs. And I would guess that will increase the runtime by more than the new action so my preference would probably be to keep it as in this PR.

@joemcelroy
Copy link
Member

keep to remove those extras, freeing up the 8gb space :)

@miguelgrinberg
Copy link
Collaborator

miguelgrinberg commented Feb 19, 2024

I vote against. Not a good idea to incorporate actions from unknown sources into our CI from a security point of view. We can manually delete anything directly in our own buld logic if necessary.

Would it be the same to add these directly into our build to avoid running 3rd party code?

            sudo rm -rf /usr/share/dotnet
            sudo rm -rf /usr/local/lib/android
            sudo rm -rf /opt/ghc
            sudo rm -rf /opt/hostedtoolcache/CodeQL

@maxjakob
Copy link
Contributor Author

That's fine for me. I guess the only advantage the action has is that if the directories move we don't have to adapt. It's unlikely and we would find out soon enough if we run out of disk space again.

@miguelgrinberg
Copy link
Collaborator

My guess is that if the directories move then the action would break in the same way our own deletions would, and then it would take some time for the author of the action to fix things up, so there would be down time either way. And doing it ourselves removes the risk of a future update to the 3rd party action doing bad things.

@maxjakob maxjakob merged commit 051e9e3 into elastic:main Feb 19, 2024
5 checks passed
@maxjakob maxjakob deleted the formatter-pre-commit branch February 19, 2024 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants