Factor executor code+history from marin-community/marin#3
ryan-williams wants to merge 307 commits into main
# Conflicts: # tests/test_executor.py
# Conflicts: # .github/workflows/quickstart-test.yaml
…imPajama build (marin-community/marin#447) * wip * reduce the number of "shard groups" to something manageable, makes SlimPajama build * lint
* Local Fix * Local Fix
…arin#1364) * Initial Gemstone Scraping * Not Quite Working, But getting there * Only run evals of the cooled down WSD models (Too Many otherwise lol) * Final Version * International Corpus of English is Private * This just can't be dry run due to config loading
…running again (marin-community/marin#1442) * Refactor executor to run steps directly * working on apply autoscaler patch to ray 2.45 * fix deps for uv * tests? * wip * ok it runs! * fix build * stupid ray * oops * sigh * missed the lock * grr * grr2 * more uv run * no space?!? * tpu tests at least? * wtf du * more du * am faf * again * this? * grr * pr comments
…1479) * actually schedule statusactor on the headnode * Update marin/utilities/ray_utils.py Co-authored-by: William Held <Wbh230@nyu.edu> --------- Co-authored-by: William Held <Wbh230@nyu.edu>
…nity/marin#1490) * avoid race condition in executor * oops
…ty/marin#1453) * Use network_endpoints for expected TPU workers * Refactor executor to run steps directly * working on apply autoscaler patch to ray 2.45 * fix deps for uv * tests? * wip * ok it runs! * fix build * Refine TPU monitor * stupid ray * oops * sigh * missed the lock * grr * grr2 * more uv run * no space?!? * tpu tests at least? * wtf du * more du * am faf * again * this? * grr * background thread for monitor * simple main to try it out * auto-create * update docker and some deps * Add entrypoint resource options and TPU support to ray run * east 5, update clusterwqa * src layout? * src layout * tweaks to east5a cluster, enw image * make dockerfile work with uv and src layout * uv is causing problems * make uv want to install cpu torch (i think?) * increase max concurrent connections for setting up TPU nodes to 64 * wip * update cluster configs with latest image, hopefully faster tpu setup * wip * uv lock * wip * wip * remove tpu monitor for now * lock * no locked * sigh * i hate python * i hate python * lock * mkdocs * sigh * hrm, seems not ideal
force-pushed from 812031d to bb40edf
dlwh
left a comment
lgtm. we should fix licenses, but I'm having codex do a thing for Marin and we can port it here
    @@ -0,0 +1,201 @@
    Apache License
given we're forking a chunk of Marin we should probably be sure to put Stanford on the copyright of Marin and make sure it's credited here.
Good point, did my best to mimic/adapt Marin's copyright and AUTHORS.md in #4 (specifically 112e5f0). lmk (here or there) if that seems OK?
eric-czech
left a comment
LGTM other than figuring out what to do with #3 (comment) and #3 (comment)
I've improved the history-rewriting in #4, which I propose we shift focus to. I've linked this PR's 3 open threads over there, happy to continue discussing them here or there.
I responded on the JSON ser/de thread here/below; I partly made a new PR because force-pushing here would orphan that thread, which has good discussion.
alxmrs
left a comment
I started reviewing this yesterday and then got sidetracked. Here's what I've noticed so far, and I'll continue the review now.
```bash
git clone https://github.com/Open-Athena/thalas.git
cd thalas
uv venv --python 3.11
```
TIL. I always use uv sync + the .python-version file -- this is nice to test multiple versions.
Interesting, yea I'm fumbling through how best to manage multiple venvs in a project, in uv world. I've used Pyenv for many years, which uses .python-version in a way that conflicts with uv. Open to suggestions.
I'm surprised to hear that .python-version conflicts with uv! I use that feature all the time -- it seems to work for me. Maybe I misunderstand how you're using it? IIRC, when you uv init a new project it generates .python-version for you.
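For reference, the "test multiple versions" idea from this thread could be sketched roughly as below. This is only an illustration, not something from the PR: the `venv_commands` helper and the `.venv-<version>` directory naming are assumptions, and the function prints the `uv venv` commands instead of running them so the output can be inspected first.

```shell
# Hypothetical helper (not part of the PR): print one `uv venv` command
# per requested Python version, each targeting its own directory.
# The `.venv-<version>` naming is an assumption made for this sketch.
venv_commands() {
  for version in "$@"; do
    printf 'uv venv --python %s .venv-%s\n' "$version" "$version"
  done
}

# Show the commands for two versions.
venv_commands 3.11 3.12
```

Piping the output to `sh` (for example `venv_commands 3.11 3.12 | sh`) would create the environments, assuming `uv` is installed.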
    uv venv --python 3.11
    source .venv/bin/activate
    uv pip install -e .[dev]
    pre-commit install
optional: you could use uvx to install the pre-commit hooks (without specifically having to install pre-commit locally):
    - pre-commit install
    + uvx pre-commit install
Interesting, afaict it's better to have the version pinned/managed in pyproject.toml, otherwise we're either not pinning, or have to specify version during each invocation?
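To make the trade-off in this thread concrete, here is a small sketch of what "specify version during each invocation" might look like with `uvx`, which accepts a `tool@version` form. The `pinned_hook_cmd` helper and the version number are placeholders chosen for illustration, not values taken from this repository.

```shell
# Hypothetical helper: build the uvx command that installs the hooks with
# an explicitly pinned pre-commit version.
pinned_hook_cmd() {
  printf 'uvx pre-commit@%s install\n' "$1"
}

# Placeholder version for illustration only; not from the PR.
pinned_hook_cmd 4.0.1
```

Without the `@version` suffix the version is left unpinned, which is the concern raised above; keeping `pre-commit` in the `dev` extras of `pyproject.toml` pins it once for everyone.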
alxmrs
left a comment
LGTM. It's nice we can see all these sources in one PR.
ryan-williams
left a comment
I believe I've addressed all comments over in #4, will close this now, but feel free to respond to existing threads here if it's easier and further discussion is warranted.
ryan-williams
left a comment
Addressing a couple more comments
Update: superseded by #4
* `filter-marin-executor.sh` to extract executor code
* `git-filter-repo` on a Marin clone:
  * `#XXX` to `marin-community/marin#XXX`
  * `README.md`)
  * `src/{marin,thalas}/`, update imports
  * `skip` a macOS-failing test
* `main` `main`

See gist for scripts and more info.
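The `#XXX` to `marin-community/marin#XXX` rewrite mentioned above can be illustrated with a small filter. This is a sketch under stated assumptions, not the script from the gist: the `qualify_refs` name and the regular expression are hypothetical, and the actual rewrite presumably ran inside `git-filter-repo`, which takes Python callbacks (for example via `--message-callback`) rather than `sed`.

```shell
# Hypothetical sketch: qualify bare "#123" references with the upstream
# repo slug so they keep pointing at Marin after history is moved.
# A reference is rewritten only when "#" is not preceded by a letter,
# digit, "_", "/" or "#", so already-qualified references are untouched.
qualify_refs() {
  sed -E 's@(^|[^[:alnum:]_/#])#([0-9]+)@\1marin-community/marin#\2@g'
}

# Example: rewrite a reference in a commit subject line.
printf '%s\n' 'makes SlimPajama build (#447)' | qualify_refs
```

Qualifying the references matters because a bare `#447` in the new repository would otherwise be auto-linked to that repository's own issue or PR number 447 instead of the Marin one.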