Include MTPT endpoint creation in DatabricksLM #151
Conversation
````python
print(predict(q="why did a chicken cross the kitchen?"))
```
"""
if not model.startswith("databricks/"):
````
would it be possible to also check if we're in a databricks notebook env and log a warning / exit if not?
yes good call!
It's quite a hassle to validate the Databricks notebook environment, though: https://github.com/mlflow/mlflow/blob/a0e03e1004989740f10b101bd91582fcda733749/mlflow/utils/databricks_utils.py#L184 cross-references about 8 methods. So I am taking a workaround: use a try-except block to raise an error message prompting users to use a Databricks notebook, which should be safer and more generic.
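For illustration, such a guard might look like the sketch below. The helper name and the use of `databricks-sdk` for the check are assumptions, not the actual implementation in this PR.

```python
# Illustrative sketch only -- the helper name and the databricks-sdk check
# are assumptions, not the actual implementation in this PR.
def _require_databricks_runtime() -> None:
    try:
        from databricks.sdk import WorkspaceClient  # assumed choice of check

        # Constructing the client fails fast outside a Databricks runtime,
        # where no workspace credentials are ambiently available.
        WorkspaceClient()
    except Exception as exc:
        raise RuntimeError(
            "MTPT endpoint creation must run inside Databricks "
            "(e.g., a notebook or job). Please run this code on Databricks."
        ) from exc
```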
bbqiu left a comment:
while unit tests don't really do much to test functionality in this case, would it be possible to add some to at least codify the behavior?
ex. that create_pt_endpoint creates a PT endpoint with the right name, or that teardown deletes the right endpoint
@bbqiu Yea, I definitely thought about it, but it's effectively no-op testing because we would just be verifying that "we can successfully assign a mocked endpoint to self._endpoint". So basically these unit tests won't give us any confidence that the code works. For that purpose, I am working on some actual tests, which will be shipped here: https://github.com/databricks-eng/ai-oss-integration-tests-runner.
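For illustration, such a mocked unit test would reduce to something like the sketch below; `DatabricksLM`, `create_pt_endpoint`, `_create_endpoint`, and `_endpoint` are assumed names taken from this thread, not verified against the PR code.

```python
# Illustrative only: what a mocked unit test here reduces to. The class,
# method, and attribute names are assumptions from the review thread.
from unittest.mock import MagicMock, patch

def test_create_pt_endpoint_assigns_mock():
    fake_endpoint = MagicMock()
    lm = DatabricksLM(model="databricks/some-model")  # assumed constructor
    # Patch away the only real work (the serving API call) ...
    with patch.object(lm, "_create_endpoint", return_value=fake_endpoint, create=True):
        lm.create_pt_endpoint()
    # ... so the assertion merely checks that the mock was stored.
    assert lm._endpoint is fake_endpoint
```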
bbqiu left a comment:
sounds good, LGTM once you add the warning logging + resolve the test errors. thank you Chen!
DSPy sometimes requires high throughput from the endpoint, especially when an optimizer or evaluator is in use. The Serving team suggests using MTPT (multi-tenant provisioned throughput) to resolve the issue, and this PR introduces a programmatic way to spin up the endpoint.
Testing is a bit tricky: unit tests with mocks verify essentially nothing here, so I am relying fully on manual integration testing.
Limitation: this endpoint creation must happen on the Databricks platform (e.g., notebooks, jobs, and so on), because otherwise there is no cluster attached. I could technically add cluster-creation code to the implementation, but so far I don't think that's worth the mess and the risk of users leaving an unintended cluster running for a long time. Most users, based on our conversations, run their DSPy code in Databricks notebooks, so this PR should be sufficient for the long term, if not forever.
Sample code:
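The original sample was not captured here; the following is a minimal sketch reconstructed from the discussion above, assuming the `create_pt_endpoint`/teardown names from the review thread, an illustrative import path and model name, and execution inside a Databricks notebook.

```python
# Sketch reconstructed from the discussion -- NOT the PR's original sample.
# The import path, class name, model name, and the create_pt_endpoint /
# teardown_pt_endpoint methods are assumptions from the review thread.
import dspy
from dspy import DatabricksLM  # assumed import path

lm = DatabricksLM(model="databricks/meta-llama-3-1-70b-instruct")

# Spin up a multi-tenant provisioned throughput (MTPT) endpoint so that
# optimizers/evaluators can drive high-throughput traffic against it.
lm.create_pt_endpoint()

dspy.configure(lm=lm)
predict = dspy.Predict("q -> a")
print(predict(q="why did a chicken cross the kitchen?"))

# Tear down the endpoint afterwards to avoid paying for idle throughput.
lm.teardown_pt_endpoint()
```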