-
Notifications
You must be signed in to change notification settings - Fork 6.3k
[serve.llm] Prefix-aware scheduler [1/N] Adding Prefix-aware tree data structure #52747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Justin Ji <[email protected]>
Signed-off-by: Justin Ji <[email protected]>
Signed-off-by: Justin Ji <[email protected]>
Signed-off-by: Justin Ji <[email protected]>
Signed-off-by: Justin Ji <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Please add comprehensive descriptions to your PR. Example Left a few comments.
break | ||
return i | ||
|
||
# def insert(self, text: str, tenant: str) -> Node: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the commented codes?
|
||
|
||
@serve.deployment(name="TreeDeployment") | ||
class PrefixTree: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this again, I don't think this PrefixTree needs to be a serve deployment since we are not planning on scaling it. It is a single actor shared between many processes basically. We can make it a remote actor that is accessible by name via .get_actor method. This will also reduce the remote call overhead I think because there would not be the extra layers of replica scheduler, etc for this deployment
def to_string(self) -> str: | ||
return f"PrefixTree(root={self.root.__str__()}, tenants={self.tenants}, tenant_char_count={self.tenant_char_count}, tenant_nodes={self.tenant_nodes})" | ||
|
||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asked AI to optimize this code. The suggestion was:
@staticmethod
def shared_prefix_count(a: str, b: str) -> int:
"""Count the number of shared characters at the beginning of two strings."""
return len(os.path.commonprefix([a, b]))
This uses C under the hood and does not have the python overhead. So in super long strings of A and B should be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also make this a private function.
def __init__(self) -> None: | ||
self.lock: RLock = RLock() | ||
self.root: Node = Node() | ||
self.tenants: Set[str] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment on what each one of these .tenants
, . tenant_char_count
, tenant_nodes
mean?
Node in a prefix tree that tracks tenant access time. | ||
|
||
Each node represents a segment of text and can belong to multiple tenants. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you show example of a tree representation with this node structure?
# # Update tenant last access time | ||
# matched_node.tenant_last_access_time[tenant] = timestamp_ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's up with this commented code?
self.tenant_char_count.pop(tenant, None) | ||
return total_chars_removed | ||
|
||
def remove_tenant_single_node(self, tenant: str, node: Node) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a private function. rename to _remove_tenant_single_node
) | ||
|
||
# Sort nodes by last access time (oldest first) | ||
nodes_to_evict = sorted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a heap so that you don't have to do this sorting ?
|
||
return total_chars_removed | ||
|
||
def get_smallest_tenant(self) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again asked AI to rewrite this in a better way, looks better and more readable (rule of thumb: you should be allergic to seeing hardcoded indices (e.g. foo[0]) in python)
def get_smallest_tenant(self) -> Optional[str]:
"""Get the tenant with the smallest total character count."""
with self.lock:
return min(self.tenant_char_count, key=self.tenant_char_count.get, default=None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I let @GeneDer and @eicherseiji review the tests.
@pytest.mark.asyncio | ||
async def test_add_tenant(): | ||
"""Test adding tenants to the tree.""" | ||
tree = serve.run(PrefixTree.bind()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make PrefixTree a datastucture that for test purposes is not a deployment or ray actor. See how LLMServer is implemented for example.
Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.