Skip to content

[serve.llm] Prefix-aware scheduler [1/N] Adding Prefix-aware tree data structure #52747

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

jujipotle
Copy link
Contributor

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

jujipotle added 5 commits May 2, 2025 10:35
Signed-off-by: Justin Ji <[email protected]>
Signed-off-by: Justin Ji <[email protected]>
Signed-off-by: Justin Ji <[email protected]>
Signed-off-by: Justin Ji <[email protected]>
@kouroshHakha kouroshHakha added the go add ONLY when ready to merge, run all tests label May 4, 2025
Copy link
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Please add comprehensive descriptions to your PR. Example Left a few comments.

break
return i

# def insert(self, text: str, tenant: str) -> Node:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the commented codes?



@serve.deployment(name="TreeDeployment")
class PrefixTree:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this again, I don't think this PrefixTree needs to be a serve deployment since we are not planning on scaling it. It is a single actor shared between many processes basically. We can make it a remote actor that is accessible by name via .get_actor method. This will also reduce the remote call overhead I think because there would not be the extra layers of replica scheduler, etc for this deployment

def to_string(self) -> str:
return f"PrefixTree(root={self.root.__str__()}, tenants={self.tenants}, tenant_char_count={self.tenant_char_count}, tenant_nodes={self.tenant_nodes})"

@staticmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asked AI to optimize this code. The suggestion was:

@staticmethod
def shared_prefix_count(a: str, b: str) -> int:
    """Count the number of shared characters at the beginning of two strings."""
    return len(os.path.commonprefix([a, b]))

This uses C under the hood and does not have the python overhead. So in super long strings of A and B should be faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also make this a private function.

def __init__(self) -> None:
self.lock: RLock = RLock()
self.root: Node = Node()
self.tenants: Set[str] = set()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment on what each one of these .tenants, . tenant_char_count, tenant_nodes mean?

Node in a prefix tree that tracks tenant access time.

Each node represents a segment of text and can belong to multiple tenants.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show example of a tree representation with this node structure?

Comment on lines +208 to +209
# # Update tenant last access time
# matched_node.tenant_last_access_time[tenant] = timestamp_ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's up with this commented code?

self.tenant_char_count.pop(tenant, None)
return total_chars_removed

def remove_tenant_single_node(self, tenant: str, node: Node) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a private function. rename to _remove_tenant_single_node

)

# Sort nodes by last access time (oldest first)
nodes_to_evict = sorted(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a heap so that you don't have to do this sorting ?


return total_chars_removed

def get_smallest_tenant(self) -> Optional[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again asked AI to rewrite this in a better way, looks better and more readable (rule of thumb: you should be allergic to seeing hardcoded indices (e.g. foo[0]) in python)

def get_smallest_tenant(self) -> Optional[str]:
    """Get the tenant with the smallest total character count."""
    with self.lock:
        return min(self.tenant_char_count, key=self.tenant_char_count.get, default=None)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I let @GeneDer and @eicherseiji review the tests.

@kouroshHakha kouroshHakha changed the title [serve.llm] Add prefix tree class as precursor for prefix-aware scheduler [serve.llm] Prefix-aware scheduler [1/N] Adding Prefix-aware tree data structure May 4, 2025
@pytest.mark.asyncio
async def test_add_tenant():
"""Test adding tenants to the tree."""
tree = serve.run(PrefixTree.bind())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make PrefixTree a datastucture that for test purposes is not a deployment or ray actor. See how LLMServer is implemented for example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants