
Add few key points on the README + schema #8



Open: wants to merge 4 commits into base branch main


@mehd-io commented Feb 28, 2025

Thank you for creating smallpond! While exploring the codebase for a technical deep-dive article, I noticed that some key architectural aspects weren't covered in the README. This PR addresses that to better showcase smallpond's capabilities.

Changes

  • Emphasized distributed processing
  • Added Ray integration details: explicitly mentioned Ray's role in the distributed execution model
  • Clarified storage options: specified support for the local filesystem and 3FS
  • Added an architecture diagram: included a Mermaid flowchart visualizing the high-level architecture
  • Improved the architecture description

Let me know your thoughts; I'm happy to cut some parts.

@mehd-io changed the title from "Feat/improve readme" to "Add few key points on the README + schema" on Feb 28, 2025
@wangrunji0408 (Collaborator) commented:
Thanks for the PR and the detailed write-up! Here are some of my thoughts:

  • The architecture diagram is clear and informative, but we prefer to keep the README concise. It would be a better fit in the project documentation under an "Architecture" section.
  • Currently, S3 is not supported as a storage option. While the local filesystem is available for development and testing, production-level deployments rely on high-performance distributed file systems like 3FS.
  • Smallpond actually has two backends: the built-in scheduler and Ray. We're working on making it easier for users to choose between them, so we avoid emphasizing Ray as the primary dependency.

@mehd-io (Author) commented Mar 6, 2025

@wangrunji0408, replying to your points in turn:

  • The architecture diagram is clear and informative, but we prefer to keep the README concise. It would be a better fit in the project documentation under an "Architecture" section.

Good point; I've moved it to the docs.

  • Currently, S3 is not supported as a storage option. While the local filesystem is available for development and testing, production-level deployments rely on high-performance distributed file systems like 3FS.

Sorry, I overlooked that on my side. It’s fixed now.

  • Smallpond actually has two backends: the built-in scheduler and Ray. We're working on making it easier for users to choose between them, so we avoid emphasizing Ray as the primary dependency.

I understand, but I think it's fine to highlight Ray for now, since it's the primary target (and perhaps the easiest option?) for production use. Once the built-in scheduler is easier to use, we can add more docs covering both paths.

I also noticed that the README didn't link to the hosted docs further down, so I updated the URLs. I initially missed the "Docs" badge, and I suspect many others will too. To make it more visible, I highlighted it again at the top of the README. I also recommend adding it as the Website link in the "About" section.
