Description
When we talk about Singer, it's important to be sure that we're talking about the same things. One way to accomplish that is to organize around language. This proposal is to define a known set of "buckets" that we can use as a group to sort out what is a spec change and what is a standard. What follows is my gut feeling of how these divisions could work, and it is by no means complete or fully correct. 😄
Without further ado, I'd like to propose...
The Layered Model of Singer
Looking at Singer, there are a lot of design choices baked in around a core value of simplicity. The reasoning for this has always been to give developers the freedom and flexibility to make it what they want, since all data sources are vastly different, and one cannot effectively design for all future cases in the ELT space.
As we discuss evolving Singer as a whole and as a community, it will be important not to lose the core value of simplicity. That simplicity is what left room for best practices to be invented, like those encoded in the existing frameworks and libraries.
Approaching the stack as a layered model can give us a means of aligning on where an idea fits, and a tool to iterate organically and "upgrade" concepts from a framework feature to a codified standard to a spec change if it makes sense.
Layer 1: Specification
This is the current specification as it stands. Some principles of features here:
- Language agnostic and implementation independent
- Focus on the stdout portions of using Singer (serialization format, message types, required keys for messages, etc.)
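To make the scope of this layer concrete, here is a minimal sketch of what "serialization format, message types, required keys" could look like in code. It assumes the three message types (SCHEMA, RECORD, STATE) and the required keys as I read them in the current spec, with one JSON object per line on stdout. The names `REQUIRED_KEYS`, `format_message`, and `write_message` are hypothetical and not taken from any existing library.

```python
import json
import sys

# Assumption: required keys per message type, as read from the current spec.
REQUIRED_KEYS = {
    "SCHEMA": {"type", "stream", "schema", "key_properties"},
    "RECORD": {"type", "stream", "record"},
    "STATE": {"type", "value"},
}


def format_message(message):
    """Validate a message dict and return it serialized as one JSON line."""
    msg_type = message.get("type")
    if msg_type not in REQUIRED_KEYS:
        raise ValueError(f"Unknown message type: {msg_type!r}")
    missing = REQUIRED_KEYS[msg_type] - set(message)
    if missing:
        raise ValueError(f"{msg_type} message is missing keys: {sorted(missing)}")
    return json.dumps(message)


def write_message(message, out=None):
    """Write a single message to the given stream (stdout by default)."""
    out = sys.stdout if out is None else out
    out.write(format_message(message) + "\n")
    out.flush()


if __name__ == "__main__":
    # Hypothetical stream and data, for illustration only.
    write_message({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {"properties": {"id": {"type": "integer"}}},
        "key_properties": ["id"],
    })
    write_message({"type": "RECORD", "stream": "users", "record": {"id": 1}})
    write_message({"type": "STATE", "value": {"users": 1}})
```

Nothing here depends on Python in particular, which is the point of this layer: any language that can write JSON lines to stdout can satisfy it.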
Layer 2: Standards and Best Practices
These are being tracked in #10, but as far as the initial design decisions of Singer go, this conceptually includes things like Command-Line Arguments, Catalog, Metadata Keys/Custom Metadata, Standard State Keys, etc.
Some principles here:
- These are also language agnostic and implementation independent
- They help standardize the nitty-gritty to make writing frameworks and libraries easier
- They strive to make code more portable, readable, and usable for users and devs alike.
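As an illustration of a Layer 2 convention, here is a rough sketch of the command-line arguments a tap might accept. It assumes the commonly used `--config`, `--state`, `--catalog`, and `--discover` flags; the `parse_args` function itself is hypothetical and only shows the shape of the convention, not any particular library's implementation.

```python
import argparse


def parse_args(argv=None):
    """Parse the conventional tap arguments (assumed flag names)."""
    parser = argparse.ArgumentParser(description="Hypothetical Singer tap")
    parser.add_argument("--config", required=True,
                        help="Path to a JSON config file")
    parser.add_argument("--state",
                        help="Path to a JSON state file from a previous run")
    parser.add_argument("--catalog",
                        help="Path to a JSON catalog describing selected streams")
    parser.add_argument("--discover", action="store_true",
                        help="Print the catalog and exit instead of syncing")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    mode = "discovery" if args.discover else "sync"
    print(f"Running in {mode} mode with config {args.config}")
```

The flags carry no language-specific meaning, so the same convention could be implemented in any language, which is what separates this layer from the libraries in Layer 3.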
Layer 3: Libraries and Frameworks
This is where we get into the language-specific stuff. Libraries like singer-python and/or singer-clojure or frameworks like the MeltanoSDK take the standards plus best practices and encode them in a way that makes sense for the patterns of each language. This is also a good place to be a test bed for things that might become standards.
Principles:
- Language-specific
- Generic use cases
- These influence the way that code is written for their specific language
Layer 4: Tooling/Orchestration/UX/Infrastructure
I'm not quite sure about this one, but these are things that don't seem to fit in the other layers, and kind of make up an analog to the "Application Layer" of the OSI layered model. This layer is included as a spot to hold things that are in use in a specific industry, use case, deployment method, etc., but are not quite ready to be standardized.
[Aside:] This layer could use the most work, but it seemed worth including here. My gut says that it's likely harder to standardize these kinds of things, since it'll be where our orgs' respective product offerings fall into a lot of the time, and with that comes IP concerns, specifics for our target users (e.g., technical vs. non-technical), a specific slice of the industry, and/or a more narrow set of use cases. That said, tools like singer-discover would also fall here, and fit into a standardization conversation more easily.
That's what I have been kicking around for this idea so far, and I'm excited to get it out there for feedback. I'm very curious about thoughts on the specific categories, as well as whether this approach is a good idea. I'd like to eventually get this defined enough to make it into a SIP to officially propose a model. Thanks for checking it out! I appreciate all feedback 🚀