
Implement Typed Documents and TypeRegistry #282


Open: wants to merge 38 commits into base: decaf


@jterapin (Contributor) commented Mar 6, 2025

Description: Implementation of Typed Documents and TypeRegistry. Currently, only JSON documents are supported.

Customer Experience

Customers can instantiate a Document using standard Ruby objects as its foundation:

string_doc = Smithy::Schema::Document.new('Hello World!')
string_doc.data
=> "Hello World!"

map_doc = Smithy::Schema::Document.new({ name: 'Hamburger' })
map_doc.data
=> { "name" => "Hamburger" }
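The key normalization shown above (`:name` becoming `"name"`) can be read as a deep stringification of hash keys, so the data mirrors a JSON object. This is a minimal sketch under that assumption; `UntypedDocumentSketch` is a hypothetical name and not the gem's actual implementation.

```ruby
# Hypothetical stand-in for Smithy::Schema::Document (untyped case only);
# not the gem's actual implementation.
class UntypedDocumentSketch
  attr_reader :data

  def initialize(data)
    @data = normalize(data)
  end

  private

  # Recursively convert hash keys to strings so the data mirrors a JSON object.
  def normalize(value)
    case value
    when Hash
      value.each_with_object({}) { |(k, v), out| out[k.to_s] = normalize(v) }
    when Array
      value.map { |v| normalize(v) }
    else
      value
    end
  end
end

UntypedDocumentSketch.new('Hello World!').data
# returns "Hello World!"
UntypedDocumentSketch.new({ name: 'Hamburger', tags: [{ spicy: true }] }).data
# returns { "name" => "Hamburger", "tags" => [{ "spicy" => true }] }
```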

Customers can instantiate a Document with a runtime shape and its schema:

# given the following runtime shape:
typed_bird = BirdService::Types::Bird.new(name: 'Pidgey')
=> #<struct BirdService::Types::Bird name="Pidgey">

# provide runtime shape and its schema
bird_doc = Smithy::Schema::Document.new(typed_bird, BirdService::Schema::Bird)
bird_doc.data
=> { "name" => "Pidgey" }

When a runtime shape is used to create a Document, we capture the absolute shape ID from its schema and expose it as the document's discriminator:

bird_doc.discriminator
=> "example.com#Bird"
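The typed construction path (runtime shape plus schema in, string-keyed data and a discriminator out) could look roughly like this. It is a sketch assuming a schema object that exposes an absolute shape ID, its runtime type, and its member names; `SchemaSketch`, `TypedDocumentSketch`, and `member_names` are hypothetical names, not the gem's API.

```ruby
# Hypothetical stand-ins: a schema that knows its absolute shape ID, its
# runtime type, and its member names, plus a Struct acting as a runtime shape.
SchemaSketch = Struct.new(:id, :type, :member_names)
Bird = Struct.new(:name)
BIRD_SCHEMA = SchemaSketch.new('example.com#Bird', Bird, [:name])

class TypedDocumentSketch
  attr_reader :data, :discriminator

  def initialize(shape, schema)
    # Capture the absolute shape ID so the type can be recovered later.
    @discriminator = schema.id
    # Copy only the members the schema declares, keyed by string names.
    @data = schema.member_names.to_h { |m| [m.to_s, shape[m]] }
  end
end

doc = TypedDocumentSketch.new(Bird.new('Pidgey'), BIRD_SCHEMA)
doc.data          # { "name" => "Pidgey" }
doc.discriminator # "example.com#Bird"
```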

Customers can reconstruct the runtime shape of a document using the as_typed method with the shape's schema:

bird_doc.as_typed(BirdService::Schema::Bird)
=> #<struct BirdService::Types::Bird name="Pidgey">
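Reconstruction is roughly the inverse of construction: read each declared member out of the string-keyed data and hand the values to the runtime type. The sketch below assumes the same hypothetical schema shape as before (`id`, `type`, `member_names`); the discriminator check is an assumed guard, not something this PR confirms, and `as_typed_sketch` is a hypothetical helper.

```ruby
# Hypothetical stand-ins; not the gem's classes.
SchemaSketch = Struct.new(:id, :type, :member_names)
Bird = Struct.new(:name)
BIRD_SCHEMA = SchemaSketch.new('example.com#Bird', Bird, [:name])

# Hypothetical reconstruction: check that the schema matches the recorded
# discriminator, then feed each declared member's value to the runtime type.
def as_typed_sketch(data, discriminator, schema)
  unless schema.id == discriminator
    raise ArgumentError, "schema #{schema.id} does not match #{discriminator}"
  end

  schema.type.new(*schema.member_names.map { |m| data[m.to_s] })
end

bird = as_typed_sketch({ 'name' => 'Pidgey' }, 'example.com#Bird', BIRD_SCHEMA)
bird.name # "Pidgey"
```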

Customers can use the TypeRegistry to look up a schema by shape ID:

BirdService::Schema::TypeRegistry.schema_by_id('somebird')
=> #<BirdService::Schema::Bird>
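At its simplest, a type registry is an index of schemas keyed by absolute shape ID. A minimal sketch under that assumption; `TypeRegistrySketch` and `SchemaSketch` are hypothetical names, and returning nil for unknown IDs is an assumed behavior.

```ruby
# Hypothetical stand-ins; not the gem's classes.
SchemaSketch = Struct.new(:id, :type)
Bird = Struct.new(:name)

# Hypothetical registry: schemas indexed by absolute shape ID.
class TypeRegistrySketch
  def initialize(schemas)
    @by_id = schemas.to_h { |s| [s.id, s] }
  end

  # Returns the schema registered for the shape ID, or nil if none is.
  def schema_by_id(id)
    @by_id[id]
  end
end

registry = TypeRegistrySketch.new([SchemaSketch.new('example.com#Bird', Bird)])
registry.schema_by_id('example.com#Bird').type # Bird
registry.schema_by_id('example.com#Unknown')   # nil
```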

Remaining ToDos

  • Add RBS support for public interfaces ONLY
  • Add more documentation (maybe)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jterapin jterapin changed the title Implement Typed Documents and TypeRegistry [WIP] Implement Typed Documents and TypeRegistry Mar 6, 2025
@jterapin jterapin changed the title [WIP] Implement Typed Documents and TypeRegistry Implement Typed Documents and TypeRegistry Apr 15, 2025
end

opts = opts.except(:schema)
# case 1 - extract data from runtime shape; schema is required to extract it properly
Contributor Author

Note: I'll remove it before merging.

module Smithy
module Schema
# A Smithy document type, representing typed or untyped data from Smithy data model.
# ## Document types
Contributor Author

I could add more details (e.g. code examples) but I might punt that to the Smithy-Ruby Wiki instead.

Contributor

I think in-code examples/documentation are preferred but it doesn't have to be done this moment.

Contributor

Yea - I'd agree examples and more detail would be better as in code documentation.

Contributor Author

I'll add them now, since the information is still fresh in my head.

Comment on lines 18 to 23
# @option options [Boolean] :use_timestamp_format Whether to use the `timestampFormat`
# trait or ignore it when creating a {Document} with given schema. The `timestampFormat`
# trait is ignored by default.
# @option options [Boolean] :use_json_name Whether to use the `jsonName` trait or ignore
# it when creating a {Document} with given schema. The `jsonName` trait is ignored
# by default.
Contributor Author

I was pretty torn on whether to keep these options around. They only apply when we create a document with a given schema, and they decide whether specific traits are honored when building @data.

My idea is that once @data is set, we can simply call JSON.dump(document.data) to serialize the data for a request, instead of iterating over the document data to apply traits during the SERDE process.
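The "apply traits once, then just dump" idea can be sketched like this: `jsonName` is honored (or not) while building the data, so the later `JSON.dump` needs no trait handling. The member metadata table, `JSON_NAMES`, and `build_document_data` are hypothetical; only `JSON.dump` is the real stdlib call.

```ruby
require 'json'

# Hypothetical member metadata: member name => jsonName trait value (nil if unset).
JSON_NAMES = { name: 'birdName', wingspan: nil }.freeze

# Build the document data once, honoring jsonName only when asked to, so the
# later JSON.dump needs no further trait handling.
def build_document_data(values, use_json_name: false)
  values.to_h do |member, value|
    key = (use_json_name && JSON_NAMES[member]) || member.to_s
    [key, value]
  end
end

data = build_document_data({ name: 'Pidgey', wingspan: 30 }, use_json_name: true)
JSON.dump(data) # '{"birdName":"Pidgey","wingspan":30}'
```

With `use_json_name: false` (the default described in the doc comment), the member names are used as-is.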

Contributor

If you are referring to the use_x options, I think they're pretty useless, and can always be added later. I'd start with them removed.

Contributor Author

I noticed there are test cases now; I'll see what they ended up doing to decide whether I want to keep these options around.

@jterapin jterapin marked this pull request as ready for review April 15, 2025 03:48
@jterapin jterapin requested a review from mullermp April 15, 2025 03:49
@mullermp (Contributor) left a comment

I put some comments to start with, didn't dig into any testing code.

# with the given typed shape.
# @param [Class] type
# @return [Boolean]
def schema_by_type?(type)
Contributor

Where is schema_by_type used or useful? Isn't this also a shape not a schema?

Contributor Author

Yeah, I'll update the method names to align with the shape references.

These methods will be used in case #2 mentioned in this comment: https://github.com/smithy-lang/smithy-ruby/pull/282/files#r2027627236

When a customer passes a runtime shape as the document param, we need a way to find its shape ID so we can set the discriminator that the service uses to look the shape up in its type registry.

This is a proposed alternative to having runtime shapes know their own schema.
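The reverse lookup described here (runtime shape's class to schema to shape ID) can be sketched with a second index keyed by runtime type. `schema_by_type?` mirrors the predicate name in this PR; `TypeRegistrySketch`, `SchemaSketch`, and the non-predicate `schema_by_type` are hypothetical.

```ruby
# Hypothetical stand-ins; not the gem's classes.
SchemaSketch = Struct.new(:id, :type)
Bird = Struct.new(:name)

class TypeRegistrySketch
  def initialize(schemas)
    # Reverse index: runtime type (a Class) => schema, so a runtime shape can be
    # traced back to its absolute shape ID without the shape knowing its schema.
    @by_type = schemas.to_h { |s| [s.type, s] }
  end

  # Mirrors the PR's predicate: is a schema registered for this runtime type?
  def schema_by_type?(type)
    @by_type.key?(type)
  end

  # Hypothetical companion lookup returning the schema itself (or nil).
  def schema_by_type(type)
    @by_type[type]
  end
end

registry = TypeRegistrySketch.new([SchemaSketch.new('example.com#Bird', Bird)])
shape = Bird.new('Pidgey')
discriminator = registry.schema_by_type(shape.class)&.id
# discriminator is "example.com#Bird"
```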

@@ -161,6 +161,16 @@ def service=(service)
define_operation_methods
end

# @return [Schema::TypeRegistry]
def type_registry
Contributor

What's the intention of placing this on base, and why is it needed here? This will add a method to clients. I think we should bias toward making this feature available to schema gems only for now.

Contributor

I think all gems should have the type registry available, though I'm not sure it belongs as a method on the client class. Maybe as a method on the Schema module, e.g.: Schema.type_registry

Contributor Author

We need access to the TypeRegistry during the SERDE process when deserializing typed documents. I haven't quite figured out where a good "home" for it would be.

I'm thinking of a few options:
1 - Update Smithy::Client::HandlerContext to have access to the type registry (when deserializing typed documents, we can find the associated shape), like: context.type_registry
2 - Keep the reference on the client so we don't need to add another attribute on the HandlerContext; we could do context.client.type_registry to access it during the SERDE process
3 - Should ServiceShape have an attribute called type_registry so we could do client.service.type_registry for easy access?
4 - Just omit the reference for now until we need it. If you need it for whatever reason, you'd use this constant: BirdService::Schema::TYPE_REGISTRY

If we decide to omit it now, when would we need it?
For any JSON-esque protocol, since those support JSON documents.

What do y'all think?
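For comparison, option 1 could look roughly like this: the registry travels on the handler context, so a deserializer resolves a typed document's discriminator without reaching back to the client. Every name here (`HandlerContextSketch`, `deserialize_typed_document`, `SchemaSketch`) is hypothetical, and the registry is stood in for by a plain Hash keyed by shape ID.

```ruby
# Hypothetical stand-ins; none of these names come from the gem.
SchemaSketch = Struct.new(:id, :type, :member_names)
Bird = Struct.new(:name)
HandlerContextSketch = Struct.new(:type_registry, keyword_init: true)

# Option 1: resolve the schema through context.type_registry, then rebuild the
# runtime shape. Hash#fetch raises KeyError for an unregistered shape ID.
def deserialize_typed_document(data, discriminator, context)
  schema = context.type_registry.fetch(discriminator)
  schema.type.new(*schema.member_names.map { |m| data[m.to_s] })
end

registry = { 'example.com#Bird' => SchemaSketch.new('example.com#Bird', Bird, [:name]) }
context = HandlerContextSketch.new(type_registry: registry)
bird = deserialize_typed_document({ 'name' => 'Pidgey' }, 'example.com#Bird', context)
# bird is a Bird with name "Pidgey"
```

Options 2 and 3 would change only where the registry is read from (context.client or client.service); the lookup itself stays the same.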

let(:typed_shapes) do
fixture['shapes'].select do |_k, v|
%w[union structure].include?(v['type']) &&
!v['traits']&.include?('smithy.api#trait')
Contributor

It would have any union/structure wouldn't it?
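To make that point concrete, here is the spec's filter run against a small fixture in the Smithy JSON AST layout (shapes keyed by absolute shape ID, traits keyed by trait ID). The shape IDs are invented for illustration; the predicate is the one from the spec.

```ruby
# Hypothetical fixture; shape IDs are invented for illustration.
fixture = {
  'shapes' => {
    'example.com#Bird' => { 'type' => 'structure' },
    'example.com#Perch' => { 'type' => 'union' },
    'example.com#myTrait' => { 'type' => 'structure', 'traits' => { 'smithy.api#trait' => {} } },
    'example.com#Name' => { 'type' => 'string' }
  }
}

# Same predicate as the spec's let(:typed_shapes): keep structures and unions,
# but drop trait definitions (shapes carrying the smithy.api#trait trait).
typed_shapes = fixture['shapes'].select do |_id, shape|
  %w[union structure].include?(shape['type']) &&
    !shape['traits']&.include?('smithy.api#trait')
end

typed_shapes.keys # ["example.com#Bird", "example.com#Perch"]
```

So it does pick up every union and structure, minus trait definitions.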

return if data.nil?

case data
when Smithy::Schema::Structure
Contributor

Isn't this already "done"? If it's a structure then it's already typed.

Contributor Author

Can you expand on that? What do you mean by "done"? I still want to reformat the typed shape into a document format.

@alextwoods (Contributor) left a comment

Nice - looking good overall!

case data
when Smithy::Schema::Structure
schema = opts[:schema]
if schema.nil? || !valid_schema?(schema)
Contributor

It does still seem weird to me that the runtime type doesn't know its own schema.

Contributor Author

I think it would be useful to have that information at hand, but we had some previous discussion on whether this should be a thing, given that typed shapes are POROs. I don't have a strong opinion, but I think starting without it is fine until we have a use case that absolutely requires it.

@jterapin (Contributor Author) left a comment

Responding back to comments first before I start revising. Thanks!

3 participants