Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB-enhanced: Add support for polymorphic subtypes to mapper #2861

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bmaizels
Copy link
Contributor

Should resolve #1870

Motivation and Context

Will assist services that are attempting to implement single-table design in DynamoDB to be able to use the enhanced client to read entities of different subtypes (and schemata) with a single query.

Description

Changes are fully documented in README.md edit.

Testing

Unit tests and functional tests.

japicmp threw a false positive as two new methods were added to a public interface, but a default implementation was provided for both of them. I attempted to update the version of japicmp to 15.4, and that enabled this module to pass, but a different module then flagged the same rule. After discussion with @millems decision was made to disable the rule globally for now.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the CONTRIBUTING document
  • Local run of mvn install succeeds
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed
  • A short description of the change has been added to the CHANGELOG
  • My change is to implement 1.11 parity feature and I have updated LaunchChangelog

License

  • I confirm that this pull request can be released under the Apache 2 license

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 16 Code Smells

95.0% 95.0% Coverage
0.2% 0.2% Duplication

@DynamoDbSubtypes({
@Subtype(name = "CUSTOMER", subtypeClass = Customer.class),
@Subtype(name = "ORDER", subtypeClass = Order.class)})
public class CustomerRelatedEntity {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be abstract and/or the annotation enforce that ( I see its abstract in code below)


Class<?> subtypeClass();
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing new line :)

@@ -253,6 +254,85 @@ how Lombok's 'onMethod' feature is leveraged to copy the attribute based DynamoD
}
```

### Using subtypes to assist with single-table design

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to highlight here a sample multi-schema table for matching the proposed annotations

pk sk ...
customer Marta ...
order pizza ...

Comment on lines +33 to +34
public @interface DynamoDbSubtypeName {
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly the proposal, it expects dynamoDB to have actual subtype in the db record, so we use this annotation to map one of the colums to the class field. But I think that might not always be the use case. One can achieve multi-table design without having a type like specific field in dynamo.
Consider the following dynamoDB schema using composite keys:

partitionKey sortKey someRandomColumn customerId orderid
US CUSTOMER#customerId1 "John" customerId1
US ORDER#customer1#001 "Great Item" customerId1 001
US ORDER#customer1#002 "Another Item" customerId1 002
CA CUSTOMER#customerId5 "Maria" customerId5
CA ORDER#customer5#001 "Amazing Item" customerId5 001

Where partitionKey is always country, and sort key is customer id for Customer and composite for Order with a ORDER# prefix followed by customerid, and the actual order id.

In this example we shouldn't need to have any type in dynamoDB java class, but be able to infer it from the sortKey composition.

I completely understand that this is a P1 maybe, but it might be easy to leave the door open to extend it in that direction.

Maybe by making the annotation having control over if the underlying attribute should be persisted or not, like @DynamoDbIgnore (or maybe it already work alongside it? )

Suggested change
public @interface DynamoDbSubtypeName {
}
public @interface DynamoDbSubtypeName {
/**
* Indicated if this attribute is to be persisted in DynamoDB record or not, in which case is only used by mapper creating the sub-type entities. Default is true, the value is persisted as attribute.
*/
boolean persist();
}

Then we could use it like so

@DynamoDbBean
@DynamoDbSubtypes({
        @Subtype(name = "CUSTOMER", subtypeClass = Customer.class),
        @Subtype(name = "ORDER", subtypeClass = Order.class)})
public class CustomerRelatedEntity {
   private String customerId;

   @DynamoDbPartitionKey
   String getCountry();
   
}
@DynamoDbBean
public class Customer extends CustomerRelatedEntity {
 
   @DynamoDbSortKey
   String getSortKey() {
     return String.format( "%s#%s" , getSubType() , customerId; 
   }
   
   @DynamoDbSubtypeName ( persist = false )
   String getSubType() {
     return "CUSTOMER";
   }

}
@DynamoDbBean
public class Order extends CustomerRelatedEntity {
   private String orderId;

   @DynamoDbSortKey
   String getSortKey() {
     return String.format( "%s#%s#%s" , getSubType() ,customerId, orderId); 
   }
   
   @DynamoDbSubtypeName ( persist = false )
   String getSubType() {
     return "ORDER";
   }
}


I'm sorry for extensive comment, thank you for your work on this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an excellent point, and I must confess was on my mind too as I was submitting the PR going through the mental checklist in my mind as to what could be done better. It would be trivial to do this I think if we only had to support mapping an object to an attributeValue map as we could simply introspect the class of the instance and know the correct subtype, but we need to give the mapper the capability of determining the subtype in reverse given just a bunch of keys and AttributeValues. I'd want to start with what we think the actual developer experience should be to express that in an intuitive way, would it be passing in a lambda function that takes a string and returns the subtype? That's still limiting because it implies that the subtype must be determined from a single attribute and that attribute must be [or naturally convert into] a string.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for response, I see your point on still being limiting. I believe no single entity definition will live across several fields. If a combination of several attributes is needed to determine a single subtype, than that schema actually has sub-subtypes and can be solved at schema level naming itself.

As a developer (and this is just my 2 cents, I'm biased with my use case :)), I would like to say what the sybtype is, without coupling it with an actual @DynamoDBAttribute. I see 2 options, the mentioned optional nature of persisting a @DynamosubTypeName, either with an attribute like above, or just letting it work together with @DynamoDbIgnore like so

@DynamoDbSubtypeName
@DynamoDbIgnore
   String getSubType() {
     return "ORDER";
   }

In the example above I use the annotation @DynamoDbSortKey to generate a complex sortKey, that is not a single field from my java class, so the enhanced-sdk gives me that power without forcing any added complexity to users who don't need it.
All in all, maybe @DynamoDbSubtypeName should always be without the persistence, it is just a method annotation that says how to generate the subtype name.
If the subtype is a 1-1 match for a field in the @DynamoDBBean, the method would just return that field.

For the reverse part you mention, I'm not sure I follow. The subtype name will always be something static, so even if we process something like id-123#someValue#SUBTYPE#something_else, that lambda would always return "SUBTYPE". Maybe we don't actually need to process the string, its always a constant? Again , probably I missed your point there :)

Regardless, if we do need to specify such transformation function/lambda I think having a method reference in an attribute, like for example jUnit @MethodSource for parameterized testing, is user friendly and common in Java. Something like:

@DynamoDbSubtypeName
@SubTypeGenerator("getFancySubtype")
   String getSubType() {
     return "ORDER";
   }
   
   private String getFancySubtype(final String input) {
       return doSomethingToMy(input);
   }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miguelcss In your example where you are not persisting the subtype name in the database, when a getItem is performed and a record is read from the database, how would the mapper determine the correct subtype to map the record into? Bear in mind all it would have at the point it must decide is a Map<String, AttributeValue>. I believe the only answer that makes sense would be that it would have to infer it based on the value of the composite key, but the rub is how could we let the mapper know how to make that determination by itself since it can't instantiate any structured objects without knowing the correct subtype first.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, got it, thanks for detailing! Then I think the annotation for a "typeGenerator" or "typeParser" is good, in the sense that it makes it easier to specify a mandatory method for doing that computation in the annotation and validate if such method is indeed provided at compile time. Tbh I think lambda is also good, just wondering about how to make it "easy" for users to understand they need to , and where to, provide it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miguelcss It would have to be a static method or a lambda function as there is no way to instantiate the root class (as it may be abstract) just for the purposes of subtype determination. Optimistically, we could assume that the determination could be made from a single string attribute. I've tried very hard up to this point to shield users of the enhanced client from having to deal directly with AttributeValue maps, as really that's the whole point of this library, hence my rationale for starting with something simple that most users will intuitively get (because it basically looks like Jackson) with the possibility of extending it later for more advanced/customized cases.

The way it works right now if you implement getSubtype to do something fancy, then at least that computed value will be persisted in the DB so that it can be read and the correct subtype used on the reverse lookup. The difficulties are going to be for customers that already have data that they want to map to, they will have to backfill the subtype into their existing data-set. The other sharp edge is that because it's being persisted (and hence reverse-mapped) a setter is required, but in this case the setter has nothing to do other than verify that the subtype was correctly inferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it just occurred to me that a power user could write an extension and leverage the afterRead hook to compute and add the explicit subtype to the attributeValue map thus enabling the mapper to correctly determine the subtype. I'll have to think more on this and do a little POC just to double check that it really works.

@DynamoDbSubtypes({
@Subtype(name = "CUSTOMER", subtypeClass = Customer.class),
@Subtype(name = "ORDER", subtypeClass = Order.class)})
public class CustomerRelatedEntity {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java code would be way neater if this could be an interface and classes implementing it. Is that a feasible option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dvlato could you elaborate on exactly what you are proposing? Maybe with an example?

@siwonred
Copy link

siwonred commented Jan 22, 2022

Is there any update? I'm really looking forward to this PR :-)

@bmaizels
Copy link
Contributor Author

Sorry I've been away from this a while. One of the action items I took away from an internal review was to write an actual proof of concept test case that proves this is easy to use for single table design. As it turns out, there are some problems with it. The biggest problem is the current design forces you to map the subtype discriminator as a string attribute in your table which must be persisted in the database. I initially thought there was a workaround to this by using the afterRead and beforeWrite hooks in the extension framework to fudge it, but when I tried to actually build a POC around it, it turned out this simply isn't possible and even if it was it would not be meeting our usability goals. I had to go back to the drawing board a bit, and figure out a more flexible way of discriminating the entity and came up with a few proposals. I'd love to get feedback around which people prefer and find more intuitive.

Any of these options could be potentially combined, but increases the risk of usage confusion. Option 1b is the status quo.

Option 1a : Mapped discriminator type; discriminator values on subclass

@DynamoDbBean
@DynamoDbSupertype(subtypes = {Cat.class, Dog.class})
public class Animal {
  // ...
  
  @DynamoDbSubtypeDiscriminator
  public String getType() {
    // ...
  }
  
  public void setType(String type) {}
}

@DynamoDbBean
@DynamoDbSubtype("CAT")
class Cat {
  // ...
}

Option 1b : Mapped discriminator type; discriminator values on superclass

@DynamoDbBean
@DynamoDbSupertype({
   @Subtype(discriminatorValue = "CAT", subtypeClass = Cat.class),
   @Subtype(discriminatorValue = "DOG", subtypeClass = Dog.class) })
public class Animal {
  // ...
  
  @DynamoDbSubtypeDiscriminator
  public String getType() {
    // ...
  }
  
  public void setType(String type) {}
}

@DynamoDbBean
class Cat {
  // ...
}

Option 2a : Unmapped discriminator attribute; discriminator values on subclass

@DynamoDbBean
@DynamoDbSupertype(subtypes = {Cat.class, Dog.class},
                   discriminatorAttribute = "type")
public class Animal {
  // ...
}

@DynamoDbBean
@DynamoDbSubtype("CAT")
class Cat {
  // ...
}

Option 2b : Unmapped discriminator attribute; discriminator values on superclass

@DynamoDbBean
@DynamoDbSupertype(
   subtypes = {
     @Subtype(discriminatorValue = "CAT", subtypeClass = Cat.class),
     @Subtype(discriminatorValue = "DOG", subtypeClass = Dog.class)},
   discriminatorAttribute = "type")
public class Animal {
  // ...
}

@DynamoDbBean
class Cat {
  // ...
}

Option 3a : Discriminator function; discriminator values on subclass

@DynamoDbBean
@DynamoDbSupertype(subtypes = {Cat.class, Dog.class},
                   discriminatedBy = AnimalDiscriminator.class)
public class Animal {
  // ...
}

public class AnimalDiscriminator implements DynamoDbEntityDiscriminator {
   @Override
   public String computeDiscriminatorValue(Map<String, AttributeValue> entity) {
     // ...
   }
}

@DynamoDbBean
@DynamoDbSubtype("CAT")
class Cat {
  // ...
}

Option 3b : Discriminator function; discriminator values on superclass

@DynamoDbBean
@DynamoDbSupertype(
   subtypes = {
     @Subtype(discriminatorValue = "CAT", subtypeClass = Cat.class),
     @Subtype(discriminatorValue = "DOG", subtypeClass = Dog.class)},
   discriminatedBy = AnimalDiscriminator.class)
public class Animal {
  // ...
}

public class AnimalDiscriminator implements DynamoDbEntityDiscriminator {
   @Override
   public String computeDiscriminatorValue(Map<String, AttributeValue> entity) {
     // ...
   }
}

@DynamoDbBean
class Cat {
  // ...
}

@JudahBrick
Copy link

I like option 1b

@DavidSeptimus-Klotho
Copy link

DavidSeptimus-Klotho commented May 18, 2022

@bmaizels it sure has been a while. Thanks for providing an update.

With respect to your proposals, I'm generally in favor of the 'b' variant -- keeping the implementation pattern in alignment with the pattern used by Jackson's polymorphism implementation. In addition, by having all of the polymorphism configuration applied to the superclass, we're making it clear that the polymorphism configuration belongs to the superclass alone. It makes sense that if a consumer instantiates a separate TableSchema/Table using the subclass, none of that polymorphism configuration is applicable.

As for 1, 2, or 3:

  • 1 is (in my opinion) the least optimal for a couple of key reasons:
    1. It relies on the consumer to correctly implement the type property in accordance with the intent of the library writer.
    2. Regardless of the correctness of type's implementation, there's a risk that any consumers of the polymorphic entity will mistake type for a typical mutable bean property and treat it as such in their code.
  • 2 would be a solid default option that requires minimal effort for consumers to implement and keeps the discriminator attribute out of the entity class by default.
  • 3 makes sense as an advanced option for anyone wanting to implement their own discriminator (e.g. extracting the type from a composite attribute as mentioned by @miguelcss above).

My vote would be for 2b + 3b and have the initialization code throw an exception if both discriminatorAttribute and discriminatedBy are defined in a single @DynamoDbSupertype declaration.

@bmaizels
Copy link
Contributor Author

@DavidSeptimus I'm sold on 2 and 3 being generally better and more useful than 1, and I also agree about variant 'b' being conceptually more sound for the reasons you articulated very clearly. There's a complication with using a named attribute (option 2), and that is when converting from object to map there is no longer an explicit way to determine the entity type other than through class introspection. With option 1 this was not a problem because the entity type was mapped on the object and the mapper knew how to read it. There's an additional risk here that the actual class of the object is itself a child of the class that is a declared subtype, in which case I'll have to make sure the class chain is traversed until a match is found. No big deal really just a 'gotcha' I'll have to write behavior/tests for. I think it's time for me to get coding again and go for options 2b and 3b combined. Watch this space and thanks for your patience. By the way if anyone is frustrated by this and needs a good workaround I was linked a good blog that explains the workaround I would recommend: https://davidagood.com/dynamodb-enhanced-client-java-heterogeneous-item-collections/

@kaldav
Copy link

kaldav commented Jun 19, 2023

Is there any update? I'm really looking forward to this PR too!

aws-sdk-java-automation pushed a commit that referenced this pull request Jan 16, 2024
…ease

Activating SRA auth for services with specific signer behaviors, and …
@mpordomingo
Copy link

mpordomingo commented Jan 22, 2024

Hi all, are there any updates on this feature? I'm keen on using it too!

@chrylis
Copy link

chrylis commented Jun 19, 2024

I'm implementing a feature (activity notifications, where there are mention/reaction/reply records but shared base properties) that needs exactly this data model, so I'm bumping this ticket.

@abdulbasit1149
Copy link

Any update on this?

@DevCam
Copy link

DevCam commented Feb 12, 2025

Hi, any plans to merge this feature? It would be super useful !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DynamoDB Enhanced Client Polymorphic Types