Skip to content

Conversation

@oguzhanunlu
Copy link
Member

@oguzhanunlu oguzhanunlu commented Apr 15, 2025

This PR adds vendorAsSnakeCase and nameAsSnakeCase methods to SchemaKey exactly as scala-analytics-sdk does so that we don't need to copy-paste it across projects.

Note that rdb loader uses a different regex for snake case conversion. All other loaders can reuse these new methods without worry as they use analytics sdk currently.

Copy link
Contributor

@istreeter istreeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @oguzhanunlu I have two separate criticisms with how this has been done:

1. How it fits into the public API of this library

There is a very specific reason you are making a change to iglu-scala-core: This is about snake casing vendor name and schema name for the purpose of warehouse column names. Iglu-core is not the place to put generic string methods for generic purposes.

In my opinion.... the public API of this library should make it more clear how/when this function should be used. For example, you could add new methods on the SchemaKey class:

  def vendorAsSnakeCase: String = ???
  def nameAsSnakeCase: String = ???

(you might think of better names, but you see what I mean)

2. Where you copied the code from

As already discussed elsewhere, we have two implementations of how we snake-casify vendor/name for the warehouse:

  1. RDB Loader uses its own implementation
  2. All other loaders use the common implementation from scala-analytics-sdk

Here, you have copied the RDB Loader implementation. Which seems like the wrong choice, because it means no other loader can ever use this shared implementation you're adding to iglu-scala-core. It would be a breaking change to a loader, if we change how columns are named.

There are edge-cases where the two implementations given different results. Can you think of some examples of vendor/schema names for which choice of implementation is important?

@oguzhanunlu oguzhanunlu changed the title Add StringUtils Add vendorAsSnakeCase and nameAsSnakeCase Apr 17, 2025
@oguzhanunlu
Copy link
Member Author

Thanks for the detailed comment @istreeter ! I agree and pushed a new commit to address your feedback and updated PR description.

For the question, rdb loader uses same regex for both vendor and name but analytics sdk doesn't. e.g. if vendor is comAcme, rdb loader will convert that to com_acme but other loaders will convert that to comacme.

@oguzhanunlu oguzhanunlu requested a review from istreeter April 17, 2025 08:33
@istreeter
Copy link
Contributor

In your opinion, does this need unit tests?

@oguzhanunlu
Copy link
Member Author

I think it does because a future change to regex can break all loaders (probably can't due to loaders' test suites but you see my point). I had the same thinking for other conversion methods (toSchemaUri, toPath etc.) as well but seeing that they are not tested made me suspicious regarding our approach (maybe we thought downstream projects' tests are enough for them?)

@istreeter
Copy link
Contributor

I had the same thinking for other conversion methods (toSchemaUri, toPath etc.) as well but seeing that they are not tested....

Actually toSchemaUri is tested! Try changing the implementation to something different, and you will see that tests fail.

(I admit that toPath does not appear to have testing).

This PR is all about moving this code from analytics sdk upstream into a different library. Again, the corresponding code in analytics-sdk is tested. Try changing the implementation in analytics-sdk and you will see that tests fail.

In general, if we add code, we make sure there is unit test coverage.

@oguzhanunlu
Copy link
Member Author

hey @istreeter , I just added 2 unit tests covering all changes applied by regexes for vendor and name

@oguzhanunlu oguzhanunlu merged commit 3fe9acc into develop Apr 17, 2025
1 check passed
@oguzhanunlu oguzhanunlu deleted the snake-case branch April 17, 2025 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants