Skip to content

Schema type checker #335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

abigailalice
Copy link
Contributor

@abigailalice abigailalice commented Aug 2, 2024

This should close #274 and #186, and potentially #291. The semantics are primarily based on @KaneTW's gist in #274, except with two improvements:

  1. It handles array types
  2. It returns an error if two columns in a TableSchema will have identical names, instead of passing silently

It also displays information in a slightly different way, trying to show as many errors as it can rather than bailing on the first. It also adds a number of test cases to ensure it behaves correctly for the primitive types, higher kinded types, and a few tests to ensure it fails in cases where it should fail. I'm mostly putting this forward to see what other people think, as I'm not sure if it should have other changes before being accepted. Some of my (admittedly bikesheddy) thoughts on the current state:

  1. I wasn't able to re-export it from Rel8 without creating an import cycle, so it's only exported via its own module being public. For some reason the module wouldn't compile unless it imported Rel8, getting a type error that Columns wasn't injective otherwise. I'm assuming the issue is that something not being in scope is effecting type checking, but I've never actually seen an error like that, so really aren't sure. I'm also not really sure if Rel8.Table.Verify is the best name for the module, I just picked Table pretty arbitrarily.
  2. I reused tests/Rel8/Generic/Rel8able/Test.hs quite a bit to add some things necessary for my own test cases, but I'm not really sure if that's an appropriate place for them. I could move them if that's preferred.
  3. The current work won't show KaneTW's original work, since I just copy-pasted it. Not sure if there's a proper way to base work off a gist, or if it's even an issue, but I thought it should still be said
  4. I kind of want to have a version of the type checking function which also reads the data from the table after checking the schema, as having a valid schema isn't enough to ensure existing data can be read by rel8, which can be important when doing migrations. However, this isn't possible just using the Statement type. I could just add a version using Session which does this, but for people using hasql-transaction they would still need to rewrite that function manually. This isn't a terribly difficult thing for users of the library to do themselves, since anyone using it is already going to need to convert from Statement to Transaction, and can just read from the table after type checking, but it just feels like an oversite to not have this. For the moment the issue is explained in the documentation for the function.

It should also be said that #59 is kind of exploring similar ideas, and @KaneTW also was exploring some other ideas for interfaces in other changes to the original gist.

@KaneTW
Copy link

KaneTW commented Aug 2, 2024

There's a way to tag people in the commit text:

Co-authored-by: NAME <[email protected]>

For me it would be:

Co-authored-by: David Kraeutmann <[email protected]>

@abigailalice
Copy link
Contributor Author

Oh, I think Fixed n and Char also have a bug, due to not handling modifiers correctly. I'll work on that, as well as add the co-author disclaimer.

The failed build seems to just be from an unrelated test which happens to fail non-determinstically, using too precise rational comparison. I could fix that too, though I'm not sure if the 1e-15 is supposed to be correct, or just picked arbitrarily. I've noticed a few non-deterministic tests while working on this.

@abigailalice
Copy link
Contributor Author

abigailalice commented Aug 3, 2024

Okay, I fixed the bugs for those corner cases, added some test cases to cover them, and added proper attribution. My only major concern now is that I needed to force push to correct the first commit message, and that may have affected other commits in the middle. I'm newish to git in groups, so let me know if that's an issue, and I can just make another branch without the issue, and issue a new pull request. It does say I want to merge 9 commits in, and that is certainly not the case. Otherwise, unless any of the bikeshedding is cause for concern, I'm happy with the current state of things.

I should say I've only tested this superficially on a live project; it was able to detect some bugs in my code in some rarely inserted into tables (due to fields which had been removed in Haskell, but not in the database). But otherwise most of the testing is from the test cases themselves. Primarily they work by taking a schema, using showCreateTable to generate the table, ensuring the typechecker doesn't detect any errors, then inserting some random data into it, but there are a few tests which also generate bad tables to ensure the type checker does fail when it should as well.

@ocharles
Copy link
Contributor

ocharles commented Aug 5, 2024

Thanks for all your hard work here @abigailalice and @KaneTW! There's quite a lot here so it might take me a while to get through this as I'm really busy at work at the moment, but I will try and get to this 🙏

@abigailalice
Copy link
Contributor Author

Take your time, just remember that the middle commits aren't anything related to the actual PR, only the first and last are.

@ocharles
Copy link
Contributor

Just remember that the middle commits aren't anything related to the actual PR, only the first and last are.

This makes it very difficult to review. Could you rebase your branch so the changes are on master? Don't worry too much about commit attribution - we squash merge here anyway, so we can sort that separately.

This closes circuithub#274 and circuithub#186, allowing the generation of CREATE TABLE
statements from a TableSchema, as well as checking a TableSchema against
a database to determine if the tables are defined correctly in the
database to be read from/written to by rel8.

It also adds tests for the creation and type checking of tables, to
ensure they succeed/fail in appropriate cases.

Co-authored-by: David Kraeutmann <[email protected]>
@abigailalice
Copy link
Contributor Author

abigailalice commented Aug 12, 2024

Okay @ocharles, I think it should be at an acceptable state to review. Sorry about the difficulty. I should say the issues stated in (1), (2), and (4) in the original PR are still present, assuming anyone feels strongly about them.

@abigailalice
Copy link
Contributor Author

@ocharles, is there something preventing this from being accepted, or have you just not had time to get around to it? Not an issue if it's the latter, just wanted to make sure if I misunderstood anything.

@ocharles
Copy link
Contributor

@abigailalice purely time constraints I'm afraid! It's still on my radar. I'll be on holiday in a few days so this might need to wait until mid September

Copy link
Contributor

@TeofilC TeofilC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @abigailalice and apologies that this has been waiting for a review for so long!

This is very nice! I appreciate that you added some tests, and that the exposed functions have nice comments.

I've left a few suggestions, mostly little ideas for making things clearer.
If we resolve those I think this should be good to merge.



attrsToMap :: [Attribute Result] -> M.Map String (Attribute Result)
attrsToMap = foldMap (\attr -> M.singleton (T.unpack $ attr.attribute.attname) attr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
attrsToMap = foldMap (\attr -> M.singleton (T.unpack $ attr.attribute.attname) attr)
attrsToMap = M.fromList . map (\attr -> (T.unpack attr.attribute.name, attr))

Something like this is likely to have better performance and is a bit clearer (at least to me)

, typeName = info.typeName}])
where
go :: [(String, TypeInfo)] -> M.Map String (NonEmpty.NonEmpty TypeInfo)
go = M.fromListWith (<>) . (fmap . fmap) pure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
go = M.fromListWith (<>) . (fmap . fmap) pure
go = M.fromListWith (<>) . map (\(name, typeInfo) -> (name, NonEmpty.singleton typeInfo))

This is a bit more verbose but a bit clearer

Comment on lines +348 to +349
-- In the event multiple columns have the same name, this will fail silently. To
-- handle that case, see @'checkedShowCreateTable'@
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- In the event multiple columns have the same name, this will fail silently. To
-- handle that case, see @'checkedShowCreateTable'@
-- In the event that multiple columns have the same name, this will fail silently. To
-- handle that case, see 'checkedShowCreateTable'

-- table. This does not show relationships like primary or foreign keys, but can
-- still be useful to see what types rel8 will expect of the underlying database.
--
-- In the event multiple columns have the same name, this will return a map of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- In the event multiple columns have the same name, this will return a map of
-- In the event that multiple columns have the same name, this will return a map of

-- names to the labels identifying the column.
checkedShowCreateTable :: Rel8able k => TableSchema (k Name) -> Either (M.Map String (NonEmpty [String])) String
checkedShowCreateTable schema = case checkedSchemaToTypeMap schema.columns of
Left e -> Left $ (fmap . fmap) (\typ -> typ.label) e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Left e -> Left $ (fmap . fmap) (\typ -> typ.label) e
Left e -> Left $ map (\(name, typ) -> (name, typ.label)) e

Slightly clearer

--
-- However, it is possible for migrations to cause valid data to become invalid,
-- in ways not detectable by this function, if the migration code changes the
-- schema correctly but doesn't handle the value-level constraints correctly, so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- schema correctly but doesn't handle the value-level constraints correctly, so
-- schema correctly but doesn't handle the value-level constraints correctly. So

-- in ways not detectable by this function, if the migration code changes the
-- schema correctly but doesn't handle the value-level constraints correctly, so
-- it is a good idea to both read from the tables and check the schema for errors
-- in a transaction during the migration. This former will catch value-level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- in a transaction during the migration. This former will catch value-level
-- in a transaction during the migration. The former will catch value-level

Comment on lines +578 to +580
-- This is used by @'schemaErrors'@ to more conveniently group every table an
-- application relies on together, for typechecking the postgresql schemas
-- together in a single batch.
Copy link
Contributor

@TeofilC TeofilC May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- This is used by @'schemaErrors'@ to more conveniently group every table an
-- application relies on together, for typechecking the postgresql schemas
-- together in a single batch.
-- This is used by @'schemaErrors'@ to conveniently group every table an
-- application relies on for typechecking the postgresql schemas
-- together in a single batch.

-- |@'getSchemaErrors'@ checks whether the provided schemas have correct postgresql
-- column names and types to allow reading and writing from their equivalent Haskell
-- types, returning a list of errors if that is not the case. The function does not
-- crash on an encountered bug, instead leaving it to the caller to decide how
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- crash on an encountered bug, instead leaving it to the caller to decide how
-- crash on encountering a bug, instead leaving it to the caller to decide how

-- This function does nothing to check that the conflict target of an @Upsert@
-- are valid for the schema, nor can it prevent invalid uses of @unsafeDefault@.
-- However, it should be enough to catch the most likely errors.
getSchemaErrors :: [SomeTableSchema] -> HS.Statement () (Maybe Text)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want Text or String here. Some of the other functions return String but this returns Text. I don't think it makes much of a difference but I wanted to check whether you wanted it to be consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Generate tests that can check against an actual postgres schema
4 participants