
@RandomInsano
Contributor

This PR changes cargo loco seed --dump to stream data to files instead of building one large string before writing to disk. It uses a buffered writer to reduce the overhead of many small writes. I will add the same for seeding, but that will require a bit more work and I wanted to get opinions on this first.
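For anyone skimming, here is a minimal sketch of the streaming idea using only the standard library. It is not the code in this PR: `dump_rows` and the `rows` iterator are hypothetical stand-ins for whatever yields serialized records, and the real change works against Loco's own types.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes each row to `path` as it arrives rather than collecting
/// everything into one large `String` first.
/// `rows` is a hypothetical source of already-serialized records.
fn dump_rows<I>(path: &Path, rows: I) -> io::Result<usize>
where
    I: IntoIterator<Item = String>,
{
    let file = File::create(path)?;
    // BufWriter batches many small writes into fewer, larger ones.
    let mut writer = BufWriter::new(file);
    let mut count = 0;
    for row in rows {
        writer.write_all(row.as_bytes())?;
        writer.write_all(b"\n")?;
        count += 1;
    }
    // Flush explicitly so write errors surface here instead of being
    // silently dropped when the writer goes out of scope.
    writer.flush()?;
    Ok(count)
}
```

With this shape, memory use on the writing side stays bounded by the buffer size and the largest single row, provided the iterator itself is lazy.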

One problem is that I couldn't find a reasonable way to access sqlx's streaming for custom queries in SeaORM, so in my case Loco is still allocating 6GB of RAM for 3 million rows. Is it alright for me to access sqlx directly here?

Sorry for the formatting shift here. Let me know if you want me to revert it.

@RandomInsano
Contributor Author

Did some experimenting here. Consider this an RFC: I've fixed my own problem and can polish this up for upstreaming.

Because serde_yaml::Serializer didn't support streaming inside an async context, I switched the serialization to serde_json. The YAML serializer failed due to an internal `'static` lifetime, and serde_yaml is also deprecated. Frankly, I prefer YAML for this use case myself.

The JSON output is not a true JSON array, just a series of JSON objects written one after another. I've confirmed jq is happy with this, and it allows us to use serde_json::Deserializer::from_reader() to read the data back in object by object. I'm not sure if it's safe or possible to wrap this in a database transaction, but I'm happy to do that if the import should be atomic (say, if a dump file is corrupted).

I raised a dump() function into the Hooks trait so that generics could be used when serializing. What I don't like at the moment is that the implementor of that function needs to do the table filtering, but I don't have a good solution for promoting it to run_app_dump()... Some advice there would be helpful, but I can mull it over if the work I've done here is worthwhile.
