This repository was archived by the owner on Jul 6, 2020. It is now read-only.

Offline and persistent cache support #61

@JoviDeCroock


Offline

We all think about this in the modern PWA era, but there's a lot to it. We'll have to keep track of which requests the user still needs to send when the connection is restored; after those requests are sent there will most likely be several optimistic entries to clear.

Operations

To know which operations to cache, it should be sufficient to cache only mutation operations. These will be kept in a map<key, operation> and persisted to indexedDB/localStorage when the application is killed while they haven't been dispatched yet.

The hard part is that we would have to restore the optimisticKeys in the exchange, which makes me think about moving these to our instance of the store instead, since the serialisation of entities, links and optimisticKeys could then happen in one place. This brings the additional advantage that it can be done with a single restore method.

One concern is the read/write speed of killing/rebooting the cache in this state. The HAMT structure is quite hard to serialise, taking into account that it will contain optimistic values mixed with normal ones.
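A minimal sketch of the map<key, operation> idea, assuming a storage object with localStorage's getItem/setItem surface (the queue name and shape are my own, not the library's API):

```javascript
// Hypothetical persisted mutation queue; in the browser `storage` would be
// window.localStorage, here any object with getItem/setItem works.
const KEY = 'offline-mutations';

function createMutationQueue(storage) {
  // Map<operationKey, operation> — restore any operations that were still
  // pending when the app was last killed.
  const pending = new Map(JSON.parse(storage.getItem(KEY) || '[]'));

  return {
    enqueue(operation) {
      pending.set(operation.key, operation);
      storage.setItem(KEY, JSON.stringify([...pending]));
    },
    // Called once the request actually reached the server, so the
    // corresponding optimistic entry can also be cleared.
    dispatched(key) {
      pending.delete(key);
      storage.setItem(KEY, JSON.stringify([...pending]));
    },
    size: () => pending.size,
  };
}
```

Keying by the operation key means a retried mutation overwrites its earlier queue entry instead of being sent twice.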

Connection checking

This should be easily doable by means of navigator.onLine: we could buffer all requests until we come back online and then send them one by one in the correct order to avoid concurrency problems. The difficult part is that we buffer until all operations are dispatched, which means that if the user performs another action while we are emptying the queue it could take a while to get a response (though we would be using optimisticResponses).

Ideally, when we see we are offline we filter all queries and just keep them incomplete. When we see we are going offline, all subscriptions should receive an active teardown.
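The buffering described above could look roughly like this; sendOperation is a stand-in for forwarding into the exchange pipeline, and setOnline would be driven by navigator.onLine plus the window online/offline events (none of these names are the library's API):

```javascript
// Buffer operations while offline; flush them sequentially, preserving
// order, once connectivity returns.
function createOfflineBuffer(sendOperation) {
  const buffer = [];
  let online = true; // would be seeded from navigator.onLine

  return {
    setOnline(next) {
      online = next;
      // Flush one by one in insertion order to avoid concurrency problems.
      while (online && buffer.length) sendOperation(buffer.shift());
    },
    send(operation) {
      if (online) sendOperation(operation);
      else buffer.push(operation);
    },
  };
}
```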

Exchange

When reasoning about this, my thoughts always wander to a separate exchange to manage the operation buffering, while incorporating the restoring/serialising inside graphCache. There's a bit of overlap, but I think there's sufficient reason to keep them separate.

Persistence

Here I'm having trouble seeing how we could solve this effectively. We have the schema now, so we could potentially iterate over the whole schema and write it out that way, but that won't cover the case where people want a persisted cache without the whole schema effort.

What scares me most is that localStorage isn't the ideal candidate for a persisted cache, but by using indexedDB we exclude about 5% of the browser population.
IndexedDB seems to ask for permission on Firefox when a blob is >50MB; apart from that there are no explicit size limitations, even for a single data field.

The max size for localStorage is 10MB, so I don't think this is sufficient for big applications, since the initial cost of the data structure is also there. We could strip everything down, but how do we rebuild it then, maybe by bucket size?

This is a brain dump of what I've been thinking about and is by no means a final solution, but I think it could serve as an entry point to finding the solution to what feels like a really awesome feature.

Other relevant solution: https://github.com/redux-offline/redux-offline/tree/v1.1.0#persistence-is-key

This uses redux-persist, which in turn relies on indexedDB under the hood. Since this is a reliable and widespread solution, I think it's safe to resort to indexedDB and fall back to localStorage when needed.
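That fallback chain could be sketched as follows, where env stands in for the global object (window in the browser) and the returned tag would pick the concrete adapter:

```javascript
// Prefer indexedDB, fall back to localStorage, and as a last resort keep
// the cache in memory (nothing survives a reload in that case).
function selectBackend(env) {
  if (typeof env.indexedDB !== 'undefined') return 'indexedDB';
  if (typeof env.localStorage !== 'undefined') return 'localStorage';
  return 'memory';
}
```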

For react-native we can easily resort to the AsyncStorage module. AsyncStorage isn't 100% safe either, though: on Android it errors out when a write exceeds 6MB.

Introducing some way of leaving certain fields/queries out seems mandatory to me, since the test described below shows that we hit the limits of localStorage pretty quickly.

Test

I did a small test with our current benchmarking setup where I serialised 50k entities and wrote them to a JSON file to look at the size:

ENTITIES 14260659B 14.260659MB
Links 664618B 0.664618MB

This already exceeds the limits of localStorage, and saving this amount of data would cause indexedDB to prompt for permission.

Code used:

// `Store` and `write` come from graphCache; the *Query documents and the
// tenThousand* fixtures come from our benchmarking setup.
const fs = require('fs');

const urqlStore = new Store();
write(urqlStore, { query: BooksQuery }, { books: tenThousandBooks });
write(
  urqlStore,
  { query: EmployeesQuery },
  { employees: tenThousandEmployees }
);
write(urqlStore, { query: StoresQuery }, { stores: tenThousandStores });
write(urqlStore, { query: WritersQuery }, { writers: tenThousandWriters });
write(urqlStore, { query: TodosQuery }, { todos: tenThousandEntries });

const entities = JSON.stringify(urqlStore.records);
const links = JSON.stringify(urqlStore.links);

fs.writeFileSync('./entities.json', entities);
fs.writeFileSync('./links.json', links);

const { size: entityFileSize } = fs.statSync('./entities.json');
const { size: linkFileSize } = fs.statSync('./links.json');
console.log('ENTITIES', entityFileSize, entityFileSize / 1000000.0);
console.log('Links', linkFileSize, linkFileSize / 1000000.0);

Wild thoughts

I've been thinking about maybe making a distinction between a storage.native and a storage file. This way we could leverage web workers and the application cache to write our results at runtime instead of only when we close the application.

Requirements

To implement persistent data we would have to implement an adapter with an API surface for getting, setting, and deleting. People can then pass in any storage they'd like; this way people who use something like PouchDB can write an adapter and just use that.
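A sketch of that adapter surface (the method names read/write/delete are my assumption, not a decided API); keeping every method async means indexedDB and AsyncStorage backends fit the same interface as a synchronous localStorage wrapper:

```javascript
// In-memory reference adapter: anything implementing these three async
// methods could back the persisted cache, including a PouchDB wrapper.
function createMemoryAdapter() {
  const data = new Map();
  return {
    read: async (key) => data.get(key),
    write: async (key, value) => void data.set(key, value),
    delete: async (key) => void data.delete(key),
  };
}
```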

We should decide on when to write. After every query? That would mean we also have to write after every optimistic write, which makes everything a tad harder, certainly since it's going to be hard to incrementally write changes from our HAMT structure. I think it's better to work with a hydrate-and-exit approach: it could make writes take more time, but in the end it would require a whole lot less logic.
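The hydrate-and-exit flow could be sketched like this, assuming a store exposing restore/serialize methods and an onExit hook that would be wired to pagehide/beforeunload in the browser (all of these names are illustrative):

```javascript
// Restore the whole snapshot once on boot; serialise the whole store once
// on shutdown instead of writing incrementally per operation.
function attachPersistence(store, storage, onExit) {
  const snapshot = storage.getItem('graphcache');
  if (snapshot) store.restore(JSON.parse(snapshot));
  onExit(() => storage.setItem('graphcache', JSON.stringify(store.serialize())));
}
```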

We would need an approach that can exempt certain portions of the state from being cached, for example an include/exclude pattern. When we include something, that will be the only thing being cached; when we exclude something, everything but the excluded part will be cached. These should be mutually exclusive.

When not supplied with a schema, how would we arrange for excluding data?
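One schema-free take on the include/exclude idea is to filter on cache keys instead of schema types (a sketch under that assumption; names are illustrative):

```javascript
// Decide whether a cache key should be persisted. `include` and `exclude`
// are mutually exclusive, matching the rule described above.
function shouldPersist(key, { include, exclude } = {}) {
  if (include && exclude) {
    throw new Error('include and exclude are mutually exclusive');
  }
  if (include) return include.includes(key);
  if (exclude) return !exclude.includes(key);
  return true; // no filter given: persist everything
}
```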

I drew up a diagram of how I expect this to happen; the code for the offline part was easy to write and is done.

[Screenshot: diagram of the proposed flow, 2019-09-05]
