Skip to content

Testing utilities for downstream code #28

@jl-wynen

Description

@jl-wynen

As it stands, code built on top of pyscicat can only be tested either against an instance of scicat or by mocking requests. The former is too slow for development. The latter means that you lose the abstraction the client provides because users need to know exactly which requests are made.

For example, let's say we have some code that creates a dataset and datablock. But we don't know (or care) exactly how, in what order, or if it does anything else to scicat. We only care about the fact that at the end, there is a dataset and a datablock in scicat. Currently, we have to test it like this:

client = ScicatClient(...)
dataset = ...
datablock = ....
with requests_mock.Mocker() as mock_request:
    set_up_mock(mock_requests)
    a_function_that_uploads_a_dataset(client, dataset, datablock)
   # assert that mock_request.request_history contains the correct requests

A much better solution would be

client = FakeClient()
dataset = ...
datablock = ....
a_function_that_uploads_a_dataset(client, dataset, datablock)
assert dataset in client.__fake_datasets__
assert datablok in client.__fake_datablocks__

This is better because:

  • it requires no knowledge of requests and the API
  • it does not require parsing of URLs and request parameters
  • it still works if the order of operations in a_function_that_uploads_a_dataset changes or it uses different endpoints for some reason

With the changes in #27, FakeClient can be implemented along these lines:

class FakeClient(ScicatClient):
    def __init__(self, datasets):
        super().__init__(base_url='https://does-not-exist', token='123')
        self._datasets = datasets

    @property
    def __fake_datasets__(self):
        return self._datasets

    def _call_endpoint(
            self,
            cmd: str,
            endpoint: str,
            data: BaseModel = None,
            operation: str = "",
            allow_404=False,
        ) -> Optional[dict]:
        if cmd == 'get':
            base, pid = endpoint.split('/')
            if base == 'Datasets':
                return self.__fake_datasets__[pid].dict(exclude_none=True)
            raise NotImplementedError(f'GET with endpoint {endpoint} not supported')
        else:
            raise NotImplementedError(f'Command not supported: {cmd}')

It is not strictly necessary to support all endpoints in the fake. It would already be useful if it only supports the most common ones.

Downsides:

  • increased maintainance burden on pyscicat. The fake needs to be kept up to date as well as the actual client
  • the fake needs to be tested which might be tricky

I think that we need something like this in our code. Otherwise, we cannot really test our usage of pyscicat. It is an option that we develop it entirely on our side. But it would be beneficial to the larger community if the fake was provided by pyscicat itself.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions