Abstract datagouv interactions #464
That's amazing! The API keys could also be provided through environment variables. Otherwise I like your last proposal. Something like:

from datagouv import Client, Dataset, Resource
client = Client(api_key=DATAGOUV_SECRET_API_KEY)
my_dataset: Dataset = client.create_dataset(...)
my_dataset.update_metadata({...})
my_resource_1: Resource = my_dataset.create_static_resource(...)
my_resource_2 = client.create_static_resource(dataset_id=my_dataset.dataset_id, ...)
same_resource_1 = client.dataset(dataset_id=my_resource_1.dataset_id).resource(resource_id=my_resource_1.resource_id)

It makes sense to me. But: dataset = client.dataset.create(...) does not feel off to me. If |
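On the environment-variable suggestion, a minimal sketch of the fallback could look like this. The variable name `DATAGOUV_API_KEY`, the `resolve_api_key` helper and this `Client` stub are all hypothetical; nothing in the thread fixes them.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; the thread does not specify one.
ENV_VAR = "DATAGOUV_API_KEY"


def resolve_api_key(
    api_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else fall back to the environment."""
    env = os.environ if environ is None else environ
    key = api_key or env.get(ENV_VAR)
    if not key:
        raise ValueError(f"No API key: pass api_key=... or set {ENV_VAR}")
    return key


class Client:
    """Stand-in for the proposed client; only the key handling is sketched."""

    def __init__(self, api_key: Optional[str] = None) -> None:
        self.api_key = resolve_api_key(api_key)
```

An explicit `api_key` argument takes precedence here, so scripts can still override whatever is set in the environment.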
Thanks for the feedback 🙏 I don't know what I prefer between |
We can go with both syntaxes imo. Since it will be a static method, one of the two can just be a wrapper around the other, which makes it easier to maintain.
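To illustrate how both syntaxes could share a single implementation, here is a rough sketch. It uses stand-in classes and a fake id counter instead of real HTTP calls, and the names `DatasetAccessor`, `next_id` and `create_dataset` are assumptions. It also uses instance methods rather than static ones, since the client has to be passed along somehow.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Dict


@dataclass
class Dataset:
    id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class DatasetAccessor:
    """Holds the single real implementation of dataset creation."""

    def __init__(self, client: "Client") -> None:
        self._client = client

    def create(self, payload: Dict[str, Any]) -> Dataset:
        # A real version would POST the payload to the API here.
        return Dataset(id=self._client.next_id(), metadata=dict(payload))


class Client:
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key
        self._ids = count(1)

    def next_id(self) -> str:
        # Fake id generator standing in for the server's response.
        return f"dataset-{next(self._ids)}"

    def dataset(self) -> DatasetAccessor:
        return DatasetAccessor(self)

    def create_dataset(self, payload: Dict[str, Any]) -> Dataset:
        # Thin wrapper: delegates to the accessor so the logic lives in one place.
        return self.dataset().create(payload)
```

With this layout, `client.dataset().create({...})` and `client.create_dataset({...})` return the same kind of object, and only `DatasetAccessor.create` would need to change if the API call changes.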
Looking at the API https://guides.data.gouv.fr/guide-data.gouv.fr/readme-1/reference/datasets we could make only the |
This is what a workflow could look like with improvements made after your feedback 🙏 :

from datagouv import Client
client = Client(api_key=DATAGOUV_SECRET_API_KEY)
my_dataset = client.dataset().create({"title": "Brand new dataset"})  # this creates the dataset online, and returns an instance of Dataset
print(my_dataset.created_at)  # Datasets and Resources have some (the list can be refined) attributes set from their metadata
# let's populate our new dataset
# (files is assumed to be a list of dicts with "path", "name" and "title" keys)
for file in files:
    resource = client.resource().create_static(
        file_to_upload={"source_path": file["path"], "source_name": file["name"]},
        dataset_id=my_dataset.id,
        payload={"title": file["title"]},
    )
    # alternatively, it's possible to create a resource from the dataset itself, in which case you don't have to specify the dataset_id
    # resource = my_dataset.create_static(
    #     file_to_upload={"source_path": file["path"], "source_name": file["name"]},
    # )
    # both return an instance of Resource
    print(resource.url)  # url is a Resource-only attribute
# and we also have a documentation online
remote_resource = my_dataset.create_remote(
    payload={"url": "http://url/to/doc.pdf", "title": "Documentation", "type": "documentation"},
)
print(remote_resource)  # print displays all the Resource's attributes in a dict
my_dataset.update({"title": "The true title"})  # the dataset's title is modified online
print(my_dataset.title)  # and the new title is directly changed in the object

We handle community resources as well, and can update extras of objects. |
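One way the attribute behaviour in the workflow above could work, with attributes set from metadata and resynced after `update`, is sketched below. `FakeAPI` is a hypothetical in-memory stand-in for the data.gouv.fr API, and the attribute list and method names are assumptions rather than the PR's actual code.

```python
from typing import Any, Dict


class FakeAPI:
    """Hypothetical in-memory stand-in for the remote API."""

    def __init__(self) -> None:
        self.store: Dict[str, Dict[str, Any]] = {}

    def put(self, dataset_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Merge the payload into the stored metadata and return a copy,
        # mimicking a server that echoes the updated object.
        record = self.store.setdefault(dataset_id, {"id": dataset_id})
        record.update(payload)
        return dict(record)


class Dataset:
    # Attributes mirrored from metadata; the real list is still to be refined.
    _ATTRIBUTES = ("id", "title", "description")

    def __init__(self, api: FakeAPI, metadata: Dict[str, Any]) -> None:
        self._api = api
        self._refresh(metadata)

    def _refresh(self, metadata: Dict[str, Any]) -> None:
        # Copy each known field onto the object, defaulting to None.
        for name in self._ATTRIBUTES:
            setattr(self, name, metadata.get(name))

    def update(self, payload: Dict[str, Any]) -> None:
        # Send the change, then resync local attributes from the response.
        self._refresh(self._api.put(self.id, payload))

    def __repr__(self) -> str:
        return repr({name: getattr(self, name) for name in self._ATTRIBUTES})
```

Resyncing from the response, instead of copying the payload locally, keeps the object aligned with whatever the server actually stored.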
New syntaxes:
Ideally, I'd like the creation functions to be class or static methods, because it feels weird to have to instantiate a Dataset or Resource to be able to create one, but I have not managed to find a way to do that, as both DatasetCreator and ResourceCreator need to be given the instantiated client.

Another syntax could be: