Open
Labels
- Complexity: Medium (requires research/investigation before completing; internal team info/input or external team question)
- Dependency (an issue that includes dependencies)
- Role: Data Science (data management, loading, or analysis)
- p-feature: data (info available to users, i.e. NC boundaries/names, SR info/data, etc. (user-friendly map info/data))
- size: 3pt (can be done in 13-18 hours)
Milestone
Description
Dependency
- We need to confirm that there is demand for Huggingface as a backup to the Socrata API before we proceed.
- Every time we add a sub-categorization of the data (such as quarterly repos), we need to write logic to "bridge" the datasets.
- This adds complexity that I think we should avoid unless we find that the Socrata API is unreliable.
Overview
We need to create quarterly Huggingface repositories so that we can load smaller chunks of data when using Huggingface. This will reduce load times and lead to a better user experience.
More Information
We will start with the quarters of 2024, because each new repository requires extra work to make sure our cron job updates it correctly. Once this is done, the same approach can be applied to 2023 and earlier years. This ticket is not blocked by the ticket to create the Huggingface repo for 2025 (#1895), but the two are closely related.
Action Items
- PM or dev lead: create the following Huggingface repositories:
  - 2024Q1
  - 2024Q2
  - 2024Q3
  - 2024Q4
- Modify hfClean() to additionally create quarterly datasets locally. Consider using a naming convention similar to the repository names we use in Huggingface (e.g. 2024Q1).
- Modify hfUpload() to also upload those files to the quarterly data repos.
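A minimal sketch of the quarterly split that hfClean() could perform and the per-quarter upload that hfUpload() could add. This is not the project's existing code. It assumes the cron scripts are Python, the cleaned data is available as dict rows with an ISO-8601 timestamp column (named `createddate` here, which is hypothetical), and the quarterly repos are Huggingface dataset repos named like `2024Q1` under a hypothetical namespace. The helper names `quarter_label`, `split_by_quarter`, `write_quarterly_csvs`, and `upload_quarterly_csvs` are illustrative only.

```python
import csv
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def quarter_label(ts: datetime) -> str:
    """Return a label such as '2024Q1' for the calendar quarter containing ts."""
    return f"{ts.year}Q{(ts.month - 1) // 3 + 1}"


def split_by_quarter(rows, date_field="createddate", year=None):
    """Group dict rows into {quarter_label: [rows]}.

    date_field is a hypothetical column holding ISO-8601 timestamps;
    if year is given, rows from other years are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        ts = datetime.fromisoformat(row[date_field])
        if year is not None and ts.year != year:
            continue
        groups[quarter_label(ts)].append(row)
    return dict(groups)


def write_quarterly_csvs(rows, out_dir, date_field="createddate", year=None):
    """Write one CSV per quarter (e.g. out_dir/2024Q1.csv) and return the paths.

    Assumes all rows in a quarter share the same columns.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for label, group in sorted(split_by_quarter(rows, date_field, year).items()):
        path = out / f"{label}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        paths.append(path)
    return paths


def upload_quarterly_csvs(paths, namespace):
    """Upload each quarterly file to the dataset repo named after its quarter.

    Assumes repos such as f"{namespace}/2024Q1" already exist and that a
    Huggingface token is configured in the environment.
    """
    from huggingface_hub import HfApi  # imported lazily; only needed for uploads

    api = HfApi()
    for path in paths:
        api.upload_file(
            path_or_fileobj=str(path),
            path_in_repo=path.name,
            repo_id=f"{namespace}/{path.stem}",  # file 2024Q1.csv -> repo .../2024Q1
            repo_type="dataset",
        )
```

For example, `paths = write_quarterly_csvs(rows, "data/quarterly", year=2024)` followed by `upload_quarterly_csvs(paths, "your-org")` (both the directory and the namespace are placeholders). Deriving the repo id from the file name keeps the local naming convention and the Huggingface repo names in lockstep, which is the point of the naming suggestion above.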
Resources/Instructions
Related Tickets
Other Resources
Metadata
Projects
Status
Icebox (on hold)