DEV - Create Huggingface repositories to sort annual data into quarters #1923

@ryanfchase

Description

Dependency

  • We need to see that there is a demand for Huggingface as a backup to the Socrata API before we proceed
    • every time we add sub-categorizations of data, we need to write logic to "bridge" the datasets
    • this adds complexity that I think we should avoid unless we find that Socrata API is just not reliable
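To illustrate what the "bridge" logic might involve, here is a minimal Python sketch (standard library only). It assumes a user query spans an arbitrary date range and that we need to know which quarterly datasets that range touches. The function names `quarter_of` and `quarters_in_range` are hypothetical and do not exist in the codebase.

```python
from datetime import date


def quarter_of(d: date) -> tuple[int, int]:
    """Return (year, quarter) for a date, with quarter in 1..4."""
    return d.year, (d.month - 1) // 3 + 1


def quarters_in_range(start: date, end: date) -> list[tuple[int, int]]:
    """List every (year, quarter) that overlaps the inclusive range start..end.

    Hypothetical helper: each entry would correspond to one quarterly
    dataset that has to be loaded and stitched together.
    """
    if start > end:
        raise ValueError("start must not be after end")
    year, quarter = quarter_of(start)
    last = quarter_of(end)
    result = []
    while (year, quarter) <= last:
        result.append((year, quarter))
        quarter += 1
        if quarter > 4:  # roll over into the next year
            year, quarter = year + 1, 1
    return result
```

A range from mid-March to early July 2024 would touch three quarterly datasets under this scheme, which is the extra loading and merging work the bullets above refer to.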

Overview

We need to create quarterly Huggingface repositories so that we can load smaller chunks of data when using Huggingface, which will reduce load times and lead to a better user experience.

More Information

We will start with quarters for 2024, simply because each repository will require extra work to ensure our cron job is updating each repo correctly. Once this work is complete, the same approach can be applied to 2023 and prior years. While this ticket is not blocked by the ticket to create the Huggingface repo for 2025 (#1895), it is closely related.

Action Items

  • PM or dev lead: create the following Huggingface repositories
  • modify hfClean() to additionally create quarterly datasets locally. Consider using a naming convention similar to the one we use in Huggingface
  • modify hfUpload() to also upload those files to the quarterly data repos
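As a rough sketch of the splitting step that hfClean() would need, the following Python groups an annual set of records into quarterly buckets. This is not the existing hfClean() implementation. The `CreatedDate` field name, the ISO 8601 timestamp format, and the `2024-Q1` style labels are all assumptions for illustration, and the real naming convention should follow whatever is chosen for the Huggingface repositories.

```python
from collections import defaultdict
from datetime import datetime


def quarter_label(timestamp: str) -> str:
    """Map an ISO 8601 timestamp to a label such as '2024-Q1' (assumed format)."""
    dt = datetime.fromisoformat(timestamp)
    return f"{dt.year}-Q{(dt.month - 1) // 3 + 1}"


def split_by_quarter(rows, date_field="CreatedDate"):
    """Group row dicts by the quarter of their date field.

    `date_field` is a hypothetical column name; substitute the actual one.
    Returns a dict mapping quarter label -> list of rows.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[quarter_label(row[date_field])].append(row)
    return dict(buckets)
```

Each bucket could then be written to its own local file and passed to hfUpload() for the matching quarterly repo.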

Resources/Instructions

Related Tickets

Other Resources

Metadata

Assignees

No one assigned

    Labels

      • Complexity: Medium (require research/investigation before completing; internal team info/input or external team question)
      • Dependency (An issue that includes dependencies)
      • Role: Data Science (Data management, loading, or analysis)
      • p-feature: data (info available to users i.e. NC boundaries/names, SR info/data, etc (user friendly map info/data))
      • size: 3pt (Can be done in 13-18 hours)

    Type

    No type

    Projects

    Status

    Icebox (on hold)

    Relationships

    None yet

    Development

    No branches or pull requests