spike (protocol-api): database and infra upgrades for scaling tweet index volume #321

@teslashibe

Description

Problem:

We are seeing a large volume of tweets flowing into the indexing API and have tens of millions of tweets in the PostgreSQL database.

Acceptance criteria & questions arising:

  • Autoscale the containers that run the app that writes to the db
  • There is no alerting on the db service - adding it would help track its status
  • Monitor and track the db - there is currently no way to check what is happening with vertical scaling of the db
  • IO and network load are increasing as the volume of tweets scales exponentially
  • How big is this going to get, and where do we end up?
  • Do we save the tweet data to flat files in S3? From there it can be loaded into any DB
  • Architecture diagram - feel free to talk to Ettore on this
  • Whitelist Validator and internal IP addresses for access so the API is secure
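
To make the whitelist item concrete, here is a minimal sketch of an IP allowlist check, assuming the API can see the client's source address. The ranges are placeholders, not the real Validator or internal addresses, and `is_allowed` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical allowlist: placeholder ranges only, not the project's real addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # assumed internal range
    ipaddress.ip_network("203.0.113.7/32"),  # assumed Validator address (documentation range)
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip parses and falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected.
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)
```

The same rule could instead live at the network layer (for example in security group or firewall rules), so that disallowed traffic never reaches the app.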

The outcome of this ticket is one or more tickets that define a stable system that scales to billions of tweets.

Added a comment below proposing a path forward that enables us to capture, index, and archive the scale of tweets we envision (billions) in an efficient manner.
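
As a rough illustration of the flat-files-in-S3 question above (not the proposal from the comment), tweets could be batched into gzipped JSON Lines objects under time-partitioned keys. The key layout, the `tweets` prefix, and the function names are assumptions made for this sketch. The upload itself is left out; it would be a single put of `encode_batch(...)` under `archive_key(...)`.

```python
import gzip
import json
from datetime import datetime, timezone


def archive_key(batch_time: datetime, batch_id: str, prefix: str = "tweets") -> str:
    """Build a time-partitioned object key (hypothetical layout).

    batch_time must be timezone-aware; it is normalized to UTC.
    """
    t = batch_time.astimezone(timezone.utc)
    return (
        f"{prefix}/year={t:%Y}/month={t:%m}/day={t:%d}/hour={t:%H}/"
        f"{batch_id}.jsonl.gz"
    )


def encode_batch(tweets: list[dict]) -> bytes:
    """Serialize tweets as gzipped JSON Lines, one tweet per line."""
    lines = "".join(json.dumps(t, sort_keys=True) + "\n" for t in tweets)
    return gzip.compress(lines.encode("utf-8"))


def decode_batch(blob: bytes) -> list[dict]:
    """Inverse of encode_batch, e.g. for loading an archive into a DB."""
    text = gzip.decompress(blob).decode("utf-8")
    # json.dumps escapes newlines inside strings, so "\n" only separates records.
    return [json.loads(line) for line in text.split("\n") if line]
```

Partitioning keys by time keeps each object small and lets a loader pull only the range it needs when backfilling another database.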
