
Releases: macrocosm-os/data-universe

Into S3 buckets

22 Apr 19:33
3dd11bf
  • Enabled uploads into S3 buckets.
  • We will start with dual uploads (HF + S3) so as not to disrupt the current products.
  • Once everything works correctly and securely, we will disable HF uploads.
  • If miners can upload the data with no issues, we will deprecate the HF rewards and enable S3 validation (without dehydration, so you will upload the data much faster) with **keyword DD**, which will move us away from hashtags/subreddits on data you have already uploaded.
  • Prevent potential media content spoofing in X tweet validation.

Media Universe S3 Infra

18 Apr 15:49
2d48a6d

Release 1.9.0 – Media Universe

This release introduces the infrastructure required for full migration to S3 storage and launches the Media Universe — an extension of our tweet model and validation system with native support for tweet media.

🚧 Infrastructure (S3 Auth, Signatures, Delay)

This release sets up the core S3 upload mechanism using presigned POST policies authenticated with Bittensor commitments and Keypair signatures.

Uploads are paused until after Easter to allow time for testing.

We will begin dual-storage (S3 + HF) temporarily and fully migrate after testing.

Keypair signing integration will be finalized before uploads resume, securing all interactions.
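
As a rough illustration of this flow, here is a minimal sketch of requesting a presigned POST policy with a hotkey-signed payload. The endpoint URL, payload fields, and message format are assumptions for illustration, not the actual SN13 API.

```python
# Hypothetical sketch: obtain a presigned POST policy using a Bittensor
# hotkey signature, then upload with the returned fields.
import time
import requests
import bittensor as bt

wallet = bt.wallet(name="miner", hotkey="default")
hotkey = wallet.hotkey  # a substrate Keypair with .sign()

timestamp = str(int(time.time()))
message = f"s3:upload:{hotkey.ss58_address}:{timestamp}"  # assumed format
signature = hotkey.sign(message.encode()).hex()

# The auth service (placeholder URL) would verify the signature against the
# miner's on-chain commitment before issuing a presigned POST policy.
policy = requests.post(
    "https://s3-auth.example.com/get-upload-policy",  # placeholder
    json={
        "hotkey": hotkey.ss58_address,
        "timestamp": timestamp,
        "signature": signature,
    },
).json()

# Upload a file using the presigned POST URL and fields.
with open("data.parquet", "rb") as f:
    requests.post(policy["url"], data=policy["fields"], files={"file": f})
```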

📸 Media Universe: Native Support for Tweet Media

We’re launching media validation in tweets to support richer content comparison and enable future CV-based models.

  1. Added Media Field to XContent
     media: Optional[List[str]] added to the content schema
     Uses exclude_none=True for full backward compatibility

  2. Introduced MEDIA_REQUIRED_DATE
     New constant MEDIA_REQUIRED_DATE = 2025-05-15
     Media validation only applies to tweets after this date

  3. Scraper Upgrades
     Updated ApiDojoTwitterScraper and MicroworldsTwitterScraper to extract media_urls
     Validators now extract and store media for comparison

  4. Validation Logic
     validate_tweet_content() now checks (see the sketch after this list):
     • If the validator has media, the miner must too
     • If both have media, the media counts must match
     • Older tweets skip media validation

  5. Tests Added
     Media extraction tests
     Media validation tests
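
Below is a minimal sketch of the new media field and the date-gated checks described above, assuming a pydantic-style XContent model; the model's other fields and helper names are illustrative.

```python
# Sketch only: the field and constant names follow the release notes,
# surrounding details are assumptions.
import datetime as dt
from typing import List, Optional
from pydantic import BaseModel

MEDIA_REQUIRED_DATE = dt.datetime(2025, 5, 15, tzinfo=dt.timezone.utc)

class XContent(BaseModel):
    url: str
    text: str
    timestamp: dt.datetime
    media: Optional[List[str]] = None  # new optional field

def media_checks_pass(validator_tweet: XContent, miner_tweet: XContent) -> bool:
    # Older tweets skip media validation entirely.
    if validator_tweet.timestamp < MEDIA_REQUIRED_DATE:
        return True
    # If the validator found media, the miner must report it too.
    if validator_tweet.media and not miner_tweet.media:
        return False
    # If both have media, the counts must match.
    if validator_tweet.media and miner_tweet.media:
        return len(validator_tweet.media) == len(miner_tweet.media)
    return True

# exclude_none=True omits the media field when unset, so records produced
# by older miners serialize and load unchanged.
old_record = XContent(
    url="https://x.com/i/status/1",
    text="hello",
    timestamp=dt.datetime(2025, 4, 1, tzinfo=dt.timezone.utc),
)
print(old_record.json(exclude_none=True))
```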

✅ How This Works With Existing Data
  • No migration required
  • Optional field means old records still load fine
  • Validation is gradual, with enforcement starting May 15

We recommend miners start integrating media support immediately.

Media Universe is live.
S3 infrastructure is ready.
Uploads enabled after Easter.

Update the docs with S3 storage implementation

09 Apr 20:47
12a2694

Update the docs with S3 storage implementation

Release 1.8.1

17 Mar 16:35
443c293

Enhanced On-Demand API Release Announcement
New X/Twitter Data Enrichment Feature
We're excited to announce a significant upgrade to the Data Universe On-Demand API! Starting tomorrow, miners will be able to test this enhanced functionality on testnet using their existing hotkeys.
What's New
The enhanced API now delivers substantially richer X/Twitter content (a payload sketch appears after the list), including:

Comprehensive User Metadata

User display names, verification status, follower counts
Profile details and engagement metrics

Complete Tweet Context

Full engagement metrics (likes, retweets, replies, quotes, views)
Tweet classification (reply, quote, retweet)
Conversation threading information

Rich Media Support

Media URLs and content types
Support for photos and videos

Enhanced Value

More valuable data for validators and users
Better content analysis possibilities
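
As a rough picture of what this enrichment looks like in practice, here is an illustrative payload; the field names are assumptions, not the exact response schema.

```python
# Illustrative shape of an enriched tweet record; not the exact schema.
enriched_tweet = {
    "user": {
        "display_name": "Example User",
        "verified": True,
        "followers_count": 1234,
    },
    "engagement": {
        "likes": 10, "retweets": 2, "replies": 1, "quotes": 0, "views": 500,
    },
    "tweet_type": {"is_reply": False, "is_quote": False, "is_retweet": False},
    "conversation_id": "1790000000000000000",
    "media": [
        {"url": "https://pbs.twimg.com/media/example.jpg", "type": "photo"},
    ],
}
```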

How to Get Started
Miners can test this enhanced functionality on testnet starting tomorrow using their existing hotkeys. Installation is simple and backward compatible - the enhanced scraper will be available for all X/Twitter requests while Reddit functionality remains unchanged.
Implementation Benefits

Higher Quality Data: Deliver richer, more valuable content to validators
Competitive Edge: Enhanced content can lead to better scores in validation
Future-Ready: Positioning for upcoming data quality measurements

Miner Policy

To launch Gravity as a commercial product, we need to adhere to legal guidelines. The miner policy provides SN13 with a legal basis and outlines measures that miners should adhere to when scraping.
The miner policy is now displayed in the SN13 docs. It prohibits the scraping of harmful or illegal content and outlines the legal responsibilities of data collection. We ask that you read it over and, should you need to, make appropriate changes immediately.
Datasets uploaded to Hugging Face now display the Macrocosmos Miner Policy in dataset cards.
API Improvements


New Endpoint: list_hf_repo_names
Returns the list of distinct miner Hugging Face repos currently stored by the Validator.
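
For illustration, a call might look like the following; the base URL and auth header are placeholders, not the documented API surface.

```python
# Hypothetical usage of the new endpoint; URL and auth are assumptions.
import requests

resp = requests.get(
    "http://localhost:8000/list_hf_repo_names",  # placeholder validator address
    headers={"X-API-Key": "your-api-key"},       # assumed auth scheme
)
print(resp.json())  # e.g. ["miner-a/dataset", "miner-b/dataset"]
```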

Release 1.8.0

03 Mar 22:08
e26dd8c

Release 1.8.0

Key Enhancements

API Database Stability Fix
  • Fixed critical issues with the API key database system
  • Implemented more robust database initialization
  • Added improved error handling for database operations

Enhanced On-Demand Data Verification
  • Added validator verification when miners return empty results (sketched below)
  • Penalizes miners who fail to return data that actually exists
  • Provides users with data even when miners fail to deliver
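
A minimal sketch of the empty-result path, with the scraping and penalty plumbing stubbed out as assumptions:

```python
# Sketch only: when a miner returns nothing, the validator checks the
# source itself; helper names are hypothetical.
def handle_empty_result(request, miner, validator_scrape, apply_penalty):
    ground_truth = validator_scrape(request)  # validator queries the source directly
    if ground_truth:
        # Data exists, so the miner failed to deliver it.
        apply_penalty(miner)
        return ground_truth  # still serve real data to the user
    return []  # the query genuinely has no data
```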

Release 1.7.9

27 Feb 17:33
23d5cd6

In this update, we added improvements to on-demand data requests to deliver better results to API users and to support future collaborations with other subnets that will make use of this feature.

On-Demand Request Changes:

  • Now queries 5 miners instead of just 1 for each request (random coldkeys)
  • Added consistency checks between miner responses
  • Implemented occasional validation of returned data (5% of requests)
  • Added small credibility penalties for miners who return bad data
  • Improved handling of empty results and non-existent data queries
  • Better selection logic to return the most reliable data to users

The process is as follows (sketched in code after this list):

  • Select up to 5 diverse miners from the top 60% of performers (by coldkey)
  • Query all selected miners with the same request
  • Check consistency among responses (within 30% of the median)
  • Validate data in 5% of cases or when consistency is poor
  • Apply small credibility penalties for bad data (0.01-0.05)
  • Choose the best data to return from the following:
    a. Validated miners with the highest score
    b. Consistent miners with the most data
    c. The median response when inconsistent
  • Return unique results to the API user
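
```python
# Illustrative sketch of the flow above; miner objects, query(), and
# validate() are stand-ins, thresholds follow the release notes.
import random
import statistics

VALIDATION_RATE = 0.05        # validate 5% of requests
CONSISTENCY_BAND = 0.30       # within 30% of the median response size
PENALTY_RANGE = (0.01, 0.05)  # small credibility penalty for bad data

def handle_on_demand(request, miners, query, validate):
    # Select up to 5 miners with distinct coldkeys from the top 60% performers.
    ranked = sorted(miners, key=lambda m: m.score, reverse=True)
    top = ranked[: max(1, int(len(ranked) * 0.6))]
    by_coldkey = {m.coldkey: m for m in top}
    selected = random.sample(list(by_coldkey.values()), k=min(5, len(by_coldkey)))

    # Query every selected miner with the same request.
    responses = {m: query(m, request) for m in selected}

    # Consistency check: keep responses within 30% of the median size.
    median_size = statistics.median(len(r) for r in responses.values())
    consistent = {
        m: r for m, r in responses.items()
        if median_size and abs(len(r) - median_size) <= CONSISTENCY_BAND * median_size
    }

    # Occasionally validate, or always when responses disagree.
    if random.random() < VALIDATION_RATE or not consistent:
        for m, r in responses.items():
            if not validate(r):
                m.credibility -= random.uniform(*PENALTY_RANGE)

    # Prefer the consistent miner with the most data; otherwise fall back
    # to the response closest to the median.
    if consistent:
        best = max(consistent.values(), key=len)
    else:
        best = min(responses.values(), key=lambda r: abs(len(r) - median_size))

    # Return unique results to the API user.
    return list(dict.fromkeys(best))
```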

Release 1.7.85

25 Feb 19:38
33f5225

In this update:

  • Removed labels longer than 140 characters from Dynamic Desirability uploads and retrieval.
  • Fixed a datetime fromisoformat error when the commit date is more than 19 hours old.
    No action needed from miners.

Release 1.7.84

20 Feb 15:40
11b25be
  • A label weight can have a max value of 5 when incentivized by Dynamic Desirability (see the sketch after this list)
  • Changed the label length limit from 32 to 140 characters
  • Filter out "Unexpected header key encountered" logs
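
As a small illustration of the new limits, a hypothetical intake check might look like this; the function name is an assumption.

```python
# Hypothetical enforcement of the new Dynamic Desirability limits.
MAX_LABEL_WEIGHT = 5    # max incentivized weight per label
MAX_LABEL_LENGTH = 140  # raised from 32 characters

def accept_label(label: str, weight: float):
    if len(label) > MAX_LABEL_LENGTH:
        return None  # over-long labels are dropped
    return min(weight, MAX_LABEL_WEIGHT)
```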

Release 1.7.83

19 Feb 17:06
dfb2fb1

  • Temporarily removed the parquet check.
  • Changed the base miner code to upload data every 17 hours.
  • Increased the max total dynamic desirability value from 100 to 250.

Hotfix of < vs >

17 Feb 22:34
89fe222

Hotfix of < vs >