Releases: macrocosm-os/data-universe
Into S3 buckets
- Enabled uploads into S3 buckets
- We will start with dual uploads (HF + S3) so current products are not disrupted
- Once everything works correctly and securely, we will disable HF uploads
- If miners can upload data without issues, we will deprecate the HF rewards and enable S3 validation (without dehydration, so data uploads will be much faster) with **keyword DD**, which moves us away from hashtags/subreddits on data you have already uploaded
- Prevent potential media content spoofing in X tweet validation
Media Universe S3 Infra
Release 1.9.0 – Media Universe
This release introduces the infrastructure required for full migration to S3 storage and launches the Media Universe — an extension of our tweet model and validation system with native support for tweet media.
🚧 Infrastructure (S3 Auth, Signatures, Delay)
This release sets up the core S3 upload mechanism using presigned POST policies authenticated with Bittensor commitments and Keypair signatures.
Uploads are paused until after Easter to allow time for testing.
We will begin dual-storage (S3 + HF) temporarily and fully migrate after testing.
Keypair signing integration will be finalized before uploads resume, securing all interactions.
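As a rough illustration of the presigned-POST mechanism, the sketch below builds and HMAC-signs an S3 POST policy in the general shape the S3 spec defines. The function names and the raw signing key are illustrative stand-ins, not the repo's actual API, which authenticates uploads via Bittensor commitments and Keypair signatures:

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

def build_post_policy(bucket: str, key_prefix: str, expires_in: int = 3600) -> str:
    """Build a base64-encoded S3 POST policy document (simplified)."""
    expiration = datetime.now(timezone.utc) + timedelta(seconds=expires_in)
    policy = {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            # Restrict uploads to the miner's own prefix, e.g. its hotkey.
            ["starts-with", "$key", key_prefix],
        ],
    }
    return base64.b64encode(json.dumps(policy).encode()).decode()

def sign_policy(policy_b64: str, signing_key: bytes) -> str:
    """Sign the encoded policy with HMAC-SHA256, as S3 POST policies require."""
    return hmac.new(signing_key, policy_b64.encode(), hashlib.sha256).hexdigest()
```

The miner would attach the encoded policy and its signature as form fields on the POST; S3 rejects any upload whose key or bucket falls outside the policy's conditions.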
📸 Media Universe: Native Support for Tweet Media
We’re launching media validation in tweets to support richer content comparison and enable future CV-based models.
- Added Media Field to XContent
media: Optional[List[str]] added to the content schema
Uses exclude_none=True for full backward compatibility
- Introduced MEDIA_REQUIRED_DATE
New constant MEDIA_REQUIRED_DATE = 2025-05-15
Media validation only applies to tweets after this date
- Scraper Upgrades
Updated ApiDojoTwitterScraper and MicroworldsTwitterScraper to extract media_urls
Validators now extract and store media for comparison
- Validation Logic
validate_tweet_content() now checks:
If the validator sees media, the miner must return it too
If both have media, the media counts must match
Tweets older than the cutoff skip media validation
- Tests Added
Media extraction tests
Media validation tests
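The schema change and checks above can be sketched as follows. `XContent` here is a simplified stand-in for the real model, and `validate_media` is an illustrative helper, not the repo's actual `validate_tweet_content()`:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Enforcement date from this release: media validation applies after 2025-05-15.
MEDIA_REQUIRED_DATE = datetime(2025, 5, 15, tzinfo=timezone.utc)

@dataclass
class XContent:
    url: str
    timestamp: datetime
    media: Optional[List[str]] = None  # optional, so old records still load

def validate_media(validator: XContent, miner: XContent) -> bool:
    """Mirror the media checks described above (illustrative names)."""
    if validator.timestamp < MEDIA_REQUIRED_DATE:
        return True  # older tweets skip media validation
    if validator.media and not miner.media:
        return False  # validator saw media, so the miner must return it too
    if validator.media and miner.media and len(validator.media) != len(miner.media):
        return False  # media counts must match
    return True
```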
✅ How This Works With Existing Data
No migration required
Optional field means old records still load fine
Validation is gradual, with enforcement starting May 15
We recommend miners start integrating media support immediately.
Media Universe is live.
S3 infrastructure is ready.
Uploads enabled after Easter.
Update the docs with S3 storage implementation
Release 1.8.1
Enhanced On-Demand API Release Announcement
New X/Twitter Data Enrichment Feature
We're excited to announce a significant upgrade to the Data Universe On-Demand API! Starting tomorrow, miners will be able to test this enhanced functionality on testnet using their existing hotkeys.
What's New
The enhanced API now delivers substantially richer X/Twitter content, including:
Comprehensive User Metadata
User display names, verification status, follower counts
Profile details and engagement metrics
Complete Tweet Context
Full engagement metrics (likes, retweets, replies, quotes, views)
Tweet classification (reply, quote, retweet)
Conversation threading information
Rich Media Support
Media URLs and content types
Support for photos and videos
Enhanced Value
More valuable data for validators and users
Better content analysis possibilities
How to Get Started
Miners can test this enhanced functionality on testnet starting tomorrow using their existing hotkeys. Installation is simple and backward compatible - the enhanced scraper will be available for all X/Twitter requests while Reddit functionality remains unchanged.
Implementation Benefits
Higher Quality Data: Deliver richer, more valuable content to validators
Competitive Edge: Enhanced content can lead to better scores in validation
Future-Ready: Positioning for upcoming data quality measurements
Miner Policy
To launch Gravity as a commercial product, we need to adhere to legal guidelines. The miner policy gives SN13 a legal basis and outlines measures miners must follow when scraping.
The miner policy is now displayed in the SN13 docs. It prohibits scraping harmful or illegal content and outlines the legal responsibilities of data collection. We ask that you read it over and, if needed, make appropriate changes immediately.
Datasets uploaded to Hugging Face now display the Macrocosmos Miner Policy in dataset cards.
API Improvements
New Endpoint: list_hf_repo_names
Returns the list of distinct miner Hugging Face repos currently stored by the Validator.
Release 1.8.0
Key Enhancements
API Database Stability Fix
Fixed critical issues with the API key database system
Implemented more robust database initialization
Added improved error handling for database operations
Enhanced On-Demand Data Verification
Added validator verification when miners return empty results
Penalizes miners who fail to return data that actually exists
Provides users with data even when miners fail to deliver
Release 1.7.9
In this update, we improved on-demand data requests to deliver better results to API users and to support future collaborations with other subnets that will make use of this feature.
On-Demand Request Changes:
- Now queries 5 miners instead of just 1 for each request (random coldkeys)
- Added consistency checks between miner responses
- Implemented occasional validation of returned data (5% of requests)
- Added small credibility penalties for miners who return bad data
- Improved handling of empty results and non-existent data queries
- Better selection logic to return the most reliable data to users
The process is as follows:
- Select up to 5 diverse miners from top 60% performers (by coldkey)
- Query all selected miners with the same request
- Check consistency among responses (within 30% of median)
- Validate data in 5% of cases or when consistency is poor
- Apply small credibility penalties for bad data (0.01-0.05)
- Choose the best data to return, in order of preference:
   a. Validated miners with the highest score
   b. Consistent miners with the most data
   c. Median response when inconsistent
- Return unique results to the API user
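The consistency step above (row counts within 30% of the median) can be sketched as below; the function name and tolerance parameter are hypothetical, not the repo's actual code:

```python
import statistics

def consistent_responses(counts, tolerance=0.30):
    """Flag each response whose result count is within `tolerance` of the median.

    `counts` is a list of result counts, one per queried miner.
    """
    med = statistics.median(counts)
    if med == 0:
        # All-empty median: only empty responses count as consistent.
        return [c == 0 for c in counts]
    return [abs(c - med) / med <= tolerance for c in counts]
```

For example, with counts `[100, 95, 105, 10, 98]` the miner returning 10 rows is flagged as inconsistent, which would trigger the 5% spot-validation path described above.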
Release 1.7.85
In this update:
- Removed labels longer than 140 characters from Dynamic Desirability uploads and retrieval.
- Fixed a datetime fromisoformat error when the commit date is more than 19 hours old.
No action needed from miners.
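One common cause of `fromisoformat` failures is a trailing `Z` suffix, which Python rejected before 3.11. The repo's actual fix may differ; a tolerant parser along these lines (the function name is illustrative) avoids the error:

```python
from datetime import datetime, timezone

def parse_commit_time(raw: str) -> datetime:
    """Parse an ISO-8601 timestamp, tolerating a trailing 'Z'.

    Python < 3.11 datetime.fromisoformat() rejects the 'Z' suffix,
    so rewrite it as an explicit UTC offset first.
    """
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    dt = datetime.fromisoformat(raw)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC for naive timestamps
    return dt
```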
Release 1.7.84
- A label weight can have a max value of 5 when incentivized by dynamic desirability
- Changed the label limit from 32 to 140 characters
- Filtered out "Unexpected header key encountered" logs
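The new limits can be sketched as a small normalization helper; the constant names, helper, and return shape are illustrative, not the repo's actual code:

```python
from typing import Optional, Tuple

MAX_LABEL_LENGTH = 140   # raised from 32 characters in this release
MAX_LABEL_WEIGHT = 5.0   # cap when incentivized by dynamic desirability

def normalize_label(label: str, weight: float) -> Optional[Tuple[str, float]]:
    """Drop over-long labels and clamp weights to the allowed maximum."""
    if len(label) > MAX_LABEL_LENGTH:
        return None  # over-long labels are filtered out entirely
    return label, min(weight, MAX_LABEL_WEIGHT)
```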
Release 1.7.83
- Temporarily removed the parquet check.
- Changed base miner code to upload data every 17 hours.
- Increased the max total dynamic desirability value from 100 to 250.
Hotfix: corrected a reversed comparison (< vs >).