Use the following link to download the dataset: Dataset Download Link
The dataset contains close to 16 million stock news stories. The dataset file stores one story per line as a JSON object, in reverse chronological order (newest first). An example story, prettified into multi-line JSON, is shown below:
    {
        "title": "Tech giants Nvidia, OpenAI and others join forces for massive UAE Stargate AI data center",
        "url": "https://qz.com/american-tech-partners-with-uae-for-new-ai-data-center-1851781991",
        "unix_timestamp": 1747936200,
        "id": "-3744939139222479336",
        "tickers_direct": [
            ".openai",
            "orcl",
            "nvda"
        ],
        "tickers_indirect": [
            "csco"
        ],
        "description": "A group of global tech giants gathered in Abu Dhabi to pose for a photo as anAI supergroup, including OpenAI's Sam Altman, Oracle's (ORCL) Larry Ellison, Nvidia's (NVDA) Jensen Huang, and Chuck Robbins of Cisco (CSCO), along with their new UAE partners. Read more..."
    }
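Because each line of the file is a standalone JSON object, the file can be streamed without loading all of it into memory. The following is a minimal sketch, not an official loader: the function names `iter_stories` and `reported_at` and the file name `news.json` are hypothetical, and the file is assumed to be uncompressed UTF-8 text. The timestamp is treated as seconds since the epoch, which matches the example above.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator


def iter_stories(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed story per non-empty line of a JSON-lines stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def reported_at(story: dict) -> datetime:
    """Convert the story's unix_timestamp (assumed seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(story["unix_timestamp"], tz=timezone.utc)


# Hypothetical usage; "news.json" is a placeholder for the downloaded file:
#
# with open("news.json", encoding="utf-8") as f:
#     for story in iter_stories(f):
#         print(reported_at(story).isoformat(), story["title"])
```

Since the file is in reverse chronological order, a reader that only needs recent stories can stop iterating as soon as it sees a timestamp older than its cutoff.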
The fields of each JSON object are explained below. Most of them have the same semantics as the corresponding fields in the TickerTick API response.
| Field name | Meaning | Optional field? (If yes, this field can be missing) |
|---|---|---|
| title | The title of this news story | No |
| url | The original URL for the full news story | No |
| unix_timestamp | The UNIX timestamp when the news was reported | No |
| id | A unique string ID of this news story | No |
| description | A short description of this news story | Yes |
| tickers_direct | The tickers that the news story is directly about, e.g., the name of the company for the ticker is mentioned | Yes |
| tickers_indirect | The tickers that the news story is indirectly about, e.g., the CEO or a product of the company for this ticker is mentioned | Yes |
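Since three of the fields may be missing, code that consumes the dataset should not index them directly. One way to handle this is sketched below, under the assumption that a missing description is best represented as `None` and a missing ticker list as an empty list. The `Story` class and `story_from_dict` helper are hypothetical names and are not part of the dataset or the TickerTick API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Story:
    # Required fields: always present according to the table above.
    title: str
    url: str
    unix_timestamp: int
    id: str
    # Optional fields: may be absent from the JSON object.
    description: Optional[str] = None
    tickers_direct: List[str] = field(default_factory=list)
    tickers_indirect: List[str] = field(default_factory=list)


def story_from_dict(raw: dict) -> Story:
    """Build a Story, filling defaults for optional fields that are missing or null."""
    return Story(
        title=raw["title"],
        url=raw["url"],
        unix_timestamp=raw["unix_timestamp"],
        id=raw["id"],
        description=raw.get("description"),
        tickers_direct=list(raw.get("tickers_direct") or []),
        tickers_indirect=list(raw.get("tickers_indirect") or []),
    )
```

Required fields are read with plain indexing so that a malformed line fails loudly with a `KeyError` instead of producing a half-empty record.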
Note that many well-known pre-IPO startups (e.g., ByteDance, the parent company of TikTok) have made-up tickers like .openai and .databricks.
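When joining the news with market data, it may help to separate made-up tickers from exchange-listed ones. The sketch below assumes that made-up tickers can be recognized by a leading dot. That is inferred only from the two examples above (.openai and .databricks) and is not a documented rule, so verify it against the data before relying on it. Both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def is_made_up_ticker(ticker: str) -> bool:
    """Assumption: made-up tickers for pre-IPO companies start with a dot."""
    return ticker.startswith(".")


def split_tickers(tickers: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (listed, made_up) ticker lists, preserving the input order."""
    listed: List[str] = []
    made_up: List[str] = []
    for ticker in tickers:
        (made_up if is_made_up_ticker(ticker) else listed).append(ticker)
    return listed, made_up


# Using the tickers_direct list from the example story:
listed, made_up = split_tickers([".openai", "orcl", "nvda"])
print(listed)   # ['orcl', 'nvda']
print(made_up)  # ['.openai']
```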