Commit 113d29f

Align Databento docs with public schemas
- Document public OHLCV, statistics, and status API usage
- Clarify Databento timestamps, venue mapping, and live behavior
- Preserve nested `.venv` artifacts during build cleanup
1 parent ee833a9 commit 113d29f

2 files changed

Lines changed: 77 additions & 53 deletions

Makefile

Lines changed: 4 additions & 4 deletions
@@ -314,10 +314,10 @@ clean-build-artifacts: #-- Clean compiled artifacts (.so, .dll, .pyc, .c files)
314314
find target target-v2 -name "*.rmeta" -delete 2>/dev/null || true
315315
rm -rf target/*/build target/*/deps target-v2/*/build target-v2/*/deps 2>/dev/null || true
316316
# Clean Python build artifacts
317-
find . -type d -name "__pycache__" -not -path "./.venv*" -exec rm -rf {} + 2>/dev/null || true
318-
find . -type f -name "*.c" -not -path "./.venv*" -not -path "./target/*" -not -path "./target-v2/*" -exec rm -f {} + 2>/dev/null || true
319-
find . -type f -a \( -name "*.pyc" -o -name "*.pyo" \) -not -path "./.venv*" -exec rm -f {} + 2>/dev/null || true
320-
find . -type f -a \( -name "*.so" -o -name "*.dll" -o -name "*.dylib" \) -not -path "./.venv*" -exec rm -f {} + 2>/dev/null || true
317+
find . -type d -name "__pycache__" -not -path "*/.venv*" -exec rm -rf {} + 2>/dev/null || true
318+
find . -type f -name "*.c" -not -path "*/.venv*" -not -path "./target/*" -not -path "./target-v2/*" -exec rm -f {} + 2>/dev/null || true
319+
find . -type f -a \( -name "*.pyc" -o -name "*.pyo" \) -not -path "*/.venv*" -exec rm -f {} + 2>/dev/null || true
320+
find . -type f -a \( -name "*.so" -o -name "*.dll" -o -name "*.dylib" \) -not -path "*/.venv*" -exec rm -f {} + 2>/dev/null || true
321321
rm -rf build/ cython_debug/ 2>/dev/null || true
322322
# Clean test artifacts
323323
rm -rf .coverage .benchmarks 2>/dev/null || true
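
The switch from `./.venv*` to `*/.venv*` in the hunk above matters because `find -path` tests its pattern against the whole path, so a pattern anchored at `./` only protects a top-level virtual environment. The sketch below approximates that matching with Python's `fnmatch` module. It assumes `fnmatch` behaves like `find -path` for these simple patterns (in both, `*` also matches `/`), and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "./.venv*"  # exclusion pattern before this commit
NEW_PATTERN = "*/.venv*"  # exclusion pattern after this commit

# Hypothetical paths, written the way `find .` would report them.
SAMPLE_PATHS = [
    "./.venv/lib/site.pyc",             # top-level virtual environment
    "./examples/.venv/lib/site.pyc",    # nested virtual environment
    "./nautilus_trader/core/data.pyc",  # ordinary build artifact
]


def is_preserved(path: str, pattern: str) -> bool:
    """Return True when `-not -path <pattern>` would shield the path from deletion."""
    return fnmatchcase(path, pattern)


for path in SAMPLE_PATHS:
    old = is_preserved(path, OLD_PATTERN)
    new = is_preserved(path, NEW_PATTERN)
    print(f"{path}: preserved before={old}, after={new}")
```

Only the nested `.venv` path changes outcome between the two patterns, which is the case the commit message describes.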

docs/integrations/databento.md

Lines changed: 73 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -97,9 +97,9 @@ The following Databento schemas are supported by NautilusTrader:
9797
:::note
9898
Databento also documents reference schemas, including corporate actions,
9999
adjustment factors, and security master data. This adapter currently maps only
100-
the schemas listed above to Nautilus data types. The Databento DBN crate also
101-
exposes `ohlcv-eod`; Nautilus keeps an adapter-level override for daily bars,
102-
while the public Databento schema docs list `ohlcv-1d` for daily OHLCV.
100+
the schemas listed above to Nautilus data types. Daily Databento OHLCV uses
101+
`ohlcv-1d`. Official settlement prices and open interest come from the
102+
`statistics` schema, not OHLCV bars.
103103
:::
104104

105105
:::info
@@ -124,7 +124,8 @@ the `u8` Arrow column width used for persistence.
124124
events. Choose them for a complete top-of-book event tape. For quote and trade
125125
alignment, prefer TBBO or TCBBO.
126126
- **MBP-10 (L2)**: Top 10 levels with trades. Use it for depth-aware strategies
127-
that do not need full MBO data. Includes orders per level.
127+
that do not need full MBO data. Includes orders per level. Databento order
128+
book depth subscriptions support only `depth=10`.
128129
- **MBO (L3)**: Per-order events for queue position modeling and exact book
129130
reconstruction. Start at node initialization for proper replay context.
130131
- **BBO_1S/BBO_1M and CBBO_1S/CBBO_1M**: Sampled top-of-book updates at fixed
@@ -134,9 +135,13 @@ the `u8` Arrow column width used for persistence.
134135
- **TRADES**: Trades only. Pair with MBP-1 (`include_trades=True`) or use TBBO
135136
or TCBBO for quote context with trades.
136137
- **OHLCV**: Aggregated bars from trades. Use them for higher-timeframe
137-
analytics. Set `bars_timestamp_on_close=True` for close timestamps.
138-
- **Imbalance, statistics, and status**: Venue operational data. Subscribe via
138+
analytics. Set `bars_timestamp_on_close=True` for close timestamps. Daily
139+
bars use `ohlcv-1d`; use `statistics` for official settlements and open
140+
interest.
141+
- **Imbalance and statistics**: Venue operational data. Subscribe via
139142
`subscribe_data` with a `DataType` carrying `instrument_id` metadata.
143+
- **Status**: Venue trading-state updates. Subscribe via
144+
`subscribe_instrument_status`.
140145

141146
:::tip
142147
Consolidated schemas (CMBP_1, CBBO_1S, CBBO_1M, TCBBO) aggregate data across
@@ -275,17 +280,11 @@ self.subscribe_bars(
275280
self.subscribe_bars(
276281
bar_type=BarType.from_str(f"{instrument_id}-1-DAY-LAST-EXTERNAL")
277282
)
278-
279-
# Subscribe to daily bars with the adapter's end-of-day override
280-
self.subscribe_bars(
281-
bar_type=BarType.from_str(f"{instrument_id}-1-DAY-LAST-EXTERNAL"),
282-
params={"schema": "ohlcv-eod"},
283-
)
284283
```
285284

286285
### Custom data type subscriptions
287286

288-
Imbalance, statistics, and status data require the generic `subscribe_data` method:
287+
Imbalance and statistics data require the generic `subscribe_data` method:
289288

290289
```python
291290
from nautilus_trader.adapters.databento import DATABENTO_CLIENT_ID
@@ -304,11 +303,14 @@ self.subscribe_data(
304303
data_type=DataType(DatabentoStatistics, metadata={"instrument_id": instrument_id}),
305304
client_id=DATABENTO_CLIENT_ID,
306305
)
306+
```
307+
308+
Instrument status uses the dedicated status subscription API:
307309

310+
```python
308311
# Subscribe to instrument status updates
309-
from nautilus_trader.model.data import InstrumentStatus
310-
self.subscribe_data(
311-
data_type=DataType(InstrumentStatus, metadata={"instrument_id": instrument_id}),
312+
self.subscribe_instrument_status(
313+
instrument_id=instrument_id,
312314
client_id=DATABENTO_CLIENT_ID,
313315
)
314316
```
@@ -321,16 +323,18 @@ does not provide one. Databento only guarantees this ID is unique within a given
321323
day. This differs from the Nautilus `InstrumentId`, a string of symbol + venue
322324
separated by a period: `"{symbol}.{venue}"`.
323325

324-
The decoder maps the Databento `raw_symbol` to the Nautilus `symbol` and uses an
325-
[ISO 10383 market identifier code](https://www.iso20022.org/market-identifier-codes)
326-
from the definition message for the Nautilus `venue`.
326+
The decoder maps the Databento `raw_symbol` to the Nautilus `symbol`. Publisher
327+
IDs map to the default Nautilus venue through `publishers.json`. Subscription
328+
`InstrumentId` metadata can also seed the symbol-to-venue map before market data
329+
arrives.
327330

328331
Databento identifies datasets with a *dataset ID*, separate from venue identifiers.
329332
See [Databento dataset naming conventions](https://databento.com/docs/api-reference-historical/basics/datasets)
330333
for details.
331334

332-
For CME Globex MDP 3.0 (`GLBX.MDP3`), these exchanges group under the `GLBX` venue.
333-
The instrument's `exchange` field determines the mapping:
335+
For CME Globex MDP 3.0 (`GLBX.MDP3`), publisher defaults map to the `GLBX`
336+
venue. When `use_exchange_as_venue=True`, definition messages can override
337+
`GLBX` with the instrument's exchange MIC:
334338

335339
- `CBCM`: XCME-XCBT inter-exchange spread
336340
- `NYUM`: XNYM-DUMX inter-exchange spread
@@ -359,10 +363,13 @@ Nautilus data requires at least two timestamps (per the `Data` contract):
359363
- `ts_event`: UNIX timestamp (nanoseconds) when the data event occurred.
360364
- `ts_init`: UNIX timestamp (nanoseconds) when the data instance was created.
361365

362-
The decoder maps Databento `ts_recv` to Nautilus `ts_event`. This timestamp is
363-
more reliable and monotonically increases per Databento symbol. The exceptions are
364-
`DatabentoImbalance` and `DatabentoStatistics`, which carry all timestamp fields
365-
since they are adapter-specific types.
366+
Quote and trade-like schemas map Databento `ts_recv` to Nautilus `ts_event`
367+
because it is more reliable and monotonically increases per Databento symbol.
368+
Bars use the DBN bar interval timestamp; `bars_timestamp_on_close` controls
369+
whether Nautilus bars use the interval open or close timestamp. `InstrumentStatus`
370+
uses the status event timestamp from the decoded status message.
371+
`DatabentoImbalance` and `DatabentoStatistics` preserve Databento timestamp
372+
fields because they are adapter-specific types.
366373

367374
:::info
368375
See these Databento docs for details:
@@ -395,6 +402,7 @@ to the appropriate Nautilus `Instrument` type.
395402
| Option spread | `T` | `OptionSpread` |
396403
| Mixed spread | `M` | `OptionSpread` |
397404
| FX spot | `X` | `CurrencyPair` |
405+
| Index | `I` | Not yet available |
398406
| Bond | `B` | Not yet available |
399407

400408
### Price precision
@@ -476,8 +484,12 @@ matches the venue's inability to distinguish them.
476484

477485
### OHLCV (bar aggregates)
478486

479-
Databento timestamps bar messages at the **open** of the interval. The decoder
480-
normalizes `ts_event` to the bar **close** (original `ts_event` + interval).
487+
Databento timestamps bar messages at the **open** of the interval. By default,
488+
the decoder normalizes bar `ts_event` to the bar **close**: the original
489+
`ts_event` plus the interval. `ts_init` uses the live receipt time, or the close
490+
time for historical and file-based loads when no explicit init timestamp is
491+
supplied. Set `bars_timestamp_on_close=False` to timestamp bar `ts_event` on
492+
the interval open.
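
The open-to-close rule in the added paragraph above is plain arithmetic on nanosecond timestamps. The sketch below illustrates it under stated assumptions: `bar_ts_event` is a hypothetical helper written for this note, not the adapter's decoder, and the one-minute bar is an invented example.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def bar_ts_event(ts_open_ns: int, interval_ns: int, timestamp_on_close: bool = True) -> int:
    """Return the bar `ts_event` in UNIX nanoseconds.

    Hypothetical helper: mirrors the documented rule (open + interval when
    timestamping on close), not the actual decoder implementation.
    """
    if timestamp_on_close:
        return ts_open_ns + interval_ns
    return ts_open_ns


# Invented example: a one-minute bar whose interval opens at 14:30:00 UTC.
opened = datetime(2024, 3, 6, 14, 30, tzinfo=timezone.utc)
ts_open_ns = int(opened.timestamp()) * NANOS_PER_SECOND
one_minute_ns = 60 * NANOS_PER_SECOND

ts_close = bar_ts_event(ts_open_ns, one_minute_ns)  # close-stamped (default)
ts_open = bar_ts_event(ts_open_ns, one_minute_ns, timestamp_on_close=False)
print(ts_close - ts_open)  # difference equals the bar interval in nanoseconds
```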
481493

482494
### Imbalance and statistics
483495

@@ -509,8 +521,10 @@ self.subscribe_data(
509521
)
510522
```
511523

512-
Request the previous day's `statistics` for the `ES.FUT` parent symbol
513-
(all active E-mini S&P 500 futures):
524+
Request a bounded range of `statistics` for the `ES.FUT` parent symbol
525+
(all active E-mini S&P 500 futures). Use Databento's Historical
526+
[`metadata.get_cost`](https://databento.com/docs/api-reference-historical/metadata/metadata-get-cost)
527+
endpoint before real historical pulls:
514528

515529
```python
516530
from nautilus_trader.adapters.databento import DATABENTO_CLIENT_ID
@@ -521,6 +535,7 @@ instrument_id = InstrumentId.from_str("ES.FUT.GLBX")
521535
metadata = {
522536
"instrument_id": instrument_id,
523537
"start": "2024-03-06",
538+
"end": "2024-03-07",
524539
}
525540
self.request_data(
526541
data_type=DataType(DatabentoStatistics, metadata=metadata),
@@ -608,6 +623,11 @@ decoding DBN per run.
608623
Performance benchmarks are under development.
609624
:::
610625

626+
For live data, decoded delivery from the feed handler to Nautilus is
627+
intentionally unbounded. This prevents slow consumers from stalling the feed
628+
path; a process under memory pressure should fail rather than block live
629+
decoding.
630+
611631
## Loading DBN data
612632

613633
The `DatabentoDataLoader` class loads DBN files and converts records to Nautilus
@@ -858,19 +878,19 @@ node.build()
858878

859879
### Configuration parameters
860880

861-
| Option | Default | Description |
862-
|---------------------------|---------|----------------------------------------------------------------------------------------------------------------------|
863-
| `api_key` | `None` | Databento API secret. Falls back to the `DATABENTO_API_KEY` environment variable when `None`. |
864-
| `http_gateway` | `None` | Historical HTTP gateway override for testing custom endpoints. |
865-
| `live_gateway` | `None` | Raw TCP real‑time gateway override, typically for testing only. |
866-
| `use_exchange_as_venue` | `True` | Use the exchange MIC for Nautilus venues (e.g., `XCME`). `False` retains the default GLBX mapping. |
867-
| `timeout_initial_load` | `15.0` | Seconds to wait for instrument definitions per dataset before proceeding. |
868-
| `mbo_subscriptions_delay` | `3.0` | Seconds to buffer before enabling MBO/L3 streams so initial snapshots replay in order. |
869-
| `bars_timestamp_on_close` | `True` | Timestamp bars on the close (`ts_event`/`ts_init`). `False` timestamps on the open. |
870-
| `reconnect_timeout_mins` | `10` | Minutes to attempt reconnection before giving up. `None` retries indefinitely. See [Connection stability](#connection-stability). |
871-
| `venue_dataset_map` | `None` | Optional Nautilus venue to Databento dataset code mapping. |
872-
| `parent_symbols` | `None` | Optional `{dataset: {parent symbols}}` to preload definition trees (e.g., `{"GLBX.MDP3": {"ES.FUT", "ES.OPT"}}`). |
873-
| `instrument_ids` | `None` | Nautilus `InstrumentId` values to preload definitions for at startup. |
881+
| Option | Default | Description |
882+
|---------------------------|---------|------------------------------------------------------------------------------------------------------------------------------|
883+
| `api_key` | `None` | Databento API secret. Falls back to the `DATABENTO_API_KEY` environment variable when `None`. |
884+
| `http_gateway` | `None` | Historical HTTP gateway override for testing custom endpoints. |
885+
| `live_gateway` | `None` | Raw TCP real‑time gateway override, typically for testing only. |
886+
| `use_exchange_as_venue` | `True` | Override GLBX definition venues with the exchange MIC when definitions include one. `False` keeps the publisher‑map default. |
887+
| `timeout_initial_load` | `15.0` | Seconds to wait for instrument definitions per dataset before proceeding. |
888+
| `mbo_subscriptions_delay` | `3.0` | Seconds to buffer before enabling MBO/L3 streams so initial snapshots replay in order. |
889+
| `bars_timestamp_on_close` | `True` | Timestamp bar `ts_event` on close. `False` timestamps bar `ts_event` on open. |
890+
| `reconnect_timeout_mins` | `10` | Minutes to retry before giving up. `None` retries indefinitely. See [Connection stability](#connection-stability). |
891+
| `venue_dataset_map` | `None` | Optional Nautilus venue to Databento dataset code mapping. |
892+
| `parent_symbols` | `None` | Optional `{dataset: {parent symbols}}` to preload definition trees (e.g., `{"GLBX.MDP3": {"ES.FUT", "ES.OPT"}}`). |
893+
| `instrument_ids` | `None` | Nautilus `InstrumentId` values to preload definitions for at startup. |
874894

875895
:::tip
876896
Use environment variables for credentials.
@@ -881,7 +901,7 @@ Use environment variables for credentials.
881901
The live client reconnects automatically on:
882902

883903
- **Network interruptions**: Temporary connectivity issues.
884-
- **Gateway restarts**: Databento Sunday maintenance. See the
904+
- **Gateway restarts**: Databento scheduled live gateway restarts. See the
885905
[maintenance schedule](https://databento.com/docs/api-reference-live/basics#maintenance-schedule).
886906
- **Market closures**: Sessions ending during off-hours.
887907

@@ -908,6 +928,10 @@ All reconnections include:
908928
- **Automatic resubscription**: Restores all active subscriptions after reconnecting.
909929
- **Cycle reset**: Each successful session (>60s) resets the timeout clock.
910930

931+
Individual unsubscribe requests log a warning and are ignored because Databento
932+
live sessions do not support granular unsubscribe. Stop the session to remove a
933+
subscription from the live gateway.
934+
911935
#### Timeout configuration
912936

913937
The `reconnect_timeout_mins` parameter controls how long the client attempts reconnection:
@@ -927,13 +951,13 @@ persistent configuration or authentication issues.
927951

928952
#### Scheduled maintenance
929953

930-
Databento restarts live gateways every Sunday (all clients disconnect):
954+
Databento restarts live gateways on this schedule (all clients disconnect):
931955

932-
| Dataset | Maintenance time (UTC) |
933-
|--------------------|------------------------|
934-
| CME Globex | 09:30 |
935-
| All ICE venues | 09:45 |
936-
| All other datasets | 10:30 |
956+
| Dataset | Restart time |
957+
|--------------------|-------------------|
958+
| CME Globex | Saturday 02:15 CT |
959+
| All ICE venues | Sunday 09:45 UTC |
960+
| All other datasets | Sunday 10:30 UTC |
937961

938962
The default 10-minute timeout covers typical restarts. For unattended systems,
939963
use `reconnect_timeout_mins=None` or a longer value. See the

0 commit comments

Comments
 (0)