CSV Format
Choose CSV when compatibility matters more than storage efficiency: it is the most portable raw dataset, at the cost of larger text files.
It is the easiest raw format to inspect manually, ingest with custom scripts, and move through generic tools that do not understand Parquet or NinjaTrader formats.
At A Glance
| Property | Value |
|---|---|
| extension | .csv.gz |
| wrapper | gzip |
| delimiter | semicolon ; |
| header row | yes |
| raw data or summary | raw event stream |
| timestamps inside rows | UTC, ISO 8601, nanosecond precision |
| file day convention | YYYYMMDD grouped by America/New_York market day |
| best fit | interoperability, inspection, custom parsers, simple ETL |
What You Get
Each file contains the full daily event stream for one symbol and one expiration.
SFTP path example:
```bash
data/csv/ES/06-25/20250601.csv.gz
```
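The example path can be split back into its parts with a short helper. This is a sketch only: the layout `<symbol>/<expiration>/<day>.csv.gz` is inferred from the single example above, and `FileKey` and `parse_sftp_path` are hypothetical names, not part of the delivery.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

SUFFIX = ".csv.gz"

class FileKey(NamedTuple):
    symbol: str      # e.g. "ES"
    expiration: str  # e.g. "06-25", kept as text
    day: str         # file day as YYYYMMDD

def parse_sftp_path(path: str) -> FileKey:
    """Split a path like data/csv/ES/06-25/20250601.csv.gz into its parts."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or not parts[-1].endswith(SUFFIX):
        raise ValueError(f"unexpected path: {path}")
    day = parts[-1][: -len(SUFFIX)]
    if len(day) != 8 or not day.isdigit():
        raise ValueError(f"file name is not YYYYMMDD: {parts[-1]}")
    # Assumption: the two directories above the file are symbol and expiration.
    return FileKey(symbol=parts[-3], expiration=parts[-2], day=day)
```

The expiration is left as a string because the source does not spell out its format.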
Every row follows the same eight-field layout:
```csv
level;mdt;timestamp;operation;depth;market_maker;price;volume
```
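A minimal sketch of reading one daily file with only the Python standard library, assuming the gzip wrapper, semicolon delimiter, and header row described above. The function name `iter_rows` is illustrative, and UTF-8 is an assumed encoding.

```python
import csv
import gzip
from typing import Dict, Iterator

FIELDS = ["level", "mdt", "timestamp", "operation",
          "depth", "market_maker", "price", "volume"]

def iter_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield each data row as a dict keyed by the header names."""
    # Text mode decompresses on the fly; newline="" is what the csv
    # module expects from its input file.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter=";")
        # Fail early if the file does not carry the expected header.
        if reader.fieldnames != FIELDS:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        yield from reader
```

All values come back as strings, and blank fields come back as empty strings, so type conversion is left to the caller.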
How To Read The Rows
Two row families appear in the same file:
- L1: top of book, trades, and session statistics
- L2: order book depth updates
Typical examples:
```csv
L1;2;2025-06-01T17:19:57.725675300Z;;;;5913.75;1
L2;0;2025-06-01T15:45:52.803532500Z;0;3;;5914.5;8
```
Interpretation:
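- the first row is a trade print (mdt 2) at price 5913.75 with volume 1
- the second row is a depth update on the ask side (mdt 0): operation 0 adds depth level 3 at price 5914.5 with volume 8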
MDT Codes
| mdt | Meaning |
|---|---|
| 0 | ask |
| 1 | bid |
| 2 | trade |
| 3 | daily high |
| 4 | daily low |
| 5 | cumulative volume |
| 6 | session open |
| 7 | previous close |
| 8 | open interest |
| 9 | settlement price |
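In a custom parser, the code table above can be mirrored as an enum so that downstream code compares names rather than bare integers. This is a sketch: the `Mdt` class and `mdt_from_field` helper are hypothetical names, and only the code values come from the table.

```python
from enum import IntEnum

class Mdt(IntEnum):
    ASK = 0
    BID = 1
    TRADE = 2
    DAILY_HIGH = 3
    DAILY_LOW = 4
    CUMULATIVE_VOLUME = 5
    SESSION_OPEN = 6
    PREVIOUS_CLOSE = 7
    OPEN_INTEREST = 8
    SETTLEMENT_PRICE = 9

def mdt_from_field(raw: str) -> Mdt:
    """Convert the raw mdt column text (for example "2") to an Mdt member."""
    # Raises ValueError for non-numeric text or a code outside the table.
    return Mdt(int(raw))
```

Because `IntEnum` members compare equal to plain integers, existing checks such as `row_mdt == 2` keep working alongside `Mdt.TRADE`.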
For L2 rows:
- operation is 0, 1, or 2 for add, update, or remove
- depth is the price level index, where 0 is the best level
- market_maker is reserved and currently empty
For L1 rows:
- operation, depth, and market_maker are blank
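The field rules above can be sketched as a small typed parser. `Row`, `parse_row`, and `L2_OPERATIONS` are hypothetical names. The sketch assumes fields never contain quoted semicolons, and it tolerates blank price or volume values because the source only shows rows where both are populated.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

L2_OPERATIONS = {0: "add", 1: "update", 2: "remove"}

@dataclass(frozen=True)
class Row:
    level: str                 # "L1" or "L2"
    mdt: int                   # see the MDT code table
    timestamp: str             # kept as text: UTC, ISO 8601, nanoseconds
    operation: Optional[int]   # None on L1 rows
    depth: Optional[int]       # None on L1 rows; 0 is the best level on L2
    market_maker: str          # reserved, currently empty
    price: Optional[Decimal]
    volume: Optional[int]

def parse_row(line: str) -> Row:
    """Parse one data line (not the header) into a Row."""
    parts = line.rstrip("\r\n").split(";")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    level, mdt, ts, op, depth, mm, price, volume = parts
    return Row(
        level=level,
        mdt=int(mdt),
        timestamp=ts,
        operation=int(op) if op else None,    # blank on L1 rows
        depth=int(depth) if depth else None,  # blank on L1 rows
        market_maker=mm,
        price=Decimal(price) if price else None,
        volume=int(volume) if volume else None,
    )
```

`Decimal` is used for the price so that values such as 5913.75 are not subject to binary floating-point rounding; a `float` may be adequate if that does not matter for your workflow.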
Why Users Choose CSV
- it works almost everywhere
- it is easy to inspect without special tooling
- it is a safe interchange format when the destination is unknown
- it is easier than Parquet for quick conversions into custom formats
Common Gotchas
- The delimiter is ;, not a comma.
- The file is delivered as .csv.gz, so keep the gzip wrapper in mind.
- The file day in the name is America/New_York, while row timestamps are UTC.
- CSV and Parquet carry the same logical dataset. The difference is representation, not coverage.
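The timezone gotcha is easiest to see in code. The sketch below keeps the nanoseconds as a separate integer, because Python's `datetime` only stores microseconds, and then maps a row timestamp to a New York date. It assumes the file day is the plain New York calendar date of the timestamp; the source does not say whether the market day follows a different session boundary. `parse_utc_ns` and `new_york_date` are hypothetical names, and Python 3.9 or later is assumed.

```python
from datetime import datetime, timezone
from typing import Tuple
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

def parse_utc_ns(ts: str) -> Tuple[datetime, int]:
    """Split a row timestamp into a UTC datetime (whole seconds) and nanoseconds."""
    base, _, frac = ts.removesuffix("Z").partition(".")
    seconds = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad or trim the fraction to exactly nine digits before converting.
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return seconds, nanos

def new_york_date(ts: str) -> str:
    """Return the New York calendar date of a row timestamp as YYYYMMDD."""
    seconds, _ = parse_utc_ns(ts)
    # Assumption: calendar date, not a session-based market day.
    return seconds.astimezone(NEW_YORK).strftime("%Y%m%d")
```

For example, a row stamped `2025-06-02T01:30:00.000000000Z` falls on 1 June in New York, so under this assumption it maps to `20250601` even though its UTC date is 2 June.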
If your workflow is analytics-first rather than compatibility-first, Parquet is usually the better default.
