Parquet Format
Use Parquet when you want typed loading, efficient analytics, and large datasets.
Choose Parquet when the destination is analysis, not generic interchange. For most research workflows it is the best default: it carries the same raw event stream as CSV, but stores it in a typed, columnar layout that is easier to process at scale.
At A Glance
| Property | Value |
|---|---|
| extension | .parquet |
| wrapper | no extra .gz wrapper |
| layout | columnar |
| raw data or summary | raw event stream |
| timestamps inside rows | UTC, nanosecond precision |
| file day convention | YYYYMMDD grouped by America/New_York market day |
| best fit | Python, Polars, DuckDB, pandas, large reads, typed analytics |
What You Get
Each file contains the same daily raw dataset you can retrieve as CSV, but with a stricter typed schema.
SFTP path example:
data/parquet/ES/06-25/20250601.parquet
Column Schema
| Column | Type | Nullable | Meaning |
|---|---|---|---|
| level | string | no | L1 or L2 |
| mdt | int8 | no | market data type code |
| timestamp | timestamp(ns, UTC) | no | event timestamp |
| operation | int8 | yes | add, update, remove for L2 |
| depth | int8 | yes | price level index for L2 |
| market_maker | string | yes | reserved, currently null |
| price | decimal128(18,8) | no | exact price value |
| volume | int32 | no | trade size or quote size |
Why Users Choose Parquet
- typed columns reduce parsing work
- prices stay exact as decimal128(18,8)
- timestamps keep nanosecond precision without string parsing
- analytics engines can read only the columns they need
- large pulls are usually smaller and faster to work with than equivalent text files
Interpretation Rules
The logical meaning is the same as CSV:
- L1 rows contain best bid, best ask, trades, and session statistics
- L2 rows contain order book depth updates
- mdt uses the same 0 to 9 codes
- operation and depth are null for L1
- file name day is America/New_York, while event timestamps are UTC
Practical Advantage Over CSV
If you already know the destination is a research stack, Parquet removes several headaches:
- no delimiter handling
- no string-to-number conversion pass
- no string timestamp parsing pass
- no floating-point drift from reading price as free-form text
That matters more as the number of files grows.
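The floating-point drift bullet is easy to demonstrate: once a price has passed through binary floating point, exact decimal arithmetic is gone, which is what the decimal128(18,8) column avoids.

```python
# Classic binary floating-point drift: neither 0.1 nor 0.2 is exactly
# representable, so the sum is not exactly 0.3.
px = float("0.1") + float("0.2")
print(px)  # 0.30000000000000004
```

A decimal128(18,8) column stores prices as scaled integers, so equality checks and tick arithmetic stay exact.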
Minimal Example
```python
import polars as pl

df = pl.read_parquet("20250601.parquet")

# Keep only L1 trade events (mdt code 2); L2 depth rows are excluded.
trades = df.filter((pl.col("level") == "L1") & (pl.col("mdt") == 2))
```
If you need maximum compatibility with legacy tools, go back to CSV. If you are building an analysis workflow, continue with Python Analysis.
