# Python Analysis
Load TickTradingData files in Python, separate trades from book updates, and build practical research workflows on top of CSV or Parquet.
Use this workflow when your destination is research, analytics, or custom tooling rather than direct platform replay.
For most Python users, the best default is Parquet. It gives you typed columns, better multi-file reads, and less parsing work than raw text.
Use CSV when you need to inspect the raw rows directly or want a format that is easy to move through generic tooling before you settle on a final pipeline.
## Choose The Right Input First
Before you write any code, decide whether you need raw event parsing or analytics-friendly loading.
| Input | Best when... | Best next step |
|---|---|---|
| Parquet | you want the default Python path for analytics, filtering, and multi-file work | start with Polars, pandas, DuckDB, or PyArrow |
| CSV | you want to inspect raw rows, validate event logic, or write a custom parser | load one file first and verify how you separate L1, L2, and mdt values |
| Stats Files | you want to screen days quickly before loading the raw tape | filter candidate days first, then pull the matching Parquet or CSV files |
## A Good Default Stack
If you want one safe recommendation for Python:
- choose Parquet
- start with Polars for data loading and filtering
- move to pandas only if your existing stack already depends on it
- add DuckDB later if you want SQL over many files
That path keeps the first workflow simple and scales well as the dataset grows.
## Load One File First
Start with one day and verify the event semantics before you touch a larger date range.
### Parquet example with Polars
```python
import polars as pl

df = pl.read_parquet("20250601.parquet")
print(df.schema)
print(df.head())
```
This is the fastest way to confirm:
- the columns you actually have
- the timestamp type
- the exact numeric types for `price` and `volume`
### CSV example with pandas
```python
import pandas as pd

df = pd.read_csv("20250601.csv.gz", sep=";")
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
print(df.head())
```
Use CSV when you want to inspect the raw row layout directly or compare parser behavior with another system.
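Before scaling up, it is worth asserting that the CSV parse produced the types you expect. This sketch uses an inline CSV row standing in for a real file; the column names and the semicolon separator follow the example above and are assumptions about the real layout.

```python
import io

import pandas as pd

# Inline CSV standing in for a real file; the column names and the
# semicolon separator are assumptions mirroring the example above.
raw = "timestamp;level;mdt;price;volume\n2025-06-01T09:30:00Z;L1;2;100.0;5\n"
df = pd.read_csv(io.StringIO(raw), sep=";")
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Cheap sanity checks before trusting a larger date range.
assert str(df["timestamp"].dtype) == "datetime64[ns, UTC]"
assert df.loc[0, "mdt"] == 2
print(df.dtypes)
```

The same assertions applied to a real file catch separator or timezone mistakes early.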
## Separate Trades From Quotes And Depth
The raw files mix several event families. Your first useful filter is usually the trade stream.
### Trade-only extraction
```python
import polars as pl

df = pl.read_parquet("20250601.parquet")
trades = (
    df.filter(
        (pl.col("level") == "L1") &
        (pl.col("mdt") == 2)
    )
    .select(["timestamp", "price", "volume"])
)
print(trades.head())
```
Interpretation:
- `level = "L1"` keeps top-of-book and trade-style rows
- `mdt = 2` keeps executed trades only
If you need quote events instead, work with `mdt = 0` for ask and `mdt = 1` for bid. If you need book depth updates, move to the L2 rows and include `operation` and `depth` in your logic.
## Work With Many Files
Once one day looks correct, move to a multi-file workflow. This is where Parquet becomes the better default.
### Multi-file scan with Polars
```python
import polars as pl

df = pl.scan_parquet("data/parquet/ES/06-25/*.parquet")
daily_trade_volume = (
    df.filter(
        (pl.col("level") == "L1") &
        (pl.col("mdt") == 2)
    )
    .group_by(
        pl.col("timestamp").dt.date().alias("day")
    )
    .agg(pl.col("volume").sum().alias("trade_volume"))
    .sort("day")
    .collect()
)
print(daily_trade_volume)
```
This pattern is a good base for:
- daily summaries
- contract comparisons
- date-range filtering
- screening days before more expensive analysis
## Practical Patterns Worth Using Early
You do not need a huge framework to get value from the files.
### Useful first analyses
- Load one day and confirm the schema.
- Filter trade rows only.
- Count rows by `level` and `mdt` so you know what is really in the file.
- Build a simple daily or hourly volume summary.
- Only after that, move to derived outputs such as OHLCV, spread analysis, or depth reconstruction.
### Example row counts by event type
```python
import polars as pl

df = pl.read_parquet("20250601.parquet")
counts = (
    df.group_by(["level", "mdt"])
    .len()
    .sort(["level", "mdt"])
)
print(counts)
```
## When To Leave Python And Use Another Workflow
Python is the right layer for research and custom analysis. It is not always the right layer for every end goal.
- If your destination is direct replay in the platform, continue with NinjaTrader.
- If your goal is derived bars with a ready-made utility, continue with Generate OHLCV.
- If you are still choosing the source format, go back to File Formats.
