Python Analysis

Load TickTradingData files in Python, separate trades from book updates, and build practical research workflows on top of CSV or Parquet.


Use this workflow when your destination is research, analytics, or custom tooling rather than direct platform replay.

For most Python users, the best default is Parquet. It gives you typed columns, better multi-file reads, and less parsing work than raw text.

Use CSV when you need to inspect the raw rows directly or want a format that is easy to move through generic tooling before you settle on a final pipeline.

Choose The Right Input First

Before you write any code, decide whether you need raw event parsing or analytics-friendly loading.

  • Parquet: best when you want the default Python path for analytics, filtering, and multi-file work. Best next step: start with Polars, pandas, DuckDB, or PyArrow.
  • CSV: best when you want to inspect raw rows, validate event logic, or write a custom parser. Best next step: load one file first and verify how you separate L1, L2, and mdt values.
  • Stats Files: best when you want to screen days quickly before loading the raw tape. Best next step: filter candidate days first, then pull the matching Parquet or CSV files.

A Good Default Stack

If you want one safe recommendation for Python:

  • choose Parquet
  • start with Polars for data loading and filtering
  • move to pandas only if your existing stack already depends on it
  • add DuckDB later if you want SQL over many files

That path keeps the first workflow simple and scales well as the dataset grows.

Load One File First

Start with one day and verify the event semantics before you touch a larger date range.

Parquet example with Polars

python
import polars as pl

df = pl.read_parquet("20250601.parquet")

print(df.schema)
print(df.head())

This is the fastest way to confirm:

  • the columns you actually have
  • the timestamp type
  • the exact numeric types for price and volume

CSV example with pandas

python
import pandas as pd

df = pd.read_csv("20250601.csv.gz", sep=";")
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

print(df.head())

Use CSV when you want to inspect the raw row layout directly or compare parser behavior with another system.
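When the goal is strict validation rather than quick inspection, it helps to declare dtypes up front instead of letting pandas infer them. A minimal sketch using synthetic rows in place of a real file; the column names and the sep=";" separator follow the example above, but treat the exact layout as an assumption and verify it against your own files:

```python
import io

import pandas as pd

# Synthetic rows standing in for a real CSV file; the exact column
# layout is an assumption -- check df.head() against your data.
raw = (
    "timestamp;level;mdt;price;volume\n"
    "2025-06-01 13:30:00.105;L1;2;5300.25;3\n"
    "2025-06-01 13:30:00.107;L1;0;5300.50;12\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    dtype={"level": "category", "mdt": "int8", "price": "float64", "volume": "int64"},
)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

print(df.dtypes)
```

Explicit dtypes make parser mismatches surface as load-time errors instead of subtle numeric differences later in the pipeline.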

Separate Trades From Quotes And Depth

The raw files mix several event families. Your first useful filter is usually the trade stream.

Trade-only extraction

python
import polars as pl

df = pl.read_parquet("20250601.parquet")

trades = (
    df.filter(
        (pl.col("level") == "L1") &
        (pl.col("mdt") == 2)
    )
    .select(["timestamp", "price", "volume"])
)

print(trades.head())

Interpretation:

  • level = L1 keeps top-of-book and trade-style rows
  • mdt = 2 keeps executed trades only

If you need quote events instead, work with mdt = 0 for ask and mdt = 1 for bid. If you need book depth updates, move to the L2 rows and include operation and depth in your logic.

Work With Many Files

Once one day looks correct, move to a multi-file workflow. This is where Parquet becomes the better default.

Multi-file scan with Polars

python
import polars as pl

df = pl.scan_parquet("data/parquet/ES/06-25/*.parquet")

daily_trade_volume = (
    df.filter(
        (pl.col("level") == "L1") &
        (pl.col("mdt") == 2)
    )
    .group_by(
        pl.col("timestamp").dt.date().alias("day")
    )
    .agg(pl.col("volume").sum().alias("trade_volume"))
    .sort("day")
    .collect()
)

print(daily_trade_volume)

This pattern is a good base for:

  • daily summaries
  • contract comparisons
  • date-range filtering
  • screening days before more expensive analysis

Practical Patterns Worth Using Early

You do not need a huge framework to get value from the files.

Useful first analyses

  1. Load one day and confirm the schema.
  2. Filter trade rows only.
  3. Count rows by level and mdt so you know what is really in the file.
  4. Build a simple daily or hourly volume summary.
  5. Only after that, move to derived outputs such as OHLCV, spread analysis, or depth reconstruction.

Example row counts by event type

python
import polars as pl

df = pl.read_parquet("20250601.parquet")

counts = (
    df.group_by(["level", "mdt"])
    .len()
    .sort(["level", "mdt"])
)

print(counts)

When To Leave Python And Use Another Workflow

Python is the right layer for research and custom analysis, but it is not the best destination for every end goal.

  • If your destination is direct replay in the platform, continue with NinjaTrader.
  • If your goal is derived bars with a ready-made utility, continue with Generate OHLCV.
  • If you are still choosing the source format, go back to File Formats.

Related Pages