Python Analysis

Load TickTradingData files in Python, separate trades from book updates, and build practical research workflows on top of CSV or Parquet.


Use this workflow when your destination is research, analytics, or custom tooling rather than direct platform replay.

For most Python users, the best default is Parquet. It gives you typed columns, better multi-file reads, and less parsing work than raw text.

Use CSV when you need to inspect the raw rows directly or want a format that is easy to move through generic tooling before you settle on a final pipeline.

Choose The Right Input First

Before you write any code, decide whether you need raw event parsing or analytics-friendly loading.

  • Parquet: best when you want the default Python path for analytics, filtering, and multi-file work. Best next step: start with Polars, pandas, DuckDB, or PyArrow.
  • CSV: best when you want to inspect raw rows, validate event logic, or write a custom parser. Best next step: load one file first and verify how you separate L1, L2, and mdt values.
  • Stats Files: best when you want to screen days quickly before loading the raw tape. Best next step: filter candidate days first, then pull the matching Parquet or CSV files.

A Good Default Stack

If you want one safe recommendation for Python:

  • choose Parquet
  • start with Polars for data loading and filtering
  • move to pandas only if your existing stack already depends on it
  • add DuckDB later if you want SQL over many files

That path keeps the first workflow simple and scales well as the dataset grows.

Load One File First

Start with one day and verify the event semantics before you touch a larger date range.

Parquet example with Polars

python
import polars as pl

df = pl.read_parquet("20250601.parquet")

print(df.schema)
print(df.head())

This is the fastest way to confirm:

  • the columns you actually have
  • the timestamp type
  • the exact numeric types for price and volume

CSV example with pandas

python
import pandas as pd

df = pd.read_csv("20250601.csv.gz", sep=";")
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

print(df.head())

Use CSV when you want to inspect the raw row layout directly or compare parser behavior with another system.
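When the goal is strict validation rather than quick inspection, it helps to declare dtypes up front instead of letting pandas infer them. A minimal sketch using synthetic rows in place of a real file; the column names and the sep=";" separator follow the example above, but treat the exact layout as an assumption and verify it against your own files:

```python
import io

import pandas as pd

# Synthetic rows standing in for a real CSV file; the exact column
# layout is an assumption -- check df.head() against your data.
raw = (
    "timestamp;level;mdt;price;volume\n"
    "2025-06-01 13:30:00.105;L1;2;5300.25;3\n"
    "2025-06-01 13:30:00.107;L1;0;5300.50;12\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    dtype={"level": "category", "mdt": "int8", "price": "float64", "volume": "int64"},
)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

print(df.dtypes)
```

Explicit dtypes make parser mismatches surface as load-time errors instead of subtle numeric differences later in the pipeline.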

Separate Trades From Quotes And Depth

The raw files mix several event families. Your first useful filter is usually the trade stream.

Trade-only extraction

python
import polars as pl

df = pl.read_parquet("20250601.parquet")

trades = (
    df.filter(
        (pl.col("level") == "L1") &
        (pl.col("mdt") == 2)
    )
    .select(["timestamp", "price", "volume"])
)

print(trades.head())

Interpretation:

  • level = L1 keeps top-of-book and trade-style rows
  • mdt = 2 keeps executed trades only

If you need quote events instead, work with mdt = 0 for ask and mdt = 1 for bid. If you need book depth updates, move to the L2 rows and include operation and depth in your logic.

Work With Many Files

Once one day looks correct, move to a multi-file workflow. This is where Parquet becomes the better default.

Multi-file scan with Polars

python
import polars as pl

df = pl.scan_parquet("data/parquet/ES/06-25/*.parquet")

daily_trade_volume = (
    df.filter(
        (pl.col("level") == "L1") &
        (pl.col("mdt") == 2)
    )
    .group_by(
        pl.col("timestamp").dt.date().alias("day")
    )
    .agg(pl.col("volume").sum().alias("trade_volume"))
    .sort("day")
    .collect()
)

print(daily_trade_volume)

This pattern is a good base for:

  • daily summaries
  • contract comparisons
  • date-range filtering
  • screening days before more expensive analysis

Practical Patterns Worth Using Early

You do not need a huge framework to get value from the files.

Useful first analyses

  1. Load one day and confirm the schema.
  2. Filter trade rows only.
  3. Count rows by level and mdt so you know what is really in the file.
  4. Build a simple daily or hourly volume summary.
  5. Only after that, move to derived outputs such as OHLCV, spread analysis, or depth reconstruction.

Example row counts by event type

python
import polars as pl

df = pl.read_parquet("20250601.parquet")

counts = (
    df.group_by(["level", "mdt"])
    .len()
    .sort(["level", "mdt"])
)

print(counts)

When To Leave Python And Use Another Workflow

Python is the right layer for research and custom analysis, but it is not the best destination for every end goal.

  • If your destination is direct replay in the platform, continue with NinjaTrader.
  • If your goal is derived bars with a ready-made utility, continue with Generate OHLCV.
  • If you are still choosing the source format, go back to File Formats.

Related Pages