Documentation

Getting Started

Everything you need to integrate Profluent Research market data into your research pipeline.

Overview

The Profluent Research Storage Platform provides institutional-grade access to historical and real-time market data across equities, futures, options, FX, crypto, and fixed income. Data is stored in columnar Parquet format and queryable via REST API or Python SDK.

All datasets use microsecond or nanosecond timestamp precision with exchange-attributed source tagging. Data is normalized to a unified schema while preserving venue-specific fields.

Quickstart

Install the Python SDK and make your first query in under 60 seconds:

# Install
pip install profluent-research

# Query
from profluent import Client

client = Client(api_key="pr_live_xxxxxxxxxxxxxxxx")
df = client.query(
    dataset="us_equities_taq",
    symbols=["AAPL", "MSFT"],
    start="2026-04-01",
    end="2026-04-02",
    columns=["timestamp", "price", "size"],
)
print(df.head())

Authentication

All API requests require authentication via an API key. Keys are scoped to specific datasets and rate limit tiers. You can manage keys in the researcher dashboard.

Note: API keys starting with pr_live_ have production access. Keys starting with pr_test_ return sample data only (100 rows max, delayed by 15 minutes).

Python SDK

The Python SDK wraps the REST API with automatic pagination, retry logic, and Pandas/Polars DataFrame output. It supports async queries for large time ranges.

# Environment variable (recommended)
export PROFLUENT_API_KEY="pr_live_xxxxxxxxxxxxxxxx"

# Then in Python, no key argument is needed:
from profluent import Client

client = Client()  # reads PROFLUENT_API_KEY from the environment

Querying Data

Queries are executed against a specific dataset and time range. The API returns data in the requested format (JSON, CSV, or Parquet). For queries exceeding 10M rows, results are returned asynchronously via the export endpoint.
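For queries above the 10M-row threshold, the client submits a job and polls until the export completes. A sketch of that polling loop, where `submit` and `poll` are stand-ins for calls to the export endpoint (the SDK presumably wraps this for you):

```python
import time

def run_export(submit, poll, interval=1.0, timeout=600.0):
    """Poll an asynchronous export job until it finishes.

    submit() returns a job id; poll(job_id) returns a dict with a
    'status' of 'running' or 'done' and, when done, a download 'url'.
    Field names are illustrative, not the documented response schema.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = poll(job_id)
        if job["status"] == "done":
            return job["url"]
        time.sleep(interval)  # back off between polls
    raise TimeoutError(f"export {job_id} did not finish within {timeout}s")
```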

Filters & Columns

Use the columns parameter to select specific fields and reduce payload size. Available columns are listed in each dataset's schema endpoint.

  • timestamp — Event timestamp (UTC, nanosecond precision)
  • symbol — Normalized ticker symbol
  • price — Trade or quote price (decimal, 10 decimal places)
  • size — Volume in shares/contracts/units
  • exchange — Source exchange MIC code
  • conditions — Trade condition flags (SIP)
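As a client-side illustration of what the columns parameter does, here is the equivalent projection over already-fetched rows (server-side selection has the extra benefit of shrinking the payload over the wire):

```python
def project(rows: list[dict], columns: list[str]) -> list[dict]:
    """Keep only the requested fields from each row, mirroring the
    effect of the API's `columns` parameter on the response shape."""
    return [{c: row[c] for c in columns if c in row} for row in rows]
```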

Pagination

Results are paginated using cursor-based pagination. Each response includes a next_cursor field. Pass it as the cursor parameter in subsequent requests.

Warning: Offset-based pagination is deprecated as of API v3.1. Cursors are required for deterministic ordering across large result sets.

Data Formats

The API supports three output formats:

  • JSON — Default. Each row is a JSON object. Best for small queries and debugging.
  • CSV — RFC 4180 compliant. Suitable for spreadsheet tools and legacy pipelines.
  • Parquet — Apache Parquet with Snappy compression. Recommended for production workloads. 10-50x smaller than JSON.

Streaming (WebSocket)

Real-time data is available via WebSocket connections. Subscribe to one or more symbols and receive updates as they occur. Requires a Premium tier API key.

import asyncio

from profluent import StreamClient

async def main():
    async with StreamClient() as ws:
        await ws.subscribe(
            dataset="us_equities_taq",
            symbols=["AAPL"],
            channels=["trades", "nbbo"],
        )
        async for msg in ws:
            print(msg)

asyncio.run(main())

Error Handling

The API uses standard HTTP status codes. All error responses include a JSON body with error and message fields.

  • 400 — Bad request (invalid parameters)
  • 401 — Unauthorized (invalid or missing API key)
  • 403 — Forbidden (key lacks access to requested dataset)
  • 404 — Dataset or resource not found
  • 429 — Rate limit exceeded
  • 500 — Internal server error
  • 503 — Service temporarily unavailable
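A common pattern is to retry 429 and transient 5xx responses with exponential backoff. A sketch, with `do_request` standing in for the actual HTTP call (the Python SDK already ships retry logic, per the SDK section above, so this is mainly for raw REST users):

```python
import time

RETRYABLE = {429, 500, 503}  # status codes worth retrying

def call_with_retry(do_request, max_attempts=5, base_delay=0.5):
    """Retry a request on retryable status codes with exponential backoff.

    do_request is a zero-argument callable returning (status_code, body);
    it is a stand-in for a real HTTP request.
    """
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status, body  # give up; caller inspects the final status
```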

Limits & Quotas

Rate limits are enforced per API key. Current tier limits:

  • Standard: 100 req/min, 10 concurrent queries, 10M rows/query
  • Premium: 1,000 req/min, 50 concurrent queries, 100M rows/query
  • Enterprise: Custom limits, dedicated infrastructure, SLA
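To stay under a per-minute ceiling without tripping 429s, you can throttle client-side. A sliding-window sketch sized for the Standard tier's 100 req/min (illustrative only; the server's enforcement window may differ):

```python
import time

class MinuteRateLimiter:
    """Client-side sliding-window throttle.

    Remembers the timestamps of recent calls and sleeps when the
    60-second window is full. A local courtesy, not the server's logic.
    """
    def __init__(self, max_per_minute: int = 100):
        self.max = max_per_minute
        self.calls: list[float] = []

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop calls that have aged out of the 60-second window.
        self.calls = [t for t in self.calls if now - t < 60.0]
        if len(self.calls) >= self.max:
            # Sleep until the oldest call leaves the window.
            time.sleep(60.0 - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `limiter.acquire()` before each request; combined with the retry sketch above it keeps a batch job inside quota.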

Contact data@profluentresearch.com for Enterprise-tier access and custom SLAs.