Documentation

Getting Started

Everything you need to integrate Profluent Research market data into your research pipeline.

Overview

The Profluent Research Storage Platform provides institutional-grade access to historical and real-time market data across equities, futures, options, FX, crypto, and fixed income. Data is stored in columnar Parquet format and queryable via REST API or Python SDK.

All datasets use microsecond or nanosecond timestamp precision with exchange-attributed source tagging. Data is normalized to a unified schema while preserving venue-specific fields.

Quickstart

Install the Python SDK and make your first query in under 60 seconds:

# Install
pip install profluent-research

# Query
from profluent import Client

client = Client(api_key="pr_live_xxxxxxxxxxxxxxxx")
df = client.query(
    dataset="us_equities_taq",
    symbols=["AAPL", "MSFT"],
    start="2026-04-01",
    end="2026-04-02",
    columns=["timestamp", "price", "size"],
)
print(df.head())

Authentication

All API requests require authentication via an API key. Keys are scoped to specific datasets and rate limit tiers. You can manage keys in the researcher dashboard.

Note: API keys starting with pr_live_ have production access. Keys starting with pr_test_ return sample data only (100 rows max, delayed by 15 minutes).

Python SDK

The Python SDK wraps the REST API with automatic pagination, retry logic, and Pandas/Polars DataFrame output. It supports async queries for large time ranges.

# Environment variable (recommended)
export PROFLUENT_API_KEY="pr_live_xxxxxxxxxxxxxxxx"

# Then in Python, no key argument is needed:
from profluent import Client

client = Client()  # reads PROFLUENT_API_KEY from the environment

Querying Data

Queries are executed against a specific dataset and time range. The API returns data in the requested format (JSON, CSV, or Parquet). For queries exceeding 10M rows, results are returned asynchronously via the export endpoint.
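For queries above the 10M-row threshold, the client submits a job and polls until the export completes. A sketch of that polling loop, where `submit` and `poll` are stand-ins for calls to the export endpoint (the SDK presumably wraps this for you):

```python
import time

def run_export(submit, poll, interval=1.0, timeout=600.0):
    """Poll an asynchronous export job until it finishes.

    submit() returns a job id; poll(job_id) returns a dict with a
    'status' of 'running' or 'done' and, when done, a download 'url'.
    Field names are illustrative, not the documented response schema.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = poll(job_id)
        if job["status"] == "done":
            return job["url"]
        time.sleep(interval)  # back off between polls
    raise TimeoutError(f"export {job_id} did not finish within {timeout}s")
```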

Filters & Columns

Use the columns parameter to select specific fields and reduce payload size. Available columns are listed in each dataset's schema endpoint.

  • timestamp — Event timestamp (UTC, nanosecond precision)
  • symbol — Normalized ticker symbol
  • price — Trade or quote price (decimal, 10 decimal places)
  • size — Volume in shares/contracts/units
  • exchange — Source exchange MIC code
  • conditions — Trade condition flags (SIP)
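As a client-side illustration of what the columns parameter does, here is the equivalent projection over already-fetched rows (server-side selection has the extra benefit of shrinking the payload over the wire):

```python
def project(rows: list[dict], columns: list[str]) -> list[dict]:
    """Keep only the requested fields from each row, mirroring the
    effect of the API's `columns` parameter on the response shape."""
    return [{c: row[c] for c in columns if c in row} for row in rows]
```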

Pagination

Results are paginated using cursor-based pagination. Each response includes a next_cursor field. Pass it as the cursor parameter in subsequent requests.

Warning: Offset-based pagination is deprecated as of API v3.1. Cursors are required for deterministic ordering across large result sets.

Data Formats

The API supports three output formats:

  • JSON — Default. Each row is a JSON object. Best for small queries and debugging.
  • CSV — RFC 4180 compliant. Suitable for spreadsheet tools and legacy pipelines.
  • Parquet — Apache Parquet with Snappy compression. Recommended for production workloads. 10-50x smaller than JSON.

Streaming (WebSocket)

Real-time data is available via WebSocket connections. Subscribe to one or more symbols and receive updates as they occur. Requires a Premium tier API key.

import asyncio

from profluent import StreamClient

async def main():
    async with StreamClient() as ws:
        await ws.subscribe(
            dataset="us_equities_taq",
            symbols=["AAPL"],
            channels=["trades", "nbbo"],
        )
        async for msg in ws:
            print(msg)

asyncio.run(main())

Error Handling

The API uses standard HTTP status codes. All error responses include a JSON body with error and message fields.

  • 400 — Bad request (invalid parameters)
  • 401 — Unauthorized (invalid or missing API key)
  • 403 — Forbidden (key lacks access to requested dataset)
  • 404 — Dataset or resource not found
  • 429 — Rate limit exceeded
  • 500 — Internal server error
  • 503 — Service temporarily unavailable
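A common pattern is to retry 429 and transient 5xx responses with exponential backoff. A sketch, with `do_request` standing in for the actual HTTP call (the Python SDK already ships retry logic, per the SDK section above, so this is mainly for raw REST users):

```python
import time

RETRYABLE = {429, 500, 503}  # status codes worth retrying

def call_with_retry(do_request, max_attempts=5, base_delay=0.5):
    """Retry a request on retryable status codes with exponential backoff.

    do_request is a zero-argument callable returning (status_code, body);
    it is a stand-in for a real HTTP request.
    """
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status, body  # give up; caller inspects the final status
```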

Limits & Quotas

Rate limits are enforced per API key. Current tier limits:

  • Standard: 100 req/min, 10 concurrent queries, 10M rows/query
  • Premium: 1,000 req/min, 50 concurrent queries, 100M rows/query
  • Enterprise: Custom limits, dedicated infrastructure, SLA
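To stay under a per-minute ceiling without tripping 429s, you can throttle client-side. A sliding-window sketch sized for the Standard tier's 100 req/min (illustrative only; the server's enforcement window may differ):

```python
import time

class MinuteRateLimiter:
    """Client-side sliding-window throttle.

    Remembers the timestamps of recent calls and sleeps when the
    60-second window is full. A local courtesy, not the server's logic.
    """
    def __init__(self, max_per_minute: int = 100):
        self.max = max_per_minute
        self.calls: list[float] = []

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop calls that have aged out of the 60-second window.
        self.calls = [t for t in self.calls if now - t < 60.0]
        if len(self.calls) >= self.max:
            # Sleep until the oldest call leaves the window.
            time.sleep(60.0 - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `limiter.acquire()` before each request; combined with the retry sketch above it keeps a batch job inside quota.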

Contact data@profluentresearch.com for Enterprise-tier access and custom SLAs.