Huakun

Binance Data Loader

A Python library for downloading and processing historical data from Binance Vision.

GitHub: https://github.com/HuakunShen/binance-data-loader
PyPI: https://pypi.org/project/binance-data/

Features

  • Download historical data from the Binance Vision S3 bucket
  • Support for multiple asset types (spot, futures)
  • Flexible prefix-based approach for any data type
  • Output formats: Parquet (default) or CSV
  • Pandera schema validation for data integrity
  • Timestamp auto-detection (milliseconds vs nanoseconds)
  • Concurrent downloads to speed up fetching many files
  • Optional retention of raw ZIP files
  • Preservation of the original directory structure
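Timestamp auto-detection can be done with a magnitude check, because present-day epoch values have about 13 digits in milliseconds and about 19 in nanoseconds. The sketch below is not the library's code: `detect_timestamp_unit` and `to_datetime_auto` are hypothetical helpers, and the thresholds (which also cover microseconds, the unit in between) are assumptions that only hold for dates reasonably close to the present.

```python
import pandas as pd


def detect_timestamp_unit(values: pd.Series) -> str:
    """Guess the epoch unit of integer timestamps from their magnitude.

    Hypothetical helper: present-day epoch times have ~13 digits in ms,
    ~16 in microseconds and ~19 in ns, so fixed thresholds separate them.
    """
    largest = int(values.abs().max())
    if largest < 10**14:
        return "ms"
    if largest < 10**17:
        return "us"
    return "ns"


def to_datetime_auto(values: pd.Series) -> pd.Series:
    """Convert integer epoch timestamps to UTC datetimes using the detected unit."""
    return pd.to_datetime(values, unit=detect_timestamp_unit(values), utc=True)
```

The check runs once per column, so every value in a file is converted with the same unit.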

Installation

pip install binance-data

# Or with uv
uv pip install binance-data

Quick Start

from binance_data_loader import BinanceDataDownloader

# Download BTCUSDT 1h USD-M futures klines as Parquet
downloader = BinanceDataDownloader(
    prefix="data/futures/um/daily/klines/BTCUSDT/1h/",
    destination_dir="./data",
    output_format="parquet",
)
downloader.download()
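Each `prefix` mirrors a directory in the Binance Vision bucket, and daily kline files under it are named `<SYMBOL>-<INTERVAL>-<YYYY-MM-DD>.zip` and served from `https://data.binance.vision/`. The sketch below assumes that naming to build per-day URLs, then fetches them with a thread pool to illustrate concurrent downloading. `daily_kline_urls` and `fetch_all` are hypothetical helpers, not part of `binance_data_loader`; the library's own downloader works from the prefix directly.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, Dict, List

BASE_URL = "https://data.binance.vision/"  # public download host for Binance Vision files


def daily_kline_urls(prefix: str, start: date, end: date) -> List[str]:
    """Build one ZIP URL per day (inclusive) for a daily-klines prefix.

    Hypothetical helper: assumes the prefix ends in "<SYMBOL>/<INTERVAL>/" and
    that files are named "<SYMBOL>-<INTERVAL>-<YYYY-MM-DD>.zip".
    """
    symbol, interval = prefix.rstrip("/").split("/")[-2:]
    n_days = (end - start).days + 1
    return [
        f"{BASE_URL}{prefix}{symbol}-{interval}-{start + timedelta(days=i):%Y-%m-%d}.zip"
        for i in range(n_days)
    ]


def fetch_all(urls: List[str], fetch: Callable[[str], bytes], max_workers: int = 8) -> Dict[str, bytes]:
    """Download URLs concurrently; `fetch` is any function mapping a URL to its bytes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `urls`.
        return dict(zip(urls, pool.map(fetch, urls)))
```

In practice `fetch` could wrap an HTTP GET (for example `urllib.request.urlopen(url).read()`); keeping it injectable makes the concurrency logic testable without network access.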

Data Loading & Resampling

from binance_data_loader import BinanceDataLoader
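from datetime import datetime, timedelta, UTC  # datetime.UTC requires Python 3.11+

# Assumes 1m spot klines for BTCUSDT have already been downloaded into ./data
loader = BinanceDataLoader(data_dir="./data", data_type="spot")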

# Load with resampling
df = loader.load(
    symbol="BTCUSDT",
    interval="1m",
    resample_to="15m",
    start_time=datetime.now(UTC) - timedelta(days=7),
)
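Resampling merges consecutive fine bars into one coarser bar: the first `open`, the highest `high`, the lowest `low`, the last `close`, and the summed `volume`. The pandas sketch below shows that rule; `resample_klines` is a hypothetical helper rather than `BinanceDataLoader`'s internals, and it expects a UTC `DatetimeIndex` of bar open times. Note that pandas spells minutes as `min`, so the interval `"15m"` corresponds to the pandas rule `"15min"`.

```python
import pandas as pd

# How each kline column is combined when several fine bars merge into one.
OHLCV_AGG = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
}


def resample_klines(df: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Resample OHLCV bars indexed by open time to a coarser pandas rule such as "15min"."""
    # closed/label="left": each bar covers [start, start + rule) and is stamped with its start.
    bars = df.resample(rule, closed="left", label="left").agg(OHLCV_AGG)
    # Bins with no source rows (data gaps) come back with NaN prices; drop them.
    return bars.dropna(subset=["open"])
```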

Supported Intervals

  • Seconds: 1s
  • Minutes: 1m, 3m, 5m, 15m, 30m
  • Hours: 1h, 2h, 4h, 6h, 8h, 12h
  • Days: 1d, 3d
  • Weeks: 1w
  • Months: 1M
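Every interval except `1M` has a fixed length, which matters when resampling: the target interval should be a whole multiple of the source interval (`15m` from `1m` works, `15m` from `4h` does not). The hypothetical helpers below convert the interval strings listed above into `timedelta` values and check that condition; `1M` is rejected because calendar months vary in length.

```python
from datetime import timedelta

# Seconds per fixed-length unit; "M" (month) is deliberately absent.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def interval_to_timedelta(interval: str) -> timedelta:
    """Convert a Binance interval string such as "15m" or "4h" to a timedelta."""
    count, unit = int(interval[:-1]), interval[-1]
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"{interval!r} has no fixed duration")
    return timedelta(seconds=count * _UNIT_SECONDS[unit])


def can_resample(source: str, target: str) -> bool:
    """True if `target` bars can be built exactly from whole `source` bars."""
    src, dst = interval_to_timedelta(source), interval_to_timedelta(target)
    return dst >= src and dst % src == timedelta(0)
```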

Shifted Resampling

Generate multiple shifted datasets for training data augmentation:

# Default 15m intervals end at 0, 15, 30, 45 minutes
df_standard = loader.load(symbol="ETHUSDT", interval="1m", resample_to="15m")

# Shifted by 1m - intervals end at 1, 16, 31, 46 minutes
df_shifted = loader.load(
    symbol="ETHUSDT", interval="1m", resample_to="15m", shift="1m"
)

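Each shift produces a differently aligned set of bars from the same 1m data, so the shifted copies can serve as additional samples when augmenting machine learning training data.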
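In pandas, moving the bin boundaries corresponds to the `offset` argument of `DataFrame.resample`. The sketch below is an assumption about how a `shift` option could be implemented, not the library's code; `resample_shifted` is a hypothetical helper. With `rule="15min"` and `shift="1min"`, bin edges fall at :01, :16, :31 and :46, matching the shifted example above.

```python
from typing import Optional

import pandas as pd

# Same OHLCV aggregation as ordinary resampling; only the bin edges move.
_AGG = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}


def resample_shifted(df: pd.DataFrame, rule: str, shift: Optional[str] = None) -> pd.DataFrame:
    """Resample OHLCV bars with bin edges moved forward by `shift` (e.g. "1min").

    Hypothetical helper. pandas adds `offset` to the default origin
    (midnight of the first day), so rule="15min", shift="1min" puts edges at
    :01, :16, :31 and :46 instead of :00, :15, :30 and :45.
    """
    bars = df.resample(rule, offset=shift).agg(_AGG)
    return bars.dropna(subset=["open"])
```

Looping over shifts `"0min"` through `"14min"` yields fifteen differently aligned 15-minute datasets from one 1-minute source.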

Data Schema

Kline data has 12 columns, in this order: open_time, open, high, low, close, volume, close_time, quote_volume, count, taker_buy_volume, taker_buy_quote_volume, ignore. The last column, ignore, is an unused field in Binance's kline format.
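Raw downloads are ZIP archives, each containing a single CSV with these columns. The sketch below parses one archive into a DataFrame; it assumes the CSV has no header row and that timestamps are in milliseconds. `read_kline_zip` is a hypothetical helper, not the library's API.

```python
import io
import zipfile

import pandas as pd

# Column order of Binance kline CSVs.
KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume", "close_time",
    "quote_volume", "count", "taker_buy_volume", "taker_buy_quote_volume", "ignore",
]


def read_kline_zip(data: bytes) -> pd.DataFrame:
    """Parse a kline ZIP archive (one headerless CSV inside) into a DataFrame.

    Hypothetical helper: assumes no header row and millisecond timestamps.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        csv_name = archive.namelist()[0]
        with archive.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            df = pd.read_csv(text, header=None, names=KLINE_COLUMNS)
    for col in ("open_time", "close_time"):
        df[col] = pd.to_datetime(df[col], unit="ms", utc=True)
    # The "ignore" column carries no information, so drop it.
    return df.drop(columns="ignore")
```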

