Polars
About
Polars is a high‑performance DataFrames library with interfaces for Rust, Python, Node.js, and R, plus a SQL context. It is powered by a multithreaded, vectorized query engine and written in Rust.
Features and data formats
Polars is an open-source library for data manipulation, known for being one of the fastest data processing solutions on a single machine. It features a well-structured, typed API that is both expressive and easy to use.
Fast: Written from scratch in Rust with performance in mind, designed close to the machine, and without external dependencies.
I/O: First-class support for all common data storage layers: local files, cloud storage, and databases.
Intuitive API: Write your queries the way you intend them, and Polars’ query optimizer internally determines the most efficient way to execute them. Polars’ expressions are intuitive and let you write code that is both readable and performant.
Out of Core: The streaming API lets you process data without holding all of it in memory at the same time.
Parallel: Polars’ multithreaded query engine utilizes the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
Vectorized Query Engine: Uses Apache Arrow, a columnar data format, to process queries in a vectorized manner, and SIMD instructions to optimize CPU usage. This enables cache-coherent algorithms and high performance on modern processors.
Open Source: Polars is and always will be open source, driven by an active community of developers. Everyone is encouraged to add new features and contribute. It is free to use under the MIT license.
Polars supports reading from and writing to many common data formats, which makes it easy to integrate Polars into your existing data stack.
Text: CSV, JSON
Binary: Parquet, Delta Lake, Avro, Excel
IPC: Feather, Arrow IPC
Databases: MySQL, PostgreSQL, SQLite, Redshift, SQL Server, etc. (via ConnectorX)
Cloud storage: Amazon S3, Azure Blob/ADLS (via fsspec‑compatible backends)
Install
pip install 'polars[pyarrow]' sqlalchemy-cratedb
Synopsis
Write a Polars DataFrame to CrateDB.
example.py
import polars as pl
import sqlalchemy as sa
from sqlalchemy_cratedb import insert_bulk
CRATEDB_URI = "crate://crate:crate@localhost:4200"
TABLE_NAME = "example"
df = pl.from_pandas(makeTimeDataFrame(rows=500_000, freq="s"))
engine = sa.create_engine(CRATEDB_URI)
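df.write_database(
    engine="sqlalchemy",
    connection=engine,
    table_name=TABLE_NAME,
    if_table_exists="replace",
    # Passed through to pandas' DataFrame.to_sql().
    engine_options={
        "method": insert_bulk,  # one CrateDB bulk operation per chunk
        "chunksize": 20_000,    # rows per chunk
    },
)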
Quickstart example
Create the file example.py containing the synopsis code shown above.
Complete the example by adding the makeTimeDataFrame() function to example.py, above the line that calls it.
def makeTimeDataFrame(rows=5_000, freq="B"):
    """Random floats in columns A-D, indexed by `rows` timestamps at frequency `freq`."""
    import numpy as np
    import pandas as pd

    return pd.DataFrame(
        np.random.default_rng(2).standard_normal((rows, 4)),
        columns=pd.Index(list("ABCD"), dtype=object),
        index=pd.date_range("2000-01-01", periods=rows, freq=freq),
    )
Start CrateDB using Docker or Podman, then invoke the example program.
docker run --rm --publish=4200:4200 --publish=5432:5432 docker.io/crate '-Cdiscovery.type=single-node'
pip install 'polars[pyarrow]' sqlalchemy-cratedb pandas
python example.py
Full example
Connect to CrateDB and CrateDB Cloud using Polars.