pandas
About
pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language. It offers data structures and operations for manipulating numerical tables and time series.
pandas is built around two data structures, Series and DataFrame. A Series is a one-dimensional labeled data structure built on top of NumPy’s array, and a DataFrame is a two-dimensional table whose columns are Series. Data for these collections can be imported from various file formats and sources, such as comma-separated values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
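As a minimal sketch of the two data structures named above, the following snippet builds a Series and a DataFrame by hand. The column names and values are invented for illustration only.

```python
import pandas as pd

# A Series is a one-dimensional labeled array backed by a NumPy array.
temperatures = pd.Series([21.5, 22.0, 19.8], name="temperature")

# A DataFrame is a two-dimensional table; each column is a Series.
readings = pd.DataFrame(
    {
        "sensor": ["a", "b", "c"],
        "temperature": temperatures,
    }
)

# The values of a numeric Series are exposed as a NumPy array.
values = temperatures.to_numpy()
print(type(values).__name__, readings.shape)
```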
Install
pip install pandas sqlalchemy-cratedb
Synopsis
Write a pandas DataFrame to CrateDB.
example.py
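import sqlalchemy as sa
from sqlalchemy_cratedb import insert_bulk

CRATEDB_URI = "crate://crate:crate@localhost:4200"
TABLE_NAME = "example"

# makeTimeDataFrame() is defined in the quickstart example below.
df = makeTimeDataFrame(rows=500_000, freq="s")

# Connect to CrateDB through its SQLAlchemy dialect.
engine = sa.create_engine(CRATEDB_URI)

# Write the DataFrame in batches of 20,000 rows, replacing any existing table.
df.to_sql(
    name=TABLE_NAME,
    con=engine,
    if_exists="replace",
    index=False,
    chunksize=20_000,
    method=insert_bulk,
)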
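To illustrate how chunksize and method interact in a to_sql() call, the following sketch passes a custom callable and records the size of each batch it receives. It assumes an in-memory SQLite database from the Python standard library as a stand-in for CrateDB, and record_batch is a hypothetical helper that only counts rows and inserts nothing. In the synopsis, insert_bulk from sqlalchemy-cratedb is the callable passed in this position.

```python
import sqlite3

import pandas as pd

batch_sizes = []


def record_batch(pd_table, conn, keys, data_iter):
    """Hypothetical `method` callable: count the rows of each batch, insert nothing."""
    rows = list(data_iter)
    batch_sizes.append(len(rows))
    return len(rows)


df = pd.DataFrame({"value": range(50)})

# In-memory SQLite stands in for CrateDB here (an assumption for illustration).
connection = sqlite3.connect(":memory:")
try:
    df.to_sql(
        name="example",
        con=connection,
        if_exists="replace",
        index=False,
        chunksize=20,
        method=record_batch,
    )
finally:
    connection.close()

print(batch_sizes)
```

With 50 rows and chunksize=20, pandas calls the method three times. By the same logic, the 500,000 rows in the synopsis are split into 25 batches of 20,000 rows each.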
Quickstart example
Create the file example.py containing the synopsis code shared above.
Complete the example by adding the makeTimeDataFrame() function to the file, above the code that calls it.
def makeTimeDataFrame(rows=5_000, freq="B"):
    import numpy as np
    import pandas as pd
    return pd.DataFrame(
        np.random.default_rng(2).standard_normal((rows, 4)),
        columns=pd.Index(list("ABCD"), dtype=object),
        index=pd.date_range("2000-01-01", periods=rows, freq=freq),
    )
Start CrateDB using Docker or Podman, then invoke the example program.
docker run --rm --publish=4200:4200 --publish=5432:5432 docker.io/crate '-Cdiscovery.type=single-node'
pip install pandas sqlalchemy-cratedb
python example.py
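Before running the program against a database, it can help to inspect what makeTimeDataFrame() produces. The sketch below repeats the helper so that the block is self-contained and checks it on a small frame. The row count and frequency passed here are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd


def makeTimeDataFrame(rows=5_000, freq="B"):
    # Same helper as in the quickstart, repeated to keep this block runnable.
    return pd.DataFrame(
        np.random.default_rng(2).standard_normal((rows, 4)),
        columns=pd.Index(list("ABCD"), dtype=object),
        index=pd.date_range("2000-01-01", periods=rows, freq=freq),
    )


first = makeTimeDataFrame(rows=5, freq="s")
second = makeTimeDataFrame(rows=5, freq="s")

print(first.shape)            # (rows, 4): four float columns A to D
print(list(first.columns))
print(first.equals(second))   # the fixed seed makes the data reproducible
```

Because the example calls to_sql() with index=False, the timestamp index of this frame is not written to the table. Only the four value columns are stored.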
Full example
Connect to CrateDB and CrateDB Cloud using pandas.
Guides
Related sections
Efficient batch/bulk INSERT operations for pandas, Dask, and Polars
Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy