Load API data with dlt

Exercise a canonical dlt init example with CrateDB.

Install the package

Install the dlt destination adapter for CrateDB.

pip install dlt-cratedb

Initialize the dlt project

Start by initializing a new example dlt project.

export DESTINATION__CRATEDB__DESTINATION_TYPE=postgres
dlt init chess cratedb

The dlt init command will initialize your pipeline with chess [1] as the source, and cratedb as the destination. It generates several files and directories.

Edit the pipeline definition

The pipeline definition is stored in the Python file chess_pipeline.py.

  • Because the dlt adapter currently only supports writing to the default doc schema of CrateDB [2], please replace dataset_name="chess_players_games_data" by dataset_name="doc" within the generated chess_pipeline.py file.

  • To initialize the CrateDB destination adapter, insert the import dlt_cratedb statement at the top of the file. Otherwise, the destination will not be found, so you will receive a corresponding error [3].

Configure credentials

Next, set up the CrateDB credentials in the .dlt/secrets.toml file as shown below. CrateDB is compatible with PostgreSQL and uses the psycopg2 driver, like the postgres destination.

[destination.cratedb.credentials]
host = "localhost"                       # CrateDB server host.
port = 5432                              # CrateDB PostgreSQL TCP protocol port, default is 5432.
username = "crate"                       # CrateDB username, default is usually "crate".
password = "crate"                       # CrateDB password, if any.
database = "crate"                       # CrateDB only knows a single database called `crate`.
connect_timeout = 15

Alternatively, you can pass a database connection string as shown below.

destination.cratedb.credentials="postgres://crate:crate@localhost:5432/"

Keep it at the top of your TOML file, before any section starts. Because CrateDB uses psycopg2, using postgres:// is the right choice.

Start CrateDB

Use Docker or Podman to run an instance of CrateDB for evaluation purposes.

docker run --rm --name=cratedb --publish=4200:4200 --publish=5432:5432 crate:latest '-Cdiscovery.type=single-node'

Run pipeline

python chess_pipeline.py

Explore data

crash -c 'SELECT * FROM players_profiles LIMIT 10;'
crash -c 'SELECT * FROM players_online_status LIMIT 10;'