dbt

dbt is an open-source tool for transforming data in data warehouses using SQL and, optionally, Python. It is a SQL-first transformation workflow that lets teams collaboratively deploy analytics code following software engineering best practices such as modularity, portability, CI/CD, and documentation.

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

With dbt, anyone on your data team can safely contribute to production-grade data pipelines.

In a typical setup, data engineers load source data into the database that dbt projects run against. They might use Debezium for change data capture or Airflow for orchestrated pipelines. Data analysts then run their dbt projects against this data to produce models (tables and views) that BI tools can query.
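If a Python-based orchestrator such as Airflow triggers the dbt run, one option is dbt's programmatic entry point. The following is a minimal sketch. It assumes dbt-core 1.5 or later, which provides `dbtRunner` in `dbt.cli.main`. The helper names `build_run_args` and `run_project` and the project path are hypothetical.

```python
"""Sketch: run a dbt project from Python, e.g. inside an orchestrator task."""
from typing import List, Optional


def build_run_args(project_dir: str, target: Optional[str] = None,
                   select: Optional[str] = None) -> List[str]:
    """Assemble the same arguments you would pass to `dbt run` on the CLI."""
    args = ["run", "--project-dir", project_dir]
    if target:
        args += ["--target", target]  # e.g. the `dev` output from profiles.yml
    if select:
        args += ["--select", select]  # limit the run to specific models
    return args


def run_project(project_dir: str, target: Optional[str] = None,
                select: Optional[str] = None) -> bool:
    """Invoke dbt in-process; returns True if the run succeeded."""
    # Imported lazily so this module can be loaded without dbt installed.
    # Assumes dbt-core >= 1.5, which ships dbtRunner.
    from dbt.cli.main import dbtRunner

    result = dbtRunner().invoke(build_run_args(project_dir, target, select))
    return result.success
```

For example, `run_project("path/to/project", target="dev")` would build all models of a (hypothetical) project against the `dev` output. Keeping argument assembly separate from the invocation makes the helper testable without a database.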

Managed dbt

With dbt Cloud, you can skip time-consuming setup and the struggle of scaling your data production yourself. dbt Cloud is a managed, full-suite service built for scale.

  • Start building data products quickly using the dbt Cloud IDE with integrated security and governance controls.

  • Schedule, deploy, and monitor your data products using the scalable and reliable dbt Cloud Scheduler.

  • Help your data teams discover and reuse data products using hosted docs or integrations with the powerful Discovery API.

  • Extend your workflow beyond dbt Cloud with 30+ seamless integrations covering a range of use cases across the Modern Data Stack, from observability and data quality to visualization, reverse ETL, and much more.

  • Ship more high-quality data and scale your development like the thousands of companies that use dbt Cloud, whose collaboration-friendly interface helps them remove the bottlenecks that limit growth.

Install

Install the most recent version of the dbt-cratedb2 Python package.

pip install --upgrade 'dbt-cratedb2'
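To confirm which version was installed into the active Python environment, a small check with the standard library's `importlib.metadata` is enough. This is a sketch, and `installed_version` is a hypothetical helper name. Running `dbt --version` also lists the installed adapter plugins.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "dbt-cratedb2") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "dbt-cratedb2 is not installed")
```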

Connect

dbt profile configuration: set up CrateDB targets in your profiles.yml file (located in ~/.dbt/ by default) as follows.

company-name:  # Profile name; must match `profile:` in dbt_project.yml.
  target: dev
  outputs:
    dev:
      type: cratedb
      host: [hostname]
      user: [username]
      password: [password]
      port: [port]   # Default is 5432.
      dbname: crate  # Fixed. Do not change.
      schema: doc    # `doc` is the default schema.

dbt-cratedb2 is based on dbt-postgres, which uses psycopg2 to connect to the database server. Because CrateDB speaks the PostgreSQL wire protocol, the same connectivity options apply as outlined on the dbt Postgres setup documentation page.
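Because the adapter connects through psycopg2, you can test the profile values before running dbt by opening a plain psycopg2 connection with the same parameters. The following is a minimal sketch. It assumes the profile output has already been loaded into a Python dict, for example with a YAML parser. The helper names are hypothetical.

```python
from typing import Any, Dict


def profile_to_connect_kwargs(output: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one profiles.yml output into psycopg2.connect() keyword arguments."""
    if output.get("type") != "cratedb":
        raise ValueError("expected an output with type: cratedb")
    kwargs: Dict[str, Any] = {
        "host": output["host"],
        "port": int(output.get("port", 5432)),  # CrateDB's PostgreSQL wire port
        "user": output["user"],
        "dbname": output.get("dbname", "crate"),  # fixed value for CrateDB
    }
    if output.get("password"):
        kwargs["password"] = output["password"]
    return kwargs


def check_connection(output: Dict[str, Any]) -> bool:
    """Open a connection with the profile's parameters and run a trivial query."""
    # Imported lazily; assumes a psycopg2 distribution is installed,
    # as dbt-postgres requires one.
    import psycopg2

    conn = psycopg2.connect(**profile_to_connect_kwargs(output))
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()[0] == 1
    finally:
        conn.close()
```

For a local instance, `check_connection({"type": "cratedb", "host": "localhost", "user": "crate"})` should return True if CrateDB accepts connections on port 5432. This assumes the default `crate` user without a password.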

Usage

Custom Schemas

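By default, dbt writes models into the schema configured in your profile (`doc` in the example above), but some dbt projects need to write data into different target schemas. You can set a different schema per model with dbt's custom schemas feature, for example with `{{ config(schema='marketing') }}` in a model file or `+schema: marketing` in dbt_project.yml.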

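If your dbt project defines its own macro named generate_schema_name (for example in a file under the macros/ directory), dbt uses it instead of the built-in one. The built-in macro prefixes the custom schema with the target schema, producing names like doc_marketing. The override below instead uses the custom schema name as-is and falls back to the profile's schema when a model does not set one.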

{% macro generate_schema_name(custom_schema_name, node) -%}
  {%- set default_schema = target.schema -%}
  {%- if custom_schema_name is none -%}
    {{ default_schema }}
  {%- else -%}
    {{ custom_schema_name | trim }}
  {%- endif -%}
{%- endmacro %}
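The difference between the two macros can be made concrete by mirroring their resolution logic in plain Python. This is an illustration only, not how dbt evaluates macros. The function names are hypothetical, and the example assumes a profile schema of `doc` and a model configured with `schema='marketing'`.

```python
from typing import Optional


def default_schema_name(target_schema: str, custom_schema_name: Optional[str]) -> str:
    """Mirrors dbt's built-in generate_schema_name: prefix with the target schema."""
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"


def overridden_schema_name(target_schema: str, custom_schema_name: Optional[str]) -> str:
    """Mirrors the override macro above: use the custom schema verbatim."""
    if custom_schema_name is None:
        return target_schema
    # Jinja's `trim` filter strips surrounding whitespace, like str.strip().
    return custom_schema_name.strip()


print(default_schema_name("doc", "marketing"))     # doc_marketing
print(overridden_schema_name("doc", "marketing"))  # marketing
print(overridden_schema_name("doc", None))         # doc
```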

Learn

Webinars

Introduction to dbt

Learn how to get started using dbt by following along with an easy step-by-step tutorial.

In this video, you will learn how to install dbt, initialize a new project and then publish your project to a GitHub repository.
