Storage Usage

CrateDB stores data using both row and columnar stores, with automatic indexing on every column, enabling efficient searches without requiring manual index creation. Indexes are leveraged for reads, and depending on the query, the engine uses the most efficient store.

This is one of the many features that makes CrateDB very fast when reading and aggregating data, but it has an impact on storage size.

We are going to use Yellow taxi trip - January 2024 which has 2_964_624 rows.

This is the schema:

CREATE TABLE IF NOT EXISTS "doc"."taxi" (
   "VendorID" BIGINT,
   "tpep_pickup_datetime" TIMESTAMP WITHOUT TIME ZONE,
   "tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE,
   "passenger_count" BIGINT,
   "trip_distance" REAL,
   "RatecodeID" BIGINT,
   "store_and_fwd_flag" TEXT,
   "PULocationID" BIGINT,
   "DOLocationID" BIGINT,
   "payment_type" BIGINT,
   "fare_amount" REAL,
   "extra" REAL,
   "mta_tax" REAL,
   "tip_amount" REAL,
   "tolls_amount" REAL,
   "improvement_surcharge" REAL,
   "total_amount" REAL,
   "congestion_surcharge" REAL,
   "Airport_fee" REAL
)

It takes:

  • ~48MiB in Parquet

  • ~342MiB in CSV

  • ~1.2GiB in JSON

  • ~510MiB in PostgreSQL 16.1 (Debian 16.1-1.pgdg120+1)

  • ~775MiB in CrateDB 5.9.3 (3 nodes, default settings)

At first sight, it might look that in CrateDB data takes more space than in PostgreSQL, but we need to dive deeper to really understand what is going on, the reality is the opposite.

Note

In version 5.10 storage usage was improved, some users report up to 70% of storage reduction, more reasonably a 30–50% can be expected for most data, the reduction percentage will vary depending on the use case.

Read more at https://cratedb.com/blog/reducing-storage-costs-in-cratedb-v5.10

This guide has been updated to reflect the v5.10 improvements.

How storage in CrateDB works

CrateDB is a distributed database; nodes, shards, partitions and replicas are tightly integrated.

When a table is created, data is sharded and distributed among nodes. This means that the memory footprint depends on our replication and sharding strategy.

Let’s break down how the 431MiB in CrateDB 5.10 and the 510MiB in PostgreSQL were obtained. For PostgreSQL it was straightforward:

SELECT pg_size_pretty(pg_total_relation_size('taxi_january'));

In CrateDB, when a table is created, sharding and replication have to be taken into account. When a table is created with default values, it gets split in a number of shards derived using this formula: max(4, num_data_nodes * 2).

For example, on a typical 3-nodes cluster, it will create:

num_data_nodes = 3

max(4, 3 * 2) = 6 shards

On top of that, the default replication is the 0-1 range: a maximum of one replica. A replica multiplies the number of shards, therefore creating 6 primary shards and 6 replica shards, making 12 shards physically distributed among the three nodes.

CrateDB shards allocation

The number of shards and tables per node can be checked by querying the sys.shards table:

SELECT sum(num_docs)         document_count,
       node                  ['name'] node, count(*) shard_count,
       array_agg(table_name) shard_table_names
FROM sys.shards
GROUP BY node ['name']

The total storage being used by a table can be calculated as the average size of 1 shard * total_shards.

Applying this to the taxi table that was created beforehand:

SELECT TRUNC(sum(size / (1024.0 * 1024.0)) / count(*), 2),
       TRUNC(sum(size) / (1024.0 * 1024.0), 2) as total_mib
FROM sys.shards
WHERE table_name = 'taxi'

CrateDB v5.9.3

avg_mib_per_shard

total_mib

61.59

739.12

CrateDB v5.10.9

avg_mib_per_shard

total_mib

35.89

430.79

The average size of a shard is ~35MiB; 35.89MiB * 12 shards = ~430.68 MiB.

You can check the actual size of a shard by inspecting the filesystem. Use select table_name, path from sys.shards to show the file path of the individual shards.

sh-5.1# pwd
/data/data/nodes/0/indices/LeFVb9VMT_G68tZs0vOuyA
sh-5.1# du -sh ./* | sort -h
8.0K	./_state
36M	./2
36M	./3
36M	./4
36M	./5

Reducing storage

The following techniques can help to reduce the average size of one shard.

Please note reducing disk usage often comes at the cost of performance.

If there are columns that will not be used in aggregations (joins) and groupings (group by, order by), it will have no impact on performance and might make sense to reduce its storage footprint.

Things that can be done:

  • disable indexing

  • disable the columnar store

  • call optimize table (only in >=v5.10)

  • change the compression algorithm

  • review the data schema

Disk size improvements can vary depending on the data types, schema and even disk manufacturer. This example is intended for illustrative purposes only.

Disable indexing

By default, CrateDB creates indexes on every column; this can be disabled when creating the table:

CREATE TABLE taxi_noindex
(
    "VendorID"              BIGINT INDEX OFF,
    "tpep_pickup_datetime"  TIMESTAMP WITHOUT TIME ZONE INDEX OFF,
    "tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE INDEX OFF, ...
)

The index can only be disabled when the table is created, if the table already exists and it cannot be deleted, it will have to be re-created.

One of the ways of re-creating a table is by renaming it, for example:

  1. Rename table taxi (with INDEX) to taxi_deleteme with:

ALTER TABLE "taxi"
    RENAME TO "taxi_deleteme"
  1. Create the new table named taxi with indexes off.

  2. Copy data from taxi_deleteme to taxi.

INSERT INTO "taxi" (SELECT * FROM "taxi_deleteme")
  1. Delete taxi_deleteme with:

DROP TABLE "taxi_deleteme"

Attention

Dropping the table deletes the data, make sure that the copy was done correctly.

Attention

Disabled indexes can’t be re-added without recreating the table.

Effects on storage

CrateDB v5.9.3

avg_mib_per_shard

total_mib

53

635

CrateDB v5.10.9

avg_mib_per_shard

total_mib

23

285

Disk space used was reduced ~13.11% in v5.9 - original was 739.12MiB Disk space used was reduced ~33.87% in v5.10 - original was 430.79MiB

Disable the columnar store

The columnar store can be disabled at table creation with:

CREATE TABLE IF NOT EXISTS "doc"."taxi_nocolumnstore"
(
    "VendorID" BIGINT STORAGE WITH(
        columnstore = false
) ,
   "tpep_pickup_datetime" TIMESTAMP WITHOUT TIME ZONE STORAGE WITH (
      columnstore = false
   ),
   "tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE STORAGE WITH (
      columnstore = false
   ),
    ...
)

As with indexing, it can only be turned off at table creation time.

Effects on storage

CrateDB v5.9.3

avg_mib_per_shard

total_mib

53

639

CrateDB v5.10.9

avg_mib_per_shard

total_mib

39

471

Disk space used was reduced ~13.11% in v5.9 similar to no_index - original was 739.12MiB Disk space used was increased ~10% in v5.10 - original was 430.79MiB

Note

When columnstore is off, data needs to be stored somewhere else (row store), which is less efficient in terms of disk usage, as column stores are designed specifically for each datatype.

In v5.10, we don’t recommend that you turn off column storage.

Upgrade to v5.10 and OPTIMIZE TABLE

CrateDB v5.10 introduced a new table storage format that can reduce storage up to 50% in comparison.

The easiest way to gain massive storage reduction is to update to the latest version available.

Additionally, as part of CrateDB’s background operations, index segments are merged and optimized; you would typically see storage usage being automatically reduced as more data gets added to the table. Nonetheless, you can trigger these optimizations manually by running:

OPTIMIZE TABLE <table_name> WITH (max_num_segments=1)

If a table is ingesting data continuously, it makes sense to let CrateDB merge and optimize the segments. If you run this command and storage usage doesn’t decrease within minutes; it’s most likely that they were already optimized.

Changing the compression algorithm

Data is compressed when it is stored on disk, two options are available default (LZ4) and best_compression.

best_compression might be less performant on certain queries, but it has less storage footprint.

It can be changed via table definition:

CREATE TABLE IF NOT EXISTS "doc"."taxi_january_nocolumnstore"
(
    "VendorID" BIGINT, ...
) WITH (codec = 'best_compression')

Effects on storage

CrateDB v5.9.3

avg_mib_per_shard

total_mib

44

536

CrateDB v5.10.9

avg_mib_per_shard

total_mib

36

436

Disk space used was reduced ~27.86% in v5.9 - original was 739.12MiB Disk space used was reduced ~1% in v5.10 - original was 430.79MiB

In v5.10 depending on your data, changing the compression algorithm might not yield storage improvements.

All results and what to do

In the following table, all the above results can be found, also different combinations of them:

SELECT table_name,
       SUM(num_docs)                          as records,
       (SUM(size) / (1024 * 1024))            as total_size_mib,
       (SUM(size) / count(*)) / (1024 * 1024) as avg_mib_per_shard,
       (SUM(size) / SUM(num_docs) :: DOUBLE)  as avg_bytes_per_record
FROM sys.shards
WHERE table_name LIKE 'taxi%'
GROUP BY 1
ORDER BY total_size_mib

CrateDB v5.9.3

table_name

records

total_size_mib

avg_mib_per_shard

avg_bytes_per_record

“taxi__noindex_nocolumnstore_bestcompresion”

2964624

122

20

41

“taxi_nocolumnstore_bestcompression”

2964624

205

34

69

“taxi_noindex_bestcompression”

2964624

212

35

71

“taxi_noindex_nocolumnstore”

2964624

237

39

80

“taxi_bestcompresion”

2964624

290

48

98

“taxi_noindex”

2964624

317

52

107

“taxi_nocolumnstore”

2964624

319

53

107

“taxi”

2964624

385

64

130

CrateDB v5.10.9

table_name

records

total_size_mib

avg_mib_per_shard

avg_bytes_per_record

“taxi_noindex_nocolumnstore_bestcompression”

2964624

113

18

40.01650765830675

“taxi_noindex_bestcompression”

2964624

135

22

47.98841067197729

“taxi_noindex”

2964624

142

23

50.380041448763826

“taxi_noindex_nocolumnstore”

2964624

152

25

53.92698838031399

“taxi_nocolumnstore_bestcompression”

2964624

195

32

69.21123150861627

“taxi”

2964624

215

35

76.17595485970566

“taxi_bestcompression”

2964624

218

36

77.14465341979286

“taxi_nocolumnstore”

2964624

236

39

83.49627406375986

To summarize:

  1. total_size_mib is the sum of the disk used in MiB of all the primary shards.

  2. All primary shards make up the full table.

  3. By default, every table will have a maximum of one replica.

The original 775MiB in v5.9 can be calculated as:

64MiB per shard * (6 primary shards + 6 replica shards) = 768 MiB

The result are slightly off 768 ~= 775 because in this example, decimals are being truncated. The goal is to give you an idea on how tweaking some CrateDB aspect can affect storage, being overly precise to the kilobyte level doesn’t matter too much.

Query with everything applied:

CREATE TABLE IF NOT EXISTS "doc"."taxi_nocolumnstore_noindex_bestcompression"
(
    "VendorID" BIGINT INDEX OFF STORAGE WITH(
        columnstore = false
) ,
   "tpep_pickup_datetime" TIMESTAMP WITHOUT TIME ZONE INDEX OFF STORAGE WITH (
      columnstore = false
   ),
   "tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE INDEX OFF STORAGE WITH (
      columnstore = false
   ),
   ...
   WITH (codec = 'best_compression')

What to do

CrateDB’s default settings are optimized for performance.

If some columns are never used for aggregations or groupings, there will be no performance penalty. That might change in the future as your use case and data evolve, re-adding indexes or column store at later stages is tricky, as the table will need to be re-created and data copied over, which might need some downtime, depending on the setup.

When designing your data model, it is important to evaluate your current and future needs to minimize any future overhead.

Extra: Data normalization

One of the most common ways to reduce storage size is to not write data more than once, by normalizing your tables.

We encourage you to read more about database normalization.