Storage Usage¶

CrateDB stores data using both row and columnar stores, with automatic indexing on every column, enabling efficient searches without requiring manual index creation. Indexes are leveraged for reads, and depending on the query, the engine uses the most efficient store.

This is one of the many features that makes CrateDB very fast when reading and aggregating data, but it has an impact on storage size.

We are going to use Yellow taxi trip - January 2024 which has 2_964_624 rows.

This is the schema:

CREATE TABLE IF NOT EXISTS "doc"."taxi" (
   "VendorID" BIGINT,
   "tpep_pickup_datetime" TIMESTAMP WITHOUT TIME ZONE,
   "tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE,
   "passenger_count" BIGINT,
   "trip_distance" REAL,
   "RatecodeID" BIGINT,
   "store_and_fwd_flag" TEXT,
   "PULocationID" BIGINT,
   "DOLocationID" BIGINT,
   "payment_type" BIGINT,
   "fare_amount" REAL,
   "extra" REAL,
   "mta_tax" REAL,
   "tip_amount" REAL,
   "tolls_amount" REAL,
   "improvement_surcharge" REAL,
   "total_amount" REAL,
   "congestion_surcharge" REAL,
   "Airport_fee" REAL
)

It takes:

~48MiB in Parquet
~342MiB in CSV
~1.2GiB in JSON
~510MiB in PostgreSQL 16.1 (Debian 16.1-1.pgdg120+1)
~775MiB in CrateDB 5.9.3 (3 nodes, default settings)

At first sight, it might look that in CrateDB data takes more space than in PostgreSQL, but we need to dive deeper to really understand what is going on, the reality is the opposite.

Note

In version 5.10 storage usage was improved, some users report up to 70% of storage reduction, more reasonably a 30–50% can be expected for most data, the reduction percentage will vary depending on the use case.

This guide has been updated to reflect the v5.10 improvements.

How storage in CrateDB works¶

CrateDB is a distributed database; nodes, shards, partitions and replicas are tightly integrated.

When a table is created, data is sharded and distributed among nodes. This means that the memory footprint depends on our replication and sharding strategy.

Let’s break down how the 431MiB in CrateDB 5.10 and the 510MiB in PostgreSQL were obtained. For PostgreSQL it was straightforward:

SELECT pg_size_pretty(pg_total_relation_size('taxi_january'));

In CrateDB, when a table is created, sharding and replication have to be taken into account. When a table is created with default values, it gets split in a number of shards derived using this formula: max(4, num_data_nodes * 2).

For example, on a typical 3-nodes cluster, it will create:

num_data_nodes = 3

max(4, 3 * 2) = 6 shards

On top of that, the default replication is the 0-1 range: a maximum of one replica. A replica multiplies the number of shards, therefore creating 6 primary shards and 6 replica shards, making 12 shards physically distributed among the three nodes.

CrateDB shards allocation

The number of shards and tables per node can be checked by querying the sys.shards table:

SELECT sum(num_docs)         document_count,
       node                  ['name'] node, count(*) shard_count,
       array_agg(table_name) shard_table_names
FROM sys.shards
GROUP BY node ['name']

The total storage being used by a table can be calculated as the average size of 1 shard * total_shards.

Applying this to the taxi table that was created beforehand:

SELECT TRUNC(sum(size / (1024.0 * 1024.0)) / count(*), 2),
       TRUNC(sum(size) / (1024.0 * 1024.0), 2) as total_mib
FROM sys.shards
WHERE table_name = 'taxi'

CrateDB v5.9.3

avg_mib_per_shard	total_mib
61.59	739.12

CrateDB v5.10.9

avg_mib_per_shard	total_mib
35.89	430.79

The average size of a shard is ~35MiB; 35.89MiB * 12 shards = ~430.68 MiB.

You can check the actual size of a shard by inspecting the filesystem. Use select table_name, path from sys.shards to show the file path of the individual shards.

sh-5.1# pwd
/data/data/nodes/0/indices/LeFVb9VMT_G68tZs0vOuyA
sh-5.1# du -sh ./* | sort -h
8.0K	./_state
36M	./2
36M	./3
36M	./4
36M	./5

Reducing storage¶

The following techniques can help to reduce the average size of one shard.

Please note reducing disk usage often comes at the cost of performance.

If there are columns that will not be used in aggregations (joins) and groupings (group by, order by), it will have no impact on performance and might make sense to reduce its storage footprint.

Things that can be done:

disable indexing
disable the columnar store
call optimize table (only in >=v5.10)
change the compression algorithm
review the data schema

Disk size improvements can vary depending on the data types, schema and even disk manufacturer. This example is intended for illustrative purposes only.

Disable indexing¶

By default, CrateDB creates indexes on every column; this can be disabled when creating the table:

CREATE TABLE taxi_noindex
(
    "VendorID"              BIGINT INDEX OFF,
    "tpep_pickup_datetime"  TIMESTAMP WITHOUT TIME ZONE INDEX OFF,
    "tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE INDEX OFF, ...
)

The index can only be disabled when the table is created, if the table already exists and it cannot be deleted, it will have to be re-created.

One of the ways of re-creating a table is by renaming it, for example:

Rename table taxi (with INDEX) to taxi_deleteme with:

ALTER TABLE "taxi"
    RENAME TO "taxi_deleteme"

Create the new table named taxi with indexes off.
Copy data from taxi_deleteme to taxi.

INSERT INTO "taxi" (SELECT * FROM "taxi_deleteme")

Delete taxi_deleteme with:

DROP TABLE "taxi_deleteme"

Attention

Dropping the table deletes the data, make sure that the copy was done correctly.

Attention

Disabled indexes can’t be re-added without recreating the table.

Effects on storage¶

CrateDB v5.9.3

avg_mib_per_shard	total_mib
53	635

CrateDB v5.10.9

avg_mib_per_shard	total_mib
23	285

Disk space used was reduced ~13.11% in v5.9 - original was 739.12MiB Disk space used was reduced ~33.87% in v5.10 - original was 430.79MiB

Disable the columnar store¶

The columnar store can be disabled at table creation with:

CREATE TABLE IF NOT EXISTS "doc"."taxi_nocolumnstore"
(
    "VendorID" BIGINT STORAGE WITH(
        columnstore = false
) ,
   "tpep_pickup_datetime" TIMESTAMP WITHOUT TIME ZONE STORAGE WITH (
      columnstore = false
   ),
   "tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE STORAGE WITH (
      columnstore = false
   ),
    ...
)

As with indexing, it can only be turned off at table creation time.

Effects on storage¶

CrateDB v5.9.3

avg_mib_per_shard	total_mib
53	639

CrateDB v5.10.9

avg_mib_per_shard	total_mib
39	471

Disk space used was reduced ~13.11% in v5.9 similar to no_index - original was 739.12MiB Disk space used was increased ~10% in v5.10 - original was 430.79MiB

Note

When columnstore is off, data needs to be stored somewhere else (row store), which is less efficient in terms of disk usage, as column stores are designed specifically for each datatype.

In v5.10, we don’t recommend that you turn off column storage.

Upgrade to v5.10 and `OPTIMIZE TABLE`¶

CrateDB v5.10 introduced a new table storage format that can reduce storage up to 50% in comparison.

The easiest way to gain massive storage reduction is to update to the latest version available.

Additionally, as part of CrateDB’s background operations, index segments are merged and optimized; you would typically see storage usage being automatically reduced as more data gets added to the table. Nonetheless, you can trigger these optimizations manually by running:

OPTIMIZE TABLE <table_name> WITH (max_num_segments=1)

If a table is ingesting data continuously, it makes sense to let CrateDB merge and optimize the segments. If you run this command and storage usage doesn’t decrease within minutes; it’s most likely that they were already optimized.

Changing the compression algorithm¶

Data is compressed when it is stored on disk, two options are available default (LZ4) and best_compression.

best_compression might be less performant on certain queries, but it has less storage footprint.

It can be changed via table definition:

CREATE TABLE IF NOT EXISTS "doc"."taxi_january_nocolumnstore"
(
    "VendorID" BIGINT, ...
) WITH (codec = 'best_compression')

Effects on storage¶

CrateDB v5.9.3

avg_mib_per_shard	total_mib
44	536

CrateDB v5.10.9

avg_mib_per_shard	total_mib
36	436

Disk space used was reduced ~27.86% in v5.9 - original was 739.12MiB Disk space used was reduced ~1% in v5.10 - original was 430.79MiB

In v5.10 depending on your data, changing the compression algorithm might not yield storage improvements.

All results and what to do¶

In the following table, all the above results can be found, also different combinations of them:

SELECT table_name,
       SUM(num_docs)                          as records,
       (SUM(size) / (1024 * 1024))            as total_size_mib,
       (SUM(size) / count(*)) / (1024 * 1024) as avg_mib_per_shard,
       (SUM(size) / SUM(num_docs) :: DOUBLE)  as avg_bytes_per_record
FROM sys.shards
WHERE table_name LIKE 'taxi%'
GROUP BY 1
ORDER BY total_size_mib

CrateDB v5.9.3

table_name	records	total_size_mib	avg_mib_per_shard	avg_bytes_per_record
“taxi__noindex_nocolumnstore_bestcompresion”	2964624	122	20	41
“taxi_nocolumnstore_bestcompression”	2964624	205	34	69
“taxi_noindex_bestcompression”	2964624	212	35	71
“taxi_noindex_nocolumnstore”	2964624	237	39	80
“taxi_bestcompresion”	2964624	290	48	98
“taxi_noindex”	2964624	317	52	107
“taxi_nocolumnstore”	2964624	319	53	107
“taxi”	2964624	385	64	130

CrateDB v5.10.9

table_name	records	total_size_mib	avg_mib_per_shard	avg_bytes_per_record
“taxi_noindex_nocolumnstore_bestcompression”	2964624	113	18	40.01650765830675
“taxi_noindex_bestcompression”	2964624	135	22	47.98841067197729
“taxi_noindex”	2964624	142	23	50.380041448763826
“taxi_noindex_nocolumnstore”	2964624	152	25	53.92698838031399
“taxi_nocolumnstore_bestcompression”	2964624	195	32	69.21123150861627
“taxi”	2964624	215	35	76.17595485970566
“taxi_bestcompression”	2964624	218	36	77.14465341979286
“taxi_nocolumnstore”	2964624	236	39	83.49627406375986

To summarize:

total_size_mib is the sum of the disk used in MiB of all the primary shards.
All primary shards make up the full table.
By default, every table will have a maximum of one replica.

The original 775MiB in v5.9 can be calculated as:

64MiB per shard * (6 primary shards + 6 replica shards) = 768 MiB

The result are slightly off 768 ~= 775 because in this example, decimals are being truncated. The goal is to give you an idea on how tweaking some CrateDB aspect can affect storage, being overly precise to the kilobyte level doesn’t matter too much.

Query with everything applied:

CREATE TABLE IF NOT EXISTS "doc"."taxi_nocolumnstore_noindex_bestcompression"
(
    "VendorID" BIGINT INDEX OFF STORAGE WITH(
        columnstore = false
) ,
   "tpep_pickup_datetime" TIMESTAMP WITHOUT TIME ZONE INDEX OFF STORAGE WITH (
      columnstore = false
   ),
   "tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE INDEX OFF STORAGE WITH (
      columnstore = false
   ),
   ...
   WITH (codec = 'best_compression')

What to do¶

CrateDB’s default settings are optimized for performance.

If some columns are never used for aggregations or groupings, there will be no performance penalty. That might change in the future as your use case and data evolve, re-adding indexes or column store at later stages is tricky, as the table will need to be re-created and data copied over, which might need some downtime, depending on the setup.

When designing your data model, it is important to evaluate your current and future needs to minimize any future overhead.

Extra: Data normalization¶

One of the most common ways to reduce storage size is to not write data more than once, by normalizing your tables.

We encourage you to read more about database normalization.

Storage Usage¶

How storage in CrateDB works¶

Reducing storage¶

Disable indexing¶

Effects on storage¶

Disable the columnar store¶

Effects on storage¶

Upgrade to v5.10 and OPTIMIZE TABLE¶

Changing the compression algorithm¶

Effects on storage¶

All results and what to do¶

What to do¶

Extra: Data normalization¶

Upgrade to v5.10 and `OPTIMIZE TABLE`¶