Time Series Data Lifecycle

Partitioning strategy

CrateDB's table partitioning feature helps manage large datasets efficiently, especially time-series data. The PARTITIONED BY clause splits a table into smaller, more manageable partitions based on the values of the partition column. This makes it easier to query and manage relevant subsets of the data, and it reduces storage costs because outdated or irrelevant data can be removed without affecting the rest of the dataset.

When new data is ingested into a partitioned table, CrateDB automatically creates any partitions that do not exist yet. Similarly, old data can be removed by dropping entire partitions, which avoids costly index rebuilds. This simplifies the management of historical data and helps keep the database fast and efficient.

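Because partitions can be dropped individually, old data can be purged automatically while relevant data is retained, which keeps storage usage and query times under control. The example below creates a table partitioned by month, inserts rows into two partitions, and then deletes the January data by filtering on the partition column.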

CREATE TABLE t1 ( 
   name STRING, 
   month TIMESTAMP 
)  
CLUSTERED INTO 3 SHARDS 
PARTITIONED BY (month); 

INSERT INTO t1 (name, month) VALUES
  ('foo', '2023-01-01'),
  ('bar', '2023-02-01');

DELETE FROM t1 WHERE month = '2023-01-01';
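
To check the effect of the statements above, the partitions of a table can be listed from the information_schema.table_partitions system table. The following is a minimal sketch, assuming the table t1 from the example; after the DELETE, only the February partition should be listed.

```sql
-- List the existing partitions of t1 together with their partition values.
SELECT table_name, partition_ident, "values"
FROM information_schema.table_partitions
WHERE table_name = 't1'
ORDER BY partition_ident;
```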

CREATE TABLE IF NOT EXISTS retention_policies (
   "table_schema" TEXT,
   "table_name" TEXT,
   "partition_column" TEXT NOT NULL,
   "retention_period" INTEGER NOT NULL,
   "strategy" TEXT NOT NULL,
   PRIMARY KEY ("table_schema", "table_name", "strategy")
);
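
As an illustration of how such a policy table might be used, the sketch below registers a policy for t1 and looks up the partitions that have outlived it. Several details are assumptions rather than part of the original text: the 'delete' strategy name, the doc schema, the reading of retention_period as a number of days, and the 365-day value. An external scheduler would then issue a DELETE for each partition the query returns.

```sql
-- Hypothetical policy: keep partitions of doc.t1 for 365 days, then delete them.
INSERT INTO retention_policies
  (table_schema, table_name, partition_column, retention_period, strategy)
VALUES ('doc', 't1', 'month', 365, 'delete');

-- Make the new row visible to the following query.
REFRESH TABLE retention_policies;

-- Find partitions whose partition value is older than the retention period.
-- Assumes retention_period is expressed in days.
SELECT p.table_schema,
       p.table_name,
       r.partition_column,
       p."values"[r.partition_column] AS partition_value
FROM information_schema.table_partitions p
JOIN retention_policies r
  ON p.table_schema = r.table_schema
 AND p.table_name = r.table_name
WHERE r.strategy = 'delete'
  AND p."values"[r.partition_column]
      < CURRENT_TIMESTAMP - (r.retention_period || ' days')::INTERVAL;
```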

Snapshots

Snapshots are a good way to archive old partitions. They capture the state of tables at the moment they are taken. You can use them to back up and restore individual partitions, which lets you archive historical data that no longer needs to be available in hot or warm storage.

Snapshots are stored in repositories, which act as storage containers. The repository is set up with a specific storage backend, such as Amazon S3, Microsoft Azure Blob Storage, Google Cloud Storage, or a local file system.

Step 1: Create a repository for storing snapshots

Use the CREATE REPOSITORY statement in CrateDB. In the example below, we create a repository called 'export_cold' and use S3 as the storage backend. The WITH clause includes all the necessary configuration details, such as protocol, endpoint, access credentials, and the name of the bucket.

CREATE REPOSITORY export_cold 
TYPE s3 
WITH ( 
  protocol = 'https',
  endpoint = 's3-store.example.org:443',
  access_key = '',
  secret_key = '',
  bucket = 'cratedb-cold-storage' 
);
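
Once the statement has run, the repository can be looked up in the sys.repositories system table. This is a small sketch for verification, assuming the repository name export_cold from the example above.

```sql
-- Confirm that the repository is registered and check its type.
SELECT name, type
FROM sys.repositories
WHERE name = 'export_cold';
```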