Monitoring and diagnostics

It is important to continuously monitor your CrateDB database cluster to detect anomalies, so you can react to them promptly. Collecting statistics and following usage trends is also important for proper capacity planning.

CrateDB provides system information about the cluster as a whole, individual cluster nodes, and about the entities and resources it manages. Different kinds of observability interfaces and support utilities are available, for both continuous metrics collection and monitoring, and for ad hoc use. Below are a few popular and recommended options.

Types of interfaces

HTTP:

The HTTP interface root endpoint of CrateDB responds with status information if the cluster is ready to receive and process requests.

Example

If ok isn’t true, or status isn’t 200, you can assume the cluster is in an error state.

curl --fail http://localhost:4200/
{
  "ok" : true,
  "status" : 200,
  "name" : "Montagne des Agneaux",
  "cluster_name" : "crate",
  "version" : {
    "number" : "6.2.0",
    "build_hash" : "186b70597ac02d66f1093755091f8b372eb6e31a",
    "build_timestamp" : "NA",
    "build_snapshot" : true,
    "lucene_version" : "10.3.1"
  }
}
JMX:

CrateDB exposes telemetry data using the JMX monitoring feature that needs to be enabled when starting CrateDB. It provides statistical information about queries, shards, thread pools, and circuit breakers, and status information about connections, server nodes, shards, and health.

Information gathered by the JMX collectors can be provided to the Prometheus exporter or other observability and monitoring systems.

SQL:

CrateDB provides cluster and system information at runtime, through system tables located in the sys schema, which can be queried using SQL.

Diagnostics with system tables illustrates how inquiring system tables works, and how to interpret the results. [1]

Jobs, operations, and logs shares details about how to inspect the activities taking place in a cluster, reflected through two pairs of system tables, one about cluster jobs, and one about their corresponding break-down operations, separated into “currently active” vs. “completed” ones.

Prometheus:

The Crate JMX HTTP Exporter is a Prometheus exporter that consumes metrics information from CrateDB’s JMX collectors and exposes them via HTTP so they can be scraped by Prometheus, and, for example, subsequently displayed in Grafana, or processed into Alertmanager.

Monitoring a CrateDB cluster with Prometheus and Grafana illustrates a full setup for making CrateDB-specific metrics available to Prometheus. The tutorial uses the Crate JMX HTTP Exporter for exposing telemetry information, the Prometheus SQL Exporter for conducting system table inquiries, and the Prometheus Node Exporter for collecting OS metrics.

CLI / HTTP:

CrateDB is accompanied by a few looking-glass utilities that inquire its system tables to generate reports and display diagnostic output, or record it for later or remote inspection.

The reporting tools support you to automate capturing information and enriching it into textual and tabular reports, or machine-readable formats. The analysis tools support you to discover anomalies and performance bottlenecks in your cluster.

Reports: CrateDB Cluster Information, CrateDB Cluster Flight Recorder (CFR), ctk settings, ctk tail.
Analysis: CrateDB XMover, CrateDB Shard Analyzer.