In this video we'll cover some aspects of monitoring the behaviour of your CrateDB cluster and understanding how queries are executed. I'll also show you how CrateDB can be used as a long-term storage, analysis and visualisation platform for metrics gathered from other systems via Telegraf or Prometheus. Let's jump right in.
You can monitor the behaviour of a CrateDB cluster using JMX (Java Management Extensions). I'll demonstrate this using the CrateDB Docker image. This command starts a single-node cluster with some additional configuration for JMX, using port 7979 as the remote JMX port.
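The command itself only appears on screen. As a sketch, assuming the official `crate` Docker image and that extra JVM options can be passed through the `CRATE_JAVA_OPTS` environment variable, the Python below builds an equivalent `docker run` command from the standard JVM remote-JMX system properties. Authentication and SSL are switched off, which is only appropriate for a local test.

```python
"""Sketch: a `docker run` command for a single-node CrateDB with remote JMX.

Assumptions (not from the video): the `crate` image, the CRATE_JAVA_OPTS
variable, and JMX auth/SSL disabled for local experimentation only.
"""
import shlex

JMX_PORT = 7979


def jmx_java_opts(port: int, hostname: str = "localhost") -> str:
    # Standard JVM system properties that enable a remote JMX agent.
    props = {
        "com.sun.management.jmxremote": None,
        "com.sun.management.jmxremote.port": port,
        "com.sun.management.jmxremote.rmi.port": port,  # same port keeps one mapping
        "com.sun.management.jmxremote.ssl": "false",
        "com.sun.management.jmxremote.authenticate": "false",
        "java.rmi.server.hostname": hostname,
    }
    return " ".join(f"-D{k}" if v is None else f"-D{k}={v}" for k, v in props.items())


def docker_run_args(port: int = JMX_PORT) -> list[str]:
    # Argument list suitable for subprocess.run(); nothing is executed here.
    return [
        "docker", "run", "--rm",
        "-p", "4200:4200",       # Admin UI and HTTP endpoint
        "-p", f"{port}:{port}",  # remote JMX
        "-e", f"CRATE_JAVA_OPTS={jmx_java_opts(port)}",
        "crate",
        "-Cdiscovery.type=single-node",
    ]


if __name__ == "__main__":
    print(shlex.join(docker_run_args()))
```

Once the container is running, `jconsole localhost:7979` attaches JConsole to the agent. Setting the RMI port equal to the JMX port means only one extra port has to be published from the container.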
JConsole is a basic graphical application, supplied with the JDK, that can connect to a JMX agent. I'm connecting it to port 7979 so that it attaches to the agent running in the CrateDB Docker container. Once connected, JConsole starts displaying metrics from the CrateDB node.
Here we see an overview. We can also focus on memory usage, and finally we can drill down into the details of each thread in the JVM.
The CrateDB Admin UI has an overview screen for monitoring key operational statistics. Scrolling down, we see live charts of query speed, shown both as an overall value and broken down by query type: SELECT, INSERT, UPDATE and DELETE.
Another tool for analysing query performance in your CrateDB cluster is the EXPLAIN command, which displays the execution plan for a given query. To invoke it, prepend EXPLAIN to the query you want to analyse.
Here I'm analysing a SELECT query against our 311 call data, and here's the resulting plan. Refer to the EXPLAIN documentation for details on how to read this output.
There's also a verbose mode that returns additional detail. To use it, prefix your query with EXPLAIN VERBOSE. This returns a more human-readable version of the plan, with a breakdown of the steps performed by the query optimizer.
Finally, there's a third variant, EXPLAIN ANALYZE. In this case the query is actually executed, and the timings for its different phases are returned. To learn more about this output, consult the EXPLAIN documentation in the CrateDB documentation portal.
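The three variants differ only in the keywords placed in front of the query. The sketch below builds each form and wraps it in a request for CrateDB's HTTP endpoint (`POST /_sql` with a JSON `stmt` body), assuming a node on `localhost:4200` and a CrateDB version that supports all three variants. The `calls_311` table and its `borough` column are hypothetical stand-ins for the 311 call data.

```python
"""Sketch: EXPLAIN, EXPLAIN VERBOSE and EXPLAIN ANALYZE via CrateDB's HTTP endpoint.

Assumptions: a node on localhost:4200 and a hypothetical `calls_311` table.
Requests are only built here; the commented block shows how to send one.
"""
import json
import urllib.request

VARIANTS = {"plain": "EXPLAIN", "verbose": "EXPLAIN VERBOSE", "analyze": "EXPLAIN ANALYZE"}


def explain(query: str, variant: str = "plain") -> str:
    # Prepend the chosen EXPLAIN keyword(s) to the query under analysis.
    return f"{VARIANTS[variant]} {query.strip().rstrip(';')}"


def sql_request(stmt: str, host: str = "http://localhost:4200") -> urllib.request.Request:
    # The endpoint expects a JSON body with the statement under "stmt".
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = "SELECT borough, count(*) FROM calls_311 GROUP BY borough"
for variant in VARIANTS:
    req = sql_request(explain(query, variant))
    print(req.get_method(), req.full_url, req.data.decode())
    # Against a live node, the plan comes back in the "rows" field:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["rows"])
```

Keep in mind that the `analyze` variant really runs the query, so on a large table it costs as much as the query itself.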
CrateDB also makes an excellent data store and analysis platform for metrics data generated by other systems. Let's see how this works for two popular solutions, Telegraf and Prometheus.
Telegraf is a server-based agent for collecting metrics and sending them on for further processing. Metrics can originate from a variety of sources, for example HTTP endpoints, sensors (perhaps via MQTT) or cloud services. Telegraf uses its plugin architecture to ingest, process, aggregate and output this data. Metrics can then be stored in CrateDB using Telegraf's PostgreSQL output plugin; it's simply a matter of configuring the plugin with the connection URL for the CrateDB cluster. Once the data is in CrateDB, you can analyse it with standard SQL and use popular visualisation tools such as Grafana to build your own custom dashboards.
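To make "analyse it with standard SQL" concrete, here is a sketch of an hourly CPU summary over Telegraf data. The schema is hypothetical: a `cpu` table with a `time` timestamp, a `host` tag column and a `usage_idle` field column. The actual table and column names depend on your Telegraf input and output plugin configuration. The function returns a JSON payload in the shape CrateDB's HTTP endpoint accepts, with `?` placeholders bound from `args`.

```python
"""Sketch: hourly CPU usage per host from Telegraf metrics stored in CrateDB.

Hypothetical schema: table `cpu` with columns "time", host and usage_idle.
Adjust the names to whatever your Telegraf configuration actually writes.
"""
import json


def hourly_cpu_payload(host: str) -> dict:
    # One row per hour: average busy percentage derived from idle percentage.
    stmt = (
        "SELECT date_trunc('hour', \"time\") AS bucket, "
        "avg(100 - usage_idle) AS avg_busy_percent "
        "FROM cpu "
        "WHERE host = ? "
        "GROUP BY date_trunc('hour', \"time\") "
        "ORDER BY bucket"
    )
    # "args" supplies the value for the single ? placeholder.
    return {"stmt": stmt, "args": [host]}


if __name__ == "__main__":
    print(json.dumps(hourly_cpu_payload("web-01"), indent=2))
```

The same statement, with the placeholder filled in, can be pasted into the Admin UI console or used as the query behind a Grafana panel.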
Prometheus is an open source systems monitoring and alerting toolkit that collects and stores metrics data. It's well suited to short-term storage, and with the CrateDB Prometheus Adapter, this metrics data can flow seamlessly into CrateDB, where it can be stored long term and queried using the power of CrateDB's SQL engine. Of course, you can still use standard tools such as Grafana to build custom dashboards and other visualisations.
You should now have a high-level understanding of how to monitor your CrateDB cluster and dig into query behaviour. We also showed how to use CrateDB as an analytics database for the behaviour of other systems, using metrics collected by standard frameworks such as Telegraf and Prometheus. For more information, be sure to check out the CrateDB documentation online, as well as the resources associated with this video.