Skip to content
Blog

Optimizing Big Crate Clusters

This article is more than 4 years old

As Crate gets more mature, our users are building bigger and bigger clusters. We see people inserting thousands of rows a second and querying huge data sets. Crate will handle these demands, but optimizing your setup is something that should never stop.

There isn't ONE setting that fits everyone for most of the tips in this blogpost. They depend on your application, the data structure(s) you've defined, how often you insert data and how you access it.

Bulk Size

When ingesting a large amount of data into Crate we recommend using bulk inserts. The structure of your data affects the optimal bulk size. Try inserting different bulk sizes like 100, 1000 and 10000 rows to see which best suits you.

Setting a Higher Refresh Interval

The refresh interval affects the time needed until changes like inserting, updating and deleting are accessible. Refreshing the index is resource intensive so if you don't rely on the data being available in real time, consider increasing this value. You can read more about this setting in the Crate documentation.

Optimizing Indexes

Crate writes to the bottom of index files and these files are regularly merged and optimized by Crate automatically. Whenever a lot of data is ingested into Crate the indexes can become fragmented. Optimizing these indexes can take time and happens whenever Crate decides it's needed. To control this optimization process turn the index can off before inserting data and trigger the optimization manually after the data is in Crate.

Example: (it only runs when the Elasticsearch API is enabled)

$ curl -XPOST 'http://localhost:4200/twitter/_optimize'

Index Store Throttle

When Crate is used with SSDs, increase the 'store throttle' to at least 100 to 200 MB/s, because it's possible to write to the disk much faster.

Example:

PUT /_cluster/settings
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "100mb"
    }
}

Index Buffer Size

This setting describes at which buffer size an index write is triggered.

You can find more about ths setting in the elastic search documentation. In summary, when Crate isn't consuming a lot of RAM you can turn it up, otherwise turn it down.

Your experience?

We'd love to learn how you optimize your Crate setup. Tell us on Twitter and Slack.