Inside CrateDB’s Real-Time Query Engine: Aggregations, Ad-Hoc Queries, Hybrid Search, and AI

In today’s data-driven world, real-time insight is no longer optional; it is a competitive edge. Businesses are flooded with data from sensors, applications, and users, but insight often arrives too late to make a difference.

CrateDB changes that. Built for real-time analytics, search, and AI, CrateDB delivers instant answers on fresh data at any scale. Behind this speed lies a unified query engine that seamlessly combines four powerful capabilities: aggregations, ad-hoc queries, hybrid search, and AI features, all accessible with standard SQL.

Let’s take a look inside.

1. Real-Time Aggregations: Always Up-to-Date Insights

Aggregations are at the heart of analytics, from computing KPIs and dashboards to monitoring live systems. But traditional databases often slow down as data grows, forcing teams to rely on pre-aggregations or delayed pipelines.

CrateDB was designed differently. Its distributed, columnar storage and real-time ingestion engine make aggregations both fast and fresh.

CrateDB can:

  • Ingest millions of records per second,
  • Automatically index new data within seconds,
  • And run complex aggregations on live streams, without caching or rollups.

Because CrateDB’s columnar engine stores data in compressed blocks optimized for analytical queries, operations like SUM(), AVG(), COUNT(), or GROUP BY scale efficiently even across billions of rows.

Imagine computing real-time production averages, fleet performance metrics, or IoT anomaly counts while new events are still flowing in. CrateDB’s distributed SQL engine processes queries in parallel across nodes, ensuring results arrive in milliseconds, not minutes.
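As a rough illustration, a per-minute rollup over the last hour of readings might look like the query below. The table sensor_readings and its columns (device_id, ts, temperature) are hypothetical names chosen for this sketch; they are not part of any built-in CrateDB schema.

```sql
-- Assumed table: sensor_readings(device_id TEXT,
--                                ts TIMESTAMP WITH TIME ZONE,
--                                temperature DOUBLE PRECISION)
SELECT
  device_id,
  DATE_TRUNC('minute', ts) AS bucket,      -- one row per device per minute
  COUNT(*)                 AS readings,
  AVG(temperature)         AS avg_temperature,
  MAX(temperature)         AS max_temperature
FROM sensor_readings
WHERE ts > now() - INTERVAL '1 hour'       -- only the most recent hour
GROUP BY device_id, DATE_TRUNC('minute', ts)
ORDER BY bucket DESC, device_id;
```

The query reads the live table directly, so running it again picks up newly ingested rows as soon as CrateDB has made them visible to queries; no rollup table has to be maintained alongside it.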

CrateDB delivers aggregations at streaming speed, helping you see what’s happening now, not just what happened before.

2. Ad-Hoc Queries: Flexibility Without Trade-Offs

Data never stops changing, and neither do the questions you need to ask.

Traditional time-series or analytics databases often require predefined schemas, views, or indexes to perform efficiently. But in fast-moving environments, that limits agility.

CrateDB allows you to query your data freely. You can:

  • Run any SQL query, from quick filters to multi-join aggregations.
  • Adjust your schema on the fly: add new columns, change types, or enrich data with context.
  • Rely on automatic query optimization that adapts to your data shape and workload.
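A minimal sketch of what this can look like in practice: a column is added to an existing table, and an exploratory query uses it straight away. The table sensor_readings and the columns site and temperature are assumptions made for the example.

```sql
-- Add a column to an existing (hypothetical) table without rebuilding it.
ALTER TABLE sensor_readings ADD COLUMN site TEXT;

-- Ad-hoc question: which sites produced the most hot readings today?
SELECT site, COUNT(*) AS hot_readings
FROM sensor_readings
WHERE temperature > 80
  AND ts > now() - INTERVAL '1 day'
GROUP BY site
ORDER BY hot_readings DESC
LIMIT 10;
```

Rows written before the column existed simply return NULL for site, so the same query works across old and new data.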

This flexibility is invaluable when troubleshooting incidents, investigating anomalies, or exploring new hypotheses. Analysts and engineers can issue queries directly from familiar tools, using standard SQL.

With CrateDB, ad-hoc analysis becomes part of real-time operations, empowering teams to explore data instantly, without reindexing or waiting for ETL.

3. Hybrid Search: When Text Meets Numbers (and Location)

Most databases are optimized for either structured analytics or unstructured search, rarely both. CrateDB unifies them in a single SQL engine that can handle text, numbers, location, and vectors side by side.

With CrateDB’s hybrid search, you can run queries that mix:

  • Full-text search with MATCH() for keyword or fuzzy matching,
  • Numeric and time filters for structured conditions,
  • Geospatial filters for location-aware queries, and
  • Vector similarity search with KNN_MATCH() for semantic understanding.

This combination allows you to move beyond keyword-based filtering to capture meaning and intent. For instance, imagine analyzing thousands of machine logs or support tickets. You can search for exact matches like “pressure drop”, and simultaneously retrieve messages that are semantically similar, even if they use different wording.

SELECT id, message, timestamp
FROM logs
WHERE MATCH(message, 'pressure drop')
   OR KNN_MATCH(embedding, [0.5, 0.9, -0.1, -0.7], 5)
ORDER BY timestamp DESC;

Here, MATCH() finds keyword occurrences, while KNN_MATCH() leverages vector embeddings to surface conceptually related results, such as “valve malfunction” or “sensor anomaly.”
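Both predicates depend on how the columns are declared: MATCH() needs a full-text index on the text column, and KNN_MATCH() needs a vector column. A table for the queries in this section could be sketched as follows. The column names follow the queries shown here, while the types, the analyzer, the location column, and the 4-dimensional vector are assumptions made to keep the example small.

```sql
-- Sketch only: not the schema of any real deployment.
CREATE TABLE IF NOT EXISTS logs (
  id          TEXT PRIMARY KEY,
  "timestamp" TIMESTAMP WITH TIME ZONE,
  message     TEXT INDEX USING FULLTEXT WITH (analyzer = 'english'),
  temperature DOUBLE PRECISION,
  location    GEO_POINT,         -- hypothetical, for location-aware filters
  embedding   FLOAT_VECTOR(4)    -- real embeddings usually have hundreds of dimensions
);
```

The embedding values themselves are produced outside the database, by whatever model generates the vectors, and are stored alongside the row they describe.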

Structured filters can be layered on top, with results ranked by relevance through the _score system column:

SELECT id, message, temperature, timestamp, _score
FROM logs
WHERE (
   MATCH(message, 'pressure')
   OR KNN_MATCH(embedding, [0.5, 0.9, -0.1, -0.7], 5))
   AND temperature > 80
   AND timestamp > now() - interval '1 day'
ORDER BY _score DESC;
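A geospatial condition can sit in the same WHERE clause. The sketch below assumes a hypothetical location column of type GEO_POINT on logs, and the reference point and radius are arbitrary. DISTANCE() returns meters, and _score is CrateDB’s relevance score column.

```sql
SELECT id, message, temperature, timestamp
FROM logs
WHERE MATCH(message, 'pressure')
  AND temperature > 80
  -- keep only events within roughly 5 km of the reference point
  AND DISTANCE(location, 'POINT (9.74 47.41)') < 5000
ORDER BY _score DESC;
```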

CrateDB’s hybrid search doesn’t just find matching words; it finds relevant meaning, merging full-text and semantic similarity search into one real-time query layer.

4. AI Features: From Real-Time Data to Real-Time Intelligence

AI and machine learning models are only as good as the data they learn from, and how fast they can access it. CrateDB acts as the real-time data backbone that continuously feeds AI and ML platforms with fresh, reliable, and context-rich information.

Instead of running inference within the database, CrateDB’s role is to enable external AI systems to consume live data efficiently, ensuring that models stay up to date and responsive to the latest events.

CrateDB provides:

  • High-throughput ingestion of streaming data from sensors, applications, and services.
  • Real-time feature extraction through SQL queries that aggregate, filter, and enrich data as it arrives.
  • Native integrations with AI/ML frameworks, notebooks, and platforms via standard interfaces like JDBC, the PostgreSQL wire protocol, and HTTP endpoints.
  • Vector support to store and manage embeddings that can be consumed by AI models or semantic search systems.

For example, CrateDB can maintain a continuously updated feature store for predictive maintenance, anomaly detection, or personalization systems. External ML models can query these real-time features directly, with no need for batch pipelines or intermediate storage layers.
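One way to sketch such a feature set is a view that computes rolling statistics at query time. The view name device_features_1h, the table sensor_readings, its columns, and the device identifier are all hypothetical; the one-hour window is an arbitrary choice.

```sql
-- Rolling one-hour features per device, evaluated whenever the view is read.
CREATE OR REPLACE VIEW device_features_1h AS
SELECT
  device_id,
  COUNT(*)                            AS reading_count,
  AVG(temperature)                    AS avg_temperature,
  STDDEV(temperature)                 AS stddev_temperature,
  MAX(temperature) - MIN(temperature) AS temperature_range
FROM sensor_readings
WHERE ts > now() - INTERVAL '1 hour'
GROUP BY device_id;

-- An external model-serving process could then fetch features for one device:
SELECT * FROM device_features_1h WHERE device_id = 'pump-42';
```

Because the view is evaluated on each read, the features reflect whatever data is queryable at that moment, and the inference itself still runs in the external system.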

CrateDB bridges the gap between data and intelligence, ensuring your AI and ML platforms are always powered by the most current, most relevant data.

5. One Unified Query Engine, Endless Possibilities

At the core of these capabilities lies CrateDB’s distributed SQL engine, designed to handle time-series, JSON, text, vector, and relational data in one unified model.

This means you don’t need separate databases for analytics, search, and AI. CrateDB adapts instantly to evolving workloads and data types, with built-in resilience and automatic optimization.

CrateDB’s real-time query engine is more than a feature; it is an architecture for speed, scale, and simplicity.

Real-time analytics shouldn’t require stitching together multiple systems. With CrateDB, you get the power of aggregations, ad-hoc exploration, hybrid search, and AI features in one unified platform, built for instant insight and limitless scale.

Whether you’re monitoring fleets, optimizing production, analyzing user behavior, or building AI-driven applications, CrateDB helps you act on data as it happens, not after.