Four databases come up in nearly every industrial IoT evaluation: MongoDB, TimescaleDB, InfluxDB, and CrateDB. They represent four different architectural bets on how to handle high-frequency sensor data at scale. This post compares them on the dimensions that determine whether a database holds up in production: ingest throughput, query performance over long time ranges, SQL join capability, schema flexibility, horizontal scaling, and deployment options.
What this comparison covers
The workload that defines industrial IoT database requirements has a consistent profile: thousands of sensors writing at high frequency, readings that must be queryable the moment they arrive, and analytics that cross sensor streams with production, asset, and scheduling data. A distribution center might run 900,000 sensors across a single facility. A manufacturing platform might ingest a million sensor values per second. A media analytics system might query 60 million rows in under a second for live dashboards.
The databases in this comparison handle that workload profile differently. MongoDB is a general-purpose document store. TimescaleDB is a time-series extension on PostgreSQL. InfluxDB is a dedicated time-series database now in its third major architecture. CrateDB is a distributed SQL database built for high-cardinality operational data.
The evaluation covers six axes: ingest at scale, query over long ranges, cross-table joins, schema flexibility, horizontal scaling, and deployment options including on-premises.
MongoDB: document storage at the edge of its design
MongoDB was not built for time-series workloads. It is a general-purpose document store with strong support for flexible schemas and nested data, and it added a dedicated time-series collection type in version 5.0. That addition improved ingest efficiency for simple sensor streams, but it does not change the fundamental architecture.
At industrial IoT scale, the problem with MongoDB is index memory. A manufacturing deployment with thousands of sensors and weeks of data requires indices that scale with cardinality. Once the working index set exceeds available RAM, query performance drops sharply. Queries that return aggregate statistics across days of sensor data (the kind that calculate OEE or detect anomalies) can take tens of seconds on a MongoDB instance that has not been sized aggressively for its index requirements.
MongoDB's aggregation pipeline can express time-series queries, but the syntax diverges significantly from SQL. Teams that use SQL for everything else (Grafana, reporting tools, application code) face a context switch every time they query sensor data in MongoDB.
MongoDB belongs in industrial architectures where sensor data is a secondary concern alongside richer document data: configuration records, event logs, or machine metadata that does not require time-series aggregation. For workloads where the sensor stream is the primary data product, it is the wrong tool.
TimescaleDB: PostgreSQL with time-series extensions
TimescaleDB adds hypertables, continuous aggregates, and compression to PostgreSQL. For teams with existing PostgreSQL infrastructure, it offers a low-friction entry point into time-series workloads: same wire protocol, same drivers, same query syntax.
At moderate scale, TimescaleDB performs well on time-range queries. Hypertable partitioning organizes data by time chunk, and continuous aggregates pre-compute common aggregations so dashboards don't recompute against the raw table on every load. For a deployment with a few hundred sensors writing at low-to-moderate frequency, this works.
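The pre-aggregation idea can be illustrated with a small toy model in Python. This is only a sketch of the concept (readings are floored to a fixed time bucket and a running sum and count are kept per bucket, so a dashboard reads one small row instead of scanning raw data). It is not TimescaleDB's implementation, and it updates on every insert purely for simplicity. The names `BUCKET`, `bucket_start`, and `RollingAggregate` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=5)  # hypothetical bucket width
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width=BUCKET):
    """Floor a timezone-aware timestamp to the start of its bucket."""
    return EPOCH + ((ts - EPOCH) // width) * width


class RollingAggregate:
    """Toy pre-aggregation: keeps sum and count per (sensor, bucket)."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def add(self, sensor_id, ts, value):
        # Update the bucket's aggregate instead of storing only the raw row.
        key = (sensor_id, bucket_start(ts))
        self._sums[key] += value
        self._counts[key] += 1

    def average(self, sensor_id, ts):
        # Read the pre-computed aggregate; None if the bucket has no data.
        key = (sensor_id, bucket_start(ts))
        if key not in self._counts:
            return None
        return self._sums[key] / self._counts[key]
```

A dashboard query for "average per five minutes" then touches one entry per bucket rather than every raw reading, which is the trade the paragraph above describes.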
The constraints appear at higher cardinality and scale. TimescaleDB is an extension on a single-node database. Distributed hypertables (Multinode) — the mechanism for spreading data and query load across multiple nodes in self-hosted deployments — were deprecated in TimescaleDB 2.13 and removed entirely in version 2.14, released February 2024. Self-hosted horizontal scaling is no longer available in the open-source release. Teams that need distributed scale must use Timescale's cloud product.
On storage, TimescaleDB compresses well within time chunks, but the PostgreSQL heritage means index storage overhead accumulates as cardinality grows. A deployment with thousands of unique sensor IDs, each with its own indexed tags, builds a large index footprint.
TimescaleDB is the right choice for teams already on PostgreSQL who need time-series capabilities at moderate scale and do not need to distribute across multiple nodes. For workloads that require tens of thousands of sensors, high ingest rates, or horizontal elasticity, the single-node ceiling is likely to be reached before the workload's requirements are met. The CrateDB vs. TimescaleDB comparison covers the architectural trade-offs between a PostgreSQL extension and a distributed SQL database in more detail.
InfluxDB: what changed in version 3 and what didn't
InfluxDB has gone through more architectural change than any other database in this comparison. Understanding the current state requires knowing what version you are evaluating; the gap between InfluxDB 1.x, 2.x, and 3.x is significant.
InfluxDB 1.x used a custom storage engine (TSM) and InfluxQL, a SQL-like query language. InfluxDB 2.x deprecated InfluxQL in favor of Flux, a functional data scripting language designed for time-series pipelines. InfluxDB 3.x deprecated Flux and reintroduced SQL support via Apache DataFusion, while also shifting the storage layer to Apache Parquet and Arrow. InfluxQL was reinstated in version 3.x for backwards compatibility.
The storage architecture change in version 3 is a genuine improvement. Parquet columnar storage and the Arrow memory format address the high-cardinality problem that made InfluxDB struggle at industrial scale in earlier versions: tag values no longer create unbounded time-series proliferation at the storage layer in the same way. The cardinality problem in InfluxDB 1.x and 2.x is detailed in the InfluxDB cardinality post.
What version 3 did not address:
Cross-system joins. InfluxDB 3 Enterprise supports SQL JOINs between measurements stored within InfluxDB — inner, left, right, and full outer joins are all available. The constraint is architectural: InfluxDB is a purpose-built time-series store, not a general operational SQL database. Production schedules, asset registries, maintenance records, and shift calendars typically live in relational systems outside InfluxDB. Joining sensor readings to that context requires either importing all of it into InfluxDB or maintaining a separate SQL layer for cross-domain queries. In InfluxDB 2.x with Flux, querying external SQL sources like a PostgreSQL asset registry was supported via sql.from(). That capability was not carried forward into InfluxDB 3's native SQL.
On-premises deployment maturity. InfluxDB 3 is architected cloud-first. Self-hosted deployment options exist but the primary development and support investment targets the cloud product. For DACH manufacturers operating under data sovereignty requirements, this is a structural constraint.
Migration tax for existing users. Teams on InfluxDB 2.x with Flux code must rewrite queries to migrate to version 3. The Flux-to-SQL migration is not a drop-in change; Flux pipelines that handle transformations inline require a different SQL structure. Teams that adopted Flux because it was the stated future of InfluxDB now face a migration regardless of whether they stay on InfluxDB or move to another database.
InfluxDB 3 is the right choice for greenfield cloud deployments where the workload is pure time-series with no cross-system joins, no on-premises requirement, and no existing InfluxDB 2.x codebase to migrate. For teams already on InfluxDB 2.x evaluating migration options, the Telegraf migration guide covers the specific steps for moving to CrateDB.
CrateDB: distributed SQL for operational sensor data
CrateDB is a distributed database built for high-cardinality operational data. It ingests sensor readings via Telegraf, Kafka, MQTT, or direct SQL insert, indexes every field on arrival, and serves queries over the full dataset with no export step between ingest and query.
Ingest at scale. ABB's Ability Genix platform processes sensor data from industrial equipment at 1 million values ingested per second, with 30,000 to 120,000 events retrieved per second for active analytics. TGW Group runs 900,000 sensors per distribution center through CrateDB. These are production deployments, not benchmarks.
Query across the full dataset. CrateDB executes standard SQL including JOIN, GROUP BY, window functions, and time-series aggregations without a pre-aggregation step. An OEE calculation that joins sensor readings to a production schedule and a shift calendar runs in a single query against live data. No intermediate export, no feature store, no batch pipeline between the sensor stream and the OEE dashboard. For the cross-plant SQL pattern specifically (querying multiple facilities in one statement without manual exports), see Cross-Plant Visibility in One SQL Query.
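To make the shape of such a query concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as an in-memory stand-in, not CrateDB itself. The table and column names are hypothetical, each reading is assumed to cover one minute of machine time, and OEE is taken as the standard product of availability, performance, and quality.

```python
import sqlite3

# Hypothetical schema: one reading per machine per minute (ts = minute index).
SCHEMA = """
CREATE TABLE sensor_readings (
    machine_id TEXT, ts INTEGER, running INTEGER,
    total_count INTEGER, good_count INTEGER
);
CREATE TABLE production_schedule (
    machine_id TEXT, start_ts INTEGER, end_ts INTEGER, ideal_rate_per_min REAL
);
CREATE TABLE shift_calendar (shift_name TEXT, start_ts INTEGER, end_ts INTEGER);
"""

# One statement joins readings to the schedule and to the shift calendar.
OEE_QUERY = """
SELECT s.shift_name,
       r.machine_id,
       AVG(r.running) AS availability,
       SUM(r.total_count) / (SUM(r.running) * p.ideal_rate_per_min) AS performance,
       SUM(r.good_count) * 1.0 / SUM(r.total_count) AS quality
FROM sensor_readings r
JOIN production_schedule p
  ON p.machine_id = r.machine_id AND r.ts >= p.start_ts AND r.ts < p.end_ts
JOIN shift_calendar s
  ON r.ts >= s.start_ts AND r.ts < s.end_ts
GROUP BY s.shift_name, r.machine_id, p.ideal_rate_per_min
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sensor_readings VALUES (?, ?, ?, ?, ?)",
        [("m1", 0, 1, 8, 8), ("m1", 1, 1, 10, 9),
         ("m1", 2, 0, 0, 0), ("m1", 3, 1, 9, 9)],
    )
    conn.execute("INSERT INTO production_schedule VALUES ('m1', 0, 4, 10.0)")
    conn.execute("INSERT INTO shift_calendar VALUES ('early', 0, 4)")
    return conn


def oee_by_shift(conn):
    """Return {(shift, machine): OEE}, skipping rows with undefined factors."""
    result = {}
    for shift, machine, a, p, q in conn.execute(OEE_QUERY):
        if None not in (a, p, q):  # SQLite yields NULL on division by zero
            result[(shift, machine)] = a * p * q
    return result
```

With the demo rows, the machine ran three of four minutes (availability 0.75), produced 27 units against an ideal 30 (performance 0.9), and 26 of 27 were good, so `oee_by_shift(build_demo_db())` gives roughly 0.65 for the `early` shift. The same join structure is what the paragraph above describes running directly against live data.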
Schema flexibility. Industrial sensor configurations change constantly: new equipment, firmware updates, new vendors. CrateDB's OBJECT(DYNAMIC) column type absorbs new sensor fields at ingest without an ALTER TABLE migration or pipeline downtime. The schema evolves with the data. For a step-by-step walkthrough of how this works during a live deployment change, see How to Add a New Sensor Type to Your Industrial Database Without Pipeline Downtime.
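As a rough mental model of what "absorbing new fields at ingest" means, the following pure-Python toy extends its schema the first time it sees a key. It is an illustration only and not how CrateDB implements `OBJECT(DYNAMIC)`; the class name, method names, and the rule that a field's type is fixed by its first value are assumptions made for this sketch.

```python
from typing import Any, Dict, List


class DynamicObjectColumn:
    """Toy model of a dynamic object column: unseen keys extend the schema."""

    def __init__(self) -> None:
        self.schema: Dict[str, type] = {}
        self.rows: List[Dict[str, Any]] = []

    def insert(self, payload: Dict[str, Any]) -> List[str]:
        """Store a payload and return the field names newly added to the schema."""
        # Validate first so a rejected payload leaves the schema untouched.
        for name, value in payload.items():
            known = self.schema.get(name)
            if known is not None and not isinstance(value, known):
                raise TypeError(f"{name!r} expects {known.__name__}")
        added = [name for name in payload if name not in self.schema]
        for name in added:
            self.schema[name] = type(payload[name])  # type fixed by first value
        self.rows.append(dict(payload))
        return added

    def select(self, field: str) -> List[Any]:
        """Return the field for every row; rows lacking it yield None."""
        return [row.get(field) for row in self.rows]
```

Inserting `{"temperature": 21.5}` and then `{"temperature": 22.0, "vibration_rms": 0.4}` adds `vibration_rms` to the schema on the second insert with no separate migration step, and older rows simply report `None` for the new field.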
PostgreSQL wire protocol. CrateDB implements the PostgreSQL wire protocol, which means PostgreSQL client libraries, JDBC drivers, and BI tools (Grafana, Tableau, Power BI) connect without a proprietary driver. CrateDB's SQL is not PostgreSQL SQL: it is ANSI SQL with CrateDB-specific extensions including distributed query capabilities, dynamic column types, and full-text search. Clients that rely on PostgreSQL-specific syntax or transactions will encounter differences, but tools that issue standard analytical SQL work without modification.
Deployment options. CrateDB runs on CrateDB Cloud, on-premises via CrateDB Enterprise, with Docker, on Kubernetes, and on Linux, macOS, and Windows. For manufacturers under DACH data sovereignty requirements, on-premises and factory-edge deployments are a supported path, not a workaround.
Side-by-side comparison
| | MongoDB | TimescaleDB | InfluxDB 3 | CrateDB |
|---|---|---|---|---|
| Architecture | Document store | PostgreSQL extension | Columnar time-series (Parquet/Arrow) | Distributed SQL |
| Query language | MQL / aggregation pipeline | SQL (PostgreSQL) | SQL (DataFusion) / InfluxQL | ANSI SQL + CrateDB extensions |
| High-cardinality ingest | RAM-constrained | Moderate | Improved in v3 | Designed for it |
| Cross-table JOINs | Aggregation pipeline only | Full SQL JOINs | Within InfluxDB only | Full SQL JOINs |
| Schema flexibility | Native (document model) | Fixed schema | Tag-based model | Dynamic columns (OBJECT(DYNAMIC)) |
| Horizontal scaling | Sharding (Atlas) | Cloud only (removed from OSS) | Cloud-first | Built-in distributed |
| On-premises | Yes | Yes (single node) | Limited | Yes (Enterprise + OSS) |
| OEE analytics | Requires pipeline | Possible with SQL | Requires all context data in InfluxDB | Native SQL across sensor + schedule data |
| PostgreSQL wire protocol | No | Yes | No | Yes |
How to choose
MongoDB if your sensor data is secondary to richer document workloads and you already run MongoDB for other data in the same application. Sensor-primary workloads will hit RAM and query performance ceilings.
TimescaleDB if you are on PostgreSQL and your scale stays within single-node bounds. The SQL compatibility and continuous aggregates work well for moderate deployments. Evaluate the distributed scaling path carefully before committing at higher sensor counts — self-hosted horizontal scaling was removed from the open-source release in February 2024.
InfluxDB 3 if you are building greenfield in the cloud, your workload is pure time-series with no cross-system SQL joins required, and you have no existing InfluxDB 2.x codebase. If you are migrating from InfluxDB 2.x, compare the cost of migrating to InfluxDB 3 against the cost of migrating to CrateDB, as the effort is comparable and the capability delta is significant.
CrateDB if you need SQL joins across sensor, asset, and production data; if you ingest at high rates across many sensor types; if you need on-premises deployment for data sovereignty; or if your analytics layer includes OEE, predictive maintenance, or any query that crosses sensor readings with operational context. For teams building the predictive maintenance layer specifically, the Predictive Maintenance Database Architecture guide covers the database schema and SQL patterns from sensor data to maintenance trigger.
The CrateDB vs. InfluxDB comparison covers the InfluxDB-specific trade-offs in more detail. For teams focused on the time-series-versus-relational decision, the data historian vs. time-series database guide covers the architectural distinctions that determine which approach fits which workload.
The fastest way to evaluate CrateDB against your workload is to run queries against a live industrial dataset at cratedb.com/explore, with no installation required.
For teams ready to deploy: Start free on CrateDB Cloud, run with Docker, or deploy on Kubernetes.
Frequently Asked Questions
Which database is best for industrial IoT sensor data?
There is no single best choice; the right database depends on the workload.
- MongoDB fits when sensor data is secondary to richer document data you already store.
- TimescaleDB fits teams already on PostgreSQL operating within single-node scale.
- InfluxDB 3 fits greenfield cloud deployments where the workload is pure time-series (sensor readings in, dashboards out) and analytics do not need to correlate that sensor data with relational operational context like production schedules or asset registries.
- CrateDB fits workloads that need SQL joins across sensor, asset, and production data, high ingest rates across many sensor types, or on-premises deployment for data sovereignty.
Which of these databases support on-premises deployment?
MongoDB, TimescaleDB, and CrateDB all support on-premises deployment. InfluxDB 3 is architected cloud-first; self-hosted options exist, but the primary development and support investment targets the cloud product. CrateDB runs on-premises through CrateDB Enterprise and CrateDB OSS, including factory-edge deployments, which is a supported path for DACH manufacturers operating under data sovereignty requirements.
How does CrateDB handle new sensor fields without a schema migration?
CrateDB's OBJECT(DYNAMIC) column type absorbs new sensor fields at ingest without an ALTER TABLE migration or pipeline downtime. When new equipment, firmware updates, or vendors introduce fields the schema has not seen, those fields are indexed on arrival and become queryable immediately. The schema evolves with the data rather than requiring a planned migration window.
Does CrateDB work with PostgreSQL client libraries and BI tools?
Yes. CrateDB implements the PostgreSQL wire protocol, so PostgreSQL client libraries, JDBC drivers, and BI tools such as Grafana, Tableau, and Power BI connect without a proprietary driver. CrateDB's SQL is ANSI SQL with CrateDB-specific extensions, including distributed queries, dynamic column types, and full-text search. Tools that issue standard analytical SQL work without modification, while clients that rely on PostgreSQL-specific syntax or transactions will encounter differences.
What ingest rates does CrateDB handle in production?
In production, ABB's Ability Genix ingests 1 million sensor values per second, with 30,000 to 120,000 events retrieved per second for active analytics. TGW Group runs 900,000 sensors per distribution center through CrateDB. These are production deployments, not benchmarks. CrateDB indexes every field on arrival and serves queries over the full dataset with no export step between ingest and query.