Rauch Group, an Austrian beverage manufacturer, generates 400 data records per second across its production lines. Temperature, pressure, fill level, conveyor speed. The operations team monitors this data on dashboards that update as product moves through the line: not after a batch job has run, not after data has traveled to a cloud endpoint 800 kilometers away.
That is an edge analytics deployment. The database runs in the building. The queries run in milliseconds. The data stays where the law and the OT network topology say it should.
Building that architecture takes more than picking a database that supports Docker. It requires understanding why cloud-first analytics breaks at the factory floor and what a deployment model that actually fits industrial constraints looks like.
Why cloud-only analytics fails at the factory floor
The promise of cloud analytics is straightforward: move your data to a managed service, pay for what you use, never touch a server. For SaaS telemetry and web analytics, that model works. For manufacturing, it hits three constraints that do not bend.
Connectivity is not guaranteed. A factory floor is not a data center. WAN links drop. OT network segments are isolated by design. Production monitoring that depends on a cloud connection goes dark the moment the link does. Edge analytics runs locally: the query executes against data that is already on the local node, with no dependency on what the network is doing at that moment.
Data volume makes continuous cloud egress expensive. ABB's Ability Genix industrial AI platform ingests 1 million sensor values per second into CrateDB. Routing that volume continuously to a cloud endpoint at standard data transfer rates would consume an analytics budget in network costs alone. Edge-first architecture keeps high-frequency data local and sends aggregated summaries or threshold-triggered events to the cloud. You pay for cloud compute on the data that needs it, not on the firehose.
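A rough sketch of that arithmetic follows. Only the 1 million values per second figure comes from the ABB deployment above. The bytes per value, the transfer price, and the 60:1 aggregation ratio are placeholder assumptions chosen for illustration, not quoted rates.

```python
# Back-of-the-envelope transfer estimate. Only VALUES_PER_SECOND comes from
# the article; every other constant is an illustrative assumption.
VALUES_PER_SECOND = 1_000_000   # ABB Ability Genix ingest rate cited above
BYTES_PER_VALUE = 50            # assumption: value + timestamp + tags on the wire
PRICE_PER_GB_USD = 0.05         # assumption: placeholder transfer price, not a quote
AGGREGATION_RATIO = 60          # assumption: one summary row per 60 raw values
SECONDS_PER_DAY = 86_400


def daily_gigabytes(values_per_second: float, bytes_per_value: float) -> float:
    """Return the data volume per day in decimal gigabytes."""
    return values_per_second * bytes_per_value * SECONDS_PER_DAY / 1e9


def monthly_cost_usd(gb_per_day: float, price_per_gb: float, days: int = 30) -> float:
    """Return the transfer cost for `days` of continuous shipping."""
    return gb_per_day * days * price_per_gb


raw_gb_per_day = daily_gigabytes(VALUES_PER_SECOND, BYTES_PER_VALUE)
summary_gb_per_day = raw_gb_per_day / AGGREGATION_RATIO

raw_monthly = monthly_cost_usd(raw_gb_per_day, PRICE_PER_GB_USD)
summary_monthly = monthly_cost_usd(summary_gb_per_day, PRICE_PER_GB_USD)

print(f"raw:     {raw_gb_per_day:,.0f} GB/day, ${raw_monthly:,.0f}/month")
print(f"summary: {summary_gb_per_day:,.0f} GB/day, ${summary_monthly:,.0f}/month")
```

Under these assumptions the raw stream is on the order of terabytes per day, while the summaries shrink in proportion to the aggregation ratio. Substitute your own payload size and provider pricing before drawing conclusions.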
Latency matters for operational decisions. An OEE (overall equipment effectiveness) dashboard that reflects data from 3 minutes ago is not real-time OEE. A predictive maintenance alert that arrives after the equipment has already tripped is not predictive maintenance. Factory-floor analytics needs sub-second latency between sensor reading and dashboard update. Routing every query through a cloud endpoint adds round-trip time that compounds with query execution time. The edge node eliminates that overhead.
The OT/IT boundary: what it means for your analytics architecture
Industrial facilities run two separate network environments. The OT (operational technology) network contains PLCs, SCADA systems, data historians, and process control equipment. The IT network contains ERP, MES, and enterprise software. In most manufacturing plants, these networks are separated by a firewall or air gap, an intentional design decision that prevents a security incident on the IT side from reaching production systems.
That boundary does not disappear because you want real-time analytics. Your analytics architecture has to respect it.
An edge analytics database sits on the correct side of that boundary: inside the OT network or a DMZ segment that can receive data from historians and SCADA systems directly. CrateDB Enterprise deploys on-premises, inside the network segment where your sensor data lives. It does not require your process data to cross the OT/IT boundary to reach a query engine.
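The data flow looks like this: PLCs and sensors → SCADA systems and historians → CrateDB edge node (inside the OT network or a DMZ segment) → dashboards on the same network, with an optional replication job pushing summaries to a cloud tier.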
The shift supervisor's OEE dashboard runs a query against a CrateDB node that is physically in the facility. The query does not touch the WAN. The data does not cross the OT/IT boundary unless you configure an explicit replication job to push summaries to a cloud tier.
For the historian side of this integration, see Modern Data Historian: Real-Time Industrial and IoT Data Platform.
Data sovereignty in DACH: the regulatory reality for manufacturers
For manufacturers in Germany, Austria, and Switzerland, the OT/IT separation constraint intersects with a second requirement: data sovereignty.
Under GDPR, production data that can be linked to individuals, including shift-level worker productivity records and process data tied to named operators, counts as personal data, and transferring it outside the EU/EEA is restricted. Germany's BDSG adds national-level requirements that apply to personal data processed in Germany. Beyond the statutory rules, most DACH manufacturers operate under internal data governance policies that classify production processes as commercially sensitive and require process data to remain on national or EU infrastructure.
For these companies, routing production data to a US-based or non-EU managed cloud database is not a decision the engineering team makes alone. It requires legal and compliance review that often concludes with a hard stop.
That is not a CrateDB-specific observation. It is why AWS Frankfurt and Azure Germany North exist, and it is why on-premises deployment is not a legacy choice in this segment. It is a current operational requirement.
CrateDB Enterprise is the deployment option for this scenario. It runs on your infrastructure, inside your facility or private data center, under your data governance policies. The query engine, the columnar indexing, and the distributed execution layer are identical to what runs in CrateDB Cloud. Your team writes the same SQL. Your Grafana dashboards connect via the same PostgreSQL wire protocol. The data does not leave your environment.
Rauch Group in Austria runs CrateDB for real-time production monitoring at 400 data records per second. That deployment runs on Rauch's own infrastructure, in Austria. The analytics do not require a cloud connection. The production data stays where Austrian and EU data governance rules say it should.
For a full treatment of the DACH sovereignty question and how to structure a compliant on-premises deployment, see Data Sovereignty for Manufacturing Analytics: Why Cloud-Only Doesn't Work in DACH.
What a live edge query looks like
At the factory floor, the most common query pattern is: what is happening right now, across all monitored assets, and where is something going wrong?
This is a real-time aggregation over the last few minutes of sensor data, grouped by asset, ordered by the metric most likely to indicate a problem. With CrateDB running on an edge node, that query executes against data that was indexed milliseconds ago, not data from the last batch cycle.
SELECT
    asset_id,
    AVG(temperature_c) AS avg_temp_c,
    MAX(vibration_rms_ms2) AS peak_vibration_ms2,
    COUNT(*) AS readings_in_window,
    MAX(ts) AS latest_reading
FROM sensor_readings
WHERE ts > NOW() - INTERVAL '5 minutes'
GROUP BY asset_id
ORDER BY peak_vibration_ms2 DESC;
The query runs against a five-minute window of live data. CrateDB indexes every field the moment a reading arrives: there is no pre-aggregation step, no rollup table, no materialized view to maintain. Add a new sensor type to the line, and it is immediately queryable in the same pattern.
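To make the semantics of that query concrete, here is a plain Python sketch that performs the same filter, grouping, and ordering over an in-memory list. It is not how CrateDB executes the query. The `Reading` class, the asset names, and the sample values are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Reading:
    """One row of the hypothetical sensor_readings table."""
    asset_id: str
    ts: datetime
    temperature_c: float
    vibration_rms_ms2: float


def summarize_window(readings, now, window=timedelta(minutes=5)):
    """Mirror the SQL: filter to the window, group by asset, sort by peak vibration."""
    groups = defaultdict(list)
    for r in readings:
        if r.ts > now - window:           # WHERE ts > NOW() - INTERVAL '5 minutes'
            groups[r.asset_id].append(r)  # GROUP BY asset_id

    rows = []
    for asset_id, rs in groups.items():
        rows.append({
            "asset_id": asset_id,
            "avg_temp_c": sum(r.temperature_c for r in rs) / len(rs),
            "peak_vibration_ms2": max(r.vibration_rms_ms2 for r in rs),
            "readings_in_window": len(rs),
            "latest_reading": max(r.ts for r in rs),
        })
    # ORDER BY peak_vibration_ms2 DESC
    return sorted(rows, key=lambda row: row["peak_vibration_ms2"], reverse=True)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed clock for the example
sample = [
    Reading("filler-01", now - timedelta(minutes=1), 71.0, 0.8),
    Reading("filler-01", now - timedelta(minutes=2), 73.0, 1.1),
    Reading("capper-02", now - timedelta(minutes=3), 40.0, 2.4),
    Reading("capper-02", now - timedelta(minutes=9), 90.0, 9.9),  # outside the window
]
rows = summarize_window(sample, now)
for row in rows:
    print(row["asset_id"], row["avg_temp_c"], row["peak_vibration_ms2"])
```

The nine-minute-old reading is excluded even though it carries the highest vibration value, which is the point of the window: the ranking reflects what the line is doing now, not what it did earlier in the shift.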
For the OEE query pattern and how to structure production metrics across shifts and assets, see OEE Analytics on Live Data: How to Move from Nightly Exports to Real-Time Dashboards.
How CrateDB deploys at the edge
Two patterns cover most factory-floor use cases.
Pattern 1: Single-node edge, cloud replication optional.
A single CrateDB node runs on-premises. Telegraf collects data from OPC-UA endpoints or MQTT brokers and writes to CrateDB via the PostgreSQL wire protocol. Grafana runs on the same network and queries CrateDB directly. There is no cloud dependency in the query path.
If cross-plant analytics or long-term historical queries are needed in a cloud environment, CrateDB's logical replication handles the data flow between the edge node and CrateDB Cloud. Logical replication works on a publish and subscribe model: the edge node publishes one or more tables as a publication, and the cloud cluster subscribes to pull data from it. The edge node stays authoritative for live queries. The cloud node receives replicated data and handles historical and cross-facility work.
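The publish and subscribe setup can be sketched as two SQL statements, generated here by a small Python helper. The `CREATE PUBLICATION` and `CREATE SUBSCRIPTION` statement shapes follow CrateDB's logical replication feature, but the connection URL format, the port, and every name used below (`edge_pub`, `edge_sub`, `repl_user`, the host) are assumptions for illustration. Check them against the CrateDB reference for your version before use.

```python
from urllib.parse import urlencode


def publication_sql(publication: str, table: str) -> str:
    """Statement to run on the edge node, which acts as the publisher."""
    return f"CREATE PUBLICATION {publication} FOR TABLE {table};"


def subscription_sql(subscription: str, publication: str, edge_host: str,
                     port: int, user: str, password: str) -> str:
    """Statement to run on the cloud cluster, which acts as the subscriber."""
    # Assumption: credentials are passed as URL parameters on a crate:// URL.
    params = urlencode({"user": user, "password": password})
    conninfo = f"crate://{edge_host}:{port}?{params}"
    return (f"CREATE SUBSCRIPTION {subscription} "
            f"CONNECTION '{conninfo}' "
            f"PUBLICATION {publication};")


# Hypothetical names; identifiers are interpolated directly, so only pass
# trusted, hard-coded values to these helpers.
edge_statement = publication_sql("edge_pub", "doc.sensor_readings")
cloud_statement = subscription_sql(
    "edge_sub", "edge_pub",
    edge_host="edge-node.example.internal", port=4300,
    user="repl_user", password="change-me",
)
print(edge_statement)
print(cloud_statement)
```

Because the subscriber initiates the connection, the cloud cluster needs a network path to the edge node. In a segmented OT environment that path is a deliberate firewall decision, not a default.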
Pattern 2: Multi-node cluster, on-premises.
For facilities with higher availability requirements or data volumes that exceed what a single node handles, CrateDB runs as a three-node cluster on-premises. CrateDB's shared-nothing architecture means each node handles a portion of ingestion and query traffic. If one node fails, the cluster continues without manual intervention.
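Why three nodes rather than two? A minimal sketch of the arithmetic, assuming majority-based master election and at least one replica of every shard on another node. Both are assumptions about how the cluster is configured, not guarantees.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for n in (1, 2, 3, 5):
    print(f"{n} node(s): quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

Under this model a two-node cluster tolerates no failures, and three is the smallest size that survives the loss of one node.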
ABB's Ability Genix platform demonstrates this at scale: 1 million values ingested per second, 30,000 to 120,000 events retrieved per second. CrateDB runs where the data lives, not in a cloud region that introduces round-trip latency into every sensor read.
Hybrid edge-cloud: live queries, historical context
On-premises does not mean isolated. The most common production architecture for larger manufacturers combines edge nodes at each facility with a CrateDB Cloud cluster that holds aggregated historical data.
Each facility edge node stores the last 30 days of high-frequency sensor data. A scheduled aggregation job computes daily summaries on the edge and writes them to CrateDB Cloud. This is a separate step from logical replication, which syncs raw rows between clusters. Sending aggregates rather than raw data keeps cloud storage and compute costs predictable. Plant engineers query the edge node for sub-second dashboard updates on today's production. Analytics and data science teams query the cloud cluster for trend analysis across facilities over 12 months.
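The aggregation and retention logic can be sketched in plain Python. This is a minimal illustration of the idea, not CrateDB's scheduler or API. The tuple layout, the summary fields, and the sample values are hypothetical. Only the 30-day retention window comes from the description above.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # the edge node keeps 30 days of raw data


def daily_summaries(readings):
    """Collapse raw (asset_id, ts, value) tuples into one row per asset per day."""
    buckets = defaultdict(list)
    for asset_id, ts, value in readings:
        buckets[(asset_id, ts.date())].append(value)
    return [
        {"asset_id": asset_id, "day": day, "min": min(values), "max": max(values),
         "avg": sum(values) / len(values), "count": len(values)}
        for (asset_id, day), values in sorted(buckets.items())
    ]


def expired(readings, now):
    """Raw rows older than the retention window: candidates for deletion on the edge."""
    return [r for r in readings if r[1] < now - RETENTION]


utc = timezone.utc
now = datetime(2024, 3, 1, tzinfo=utc)  # fixed clock for the example
raw = [
    ("filler-01", datetime(2024, 2, 28, 8, 0, tzinfo=utc), 70.0),
    ("filler-01", datetime(2024, 2, 28, 16, 0, tzinfo=utc), 74.0),
    ("filler-01", datetime(2024, 2, 29, 8, 0, tzinfo=utc), 72.0),
    ("filler-01", datetime(2024, 1, 15, 8, 0, tzinfo=utc), 69.0),
]
summaries = daily_summaries(raw)  # these rows are what would be written to the cloud
stale = expired(raw, now)         # these raw rows have aged out of the edge window
print(len(summaries), "summary rows,", len(stale), "expired raw row(s)")
```

In a real deployment the summaries would be computed before the raw rows age out, so the cloud tier always holds a daily record for data the edge no longer stores.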
The SQL is the same in both environments. The PostgreSQL wire protocol means a Grafana dashboard configured against the edge node connects to the cloud cluster by changing only the connection string. There is no second query language to learn, no BI tool reconfiguration, no driver change.
For the broader IoT analytics architecture that this edge-cloud pattern sits within, see IoT Analytics Architecture Guide: From Sensor to Dashboard.
AI at the edge: from sensor reading to inference in milliseconds
Predictive maintenance models and anomaly detection models have a freshness requirement that most cloud analytics pipelines cannot satisfy. A model running inference on data that is 10 minutes old is not doing predictive maintenance. It is describing what already happened.
CrateDB stores vector embeddings alongside time-series sensor data in the same table. A single SQL query joins a sensor reading from 200 milliseconds ago with a pre-computed embedding vector and returns a similarity score. All on the edge node. All with sub-second latency.
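To show what a similarity score means in this context, here is a minimal Python sketch that compares the embedding of a fresh reading with a stored reference embedding. In the deployment described here that comparison happens inside the database in SQL. Cosine similarity, the three-dimensional vectors, and the names below are assumptions for illustration, and the metric your database applies may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Hypothetical embeddings: a known bearing-wear signature and two fresh readings.
bearing_wear_signature = [0.9, 0.1, 0.4]
reading_matching = [0.9, 0.1, 0.4]
reading_unrelated = [-0.1, 0.9, 0.0]

score_matching = cosine_similarity(reading_matching, bearing_wear_signature)
score_unrelated = cosine_similarity(reading_unrelated, bearing_wear_signature)
print(f"matching reading:  {score_matching:.3f}")
print(f"unrelated reading: {score_unrelated:.3f}")
```

A score near 1 means the fresh reading closely resembles the stored signature. The threshold at which that resemblance should raise an alert is a modeling decision, not something the database chooses for you.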
That means your ML inference layer gets fresh operational data without a separate vector database, a separate embedding store, or a synchronization job that introduces its own latency and failure modes.
TGW Logistics feeds real-time data from 900,000 sensors per distribution center to AI and ML teams for digital twin and predictive modeling workloads. That data reaches the ML layer through CrateDB, a single query engine that handles the sensor time series and the vector data in the same statement.
ABB's Ability Genix platform retrieves 30,000 to 120,000 events per second for industrial AI workloads. That retrieval rate is not achievable with a system that requires data to travel to a cloud endpoint before a query can run.
Get started
If you are building a factory-floor edge analytics deployment, the fastest path to a running query is:
- Start CrateDB on an on-premises node using the Docker image or a Linux package from cratedb.com/start-free.
- Configure Telegraf with your OPC-UA or MQTT input and the CrateDB output plugin.
- Connect Grafana to CrateDB using the PostgreSQL data source.
- Run your first query against live sensor data.
For DACH deployments with data sovereignty requirements, contact the CrateDB solutions engineering team to discuss on-premises licensing and architecture support.
To run queries against a live industrial IoT dataset before installing anything, start with Run queries on live data. The Exploration Path walks you through a sensor query scenario in a live environment.