A machine at Plant 7 shows an anomaly at 09:15. Engineering wants to know whether the same pattern is appearing at the other eleven sites. The data exists but producing a cross-plant answer takes hours, not seconds, because every facility runs its own analytics stack.
This is the analytics silo problem. It is not a data collection failure, every site is collecting data. It is an architecture problem.
How per-facility analytics silos form
Industrial analytics infrastructure grows one site at a time. A factory installs a historian and a SCADA system. The local engineering team adds a reporting database, connects a BI tool, and builds dashboards for the production floor. It works. The plant has analytics.
Then a second site comes online. It follows the same pattern: historian, export job, local reporting database, local dashboards. Same logical architecture, different physical systems. By the time the company operates ten or fifteen facilities, it runs ten or fifteen independent analytics installations, each optimized for local conditions.
No team planned this. It arrived through reasonable local decisions made before cross-site visibility was a stated requirement. The result: benchmarking performance across facilities requires pulling an export from each site, normalizing independently-designed schemas, and assembling the result manually. A query that should take seconds takes a day.
The consolidation workaround and its cost
The standard response to the silo problem is a central consolidation layer, a data warehouse or data lake that receives scheduled exports from each site, normalizes them into a common schema, and serves cross-plant queries. This approach works. It is also expensive to maintain, slow by design, and fragile.
The latency is structural. A consolidation layer built on batch exports inherits the batch cycle. Each site's data lands in the central store only after the export job completes. The central view reflects a state from minutes, hours, or the previous shift, depending on export frequency. For OEE decisions and downtime response, the export cycle is too slow. The question has already been answered the hard way by the time the dashboard updates.
The schema maintenance cost is real. When one site adds a new sensor type or changes a unit of measurement, the ETL pipeline feeding the consolidation layer requires a migration. In a ten-site operation, this maintenance surface multiplies: ten source schemas, ten pipelines, one central schema they all map to. Engineering time goes to keeping data current, not building analytics that improve operations.
The cross-system join gap persists. Correlating sensor data with ERP downtime codes or MES production records still requires a separate extraction from the ERP system and a join in the application layer, unless ERP data has also been loaded into the consolidation store through a second pipeline. Most teams have not done this. The join remains manual.
One SQL query layer across all facilities
CrateDB is a distributed SQL database. Data from every facility lands in one central CrateDB cluster, written continuously via Telegraf, a Kafka connector, or direct HTTP ingestion from edge agents at each site. There is no separate consolidation layer, no export job between sites, and no central schema migration required when a new sensor type arrives at one facility. Dynamic columns absorb new fields at ingestion.
The cross-plant query runs directly against live data:
-- OEE benchmark across all plants: current week -- One query, all facilities, data from the last few seconds SELECT plant_id, ROUND( (SUM(runtime_seconds) / NULLIF(SUM(planned_seconds), 0.0)) * (SUM(actual_output) / NULLIF(SUM(ideal_output), 0.0)) * (SUM(good_units) / NULLIF(SUM(actual_output), 0.0)) * 100.0, 1 ) AS oee_pct, hyperloglog_distinct(asset_id) AS assets_monitored FROM shift_production WHERE ts > DATE_TRUNC('week', NOW()) GROUP BY plant_id ORDER BY oee_pct ASC;
This query returns every facility's OEE for the current week, ranked from lowest to highest, directing improvement resources to the sites with the most to gain. The plant_id column is populated by the Telegraf agent at each site on ingestion; no post-processing step assigns facility context after the fact. CrateDB's distributed query engine executes the aggregation across all cluster nodes in parallel, so query time does not grow linearly as the number of facilities in the result set increases.
Edge deployment: logical replication to a central cluster
For deployments where data sovereignty or OT/IT separation requires the database to stay on the factory floor, CrateDB Enterprise runs at the factory edge. Each edge cluster ingests sensor data locally and serves SQL queries independently (production data does not leave the facility). When cross-plant visibility is also required, logical replication pushes data from every edge cluster to a central CrateDB Cloud cluster. The cross-plant query runs against the cloud cluster, which holds the aggregated multi-site dataset.
The SQL is identical whether a query runs against a local edge cluster or the aggregated cloud cluster. Engineers at each facility query their own data in standard SQL. Central operations teams query all facilities in the same standard SQL: no separate query language, no custom consolidation layer, no ETL between edge and cloud. The edge analytics guide covers the full edge-to-cloud deployment pattern including hybrid configurations.
The same architecture that eliminates the export step for real-time OEE dashboards is what makes cross-plant queries possible: data arrives live and is immediately available for SQL, from any site, without a consolidation step in between. The modern data historian guide covers the OT/IT integration pattern: historians at the OT layer, CrateDB as the analytics layer above them.
Joining sensor data with operational records in one query
The silo problem extends beyond cross-plant sensor data. It also runs along the boundary between OT systems (sensor readings, machine states) and business systems (ERP downtime codes, MES production orders, shift schedules). Answering "why did Line 3 stop at 11:40?" requires correlating both domains. In most industrial stacks, that correlation happens in the application layer, manually, after two separate exports.
CrateDB stores both sensor tables and operational reference tables in the same system. The join runs in SQL, on live data:
-- Root-cause: sensor anomalies correlated with ERP downtime codes -- Last 7 days, all plants, temperature threshold exceeded during a registered downtime event SELECT s.plant_id, s.asset_id, s.ts, s.temperature_c, e.downtime_code, e.downtime_description FROM sensor_readings s JOIN erp_downtime_events e ON s.asset_id = e.asset_id AND s.ts BETWEEN e.start_ts AND e.end_ts WHERE s.ts > NOW() - INTERVAL '7 days' AND s.temperature_c > 85.0 ORDER BY s.ts DESC;
The erp_downtime_events table is populated from the ERP system via a standard connector (JDBC, REST API, or a Kafka topic). Once it is in CrateDB alongside the sensor data, the join runs as a single SQL statement from any connected BI tool or Grafana panel. CrateDB speaks the PostgreSQL wire protocol, so Grafana, Tableau, Power BI, and Superset all connect without custom drivers. A root-cause analysis that previously required an analyst to reconcile two separate exports becomes a dashboard query.
At scale: TGW Logistics Group and SPGo!
TGW Logistics Group engineers automated warehouse and intralogistics systems for global manufacturers and retailers. Each distribution center runs 900,000 sensors, generating more than 100,000 messages every few seconds. CrateDB handles the query layer across the full sensor footprint, supporting real-time system monitoring and digital twin applications across facilities. For a business where a stalled conveyor segment at one distribution center affects downstream schedules across the network, cross-facility visibility is not a reporting feature — it is an operational requirement.
"CrateDB allows us to operate on any Cloud and on-prem/Edge with simplicity and stellar performance, and significant cost advantages." — Alexander Mann, Owner Connected Warehouse Architecture, TGW Logistics Group
Full story: cratedb.com/stories/tgw-logistics
SPGo! (PETROMIN) monitors mining operations across Peru with 30,000 sensors per mine site and 760 million records processed daily. The sensor configuration differs across sites — each mine runs different sensor types for different extraction conditions. CrateDB provides a single query layer across all sites without requiring a normalized central schema: sensor types that vary between mines land via dynamic columns, and the same SQL queries cover the full multi-site dataset.
Full story: cratedb.com/stories
What this means for engineering teams operating across multiple sites
Cross-plant visibility is an operational capability: the ability to ask "where is this pattern appearing across all our sites?" and get an answer before the shift ends. The silo problem is architectural, not a data shortage, every site has data. The constraint is that the data sits in separate systems, reachable only through export jobs and manual consolidation.
A distributed SQL layer that ingests from all sites into one query surface, without requiring a unified schema to be imposed in advance, removes that constraint. For manufacturing and logistics teams with multi-site sensor infrastructure, that query runs the moment the data lands.
The full Telegraf to CrateDB to Grafana architecture for industrial deployments (ingestion patterns, SQL query structure, and distributed query behavior) is covered in the IoT Analytics Architecture Guide.
Each facility's data does not have to be a separate island. Try CrateDB Live: one Docker command, under 30 minutes to a working multi-source SQL dashboard.
