Data Historians: What They Are and When Your Stack Needs More

A data historian is software that records, stores, and retrieves time-stamped process data from industrial equipment: sensors, PLCs, SCADA systems, distributed control systems (DCS), and manufacturing execution systems (MES). Each measurement is stored as four fields: tag name, timestamp, value, and quality code. That four-field model was purpose-built for OT environments where thousands of measurement points stream data every second and write determinism is non-negotiable.
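To make the four-field model concrete, here is a minimal sketch of one historian sample as a Python data structure. The `Sample` and `Quality` names, the three quality levels, and the tag name are illustrative assumptions, not any vendor's schema; real historians use vendor- or OPC-defined quality codes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Quality(Enum):
    """Simplified, hypothetical quality codes (real systems define many more)."""
    GOOD = "good"
    UNCERTAIN = "uncertain"
    BAD = "bad"


@dataclass(frozen=True)
class Sample:
    tag: str             # name of the measurement point
    timestamp: datetime  # when the value was measured (UTC)
    value: float         # the measured value
    quality: Quality     # whether the reading can be trusted


def good_values(samples, tag):
    """Return (timestamp, value) pairs for one tag in time order, skipping non-good readings."""
    rows = [s for s in samples if s.tag == tag and s.quality is Quality.GOOD]
    return [(s.timestamp, s.value) for s in sorted(rows, key=lambda s: s.timestamp)]


# Hypothetical tag name and readings, deliberately out of time order.
TAG = "plant1.line3.motor_temp"
samples = [
    Sample(TAG, datetime(2024, 1, 1, 12, 0, 1, tzinfo=timezone.utc), 71.9, Quality.GOOD),
    Sample(TAG, datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc), 71.4, Quality.GOOD),
    Sample(TAG, datetime(2024, 1, 1, 12, 0, 2, tzinfo=timezone.utc), 0.0, Quality.BAD),
]
```

Calling `good_values(samples, TAG)` returns the two trusted readings in time order and leaves out the one flagged as bad, which is the kind of filtering the quality code exists to support.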

The historian sits at the boundary between the shop floor and the IT network. Its job is to capture operational data reliably, serve it back for process control and compliance, and bridge the OT world of PLCs and SCADA to systems that need access to plant data.

Types of data historians

Operational historian. Sits close to the plant floor. Optimized for write speed and determinism. Typically deployed per-site or per-plant. The primary record for process data at a given facility.

Enterprise historian. Aggregates data from multiple operational historians across sites. Built for cross-site KPI reporting and benchmarking across a manufacturing network.

Historians are also categorized by deployment: traditional on-site installations versus cloud historians that support multi-site consolidation and remote access. The deployment choice affects data latency, data sovereignty requirements, and how much OT protocol translation happens at the edge.

Leading data historian products

The most widely deployed historians in industrial environments:

AVEVA PI System (formerly OSIsoft PI) is the most widely deployed historian. It uses a proprietary tag-based architecture and is standard in energy, utilities, and large-scale manufacturing.
AspenTech IP.21 (InfoPlus.21) is standard in oil and gas, chemicals, and process manufacturing. It offers deep process analytics within its original deployment model.
AVEVA Historian (formerly Wonderware Historian) is common in discrete manufacturing and factory automation, typically deployed alongside AVEVA InTouch SCADA.
Canary Historian is a more recent option with cloud capabilities, used across mid-market industrial deployments.

Data historian vs. time series database

The distinction is design philosophy, not just feature lists.

	Data historian	Time series database
Built for	OT environments: stability, determinism, native protocol connectivity	IT environments: scale, developer access, SQL interfaces
Query interface	Proprietary APIs (PI Web API, OLEDB), vendor-specific query tools	Standard SQL or custom query language (Flux, InfluxQL)
Protocol connectivity	Native OPC-UA, OPC-DA, Modbus, SCADA	MQTT, HTTP, Kafka, varies by product
Data model	Tag-based (tag name, timestamp, value, quality code)	Metric, measurement, or relational row
Deployment	On-premise, close to the OT layer	Cloud-native or hybrid
Designed for	Reliable capture and retrieval of process data	High-volume ingestion and analytical queries

Neither category replaces the other. They solve different problems in the industrial data stack.

For a deeper comparison: Data Historian vs. Time Series Database: Which Belongs in Your Industrial Stack.

Where historians reach their limits

Historians handle OT data capture well. The same design choices that make them reliable for capture become constraints at analytics scale.

Proprietary query access. Retrieving historian data requires vendor APIs, COM/DCOM interfaces, or ODBC connectors. Running a cross-plant query that joins data from three facilities means three API calls, a transformation step, and a result that is already stale by the time the query returns.

No standard SQL. Analysis requires extracting data into external tools before any query can run. Each extraction adds latency and produces a copy of the data that is no longer live. Teams end up reporting on what happened yesterday, not what is happening now.

Schema rigidity for modern sensor types. Traditional historians are built for the four-field tag model. Modern industrial equipment sends JSON payloads, machine event logs, image metadata, and diagnostic strings. Getting that data into a historian requires custom transformation pipelines and upfront schema decisions that do not survive the next hardware generation.
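A rough sketch of what such a transformation pipeline has to do: flatten a nested JSON payload into tag-style rows. The payload shape, the `cnc7` device name, and the rule of keeping only numeric leaves are assumptions made for illustration, not a description of any particular historian connector.

```python
import json


def flatten(payload, prefix=""):
    """Yield (tag, value) pairs for every leaf of a nested JSON-like dict."""
    for key, val in payload.items():
        tag = f"{prefix}.{key}" if prefix else key
        if isinstance(val, dict):
            yield from flatten(val, tag)
        else:
            yield tag, val


def to_tag_rows(raw, device_id):
    """Split one JSON message into four-field rows plus the tags that do not fit."""
    msg = json.loads(raw)
    ts = msg.pop("ts")  # assumes every message carries a "ts" field
    rows, dropped = [], []
    for tag, val in flatten(msg, device_id):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(val, (int, float)) and not isinstance(val, bool):
            # Quality is assumed "good" here because the payload carries none.
            rows.append((tag, ts, float(val), "good"))
        else:
            dropped.append(tag)
    return rows, dropped


# Hypothetical machine message mixing numeric readings, a state string, and an alarm list.
raw = (
    '{"ts": "2024-01-01T12:00:00Z", '
    '"spindle": {"rpm": 1200, "temp_c": 61.5}, '
    '"state": "RUNNING", "alarms": ["E12"]}'
)
rows, dropped = to_tag_rows(raw, "cnc7")
```

In this sketch the two numeric readings become tag rows, while the state string and the alarm list end up in `dropped`. Every such field needs its own mapping decision up front, which is the rigidity described above.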

Scale ceilings on high-density deployments. TGW Logistics runs 900,000 sensors per distribution center. ABB ingests 1 million values per second. At those volumes, proprietary storage formats and retrieval architectures built for a different era become operational bottlenecks.

No live ML inference. Predictive maintenance and anomaly detection models need to score live sensor readings, not a data export from the previous shift. Historians were not designed to serve as the data layer for real-time ML inference loops.

How CrateDB fits the industrial data stack

CrateDB is a distributed SQL database built for operational analytics on high-volume, high-cardinality industrial data. It is not a data historian.

Many industrial teams keep their existing historian for OT-layer data capture. The historian handles OPC-UA connectivity, quality codes, and deterministic writes for process control and compliance. CrateDB sits alongside it as the analytics layer: the place where data lands for SQL queries, cross-plant joins, live dashboards, and ML workloads. Data arrives in CrateDB and is queryable within milliseconds, with no batch export step in the path.

For teams building new data infrastructure or using modern protocols such as MQTT and HTTP, CrateDB handles high-frequency ingest directly. Telegraf, an open-source plugin-driven metrics agent, routes data from sensors, MQTT brokers, and PLCs straight to CrateDB with no intermediate layer. For OT environments requiring OPC-UA or Modbus connectivity, Crosser (a hybrid-first streaming platform) runs edge nodes on-site and bridges industrial protocols to CrateDB. The OT connectivity layer is preserved without a traditional historian in the path.

The right architecture depends on what is already running. What does not change is where analytics happens: CrateDB.

What CrateDB handles that historians cannot

Standard SQL across all your industrial data. Query sensor readings, JSON payloads, machine event logs, and maintenance records in a single SQL statement. No proprietary API, no vendor tooling, no extract step before the query runs. Any Grafana dashboard, Tableau workbook, or Python script that speaks SQL works against CrateDB without modification.

High-cardinality analytics without pre-aggregation. Most time-series databases degrade as the number of unique series grows. CrateDB is architected for high-cardinality, high-dimensionality analytics from the ground up. Querying across hundreds of thousands of sensors with ad-hoc joins and dimensional filters is a design target, not a workaround. You should not have to know your query patterns in advance to get sub-second performance.

Multi-model data in one engine. Industrial data is not purely numeric. CrateDB stores and indexes time-series, JSON documents, full-text, vector, and geospatial data in one distributed engine. A single query can join a sensor time-series table with a JSON maintenance log, filter by plant location, and rank by text similarity, all with sub-second latency and no data movement.

Live ML inference alongside operational data. Fault probability scores from a trained predictive maintenance model write back to CrateDB and are JOINable against live sensor readings in SQL. No external feature store, no synchronization lag between the model's input data and the live readings it scores. See: Predictive Maintenance Database Architecture.
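The join described above can be sketched in a few lines. To keep the example runnable without a cluster, Python's built-in `sqlite3` stands in for the database; the table names, columns, machine IDs, and the 0.8 threshold are all hypothetical, and the SQL may need adapting before it runs on CrateDB.

```python
import sqlite3

# In-memory SQLite is only a stand-in for a SQL engine in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor_readings (machine_id TEXT, ts INTEGER, vibration_mm_s REAL);
CREATE TABLE fault_scores (machine_id TEXT, scored_at INTEGER, fault_probability REAL);
""")

# Live readings as they arrive from the plant (hypothetical values).
conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?)", [
    ("press-01", 100, 2.1), ("press-01", 160, 6.8),
    ("press-02", 100, 1.9), ("press-02", 160, 2.0),
])

# Scores written back by a predictive maintenance model (hypothetical values).
conn.executemany("INSERT INTO fault_scores VALUES (?, ?, ?)", [
    ("press-01", 160, 0.91), ("press-02", 160, 0.07),
])

# Join each machine's latest reading with its fault score and keep the risky ones.
QUERY = """
SELECT r.machine_id, r.vibration_mm_s, s.fault_probability
FROM sensor_readings r
JOIN fault_scores s ON s.machine_id = r.machine_id
WHERE r.ts = (SELECT MAX(ts) FROM sensor_readings WHERE machine_id = r.machine_id)
  AND s.fault_probability > 0.8
ORDER BY s.fault_probability DESC
"""
at_risk = conn.execute(QUERY).fetchall()
```

Because the scores live in the same database as the readings, flagging at-risk machines is a single statement rather than a sync job between two systems.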

Horizontal scale without schema redesign. Scale CrateDB by adding nodes. The cluster self-balances automatically: no manual sharding, no migration scripts, no maintenance window to coordinate. ABB runs 1 million values per second into CrateDB without a dedicated DBA team. The same schema that handles today's sensor density handles ten times the density without a redesign.

OEE and live manufacturing analytics. Calculate Overall Equipment Effectiveness against live readings, not nightly exports. Query across plants in a single SQL statement. Grafana connects via the PostgreSQL wire protocol and reads live data directly from CrateDB with no intermediate layer.
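For readers less familiar with the metric, OEE is conventionally the product of availability, performance, and quality. The sketch below computes it from shift-level counts; the `ShiftCounts` fields and the sample numbers are illustrative assumptions, and in practice these inputs would come from queries over live readings rather than hard-coded values.

```python
from dataclasses import dataclass


@dataclass
class ShiftCounts:
    planned_minutes: float      # scheduled production time
    downtime_minutes: float     # unplanned stops within that time
    ideal_cycle_seconds: float  # best-case time to produce one unit
    total_units: int            # everything produced, good or not
    good_units: int             # units that passed quality checks


def oee(c):
    """Return OEE as a fraction: availability * performance * quality."""
    run_minutes = c.planned_minutes - c.downtime_minutes
    if c.planned_minutes <= 0 or run_minutes <= 0 or c.total_units <= 0:
        return 0.0  # nothing ran or nothing was produced
    availability = run_minutes / c.planned_minutes
    performance = (c.ideal_cycle_seconds * c.total_units) / (run_minutes * 60)
    quality = c.good_units / c.total_units
    return availability * performance * quality


# Hypothetical 8-hour shift: 60 minutes of downtime, 1-second ideal cycle.
shift = ShiftCounts(
    planned_minutes=480,
    downtime_minutes=60,
    ideal_cycle_seconds=1.0,
    total_units=20160,
    good_units=19152,
)
score = oee(shift)
```

With these numbers, availability is 0.875, performance is 0.8, and quality is 0.95, so `score` comes out at roughly 0.665, or 66.5 percent.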

In Production

ABB Ability Genix ingests 1 million values per second into CrateDB and retrieves 30,000 to 120,000 events per second for industrial AI workloads including predictive maintenance. Read the ABB story.

TGW Logistics Group runs 900,000 sensors per distribution center. Real-time data from CrateDB feeds digital twin models and predictive maintenance workloads across the logistics network. Read the TGW story.

Rauch Group streams 400 data records per second from manufacturing facilities in Austria into CrateDB for real-time production monitoring.

Connecting Your OT Stack to CrateDB

CrateDB integrates with the OT connectivity layer your operation already runs.

Telegraf is an open-source, plugin-driven metrics agent that collects data from sensors, MQTT brokers, PLCs, and hundreds of other sources and routes it directly to CrateDB. Teams already running Telegraf with InfluxDB swap one line in the output plugin configuration to point at CrateDB instead. See: Migrating from InfluxDB to CrateDB: A Telegraf Output Plugin Swap Guide. For OPC-UA and MQTT ingest via Telegraf: How to Ingest OPC-UA and MQTT Data into SQL with Telegraf and CrateDB.

Crosser is a hybrid-first streaming platform that supports OPC-UA, Modbus, MQTT, and other industrial protocols. Edge nodes run on-site and translate OT data to CrateDB in real time. Data does not leave the plant until it reaches CrateDB's ingestion layer.

EOT.AI provides no-code and low-code pipelines for extracting data from legacy OT systems, with semantic modeling and data governance. For teams with complex historian-to-CrateDB migration paths, EOT.AI handles the transformation layer.

For teams using MQTT or HTTP-based industrial protocols directly, CrateDB's streaming connectors handle ingest without an intermediate broker.
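As one illustration of HTTP-based ingest, the sketch below builds a bulk insert request for CrateDB's HTTP `_sql` endpoint, which accepts a statement with `?` placeholders plus a `bulk_args` list. The `sensor_readings` table, its columns, the ISO timestamp format, and the `localhost:4200` address are assumptions for this example; the `send` helper is defined but not called, since it needs a running cluster.

```python
import json
import urllib.request


def build_bulk_insert(table, columns, rows):
    """Build the JSON body for a parameterized bulk INSERT."""
    placeholders = ", ".join("?" for _ in columns)
    column_list = ", ".join(f'"{c}"' for c in columns)
    return {
        "stmt": f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})',
        "bulk_args": [list(r) for r in rows],
    }


def send(payload, url="http://localhost:4200/_sql"):
    """POST the payload to the cluster (URL is an assumption; not called here)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical readings in the four-field shape: tag, timestamp, value, quality.
readings = [
    ("plant1.line3.motor_temp", "2024-01-01T12:00:00Z", 71.4, "good"),
    ("plant1.line3.motor_temp", "2024-01-01T12:00:01Z", 71.9, "good"),
]
payload = build_bulk_insert("sensor_readings", ["tag", "ts", "value", "quality"], readings)
```

Batching many rows into one request keeps the per-reading overhead low. Whether HTTP or the PostgreSQL wire protocol is the better ingest path depends on the client stack already in place.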

See all CrateDB integrations.

Deployment options

CrateDB runs where your data and compliance requirements dictate.

CrateDB Enterprise deploys on-premises or in a private cloud. It suits DACH manufacturers and other organizations in regulated environments where process data cannot leave the facility. The same SQL, the same schema, with full control over the infrastructure it runs on. Learn about CrateDB Enterprise.

Edge deployment. CrateDB runs on the factory floor with limited or intermittent connectivity to central infrastructure. Queries execute locally, and data is buffered before it syncs. See: Data Sovereignty for Manufacturing Analytics.

CrateDB Cloud runs as a fully managed service on AWS, Azure, or GCP. Free tier available, no credit card required. Suitable for multi-site analytics where data sovereignty is not a deployment constraint. Start free on CrateDB Cloud.

Hybrid. CrateDB Enterprise on-premises or at the edge, syncing to CrateDB Cloud for cross-plant analytics. One SQL interface across all environments.

Run queries on live industrial data

The Industrial IoT exploration path runs OEE queries, maintenance analytics, fault density analysis, and geospatial queries against 500,000 timestamped sensor readings from four German plants. From setup to a live dashboard in under 30 minutes.

Evaluating CrateDB for a production industrial workload? Talk to a Solutions Engineer.


FAQ

A data historian is software that records, stores, and retrieves time-stamped process data from industrial equipment including sensors, PLCs, SCADA systems, and distributed control systems. Each measurement is captured as four fields: tag name, timestamp, value, and quality code. The technology was purpose-built for OT environments where write determinism and native protocol connectivity are non-negotiable.

Many industrial teams keep their existing historian for OT-layer data capture: OPC-UA connectivity, quality codes, and deterministic writes for process control and compliance. CrateDB serves as the analytics layer alongside it. Data lands in CrateDB and is queryable in standard SQL within milliseconds, with no batch export step or proprietary API in the path.

Data historians are built for OT environments with native OPC-UA, Modbus, and SCADA connectivity, deterministic write, and proprietary tag-based storage. Time series databases are built for IT environments with SQL or custom query interfaces and horizontal scalability. The two categories solve different problems and are often deployed together in a modern industrial data stack.

An operational historian runs at the plant level, storing high-frequency raw data from SCADA and PLCs with write speed as the primary design constraint. An enterprise historian aggregates data from multiple operational historians across sites for cross-plant KPI reporting and benchmarking. Many large industrial organizations run both tiers.

If your operation depends on native OPC-UA, OPC-DA, or Modbus connectivity, millisecond deterministic data capture with quality codes, or deep integration with PI System-based process control workflows, a traditional historian remains the right tool for OT data collection. CrateDB is the analytics layer that sits alongside it, not a replacement for OT-native capture.
