Data guide

What Is a Data Historian?

A data historian is software that records, stores, and retrieves time-stamped process data from industrial equipment and systems: sensors, PLCs, SCADA systems, distributed control systems (DCS), and MES platforms. It captures each measurement as four fields: a tag name, a timestamp, a value, and a quality code. This four-field model is purpose-built for operational environments where thousands of measurement points stream data every second. The term data historian is used interchangeably with process historian, operational historian, and plant historian, depending on the industry.
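
To make the four-field model concrete, here is a minimal Python sketch of a single historian record. The class name `TagReading`, the sample tag name, and the choice of the classic OPC DA quality convention (status in the top two bits of the low byte: 0xC0 good, 0x40 uncertain, 0x00 bad) are illustrative assumptions, not the API of any particular historian.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: quality follows the classic OPC DA layout, where the top two
# bits of the low byte carry the status (0xC0 good, 0x40 uncertain, 0x00 bad).
QUALITY_MASK = 0xC0
QUALITY_GOOD = 0xC0


@dataclass(frozen=True)
class TagReading:
    """One historian record: tag name, timestamp, value, quality code."""
    tag: str
    ts: datetime
    value: float
    quality: int

    def is_good(self) -> bool:
        """Return True when the status bits mark the value as good."""
        return (self.quality & QUALITY_MASK) == QUALITY_GOOD


reading = TagReading(
    tag="Plant1.Boiler3.SteamPressure",  # hypothetical tag name
    ts=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    value=42.7,
    quality=192,  # 0xC0
)
print(reading.is_good())
```

Keeping the quality code next to the value lets downstream queries exclude readings that the source system flagged as bad or uncertain.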

Historians have been standard in process industries since the 1980s and run in production across manufacturing, energy, oil and gas, and utilities worldwide. They were originally designed to handle plant-floor data volumes that relational databases of the time could not sustain, in either write rate or number of tags.

Traditional historians capture operational data reliably. They were not designed for multi-site aggregation, mixed data models, or sub-second SQL queries the moment new readings arrive. CrateDB is a modern data historian: a distributed SQL database that ingests at historian scale and makes every record queryable without pre-aggregation.

This page covers how data historians work, the types of historians, the leading products in the market, when to modernize, and how CrateDB functions as the operational analytics layer that keeps your data live and queryable.

Types of data historians

Operational historian vs. enterprise historian

An operational historian sits close to the plant floor. It stores high-frequency raw data from SCADA systems and control equipment, with write speed and determinism as the primary design constraints.

An enterprise historian aggregates data from multiple operational historians across sites into a centralized store. It serves cross-site benchmarking, KPI reporting, and long-term trend analysis rather than real-time plant-floor monitoring.
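
The roll-up an enterprise historian performs can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `site_kpis`, the site names, and the tag names are hypothetical, and a real deployment would run this aggregation in the database rather than in application code.

```python
from collections import defaultdict
from statistics import mean


def site_kpis(readings):
    """Average value per (site, tag) from (site, tag, value) tuples."""
    buckets = defaultdict(list)
    for site, tag, value in readings:
        buckets[(site, tag)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical raw readings collected from two operational historians.
readings = [
    ("hamburg", "line1.energy_kwh", 10.0),
    ("hamburg", "line1.energy_kwh", 14.0),
    ("lyon", "line1.energy_kwh", 9.0),
]
kpis = site_kpis(readings)
print(kpis)
```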

On-premise vs. cloud historian

Traditional historian deployments are on-premise, running in the plant or data center alongside the control systems they monitor. A cloud historian moves this capability to cloud infrastructure, supporting multi-site consolidation, AI integration, and remote access without on-premise hardware.

CrateDB supports both models. Deploy at the edge for low-latency operational data capture, or in the cloud for consolidated analytics across sites. CrateDB Cloud is a fully managed option for teams that want historian-grade ingestion without infrastructure overhead.


Data historian vs. time series database

The two categories overlap but are not the same.

Aspect            Data historian                                                  Time series database
Data model        Tag-based (name, timestamp, value, quality code)                Flexible: metrics, labels, fields, timestamps
Query interface   Proprietary API or limited SQL                                  SQL, PromQL, Flux, or REST depending on product
OT connectivity   Native: OPC-UA, OPC-DA, Modbus                                  Requires an external integration layer
Compression       Lossy: exception-based recording, swinging door trending (SDT)  Configurable; lossless options available
Scalability       Vertical, bounded by hardware                                   Horizontal, shared-nothing clusters
Multi-model data  Time series only                                                Varies; few support JSON or vector natively

Traditional historians were built for OT environments where stability and determinism matter more than query flexibility. Time series databases were built for IT environments where scale and developer access matter more than plant-floor integration.

CrateDB covers both: OPC connectivity through its integration partner Crosser, historian-grade ingestion rates, and standard SQL that data engineers can use without a proprietary interface. For the full comparison, see Data Historian vs. Time Series Database.


The leading data historian products

The historian market is dominated by a small set of long-established vendors.

  • AVEVA PI System (formerly OSIsoft PI) is the most widely deployed historian. It uses a proprietary tag-based architecture and is standard in energy, utilities, and large-scale manufacturing.

  • AspenTech IP.21 (InfoPlus.21) is standard in oil and gas, chemicals, and process manufacturing. It offers deep process analytics within its original deployment model.

  • AVEVA Historian (formerly Wonderware Historian) is common in discrete manufacturing and factory automation, typically deployed alongside AVEVA InTouch SCADA.

  • Canary Historian is a more recent option with cloud capabilities, used across mid-market industrial deployments.

Each product does what it was designed for. None was designed for multi-model queries, SQL access from data science teams, or horizontal scaling beyond a single site. That is the gap CrateDB addresses. CrateDB provides a documented modernization path from AVEVA PI with schema migration guidance and connector support.


Why traditional historians fall short

Legacy historians often struggle when industrial data strategies mature. Common challenges include:

  • Proprietary storage formats
  • Limited scalability for high-resolution data
  • Difficult cloud/edge integration
  • Lack of flexibility for semi-structured or unstructured data
  • Slow access to granular historical records
  • High licensing and expansion costs
  • Limited support for advanced analytics or AI workloads

Modern Industry 4.0 and digital transformation programs require a historian that works across OT and IT, natively supports cloud and edge deployments, and integrates easily with AI/ML pipelines. 


CrateDB: a modern data historian

CrateDB is designed for the next generation of industrial data workloads. It combines time-series performance with multi-model flexibility, delivering ingestion, storage, and analytics in one distributed SQL engine. Key capabilities are:

  • High-frequency ingestion: Capture data from sensors, equipment, logs, and control systems at high throughput.

  • Instant query availability: Automatic indexing makes new data queryable shortly after it is written, typically within about a second under CrateDB's default table refresh interval.

  • Support for all industrial data types: Store time-series, relational, JSON, logs, text, geospatial, vector, and BLOB data in the same engine.

  • Flexible SQL interface: Use standard SQL without proprietary query languages or tools.

  • Fast analytics on large datasets: Run queries, aggregations, joins, search, and AI workloads on years of high-resolution data.

  • Horizontal scalability: Add nodes to increase capacity and maintain performance as data volumes grow.

  • Edge, on-premise, hybrid or cloud deployment: Same engine, full performance, in any environment.

  • AI-ready architecture: Feed high-quality historical and real-time data directly into AI and predictive maintenance models.


Bridging OT and IT

Modern industrial architectures require solutions that work seamlessly across both OT and IT domains. CrateDB provides a unified data foundation that:

  • Connects to OT systems on the shop floor
  • Integrates with IT systems in the cloud and data center
  • Handles structured and unstructured data from both sides
  • Supports AI, ML, BI, and enterprise analytics
  • Delivers consistent performance from edge to cloud

This convergence simplifies architectures, reduces integration overhead, and unlocks holistic insights across the entire operation.

OT connectivity with Crosser

CrateDB integrates with Crosser, a hybrid-first streaming and integration platform designed for industrial environments.

Crosser enables:

  • Direct connectivity to PLCs, SCADA, MES, sensors, and equipment
  • Support for industrial protocols (OPC-UA, Modbus, MQTT, etc.)
  • Edge nodes that run on-site behind the firewall
  • Low-code data flows for filtering, enrichment, transformation
  • Real-time stream analytics at the edge
  • OT-to-IT data pipelines delivering cleaned operational data into CrateDB

Together, Crosser and CrateDB deliver a complete OT ingestion pipeline: from machine signals to real-time analytics and AI.

Enterprise data fabric & AI readiness with EOT.AI

For organizations that need contextualized, governed, AI-ready operational data, CrateDB partners with EOT.AI.

EOT.AI provides:

  • No-code/low-code pipelines that extract data from SCADA, historians, legacy OT systems
  • A semantic layer that models assets, hierarchies, and metadata
  • Unification of operational data with business and contextual data
  • Data governance, quality, lineage, and access control
  • AI-ready data products combining time-series, events, metadata, and context
  • Integration with cloud analytics platforms, BI tools, and ML pipelines

CrateDB becomes the scalable storage and analytics engine for these enriched datasets, enabling predictive maintenance, digital twins, anomaly detection, and plant-wide optimization.

CrateDB + EOT.AI + Crosser together form a complete industrial data stack:
OT connectivity → data fabric & semantic modeling → scalable historian storage → analytics & AI.

Architecture overview

CrateDB integrates ingestion, storage, and analytics into a fault-tolerant distributed architecture.

Ingestion:

  • MQTT, Kafka, Flink, REST, IoT gateways, batch imports
  • OPC-UA, Modbus, SCADA, MES, PLCs (via Crosser or EOT.AI)
  • High-throughput, low-latency pipelines
  • Automatic indexing and dynamic schemas
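
As a rough illustration of the ingestion path, the sketch below batches readings into the JSON body of a bulk INSERT for CrateDB's HTTP endpoint (`/_sql`, using the `stmt` and `bulk_args` fields). The table name `sensor_readings`, its columns, and the sample tags are hypothetical, and the `attributes` column is typed `OBJECT(DYNAMIC)` only to show how a dynamic schema could hold extra fields. The code builds the payload without sending it.

```python
import json
from datetime import datetime, timezone

# Hypothetical table for illustration; names and columns are assumptions.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    tag TEXT,
    ts TIMESTAMP WITH TIME ZONE,
    value DOUBLE PRECISION,
    quality INTEGER,
    attributes OBJECT(DYNAMIC)
)
"""

INSERT = (
    "INSERT INTO sensor_readings (tag, ts, value, quality, attributes) "
    "VALUES (?, ?, ?, ?, ?)"
)


def to_epoch_ms(ts: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    return int(ts.timestamp() * 1000)


def build_bulk_payload(readings) -> str:
    """Build the JSON body for one bulk INSERT request."""
    rows = [
        [tag, to_epoch_ms(ts), value, quality, attributes]
        for tag, ts, value, quality, attributes in readings
    ]
    return json.dumps({"stmt": INSERT, "bulk_args": rows})


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
batch = [
    ("Plant1.Pump7.Flow", start, 3.2, 192, {"unit": "m3/h"}),
    ("Plant1.Pump7.Temp", start, 61.0, 192, {"unit": "degC", "probe": "A"}),
]
payload = build_bulk_payload(batch)
print(payload)
```

Sending many rows per request, rather than one INSERT per reading, is the usual way to reach high write throughput. The payload could be POSTed to `/_sql` on a CrateDB node (port 4200 by default) with any HTTP client.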

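Storage:

  • Columnar storage with automatic indexing
  • Shared-nothing distributed cluster that scales by adding nodes
  • Time-series, relational, JSON, text, geospatial, vector, and BLOB data in one engine

Analytics:
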
  • Sub-second queries on large time-series datasets
  • Aggregations, downsampling, trend analysis
  • Vector search for anomaly detection and AI
  • Text search for logs and contextual data
  • Real-time dashboards, monitoring, and alerting tools
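
Downsampling, one of the operations listed above, reduces high-frequency readings to one aggregate per time bucket. The following Python sketch shows the idea on plain tuples; the function name `downsample` and the sample data are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_s=60):
    """Average (epoch_seconds, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_s].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


points = [(0, 1.0), (30, 3.0), (60, 10.0), (119, 20.0), (120, 5.0)]
print(downsample(points))  # [(0, 2.0), (60, 15.0), (120, 5.0)]
```

In SQL, the same grouping is typically expressed with a time-bucketing function such as `DATE_TRUNC` in a `GROUP BY` clause, so the raw rows stay in the table and the aggregate is computed at query time.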

Many industrial teams are moving toward a Unified Namespace (UNS) architecture, where all OT and IT data flows through a central message broker before reaching the storage and analytics layer. CrateDB integrates directly into UNS architectures as the operational database below the broker, keeping every sensor reading queryable the moment it arrives.
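
One way to picture the step from broker to database is a function that turns a UNS topic into a row. The sketch below assumes an ISA-95-style topic layout (`enterprise/site/area/line/cell/tag`), which is a common convention but not a requirement of UNS or of CrateDB. The function names and the sample topic are hypothetical.

```python
# Assumed topic hierarchy; real UNS deployments define their own levels.
UNS_LEVELS = ("enterprise", "site", "area", "line", "cell", "tag")


def parse_uns_topic(topic: str) -> dict:
    """Split a broker topic into named hierarchy levels."""
    parts = topic.strip("/").split("/")
    if len(parts) != len(UNS_LEVELS):
        raise ValueError(
            f"expected {len(UNS_LEVELS)} levels, got {len(parts)}"
        )
    return dict(zip(UNS_LEVELS, parts))


def to_row(topic: str, value: float, ts_ms: int) -> dict:
    """Combine topic context with the payload into one storable row."""
    row = parse_uns_topic(topic)
    row.update({"ts": ts_ms, "value": value})
    return row


row = to_row(
    "acme/hamburg/packaging/line1/filler/temperature", 71.5, 1704067200000
)
print(row)
```

Storing the hierarchy levels as columns means a query can filter or group by site, line, or cell without parsing topic strings at read time.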


Integrates across the industrial ecosystem

CrateDB fits naturally into industrial architectures. It connects with:

  • SCADA / MES / PLCs (via Crosser or EOT.AI)
  • Industrial gateways (via Crosser or EOT.AI)
  • Cloud IoT hubs
  • BI tools
  • AI/ML platforms

It can serve as:

  • A primary historian
  • A scalable extension to an existing historian
  • A unified data hub combining historian, IoT, and contextual IT data
  • A foundation for AI and predictive analytics initiatives

Modernization path from AVEVA PI System

Many industrial organizations rely on the AVEVA PI System for operational data collection. As deployments grow, teams often encounter limits around scale, proprietary formats, analytics flexibility, cloud integration, or total cost of ownership.

CrateDB offers a modern path forward: it can replace traditional historians such as AVEVA PI when organizations need a scalable, open, and cost-efficient platform for high-resolution time-series data, advanced analytics, and AI workloads. Many companies keep their existing OT data collection layer (PLCs, SCADA, OPC connectors, or PI interfaces) while adopting CrateDB as the new long-term historian and analytics backbone.

CrateDB is especially effective in PI modernization scenarios where teams want to:

  • Move beyond storage bottlenecks
  • Retain high-resolution data cost-effectively
  • Run advanced SQL analytics across operational and contextual data
  • Power AI and predictive maintenance models
  • Integrate historian data with cloud, BI, and data platforms
  • Consolidate data silos into a unified storage and analytics engine
  • Support hybrid edge-to-cloud architectures

CrateDB does not aim to replicate proprietary PI functions. Instead, it provides a more flexible, scalable, and open backbone for long-term historical data, analytics, and AI, enabling a smooth transition from legacy historians to a modern data platform.

When do you need a data historian?

A dedicated historian is the right choice if your operation meets any of these conditions:

  • Your equipment communicates via OPC-UA, OPC-DA, or Modbus and requires a collector that speaks those protocols natively
  • You need to store sensor readings at millisecond or sub-second frequency across hundreds or thousands of tags
  • Your regulatory environment (FDA 21 CFR Part 11, ISO 50001, energy reporting) requires deterministic data capture with quality codes
  • You already run a PI System or AVEVA Historian and want to extend its analytics layer without replacing it

Consider CrateDB as your historian layer if you need to go beyond these requirements: querying raw records with standard SQL, combining sensor data with JSON event logs or vector embeddings, scaling across sites without vertical hardware limits, or feeding operational data directly into AI pipelines.


Deployment options

The same CrateDB engine runs in each of these environments:

  • On-premise for regulated environments
  • Edge for real-time local processing
  • Cloud for elastic scalability
  • Hybrid deployments combining any of the above

This flexibility lets CrateDB fit a wide range of industrial and IoT architectures.

Benefits

  • Real-time visibility into operations
  • High-resolution storage at lower cost
  • Unified OT + IT data foundation
  • Semantic modeling and AI readiness via EOT.AI
  • Reliable OT connectivity via Crosser
  • Faster analytics without ETL
  • Open SQL interface
  • Ideal for predictive maintenance and digital twins

On-demand Webinar
DIY Machine Learning for Industrial Operations in Energy and Manufacturing

Are you an engineer or operator in energy or manufacturing who wants to harness AI without writing code?
Watch this on-demand webinar to learn how to build and run machine learning models on live industrial data, with no data science team required.

On-demand Webinar
Modern Data Pipelines for Smart Factories with CrateDB and Crosser

Watch this webinar recording to learn how to build modern data pipelines for smart factories—from industrial equipment to end-user analytics and AI applications.

Webinar
Time Series Data in CrateDB

This tutorial explores how CrateDB manages time series by querying data in milliseconds, utilizing simple SQL, combining various data types, handling high ingest rates, and storing extensive historical data.


FAQ

What is a data historian?

A data historian is software that records, stores, and retrieves time-stamped process data from industrial equipment: sensors, PLCs, SCADA systems, distributed control systems (DCS), and MES platforms. It captures each measurement as a tag name, a timestamp, a value, and a quality code. Historians are standard infrastructure in manufacturing, energy, oil and gas, and utilities for operational monitoring and analysis.

What is a process historian?

A process historian, also called an operational historian or plant historian, is a specialized database that stores time-series data from manufacturing processes: temperature, pressure, flow rates, vibration, and other sensor readings. The terms process historian, operational historian, and data historian are used interchangeably across manufacturing, oil and gas, and utilities.

What is the difference between a data historian and a time series database?

A data historian uses a proprietary tag-based data model with built-in OPC connectivity and built-in compression, optimized for OT environments. A time series database uses open standards, SQL or API interfaces, and horizontal scalability. CrateDB combines historian-grade ingestion rates with time series, relational, JSON, and vector data in a single engine queryable via standard SQL.

What is the difference between an operational historian and an enterprise historian?

An operational historian runs close to the plant floor, storing high-frequency raw data from SCADA and control systems. An enterprise historian aggregates data from multiple sites for reporting and cross-site analysis. CrateDB handles both: ingesting raw operational data at the edge and consolidating multiple sites in the cloud or on-premise via its shared-nothing distributed architecture.

Which protocols do data historians use to collect data?

Most historians collect data via OPC-UA, the current IEC 62541 interoperability standard, and its predecessor OPC-DA. Many also support Modbus, MQTT, and proprietary vendor protocols. CrateDB connects to OT sources via Crosser, an integration partner that supports 200+ industrial protocols and writes directly into CrateDB tables.

Can CrateDB work alongside an existing AVEVA PI System?

Yes. CrateDB often complements PI by providing a scalable, SQL-based platform for long-term data retention, advanced analytics, and AI workloads. Many organizations integrate PI with CrateDB to offload storage, run complex queries, or combine PI data with contextual enterprise data.

What is a cloud historian?

A cloud historian is a data historian deployed in or connected to cloud infrastructure. It extends plant-floor capabilities with cloud-scale storage, multi-site data consolidation, and integration with AI and analytics services. CrateDB Cloud is a fully managed deployment that ingests historian-grade time-series data and makes it queryable via standard SQL without infrastructure management.

How do data historians compress data?

Traditional historians use algorithms like swinging door trending (SDT) or exception-based recording to store only values that change beyond a defined threshold. This reduces storage but discards intermediate values, which can distort analytics and predictive models. CrateDB stores all raw data using columnar storage with automatic indexing. CrateDB v5.10 achieved a 50% reduction in storage compared to prior releases (internal benchmark), without lossy compression.
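
To see why threshold-based recording is lossy, consider the minimal Python sketch below. It implements a simple deadband filter, the basic idea behind exception-based recording, and is not swinging door trending or any vendor's actual algorithm. The function name `deadband_filter`, the threshold, and the sample values are assumptions for illustration.

```python
def deadband_filter(points, threshold):
    """Keep a point only when its value differs from the last kept value
    by more than `threshold`. The first point is always kept."""
    kept = []
    for ts, value in points:
        if not kept or abs(value - kept[-1][1]) > threshold:
            kept.append((ts, value))
    return kept


raw = [(0, 20.0), (1, 20.2), (2, 20.4), (3, 21.5), (4, 21.6), (5, 20.0)]
stored = deadband_filter(raw, threshold=1.0)
print(stored)  # [(0, 20.0), (3, 21.5), (5, 20.0)]
```

Half of the readings are dropped here, including the gradual rise from 20.0 to 20.4. A model trained on the stored points alone never sees that drift, which is the kind of distortion described above.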