Spark

About

Apache Spark is an open-source distributed computing framework designed for high-speed, versatile big-data processing.

It provides a multi-language engine for running data engineering, data science, and machine learning workloads on single-node machines or clusters, and supports batch processing, real-time streaming, and graph analytics.

Databricks

Learn

Getting started with Apache Spark and CrateDB

Combining Apache Spark with CrateDB lets you process and analyze large datasets.

Examples: Ready-to-run programs

These examples demonstrate how to load a Spark DataFrame into CrateDB.

https://github.com/crate/cratedb-examples/tree/main/by-dataframe/spark
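
As a rough illustration of loading a Spark DataFrame into CrateDB, the sketch below writes a small DataFrame through Spark's generic JDBC data source. It is not taken from the linked repository and has not been checked against a live cluster. It assumes that CrateDB is reachable over its PostgreSQL-compatible interface on `localhost:5432`, that the default `crate` user and `doc` schema are in use, and that the PostgreSQL JDBC driver can be fetched via `spark.jars.packages` (the version shown is only an example). The helper names `cratedb_jdbc_options` and `load_dataframe_into_cratedb`, and the table name `spark_demo`, are hypothetical.

```python
def cratedb_jdbc_options(host="localhost", port=5432, schema="doc",
                         table="spark_demo", user="crate", password=""):
    """Build options for Spark's JDBC data source.

    Assumption: CrateDB is addressed with the PostgreSQL JDBC driver,
    and the database part of the URL names the schema to use.
    """
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{schema}",
        "driver": "org.postgresql.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_dataframe_into_cratedb(df, options, mode="append"):
    """Write a Spark DataFrame to the table described by `options`."""
    df.write.format("jdbc").options(**options).mode(mode).save()


def main():
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("cratedb-dataframe-sketch")
        # Assumption: any recent PostgreSQL JDBC driver version works here.
        .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")
        .getOrCreate()
    )

    # A tiny in-memory DataFrame standing in for real data.
    df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "name"])

    load_dataframe_into_cratedb(df, cratedb_jdbc_options())
    spark.stop()


if __name__ == "__main__":
    main()
```

Keeping the connection options in a separate function makes it easy to point the same write call at a different host, schema, or table. With `mode="append"`, Spark's JDBC writer creates the target table if it does not exist yet and otherwise adds rows to it.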