Geospatial Data in CrateDB
Geospatial queries in most databases require PostGIS, a PostgreSQL extension that adds significant operational overhead and still leaves you managing a separate analytical layer for high-volume workloads. Dedicated GIS systems are powerful but isolated from your operational data, requiring synchronization pipelines every time device locations, sensor readings, or event data updates.
CrateDB includes native geospatial types and spatial functions in the same distributed SQL engine as your time-series and analytical workloads. Location data, device metrics, JSON metadata, and vector embeddings live in the same rows and are queryable together in a single statement.
Geospatial query examples
Native SQL with no proprietary GIS language and no external spatial processing library.
/* Find records within a given radius of a coordinate.
The query joins trip data with community area metadata at query time
No pre-joining, no pre-flattening.
Returns the five longest trips originating within 1km of the Shedd Aquarium. */
SELECT tripid,
miles,
c.name as dropoff_area
FROM taxi_rides t, community_areas c
WHERE distance('POINT(-87.6118581 41.8672843)'::GEO_POINT, pickupcentroidlocation) < 1000
AND dropoffcommunityarea = c.areanumber
ORDER BY miles DESC
LIMIT 5;
+------------------------------------------+-------+--------------+
| tripid | miles | dropoff_area |
+------------------------------------------+-------+--------------+
| e6d27fb09f7b17695a71666654c40fd56b79f472 | 46.5 | LOOP |
| eb8dbd2d81faa21f8f6d80117748a8630c55d55c | 37.27 | OHARE |
| 6aaa0d2aed0423f1321ff5df949afef0e3583d58 | 37.16 | OHARE |
| e9ab078e5e6ed4d29cfbd310ee75d47236d3afbb | 32.03 | OHARE |
| 329b4e6980070cab4c2af19ddbc7b73c1497a669 | 31.67 | OHARE |
+------------------------------------------+-------+--------------+
/* Test whether points fall inside a polygon boundary defined in WKT format.
This query identifies trips starting inside O'Hare airport's boundary
that exceeded 20 miles, grouped by day.
Common pattern for geofencing and zone-based analytics. */
SELECT date_bin('24 hour'::interval, starttime, 0) AS day,
count(*) as trips
FROM taxi_rides
WHERE within(
pickupcentroidlocation,
'POLYGON (
(-87.92151405039768 42.008992106324456,
-87.93962405297475 42.00147632076016,
-87.94067626623455 41.95592273798644,
-87.89918645000783 41.9491763875842,
-87.88992003878232 41.95716729007498,
-87.8821237584836 41.95716462062882,
-87.87074008603852 41.967968085787476,
-87.87727540517479 41.99772302560805,
-87.89707897269737 42.009775581434354,
-87.92151405039768 42.008992106324456)
)'
)
AND miles > 20
GROUP BY day
ORDER BY day DESC;
+---------------+-------+
| day | trips |
+---------------+-------+
| 1714521600000 | 24 |
| 1714435200000 | 110 |
| 1714348800000 | 109 |
| 1714262400000 | 77 |
| 1714176000000 | 52 |
| 1714089600000 | 98 |
| 1714003200000 | 113 |
| 1713916800000 | 82 |
| 1713830400000 | 59 |
| 1713744000000 | 91 |
| 1713657600000 | 69 |
| 1713571200000 | 48 |
| 1713484800000 | 82 |
| 1713398400000 | 75 |
| 1713312000000 | 80 |
| 1713225600000 | 98 |
| 1713139200000 | 104 |
| 1713052800000 | 63 |
| 1712966400000 | 75 |
| 1712880000000 | 142 |
| 1712793600000 | 169 |
| 1712707200000 | 103 |
| 1712620800000 | 154 |
| 1712534400000 | 203 |
| 1712448000000 | 98 |
| 1712361600000 | 71 |
| 1712275200000 | 80 |
| 1712188800000 | 79 |
| 1712102400000 | 72 |
| 1712016000000 | 91 |
| 1711929600000 | 39 |
+---------------+-------+
/* Test whether a line or shape overlaps a polygon.
This query finds community areas with populations over 30,000 that a flight path
between O'Hare and Midway passes over, demonstrating that CrateDB can combine
spatial intersection with structured filters in a single statement. */
SELECT name,
details['population'] AS population
FROM community_areas
WHERE
intersects (
'LINESTRING (-87.90753846894022 41.9807787186139, -87.72939445102593 41.97508268795004, -87.55763552335945 41.97982941555023, -87.47846040428642 41.92427055787562, -87.51102436455034 41.837749425166244, -87.59225264192574 41.80057450627759, -87.64801197528328 41.770600684095974, -87.69988112259261 41.7819633786835, -87.7530469985847 41.78583107072223)',
boundaries
)
AND details['population'] > 30000
ORDER BY population DESC;
+------------------------+------------+
| name | population |
+------------------------+------------+
| UPTOWN | 57182 |
| EDGEWATER | 56296 |
| CHICAGO LAWN | 55931 |
| ALBANY PARK | 48396 |
| LINCOLN SQUARE | 40494 |
| NORWOOD PARK | 38303 |
| GARFIELD RIDGE | 35439 |
| WEST LAWN | 33662 |
| GREATER GRAND CROSSING | 31471 |
+------------------------+------------+
Geospatial and JSON Data in Real Time with CrateDB
In this video, we use real-time climate telemetry from the EU Copernicus Climate Change Service and showcase how to run complex SQL queries to drive insights, such as calculating average temperatures over specific time frames and geographical areas in real-time.
High-volume geospatial ingestion
Geospatial workloads often involve millions of incoming coordinates per second: GPS streams from vehicle fleets, sensor telemetry from infrastructure, device location events from mobile applications. CrateDB ingests location data at high throughput and indexes it automatically on arrival. Data is queryable within milliseconds of ingestion with no manual indexing step and no batch delay before spatial queries can run.
Spatio-temporal analytics
Most geospatial workloads require understanding not only where assets are but when they were there. CrateDB combines time-series and geospatial capabilities in the same database, storing time, location, and rich metadata in the same row. Query patterns like "show me all assets that entered this zone between 7:00 and 8:00" or "which vehicles were in this area during the incident window" run in standard SQL without joining across systems or maintaining a separate location history store.
Native geospatial types
CrateDB supports two native geospatial types covering both simple and complex geometries:
-
geo_pointstores coordinates and GPS positions. Supported formats: array [longitude, latitude] and WKT POINT(longitude latitude). -
geo_shapestores polygons, lines, multipolygons, and other complex structures. Supported formats: GeoJSON and WKT including POLYGON, LINESTRING, and MULTIPOLYGON.
Both types are automatically indexed on ingestion and compatible with common geospatial tools, IoT systems, and GIS pipelines.
Geospatial functions
CrateDB includes three core spatial functions for fast and accurate spatial reasoning:
-
distance(pointA, pointB): calculate the distance between two coordinates in meters. Use for proximity search, nearest-asset queries, and radius filtering. -
within(shape, area): test whether a point or shape falls inside a polygon boundary. Use for geofencing, zone analysis, and containment queries. -
intersects(shapeA, shapeB): test whether two shapes overlap. Use for corridor analysis, route intersection, and territorial overlap detection.
Index precision for both geo_point and geo_shape is tunable: coarse-grained for large-scale proximity searches, fine-grained for high-precision boundary queries.
Hybrid location queries
Geospatial data in CrateDB lives in the same row as time-series metrics, JSON metadata, full-text fields, and vector embeddings. This enables hybrid queries that no separate GIS system can run without application-side merging:
-
Geo + text: assets in this zone that have logged "battery warning" in their status field.
-
Geo + vectors: semantically similar locations filtered by proximity constraint.
-
Geo + metrics: temperature spikes detected in specific geographic areas.
-
Geo + time: movement analysis within defined time windows.
All of these run as single SQL statements against live data.
Geospatial workloads in production
"CrateDB was a better solution for our needs than any other SQL or NoSQL database we tried. It was easy to migrate code off of our legacy SQL database and onto CrateDB to immediately benefit from its data flexibility and scalable performance."
Sheriff Mohamed
Director of Architecture
GolfNow
Clickdrive.io tracks automotive vehicle fleets in real-time. They use CrateDB to enable storage and analysis of up to 2,000 data points per second, per car.
"Most of the cars in our customers' fleets are in use almost 24 hours a day, and we need to store and analyse the massive amounts of data they generate in real time. We tried a few different SQL and NoSQL databases, and CrateDB offered the best combination of high performance, scalability and ease of use."
Go deeper on geospatial data in CrateDB
Documentation
Documentation
Academy
Lesson
Exercise
Related use cases
FAQ
Geospatial data represents specific geographical locations using latitude and longitude coordinates or text fields naming geographical areas like countries or states. This data is crucial for applications that require tracking and analyzing the location of people and objects. CrateDB supports the storage, analysis, and real-time tracking of geospatial data - all using SQL.
Spatial data encompasses various formats used to represent geographic locations and shapes. Common examples include arrays for simple point data and GeoJSON or WKT formats for more complex shapes. CrateDB supports these geospatial data types, offering flexibility and precision in storing and querying geospatial information.
Storing geospatial data efficiently requires a database with strong scalability due to the large volume of data points it generates. It should allow for the control of geographic index precision and resolution to enable faster query results and support exact queries with functions like intersects, within, and distance. CrateDB excels in this area by offering scalable SQL support for geospatial data types, including geo_point and geo_shape, making it ideal for geospatial applications.
The best databases for spatial data should offer advanced geospatial functions to handle spatial count, distance and overlap efficiently. Examples of databases optimized for spatial data include CrateDB, Amazon Aurora, Esri ArcGIS, and MariaDB.
CrateDB's advantage for geospatial workloads is the combination of native spatial functions with a distributed engine built for high-volume ingestion and real-time analytics. Teams tracking millions of device locations per second who also need to run analytical queries — aggregations by zone, correlations with time-series data, joins with device metadata — can do all of that in CrateDB without synchronizing location data to a separate GIS system.
Geospatial analytics involves analyzing data that is tied to specific geographical locations, often represented through maps or spatial coordinates. Unlike traditional data analytics, which typically focuses on numerical or categorical data, geospatial analytics adds a locational context to data, allowing users to visualize and analyze spatial patterns, relationships, and trends. This added dimension enables more accurate decision-making in fields like urban planning, environmental monitoring, and transportation logistics.
CrateDB manages geospatial data by providing native support for geographic data types, such as points, lines, and polygons. It allows users to store, query, and analyze geospatial data efficiently, leveraging SQL for easy querying. CrateDB supports geospatial indexing, enabling fast lookups and proximity searches. It also integrates with common geospatial standards like GeoJSON, making it compatible with a wide range of applications. Additionally, CrateDB’s scalability ensures that it can handle large datasets, making it suitable for real-time geospatial analytics across industries such as logistics, IoT, and smart city initiatives.
PostGIS is a mature and feature-rich spatial extension for PostgreSQL with broader geometry type support and more advanced spatial analysis functions, including topology, raster data, and coordinate system transformations that CrateDB does not currently support.
For teams running complex spatial analysis on moderate data volumes where PostgreSQL is already the operational database, PostGIS is a strong and well-established choice.
CrateDB's geospatial capabilities are more focused: distance, containment, and intersection queries. But they run in a distributed engine designed for high-volume ingestion and real-time analytics at scale.
Teams ingesting millions of location events per second and running analytical queries across billions of rows alongside structured and semi-structured data will find CrateDB significantly simpler to operate at that scale than PostGIS on a single-node PostgreSQL instance.
The choice depends on the workload. If advanced spatial analysis is the primary requirement, PostGIS. If high-volume geospatial ingestion combined with real-time analytics is the requirement, CrateDB.