Functions
CrateDB functions let you transform, enrich, and analyze data in real time, using familiar SQL. Choose from a rich catalog of built-in functions (scalar, aggregate, window, table) and extend behavior safely with user-defined functions (UDFs) when you need something custom. CrateDB’s distributed SQL engine executes these functions close to the data for millisecond-level results at scale.
Why Functions Matter in CrateDB
- Speed at scale: Functions are executed inside the distributed query engine, so complex logic (aggregations, filters, joins, window ops) stays close to your data, no extract-and-process round trips.
- Everything in SQL: Use the SQL you know to work with time series, vectors, text, JSON, geospatial, and relational data, without switching tools.
- Extensible when you need it: Add bespoke business logic as JavaScript UDFs, version them, and reuse across queries and teams.
Built-in Functions
- Scalar functions: String, date/time, geo, math, regex, array, object, conditional, system, and more. Use them in SELECT, WHERE, ORDER BY, GROUP BY.
- Aggregations: Core aggregate functions and expressions for real-time summaries over massive datasets.
- Window functions: Perform calculations across partitions and ordered rows (e.g., running totals, percentiles, top-N per group) with OVER (...).
- Table functions: Generate and transform rows with helpers like unnest(...), regexp_matches(...), and pg_catalog.generate_series(...).
- Operators: Arithmetic, bitwise, comparison, and array comparison operators for expressive filtering and scoring.
You can combine these capabilities to power aggregations, ad-hoc queries, hybrid search, and AI feature pipelines. All in SQL, all in real time.
User-Defined Functions (UDFs)
When built-ins aren’t enough, define your own JavaScript functions and call them from SQL like any other function.
- Language: JavaScript (declared with LANGUAGE JAVASCRIPT).
- Usage: Ideal for custom transformations, domain-specific parsing, specialized math, or feature engineering you want to reuse across queries.
- SQL-native: Create with CREATE FUNCTION, update with CREATE OR REPLACE FUNCTION, remove with DROP FUNCTION, and use in SELECT, WHERE, GROUP BY, etc.
Example use cases: Convert units, normalize messy text, compute a domain-specific score, or extract a value from a complex JSON object, then reuse it everywhere.
Design Best Practices
- Deterministic logic: Keep UDFs deterministic (same inputs generating same outputs) to ensure predictable results and easier debugging.
- Keep it tight: Put heavier lifting into built-ins when possible; reserve UDFs for the truly custom bits to maximize performance.
- Test & version: Treat UDFs as shared assets. Version them, document inputs/outputs, and test on representative datasets.|