Migrating queries from Rockset to CrateDB¶
Introduction¶
As we work with more and more companies looking to migrate their workloads from Rockset to CrateDB, we have built expertise on the details of what a migration entails.
Both Rockset and CrateDB use SQL, so there is no need for your teams to learn a new query language, there are however a few differences in the dialect.
First level columns¶
In Rockset, every record is entirely a JSON object, but in CrateDB tables have
first level columns which themselves can be of type OBJECT.
If you are looking at migrating data from Rockset keeping changes to a minimum,
you can just define your tables with a single column of type OBJECT
.
CREATE TABLE myschema.mytable (
data OBJECT
);
Accessing fields within objects¶
Rockset uses a dot notation for attribute access. In CrateDB, a bracket notation is used instead, similar to what you would use in Python for accessing dictionaries.
INSERT INTO myschema.mytable
VALUES('{"field1":1,"field2":"abc","field3":[1,2,3]}');
SELECT data['field2']
FROM myschema.mytable;
Inspecting inferred schemas¶
After you load some JSON data into your database, you may want to check the schema
CrateDB has inferred automatically.
In Rockset, you would do this with the DESCRIBE
command, in CrateDB you can use
SHOW CREATE TABLE
.
SHOW CREATE TABLE myschema.mytable;
Un-nesting arrays¶
Both Rockset and CrateDB provide an UNNEST function for un-nesting arrays.
In Rockset however you may have queries where UNNEST
is used in a CROSS JOIN
,
like in the example with cars data in Rockset’s documentation.
To achieve the same results in CrateDB, you can use UNNEST
in the list of
columns you are selecting instead.
SELECT data['make'], UNNEST(data['models'])
FROM companies;
Functions equivalence¶
You will find that a large number of functions work exactly in the same way in Rockset and CrateDB. There are however some functions that exists under different names.
We have compiled below a list of equivalences, and will come back and expand this list as new functions come out in our work with prospects.
If there is anything in your queries for which you do not see an equivalence, do not hesitate to reach out as chances are CrateDB has the same functionality under a different name, or there may be a simple workaround for your use cases.
Rockset function |
CrateDB equivalent |
---|---|
ACOSH(x) |
|
APPROX_DISTINCT(x[, e]) |
|
ARRAY_CONCAT(array1, array2, …) |
|
ARRAY_CONTAINS(array, element) |
|
ARRAY_CREATE(val1, val2, …) |
|
ARRAY_DISTINCT(array) |
|
ARRAY_EXCEPT(array1, array2) |
|
ARRAY_FLATTEN(array) |
|
ARRAY_JOIN(array, delimiter, nullReplacement) |
|
ARRAY_MAP(function_name, array) |
|
ARRAY_REMOVE(array, val) |
|
ARRAY_SHUFFLE(array) |
|
ARRAY_SORT(array) |
|
ARRAY_UNION(array1, array2) |
|
ASINH(x) |
|
ATANH(x) |
|
BITWISE_AND(x, y) |
|
BITWISE_OR(x, y) |
|
BITWISE_XOR(x, y) |
|
BOOL_AND(x) |
|
BOOL_OR(x) |
|
CARDINALITY(array) |
|
COUNT_IF(x) |
|
DATE_PARSE(string, format) |
UDF: |
DAYS(n) |
|
EUCLIDEAN_DIST(array, array) |
|
EVERY(x) |
|
FORMAT_DATE(format, date) |
|
FORMAT_DATETIME(format, datetime) |
|
FORMAT_TIME(format, time) |
|
FORMAT_TIMESTAMP(format, timestamp[, timezone] |
|
FROM_BASE64(s) |
|
FROM_HEX(s) |
|
HOURS(n) |
|
HYPOT(x, y) |
|
IS DISTINCT FROM |
|
JSON_FORMAT(x) |
|
JSON_PARSE(x) |
|
KEYS(obj) |
|
LOG(x) |
|
LOG10(x) |
|
LOG2(x) |
|
MILLISECONDS(n) |
|
MINUTES(n) |
|
MONTHS(n) |
|
PARSE_DATE_ISO8601(string) |
|
PARSE_DATETIME_ISO8601(string) |
|
POSITION(substring IN string) |
|
POW(x, y) |
|
RAND() |
|
REGEXP_LIKE(string, pattern) |
|
SEQUENCE(start, stop[, step]) |
|
SIGN(x) |
See [1] for CrateDB <5.8 |
SPLIT(string, delimiter)[index] |
|
ST_ASTEXT(geography) |
See ST_ASTEXT for |
ST_GEOGFROMTEXT(well_known_text) |
|
ST_GEOGPOINT(longitude, latitude) |
|
ST_INTERSECTS(geography_a, geography_b) |
|
ST_X(point) |
|
ST_Y(point) |
|
TIME_BUCKET(interval, timestamp[, origin]) |
|
TIMESTAMP_SECONDS(n) |
|
TO_BASE64(b) |
|
TO_HEX(b) |
|
TRUNCATE(x) |
|
TYPEOF(x) |
|
YEARS(n) |
|
Appendix¶
ST_ASTEXT¶
CrateDB user-defined function (UDF) implementation of ST_ASTEXT
function for polygons.
CREATE FUNCTION ST_ASTEXT(geography geo_shape)
RETURNS TEXT
LANGUAGE JAVASCRIPT AS $$
function st_astext(g) {
return 'POLYGON(' + g.coordinates.map(r => '(' + r.map(p => p.join(' ')).join(', ') + ')').join(', ') + ')';
}
$$;