HTTP endpoint¶

CrateDB provides a HTTP Endpoint that can be used to submit SQL queries. The endpoint is accessible under <servername:port>/_sql.

SQL statements are sent to the _sql endpoint in json format, whereby the statement is sent as value associated to the key stmt.

Parameter substitution¶

In addition to the stmt key the request body may also contain an args key which can be used for SQL parameter substitution.

The SQL statement has to be changed to use placeholders where the values should be inserted. Placeholders can either be numbered (in the form of $1, $2, etc.) or unnumbered using a question mark ?.

The placeholders will then be substituted with values from an array that is expected under the args key:

sh$ curl -sS -H 'Content-Type: application/json' \
... -X POST '127.0.0.1:4200/_sql' -d@- <<- EOF
... {
...   "stmt":
...     "select date,position from locations
...     where date <= \$1 and position < \$2 order by position",
...   "args": ["1979-10-12", 3]
... }
... EOF
{
  "cols": [
    "date",
    "position"
  ],
  "rows": [
    [
      308534400000,
      1
    ],
    [
      308534400000,
      2
    ]
  ],
  "rowcount": 2,
  "duration": ...
}

Note

In this example the placeholders start with an backslash due to shell escaping.

Warning

Parameter substitution must not be used within subscript notation.

For example, column[?] is not allowed.

The same query using question marks as placeholders looks like this:

sh$ curl -sS -H 'Content-Type: application/json' \
... -X POST '127.0.0.1:4200/_sql' -d@- <<- EOF
... {
...   "stmt":
...     "select date,position from locations
...     where date <= ? and position < ? order by position",
...   "args": ["1979-10-12", 3]
... }
... EOF
{
  "cols": [
    "date",
    "position"
  ],
  "rows": [
    [
      308534400000,
      1
    ],
    [
      308534400000,
      2
    ]
  ],
  "rowcount": 2,
  "duration": ...
}

Note

With some queries the row count is not ascertainable. In this cases rowcount is -1.

Default schema¶

It is possible to set a default schema while querying the CrateDB cluster via _sql end point. In such case the HTTP request should contain the Default-Schema header with the specified schema name:

sh$ curl -sS -H 'Content-Type: application/json' \
... -X POST '127.0.0.1:4200/_sql' \
... -H 'Default-Schema: doc' -d@- <<- EOF
... {
...   "stmt":"select name, position from locations order by id limit 2"
... }
... EOF
{
  "cols": [
    "name",
    "position"
  ],
  "rows": [
    [
      "North West Ripple",
      1
    ],
    [
      "Outer Eastern Rim",
      2
    ]
  ],
  "rowcount": 2,
  "duration": ...
}

If the schema name is not specified in the header, the default doc schema will be used instead.

Column types¶

CrateDB can respond a list col_types with the data type ID of every responded column. This way one can know what exact data type a column is holding.

In order to get the list of column data types, a types query parameter must be passed to the request:

sh$ curl -sS -H 'Content-Type: application/json' \
... -X POST '127.0.0.1:4200/_sql?types' -d@- <<- EOF
... {
...   "stmt":
...     "select date, position from locations
...      where date <= \$1 and position < \$2 order by position",
...   "args": ["1979-10-12", 3]
... }
... EOF
{
  "cols": [
    "date",
    "position"
  ],
  "col_types": [
    11,
    9
  ],
  "rows": [
    [
      308534400000,
      1
    ],
    [
      308534400000,
      2
    ]
  ],
  "rowcount": 2,
  "duration": ...
}

The Array collection data type is displayed as a list where the first value is the collection type and the second is the inner type. The inner type could also be a collection.

Example of JSON representation of a column list of (String, Integer[]):

"column_types": [ 4, [ 100, 9 ] ]

Available data types¶

IDs of all currently available data types:

ID	Data type
0	NULL
1	Not supported
2	CHAR
3	BOOLEAN
4	TEXT
5	IP
6	DOUBLE PRECISION
7	REAL
8	SMALLINT
9	INTEGER
10	BIGINT
11	TIMESTAMP WITH TIME ZONE
12	OBJECT
13	GEO_POINT
14	GEO_SHAPE
15	TIMESTAMP WITHOUT TIME ZONE
16	Unchecked object
17	INTERVAL
18	ROW
19	REGPROC
20	TIME
21	OIDVECTOR
22	NUMERIC
23	REGCLASS
24	DATE
25	BIT
26	JSON
27	CHARACTER
28	FLOAT VECTOR
29	UUID
100	ARRAY

Error handling¶

Queries that are invalid or cannot be satisfied will result in an error response. The response will contain an error code, an error message and in some cases additional arguments that are specific to the error code.

Client libraries should use the error code to translate the error into an appropriate exception:

sh$ curl -sS -H 'Content-Type: application/json' \
... -X POST '127.0.0.1:4200/_sql' -d@- <<- EOF
... {
...   "stmt":"select name, position from foo.locations"
... }
... EOF
{
  "error": {
    "message": "SchemaUnknownException[Schema 'foo' unknown]",
    "code": 4045
  }
}

To get more insight into what exactly went wrong an additional error_trace GET parameter can be specified to return the stack trace:

sh$ curl -sS -H 'Content-Type: application/json' \
... -X POST '127.0.0.1:4200/_sql?error_trace=true' -d@- <<- EOF
... {
...   "stmt":"select name, position from foo.locations"
... }
... EOF
{
  "error": {
    "message": "SchemaUnknownException[Schema 'foo' unknown]",
    "code": 4045
  },
  "error_trace": "..."
}

Note

This parameter is intended for CrateDB developers or for users requesting support for CrateDB. Client libraries shouldn’t make use of this option and not include the stack trace.

Error codes¶

Code	Error
40000	Generic bad request error.
4000	The statement contains an invalid syntax or unsupported SQL statement
4001	The statement contains an invalid analyzer definition.
4002	The name of the relation is invalid.
4003	Field type validation failed
4004	Possible feature not supported (yet)
4005	Alter table using a table alias is not supported.
4006	The used column alias is ambiguous.
4007	The operation is not supported on this relation, as it is not accessible.
4008	The name of the column is invalid.
40010	The document storage source is missing.
40011	Invalid snapshot name.
40012	A running snapshot is using the relation a user wants to drop or close.
40013	Raised if trying to remove an object while others still depend on it.
4010	User is not authorized to perform the SQL statement.
4011	Missing privilege for user.
4031	Only read operations are allowed on this node.
4032	The relation is closed, any read or write is forbidden.
4033	Only DQL operations are allowed on this relation.
4034	Only DML operations are allowed on this relation.
4035	Only DQL and DDL operations are allowed on this relation.
4036	Only DDL operations are allowed on this relation.
4037	Only DQL or DELETE operations are allowed on this relation.
4040	Resource not found (generic error, no further details).
4041	Unknown relation.
4042	Unknown analyzer.
4043	Unknown column.
4044	Unknown type.
4045	Unknown schema.
4046	Unknown Partition.
4047	Unknown Repository.
4048	Unknown Snapshot.
4049	Unknown user-defined function.
40410	Unknown user.
40411	Document not found.
4091	A document with the same primary key exists already.
4092	A VersionConflict. Might be thrown if an attempt was made to update the same document concurrently.
4093	A relation with the same name exists already.
4094	The used table alias contains tables with different schema.
4095	A repository with the same name exists already.
4096	A snapshot with the same name already exists in the repository.
4097	A partition for the same values already exists in this table.
4098	A user-defined function with the same signature already exists.
4099	A user with the same name already exists.
40910	An object with the same name already exists.
5000	Unhandled server error.
5001	The execution of one or more tasks failed.
5002	One or more shards are not available.
5003	The query failed on one or more shards
5004	Creating a snapshot failed
5005	The query was killed by a `kill` statement
5006	Verification of the related repository failed
5030	The query was killed by a `kill` statement
5030	Cluster unavailable, no further details
5031	No master node discovered (yet)
5032	No node is available
5033	No shard of the related relation is available
5034	Processing global cluster change events timed out
5035	The global cluster state is not recovered (yet)

Bulk operations¶

The HTTP endpoint supports executing a single SQL statement many times with different parameters.

Instead of the args (Parameter substitution) key, use the key bulk_args. This allows to specify a list of lists, containing all the parameters which shall be processed. The inner lists need to match the specified columns.

The bulk response contains a results array, with a row count for each bulk operation. Those results are in the same order as the issued operations of the bulk operation.

Here an example that inserts three records at once:

sh$ curl -sS -H 'Content-Type: application/json' \
... -X POST '127.0.0.1:4200/_sql' -d@- <<- EOF
... {
...   "stmt": "INSERT INTO locations (id, name, kind, description)
...           VALUES (?, ?, ?, ?)",
...   "bulk_args": [
...     [1337, "Earth", "Planet", "An awesome place to spend some time on."],
...     [1338, "Sun", "Star", "An extraordinarily hot place."],
...     [1339, "Titan", "Moon", "Titan, where it rains fossil fuels."]
...   ]
... }
... EOF
{
  "cols": [],
  "duration": ...,
  "results": [
    {
      "rowcount": 1
    },
    {
      "rowcount": 1
    },
    {
      "rowcount": 1
    }
  ]
}

Statements with a result set cannot be executed in bulk. The supported bulk SQL statements are:

Insert
Update
Delete

Bulk errors¶

There are two kinds of error behaviors for bulk requests:

Analysis error: Occurs if the statement is invalid, either due to syntax errors or semantic errors identified during the analysis phase before the execution starts. In this case the whole operation fails and you’ll get a single error:
```
{
    "error": {
        "code": 4043,
        "message": "ColumnUnknownException[Column y unknown]"
    }
}
```

Runtime error: For errors happening after the analysis phase succeeded during execution. For example on duplicate primary key errors or check constraint failures. In this case CrateDB continues processing the other bulk arguments and reports the results via a rowcount where -2 indicates an error. Additionally, each failing bulk operation result element contains an error object with the related code and message:

{
    "cols": [],
    "duration": 2.195417,
    "results": [
        {
            "rowcount": 1
        },
        {
            "rowcount": -2,
            "error": {
                "code": 4091,
                "message": "DuplicateKeyException[A document with the same primary key exists already]"
            }
        }
    ]
}

Note

To avoid too much memory pressure caused by errors, only the first 10 errors happening on each involved shard will contain an error payload. Any following error is exposed only by the -2 row count without any details.

Note

The error_trace option does not work with bulk operations.