COPY TO
You can use the COPY TO statement to export table data to a file.
Synopsis
COPY table_ident [ PARTITION ( partition_column = value [ , ... ] ) ]
[ ( column [ , ...] ) ]
[ WHERE condition ]
TO DIRECTORY output_uri
[ WITH ( copy_parameter [= value] [, ... ] ) ]
Description
The COPY TO command exports the contents of a table to one or more files into a given directory with unique filenames. Each node with at least one shard of the table will export its contents onto its local disk.
The created files are JSON formatted and contain one table row per line. Due to the distributed nature of CrateDB, the files will remain on the same nodes where the shards are.
Here’s an example:
cr> COPY quotes TO DIRECTORY '/tmp/' with (compression='gzip');
COPY OK, 3 rows affected ...
Note
Currently only user tables can be exported. System tables like sys.nodes and blob tables don't work with the COPY TO statement.
The COPY statements use Overload Protection to ensure that other queries can still be served. Please adjust these settings during large exports if needed.
Parameters
table_ident
The name (optionally schema-qualified) of the table to be exported.
column
(optional) A list of column expressions that should be exported. E.g.
cr> COPY quotes (quote, author) TO DIRECTORY '/tmp/';
COPY OK, 3 rows affected ...
Note
When columns are declared, the output changes to JSON list format, which is currently not supported by the COPY FROM statement.
Clauses
PARTITION
If the table is partitioned, the optional PARTITION clause can be used to export data from a single partition exclusively.
[ PARTITION ( partition_column = value [ , ... ] ) ]
- partition_column: One of the column names used for table partitioning.
- value: The respective column value.
All partition columns (specified by the PARTITIONED BY clause) must be listed inside the parentheses along with their respective values, using the partition_column = value syntax (separated by commas).
Because each partition corresponds to a unique set of partition column row values, this clause uniquely identifies a single partition to export.
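For example, assuming the quotes table were partitioned by a hypothetical year column, a single partition could be exported like this:
cr> COPY quotes PARTITION (year = 2023) TO DIRECTORY '/tmp/';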
Tip
The SHOW CREATE TABLE statement will show you the complete list of partition columns specified by the PARTITIONED BY clause.
WHERE
The WHERE clause uses the same syntax as in SELECT statements, allowing partial exports (see WHERE clause for more information).
An example of using the WHERE clause with comparison operators for a partial export:
cr> COPY quotes WHERE category = 'philosophy' TO DIRECTORY '/tmp/';
COPY OK, 3 rows affected ...
TO
The TO clause allows you to specify an output location.
TO DIRECTORY output_uri
Parameters
output_uri
An expression that must evaluate to a string literal representing a well-formed URI. URIs must use one of the supported URI schemes.
Note
If the URI scheme is missing, CrateDB assumes the value is a pathname and will prepend the file:// URI scheme. For example, CrateDB will convert /tmp/file.json to file:///tmp/file.json.
URI schemes
CrateDB supports the following URI schemes:
file
You can use the file:// scheme to specify an absolute path to an output location on the local file system.
For example:
file:///path/to/dir
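A complete export statement using this scheme might look like this (the path is illustrative):
cr> COPY quotes TO DIRECTORY 'file:///path/to/dir';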
Tip
If you are running CrateDB inside a container, the location must be inside the container. If you are using Docker, you may have to configure a Docker volume to accomplish this.
Tip
If you are using Microsoft Windows, you must include the drive letter in the file URI.
For example:
file://C:\/tmp/import_data/quotes.json
Consult the Windows documentation for more information.
s3
You can use the s3:// scheme to access buckets on the Amazon Simple Storage Service (Amazon S3).
For example:
s3://[<accesskey>:<secretkey>@][<host>:<port>/]<bucketname>/<path>
S3-compatible storage providers can be specified by the optional pair of host and port; if these are not provided, Amazon S3 is assumed.
Here is a more concrete example:
COPY t TO DIRECTORY 's3://myAccessKey:mySecretKey@s3.amazonaws.com:80/myBucket/key1' with (protocol = 'http')
If no credentials are set, the S3 client will operate in anonymous mode. See the AWS Java Documentation.
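For example, an export without credentials (assuming a hypothetical bucket that permits anonymous access) might look like this:
COPY quotes TO DIRECTORY 's3://myBucket/dir1'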
Tip
A secretkey provided by Amazon Web Services can contain characters such as ‘/’, ‘+’ or ‘=’. These characters must be URL encoded. For a detailed explanation, read the official AWS documentation.
To escape a secret key, you can use a snippet like this:
sh$ python -c "from getpass import getpass; from urllib.parse import quote_plus; print(quote_plus(getpass('secret_key: ')))"
This will prompt for the secret key and print the encoded variant.
Additionally, versions prior to 0.51.x use HTTP for connections to S3. Since 0.51.x, these connections use the HTTPS protocol. Please make sure you update your firewall rules to allow outgoing connections on port 443.
az
You can use the az:// scheme to access files on Azure Blob Storage. The URI must look like az://<account>.<endpoint_suffix>/<container>/<blob_path>.
For example:
az://myaccount.blob.core.windows.net/my-container/dir1/dir2/file1.json
One of the authentication parameters (key or sas_token) must be provided in the WITH clause.
The protocol can also be provided in the WITH clause; otherwise, https is used by default.
For example:
COPY source
TO DIRECTORY 'az://myaccount.blob.core.windows.net/my-container/dir1/dir2/file1.json'
WITH (
key = 'key'
)
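Alternatively, authentication with a SAS token instead of an account key might look like this (the token value is a placeholder):
COPY source
TO DIRECTORY 'az://myaccount.blob.core.windows.net/my-container/dir1/dir2/file1.json'
WITH (
sas_token = 'sas_token'
)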
WITH
You can use the optional WITH clause to specify copy parameter values.
[ WITH ( copy_parameter [= value] [, ... ] ) ]
The WITH clause supports the following copy parameters:
- compression
  Type: text
  Values: gzip
  Default: By default, the output is not compressed.
  Optional. Defines if and how the exported data should be compressed.
- protocol
  Type: text
  Values: http, https
  Default: https
  Optional.
- format
  Type: text
  Values: json_object, json_array
  Default: Depends on the defined columns. See the description below.
  Optional. Possible values for the format setting are:
  json_object: Each row in the result set is serialized as a JSON object and written to an output file, with one object per line. This is the default behavior if no columns are defined. Use this format to import with COPY FROM.
  json_array: Each row in the result set is serialized as a JSON array, storing one array per line in an output file. This is the default behavior if columns are defined.
- wait_for_completion
  Type: boolean
  Default: true
  Optional. A boolean value indicating whether COPY TO should wait for the copy operation to complete. If set to false, the request returns at once and the copy operation runs in the background.
- key
  Type: text
  Optional. Used for the az scheme only. The Azure Storage Account Key.
  Note
  It must be provided if sas_token is not provided.
- sas_token
  Type: text
  Optional. Used for the az scheme only. The Shared Access Signature (SAS) token used for authentication for the Azure Storage account. This can be used as an alternative to the Azure Storage Account Key.
  The SAS token must have read, write, and list permissions for the container base path and all its contents. These permissions need to be granted for the blob service and apply to the resource types service, container, and object.
  Note
  It must be provided if key is not provided.
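For example, a hypothetical export combining several of these parameters (the table name and output path are illustrative) might look like this:
cr> COPY quotes TO DIRECTORY '/tmp/' WITH (compression = 'gzip', format = 'json_object', wait_for_completion = false);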