CREATE REPOSITORY
You can use the CREATE REPOSITORY statement to
register a new repository that you can use to create, manage, and restore
snapshots.
Synopsis
CREATE REPOSITORY repository_name TYPE type
[ WITH (parameter_name [= value] [, ...]) ]
Description
The CREATE REPOSITORY statement creates a repository with a
repository name and repository
type. You can configure the different types of repositories WITH
additional parameters.
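As a minimal sketch of the statement form described above, the following registers a repository of type fs. The repository name my_repo and the location path are hypothetical, and the path is assumed to be allowed by the path.repo setting.

```sql
-- Hypothetical repository name and path; adjust both for your cluster.
CREATE REPOSITORY my_repo TYPE fs
WITH (location = '/mnt/backups/my_repo');
```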
Note
If the back-end data storage (specific to the repository type) already contains CrateDB snapshots, they will become available to the cluster.
Parameters
- repository_name
The name of the repository to register.
- type
The repository type.
Caution
You cannot change any repository parameters after creating the repository (including parameters set by the WITH clause).
To use new parameters for an existing repository, you must first drop the repository using the DROP REPOSITORY statement and then recreate it with a new CREATE REPOSITORY statement.
When you drop a repository, CrateDB deletes the corresponding record from sys.repositories but does not delete any snapshots from the corresponding backend data storage. If you create a new repository using the same backend data storage, any existing snapshots will become available again.
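The drop-and-recreate procedure might look like the following sketch. The repository name my_repo, the location path, and the changed compress value are hypothetical; the final two queries only inspect the result.

```sql
-- Remove the registration. Snapshots in the backend storage are not deleted.
DROP REPOSITORY my_repo;

-- Register the repository again with the new parameter values
-- (hypothetical path, assumed to be allowed by path.repo).
CREATE REPOSITORY my_repo TYPE fs
WITH (location = '/mnt/backups/my_repo', compress = false);

-- Check the registered repository and its settings.
SELECT name, type, settings FROM sys.repositories;

-- Snapshots already present in the backend storage should be listed again.
SELECT repository, name FROM sys.snapshots WHERE repository = 'my_repo';
```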
Clauses
WITH
You can use the WITH clause to specify one or more repository parameter
values:
[ WITH (parameter_name [= value] [, ...]) ]
Parameters
The following parameters apply to all repository types:
- max_restore_bytes_per_sec
The maximum rate (bytes per second) at which a single CrateDB node will read snapshot data from this repository. A value of 0 disables throttling. Note that the rate is additionally throttled through the recovery settings.
Default: 40mb
- max_snapshot_bytes_per_sec
The maximum rate (bytes per second) at which a single CrateDB node will write snapshot data to this repository. A value of 0 disables throttling.
Default: 40mb
All other parameters (see the next section) are specific to the repository type.
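The two throttling parameters can be combined with any type-specific parameters in the same WITH clause. A minimal sketch, assuming a hypothetical fs repository named throttled_repo and illustrative rate values:

```sql
-- Hypothetical name, path, and rates. Both rates apply per node.
CREATE REPOSITORY throttled_repo TYPE fs
WITH (
  location = '/mnt/backups/throttled_repo',
  max_restore_bytes_per_sec = '80mb',
  max_snapshot_bytes_per_sec = '20mb'
);
```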
Types
CrateDB includes built-in support for the following types: fs, s3, azure, gcs, and url.
CrateDB can support additional types via plugins.
fs
An fs repository stores snapshots on the local file system. If a cluster
has multiple nodes, you must use a shared data storage volume mounted locally
on all master nodes and data nodes.
Note
To create fs repositories, you must configure the list of allowed file
system paths using the path.repo setting.
Parameters
- location
Type: text
Required
An absolute or relative path to the directory where CrateDB will store snapshots. If the path is relative, CrateDB will append it to the first entry in the path.repo setting.
Windows UNC paths are allowed if the server name and shares are specified and backslashes are escaped.
The path must be allowed by the path.repo setting.
- compress
Type: boolean
Default: true
Whether CrateDB should compress the metadata part of the snapshot or not.
CrateDB does not compress the actual table data.
- chunk_size
Type: bigint or text
Default: null
Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
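The fs parameters above could be combined as in the following sketch. The repository name fs_repo and the path are hypothetical; the path is assumed to be on a shared volume and allowed by path.repo.

```sql
-- Hypothetical name and path.
CREATE REPOSITORY fs_repo TYPE fs
WITH (
  location = '/mnt/backups/fs_repo',
  compress = true,     -- compresses snapshot metadata only, not table data
  chunk_size = '1g'    -- split snapshot files larger than 1g into chunks
);
```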
s3
An s3 repository stores snapshots on the Amazon Simple Storage Service
(Amazon S3).
Note
If you are using Amazon S3 in conjunction with IAM roles, the
access_key and secret_key parameters must be left undefined.
Additionally, make sure to attach the IAM role to each EC2 instance that will run a CrateDB master node or data node. The attached IAM role will provide the necessary credentials when required.
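With IAM roles, the statement simply omits the credential parameters. A minimal sketch, assuming a hypothetical repository name, bucket name, and base path:

```sql
-- No access_key or secret_key: credentials are expected to come from
-- the IAM role attached to each EC2 instance.
CREATE REPOSITORY s3_iam_repo TYPE s3
WITH (
  bucket = 'my-snapshot-bucket',
  base_path = 'cratedb/snapshots'
);
```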
Parameters
- access_key
Type: text
Required: false
Access key used for authentication against Amazon Web Services (AWS).
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
- secret_key
Type: text
Required: false
The secret key used for authentication against AWS.
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
- endpoint
Type: text
Default: The default AWS API endpoint
The AWS API endpoint to use.
Tip
You can specify a regional endpoint to force the use of a specific AWS region.
- region
Type: text
Default: Inferred from the endpoint if possible, or us-east-1.
The region to use.
- protocol
Type: text
Values: http, https
Default: https
Protocol to use.
- bucket
Type: text
Required
Name of the Amazon S3 bucket used for storing snapshots.
If the bucket does not yet exist, CrateDB will attempt to create a new bucket on Amazon S3.
- base_path
Type: text
Default: root directory
The bucket path to use for snapshots.
The path is relative, so the base_path value must not start with a / character.
- compress
Type: boolean
Default: true
Whether CrateDB should compress the metadata part of the snapshot or not.
CrateDB does not compress the actual table data.
- chunk_size
Type: bigint or text
Default: null
Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
- readonly
Type: boolean
Default: false
If true, the repository is read-only.
- server_side_encryption
Type: boolean
Default: false
If true, files are server-side encrypted by AWS using the AES256 algorithm.
- buffer_size
Type: text
Default: 5mb
Minimum: 5mb
If a chunk is smaller than buffer_size, CrateDB will upload the chunk with a single request.
If a chunk exceeds buffer_size, CrateDB will split the chunk into multiple parts of buffer_size length and upload them separately.
- max_retries
Type: integer
Default: 3
The number of retries in case of errors.
- use_throttle_retries
Type: boolean
Default: true
Whether CrateDB should throttle retries (i.e., should back off).
- storage_class
Type: text
Values: standard, reduced_redundancy, or standard_ia
Default: standard
The S3 storage class used for objects stored in the repository. This only affects the S3 storage class used for newly created objects in the repository.
- use_path_style_access
Type: boolean
Default: true
Whether CrateDB should use path style access. Useful for some S3-compatible providers.
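The s3 parameters above might be used as in the following sketch. All names and values are hypothetical placeholders. The second statement targets an S3-compatible provider; the host:port form of the endpoint value is an assumption and may differ for your provider.

```sql
-- Amazon S3 with explicit credentials (placeholders, not real keys).
CREATE REPOSITORY s3_repo TYPE s3
WITH (
  bucket = 'my-snapshot-bucket',
  base_path = 'cratedb/snapshots',   -- relative: no leading /
  access_key = '<access key>',
  secret_key = '<secret key>',
  storage_class = 'standard_ia',
  max_retries = 5
);

-- S3-compatible provider (hypothetical endpoint; format is an assumption).
CREATE REPOSITORY s3_compatible_repo TYPE s3
WITH (
  bucket = 'my-snapshot-bucket',
  endpoint = 'storage.example.internal:9000',
  protocol = 'http',
  use_path_style_access = true,
  access_key = '<access key>',
  secret_key = '<secret key>'
);
```

Because CrateDB masks access_key and secret_key, querying sys.repositories afterwards does not reveal the values passed here.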
azure
An azure repository stores snapshots on the Azure Blob storage service.
Parameters
- account
Type: text
The Azure Storage account name.
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
- key
Type: text
The Azure Storage account secret key.
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
- sas_token
Type: text
The Shared Access Signature (SAS) token used for authentication for the Azure Storage account. This can be used as an alternative to the Azure Storage account secret key.
The SAS token must have read, write, list, and delete permissions for the repository base path and all its contents. These permissions need to be granted for the blob service and apply to resource types service, container, and object.
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
- endpoint
Type: text
The Azure Storage account endpoint.
Tip
You can set the endpoint parameter to access Azure Storage instances served on private endpoints.
- container
Type: text
Default: crate-snapshots
The blob container name.
Note
You must create the container before creating the repository.
- base_path
Type: text
Default: root directory
The container path to use for snapshots.
- compress
Type: boolean
Default: true
Whether CrateDB should compress the metadata part of the snapshot or not.
CrateDB does not compress the actual table data.
- chunk_size
Type: bigint or text
Default: 256mb
Maximum: 256mb
Minimum: 1b
Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
- readonly
Type: boolean
Default: false
If true, the repository is read-only.
- max_retries
Type: integer
Default: 3
The number of retries (in the case of failures) before considering the snapshot to be failed.
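An azure repository could be registered as in the following sketch. The repository name, base path, and credential values are hypothetical placeholders, and the container is assumed to exist already.

```sql
-- Placeholders for the account name and key; use sas_token instead of key
-- if you authenticate with a Shared Access Signature.
CREATE REPOSITORY azure_repo TYPE azure
WITH (
  account = '<storage account name>',
  key = '<storage account key>',
  container = 'crate-snapshots',   -- must exist before this statement runs
  base_path = 'cluster-a',
  max_retries = 5
);
```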
gcs
A gcs repository stores snapshots on the Google Cloud Storage service.
Parameters
- bucket
Type: text
Required
Name of the Google Cloud Storage bucket used for storing snapshots. The bucket must already exist before the repository is created.
- private_key_id
Type: text
Required
The private key ID for the Google service account, taken from the service account's JSON credentials.
Note
This parameter will be masked (shown as [xxxxx]) when querying the sys.repositories table.
- private_key
Type: text
Required
The private key in PKCS 8 format for the Google service account, taken from the service account's JSON credentials.
Note
This parameter will be masked (shown as [xxxxx]) when querying the sys.repositories table.
- client_id
Type: text
Required
The client ID for the Google service account, taken from the service account's JSON credentials.
Note
This parameter will be masked (shown as [xxxxx]) when querying the sys.repositories table.
- client_email
Type: text
Required
The client email for the Google service account, taken from the service account's JSON credentials.
Note
This parameter will be masked (shown as [xxxxx]) when querying the sys.repositories table.
- base_path
Type: text
Default: root directory
The bucket path to use for snapshots.
- compress
Type: boolean
Default: true
Whether CrateDB should compress the metadata part of the snapshot or not.
- chunk_size
Type: bigint or text
Default: null
Defines the maximum size of any single file that comprises the snapshot. If set to null, the default value of 5 terabytes is used. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
- connect_timeout
Type: text
Default: 0
Defines the timeout to establish a connection to the Google Cloud Storage service. The value should specify the unit. For example, a value of 5s specifies a 5 second timeout. A value of -1 corresponds to an infinite timeout. The default value 0 means that the default of 20s from the Google Cloud Storage library is used.
- read_timeout
Type: text
Default: 0
Defines the timeout to read data from an established connection. The value should specify the unit. For example, a value of 5s specifies a 5 second timeout. A value of -1 corresponds to an infinite timeout. The default value 0 means that the default of 20s from the Google Cloud Storage library is used.
- endpoint
Type: text
Required: false
Endpoint root URL to connect to an alternative storage provider.
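A gcs repository could be registered as in the following sketch. Every value is a hypothetical placeholder; the four credential values are assumed to be copied from the corresponding fields of the service account's JSON credentials, and the bucket is assumed to exist already.

```sql
-- Placeholders only; substitute the values from your JSON credentials.
CREATE REPOSITORY gcs_repo TYPE gcs
WITH (
  bucket = 'my-snapshot-bucket',
  private_key_id = '<private key id>',
  private_key = '<private key in PKCS 8 format>',
  client_id = '<client id>',
  client_email = '<client email>',
  base_path = 'cratedb/snapshots',
  connect_timeout = '10s',   -- include the unit
  read_timeout = '30s'
);
```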
url
A url repository provides read-only access to an fs repository via one of the supported network access
protocols.
You can use a url repository to restore snapshots.
Parameters
- url
Type: text
The root URL of the fs repository.
Note
The URL must match one of the URLs configured by the repositories.url.allowed_urls setting.
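A read-only url repository might be registered as in the following sketch. The repository name and URL are hypothetical, and the URL is assumed to match an entry in the repositories.url.allowed_urls setting.

```sql
-- Hypothetical URL pointing at the root of an existing fs repository.
CREATE REPOSITORY readonly_repo TYPE url
WITH (url = 'https://backups.example.com/fs_repo/');
```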