Import data from MySQL¶
Various ways exist to migrate your existing data from MySQL to CrateDB. However, these methods may differ in performance. A fast and reliable way to migrate is to use CrateDB’s existing export and import tools.
Table of contents
Setting up the example table¶
In this example, you have an existing table foo
in MySQL with the
following schema:
mysql> CREATE TABLE foo (
... id integer primary key,
... name varchar(255),
... date datetime,
... fruits set('apple', 'pear', 'banana')
... ) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
To import sample data into foo
:
mysql> INSERT INTO foo (id, name, date, fruits)
... VALUES (1, 'foo', '2014-10-31 09:22:56', 'apple,banana'),
... (2, 'bar', null, 'pear');
Exporting data from MySQL¶
MySQL does not support JSON
as an output format, but it is possible to
write to a file using the csv
format.
Example:
mysql> SELECT id, name
... INTO OUTFILE '/tmp/dump.csv'
... CHARACTER SET utf8
... FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
... LINES TERMINATED BY '\n'
... FROM foo;
Make sure fields are separated by a comma (,
), that values are quoted with
a pair of "
, and that lines are terminated by the newline character \n
.
Note that you specify utf8
as encoding for writing into the outfile. This is
important because CrateDB requires utf8
encoding.
You may need to set the character encoding of the client and mysqld
in the
my.cnf
:
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
All values written to csv
are quoted and therefore interpreted as strings
when read from the convert script later.
Converting date/time types¶
The standard output for date/time types in MySQL is a string. However,
CrateDB uses a long
type to store timestamps (with millisecond precision).
To prevent problems with dates that have timezone information, use MySQL’s
builtin UNIX_TIMESTAMP
function to convert date
, time
,
datetime
, timestamp
and year
types into UNIX timestamps directly in
the SELECT
statement.
The output of this function must be multiplied by 1000
(converting s
to
ms
) to obtain the correct long
value that can be used for importing
into CrateDB.
mysql> SELECT UNIX_TIMESTAMP(datetime_column)*1000 from table_name;
The final export query:
mysql> SELECT id, name, UNIX_TIMESTAMP(date)*1000, fruits
... INTO OUTFILE '/tmp/dump.csv'
... CHARACTER SET utf8
... FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
... LINES TERMINATED BY '\n'
... FROM foo;
Importing data into CrateDB¶
Use the COPY FROM statement to import your CSV file into CrateDB.
cr> COPY foo_imported FROM '/tmp/dump.csv' WITH (bulk_size=1000);
See also
Read Bulk Inserts for more information how to import huge datasets into CrateDB.
Third-party tool: csvkit¶
The tools provided by csvkit allow you to directly insert CSV data into CrateDB via SQLAlchemy, using CrateDB’s native driver to create the table, guess the corresponding data types, and insert any data found in the CSV file.
For example:
sh$ csvsql --db crate://localhost:4200 --insert /tmp/dump.csv
See also
See also the documentation of csvsql. To use the SQLAlchemy driver of CrateDB, the latest version of the CrateDB Python package is required.