Apache Superset is an easy-to-use open-source business intelligence (BI) application that offers a rich set of customizable data visualizations. With Superset, you can explore, filter, and organize your data. You can also create and share custom monitoring and analytics dashboards.
CrateDB is a distributed SQL database that makes it simple to capture, store, and analyze massive amounts of machine data in real-time.
In this post, I'll show you how to get started with CrateDB and Apache Superset. And by the end of this post, you should have your very first Superset visualizations running on top of CrateDB data.
Note: I'm using macOS, but my instructions should be trivially adaptable for Linux or Microsoft Windows.
Create a new directory
$ mkdir crate-superset-demo
Change into it:
$ cd crate-superset-demo
We want to use this as a working directory for the rest of this tutorial, so make sure to run all subsequent commands from here.
Get CrateDB set up with a single command:
$ bash -c "$(curl -L https://try.crate.io/)"
This command downloads CrateDB and runs it from the tarball. If you'd like to install CrateDB a bit more permanently, or you are using Windows, check out our collection of one-step install guides.
If you're using the command above, it should pop open the CrateDB admin UI for you automatically once it has finished. Otherwise, head over to http://localhost:4200/ in your browser.
If you ran the command above, it should pop open the CrateDB Admin UI for you automatically once it has finished. Otherwise, head over to http://localhost:4200/ in your browser.
You should see something like this:
If you’re playing around with a fresh CrateDB install, likely, you don't have any data. So head on over to the Help screen by selecting the question mark icon on the left-hand navigation menu.
The help screen looks like this:
Select IMPORT TWEETS FOR TESTING and follow the instructions to authenticate your Twitter account.
Don’t worry. This isn’t going to post anything on your behalf. It doesn’t even look at your tweets. All this does is import a bunch of recent public tweets on Twitter.
Once you're done, select the Tables icon from the left-hand navigation, and then select the tweets
table. You should end up here:
http://localhost:4200/#/tables/doc/tweets
Which should look like this:
Great! CrateDB is set up and running, and you have some test data.
For this demo, I want to show you how to graph a pie chart of the different Twitter clients used to post the tweets you just imported.
To do that, we need to group on the information in the source
column (i.e., the client identification). However, Twitter returns this data as HTML, which we don't want.
For expediency, let's create a view that strips out the HTML with a simple regular expression. (Note: You should not process HTML with regular expressions for any real-world application!)
Navigate to the Console tab, using the left-hand navigation, and then execute the following query in the query console:
CREATE VIEW tweets_source AS
SELECT id,
regexp_matches(source, '>(.*)<')[1] AS source_txt
FROM tweets;
You should see this:
You can test the view with this query:
SELECT * FROM tweets_source limit 100;
Like so:
Nice. Let's move on.
For this tutorial, we'll stick to the official documentation and install Superset inside a Python virtual environment.
Check which version of Python you're running:
$ python3 --version
Python 3.7.2
If you're using Python 3.3 or newer, run this command to set up a Python virtual environment:
$ python3 -m venv env
If you're running an older version of Python, consult the Superset docs for help installing the virtualenv
module.
With that done, you can activate the virtual environment, like so:
$ source env/bin/activate
(For more help working with Python virtual environments, consult the Python documentation.)
Before we continue, and as a precautionary measure, make sure that your Python packaging tools are up-to-date:
$ pip install --upgrade setuptools pip
Unfortunately, at the time of writing, Superset doesn't specify its package dependencies correctly. So, we need a workaround.
First of all, fetch the most up-to-date requirements.txt
file:
$ wget https://raw.githubusercontent.com/apache/incubator-superset/master/requirements.txt
Then, install the required Python packages:
$ pip install -r requirements.txt
Then, add this:
$ pip install sqlalchemy==1.2.18
Now we're ready to install Superset:
$ pip install superset
Finally, install the CrateDB Python Client library so that Superset can speak to CrateDB:
$ pip install crate
Great! Now, let's get Superset running.
Start by initializing the Superset database:
$ superset db upgrade
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 4e6a06bad7a8, Init
INFO [alembic.runtime.migration] Running upgrade 4e6a06bad7a8 -> 5a7bad26f2a7, empty message
INFO [alembic.runtime.migration] Running upgrade 5a7bad26f2a7 -> 1e2841a4128, empty message
INFO [alembic.runtime.migration] Running upgrade 1e2841a4128 -> 2929af7925ed, TZ offsets in data sources
[...]
INFO [alembic.runtime.migration] Running upgrade 7fcdcde0761c -> 0c5070e96b57, add user attributes table
INFO [alembic.runtime.migration] Running upgrade 0c5070e96b57 -> 1a1d627ebd8e, position_json
INFO [alembic.runtime.migration] Running upgrade 1a1d627ebd8e -> 55e910a74826, add_metadata_column_to_annotation_model.py
Then, set up the environment:
$ export FLASK_APP=superset
Create an admin user:
$ flask fab create-admin
Username [admin]:
User first name [admin]:
User last name [user]:
Email [admin@fab.org]:
Password:
Repeat for confirmation:
Recognized Database Authentications.
Admin User admin created.
Set up the default roles and permissions:
$ superset init
2019-07-12 14:43:01,037:INFO:root:Creating database reference
2019-07-12 14:43:01,055:INFO:root:Syncing role definition
2019-07-12 14:43:01,121:INFO:root:Syncing Admin perms
[...]
2019-07-12 14:43:02,510:INFO:root:Creating missing database permissions.
2019-07-12 14:43:02,530:INFO:root:Creating missing metrics permissions
2019-07-12 14:43:02,536:INFO:root:Cleaning faulty perms
Finally, run Superset:
$ superset run -p 8080 --with-threads --reload --debugger
2019-07-12 14:43:45,211:INFO:werkzeug: * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
2019-07-12 14:43:45,213:INFO:werkzeug: * Restarting with stat
2019-07-12 14:43:48,036:WARNING:werkzeug: * Debugger is active!
2019-07-12 14:43:48,058:INFO:werkzeug: * Debugger PIN: 280-982-752
Now, visit http://127.0.0.1:8080/ in your browser:
Log in using the admin account you just created.
Afterward, you should see this:
Fantastic!
Select Sources → Databases from the top navigation menu:
Then, select the Add new record button (the
On the Add Database screen, fill out the following information:
CrateDB
crate://crate@127.0.0.1:4200
Like so:
Then, select the Test Connection button.
You should be presented with a modal popup window saying "Seems OK!"
For now, we can ignore all of the other configuration options. Scroll to the bottom of the page and select Save.
Here's your created database source:
Now, select Sources → Tables from the top navigation menu:
We're going to set up a source for the tweets_source
view we created earlier.
Select the Add new record button (the
On the Import a table definition screen, fill out the following information:
doc
(the default CrateDB schema)tweets_source
Like so:
This works because views function as tables!
Select Save.
Because our view is so simple, there's no need to configure anything. You can ignore the blue notification at the top of the screen.
Select Charts from the top navigation menu.
Then, select the Add new record button (the
On the Create a new chart screen, fill out the following information:
docs.tweet_source
Like so:
Select Create new chart.
You should be presented with the following screen:
From here:
source_txt
.Now, select the Run Query button in the main workspace area.
Et voila! A pie chart.
You can move your cursor over the pie chart to get specific values.
In my case, Twitter for iPhone was responsible for 524 tweets, and Twitter for Android was responsible for 494.
From here, you can title the chart and save it.
Apache Superset is a modern, enterprise-ready business intelligence web application. And CrateDB is a distributed SQL database. Together, they provide a powerful open source platform for data warehousing and business intelligence.
In this post, I showed you how to:
From here, you might want to go on to create your first Superset dashboard.
Questions? Looking for help? Get in touch!