Live Stream on Jan 23rd: Unlocking Real Time Insights in the Renewable Energy Sector with CrateDB

Register now
Skip to content
Blog

How To Use Nginx To Make Crate Read-Only

This article is more than 4 years old

Update July 2016: We have added read-only support starting with 0.55.0!

... There's a gist of the Nginx configuration.

At the moment, Crate does not have a builtin concept of ACL or user permissions, but there are several ways to secure a cluster and prevent it from unauthorized access. However, sometimes you want make your cluster just read-only and people have been asking us about that.

This post will show you how to achieve this with a few simple steps using Nginx and a few lines of Lua code.

The HttpLuaModule for Nginx can be hard to install, so we use OpenResty, a web application server that bundles Nginx's core and lots of 3rd-party Nginx modules.
No matter whether you're using Nginx+HttpLuaModule or Openresty, I will just refer to it as Nginx for now.

Once you successfully set up Nginx, we can take a look at the configuration. Let's create a new .conf file that is included by the main nginx.conf on your host, and call it crate-readonly.conf.

The upstream Directive

First, we need to define the upstream servers. Since Crate is using the HTTP protocol we can simply use the private IPs of the Crate instances and their exposed HTTP ports (4200 if not otherwise specified).

Assuming our Crate cluster is already set up and consists of 3 nodes that are available on 10.0.0.{100..103}:4200, the upstream directive looks like

upstream crate {
    server 10.0.0.101:4200;
    server 10.0.0.102:4200;
    server 10.0.0.103:4200;
} 

The default load balancing method is round-robin, where requests to the upstream servers are distributed in a basic round-robin fashion. There are also other methods that are described here, from which the least_conn method could be a good option:

Least-connected allows controlling the load on application instances more fairly in a situation when some of the requests take longer to complete.

upstream crate {
    least_conn;
    server 10.0.0.101:4200;
    server 10.0.0.102:4200;
    server 10.0.0.103:4200;
} 

The server Directive

Crate provides a HTTP Endpoint that can be used to submit SQL queries. The endpoint is accessible on /_sql (see documentation).

Therefore we can define a location directive /_sql that passes the requests to the crate upstreams defined above. The most basic configuration would look like:

server {
  listen 4220;

  location = /_sql {
    # define upstream proxy_pass http://crate;
  }

} 

However, this does not yet limit any functionality, but only exposes ALL functionality via the Nginx proxy.

Even before restricting access we'll need to add some more options to the server configuration:

Permit only POST requests

Restricting requests is mainly a sanity check upfront, so invalid requests that are not POST requests are handled upfront.

if ($request_method !~ ^(POST)$ ) {
     return 403;
} 

Add additional headers to proxy request

Setting the Accept header for proxy requests also minimizes the risk of invalid requests.

proxy_set_header Accept "*/json";

Set the timeout for proxy requests

Queries and especially the COPY FROM statement can take quite a long time to execute. The timeout really depends on the usecase, and whether you want to return without a response. Note that even if the proxy request times out, the execution of the statement in Crate does not abort! That's why we want set the timeout quite high.

proxy_read_timeout 60s;

Improve the error handling

By default, Nginx retries a request on another upstream server if the first one times out. That is something we do not want to happen, so we have to define under which circumstances Nginx should retry on another upstream.

proxy_next_upstream error invalid_header http_500 http_502 http_503; 

Server directive, but still without restrictions

server {
    listen 4220;

    location = /_sql {
        # only permit POST requests
        if ($request_method !~ ^(POST)$ ) {
          return 403;
        }

        # set proxy headers
        proxy_set_header Accept "*/json";

        # set upstream timeout
        proxy_read_timeout 60s;

        # conditions in which case to try the next upstream server
        proxy_next_upstream error invalid_header http_500 http_502 http_503;

        # define upstream
        proxy_pass http://crate;

    }

} 

Access Control With Lua

Now's the time that we need the HttpLuaModule (or OpenResty)!

POST requests to the /_sql endpoint contain the SQL statement in the request body, encoded as a JSON string. The anatomy of the JSON body looks like:

{
  "stmt": "SELECT id, name FROM sys.nodes",
  "args": [],
  "bulk_args": []
} 

Simply by checking the stmt value for the SELECT keyword will allow us to make Crate read-only.

This check can be achieved by a few lines of Lua code:

-- load json package
cjson = require "cjson"
-- read POST body
local body = ngx.req.get_body_data()
if body then
  -- decode JSON string into object
  local data = cjson.decode(body)
  -- check lower case stmt value
  if not string.match(string.lower(data.stmt), "^select.*") then
    -- if sql statement does not start with SELECT
    -- return HTTP 403
    ngx.exit(ngx.HTTP_FORBIDDEN)
  end
else
   -- if no POST date
   -- return HTTP 403
   ngx.exit(ngx.HTTP_FORBIDDEN)
end 

This Lua snippet can live directly in the Nginx configuration file. Either you pass it directly as a string using access_by_lua, or if you prefer to put it into a file, you can use access_by_lua_file and specify the path to the file.

Don't forget to set lua_need_request_body to on, otherwise Lua is not able to read the request body!

upstream crate {
    server 10.0.0.101:4200;
    server 10.0.0.102:4200;
    server 10.0.0.103:4200;
}

server {
    listen 4220;

    location = /_sql {
        # only permit POST requests
        if ($request_method !~ ^(POST)$ ) {
            return 403;
        }
        
        # request access to POST body
        lua_need_request_body on;

        # define access using lua
        access_by_lua '
          -- load json package
          cjson = require "cjson"
          -- read POST body
          local body = ngx.req.get_body_data()
          if body then
            -- decode JSON string into object
            local data = cjson.decode(body)
            -- check lower case stmt value
            if not string.match(string.lower(data.stmt), "^select.*") then
              -- if sql statement does not start with SELECT
              -- return HTTP 403
              ngx.exit(ngx.HTTP_FORBIDDEN)
            end
          else
            -- if no POST date
            -- return HTTP 403
            ngx.exit(ngx.HTTP_FORBIDDEN)
          end
        ';

        # set proxy headers
        proxy_set_header User-Agent $http_user_agent;
        proxy_set_header Accept "*/json";

        # set upstream timeout
        # complex queries may take some time
        proxy_read_timeout 60s;

        # conditions in which case to try the next upstream server
        # note: on timeout you must not try another upstream,
        #       otherwise you would invoke a statement several times
        proxy_next_upstream error invalid_header http_500 http_502 http_503;

        # define upstream
        proxy_pass http://crate;

    }

}

View gist.

Testing The Read-Only Cluster

After restarting Nginx, we are able to test what we've done!

SELECT statement is allowed

$ curl -i 127.0.0.1:4220/_sql -d '{"stmt": "SELECT * FROM sys.nodes"}'
HTTP/1.1 200 OK
Server: ngx_openresty/1.4.3.6
Date: Wed, 04 Feb 2015 22:28:56 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 2734
Connection: keep-alive
... 

CREATE/UPDATE/DELETE are not permitted

$ curl -i 127.0.0.1:4220/_sql -d '{"stmt": "CREATE TABLE foo (name STRING)"}' 
HTTP/1.1 403 Forbidden
Server: ngx_openresty/1.4.3.6
Date: Wed, 04 Feb 2015 22:29:42 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
... 

Works!

If you have any questions, let us know on our Support channel on Slack.

Additional Notes

Additional to the steps above, you need to prevent every single node from being exposed to the public.

Also, you probably want to restrict access to the proxy by adding allow and <codeclass="language-nginx">deny to</codeclass="language-nginx">

server { 
    ... 
    allow 127.0.0.1;
    deny all;
    ...
} 

Please note that by default nginx does not forward query parameters. This can cause the admin-ui to not work properly, since it's relying on query parameters for some requests. In order to avoid this, please add the following to the nginx configuration:

location /crate/ {
    proxy_pass http://crate/;
    break;
 }