Independent Time Series Benchmark Confirms CrateDB’s Top-Tier Performance

Read now
Skip to content
Blog

Building Query Lambdas with CrateDB's Views and the HTTP Endpoint

This guide demonstrates how to create reusable, parameterized queries using CrateDB's views and execute them via the CrateDB HTTP Endpoint. This method streamlines query management, making it easier for former Rockset users to transition to CrateDB.

Step 1: Create a CrateDB Cloud environment

(if you already have one, you can skip this step)

  • Create a user as described here
  • Deploy a cluster as described here

Step 2: Create a table

After you have set up a CrateDB Cloud instance, you can proceed by creating a table. This could be done using the HTTP endpoint, but in this example, we use the CrateDB Cloud Console.

Use this code, paste it into the console, and execute.

CREATE TABLE test.weather_data (
    timestamp TIMESTAMP,
    location VARCHAR,
    temperature DOUBLE,
    humidity DOUBLE,
    wind_speed DOUBLE
);

a57f0819-ebd5-4993-9d08-2ec9740d431e

Step 3: Insert sample data

Now that the table is created, it is time to insert some rows. In this example, we use a publicly available weather dataset. Copy the code into the Console and execute it to load the 70k rows.

COPY test.weather_data 
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_weather.csv.gz' 
WITH (format='csv', compression='gzip', empty_string_as_null=true);

Step 4: Query the data

Let's assume a query where NULL readings are interpolated based on the previous and the next value.

WITH OrderedData AS (
    SELECT timestamp,
          location,
          temperature,
          LAG(temperature, 1) IGNORE NULLS OVER w AS prev_temp,
          LEAD(temperature, 1) IGNORE NULLS OVER w AS next_temp
    FROM test.weather_data
    WINDOW w AS (PARTITION BY location ORDER BY timestamp)
)
SELECT timestamp,
      location,
      temperature,
      CASE
          WHEN temperature IS NOT NULL THEN temperature
          WHEN prev_temp IS NOT NULL AND next_temp IS NOT NULL THEN (prev_temp + next_temp) / 2
          ELSE NULL
      END AS interpolated_temperature
FROM OrderedData
WHERE temperature is NULL
ORDER BY location, timestamp
LIMIT 50;

This could also be executed using the HTTP endpoint.

In a shell, set the following environment variables.

export your_username=<USERNAME>
export your_password=<PASSWORD>
export cluster_url=<Host of the CrateDB Cluster - can be found on the overview page of your cluster>>
When the variables are set, you can execute the following.

curl -sS -u ${your_username}:${your_password} -H 'Content-Type: application/json' \
-X POST https://${cluster_url}:4200/_sql -d@- <<- EOF
{
  "stmt": "
    WITH OrderedData AS (
        SELECT timestamp,
              location,
              temperature,
              LAG(temperature, 1) IGNORE NULLS OVER w AS prev_temp,
              LEAD(temperature, 1) IGNORE NULLS OVER w AS next_temp
        FROM test.weather_data
        WINDOW w AS (PARTITION BY location ORDER BY timestamp)
    )
    SELECT timestamp,
          location,
          temperature,
          CASE
              WHEN temperature IS NOT NULL THEN temperature
              WHEN prev_temp IS NOT NULL AND next_temp IS NOT NULL THEN (prev_temp + next_temp) / 2
              ELSE NULL
          END AS interpolated_temperature
    FROM OrderedData
    WHERE temperature is NULL
    ORDER BY location, timestamp
    LIMIT 50;
  "
}
EOF

Even though this works fine, you end up with a significant POST command. The best way to simplify things is by creating views.

Step 5: Create a View

Instead of including large SQL statements in the post of your command, you could simply create a view using the code.

CREATE VIEW v_interpolated_temp AS
WITH OrderedData AS (
     SELECT timestamp,
            location,
            temperature,
            LAG(temperature, 1) IGNORE NULLS OVER w AS prev_temp,
            LEAD(temperature, 1) IGNORE NULLS OVER w AS next_temp
     FROM test.weather_data
     WINDOW w AS (PARTITION BY location ORDER BY timestamp)
)
    SELECT timestamp,
          location,
          temperature,
          CASE
              WHEN temperature IS NOT NULL THEN temperature
              WHEN prev_temp IS NOT NULL AND next_temp IS NOT NULL THEN (prev_temp + next_temp) / 2
              ELSE NULL
          END AS interpolated_temperature
    FROM OrderedData
    WHERE temperature is NULL
    ORDER BY location, timestamp
    LIMIT 50;

Copy the code and execute it in the Console like this example:3af1517e-46ed-45c8-8403-c8db3989baea-1

With the view in place, the HTTP endpoint command looks like this.

curl -sS -u ${your_username}:${your_password} -H 'Content-Type: application/json' \
-X POST https://${cluster_url}:4200/_sql -d@- <<- EOF
{
  "stmt": "
      select *
      from v_interpolated_temp
  "
}
EOF

Using views still gives you the power to use filters, for example. Let's assume you want to filter on location='Berlin' and interpolate_temperature > 10. You can still use the view.

curl -sS -u ${your_username}:${your_password} -H 'Content-Type: application/json' \
-X POST https://${cluster_url}:4200/_sql -d@- <<- EOF
{
  "stmt": "
      select location, interpolated_temperature 
      from v_interpolated_temp 
      WHERE location = \$1 
      AND interpolated_temperature > \$2 
  ",
  "args": ["Berlin", 10]
}
EOF

Conclusion 

Using views combined with the CrateDB HTTP endpoint, you can efficiently manage and execute complex queries, providing an accessible pathway for Rockset users to transition to CrateDB.