Synopsis: Text-to-SQL with LlamaIndex

Text-to-SQL: Talk to your data using human language and contemporary large language models, optionally offline.

Install

Project dependencies. For example, use them in a requirements.txt file.

langchain-openai<0.4
llama-index-embeddings-langchain<0.5
llama-index-embeddings-openai<0.6
llama-index-llms-azure-openai<0.5
llama-index-llms-ollama<0.8
llama-index-llms-openai<0.6
sqlalchemy-cratedb

Walkthrough

Import Python modules.

import os
from llama_index.llms.ollama import Ollama
from llama_index.llms.openai import OpenAI
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
import sqlalchemy as sa

Provision an LLM using an OpenAI model.

llm = OpenAI(
    model=os.getenv("OPENAI_MODEL", "gpt-4.1"),
    temperature=0.0,
    api_key=os.getenv("OPENAI_API_KEY"),
)

Alternatively, provision an LLM using a self-hosted model.

llm = Ollama(
    base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
    model=os.getenv("OLLAMA_MODEL", "gemma3:1b"),
    temperature=0.0,
    request_timeout=120.0,
    keep_alive=-1,
)

Connect to CrateDB.

database = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL", "crate://crate@localhost:4200"))

Invoke Text-to-SQL query.

sql_database = SQLDatabase(engine=database)
nlsql = NLSQLTableQueryEngine(sql_database=sql_database, llm=llm)
answer = nlsql.query("What is the average value for sensor 1?")

Also try other languages.

answer = nlsql.query("Яке середнє значення для датчика 1?")
answer = nlsql.query("¿Cuál es el valor promedio del sensor 1?")
answer = nlsql.query("Was ist der Durchschnittswert für Sensor 1?")
answer = nlsql.query("Quelle est la valeur moyenne pour le capteur 1 ?")

Full code example

import os
import sqlalchemy as sa

from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.core import Settings

engine = sa.create_engine("crate://localhost:4200/")
engine.connect()

sql_database = SQLDatabase(
    engine, 
    include_tables=["testdrive"]
)

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=[os.getenv("CRATEDB_TABLE_NAME")],
    llm=Settings.llm
)

query_str = "What is the average value for sensor 1?"
answer = query_engine.query(query_str)
print(answer.get_formatted_sources())
print("Query was:", query_str)
print("Answer was:", answer)

# query was: What is the average value for sensor 1?
# answer was: The average value for sensor 1 is 17.03.

Note

Please find the executable example at CrateDB Examples » llama-index.