Structured Query Language (SQL) is the lingua-franca for working with data. However, it's not easy to create complex analytics code beyond one-off queries because the language and its abstractions, e.g. views and functions, are entangled as state in the underlying database system. QueryScript aims to solve this problem by empowering you to write powerful, robust analytics code in SQL with abstractions that you can manage as code, not database state.
QueryScript consists of a compiler, runtime, engine library, and materializer which build upon each other to solve for use cases like data transformation, BI-as-code, and ETL.
Compiler
The core innovation in QueryScript is its compiler, which can process complex SQL expressions and functional abstractions, and turn them into vanilla ANSI SQL that runs against any SQL database system. Unlike a macro system, QueryScript deeply understands the underlying expressions you write, enabling typechecking, transpilation across dialects, and advanced optimizations.
The compiler processes QueryScript schemas, which each contain declarations and expressions. For example:
-- Define a type for each User record. QueryScript will validate-- at runtime that the underlying data matches this type. If you-- do not specify the type, QueryScript will infer it from the file.type User {id int,org_id int,};-- Define a relation named users, which is an array of `User`-- whose data is in 'users.json'. This let expression is lazily-- evaluated, which means that the data is only loaded when you-- query `users`.-- Because the variable is marked with `export`, it's public to-- any schemas that import this schema.export let users [User] = load('users.json');-- Import the orgs declaration from another schema, whose-- path is 'organizations.qs' (qs stands for QueryScript)import orgs from organizations;-- Define a function to retrieve an organization's info from-- its id. This function can be referenced from other expressions.fn get_org(org_id) {SELECT * FROM orgs WHERE id=org_id}-- This inline expression is run when you run the schema file,-- on the command line, but not when you import it.SELECT(SELECT name FROM get_org(users.org_id)) name,COUNT(*)FROM users GROUP BY 1;
- A schema is like a module in Typescript or Rust — it is a self-contained block of variables which can be run as a script, or imported from another schema.
- A declaration can be a type (
type
), variable (let
ormat
), or schema (import
). Declarations can be referenced by other declarations and expressions in a schema. - An expression is similar to a declaration, except it does not have a name associated with it. Expressions are run automatically when you run the schema file on the command line. They're primarily useful for scripting and debugging.
Runtime
Once a schema is compiled, it can be run in QueryScript's runtime. Just like a relational database, QueryScript can run its own SQL queries (more on how, below). That makes it significantly more powerful than other transformation, BI, and ETL tools which rely on the underlying database to run logic. For example, QueryScript can run joins across two different databases, or read data from one system and write to another, all expressed in SQL.
Both the QueryScript compiler and runtime are available as Rust libraries which can be embedded directly into your applications.
Engine library
QueryScript builds on the shoulder of giants, by enabling database engines to be plugged in directly. Currently, QueryScript supports DuckDB with imminent plans to support Postgres and Snowflake. QueryScript does its best to push as much logic as possible into the underlying database. It runs the remaining logic, for example joining between two separate database systems, with DuckDB as an embedded engine.
You can import data from supported database backends like you would declarations from a schema file. For example:
import 'postgres://host:port/db';SELECT COUNT(*) FROM db.t;
will access the t
table in db
. Behind the scenes, QueryScript will determine the type of t
using type annotations or talking to the database.
Materializer
QueryScript makes it just as easy to save your work to a database as to read from it. By simply running the QueryScript
utility with --save
, you can save a schema file back to the database it references. For example, saving
import 'postgres://host:port/db';export let num_users = SELECT COUNT(*) FROM db.users;
will create a view named num_users
in the db
database. If you use the mat
keyword instead of let
, it will precompute the view and save its results as a table.