SQE — Sovereign Query Engine

SQE is a Rust-based distributed SQL query engine for Apache Iceberg tables. It replaces a patched Trino fork with a purpose-built engine based on Apache DataFusion and iceberg-rust.

graph LR
    Client["JDBC / Flight SQL Client"] --> Coordinator
    Coordinator --> Worker1["Worker 1"]
    Coordinator --> Worker2["Worker 2"]
    Coordinator --> WorkerN["Worker N"]
    Worker1 --> S3["S3 / MinIO"]
    Worker2 --> S3
    WorkerN --> S3
    Coordinator --> Polaris["Polaris Catalog"]
    Coordinator --> Keycloak["Keycloak OIDC"]

Key Properties

  • No service account — every query runs as the authenticated user. Bearer tokens pass through from client to Polaris catalog and S3 storage.
  • Arrow-native — columnar data flows from Parquet files through the entire query pipeline to the client. No row-based serialization anywhere.
  • Iceberg-native — built on iceberg-rust, not a connector bolted onto a generic engine. Partition pruning, metadata caching, and Iceberg v3 support are first-class.
  • Fine-grained security — row filters and column masks enforced at the logical plan level, before the optimizer runs. Invisible columns, transparent row filtering, no information leakage.
  • Rust performance — single binary, no JVM, no GC pauses, predictable memory usage, fast startup.
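The fine-grained security property above can be illustrated with a toy plan rewrite. This is a minimal sketch, not SQE's actual implementation: `Plan`, `Policy`, and `secure_scan` are hypothetical names invented for this example, and predicates are kept as plain strings for brevity. It assumes the policy is applied when the table scan is planned, so the user's query and any later optimizer pass only ever see the secured subtree.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy logical plan (hypothetical). A real engine's plan type is far richer.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String, columns: Vec<String> },
    Filter { predicate: String, input: Box<Plan> },
    Project { exprs: Vec<String>, input: Box<Plan> },
}

/// Hypothetical per-user policy for a single table.
#[derive(Debug, Default)]
struct Policy {
    /// Predicate every returned row must satisfy, if any.
    row_filter: Option<String>,
    /// Column name -> replacement expression shown instead of the real value.
    masks: BTreeMap<String, String>,
    /// Columns the user must not see at all.
    hidden: BTreeSet<String>,
}

/// Build the secured subtree that stands in for a raw table scan.
fn secure_scan(table: &str, columns: &[&str], policy: &Policy) -> Plan {
    // Hidden columns never enter the plan, so later stages cannot reference them.
    let visible: Vec<String> = columns
        .iter()
        .filter(|c| !policy.hidden.contains(**c))
        .map(|c| c.to_string())
        .collect();

    let mut plan = Plan::Scan {
        table: table.to_string(),
        columns: visible.clone(),
    };

    // The row filter sits directly above the scan and sees unmasked values.
    if let Some(predicate) = &policy.row_filter {
        plan = Plan::Filter {
            predicate: predicate.clone(),
            input: Box::new(plan),
        };
    }

    // Masks are applied last, as a projection that keeps the column names.
    if visible.iter().any(|c| policy.masks.contains_key(c)) {
        let exprs: Vec<String> = visible
            .iter()
            .map(|c| match policy.masks.get(c) {
                Some(mask) => format!("{mask} AS {c}"),
                None => c.clone(),
            })
            .collect();
        plan = Plan::Project {
            exprs,
            input: Box::new(plan),
        };
    }

    plan
}
```

Calling `secure_scan("customers", &["id", "email", "ssn"], &policy)` with a policy that hides `ssn`, masks `email`, and filters on `region` yields a projection over a filter over a scan of `id` and `email` only. Because the user's query is planned on top of that subtree, an optimizer pass can reorder what sits above it without exposing the hidden column. One limitation of the sketch: a row filter that references a hidden column would need that column kept in the scan and dropped in the projection instead.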

Quick Start (embedded, no server)

cargo install --path crates/sqe-cli
sqe-cli --embedded                    # ~/.sqe/warehouse persistent Iceberg catalog

sqe> SELECT * FROM '/data/sales.parquet' LIMIT 5;
sqe> SELECT * FROM read_csv('s3://bucket/orders.tsv.gz');
sqe> SELECT * FROM 'hf://datasets/squad/plain_text/train-00000-of-00001.parquet';
sqe> SELECT * FROM read_delta('/data/delta/sales', version => '5');

Full embedded reference: cli-embedded.md. DuckDB comparison: duckdb-comparision.md.

Quick Start (cluster mode)

# Build
cargo build --release --bin sqe-coordinator --bin sqe-cli

# Start coordinator
SQE_CONFIG=sqe.toml ./target/release/sqe-coordinator

# Connect
./target/release/sqe-cli --host localhost --port 50051

Project Status

SQE is production-ready against Apache Iceberg. Cluster mode runs distributed (a coordinator plus stateless workers) with OIDC bearer-token passthrough, supports Polaris / Nessie / Glue / HMS / S3 Tables / JDBC / Hadoop catalogs, and scores 167/189 (88.4%) on the public Iceberg matrix scoreboard. Embedded mode (V8 through V12.1) adds DuckDB-style file-format table-valued functions (read_csv, read_json, read_delta), HuggingFace hf:// URLs, and a single-binary CLI for laptop analytics.