SQE — Sovereign Query Engine
SQE is a Rust-based distributed SQL query engine for Apache Iceberg tables. It replaces a patched Trino fork with a purpose-built engine based on Apache DataFusion and iceberg-rust.
graph LR
Client["JDBC / Flight SQL Client"] --> Coordinator
Coordinator --> Worker1["Worker 1"]
Coordinator --> Worker2["Worker 2"]
Coordinator --> WorkerN["Worker N"]
Worker1 --> S3["S3 / MinIO"]
Worker2 --> S3
WorkerN --> S3
Coordinator --> Polaris["Polaris Catalog"]
Coordinator --> Keycloak["Keycloak OIDC"]
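To make the coordinator-to-worker fan-out in the diagram concrete, here is a minimal sketch of how a coordinator might spread scan work across stateless workers. It is an illustration only: `ScanTask` and `assign_round_robin` are hypothetical names, and round-robin assignment is an assumed strategy, not necessarily how SQE schedules work.

```rust
/// Hypothetical unit of work: one data file a worker should scan.
#[derive(Debug, Clone, PartialEq)]
struct ScanTask {
    path: String,
}

/// Assign tasks to `workers` buckets in round-robin order.
/// Returns one task list per worker; workers hold no state between queries,
/// so any worker can take any task.
fn assign_round_robin(tasks: Vec<ScanTask>, workers: usize) -> Vec<Vec<ScanTask>> {
    assert!(workers > 0, "need at least one worker");
    let mut buckets = vec![Vec::new(); workers];
    for (i, task) in tasks.into_iter().enumerate() {
        buckets[i % workers].push(task);
    }
    buckets
}
```

Because each worker reads its files directly from object storage, the coordinator in this sketch only has to hand out file paths, never the data itself.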
Key Properties
- No service account — every query runs as the authenticated user. Bearer tokens pass through from client to Polaris catalog and S3 storage.
- Arrow-native — columnar data flows from Parquet files through the entire query pipeline to the client. No row-based serialization anywhere.
- Iceberg-native — built on iceberg-rust, not a connector bolted onto a generic engine. Partition pruning, metadata caching, and Iceberg v3 support are first-class.
- Fine-grained security — row filters and column masks enforced at the logical plan level, before the optimizer runs. Invisible columns, transparent row filtering, no information leakage.
- Rust performance — single binary, no JVM, no GC pauses, predictable memory usage, fast startup.
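The fine-grained security property above can be sketched as a plan rewrite that runs before any optimization. The example below uses a toy `Plan` enum rather than DataFusion's real plan type, and `Policy` and `secure` are hypothetical names introduced for illustration. It shows the general technique only (wrap each table scan in a row filter plus a projection that drops hidden columns and masks sensitive ones) and makes no claim about SQE's actual implementation.

```rust
use std::collections::HashMap;

/// Toy logical plan, standing in for a real plan type (hypothetical, not SQE's).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String, columns: Vec<String> },
    Filter { predicate: String, input: Box<Plan> },
    Project { exprs: Vec<String>, input: Box<Plan> },
}

/// Hypothetical per-user policy for a single table.
#[derive(Debug, Clone, Default)]
struct Policy {
    /// Predicate every row must satisfy, e.g. "region = 'EU'".
    row_filter: Option<String>,
    /// Columns the user must not see at all.
    hidden: Vec<String>,
    /// Column name -> replacement expression.
    masks: HashMap<String, String>,
}

/// Rewrite every scan of a protected table into Project(Filter(Scan)).
/// Running this before the optimizer means later rules only ever see the
/// secured plan and cannot reorder around the policy.
fn secure(plan: Plan, policies: &HashMap<String, Policy>) -> Plan {
    match plan {
        Plan::Scan { table, columns } => {
            let policy = match policies.get(&table) {
                Some(p) => p.clone(),
                // No policy registered for this table: leave the scan as-is.
                None => return Plan::Scan { table, columns },
            };
            // Drop hidden columns and swap masked ones for their expression.
            let exprs: Vec<String> = columns
                .iter()
                .filter(|c| !policy.hidden.contains(*c))
                .map(|c| match policy.masks.get(c) {
                    Some(mask) => format!("{mask} AS {c}"),
                    None => c.clone(),
                })
                .collect();
            // The scan still reads every column so the row filter may
            // reference hidden ones; the projection removes them afterwards.
            let scan = Plan::Scan { table, columns };
            let filtered = match policy.row_filter {
                Some(predicate) => Plan::Filter { predicate, input: Box::new(scan) },
                None => scan,
            };
            Plan::Project { exprs, input: Box::new(filtered) }
        }
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(secure(*input, policies)),
        },
        Plan::Project { exprs, input } => Plan::Project {
            exprs,
            input: Box::new(secure(*input, policies)),
        },
    }
}
```

In this sketch the user's own query sits on top of the rewritten scan, so a hidden column simply does not exist from the query's point of view, and a masked column keeps its name but not its values.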
Quick Start (embedded, no server)
cargo install --path crates/sqe-cli
sqe-cli --embedded # persistent Iceberg catalog at ~/.sqe/warehouse
sqe> SELECT * FROM '/data/sales.parquet' LIMIT 5;
sqe> SELECT * FROM read_csv('s3://bucket/orders.tsv.gz');
sqe> SELECT * FROM 'hf://datasets/squad/plain_text/train-00000-of-00001.parquet';
sqe> SELECT * FROM read_delta('/data/delta/sales', version => '5');
Full embedded reference: cli-embedded.md. DuckDB comparison: duckdb-comparision.md.
Quick Start (cluster mode)
# Build
cargo build --release --bin sqe-coordinator --bin sqe-cli
# Start coordinator
SQE_CONFIG=sqe.toml ./target/release/sqe-coordinator
# Connect
./target/release/sqe-cli --host localhost --port 50051
Project Status
SQE is production-ready against Apache Iceberg. Cluster mode runs distributed (a coordinator plus stateless workers) with OIDC bearer-token passthrough, supports Polaris, Nessie, Glue, HMS, S3 Tables, JDBC, and Hadoop catalogs, and scores 167/189 (88.4%) on the public Iceberg matrix scoreboard. Embedded mode (V8 through V12.1) adds DuckDB-style file-format TVFs (read_csv, read_json, read_delta), HuggingFace hf:// URLs, and a single-binary CLI for laptop analytics.