From Trino to DataFusion

Architecture Comparison

graph LR
    subgraph Trino
        TC[Coordinator<br/>JVM ~2GB heap] --> TW1[Worker<br/>JVM ~8GB heap]
        TC --> TW2[Worker<br/>JVM ~8GB heap]
        TC -->|Hive Metastore<br/>protocol| HMS[Hive Metastore<br/>or Polaris]
        TW1 -->|service account| TS3[S3]
        TW2 -->|service account| TS3
    end

    subgraph SQE
        SC[sqe-server<br/>coordinator<br/>~50MB binary] --> SW1[sqe-server<br/>worker]
        SC --> SW2[sqe-server<br/>worker]
        SC -->|user bearer token<br/>Iceberg REST| POL[Polaris]
        SW1 -->|user credentials| SS3[S3]
        SW2 -->|user credentials| SS3
    end

What Changes

| Aspect | Trino (DCAF fork) | SQE |
|---|---|---|
| Language | Java 21 | Rust |
| Binary size | ~1.2GB (with plugins) | ~50MB |
| Startup time | 10-30 seconds | < 1 second |
| Memory model | JVM heap + GC | Direct allocation, no GC |
| Catalog protocol | Hive Metastore / Iceberg REST | Iceberg REST (native) |
| Auth to catalog | Service account | User bearer token passthrough |
| Auth to storage | Service account IAM role | User credentials from catalog vending |
| Wire protocol | Trino HTTP (custom) | Arrow Flight SQL (gRPC) |
| Data format in-flight | Row-based JSON pages | Arrow columnar batches |
| Security model | System/catalog access control | LogicalPlan rewriting (row filters, column masks) |
| Query engine | Custom cost-based optimizer | Apache DataFusion |
| Table format | Iceberg connector | iceberg-rust (native) |
| Maintenance | Fork of 2M+ LOC Java project | Purpose-built ~5K LOC Rust |
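
The "LogicalPlan rewriting" entry in the security row can be illustrated with a toy model. The sketch below is not the DataFusion API and not SQE's implementation: the `Scan`, `Filter`, `Project`, and `Policy` classes and the `apply_policies` function are hypothetical names invented for illustration, and expressions are plain strings. It only shows the general idea of wrapping every table scan in a row filter and a masking projection before the rest of the query runs.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional, Tuple, Union


@dataclass
class Scan:
    table: str
    columns: Tuple[str, ...]


@dataclass
class Filter:
    predicate: str
    input: "Plan"


@dataclass
class Project:
    exprs: Tuple[str, ...]
    input: "Plan"


Plan = Union[Scan, Filter, Project]


@dataclass
class Policy:
    """Hypothetical per-table policy: one row filter plus column -> mask expression."""
    row_filter: Optional[str] = None
    masks: Dict[str, str] = field(default_factory=dict)


def apply_policies(plan: Plan, policies: Dict[str, Policy]) -> Plan:
    """Return a new plan in which every policed Scan is wrapped in its policy."""
    if isinstance(plan, Scan):
        policy = policies.get(plan.table)
        if policy is None:
            return plan
        secured: Plan = plan
        # The row filter goes directly above the scan so it sees unmasked values.
        if policy.row_filter:
            secured = Filter(policy.row_filter, secured)
        # The masking projection sits above the filter and keeps column names stable.
        if policy.masks:
            exprs = tuple(
                f"{policy.masks[c]} AS {c}" if c in policy.masks else c
                for c in plan.columns
            )
            secured = Project(exprs, secured)
        return secured
    # Filter and Project nodes are kept; only their input subtree is rewritten.
    return replace(plan, input=apply_policies(plan.input, policies))
```

Because the rewrite happens on the plan rather than in the client, the user's own filters and projections end up above the policy nodes and can only ever see filtered rows and masked values.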

What Stays the Same

  • Apache Iceberg as the table format
  • Apache Polaris as the REST catalog
  • Keycloak as the identity provider
  • S3 as the storage layer
  • dbt as the transformation framework (new native adapter instead of Trino adapter)
  • JDBC connectivity (via Arrow Flight SQL JDBC driver instead of Trino JDBC)
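
For JDBC clients, the change is mostly in the connection URL. The helper below is a rough sketch of that mapping, assuming the `jdbc:trino://host:port/catalog/schema` URL form of the Trino driver and the `jdbc:arrow-flight-sql://host:port/?useEncryption=...` form of the Arrow Flight SQL JDBC driver. The function name `to_flight_sql_url` and the `flight_port` argument are hypothetical, the port SQE listens on is not stated here, and the sketch simply drops the catalog/schema path and treats a missing `SSL` parameter as unencrypted.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TRINO_PREFIX = "jdbc:trino://"


def to_flight_sql_url(trino_jdbc_url: str, flight_port: int) -> str:
    """Translate a Trino JDBC URL into an Arrow Flight SQL JDBC URL (sketch)."""
    if not trino_jdbc_url.startswith(TRINO_PREFIX):
        raise ValueError("not a Trino JDBC URL: " + trino_jdbc_url)
    # Re-parse the remainder as a scheme-less URL to get host and query string.
    parts = urlsplit("//" + trino_jdbc_url[len(TRINO_PREFIX):])
    if parts.hostname is None:
        raise ValueError("missing host in: " + trino_jdbc_url)
    # Assumption: carry the Trino SSL flag over to the Flight SQL useEncryption flag.
    tls = parse_qs(parts.query).get("SSL", ["false"])[0].lower() == "true"
    query = urlencode({"useEncryption": "true" if tls else "false"})
    return f"jdbc:arrow-flight-sql://{parts.hostname}:{flight_port}/?{query}"
```

Credentials are deliberately left out of the URL here; how a user token reaches the driver depends on the deployment.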

Migration Path

SQE includes an optional Trino-compatible HTTP endpoint (/v1/statement) that speaks enough of the Trino wire protocol to support existing dashboards and tools during the migration period. It is not a full Trino emulation: it covers SELECT, SHOW, and basic DDL, which is enough to keep existing workloads running while teams migrate to Flight SQL.
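
To make the compatibility endpoint concrete, here is a minimal sketch of the request loop a Trino-style client runs against /v1/statement: POST the SQL text, then follow each `nextUri` in the JSON responses until none is returned. This follows the general shape of the Trino client protocol (`X-Trino-User` header, `data`, `nextUri`, and `error` fields). Whether SQE's endpoint implements every one of these fields is an assumption, and `run_statement`, `http_transport`, and the injectable `transport` parameter are hypothetical names used only for illustration.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

# (method, url, body, headers) -> decoded JSON response
Transport = Callable[[str, str, Optional[bytes], dict], dict]


def http_transport(method: str, url: str, body: Optional[bytes], headers: dict) -> dict:
    """Default transport: one HTTP round trip using only the standard library."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_statement(
    base_url: str,
    sql: str,
    user: str,
    transport: Transport = http_transport,
) -> Iterator[list]:
    """Yield result rows, following nextUri until the server stops returning one."""
    headers = {"X-Trino-User": user}
    url = base_url.rstrip("/") + "/v1/statement"
    page = transport("POST", url, sql.encode("utf-8"), headers)
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Not every page carries rows; early pages may only report progress.
        yield from page.get("data", [])
        next_uri = page.get("nextUri")
        if next_uri is None:
            return
        page = transport("GET", next_uri, None, headers)
```

Passing the transport as a parameter keeps the loop testable without a running server. Against a real deployment, the default `http_transport` would be used with the server's base URL.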

timeline
    title Migration Timeline
    Phase 1 : SQE single-node : Flight SQL : CLI
    Phase 2 : Write path : dbt-sqe adapter : Views
    Phase 3 : Distributed execution : Workers
    Phase 4 : Trino compat layer : Dashboard migration
    Phase 5 : Security policies : Row filters : Column masks
    Phase 6 : Decommission Trino fork