Write Path
SQE supports writing data to Iceberg tables through SQL. Writes go through the coordinator, which executes the SELECT portion, writes Parquet files to S3, and commits to the Iceberg catalog.
Supported Operations
CREATE TABLE AS SELECT (CTAS)
CREATE TABLE analytics.monthly_sales AS
SELECT
DATE_TRUNC('month', order_date) AS month,
region,
SUM(amount) AS total
FROM raw.orders
GROUP BY 1, 2;
Flow:
- Parse SQL, extract target table name and SELECT query
- Execute SELECT → get Arrow RecordBatches
- Convert Arrow schema to Iceberg schema
- Create table in Polaris catalog
- Write RecordBatches as Parquet files to S3
- Commit data files to Iceberg via AppendAction
CREATE OR REPLACE TABLE
CREATE OR REPLACE TABLE analytics.monthly_sales AS
SELECT ... ;
Drops the existing table (if it exists) and creates a new one. Useful for dbt table materializations.
INSERT INTO
INSERT INTO analytics.monthly_sales
SELECT
DATE_TRUNC('month', order_date) AS month,
region,
SUM(amount) AS total
FROM raw.orders
WHERE order_date >= '2024-06-01'
GROUP BY 1, 2;
Flow:
- Parse SQL, extract target table and SELECT query
- Execute SELECT → get Arrow RecordBatches
- Write RecordBatches as Parquet files to S3
- Commit data files to Iceberg via AppendAction (new snapshot)
Write Architecture
sequenceDiagram
participant QH as Query Handler
participant DF as DataFusion
participant WH as Write Handler
participant S3
participant POL as Polaris
QH->>DF: Execute SELECT query
DF-->>QH: RecordBatches
QH->>WH: handle_ctas(table_name, schema, batches)
WH->>POL: Create table (schema)
POL-->>WH: Table created
WH->>S3: Write Parquet data files
S3-->>WH: Written (paths + sizes)
WH->>POL: Commit AppendAction<br/>(data file list → new snapshot)
POL-->>WH: Committed
WH-->>QH: Success
Row-Level Operations (Copy-on-Write)
Row-level write operations are implemented via Copy-on-Write using the RisingWave iceberg-rust fork’s rewrite_files() transaction API. Affected data files are read, filtered/transformed, and rewritten as new files in a single atomic Iceberg commit.
DELETE FROM
DELETE FROM sales.orders WHERE status = 'cancelled';
-- Cross-table subqueries in WHERE
DELETE FROM sales.orders
WHERE customer_id IN (SELECT id FROM blacklist);
-- DELETE without WHERE = truncate
DELETE FROM sales.orders;
Flow:
- Scan table metadata to identify affected data files
- Read each affected file, apply the WHERE filter
- If all rows match: mark file for removal
- If partial match: rewrite file without matching rows
- Commit via
rewrite_files()(remove old files, add rewritten files)
UPDATE
UPDATE sales.orders SET status = 'shipped' WHERE tracking_id IS NOT NULL;
-- CASE WHEN transformations
UPDATE sales.orders SET amount = CASE
WHEN amount > 1000 THEN amount * 0.9
ELSE amount
END;
Flow:
- Scan table metadata to identify affected data files
- Read each affected file, apply the WHERE filter
- For matching rows: apply SET expressions
- Rewrite file with modified rows
- Commit via
rewrite_files()
MERGE INTO
MERGE INTO target USING source ON target.id = source.id
WHEN MATCHED THEN UPDATE SET value = source.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (source.id, source.value);
Flow:
- Execute a full outer join of source and target via DataFusion
- Classify each result row: matched (UPDATE/DELETE) or not matched (INSERT)
- Rewrite affected target data files with modifications applied
- Add new data files for INSERT rows
- Commit via
rewrite_files()(remove old files, add new + rewritten files)
Iceberg Dependency
Row-level writes depend on the risingwavelabs/iceberg-rust fork (rev 1978911ec4) which provides the rewrite_files() transaction support not yet available in upstream iceberg-rust. When upstream ships OverwriteAction, the dependency can be migrated back to the official crate.
dbt Compatibility
The write path is designed to support dbt Core via a native dbt-sqe adapter:
| dbt Materialization | SQL | Status |
|---|---|---|
table | CREATE OR REPLACE TABLE AS SELECT | Supported |
incremental (append) | INSERT INTO SELECT | Supported |
incremental (merge) | MERGE INTO | Supported (CoW) |
view | CREATE VIEW AS SELECT | Supported |
seed | INSERT INTO (from CSV) | Supported |