Skip to main content
Version: Preview-4.1

Iceberg DML operations

StarRocks Iceberg Catalog supports a variety of Data Manipulation Language (DML) operations, including inserting data into Iceberg tables.

You must have the appropriate privileges to perform DML operations. For more information about privileges, see Privileges.

INSERT

Inserts data into an Iceberg table. This feature is supported from v3.1 onwards.

Similar to loading data into StarRocks native tables, if you have the INSERT privilege on an Iceberg table, you can use the INSERT statement to sink the data to the Iceberg table. Currently, only Parquet-formatted Iceberg tables are supported.

Syntax

INSERT {INTO | OVERWRITE} <table_name>
[ (column_name [, ...]) ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }

-- If you want to sink data to specified partitions, use the following syntax:
INSERT {INTO | OVERWRITE} <table_name>
PARTITION (par_col1=<value> [, par_col2=<value>...])
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
note

NULL values are not allowed in partition columns. Therefore, you must make sure that no empty values are loaded into the partition columns of the Iceberg table.

Parameters

INTO

Appends the data to the Iceberg table.

OVERWRITE

Overwrites the existing data of the Iceberg table.

column_name

The name of the destination column to which you want to load data. You can specify one or more columns. Multiple columns are separated with commas (,).

  • You can only specify columns that actually exist in the Iceberg table.
  • The destination columns must include the partition columns of the Iceberg table.
  • The destination columns are mapped one on one in sequence to the columns in the SELECT statement (source columns), regardless of what the destination column names are.
  • If no destination columns are specified, the data is loaded into all columns of the Iceberg table.
  • If a non-partition source column cannot be mapped to any destination column, StarRocks writes the default value NULL to the destination column.
  • If the data types of the source and destination columns mismatch, StarRocks performs an implicit conversion on the mismatched columns. If the conversion fails, a syntax parsing error will be returned.
note

You cannot specify the column_name property if you have specified the PARTITION clause.

expression

Expression that assigns values to the destination column.

DEFAULT

Assigns a default value to the destination column.

query

Query statement whose result will be loaded into the Iceberg table. It can be any SQL statement supported by StarRocks.

PARTITION

The partitions into which you want to load data. You must specify all partition columns of the Iceberg table in this property. The partition columns that you specify in this property can be in a different sequence than the partition columns that you have defined in the table creation statement.

note

You cannot specify the column_name property if you have specified the PARTITION clause.

Examples

  1. Insert three data rows into the partition_tbl_1 table:

    INSERT INTO partition_tbl_1
    VALUES
    ("buy", 1, "2023-09-01"),
    ("sell", 2, "2023-09-02"),
    ("buy", 3, "2023-09-03");
  2. Insert the result of a SELECT query, which contains simple computations, into the partition_tbl_1 table:

    INSERT INTO partition_tbl_1 (id, action, dt) SELECT 1+1, 'buy', '2023-09-03';
  3. Insert the result of a SELECT query, which reads data from the partition_tbl_1 table, into the same table:

    INSERT INTO partition_tbl_1 SELECT 'buy', 1, date_add(dt, INTERVAL 2 DAY)
    FROM partition_tbl_1
    WHERE id=1;
  4. Insert the result of a SELECT query into the partitions that meet two conditions, dt='2023-09-01' and id=1, of the partition_tbl_2 table:

    INSERT INTO partition_tbl_2 SELECT 'order', 1, '2023-09-01';

    Or

    INSERT INTO partition_tbl_2 partition(dt='2023-09-01',id=1) SELECT 'order';
  5. Overwrite all action column values in the partitions that meet two conditions, dt='2023-09-01' and id=1, of the partition_tbl_1 table with close:

    INSERT OVERWRITE partition_tbl_1 SELECT 'close', 1, '2023-09-01';

    Or

    INSERT OVERWRITE partition_tbl_1 partition(dt='2023-09-01',id=1) SELECT 'close';

DELETE

You can use the DELETE statement to delete data from Iceberg tables based on specified conditions. This feature is supported from v4.1 and later.

Syntax

DELETE FROM <table_name> WHERE <condition>

Parameters

  • table_name: The name of the Iceberg table you want to delete data from. You can use:

    • Fully qualified name: catalog_name.database_name.table_name
    • Database-qualified name (after setting catalog): database_name.table_name
    • Table name only (after setting catalog and database): table_name
  • condition: The condition to identify which rows to delete. It can include:

    • Comparison operators: =, !=, >, <, >=, <=, <>
    • Logical operators: AND, OR, NOT
    • IN and NOT IN clauses
    • BETWEEN and LIKE operators
    • IS NULL and IS NOT NULL
    • Sub-queries with IN or EXISTS

Examples

Basic DELETE operations

Delete rows matching a simple condition:

DELETE FROM iceberg_catalog.db.table1 WHERE id = 3;

DELETE with IN and NOT IN

Delete multiple rows using IN clause:

DELETE FROM iceberg_catalog.db.table1 WHERE id IN (18, 20, 22);
DELETE FROM iceberg_catalog.db.table1 WHERE id NOT IN (100, 101, 102);

DELETE with logical operators

Combine multiple conditions:

DELETE FROM iceberg_catalog.db.table1 WHERE age > 30 AND salary < 70000;
DELETE FROM iceberg_catalog.db.table1 WHERE status = 'inactive' OR last_login < '2023-01-01';

DELETE with pattern matching

Use LIKE for pattern-based deletion:

DELETE FROM iceberg_catalog.db.table1 WHERE name LIKE 'A%';
DELETE FROM iceberg_catalog.db.table1 WHERE email LIKE '%@example.com';

DELETE with range conditions

Use BETWEEN for range-based deletion:

DELETE FROM iceberg_catalog.db.table1 WHERE age BETWEEN 30 AND 40;
DELETE FROM iceberg_catalog.db.table1 WHERE created_date BETWEEN '2023-01-01' AND '2023-12-31';

DELETE with NULL checks

Delete rows with or without NULL values:

DELETE FROM iceberg_catalog.db.table1 WHERE name IS NULL;
DELETE FROM iceberg_catalog.db.table1 WHERE email IS NULL AND phone IS NULL;
DELETE FROM iceberg_catalog.db.table1 WHERE age IS NOT NULL;

DELETE with sub-queries

Use sub-queries to identify rows to delete:

-- DELETE with IN sub-query
DELETE FROM iceberg_catalog.db.table1 WHERE id IN (SELECT id FROM temp_table WHERE expired = true);

-- DELETE with EXISTS sub-query
DELETE FROM iceberg_catalog.db.table1 t1 WHERE EXISTS (SELECT user_id FROM inactive_users t2 WHERE t2.user_id = t1.user_id);

TRUNCATE

You can use the TRUNCATE TABLE statement to quickly delete all data from Iceberg tables.

Syntax

TRUNCATE TABLE <table_name>

Parameters

  • table_name: The name of the Iceberg table that you want to truncate data from. You can use:
    • Fully qualified name: catalog_name.database_name.table_name
    • Database-qualified name (after setting catalog): database_name.table_name
    • Table name only (after setting catalog and database): table_name

Examples

Example 1: Truncate a table using fully qualified name

TRUNCATE TABLE iceberg_catalog.my_db.my_table;

Example 2: Truncate a table after setting catalog

SET CATALOG iceberg_catalog;
TRUNCATE TABLE my_db.my_table;

Example 3: Truncate a table after setting catalog and database

SET CATALOG iceberg_catalog;
USE my_db;
TRUNCATE TABLE my_table;