Skip to main content
Version: Preview-4.1

Iceberg Procedures

StarRocks Iceberg Catalog supports a variety of procedures for managing Iceberg tables, including snapshot management, branch management, data maintenance, metadata management, and table management.

You must have the appropriate privileges to execute procedures. For more information about privileges, see Privileges.

Snapshot management

Rollback to snapshot

Rolls back the table to a specific snapshot. This operation sets the table's current snapshot to the specified snapshot ID.

rollback_to_snapshot Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE rollback_to_snapshot(<snapshot_id>)

Parameters

snapshot_id: ID of the snapshot to which you want to roll back the table.

Example

Roll back the table to snapshot with ID 98765:

ALTER TABLE iceberg.sales.order
EXECUTE rollback_to_snapshot(98765);

Cherry pick a snapshot

Cherry picks a specific snapshot and applies it to the current state of the table. This operation creates a new snapshot based on an existing snapshot, while the original snapshot remains unchanged.

cherrypick_snapshot Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE cherrypick_snapshot(<snapshot_id>)

Parameters

snapshot_id: ID of the snapshot which you want to cherry pick.

Example

ALTER TABLE iceberg.sales.order
EXECUTE cherrypick_snapshot(54321);

Branch management

Fast forward a branch to another

Fast-forwards one branch to another branch's latest snapshot. This operation updates the source branch's snapshot to match the target branch's snapshot.

fast_forward Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE fast_forward('<from_branch>', '<to_branch>')

Parameters

  • from_branch: The branch you want to fast forward. Wrap the branch name in quotes.
  • to_branch: The branch to which you want to fast forward the from_branch. Wrap the branch name in quotes.

Example

Fast forward the main branch to the branch test-branch:

ALTER TABLE iceberg.sales.order
EXECUTE fast_forward('main', 'test-branch');

Data maintenance

Rewrite data files

Rewrites data files to optimize file layout. This procedure merges small files to improve query performance and reduce metadata overhead.

rewrite_data_files Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_data_files
("key"=value [,"key"=value, ...])
[WHERE <predicate>]

Parameters

rewrite_data_files properties

"key"=value pairs that declare the manual compaction behaviors. Note that you need to wrap the key in double quotes.

min_file_size_bytes
  • Description: The upper limit of a small data file. Data files whose size is less than this value will be merged during the compaction.
  • Unit: Byte
  • Type: Int
  • Default: 268,435,456 (256 MB)
batch_size
  • Description: The maximum size of data that can be processed in each batch.
  • Unit: Byte
  • Type: Int
  • Default: 10,737,418,240 (10 GB)
rewrite_all
  • Description: Whether to rewrite all data files during the compaction, ignoring the parameters that filter data files with specific requirements.
  • Unit: -
  • Type: Boolean
  • Default: false
batch_parallelism
  • Description: The number of parallel batches to process during the compaction.
  • Unit: -
  • Type: Int
  • Default: 1
WHERE clause
  • Description: The filter predicate used to specify the partition(s) to be involved in the compaction.

Example

The following example performs manual Compaction on specific partitions in the Iceberg table t1. The partitions are represented by the clause WHERE part_col = 'p1'. In these partitions, data files that are smaller than 134,217,728 bytes (128 MB) will be merged during the Compaction.

ALTER TABLE t1 EXECUTE rewrite_data_files("min_file_size_bytes"= 134217728) WHERE part_col = 'p1';

Metadata management

Expire snapshots

Expires snapshots older than a specific timestamp. This operation deletes the data files of the expired snapshots, helping to manage storage usage.

expire_snapshots Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE expire_snapshots(
[ [older_than =] '<datetime>' ] [, [retain_last =] <int> ]
)

Parameters

older_than
  • Description: The timestamp before which snapshots will be removed. If not specified, files older than 5 days (from the current time) will be removed by default. Format: 'YYYY-MM-DD HH:MM:SS'.
  • Type: DATETIME
  • Required: No
retain_last
  • Description: The maximum number of most recent snapshots to retain. The less recent snapshots will be removed when this threshold is reached. If not specified, only one snapshot will be retained by default.
  • Type: Integer
  • Required: No

Example

Expire snapshots before '2023-12-17 00:14:38' and retain two snapshots:

-- With the parameter key specified:
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots(older_than = '2023-12-17 00:14:38', retain_last = 2);

-- With the parameter key unspecified:
ALTER TABLE iceberg.sales.order
EXECUTE expire_snapshots('2023-12-17 00:14:38', 2);

Remove orphan files

Removes orphan files from the table that are not referenced by any valid snapshot and are older than a specified timestamp. This operation helps clean up unused files and reclaim storage space.

remove_orphan_files Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE remove_orphan_files(
[ [older_than =] '<datetime>' ] [, [location =] '<string>' ]
)

Parameters

older_than
  • Description: The timestamp before which orphan files will be removed. If not specified, files older than 7 days (from the current time) will be removed by default. Format: 'YYYY-MM-DD HH:MM:SS'.
  • Type: DATETIME
  • Required: No
location
  • Description: The directory from which you want to remove orphan files. It must be a sub-directory of the table location. If not specified, the table location will be used by default.
  • Type: STRING
  • Required: No

Example

Remove orphan files older than '2024-01-01 00:00:00' from the sub-directory sub_dir of the table location:

-- With the parameter key specified:
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files(older_than = '2024-01-01 00:00:00', location = 's3://iceberg-bucket/iceberg_db/iceberg_table/sub_dir');

-- With the parameter key unspecified:
ALTER TABLE iceberg.sales.order
EXECUTE remove_orphan_files('2024-01-01 00:00:00', 's3://bucket-test/iceberg_db/iceberg_table/sub_dir');

Rewrite manifests

Rewrites data manifest files and merges them by partition to avoid performance degradation caused by excessive small manifests.

note

This operation rewrites the data manifests of the current snapshot only.

rewrite_manifests Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE rewrite_manifests()

Example

Rewrite the data manifests of the current snapshot:

ALTER TABLE iceberg.sales.order
EXECUTE rewrite_manifests()

Table management

Add files

Adds data files to an Iceberg table from either a source table or a specific location. This procedure supports Parquet and ORC file formats.

add_files Syntax

ALTER TABLE [catalog.][database.]table_name
EXECUTE add_files(
[source_table = '<source_table>' | location = '<location>', file_format = '<format>']
[, recursive = <boolean>]
)

Parameters

Either source_table or location must be provided, but not both.

source_table
  • Description: The source table from which to add files. Format: 'catalog.database.table'.
  • Type: String
  • Required: No
location
  • Description: The directory path or file path from which to add files.
  • Type: String
  • Required: No
file_format
  • Description: The format of the data files. Supported values: 'parquet', 'orc'.
  • Type: String
  • Required: No (required when using location)
recursive
  • Description: Whether to recursively scan subdirectories when adding files from a location.
  • Type: Boolean
  • Default: true
  • Required: No

Example

Add files from a source table:

ALTER TABLE iceberg.sales.order
EXECUTE add_files(source_table = 'hive_catalog.sales.source_order');

Add files from a specific location with Parquet format:

ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/', file_format = 'parquet', recursive = true);

Add files from a single file:

ALTER TABLE iceberg.sales.order
EXECUTE add_files(location = 's3://bucket/data/order/data.parquet', file_format = 'parquet');

Register table

Registers an Iceberg table using a metadata file. This procedure allows you to add an existing Iceberg table to the catalog without migrating data.

register_table Syntax

CALL [catalog.]system.register_table(
database_name = '<database_name>',
table_name = '<table_name>',
metadata_file = '<metadata_file_path>'
)

Parameters

database_name
  • Description: The name of the database in which to register the table.
  • Type: String
  • Required: Yes
table_name
  • Description: The name of the table to register.
  • Type: String
  • Required: Yes
metadata_file
  • Description: The path to the Iceberg table metadata file (e.g., metadata.json).
  • Type: String
  • Required: Yes

Example

Register a table using a metadata file:

CALL iceberg_catalog.system.register_table(
database_name = 'sales',
table_name = 'order',
metadata_file = 's3://bucket/metadata/sales/order/metadata/00001-xxxxx-xxxxx-xxxxx.metadata.json'
);

Or use the current catalog:

CALL system.register_table(
database_name = 'sales',
table_name = 'order',
metadata_file = 's3://bucket/metadata/sales/order/metadata/00001-xxxxx-xxxxx-xxxxx.metadata.json'
);