Trino version 411 introduced the 'migrate' procedure in the Iceberg connector. This procedure converts existing Hive tables in ORC, Parquet, or Avro format to Iceberg tables. This article explains the details of the procedure. If you currently run a CREATE TABLE AS SELECT statement to convert Hive tables to Iceberg, I would recommend trying this procedure instead. It is much faster because it doesn't rewrite data files.
The procedure accepts three arguments: schema_name, table_name, and an optional recursive_directory. The possible values for the recursive_directory argument are true, false, and fail. The default value is fail, which throws an exception if a nested directory exists under the table or partition location.
CALL iceberg.system.migrate(schema_name => 'testdb', table_name => 'customer_orders', recursive_directory => 'true');
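For comparison, the rewrite-based conversion mentioned above might look like the following sketch (the hive and iceberg catalog names and the new table name are assumptions for illustration). Every row is read and written again, which is why this approach is so much slower than the migrate procedure on large tables:

```sql
-- Rewrite-based alternative: copies all data files instead of reusing them
CREATE TABLE iceberg.testdb.customer_orders_new
AS SELECT * FROM hive.testdb.customer_orders;
```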
Next, let me explain the details of the implementation. The procedure performs the following steps:
- Generate an Iceberg schema object from the Hive table definition
- Iterate over the files under the table or partition location to create Iceberg metadata files
- Build Iceberg Metrics for each data file
- Build an Iceberg DataFile for each data file
- Update the table definition in the metastore
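After the procedure finishes, one way to confirm the result is to inspect the Iceberg metadata through the connector's metadata tables; for example, the $files table exposes the per-file statistics that were generated during migration (catalog, schema, and table names below follow the earlier example):

```sql
-- Inspect the generated metadata; record_count and file_size_in_bytes
-- come from the metrics built for each data file during migration
SELECT file_path, record_count, file_size_in_bytes
FROM iceberg.testdb."customer_orders$files";
```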
All of the logic lives in MigrateProcedure.java if you want to check the code.
Limitations
- The procedure scans files sequentially. If the target table has a lot of files, the migration will take a long time to complete.
- The PR to migrate Delta Lake tables to Iceberg is in progress: https://github.com/trinodb/trino/pull/17131. There's no plan to support other formats (e.g. Hudi) at this time.