@ebyhr: 2023

I watched the keynote of Trino Fest, held on June 14 and 15, titled Trino for lakehouses, data oceans, and beyond, and this is a summary of it. The video is available at the link below.

By the numbers

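16 releases since November 2022
About 2,250 commits so far in 2023
More than 660 contributors in total
More than 9,900 Slack members at present
More than 20,000 community members from over 1,800 companies
db-engines ranking moved up from 96th to 69th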

New maintainers

Two new maintainers have been added: James Petty of AWS and Manfred Moser of Starburst 🎉 The full list of maintainers is at https://trino.io/development/roles.html#maintainers.

Table function improvements

The following table functions have been newly added.

exclude_columns

Returns the result of a SELECT statement with only the specified columns excluded.

sequence

Generates and returns the values in a specified range. The existing scalar function was limited to at most 10,000 entries, but the table function removes that limit.

query/raw_query

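Takes a query to run on the remote system as a string and returns its result. The query table function is supported by the JDBC-based connectors as well as BigQuery, Cassandra, and MongoDB. The raw_query table function is supported by the Elasticsearch connector.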

procedure

Executes a stored procedure in SQL Server.

Fault-tolerant execution

Performance has improved, and HDFS is now also supported as storage. Support was added for the MongoDB, BigQuery, Redshift, and Oracle connectors.

Schema evolution, (meta)data, and tools

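ALTER COLUMN … SET DATA TYPE was added as new syntax
More connectors now support ALTER TABLE … RENAME COLUMN
ALTER TABLE ... DROP COLUMN can now drop a field inside a ROW type
The Hudi connector gained the $timeline metadata table
The Delta Lake connector gained the table_changes table function, which returns the Change Data Feed
The Iceberg connector gained REST, JDBC, and Nessie catalogs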

Lakehouse migration - table procedures

migrate

A procedure that converts a Hive table to an Iceberg table without rewriting files.

register_table / unregister_table

Procedures that register Iceberg and Delta Lake tables in the metastore, or remove them from the metastore. DROP TABLE deletes the files, but unregister_table does not.
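To make the difference between unregister_table and DROP TABLE concrete, here is a minimal toy model in Python. It is not Trino code: the metastore and storage dictionaries, the function bodies, and the s3://bucket/orders location are all hypothetical stand-ins, used only to show which side (metastore entry or files) each operation touches.

```python
# Toy model, not Trino code. A "metastore" maps table names to locations,
# and "storage" maps locations to the files kept there. All names are hypothetical.
metastore = {}
storage = {"s3://bucket/orders": ["data-001.parquet", "metadata/v1.json"]}


def register_table(name, location):
    """Point the metastore at table files that already exist."""
    if location not in storage:
        raise ValueError(f"no table files at {location}")
    metastore[name] = location


def unregister_table(name):
    """Remove only the metastore entry; the files stay where they are."""
    del metastore[name]


def drop_table(name):
    """Remove the metastore entry and delete the table's files."""
    location = metastore.pop(name)
    del storage[location]


# Unregistering leaves the files in place, so the table can be registered again.
register_table("orders", "s3://bucket/orders")
unregister_table("orders")
files_after_unregister = storage.get("s3://bucket/orders")

# Dropping removes both the metastore entry and the files.
register_table("orders", "s3://bucket/orders")
drop_table("orders")
files_after_drop = storage.get("s3://bucket/orders")
```

In this sketch the table can be registered a second time only because unregister_table left its files untouched, which is the property that makes the pair useful for moving a table between metastores.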

Tons of performance improvements

There are too many to list, so I'm attaching a screenshot 🐰

Tracing with OpenTelemetry

Work is under way to improve observability with OpenTelemetry. The attached image shows a flame graph in Datadog.

Client tool news

The Python client continues to be actively developed, with support for SQLAlchemy 2.0, EXECUTE IMMEDIATE, and more. Support for dbt Cloud was also announced recently.

Roadmap

The roadmap ahead includes tasks such as the following.

Support for SQL 2023 features around JSON and numeric literals (e.g. 0xFFFF, 1_000_000)
Adding the json_table function
Adding a Snowflake connector
Java 21 support
Project Hummingbird

Trino: The Definitive Guide

The 2nd edition has been released. You can download the PDF for free from Starburst 🐸 https://www.starburst.io/info/oreilly-trino-guide/

Getting involved

To join the community Slack, go to https://trino.io/slack.html
For contributions, the process is described at https://trino.io/development. GitHub issues carry a good first issue label.

Trino version 411 introduced the migrate procedure in the Iceberg connector. This procedure converts existing Hive tables in ORC, Parquet, or Avro format to Iceberg tables. This article explains the details of the procedure. If you currently run a CREATE TABLE AS SELECT statement to convert Hive tables to Iceberg, I recommend trying this procedure instead. It is much faster because it doesn't rewrite files.

The procedure accepts three arguments: schema_name, table_name, and the optional recursive_directory. The possible values for recursive_directory are true, false, and fail. The default is fail, which throws an exception if a nested directory exists under the table or partition location.

CALL iceberg.system.migrate(schema_name => 'testdb', table_name => 'customer_orders', recursive_directory => 'true');
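The three recursive_directory values can be illustrated with a small Python sketch. This is a toy model, not the connector's code: the in-memory TREE layout and the list_data_files helper are hypothetical, and the behavior shown for false (nested directories are skipped) is my understanding of the option rather than something spelled out above.

```python
# Hypothetical in-memory layout: directory -> entries.
# Entries ending in "/" are subdirectories; everything else is a data file.
TREE = {
    "warehouse/customer_orders/": ["part-0.orc", "nested/"],
    "warehouse/customer_orders/nested/": ["part-1.orc"],
}


def list_data_files(tree, location, recursive_directory="fail"):
    """Collect data files under location according to the chosen mode."""
    if recursive_directory not in ("true", "false", "fail"):
        raise ValueError(f"unsupported value: {recursive_directory}")
    files = []
    for entry in tree[location]:
        if entry.endswith("/"):
            if recursive_directory == "fail":
                # Default mode: any nested directory aborts the migration.
                raise RuntimeError(f"nested directory found: {location}{entry}")
            if recursive_directory == "true":
                # Descend and include files from the nested directory.
                files.extend(list_data_files(tree, location + entry, "true"))
            # "false" (assumed): skip the nested directory and its files.
        else:
            files.append(location + entry)
    return files


with_nested = list_data_files(TREE, "warehouse/customer_orders/", "true")
without_nested = list_data_files(TREE, "warehouse/customer_orders/", "false")
```

With the default fail mode the same call raises instead of returning, which is a safe default because it never silently leaves data files out of the migrated table.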

Let me explain the details of the implementation next.

Generate Iceberg schema object based on Hive table definition
Iterate over the table or partition location to create Iceberg metadata files

Build Iceberg Metrics
Build Iceberg DataFile

Update the table definition in the metastore
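The steps above can be sketched roughly as follows. This is a simplified Python model and not the logic in MigrateProcedure.java: the type mapping, the DataFile fields, the dictionary-based metastore, and every function name are hypothetical, and the row counts are supplied by hand where a real implementation would have to obtain metrics from the files themselves.

```python
from dataclasses import dataclass

# Hypothetical mapping from Hive type names to Iceberg type names (tiny subset).
HIVE_TO_ICEBERG = {"int": "int", "bigint": "long", "string": "string", "double": "double"}


@dataclass
class DataFile:
    path: str             # existing file, referenced in place
    file_format: str
    record_count: int     # stands in for the Iceberg metrics
    file_size_bytes: int


def build_schema(hive_columns):
    """Step 1: derive an Iceberg-style schema from the Hive column list."""
    return [
        {"id": field_id, "name": name, "type": HIVE_TO_ICEBERG[hive_type]}
        for field_id, (name, hive_type) in enumerate(hive_columns, start=1)
    ]


def build_data_files(files, file_format):
    """Step 2: visit each existing file once and describe it; nothing is rewritten."""
    return [
        DataFile(f["path"], file_format, f["rows"], f["size"])
        for f in files
    ]


def migrate(metastore, table_name):
    """Step 3: replace the Hive definition with an Iceberg one in the metastore."""
    hive_table = metastore[table_name]
    metastore[table_name] = {
        "table_type": "ICEBERG",
        "schema": build_schema(hive_table["columns"]),
        "data_files": build_data_files(hive_table["files"], hive_table["format"]),
    }
    return metastore[table_name]


metastore = {
    "customer_orders": {
        "table_type": "HIVE",
        "format": "ORC",
        "columns": [("order_id", "bigint"), ("customer", "string")],
        "files": [{"path": "warehouse/customer_orders/part-0.orc", "rows": 10, "size": 2048}],
    }
}
migrated = migrate(metastore, "customer_orders")
```

The point of the sketch is that the data file paths are carried over unchanged: only metadata is produced, which is why this approach avoids the cost of rewriting files.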

All logic exists in MigrateProcedure.java if you want to check the code.

Limitation

The procedure scans files sequentially. If the target table has a lot of files, it will take a long time to complete.
The PR to migrate Delta Lake tables to Iceberg is in progress: https://github.com/trinodb/trino/pull/17131. There is no plan to support other formats (e.g. Hudi) at this time.

2023-06-21

Trino Fest Keynote: Trino for lakehouses, data oceans, and beyond