About Me

My photo
Software Engineer at Starburst. Maintainer at Trino. Previously at LINE, Teradata, HPE.


MSCK in Trino

Presto SQL release 304 contains new procedure system.sync_partition_metadata() developed by @luohao.  This is similar to hive's MSCK REPAIR TABLE.

Document about Hive Connector Procedures is https://prestosql.io/docs/current/connector/hive.html#procedures

The syntax is `system.sync_partition_metadata(schema_name, table_name, mode)`. The supported mode are add, drop and full. Example query is

  • call system.sync_partition_metadata('default', 'test_partition', 'add');
  • call system.sync_partition_metadata('default', 'test_partition', 'drop');
  • call system.sync_partition_metadata('default', 'test_partition', 'full');

# Mode DROP
hive> create table default.test_partition (c1 int) partitioned by (dt string);
hive> insert overwrite table default.test_partition partition(dt = '20190101') values (1);
hive> dfs -mv hdfs://hadoop-master:9000/user/hive/warehouse/test_partition/dt=20190101 /tmp/;
hive> show partitions default.test_partition;

presto> use hive.default;
presto> call system.sync_partition_metadata('default', 'test_partition', 'drop');

hive> show partitions default.test_partition;
→ Empty

# Mode ADD
hive> dfs -mv /tmp/dt=20190101 hdfs://hadoop-master:9000/user/hive/warehouse/test_partition/;

presto> call system.sync_partition_metadata('default', 'test_partition', 'add');

hive>  show partitions default.test_partition;

# Mode FULL performs both DROP and ADD.