Mar 18, 2019

MSCK in Presto

Presto release 304 contains new procedure system.sync_partition_metadata() developed by @luohao.  This is similar to hive's MSCK REPAIR TABLE.

Document about Hive Connector Procedures is https://prestosql.io/docs/current/connector/hive.html#procedures

The syntax is `system.sync_partition_metadata(schema_name, table_name, mode)`. The supported mode are add, drop and full. Example query is

  • call system.sync_partition_metadata('default', 'test_partition', 'add');
  • call system.sync_partition_metadata('default', 'test_partition', 'drop');
  • call system.sync_partition_metadata('default', 'test_partition', 'full');

# Mode DROP
hive> create table default.test_partition (c1 int) partitioned by (dt string);
hive> insert overwrite table default.test_partition partition(dt = '20190101') values (1);
hive> dfs -mv hdfs://hadoop-master:9000/user/hive/warehouse/test_partition/dt=20190101 /tmp/;
hive> show partitions default.test_partition;
dt=20190101

presto> use hive.default;
presto> call system.sync_partition_metadata('default', 'test_partition', 'drop');

hive> show partitions default.test_partition;
→ Empty

# Mode ADD
hive> dfs -mv /tmp/dt=20190101 hdfs://hadoop-master:9000/user/hive/warehouse/test_partition/;

presto> call system.sync_partition_metadata('default', 'test_partition', 'add');

hive>  show partitions default.test_partition;
dt=20190101

# Mode FULL performs both DROP and ADD. 

No comments:

Post a Comment