Mar 18, 2019

MSCK in Presto

Presto release 304 contains new procedure system.sync_partition_metadata() developed by @luohao.  This is similar to hive's MSCK REPAIR TABLE.

Document about Hive Connector Procedures is https://prestosql.io/docs/current/connector/hive.html#procedures

The syntax is `system.sync_partition_metadata(schema_name, table_name, mode)`. The supported mode are add, drop and full. Example query is

  • call system.sync_partition_metadata('default', 'test_partition', 'add');
  • call system.sync_partition_metadata('default', 'test_partition', 'drop');
  • call system.sync_partition_metadata('default', 'test_partition', 'full');

# Mode DROP
hive> create table default.test_partition (c1 int) partitioned by (dt string);
hive> insert overwrite table default.test_partition partition(dt = '20190101') values (1);
hive> dfs -mv hdfs://hadoop-master:9000/user/hive/warehouse/test_partition/dt=20190101 /tmp/;
hive> show partitions default.test_partition;
dt=20190101

presto> use hive.default;
presto> call system.sync_partition_metadata('default', 'test_partition', 'drop');

hive> show partitions default.test_partition;
→ Empty

# Mode ADD
hive> dfs -mv /tmp/dt=20190101 hdfs://hadoop-master:9000/user/hive/warehouse/test_partition/;

presto> call system.sync_partition_metadata('default', 'test_partition', 'add');

hive>  show partitions default.test_partition;
dt=20190101

# Mode FULL performs both DROP and ADD. 

8 comments:

  1. Hi Yuya, I have been exactly looking for this particular solution. But when I call system.sync_partition_metadata(myschema, table_name, ADD), I am getting error "Procedure not registered: hive.system.sync_partition_metadata". Could you share any pointer on how to register the system procedures.

    ReplyDelete
    Replies
    1. What Presto version are you using?

      Delete
    2. It's quite old version. You need to upgrade version >= 304.

      Delete
    3. Let me know if there is a way for the older versions. Any way thank you very much for the prompt response.

      Delete
    4. The workaround is implementing the procedure by yourself on 0.151-dirty codebase.

      Delete
  2. This comment has been removed by the author.

    ReplyDelete