About Me

Software Engineer at Starburst. Maintainer at Trino. Previously at LINE, Teradata, HPE.

2018-12-25

Try HDP 3.0.1

Started using HDP 3.0.1. The last version I used was 2.6.5, so I found many changes: a new Ambari design, the hive command now starts Beeline, and so on.


One major change for me: Hive View has been removed since HDP 3.0.
https://community.hortonworks.com/questions/202444/where-is-hive-view-on-hdp-3.html

I tried Data Analytics Studio instead.

You can open the page via
Ambari → Services → Data Analytics Studio  → Data Analytics Studio UI
http://sandbox-hdp.hortonworks.com:30800

From the name I guessed it was a BI tool, but it seems to be more of a DBA tool. It has four components.

1. Queries
You can see the query history, and the Recommendations tab gives suggested changes. Each query's detail page has tabs for Query Details, Visual Explain, Configs, Timeline, and DAG Info.

2. Compose
This is almost the same as the classic Hive View, and auto-suggest is available.

3. Database
The page has tabs for Columns, Partitions, Storage Information, Detailed Information, Statistics, and Data Preview. The Partitions tab shows the column name, column type, and a comment if the table is partitioned; I'd also like to see the number of partitions, their sizes, and so on.

4. Reports
This page visualizes lineage from the query history. It looks really useful. There are two reports: the Read and Write Report and the Join Report.

2018-12-23

Trino Storage Connector

When I played with Apache Drill, I felt it was useful for analyzing local CSV and Parquet files, so I developed a Trino connector that supports accessing local files. The GitHub repository is https://github.com/ebyhr/trino-storage.

As you may already know, Trino specifies a table name like `catalog.schema.table`. The connector identifies the file type from the schema name. The currently supported types are csv, tsv, txt, raw, and excel. All types except raw return multiple rows if the file contains line breaks; the raw type returns one column and one record. I assume the raw type can be used for converting whole files to JSON.

The `table` name looks like the examples below. You can access both local and remote files.
"file:///tmp/numbers.csv"
"https://raw.githubusercontent.com/ebyhr/trino-storage/master/src/test/resources/example-data/numbers.tsv"

Here is a complete example query. The `csv` schema and the extension of `file:///tmp/numbers.csv` are unrelated; the schema alone determines the parsing. Therefore, if you change the schema to `tsv`, it returns the result split by tabs.

select
  *
from
  storage.csv."file:///tmp/numbers.csv"
;
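
By the way, the raw schema returns the whole file as one row and one column. Here's a sketch under the same assumptions as above (the file path is just the example file again):

select
  *
from
  storage.raw."file:///tmp/numbers.csv"
;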

Trino currently doesn't have an importer like PostgreSQL's COPY. This connector may be useful in such cases🍭

2018-12-08

AES CBC ISO7816 and SHA256 on Java

This is a Java snippet using AES/CBC/ISO7816-4Padding and SHA-256. javax.crypto doesn't support ISO7816 padding, so I use org.bouncycastle.

The group ID and artifact ID are below; my sample code uses version 1.52.
groupId: org.bouncycastle
artifactId: bcprov-jdk15on
version: 1.52
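
In a Maven pom.xml, that dependency is declared like this (just the coordinates above in XML form; the snippet below also uses commons-codec for DigestUtils):

<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcprov-jdk15on</artifactId>
    <version>1.52</version>
</dependency>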

import java.security.Security;

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

import org.apache.commons.codec.digest.DigestUtils;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

public final class Encryptor {

    // 128-bit AES key and initialization vector (sample values)
    private static final byte[] KEY = "0123456789abcdef".getBytes();
    private static final byte[] INIT_VECTOR = "abcdef0123456789".getBytes();

    static {
        // Register Bouncy Castle as the highest-priority provider so that
        // ISO7816-4 padding is available
        Security.insertProviderAt(new BouncyCastleProvider(), 1);
    }

    private static String encrypt(String value)
            throws Exception
    {
        SecretKeySpec keySpec = new SecretKeySpec(KEY, "AES");
        Cipher cipher = Cipher.getInstance("AES/CBC/ISO7816-4Padding");
        cipher.init(Cipher.ENCRYPT_MODE, keySpec, new IvParameterSpec(INIT_VECTOR));
        byte[] encrypted = cipher.doFinal(value.getBytes());
        // Return the SHA-256 digest (hex) of the encrypted bytes
        return DigestUtils.sha256Hex(encrypted);
    }
}
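
A quick usage sketch (hypothetical: it assumes encrypt is made public so it can be called from another class):

public class EncryptorDemo {
    public static void main(String[] args)
            throws Exception
    {
        // Prints a 64-character hex string: the SHA-256 digest of the AES-encrypted input
        System.out.println(Encryptor.encrypt("hello world"));
    }
}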

2018-12-07

Try HBase Java Client

This is an HBase Java test example. My repository is here (https://github.com/ebyhr/hbase-embed). I've changed the log level to INFO because the default log level is a little noisy.

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.TableNotDisabledException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

class HBaseTest {
    private static final HBaseTestingUtility HBASE = new HBaseTestingUtility();
    private static final byte[] TABLE_NAME = Bytes.toBytes("EMPLOYEE");
    private static final byte[] CF = Bytes.toBytes("cf1");
    private static final List<People> PEOPLES = Arrays.asList(
            new People("山田", 20),
            new People("영수", 33),
            new People("John", 18)
    );

    @BeforeAll
    static void setUp()
            throws Exception
    {
        // Start an in-process ZooKeeper quorum and HBase mini cluster
        HBASE.startMiniZKCluster();
        HBASE.startMiniCluster();

        HTable table = HBASE.createTable(TABLE_NAME, CF);
        // Insert one row per person, keyed by "rowkey<index>"
        for (int i = 0; i < PEOPLES.size(); i++) {
            People people = PEOPLES.get(i);
            String rowKey = "rowkey" + i;
            Put put = new Put(Bytes.toBytes(rowKey));
            put.add(CF, Bytes.toBytes("name"), Bytes.toBytes(people.name));
            put.add(CF, Bytes.toBytes("age"), Bytes.toBytes(people.age));
            table.put(put);
        }
    }

    @AfterAll
    static void tearDown()
            throws Exception
    {
        // Shut down the mini cluster before removing the test directory
        HBASE.shutdownMiniCluster();
        HBASE.cleanupTestDir();
    }

    @Test
    void testAccessTable()
            throws IOException
    {
        HTable table = new HTable(HBASE.getConfiguration(), TABLE_NAME);
        String rowKey = "rowkey1";
        Result getResult = table.get(new Get(Bytes.toBytes(rowKey)));
        assertEquals("영수", Bytes.toString(getResult.getValue(CF, Bytes.toBytes("name"))));
        assertEquals(33, Bytes.toInt(getResult.getValue(CF, Bytes.toBytes("age"))));
    }

    @Test
    void testDrop()
            throws Exception
    {
        HTable table = HBASE.createTable(Bytes.toBytes("TMP-TABLE1"), CF);
        HBaseAdmin admin = HBASE.getHBaseAdmin();
        // Deleting an enabled table fails; it must be disabled first
        assertThrows(TableNotDisabledException.class, () -> admin.deleteTable(table.getTableName()));
        admin.disableTable(table.getTableName());
        admin.deleteTable(table.getTableName());
    }

    private static class People {
        private String name;
        private int age;

        private People(String name, int age)
        {
            this.name = name;
            this.age = age;
        }
    }
}