Dec 23, 2018

Implement Presto Flex Connector

While playing with Apache Drill, I found it useful to be able to analyze local CSV and Parquet files. So I developed a Presto connector that supports accessing local files. The source code is available in its GitHub repository.

As you may already know, Presto addresses a table as `catalog.schema.table`. The connector identifies the file type from the schema name. The currently supported types are `csv`, `tsv`, `txt`, `raw` and `excel`. All types except `raw` return multiple rows when the file contains line breaks (EOL). The `raw` type always returns the whole file as a single row with a single column. I imagine the `raw` type could be used, for example, when converting a file to JSON.
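To make the schema-based dispatch concrete, here is a minimal Java sketch. It is not the connector's actual code: the class `FlexParseSketch`, its `parse` method, and the per-type rules are assumptions. It assumes `csv` splits each line on commas, `tsv` on tabs, `txt` keeps each line as a single column, and `raw` keeps the entire content as one value. Excel is left out because it needs a spreadsheet library, and CSV quoting is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FlexParseSketch {
    // Hypothetical helper, not the connector's real API: converts file content
    // into rows based only on the schema name (the file extension is never consulted).
    static List<List<String>> parse(String schema, String content) {
        List<List<String>> rows = new ArrayList<>();
        if (schema.equals("raw")) {
            // raw: the whole content becomes one row with one column
            rows.add(Collections.singletonList(content));
            return rows;
        }
        // Every other type produces one row per line (LF or CRLF)
        for (String line : content.split("\r?\n")) {
            switch (schema) {
                case "csv":
                    // Naive split on commas; quoted fields are not handled in this sketch
                    rows.add(Arrays.asList(line.split(",", -1)));
                    break;
                case "tsv":
                    rows.add(Arrays.asList(line.split("\t", -1)));
                    break;
                case "txt":
                    // Assumed: each line is kept as a single column
                    rows.add(Collections.singletonList(line));
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported schema in this sketch: " + schema);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String content = "1,one\t2\n3,three\t4\n";
        System.out.println(parse("csv", content).get(0)); // first row, split on commas
        System.out.println(parse("tsv", content).get(0)); // first row, split on tabs
        System.out.println(parse("raw", content).size()); // prints 1
    }
}
```

Because `parse` looks only at the schema name, the same content yields different rows under `csv` and `tsv`, just as the connector ignores the file extension.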

The `table` name is the file's URI, written as a double-quoted identifier because it contains characters such as `:` and `/`. You can access both local and remote files:
"file:///tmp/numbers.csv"
"https://raw.githubusercontent.com/ebyhr/presto-flex/master/src/test/resources/example-data/numbers.tsv"
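Since the table name is a URI, local and remote files can in principle be read through one code path. Below is a minimal Java sketch under that assumption: the name is handed straight to `java.net.URI` and `java.net.URL`, whose built-in handlers cover both `file:` and `https:`. The class `FlexFetchSketch` and the method `readTable` are hypothetical, and the real connector may fetch files differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlexFetchSketch {
    // Hypothetical helper, not the connector's real API: treats the table name as a URI
    // and reads it through java.net.URL, which handles file: and https: schemes.
    static String readTable(String tableName) throws IOException {
        try (InputStream in = URI.create(tableName).toURL().openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            // Assumes the file is UTF-8 encoded
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small local file so the example does not depend on /tmp/numbers.csv existing
        Path tmp = Files.createTempFile("numbers", ".csv");
        Files.write(tmp, "1,one\n2,two\n".getBytes(StandardCharsets.UTF_8));
        // tmp.toUri() yields a file:/// URI, the same form as "file:///tmp/numbers.csv"
        System.out.print(readTable(tmp.toUri().toString()));
        Files.delete(tmp);
    }
}
```

Passing the `numbers.tsv` URL above to `readTable` would download that file the same way, provided the machine has network access.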

Here is a complete query example. The `csv` schema and the `.csv` extension of `file:///tmp/numbers.csv` are unrelated: the format is determined by the schema alone. Therefore, if you change the schema to `tsv`, the same file is returned split by tabs.

select
  *
from
  flex.csv."file:///tmp/numbers.csv"
;

Presto currently doesn't have an importer like PostgreSQL's COPY. This connector may be useful in such cases 🍭