About Me

Software Engineer at Starburst. Maintainer at Trino. Previously at LINE, Teradata, HPE.

2018-12-23

Trino Storage Connector

When I played with Apache Drill, I found it useful for analyzing local CSV and Parquet files. So I developed a Trino connector that supports accessing local files. The code is in a GitHub repository.

As you may already know, Trino addresses a table with a name like `catalog.schema.table`. The connector identifies the file type from the schema. The currently supported types are csv, tsv, txt, raw and excel. All types except raw return multiple rows if the file contains line breaks. The raw type returns the whole file as 1 column and 1 record. I assume the raw type can be used for converting a JSON file.

The `table` name is the file location, quoted as shown below. You can access both local and remote files.
"file:///tmp/numbers.csv"
"https://raw.githubusercontent.com/ebyhr/trino-storage/master/src/test/resources/example-data/numbers.tsv"

Here is the entire query example. The `csv` in the schema and the extension of `file:///tmp/numbers.csv` are unrelated: the connector decides how to parse the file from the schema only. Therefore, if you change the schema to `tsv`, it returns the result split by tab.

select
  *
from
  storage.csv."file:///tmp/numbers.csv"
;
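
To make that concrete, here is a small sketch of two variations. It assumes the catalog is named `storage`, as in the query above. The first statement reads the same local file through the `tsv` schema, so each line is split on tabs instead of commas. The second reads the remote TSV file listed earlier.

```sql
-- Same local file, but parsed as tab-separated because the schema is tsv
select
  *
from
  storage.tsv."file:///tmp/numbers.csv"
;

-- Remote file over HTTPS, parsed as tab-separated
select
  *
from
  storage.tsv."https://raw.githubusercontent.com/ebyhr/trino-storage/master/src/test/resources/example-data/numbers.tsv"
;
```

Since the extension is ignored, the double-quoted table name can point at any path or URL the Trino server can reach.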

Trino currently doesn't have an importer like PostgreSQL's COPY. This connector may be useful in such cases🍭
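
As a rough sketch of that import use case, a `create table ... as select` statement can copy the file contents into a table in another catalog. The target `hive.default.numbers` is a hypothetical name used only for illustration. It assumes you have a catalog that supports writes, and that the catalog of this connector is named `storage`.

```sql
-- Copy the rows of a local CSV file into a table in a writable catalog.
-- hive.default.numbers is a hypothetical target, not something the connector provides.
create table hive.default.numbers as
select
  *
from
  storage.csv."file:///tmp/numbers.csv"
;
```

Depending on the column types the connector reports, you may want to cast the columns in the `select` so the target table ends up with the types you expect.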