When I played with Apache Drill, I found it useful for analyzing local CSV and Parquet files. That inspired me to develop a Trino connector for accessing local files. This is the GitHub repository.
As you may already know, Trino specifies a table name as `catalog.schema.table`. This connector identifies the file type from the schema. The currently supported types are csv, tsv, txt, raw, and excel. All types except raw return multiple rows if the file contains line breaks. The raw type returns one column and one record; I assume it can be useful when converting a whole file, such as JSON, in a single query.
The `table` name looks like the examples below. You can access both local and remote files.
"file:///tmp/numbers.csv"
"https://raw.githubusercontent.com/ebyhr/trino-storage/master/src/test/resources/example-data/numbers.tsv"
Here is a complete query example. The `csv` schema and the `.csv` extension of `file:///tmp/numbers.csv` are unrelated; the schema alone determines how the file is parsed. Therefore, if you change the schema to `tsv`, the result is split by tabs instead.
select *
from storage.csv."file:///tmp/numbers.csv";
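To illustrate that the schema, not the extension, controls parsing, here is a sketch of reading the same path through the `tsv` schema (this only makes sense if the file actually contains tab-separated values):

```sql
-- The schema name controls parsing, so this reads
-- /tmp/numbers.csv as tab-separated data despite its extension.
select *
from storage.tsv."file:///tmp/numbers.csv";
```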
Trino currently doesn't have a bulk importer like PostgreSQL's COPY. This connector may be useful in such cases🍭
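For example, one way to approximate an import is CREATE TABLE AS SELECT from a file into a writable catalog. A sketch, where the `hive` catalog, `default` schema, and `numbers` table name are all hypothetical placeholders for your environment:

```sql
-- Hypothetical writable target; adjust catalog/schema/table to your setup.
create table hive.default.numbers as
select *
from storage.csv."file:///tmp/numbers.csv";
```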