
2018-10-04

Bulk Insert to Teradata using Python


This snippet bulk-loads a CSV file into Teradata from Python. The newer teradatasql driver was released recently, but this code uses PyTd (a rough teradatasql equivalent is sketched after the snippet). If you haven't set up PyTd yet, install it with `pip install teradata`.

import csv

import teradata

# Connection details for "tdpid" come from the UdaExec configuration (udaexec.ini).
udaExec = teradata.UdaExec()
session = udaExec.connect("tdpid")

with open("testExecuteManyBatch.csv") as f:
    data = list(csv.reader(f))

batchsize = 10000
for num in range(0, len(data), batchsize):
    # batch=True sends each slice as a single batched request.
    session.executemany(
        "insert into testExecuteManyBatch values (?, ?, ?, ?)",
        data[num:num + batchsize],
        batch=True,
    )

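For reference, here is a minimal sketch of the same batch insert using the newer teradatasql driver mentioned above. The host, user, and password values are placeholders, and the code follows the standard DB API pattern rather than anything specific to this post.

import csv

import teradatasql

# Placeholder credentials; teradatasql speaks the native protocol, so no ODBC DSN is needed.
with teradatasql.connect(host="tdhost", user="dbc", password="dbc") as con:
    with con.cursor() as cur:
        with open("testExecuteManyBatch.csv") as f:
            rows = list(csv.reader(f))
        batchsize = 10000
        for num in range(0, len(rows), batchsize):
            # executemany sends each slice as one batched request.
            cur.executemany(
                "insert into testExecuteManyBatch values (?, ?, ?, ?)",
                rows[num:num + batchsize],
            )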

Returning to the PyTd snippet: the key points are batch=True and choosing a reasonable batchsize. If you don't set a batchsize, or set it too large, the insert fails (I forget the exact error message). In my single-node environment the throughput was about 10,000 rows/sec on a 4-column table. That is fine for tens of thousands of rows, but larger loads should go through FastLoad or MultiLoad (MLOAD); a hedged FastLoad sketch follows.
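If the data really is too big for plain batched inserts, one option I have not benchmarked here is the FastLoad support in the teradatasql driver. The sketch below follows the pattern from the teradatasql documentation as I understand it: the {fn teradata_require_fastload} escape marks the insert for FastLoad, the target table must be empty, and autocommit is turned off so the whole load commits as one transaction. Host and credentials are placeholders; treat the escape-function details as assumptions to verify against the driver docs.

import csv

import teradatasql

# Placeholder connection details; the target table must be empty for FastLoad.
with teradatasql.connect(host="tdhost", user="dbc", password="dbc") as con:
    with con.cursor() as cur:
        with open("testExecuteManyBatch.csv") as f:
            rows = list(csv.reader(f))
        # Turn autocommit off so the whole FastLoad runs in one transaction.
        cur.execute("{fn teradata_nativesql}{fn teradata_autocommit_off}")
        cur.executemany(
            "{fn teradata_require_fastload}"
            "insert into testExecuteManyBatch values (?, ?, ?, ?)",
            rows,
        )
        con.commit()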