About Me

Software Engineer at Starburst. Maintainer at Trino. Previously at LINE, Teradata, HPE.


Introduction to Big Data and Teradata Aster

This is a note for the Teradata Aster Basics 6.10 exam, a.k.a. TACP (Teradata Aster Certified Professional).
The recommended courses are listed below; this note covers the second course.
  • Teradata Certification, What’s New and How to Prepare
  • Introduction to Big Data and Teradata Aster
  • Introduction to Teradata Aster Analytics*
  • Introduction to Teradata Aster Database Administrator*
SQL vs SQL-MR: SQL is better for standard transformations. SQL-MR is better for custom transformations (e.g. log extraction).
R creates multiple copies of data during processing and doesn’t automatically run in parallel. Aster R runs in parallel across the Aster MPP architecture.
FSE (Foreign Server Encapsulation): Supports remote data platforms other than Aster and Teradata (e.g. Oracle, Hadoop, DB2, etc.).
QueryGrid Aster-Teradata: Joins tables in Teradata and Aster Database.
QueryGrid Aster-Hadoop: Copies data from Hadoop to Aster and from Aster to Hadoop. HCatalog: Table metastore service for Hive, Pig, and so on.
Deployment Options: Aster Appliance, Cloud, Software Only (RHEL) and Aster on Hadoop.
Data Preparation: IPGeo, Pivot, JsonParser, Apache Log Parser and PSTParserAFS
Aster Analytics Portfolio
  • Data Acquisition
  • Data Preparation
  • Advanced Analytics
  • Visualization
Aster Database
  • Analytic Engine
    • Aster SQL-MR
    • Aster SQL-GR (Based on Bulk Synchronous Processing)
    • Aster R
  • SNAP Framework
    • Integrated Optimizer
    • Integrated Executor
    • Unified SQL Interface
    • Common Storage System and Services
  • Multi-Type Storage
  • AFS(Aster File Store)
Queen: Cluster Coordination, Distributed Query Planning, System Tables
Worker Node: Sends results back to the Queen
Loader: Loading data to Aster
Access Control
  • Aster username/password
  • TD Wallet
  • LDAP
Multi-Version Concurrency Control (MVCC): Eliminates the need for read locks while ensuring that the database maintains the key ACID (Atomicity, Consistency, Isolation, Durability) properties.
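To make the "no read locks" idea concrete, here is a minimal sketch of MVCC in Python. It is a toy model, not how Aster implements it: the class `MVCCStore` and its methods are hypothetical names, and visibility is simplified to "my own writes, or writes by transactions that had committed when I began".

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    value: str
    created_by: int  # id of the transaction that wrote this version


@dataclass
class MVCCStore:
    """Toy multi-version store (hypothetical): writers append, readers never lock."""
    versions: dict = field(default_factory=dict)   # key -> list of Version
    committed: set = field(default_factory=set)    # ids of committed transactions
    snapshots: dict = field(default_factory=dict)  # txn id -> committed ids seen at begin
    next_txn_id: int = 1

    def begin(self) -> int:
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        # Remember which transactions were already committed at start time.
        self.snapshots[txn_id] = frozenset(self.committed)
        return txn_id

    def write(self, txn_id: int, key: str, value: str) -> None:
        # A write never overwrites; it adds a new version next to the old ones.
        self.versions.setdefault(key, []).append(Version(value, txn_id))

    def commit(self, txn_id: int) -> None:
        self.committed.add(txn_id)

    def read(self, txn_id: int, key: str):
        # Walk from newest to oldest and return the first visible version.
        snapshot = self.snapshots[txn_id]
        for version in reversed(self.versions.get(key, [])):
            if version.created_by == txn_id or version.created_by in snapshot:
                return version.value
        return None
```

A reader that began before a writer committed keeps seeing the old version, so it never has to wait for the writer and the writer never has to wait for it.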
Two Level Query Optimization
  • Queen Global Optimizer: Rule Based
  • v-Worker Optimizer: Cost Based. The cost is determined by the demographics of the v-Worker fragment of the distributed data.
Dynamic Workload Management
  • User-based policies
  • Time-based policies
  • Object-based policies
  • IP-based policies
  • Periodic Re-evaluation
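The policy types above can be pictured as predicates over a query's context. The following is only an illustrative sketch: the rule format, the names `QueryContext`, `Rule`, `RULES` and `classify`, and all the example users, tables, hours and networks are assumptions made for this note, not Aster's actual workload configuration.

```python
from dataclasses import dataclass
from datetime import time
from ipaddress import ip_address, ip_network
from typing import Callable


@dataclass(frozen=True)
class QueryContext:
    user: str
    table: str
    client_ip: str
    started_at: time


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[QueryContext], bool]
    priority: str


# Hypothetical rules, one per policy type from the list above.
RULES = [
    Rule("user-based", lambda q: q.user == "etl_batch", "low"),
    Rule("time-based", lambda q: time(9, 0) <= q.started_at < time(18, 0), "high"),
    Rule("object-based", lambda q: q.table == "audit_log", "low"),
    Rule("ip-based", lambda q: ip_address(q.client_ip) in ip_network("10.0.0.0/8"), "medium"),
]


def classify(query: QueryContext, default: str = "normal") -> str:
    """Return the priority of the first rule that matches the query."""
    for rule in RULES:
        if rule.matches(query):
            return rule.priority
    return default
```

In this sketch, periodic re-evaluation would amount to calling `classify` again on a running query at a fixed interval, so that its priority can change while it runs.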
nCluster’s columnar capability is a custom Aster development, not part of PostgreSQL. Columnar tables are append-only (no updates or deletes).
Columnar advantage and limitation
  • Use NOT NULL whenever possible
  • Avoid variable length data
  • Don’t SELECT/ANALYZE any columns unless it is necessary
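Why selecting fewer columns matters is easier to see with a toy column store. This is a sketch under my own assumptions (the class `ColumnarTable` is hypothetical and treats every column as NOT NULL), not Aster's storage format: each column lives in its own list, so a query only touches the columns it names.

```python
class ColumnarTable:
    """Toy append-only column store: one Python list per column."""

    def __init__(self, columns):
        self.columns = {name: [] for name in columns}

    def append(self, row):
        # Append-only: there is deliberately no update or delete method.
        for name in self.columns:
            if row.get(name) is None:
                # Every column is treated as NOT NULL in this sketch.
                raise ValueError(f"column {name!r} must not be null")
        for name, values in self.columns.items():
            values.append(row[name])

    def select(self, *names):
        # Only the requested column lists are read; the others are never touched.
        picked = [self.columns[name] for name in names]
        return list(zip(*picked))


table = ColumnarTable(["id", "url", "status"])
table.append({"id": 1, "url": "/index.html", "status": 200})
table.append({"id": 2, "url": "/missing", "status": 404})
rows = table.select("id", "status")  # the "url" column is not read
```

A row store would have to read whole rows, including `url`, to answer the same query.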
Three compression levels
  • Hot data: No or low compression
  • Cold data: Medium or High compression
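The trade-off between compression levels can be tried out with Python's `zlib`, used here only as a stand-in since these notes don't say which algorithm Aster uses. The mapping in `LEVELS` and the helper `compress_block` are assumptions for illustration.

```python
import zlib

# Hypothetical mapping from the three level names to zlib levels (1 = fastest, 9 = smallest).
LEVELS = {"low": 1, "medium": 6, "high": 9}


def compress_block(payload: bytes, level_name: str) -> bytes:
    """Compress a block of data at the named level."""
    return zlib.compress(payload, LEVELS[level_name])


payload = b"2017-07-17 GET /index.html 200\n" * 1000

hot = compress_block(payload, "low")    # cheap to write and read back
cold = compress_block(payload, "high")  # more CPU spent once for rarely read data
```

Both results decompress to the same bytes with `zlib.decompress`; the levels differ only in how much CPU time is traded for size.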
Informatica has an Aster connector. Other tools use the nCluster loader.
Aqua Data Studio: http://www.aquafold.com/
Viewpoint portlet for Aster
  • Aster Node Monitor
  • Aster Completed Processes


Shinkai Makoto Movies

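Director Makoto Shinkai's "Your Name." became available for rental on iTunes, so I watched it. Over this three-day weekend I also watched "The Garden of Words" and "5 Centimeters per Second". "Your Name." was good, but honestly there were parts where I couldn't quite get into the story. Maybe because it is about students, or because it is set in the countryside, I found it hard to see things from the characters' point of view. People often say that Shinkai's pictures are "more beautiful than photographs (reality)", but when the art is that beautiful my eyes kept going to the visuals, and I caught myself thinking that scenery this beautiful doesn't exist in real life. Since it's an anime I figured watching it at home would be fine and didn't go during its theatrical run, but I now think I might have been better off seeing it in a theater, where I could concentrate more. The movie left some parts unresolved for me, so I also bought the novel and will read through it at a leisurely pace.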