About Me

Software Engineer at Starburst. Maintainer at Trino. Previously at LINE, Teradata, HPE.

2017-08-21

Introduction to Big Data and Teradata Aster


This is a note for the Teradata Aster Basics 6.10 exam, a.k.a. TACP (Teradata Aster Certified Professional).
The recommended courses are listed below; this note covers the second one.
  • Teradata Certification, What’s New and How to Prepare
  • Introduction to Big Data and Teradata Aster
  • Introduction to Teradata Aster Analytics*
  • Introduction to Teradata Aster Database Administrator*
SQL vs SQL-MR: SQL is better for standard transformations; SQL-MR is better for custom transformations (e.g. log extraction).
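Real SQL-MR functions are built with Aster's own function API, which is out of scope here. As a language-neutral illustration of the "custom transformation" idea, here is a minimal Python sketch of a row-function style log extractor: each input line yields zero or one structured output rows. The names `parse_logs` and `LogRow`, and the choice of Apache Common Log Format as input, are assumptions for illustration, not the SQL-MR interface.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Apache Common Log Format (an assumed input format), e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


class LogRow(NamedTuple):
    host: str
    method: str
    path: str
    status: int
    size: int


def parse_logs(lines: Iterable[str]) -> Iterator[LogRow]:
    """Row-function style: every input line yields zero or one output rows."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip malformed lines instead of failing the whole job
        size = 0 if m["size"] == "-" else int(m["size"])  # "-" means no body
        yield LogRow(m["host"], m["method"], m["path"], int(m["status"]), size)


rows = list(parse_logs([
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    "this line is not a log entry",
]))
print(rows)
```

The malformed second line is silently dropped, which is the kind of per-row custom logic that is awkward to express in plain SQL.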
R creates multiple copies of data during processing and doesn’t automatically run in parallel. Aster R runs in parallel across the Aster MPP architecture.
FSE (Foreign Server Encapsulation): Supports remote data platforms other than Aster and Teradata (e.g. Oracle, Hadoop, DB2).
QueryGrid Aster-Teradata: Join tables in Teradata and Aster Database
QueryGrid Aster-Hadoop: Copy data from Hadoop to Aster, from Aster to Hadoop. HCatalog: Table metastore service for Hive, Pig, and so on.
Deployment Options: Aster Appliance, Cloud, Software Only (RHEL) and Aster on Hadoop.
Data Preparation: IPGeo, Pivot, JsonParser, Apache Log Parser and PSTParserAFS
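Pivot in the list above reshapes "long" rows (key, attribute, value) into "wide" rows with one column per attribute. The idea can be sketched in plain Python as follows; the `pivot` helper and its parameter names are hypothetical and do not mirror the Aster Pivot function's actual arguments.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def pivot(rows: Iterable[Mapping[str, Any]], key_col: str,
          name_col: str, value_col: str) -> Dict[Any, Dict[Any, Any]]:
    """Long-to-wide: one output row per key, one column per distinct name."""
    wide: Dict[Any, Dict[Any, Any]] = defaultdict(dict)
    for row in rows:
        # The attribute name becomes a column; a later duplicate overwrites an earlier one.
        wide[row[key_col]][row[name_col]] = row[value_col]
    return dict(wide)


readings = [
    {"sensor": "s1", "metric": "temp", "value": 21.5},
    {"sensor": "s1", "metric": "humidity", "value": 40},
    {"sensor": "s2", "metric": "temp", "value": 19.0},
]
print(pivot(readings, "sensor", "metric", "value"))
# {'s1': {'temp': 21.5, 'humidity': 40}, 's2': {'temp': 19.0}}
```

Keys missing an attribute (here `s2` has no humidity) simply lack that column; a SQL pivot would return NULL in that cell instead.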
Aster Analytics Portfolio
  • Data Acquisition
  • Data Preparation
  • Advanced Analytics
  • Visualization
Aster Database
  • Analytic Engine
    • Aster SQL-MR
    • Aster SQL-GR (Based on Bulk Synchronous Parallel)
    • Aster R
  • SNAP Framework
    • Integrated Optimizer
    • Integrated Executor
    • Unified SQL Interface
    • Common Storage System and Services
  • Multi-Type Storage
  • AFS (Aster File Store)
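SQL-GR follows the Bulk Synchronous Parallel (BSP) model: computation proceeds in supersteps, each vertex processes the messages it received in the previous superstep and sends new ones, and a global barrier separates supersteps. The single-process Python sketch below shows that pattern on connected components (minimum-label propagation). It is a model of BSP, not the SQL-GR API, and the function name is hypothetical.

```python
from typing import Dict, List


def connected_components(adj: Dict[int, List[int]]) -> Dict[int, int]:
    """Label each vertex with the smallest vertex id in its component.

    `adj` must be an undirected adjacency list (every edge listed both ways).
    """
    label = {v: v for v in adj}  # every vertex starts with its own id
    # Superstep 0: every vertex announces its label to its neighbours.
    inbox: Dict[int, List[int]] = {v: [] for v in adj}
    for v, nbrs in adj.items():
        for n in nbrs:
            inbox[n].append(label[v])

    while any(inbox.values()):  # halt when no messages are in flight
        outbox: Dict[int, List[int]] = {v: [] for v in adj}
        # Compute phase: each vertex uses only its own state and inbox,
        # so a real BSP engine could run vertices in parallel.
        for v, msgs in inbox.items():
            if msgs:
                smallest = min(msgs)
                if smallest < label[v]:
                    label[v] = smallest
                    for n in adj[v]:
                        outbox[n].append(smallest)
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
    return label


graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
print(connected_components(graph))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```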
Queen: Cluster Coordination, Distributed Query Planning, System Tables
Worker Node: Stores data and executes query fragments, then sends results back to the Queen
Loader Node: Loads data into Aster
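The division of labour between the Queen and the workers is a scatter/gather pattern. Below is a minimal sequential Python simulation, assuming a SUM with GROUP BY query and hash distribution on the group key; `distribute`, `worker_partial_sum`, and `queen_sum` are hypothetical names, and in a real cluster the workers run concurrently on separate nodes.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def distribute(rows: List[Row], n_workers: int) -> List[List[Row]]:
    """Hash-distribute rows by key so each key lives on exactly one worker."""
    parts: List[List[Row]] = [[] for _ in range(n_workers)]
    for key, value in rows:
        # crc32 is deterministic across runs, unlike Python's salted hash().
        parts[zlib.crc32(key.encode()) % n_workers].append((key, value))
    return parts


def worker_partial_sum(part: List[Row]) -> Counter:
    """Worker side: aggregate only the local fragment."""
    totals: Counter = Counter()
    for key, value in part:
        totals[key] += value
    return totals


def queen_sum(parts: List[List[Row]]) -> Dict[str, int]:
    """Queen side: have every worker aggregate its fragment, then merge the results."""
    result: Counter = Counter()
    for partial in map(worker_partial_sum, parts):
        result.update(partial)  # Counter.update adds counts
    return dict(result)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
print(queen_sum(distribute(rows, n_workers=3)))
```

Because rows are distributed on the grouping key, the Queen's merge is cheap; if they were not, the same merge would still be correct since partial sums add up.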
Access Control
  • Aster username/password
  • TD Wallet
  • LDAP
Multi-Version Concurrency Control (MVCC): Eliminates the need for read locks while ensuring that the database maintains the key ACID properties (Atomicity, Consistency, Isolation, Durability)
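To see why MVCC removes the need for read locks, here is a toy multi-version key-value store in Python: writers append new row versions tagged with their transaction id, and a reader only sees versions committed before its snapshot was taken. This is a simplified PostgreSQL-style model (no aborts, no write-write conflict detection), not Aster's internal implementation; all class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class Version:
    value: str
    created_by: int                   # txid that wrote this version
    deleted_by: Optional[int] = None  # txid that superseded it, if any


class MVCCStore:
    """Toy MVCC store: no aborts and no write-write conflict checks."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Version]] = {}
        self.committed: Set[int] = set()
        self.next_txid = 1

    def begin(self) -> int:
        txid = self.next_txid
        self.next_txid += 1
        return txid

    def commit(self, txid: int) -> None:
        self.committed.add(txid)

    def snapshot(self) -> FrozenSet[int]:
        # A snapshot is just the set of transactions committed so far.
        return frozenset(self.committed)

    def write(self, txid: int, key: str, value: str) -> None:
        # Never overwrite in place: mark the live version superseded, append a new one.
        for v in self.versions.get(key, []):
            if v.deleted_by is None:
                v.deleted_by = txid
        self.versions.setdefault(key, []).append(Version(value, txid))

    def read(self, snap: FrozenSet[int], key: str) -> Optional[str]:
        # No locks: return the newest version that the snapshot sees as created and not deleted.
        for v in reversed(self.versions.get(key, [])):
            created = v.created_by in snap
            deleted = v.deleted_by is not None and v.deleted_by in snap
            if created and not deleted:
                return v.value
        return None


store = MVCCStore()
t1 = store.begin()
store.write(t1, "x", "v1")
store.commit(t1)
old_snap = store.snapshot()               # a reader starts here
t2 = store.begin()
store.write(t2, "x", "v2")
print(store.read(old_snap, "x"))          # v1: t2 has not committed
store.commit(t2)
print(store.read(old_snap, "x"))          # still v1: the snapshot is frozen
print(store.read(store.snapshot(), "x"))  # v2
```

The reader never waits on the writer: it simply ignores versions its snapshot cannot see.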
Two Level Query Optimization
  • Queen Global Optimizer: Rule Based
  • v-Worker Optimizer: Cost Based. The cost is determined by the demographics of the v-Worker fragment of the distributed data.
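The split between a rule-based global optimizer and cost-based local optimizers can be illustrated with two tiny functions. Both the rule (co-locate joins on the distribution key, otherwise repartition) and the cost formulas are illustrative assumptions, not Aster's actual optimizer rules or cost model. The point is that the Queen decides from metadata without statistics, while each v-Worker decides from the row counts of its own fragment.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    name: str
    distribution_key: str


def queen_plan(left: TableInfo, right: TableInfo, join_key: str) -> str:
    """Rule-based: decided from table metadata alone, no statistics."""
    if left.distribution_key == join_key == right.distribution_key:
        return "colocated_join"       # matching rows already share a v-Worker
    return "repartition_then_join"    # rows must be shuffled on join_key first


def vworker_join_method(left_rows: int, right_rows: int) -> str:
    """Cost-based: uses this fragment's own row counts (toy cost model)."""
    nested_loop_cost = left_rows * right_rows
    hash_join_cost = left_rows + right_rows  # build + probe, constants ignored
    return "nested_loop" if nested_loop_cost <= hash_join_cost else "hash_join"


orders = TableInfo("orders", distribution_key="customer_id")
customers = TableInfo("customers", distribution_key="customer_id")
print(queen_plan(orders, customers, "customer_id"))  # colocated_join
# Different v-Workers may pick different methods for the same plan step:
print(vworker_join_method(1, 3))       # nested_loop
print(vworker_join_method(5000, 200))  # hash_join
```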
Dynamic Workload Management
  • User-based policies
  • Time-based policies
  • Object-based policies
  • IP-based policies
  • Periodic Re-evaluation
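The policy types above can be combined into a simple first-match classifier. This Python sketch is a hypothetical model of how such rules might be evaluated: the `Policy` fields, the priority names, and first-match ordering are assumptions, not Aster's Workload Management configuration. Calling `classify` again at a later time stands in for periodic re-evaluation.

```python
import ipaddress
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, List, Optional


@dataclass
class Query:
    user: str
    client_ip: str
    tables: FrozenSet[str]


@dataclass
class Policy:
    """Every criterion left as None matches any query."""
    name: str
    priority: str
    user: Optional[str] = None     # user-based
    start: Optional[time] = None   # time-based window [start, end)
    end: Optional[time] = None     # (windows crossing midnight are not handled)
    table: Optional[str] = None    # object-based
    network: Optional[str] = None  # IP-based, CIDR notation

    def matches(self, q: Query, now: time) -> bool:
        if self.user is not None and q.user != self.user:
            return False
        if self.start is not None and self.end is not None and not (self.start <= now < self.end):
            return False
        if self.table is not None and self.table not in q.tables:
            return False
        if self.network is not None and (
            ipaddress.ip_address(q.client_ip) not in ipaddress.ip_network(self.network)
        ):
            return False
        return True


def classify(q: Query, policies: List[Policy], now: time, default: str = "normal") -> str:
    """First matching policy wins; re-run periodically to re-evaluate running queries."""
    for p in policies:
        if p.matches(q, now):
            return p.priority
    return default


policies = [
    Policy("nightly-etl", "high", user="etl", start=time(0, 0), end=time(6, 0)),
    Policy("bi-subnet", "low", network="10.1.0.0/16"),
    Policy("finance-tables", "high", table="ledger"),
]
q = Query(user="etl", client_ip="10.1.2.3", tables=frozenset({"sales"}))
print(classify(q, policies, time(2, 0)))  # high: inside the ETL window
print(classify(q, policies, time(9, 0)))  # low: re-evaluated after the window closes
```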
nCluster’s columnar capability is Aster’s own development and is not part of PostgreSQL. Its main limitation is that columnar tables are append-only (no updates or deletes).
Columnar advantage and limitation
  • Use NOT NULL whenever possible
  • Avoid variable length data
  • Don’t SELECT/ANALYZE any columns unless it is necessary
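A toy in-memory column store shows both sides of the list above: values are stored per column, so a query that names only the columns it needs touches less data, and there is no way to update or delete a row in place. The class and attribute names are hypothetical, and nCluster's actual on-disk columnar format is not modelled here.

```python
from typing import Any, Dict, Iterable, List, Tuple


class AppendOnlyColumnStore:
    """Toy column store: one list per column, rows can only be appended."""

    def __init__(self, columns: Iterable[str]) -> None:
        self.columns: Dict[str, List[Any]] = {name: [] for name in columns}
        self.values_read = 0  # how many column values scans have touched

    def append(self, row: Dict[str, Any]) -> None:
        if set(row) != set(self.columns):
            raise ValueError("row must supply exactly the table's columns")
        for name, value in row.items():
            self.columns[name].append(value)

    def scan(self, *names: str) -> List[Tuple[Any, ...]]:
        """Read only the requested columns; the others are never touched."""
        selected = [self.columns[n] for n in names]
        self.values_read += sum(len(col) for col in selected)
        return list(zip(*selected))

    # Deliberately no update() or delete(): columnar tables are append-only.


t = AppendOnlyColumnStore(["id", "region", "amount"])
t.append({"id": 1, "region": "EU", "amount": 10})
t.append({"id": 2, "region": "US", "amount": 25})
print(t.scan("amount"))  # [(10,), (25,)]
print(t.values_read)     # 2: only the amount column was read
```

A `SELECT *` equivalent (`t.scan("id", "region", "amount")`) would touch all six values, which is why selecting unneeded columns hurts on columnar tables.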
Three compression levels
  • Hot data: No or low compression
  • Cold data: Medium or High compression
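The hot/cold rule of thumb is a CPU-versus-space trade-off: hot data is accessed often, so it is kept at a cheap compression level, while cold data is rarely touched, so spending more CPU for a smaller footprint pays off. The Python sketch below uses `zlib` purely as a stand-in; the mapping from the note's levels to zlib levels is a hypothetical assumption, since Aster's compression settings are not zlib levels.

```python
import zlib

# Hypothetical mapping from the note's compression levels to zlib levels.
ZLIB_LEVEL = {"none": 0, "low": 1, "medium": 6, "high": 9}

# Rule of thumb from the note: hot data gets little compression, cold data more.
LEVEL_FOR_TEMPERATURE = {"hot": "low", "cold": "high"}


def compress_for(data: bytes, temperature: str) -> bytes:
    """Pick a compression level from the data's access temperature."""
    level = ZLIB_LEVEL[LEVEL_FOR_TEMPERATURE[temperature]]
    return zlib.compress(data, level)


block = b"2017-08-21,EU,sales,100\n" * 1000
for temp in ("hot", "cold"):
    packed = compress_for(block, temp)
    assert zlib.decompress(packed) == block  # lossless at every level
    print(temp, len(block), "->", len(packed))
```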
Informatica has an Aster connector; other tools use the nCluster loader.
Aqua Data Studio: http://www.aquafold.com/
Viewpoint portlets for Aster
  • Aster Node Monitor
  • Aster Completed Processes