@ebyhr: Introduction to Big Data and Teradata Aster

This is a note for the Teradata Aster Basics 6.10 Exam, a.k.a. TACP (Teradata Aster Certified Professional).

The recommended courses are listed below. This note covers the 2nd course.

Teradata Certification, What’s New and How to Prepare
Introduction to Big Data and Teradata Aster
Introduction to Teradata Aster Analytics*
Introduction to Teradata Aster Database Administrator*

SQL vs SQL-MR: SQL is better for standard transformations. SQL-MR is better for custom transformations (e.g. log extraction).
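To make "custom transformation" concrete, here is a minimal Python sketch of log extraction, the kind of row-by-row parsing that is awkward in plain SQL. This is not SQL-MR code. The regular expression, the `parse_log_line` helper and the sample line are my own assumptions for illustration, and the input is assumed to be in Apache Common Log Format.

```python
import re

# Pattern for the Apache Common Log Format (assumed input format for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Turn one raw log line into a dict of columns, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # A "-" size means no body was sent; store it as 0.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row

sample = '127.0.0.1 - - [21/Aug/2017:10:00:00 +0900] "GET /index.html HTTP/1.1" 200 1043'
row = parse_log_line(sample)
print(row)
```

Each raw line becomes one structured row, which is roughly the shape of work a custom function does before standard SQL takes over.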

R creates multiple copies of data during processing and doesn’t automatically run in parallel. Aster R runs in parallel across the Aster MPP architecture.

FSE (Foreign Server Encapsulation): Supports remote data platforms other than Aster and Teradata (e.g. Oracle, Hadoop, DB2).

QueryGrid Aster-Teradata: Join tables in Teradata and Aster Database

QueryGrid Aster-Hadoop: Copy data from Hadoop to Aster and from Aster to Hadoop. HCatalog: Table metastore service for Hive, Pig, and so on.

Deployment Options: Aster Appliance, Cloud, Software Only (RHEL) and Aster on Hadoop.

Data Preparation: IPGeo, Pivot, JsonParser, Apache Log Parser and PSTParserAFS

Aster Analytics Portfolio

Data Acquisition
Data Preparation
Advanced Analytics
Visualization

Aster Database

Analytic Engine

Aster SQL-MR
Aster SQL-GR (Based on the Bulk Synchronous Parallel (BSP) model)
Aster R
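The BSP model behind SQL-GR works in supersteps: each vertex computes on its own messages, sends new messages, and then waits at a barrier before the next round. Below is a minimal Python sketch of that idea, finding connected components by propagating the smallest vertex id. It is not the SQL-GR API. The function name and the sample graph are hypothetical.

```python
def connected_components_bsp(edges, vertices):
    """Label each vertex with the smallest vertex id in its component."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    labels = {v: v for v in vertices}

    # Superstep 0: every vertex sends its own label to its neighbors.
    inbox = {v: [] for v in vertices}
    for v in vertices:
        for n in neighbors[v]:
            inbox[n].append(labels[v])

    # Keep running supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in vertices}
        # Local computation: each vertex only looks at its own inbox.
        for v in vertices:
            if inbox[v]:
                smallest = min(inbox[v])
                if smallest < labels[v]:
                    labels[v] = smallest
                    # Communication: tell neighbors about the new label.
                    for n in neighbors[v]:
                        outbox[n].append(smallest)
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
    return labels

labels = connected_components_bsp(
    edges=[(1, 2), (2, 3), (4, 5)],
    vertices=[1, 2, 3, 4, 5],
)
print(labels)
```

In a real engine the per-vertex loop would run in parallel on the workers. Here it runs sequentially, but the barrier semantics are the same because a vertex never reads messages sent in the current superstep.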

SNAP Framework

Integrated Optimizer
Integrated Executor
Unified SQL Interface
Common Storage System and Services

Multi-Type Storage
AFS(Aster File Store)

Queen: Cluster Coordination, Distributed Query Planning, System Tables

Worker Node: Sends results back to the Queen

Loader: Loading data to Aster
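The split between Loader, Worker and Queen can be pictured as scatter-gather. The following Python sketch is a toy model under my own assumptions: rows are distributed by a simple modulo on an integer key, each worker computes a partial aggregate on its local rows, and the queen merges the partials. `NUM_WORKERS` and all function names are hypothetical, and real Aster distribution and planning are more involved.

```python
from collections import defaultdict

NUM_WORKERS = 3  # hypothetical cluster size

def distribute(rows, key):
    """Loader step: assign each row to a worker by its distribution key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key] % NUM_WORKERS].append(row)
    return partitions

def worker_aggregate(partition):
    """Worker step: compute a partial (count, total) on local rows only."""
    return len(partition), sum(row["amount"] for row in partition)

def queen_average(rows):
    """Queen step: fan the work out, then merge the partial results."""
    partitions = distribute(rows, "customer_id")
    # Workers run one after another here; on a cluster they would run in parallel.
    partials = [worker_aggregate(p) for p in partitions.values()]
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count

rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 7)]
print(queen_average(rows))
```

The average is computed from partial counts and sums rather than from partial averages, since averaging the averages would be wrong when partitions have different sizes.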

Access Control

Aster username/password
TD Wallet
LDAP

Multi-Version Concurrency Control (MVCC): Eliminates the need for read locks while ensuring that the database maintains the key ACID (Atomicity, Consistency, Isolation, Durability) properties.
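A toy Python sketch may help to see why MVCC needs no read locks: a writer appends a new version instead of overwriting, and a reader keeps seeing the version that was current when its snapshot was taken. The `MVCCStore` class and its methods are hypothetical and ignore everything a real database has to handle (aborts, garbage collection, concurrent writers).

```python
class MVCCStore:
    """Toy multi-version store: writers append versions, readers never block."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # timestamp of the last committed write

    def write(self, key, value):
        """Commit a new version instead of overwriting the old one."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """A reader remembers the clock value at which it started."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
store.write("balance", 100)
reader_ts = store.snapshot()   # a long-running reader starts here
store.write("balance", 250)    # a later write does not disturb that reader

old_view = store.read("balance", reader_ts)
new_view = store.read("balance", store.snapshot())
print(old_view, new_view)
```

The reader with the older snapshot still sees a consistent value even though a newer version exists, which is the isolation property the note refers to.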

Two Level Query Optimization

Queen Global Optimizer: Rule Based
v-Worker Optimizer: Cost Based. The cost is determined by the demographics of the v-Worker fragment of the distributed data.

Dynamic Workload Management

User-based policies
Time-based policies
Object-based policies
IP-based policies
Periodic Re-evaluation
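The policy types above can be sketched as an ordered list of rules that map a query to a priority. This is only an illustration in Python, not Aster's workload management syntax. The `Query` fields, the rule functions, the user and table names, and the priority labels are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from ipaddress import ip_address, ip_network

@dataclass
class Query:
    user: str
    table: str
    client_ip: str
    submitted_at: time

def is_batch_user(q):
    # User-based policy
    return q.user == "etl_batch"

def is_business_hours_sales(q):
    # Time-based and object-based policy combined
    return time(9, 0) <= q.submitted_at <= time(18, 0) and q.table == "sales_fact"

def is_internal_client(q):
    # IP-based policy
    return ip_address(q.client_ip) in ip_network("10.0.0.0/8")

# Rules are checked in order; the first match wins.
RULES = [
    (is_batch_user, "low"),
    (is_business_hours_sales, "high"),
    (is_internal_client, "medium"),
]

def classify(query, default="default"):
    for predicate, priority in RULES:
        if predicate(query):
            return priority
    return default

daytime = Query("analyst", "sales_fact", "10.1.2.3", time(10, 30))
evening = Query("analyst", "sales_fact", "10.1.2.3", time(20, 0))
print(classify(daytime), classify(evening))
```

Periodic re-evaluation would correspond to calling `classify` again while a query is still running, so that the same query can move to a different priority once, for example, business hours end.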

nCluster’s columnar capability is a custom development by Aster and is not part of PostgreSQL. The columnar limitation is that tables are append-only (no updates or deletes).

Columnar advantages and limitations

Use NOT NULL whenever possible
Avoid variable length data
Don’t SELECT/ANALYZE any columns unless necessary
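The advice to touch only the columns that are needed follows from how a column layout stores data. The Python sketch below contrasts a row layout with a column layout for the same small table. The table contents and the `to_columns` helper are hypothetical, and real columnar storage adds encoding and compression on top.

```python
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 200},
]

def to_columns(rows):
    """Pivot a row-oriented table into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

columns = to_columns(rows)

# Row layout: every full row is visited even though only "amount" is needed.
row_total = sum(row["amount"] for row in rows)

# Column layout: only the "amount" list is read; "id" and "region" stay untouched.
column_total = sum(columns["amount"])

print(row_total, column_total)
```

Both totals are the same, but in the column layout each extra column in the SELECT list means one more list to read, which is why selecting unneeded columns costs more there.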

Three compression levels

Hot data: No or low compression
Cold data: Medium or High compression
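The hot/cold trade-off can be tried out with Python's standard `zlib` module, where a lower level is faster and a higher level typically compresses more. This is only an analogy: the mapping from data temperature to a zlib level below is my own assumption and says nothing about the compression algorithm Aster actually uses.

```python
import zlib

# Hypothetical mapping from data temperature to a zlib level (1 = fastest, 9 = best).
LEVEL_BY_TEMPERATURE = {"hot": 1, "cold": 9}

def store(payload, temperature):
    """Compress a payload with the level chosen for its temperature."""
    return zlib.compress(payload, LEVEL_BY_TEMPERATURE[temperature])

payload = b"2017-08-21,GET,/index.html,200\n" * 1000
hot = store(payload, "hot")
cold = store(payload, "cold")

# Compare the raw size with the two compressed sizes.
print(len(payload), len(hot), len(cold))
```

Frequently read data pays the decompression cost on every access, so a light level fits it better, while rarely read data can afford the heavier level in exchange for less storage.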

Informatica has an Aster connector. Other tools use the nCluster loader.

Aqua Data Studio: http://www.aquafold.com/

Viewpoint portlet for Aster

Aster Node Monitor
Aster Completed Processes


2017-08-21