About Me

My photo
Software Engineer at Starburst. Maintainer at Trino. Previously at LINE, Teradata, HPE.

2017-09-21

Introduction to Teradata Aster Database Administration


This a note for Teradata Aster Basics 6.10 Exam a.k.a TACP(Teradata Aster Certified Professional).
Recommended courses are followings and this note is for the 2nd course.
  • Teradata Certification, What’s New and How to Prepare
  • Introduction to Big Data and Teradata Aster
  • Introduction to Teradata Aster Analytics
  • Introduction to Teradata Aster Database Administrator
nc_system schema holds system information.
3 categories of DD views
  • nc_all
  • nc_user
  • nc_user_owned
Replication Factor
  • RF=1: No secondary v-wokers. No fallback
  • RF=2: If a woker goes down, secondary v-worker will be promoted to the new primary v-worker. Max is 2. Primay and its replica are not located in the same Worker node.
Ganglia is a open source, web-based, scalable distributed system monitoring tool.
AMC Status
  • Green: operating normally
  • Blue: decrease in performance
  • Yellow: unable to process statement requests
  • Red: stopped
  • White/Clear: no longer able to establish a connection
Aster Database only supports B-tree indexes, cannot enforce referential integrity.
There is no data sharing among Aster databases.

/*Change database*/
beehive=> \connect retails_sales;
retails_sales=>
retails_sales=>database beehive;
beehive=>

/*help*/
beehive=>\?

/*List database*/
beehive=> \l

/*Exit database*/
beehive=> \q

/*List schemas*/
beehive=> \dn

/*View tables in the PROD schemas*/
beehive=> \dt prod.*

/*View columns/data types*/
beehive=> \d prod.sales_fact

/*Show current schema*/
show search_path;
ALTER USER beehive SET SEARCH_PATH = 'public', 'mkt';

Select table by walking the path until finding the name
CONNECT privilege must be given to access the database. USAGE privilege must be given to access the schema.
Two Serial types: Global and Local.
  • A Serial Global type ensures the serial property across all of the nodes in the system.
  • A Serial Local type ensures the serial property local to each logical partition of data.
PARTITION BY RANGE: START include the value but END exclude the value
partition sales_june (START'2017-06-01'::date END'2017-07-01'::date)
PARTITION BY LIST: If an incoming row doesn not fit into any partition, that row will not be loaded into the table
1. Data Modeling Quiz
  • Q. Best schemas for Teradata Aster databases
    A. Star schema and Snowflake schema
Aster column name rules
  • Starts with a character
  • Must be < 63 characters
  • Names may include special characters (_ , $)
Constraint Options
  • Null/Not Null
  • Primary key
  • Default values
  • Check values
create table stuff
(
emp int NOT NULL PRIMARY KEY,
dept varchar DEFAULT 'none',
age smallint CHECK(age >= 18 and age <= 70),
name varchar
)
distribute by replication
;

Data Types
  • CHAR, CHARACTER VARYING, VARCHARA(n) maximum is 10MB
  • TEXT is unlimited
  • Special type are Boolean, Bytea, Serial, Big Serial
Supported data types of distribution
  • smallint
  • integer
  • bigint
  • numeric
  • text
  • varchar
  • uuid
  • bytea
For large tables (> 1million rows, usually Fact table)
For small tables (<= 1million rows, usually Dimension table)
If ASH key and JOIN columns doesn’t match, SHUFFLE will occure.
TRUNCATE: Quickly remove all rows and it reclaims disk space immediately
VACUUM: Converts dead space into usable free space
VACUUM FULL ANALYZE: Physically rearrange the data on disk
NC_RELATIONSTATS: Generate various reports

2: Creating Tables Quiz
Tables in a Teradata Aster Database can be of which four variations? (Choose four.)
  • Temporary(Fact/Dim)
  • Analytic(Fact/Dim)
  • Fact
  • Dimension
What data type is commonly used for “payload” columns? Click on the correct data type in the image.
TEXT
In Teradata Aster, table data may be partitioned in which two ways? (Choose two.)
Logically Partitioned tables (Logical)
Fact tables(Physical)
How do these two partitioning types improve performance? Match the partitioning type to how it improves performance.
Physical: More v-Workers equal more parallelism
Logical: Reduced disk I/O by only reading needed partitions
Scenario: You join 2 FACT tables where the Hash column matches the JOIN column. Will a shuffling of data occur?
No, the JOIN will commence immediately since JOIN column values are guaranteed to be on the same v-Worker.
nCluster loader arguments
  • -B –begin-script
  • -E –end-script
  • -d –dbname
  • -D –delimiter
  • -c –csv
  • -l –loader
  • –truncate-table
  • -w –password
  • -z –auto-analyze
  • -U –username
  • -p –port
  • –el-enabled
  • –skip-rows-1
Default delimited format is TSV
-B and -E specify script name to execute it
Parallelizing the Load tier
  • Add more loader nodes
  • Add more staging machines
  • Add more nCluster loaders running on the staging machines
Error logging is turned off by default. This means the load job will abort and rollback the data on encoutering the first error.
  • –el-enabled
  • –el-limit <#>
  • –el-table
  • –el-label
  • –el-errfile
ncluster_export example
ncluster_export -h 192.168.100.100 -d beehive -U beehive -w beehive
\"aaf\".\"accesslog\" myfile.txt

3: Data Loading Quiz
What is the name of the Teradata Aster Database bulk loading tool? ncluster_loader
Which two node types can handle Teradata Aster data loading ? (Choose two.)
Loader nodes and Queen node (if there are no loader nodes)
Which task do Loader nodes perform during loading?
Hashing the Distribution Key for v-Worker placement
The loading tier can be scaled in which three ways? (Choose three.)
Add more nCluster Loaders, Loader Nodes, Staging Machines
In addition to the nCluster Loader Tool, which four other types of tools are used to load a Teradata Aster Database? (Choose four.)
ETL Tools, SQL Statements, Connectors, Teradata QueryGrid (Aster-to-Hadoop, Aster-to-Teradata)

Final Exam
  • Q. You have 5 Teradata Aster Databases. How many Data Dictionaries do you have?
  • A. 5 - one for each Teradata Aster Database
  • Q. Which two statements are true regarding a Teradata Aster Database? (Choose two.)
  • A. Each user must be given the CONNECT privilege on a database to access objects on the database,
  • By default, there is one database in a new installed Teradata Aster cluster called, beehive
  • Q. True or False: The Aster Loader Tool must point to the Queen and can optionally point to the Cluster Loader Node for hashing. A. True
  • Q. Which two statements are true regarding a Teradata Aster Database? (Choose two.)
  • A. Data objects may be shared across schemas in the same database,Users can join tables from one schema with tables in another schema if they have proper privileges for the schemas/tables