@ebyhr: September 2017

2017-09-29

Teradata Aster Certified Professional

Teradata Aster Certified Professional 6.10

2017-09-28

Aster Basic Book

Below are my personal notes on the book Teradata Aster BASICS.

Chapter 1
Four key characteristics of Big Data?
Complexity, Variety, Velocity, Volume

Chapter 2
During Partition Splitting (Adding v-Workers), Aster Database is unavailable until the process is completed
Adding Worker Nodes: more processing power per v-Worker
Adding v-Workers (Partition Splitting): more parallelism across the system
Queen node should be RAID5
Worker and Loader nodes should be RAID0 for small configurations
Worker and Loader nodes should be RAID5 for large configurations

Chapter 3
What is QueryGrid?
is a client-based user interface that leverages the updates
manages optimization so that each platform does what it does best when resolving the query request

Chapter 5
MVCC: The result is that over time, the table can contain logically deleted (invisible) rows that are no longer used, but are consuming table space

Chapter 6
Multi-structured data: consists of files that contain a wide variety of different formats and data types in a non-fixed manner that must be parsed and interpreted properly

Chapter 8
A Map/Row function must implement the operateOnSomeRows method
A Reduce/Partition function must implement the operateOnPartition method

Chapter 10
Rows returned in the RESULT operator are always?
Aggregations

2017-09-23

Introduction to Teradata Aster Analytics

This is a note for the Teradata Aster Basics 6.10 Exam, a.k.a. TACP (Teradata Aster Certified Professional).
The recommended courses are as follows; this note is for the 3rd course.

Teradata Certification, What’s New and How to Prepare
Introduction to Big Data and Teradata Aster*
Introduction to Teradata Aster Analytics
Introduction to Teradata Aster Database Administrator*

Map function doesn’t have PARTITION BY
Reduce function has PARTITION BY
PARTITION BY affects SHUFFLE phases
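
One way to picture the difference is the plain-Ruby sketch below. It is not SQL-MR; the sample rows and the helper names row_function and partition_function are made up for illustration. A Map/Row function handles each row on its own, while a Reduce/Partition function needs the rows grouped by the PARTITION BY key first, which is where the shuffle comes in.

```ruby
# Hypothetical sample rows; not taken from the course material.
ROWS = [
  { user: "a", amount: 10 },
  { user: "b", amount: 5 },
  { user: "a", amount: 7 }
].freeze

# Map/Row style: every row is handled independently, so no PARTITION BY is needed.
def row_function(rows)
  rows.map { |row| row.merge(doubled: row[:amount] * 2) }
end

# Reduce/Partition style: rows are first grouped by the key (the shuffle step),
# then each group is processed as a whole.
def partition_function(rows, key)
  rows.group_by { |row| row[key] }
      .map { |value, group| { key => value, total: group.sum { |row| row[:amount] } } }
end

p row_function(ROWS)
p partition_function(ROWS, :user)
```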

Quiz
What are the two different types of SQL-MR functions? (Choose two.)
Partition functions (Reduce) & Row functions (Map)
In SQL-MR the ON clause can be what three things? (Choose three)
Function, Table & Query
Fill in the missing word. How do you distinguish between a Map vs. Reduce function?
A REDUCE function has a PARTITION BY clause, whereas the Map function does not
What criteria would you use to determine if you want to run SQL versus SQL-MR? Select the four criteria that are better suited to run with SQL-MR. (Choose four)
Unstructured or multi-structured data, Machine learning algorithms, Recursive queries & Self-joins

Acquisition functions

load_from_hadoop
load_to_hadoop
load_from_hadoop_dir
load_from_pst
load_tweets
anydatabase2aster
load_from_s3
load_to_s3

Define foreign server
create foreign server hdp21
using server('192.168.100.21')
dbname('default') username('hue')
do import with load_from_hcatalog,
do export with load_to_hcatalog;

create foreign server td15
using tdpid('192.168.100.15')
username('td01') password('td01')
do import with load_from_teradata,
do export with load_to_teradata;

Pull & Push-down query

--pull
select c1, sum(c2)
from t1@td15
group by 1;

--push down
select * from FOREIGN SERVER
($$ select c1, sum(c2) from t1 group by 1 $$)@td15;

Quiz
Which two Teradata QueryGrid connectors can acquire data for Aster? (Choose two)
Aster-to-Teradata & Aster-to-Hadoop
Why move data between Teradata and Aster? Match Aster and Teradata to what each database is best designed for.
Aster - for analytics by limited number of data scientists
Teradata - for high concurrency (hundreds of users)
What are some Teradata Aster parser functions?
Apache logs, xml, json and pst
Using the Stream API, you can write functions in programming languages that are not native to Teradata Aster (e.g., write non SQL-MR or SQL-GR functions) and run them on Aster, generating output that Aster can receive, including:

writing R functions to run on Aster
write custom python, perl, C/C++/C# functions to run on Teradata Aster

Quiz
nPath is used for Pattern Matching across Time Series
What three expressions are used to specify input data for nPath?
on, partition by and order by
What three expressions are used to specify nPath search criteria?
mode, pattern and symbols
What kind of function is Kmeans?
clustering
What kind of function is Decision Tree?
predictive function

Quiz
What visualization function(s) are in Teradata Aster AppCenter?
Visualizer (formerly nPathViz and cFilterViz)
What needs to be configured before building a new Application?
Create a JDBC connection
Name three Data Format types. (Choose three)
nPath, Table and cFilter
Name four different chart types that Teradata Aster AppCenter visualizations create. (Choose four)
Tree, SanKey, Sigma and Chord
How can users dynamically change Teradata Aster AppCenter chart visualizations?
By clicking on objects and/or by changing Layout/Format specs

Quiz
How do you connect to Aster via RStudio?
Aster ODBC driver
What is the name of the Aster package for Teradata Aster R?
TeradataAsterR
You want to access Help for Teradata Aster R to see a list of commands. What syntax would accomplish this?
help(package='TeradataAsterR')

Final Exam SCORE: 96 PASSED

Question 1 (Correct): True or False: Map-Reduce is a programming model and an associated implementation for processing and generating large data sets. Your answer: True
Question 2 (Correct): Each Map function performs an ETL on ____ in the input. Your answer: all rows
Question 3 (Correct): The ___ gets a key and the array of values emitted with that key and produces the final result. Your answer: Reduce Function
Question 4 (Correct): The SQL-MR syntax ON clause specifies the input rows, which can be a ___. (Choose four) Your answer: Table, View, Sub-query, SQL-MR function
Question 5 (Correct): Does the syntax use a Map Function or a Reduce Function? Drag and Drop the Map Function and Reduce Function labels (at left) to the correct syntax (at right). Your answer: 1-2, 2-1
Question 6 (Correct): True or False: Functions can be Map and Reduce functions at the same time. Your answer: False
Question 7 (Correct): In the syntax below, click on the input. Your answer: 5
Question 8 (Correct): Match the function (at left) with its description (at right): Your answer: 1L-1R, 2L-2R, 3L-3R, 4L-4R
Question 9 (Correct): The _____ is used for clustering. Clustering is a fast/simple method for grouping objects into preliminary clusters using an approximate distance method. Each point is represented as a point in a multidimensional feature space. Your answer: Canopy function
Question 10 (Correct): True or False: Map-Reduce is a programming model and an associated implementation for processing and generating large data sets. Your answer: True
Question 11 (Correct): The _____ can extract multiple columns of structured data from standard Apache Web Logs. Your answer: Apache Log Parser
Question 12 (Correct): Match the function (at left) with what it’s used for (at right): Your answer: 1L-1R, 2L-2R, 3L-3R, 4L-4R, 5L-5R, 6L-6R, 7L-7R, 8L-8R
Question 13 (Correct): This question tests your knowledge of nPath pattern matching using the mode: non-overlapping and the pattern: ‘B+.C.A’. Given this input table and nPath syntax, which pattern matches will be in the output rows? Your answer: 1 row: BBBCA
Question 14 (Correct): This question tests your knowledge of nPath pattern matching using the mode: overlapping and the pattern: ‘B+.C.A’. Given this input table and nPath syntax, which pattern matches will be in the output rows? Your answer: 3 rows: BBBCA, BBCA, BCA
Question 15 (Incorrect): True or False: In Teradata Aster a single SQL-MR statement can call all of the necessary functions to go through acquiring data, to preparing it, to the multi-genre analyzing of it, and finally to visualizing it. Your answer: False. Correct answer: True
Question 16 (Correct): In Teradata Aster the ____ function creates a row in a visualization table where Teradata Aster AppCenter can access it, view it, and manipulate it. Your answer: ‘Visualizer’
Question 17 (Correct): True or False: Teradata Aster R packages address the challenges of R by allowing programmers to scale R analytics by leveraging Teradata Aster. Your answer: True
Question 18 (Correct): Before being able to connect to the Teradata Aster cluster and start issuing Teradata Aster R commands, you must do which two things? (Choose two) Your answer: Install/configure Teradata Aster 6.20 ODBC driver, Install RODBC and Teradata Aster R packages
Question 19 (Correct): What are RMapReduce runners? Your answer: Functions to run R-code in Teradata Aster
Question 20: Match the Teradata Aster R function to what you would use it for:
Question 20.1 (Correct): Your answer: 1L-1R
Question 20.2 (Correct): Your answer: 2L-2R
Question 20.3 (Correct): Your answer: 3L-3R
Question 20.4 (Correct): Your answer: 4L-4R
Question 20.5 (Correct): Your answer: 5L-5R
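
The two nPath modes from Questions 13 and 14 can be imitated with an ordinary regular expression. This is only an analogy in Ruby, not nPath itself, and it assumes the input partition boils down to the ordered symbols B, B, B, C, A (the exam's input table is not reproduced in this note). The helper names are made up.

```ruby
# Assumed input: one partition whose ordered events map to the symbols B, B, B, C, A.
SEQUENCE = "BBBCA"
# nPath's 'B+.C.A' (one or more B, then C, then A) written as a Ruby regex.
PATTERN = /B+CA/

# Non-overlapping: after a match, the search resumes past the end of that match.
def non_overlapping(seq, pattern)
  seq.scan(pattern)
end

# Overlapping: try to start a match at every position in the sequence.
def overlapping(seq, pattern)
  anchored = /\A(?:#{pattern.source})/
  (0...seq.length).map { |i| seq[i..-1][anchored] }.compact
end

p non_overlapping(SEQUENCE, PATTERN) # => ["BBBCA"]
p overlapping(SEQUENCE, PATTERN)     # => ["BBBCA", "BBCA", "BCA"]
```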

2017-09-21

Introduction to Teradata Aster Database Administration

This is a note for the Teradata Aster Basics 6.10 Exam, a.k.a. TACP (Teradata Aster Certified Professional).

The recommended courses are as follows; this note is for the 2nd course.

Teradata Certification, What’s New and How to Prepare
Introduction to Big Data and Teradata Aster
Introduction to Teradata Aster Analytics
Introduction to Teradata Aster Database Administrator

nc_system schema holds system information.

3 categories of DD views

nc_all
nc_user
nc_user_owned

Replication Factor

RF=1: No secondary v-workers. No fallback
RF=2: If a worker goes down, the secondary v-worker is promoted to become the new primary v-worker. Max is 2. A primary and its replica are not located on the same Worker node.

Ganglia is an open-source, web-based, scalable distributed system monitoring tool.

AMC Status

Green: operating normally
Blue: decrease in performance
Yellow: unable to process statement requests
Red: stopped
White/Clear: no longer able to establish a connection

Aster Database only supports B-tree indexes and cannot enforce referential integrity.

There is no data sharing among Aster databases.

/*Change database*/

beehive=> \connect retails_sales;

retails_sales=>

retails_sales=>database beehive;

beehive=>

/*help*/

beehive=>\?

/*List database*/

beehive=> \l

/*Exit database*/

beehive=> \q

/*List schemas*/

beehive=> \dn

/*View tables in the PROD schemas*/

beehive=> \dt prod.*

/*View columns/data types*/

beehive=> \d prod.sales_fact

/*Show current schema*/

show search_path;

ALTER USER beehive SET SEARCH_PATH = 'public', 'mkt';

An unqualified table name is resolved by walking the search path until a schema containing that name is found
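
The lookup can be pictured with a small Ruby sketch. The catalog contents and the helper name resolve_table are hypothetical; only the "first schema on the path wins" behaviour comes from the note above.

```ruby
# Hypothetical catalog: schema name => table names it contains.
CATALOG = {
  "public" => ["customers"],
  "mkt"    => ["campaigns", "customers"],
  "prod"   => ["sales_fact"]
}.freeze

# Return the first schema on the search path that contains the table, or nil.
def resolve_table(table, search_path, catalog = CATALOG)
  search_path.find { |schema| catalog.fetch(schema, []).include?(table) }
end

path = ["public", "mkt"]
p resolve_table("customers", path)  # found in both schemas; "public" comes first
p resolve_table("campaigns", path)  # only in "mkt"
p resolve_table("sales_fact", path) # "prod" is not on the path, so nothing is found
```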

CONNECT privilege must be given to access the database. USAGE privilege must be given to access the schema.

Two Serial types: Global and Local.

A Serial Global type ensures the serial property across all of the nodes in the system.
A Serial Local type ensures the serial property local to each logical partition of data.

PARTITION BY RANGE: the START value is inclusive, but the END value is exclusive

partition sales_june (START'2017-06-01'::date END'2017-07-01'::date)
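
The inclusive START / exclusive END rule is a half-open interval. A minimal Ruby sketch of the same boundary check (the constant and method names are made up; the dates are the ones from the partition above):

```ruby
require "date"

# START is inclusive and END is exclusive, which is what Ruby's three-dot range expresses.
SALES_JUNE = (Date.new(2017, 6, 1)...Date.new(2017, 7, 1))

def in_sales_june?(date)
  SALES_JUNE.cover?(date)
end

p in_sales_june?(Date.new(2017, 6, 1))  # => true  (START value belongs to the partition)
p in_sales_june?(Date.new(2017, 6, 30)) # => true
p in_sales_june?(Date.new(2017, 7, 1))  # => false (END value belongs to the next partition)
```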

PARTITION BY LIST: If an incoming row does not fit into any partition, that row will not be loaded into the table

1. Data Modeling Quiz

Q. Best schemas for Teradata Aster databases
A. Star schema and Snowflake schema

Aster column name rules

Starts with a character
Must be < 63 characters
Names may include special characters (_ , $)

Constraint Options

Null/Not Null
Primary key
Default values
Check values

create table stuff

(

emp int NOT NULL PRIMARY KEY,

dept varchar DEFAULT 'none',

age smallint CHECK(age >= 18 and age <= 70),

name varchar

)

distribute by replication

;

Data Types

CHAR, CHARACTER VARYING, VARCHAR(n): maximum is 10MB
TEXT is unlimited
Special types are Boolean, Bytea, Serial, Big Serial

Supported data types of distribution

smallint
integer
bigint
numeric
text
varchar
uuid
bytea

For large tables (> 1 million rows, usually Fact tables)

For small tables (<= 1 million rows, usually Dimension tables)

If the HASH key and the JOIN columns don’t match, a SHUFFLE will occur.
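
Why the keys have to match can be sketched in Ruby. Everything here is an assumption for illustration: the number of v-Workers, the use of CRC32 as a stand-in hash, and the sample tables. The point is only that rows hashed on the JOIN column land on the same v-Worker, while rows hashed on another column may not.

```ruby
require "zlib"

V_WORKERS = 4 # assumed number of v-Workers

# Place a row on a v-Worker by hashing its distribution key (CRC32 is only a stand-in).
def v_worker_for(value)
  Zlib.crc32(value.to_s) % V_WORKERS
end

# Hypothetical fact tables that join on customer_id.
ORDERS   = [{ order_id: 1, customer_id: 42 }, { order_id: 2, customer_id: 7 }].freeze
PAYMENTS = [{ payment_id: 9, customer_id: 42 }, { payment_id: 8, customer_id: 7 }].freeze

# True when every pair of joining rows already sits on the same v-Worker,
# given that ORDERS is hashed on customer_id and PAYMENTS on payments_hash_key.
def colocated?(orders, payments, payments_hash_key)
  orders.all? do |order|
    payments.select { |pay| pay[:customer_id] == order[:customer_id] }
            .all? { |pay| v_worker_for(pay[payments_hash_key]) == v_worker_for(order[:customer_id]) }
  end
end

# Hash key equals JOIN column: always co-located, so no shuffle is needed.
puts colocated?(ORDERS, PAYMENTS, :customer_id)
# Hash key differs from JOIN column: co-location is down to luck, so a shuffle may be needed.
puts colocated?(ORDERS, PAYMENTS, :payment_id)
```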

TRUNCATE: Quickly remove all rows and it reclaims disk space immediately

VACUUM: Converts dead space into usable free space

VACUUM FULL ANALYZE: Physically rearrange the data on disk

NC_RELATIONSTATS: Generate various reports

2: Creating Tables Quiz

Tables in a Teradata Aster Database can be of which four variations? (Choose four.)

Temporary(Fact/Dim)
Analytic(Fact/Dim)
Fact
Dimension

What data type is commonly used for “payload” columns? Click on the correct data type in the image.

TEXT

In Teradata Aster, table data may be partitioned in which two ways? (Choose two.)

Logically Partitioned tables (Logical)

Fact tables(Physical)

How do these two partitioning types improve performance? Match the partitioning type to how it improves performance.

Physical: More v-Workers equal more parallelism

Logical: Reduced disk I/O by only reading needed partitions

Scenario: You join 2 FACT tables where the Hash column matches the JOIN column. Will a shuffling of data occur?

No, the JOIN will commence immediately since JOIN column values are guaranteed to be on the same v-Worker.

nCluster loader arguments

-B --begin-script
-E --end-script
-d --dbname
-D --delimiter
-c --csv
-l --loader
--truncate-table
-w --password
-z --auto-analyze
-U --username
-p --port
--el-enabled
--skip-rows-1

Default delimited format is TSV

-B and -E each specify the name of a script to execute at the beginning and at the end of the load

Parallelizing the Load tier

Add more loader nodes
Add more staging machines
Add more nCluster loaders running on the staging machines

Error logging is turned off by default. This means the load job will abort and roll back the data on encountering the first error.

--el-enabled
--el-limit <#>
--el-table
--el-label
--el-errfile

ncluster_export example

ncluster_export -h 192.168.100.100 -d beehive -U beehive -w beehive

\"aaf\".\"accesslog\" myfile.txt

3: Data Loading Quiz

What is the name of the Teradata Aster Database bulk loading tool? ncluster_loader

Which two node types can handle Teradata Aster data loading? (Choose two.)

Loader nodes and Queen node (if there are no loader nodes)

Which task do Loader nodes perform during loading?

Hashing the Distribution Key for v-Worker placement

The loading tier can be scaled in which three ways? (Choose three.)

Add more nCluster Loaders, Loader Nodes, Staging Machines

In addition to the nCluster Loader Tool, which four other types of tools are used to load a Teradata Aster Database? (Choose four.)

ETL Tools, SQL Statements, Connectors, Teradata QueryGrid (Aster-to-Hadoop, Aster-to-Teradata)

Final Exam

Q. You have 5 Teradata Aster Databases. How many Data Dictionaries do you have?
A. 5 - one for each Teradata Aster Database
Q. Which two statements are true regarding a Teradata Aster Database? (Choose two.)
A. Each user must be given the CONNECT privilege on a database to access objects on the database,
By default, a newly installed Teradata Aster cluster has one database, called beehive
Q. True or False: The Aster Loader Tool must point to the Queen and can optionally point to the Cluster Loader Node for hashing. A. True
Q. Which two statements are true regarding a Teradata Aster Database? (Choose two.)
A. Data objects may be shared across schemas in the same database, Users can join tables from one schema with tables in another schema if they have proper privileges for the schemas/tables

2017-09-20

RubyKaigi 2017

The following are my personal notes, so apologies for the rough and incomplete sentences. Sparse notes on a talk do not mean its quality was low; I was just concentrating on listening.

Day1

Making Ruby? ゆるふわRuby生活 (An Easygoing Ruby Life)

Matz team. (are only Matz and Nobu)

Daily: Debugging, New features, Bug making etc

Why not Git? Windows is not supported officially. Not enough advantage

Developers’ meetings are held once a month

Ruby is built with configure and make

BASERUBY: pre-installed ruby. generate source files

MINIRUBY: ruby made during the build. No dynamic loading. Unable to load extension libraries.

Mimic global variables used in mkmf.rb by trace_var

The following is, strictly speaking, not a bug

p = 2

p (-1.3).abs = -1.3

Demon Castle parse.y by name

Monstrous lex_state

literal symbol by intern

Refining String#intern returns no-symbol

New features in 2.5?

Such as $. 2.3

Unicode case 2.4

Approved: Array#append, Array#prepend

Rejected: neko ^..^ operator (make range), User-defined operator

Under discussion: Method extraction operator (Kernel#method -> Method instance), Rightward assignment
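
The two approved Array methods shipped in Ruby 2.5 as aliases of push and unshift. A short example, assuming Ruby 2.5 or later (the method name build_list is made up):

```ruby
# Array#append is an alias of Array#push, Array#prepend an alias of Array#unshift.
def build_list
  list = [2, 3]
  list.append(4)  # adds to the end
  list.prepend(1) # adds to the front
  list
end

p build_list # => [1, 2, 3, 4]
```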

Wouldn’t you write New Ruby?

API Development 2017

Ruby History

to_json

jbuilder is very slow

ActiveModel::Serializers

JSON-Schema: API Blueprint (apiary), OpenAPI (Swagger), RAML, JSON Hyper-Schema, bare JSON Schema

Why choose OpenAPI? It has a more RESTful definition than API Blueprint

OpenAPI was originally developed as Swagger. Give type into OpenAPI

Use $merge and $ref to OpenAPI porting

GraphQL is an API query language like SQL; it is WYSIWYG and has only 1 endpoint

Ecosystem is still insufficient

BFF is the abbreviation of Backend for Frontend

How Close is Ruby 3x3 For Production Web Apps?

5000 warmup iterations increase 5-7%

Code is available in noahgibbs/rails_ruby_bench

Rails version is 4.2.6

Development of Data Science Ecosystem for Ruby

PyCall runs in Ruby interpreter

Use pandas in Rails app

Python is a best friend of Ruby from now on!

PyCall should be a temporary way

Red Data Tools project will be a home

Apache Arrow aims to be Arrow Memory for common tools

Red Arrow is the Ruby binding for Apache Arrow

Sutou-san (@kou) officially became a member of the Apache Arrow PMC yesterday

Jupyter Notebook also supports Ruby

Python is managed by reference counting

Some types (string, dictionary, etc.) are converted from Python to Ruby primitive types

Ruby Committers vs the World

Method chains read left to right, so why is the assignment target on the left? That is the motivation for rightward assignment. A method chain could be used instead (e.g. assign_to)

Besides Ruby, Emacs Lisp and Streem, Matz likes Swift and Clojure

ActiveSupport is made for web application

Day2

The Many Faces of Module

RubyKaigi is the biggest ruby conference!

Talk about talent

Love of languages

First language Matz used is Basic

Matz posted about Ruby to Python mailing list

Simula (1968) first object oriented language

Created by Dr. Kristen Nygaard

Lisp (Flavors)

C3 linearization algorithm

Alias method chain vs Module#prepend

CLOS (Common Lisp Object System): method combination enables hooks before/after/around method calls
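
A small example of Module#prepend acting as an "around" hook, the role alias method chain used to play. The Greeter class and Shouting module are made up for illustration.

```ruby
class Greeter
  def greet(name)
    "Hello, #{name}"
  end
end

# A prepended module sits in front of Greeter in the ancestor chain,
# so its greet runs first and super reaches the original method.
module Shouting
  def greet(name)
    super(name).upcase + "!"
  end
end

Greeter.prepend(Shouting)

puts Greeter.new.greet("Matz")  # => HELLO, MATZ!
p Greeter.ancestors.first(2)    # => [Shouting, Greeter]
```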

Gorilla, Guerrilla, Monkey patching has global influence

Scoped monkey patching?
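
Refinements are the mechanism Ruby offers for scoping a monkey patch. A minimal sketch (the module and method names are made up):

```ruby
module Shout
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Scoped
  using Shout # the refinement is active only from here to the end of this module body

  def self.call(text)
    text.shout
  end
end

puts Scoped.call("hello")        # the refined method is visible inside Scoped
puts "hello".respond_to?(:shout) # String itself is untouched elsewhere
```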

An introduction and future of Ruby coverage library

Line coverage, Branch coverage, Path coverage

C0, C1, C2 coverage

Coverage is just a measure, not a goal

SimpleCov is a wrapper for coverage.so

Method definition is counted as an execution
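
The point about method definitions can be checked with the standard coverage library directly. This is a sketch assuming a Ruby where Coverage.start with no arguments reports plain per-line counts; the helper name line_coverage_for and the sample source are made up.

```ruby
require "coverage"
require "tempfile"

# Load the given source under line coverage and return its per-line counts.
def line_coverage_for(source)
  file = Tempfile.new(["coverage_demo", ".rb"])
  file.write(source)
  file.close

  Coverage.start
  load file.path
  result = Coverage.result
  _path, lines = result.find { |path, _| File.basename(path) == File.basename(file.path) }
  lines
ensure
  file&.unlink
end

# A method that is defined but never called.
SOURCE = <<~RUBY
  def never_called
    :unreachable
  end
RUBY

# The def line is counted as executed even though the body never runs.
p line_coverage_for(SOURCE)
```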

Concov detects temporal changes in coverage

The ticket is #13901

Automated Type Contracts Generation for Ruby

RuboCop is an industry standard solution

RubyMine can detect more errors than RuboCop

Coverage is a lie

YARV compiles code into the bytecode

JetBrains/ruby-type-inference

Type Checking Ruby Programs with Annotations

JIT Type Checking for Dynamic Languages (Ren, 2016)

def foo(x)

"".bar if x

end

foo(false)

Steep checks types with annotations

Ruby Language Server

Editor specific library & Universal LSP Client plugin

Auto-completion and jump-to-definition are WIP

mtsmfm/language_server-ruby

Syntax check uses ruby -wc filename.rb

Auto completion uses rcodetools

Day3

Compacting GC in MRI

Copy on Write(CoW)

memsize_of, dup

CoW page faults cause bad performance

Unicorn is a forking webserver which has a parent and many children

Garbage Collector affects CoW

GC compaction means moving objects together so that the OS has to copy fewer pages

Some people said “Moving objects is impossible”

Aaron tried ‘Two Finger Compaction’

Disadvantage: slow

Advantage: easy

Object movement uses two fingers: one is the Free Pointer, the other is the Scan Pointer

The Free Pointer looks for free addresses and the Scan Pointer looks for used addresses, until the two fingers meet
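
The two-finger idea can be sketched on a toy heap. This is a simplified model in Ruby and not MRI's implementation: the heap is an Array, nil marks a free slot, and the method name two_finger_compact is made up.

```ruby
# Toy heap: nil is a free slot, anything else is a live object.
# Returns the compacted heap and a table of moves (old index => new index),
# which a real collector would use to update references.
def two_finger_compact(heap)
  heap = heap.dup
  free = 0               # Free Pointer: walks forward looking for free slots
  scan = heap.length - 1 # Scan Pointer: walks backward looking for live objects
  moves = {}

  loop do
    free += 1 while free < scan && !heap[free].nil?
    scan -= 1 while scan > free && heap[scan].nil?
    break if free >= scan # the two fingers met

    heap[free] = heap[scan]
    heap[scan] = nil
    moves[scan] = free
  end

  [heap, moves]
end

compacted, moves = two_finger_compact([:a, nil, :b, nil, nil, :c, :d])
p compacted # => [:a, :d, :b, :c, nil, nil, nil]
p moves     # => {6=>1, 5=>3}
```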

What objects can move?

Everything!?

GC can easily know Ruby’s references

If reference is C extension, it’s difficult

hash_key uses memory address: fix cache hash key

rb_gc_mark Dual References: fix Call rb_gc_mark, or use only ruby

rb_define_class

string literals

It seems like nothing can move, but most can be fixed

46% can move!

ObjectSpace.dump_all can output as json
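
A small sketch of reading that output, assuming the dump is one JSON document per line with a "type" field. The helper name heap_type_counts is made up.

```ruby
require "objspace"
require "json"

# Dump the heap as newline-delimited JSON and count entries per "type" field.
def heap_type_counts
  dump = ObjectSpace.dump_all(output: :string)
  counts = Hash.new(0)
  dump.each_line do |line|
    begin
      counts[JSON.parse(line)["type"]] += 1
    rescue StandardError
      next # skip any line that cannot be parsed
    end
  end
  counts
end

# Show the five most common object types on the heap.
heap_type_counts.sort_by { |_type, count| -count }.first(5).each do |type, count|
  puts "#{type}: #{count}"
end
```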

tenderlove/heap-utils

/proc/{PID}/smaps

Question your assumptions

Ruby for Distributed Storage System

Replication with Quorum

Create at least 2 replicas of the data (max 3). If the first response is OK, discard the thread for the other node

Bigdam: Edge locations on the earth + the Central location

Bigdam-pool: distributed key-value storage, used to build an S3-free data ingestion pipeline

chunk id guarantees the uniqueness

Ruby 2.5 supports block-wide rescue
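
In practice this means a do/end block can carry its own rescue clause without a begin/end wrapper. A short example, assuming Ruby 2.5 or later (the method name safe_divide_all is made up):

```ruby
# From Ruby 2.5, rescue can be written directly inside a do/end block.
def safe_divide_all(numbers)
  numbers.map do |n|
    10 / n
  rescue ZeroDivisionError
    nil
  end
end

p safe_divide_all([1, 0, 2]) # => [10, nil, 5]
```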

JRuby at 15 Years: Meeting the Challenges

JRuby’s Startup Warmup phase is slower than MRI

Have to load tons of Java classes

Warmup phase has JRuby (Ruby interpreter, Ruby JIT) & JVM (Java interpreter, Java JIT)

Graal is new JIT for all languages on the JVM

jruby/jruby

Towards Ruby 3x3 performance

RTL

In special cases, when GCC optimizes a simple loop, the generated code has no loop at all, but the JVM cannot do this

MJIT status

* Unstable, doesn’t work on Windows, one more year to mature

* No inlining yet(most important optimizations!), Use C inlining, new GCC/LLVM extension

* Will RTL and MJIT be a part of MRI?

References

Slides

This was my first RubyKaigi, and it was really interesting. Aaron’s talk in particular left an impression on me, and of course Matz’s talk did too. Until this conference I simply liked writing Ruby, but now I like both Ruby and its community. (Uh, sounds like a poem.) Matsuda-san announced that the next RubyKaigi will be held in Sendai, Miyagi (May 31st - June 2nd). I will definitely go again next year!