Join Stage in DataStage

The Join stage is a processing stage. It performs join operations on two or more data sets input to the stage and then outputs the resulting data set.

The Join stage is one of three stages that join tables based on the values of key columns.
The three stages differ mainly in the memory they use, the treatment of rows with unmatched keys, and their requirements for data being input.
In the Join stage, the input data sets are notionally identified as the "right" set and the "left" set.
It has any number of input links and a single output link.
The data sets input to the Join stage must be key partitioned and sorted in ascending order.
This ensures that rows with the same key column values are located in the same partition and will be processed by the same node.
The stage can perform one of four join operations:
Inner It joins two or more tables and returns only those records which satisfy join condition. Records whose key columns do not contain equal values are dropped.
Left outer transfers all values from the left data set but transfers values from the right data set and intermediate data sets only where key columns match. The stage drops the key column from the right and intermediate data sets. puts NULL wherever unmatched record from 2nd (Right) table.
Right outer transfers all values from the right data set and transfers values from the left data set and intermediate data sets only where key columns match. The stage drops the key column from the left and intermediate data sets. puts NULL wherever unmatched record from 2nd (Left) table.
Full outer  It joins two or more tables and returns both matched and unmatched records from all tables. (Full outer joins do not support more than two input links.)
 Options:
Join Keys/Key - Type: Input Column
Name of input column you want to join on. Columns with the same name must appear in both input data sets and have compatible data types.
Join Type - Type of join operation to perform 

Comments

Popular Posts