Development & Debug Stages in DataStage

September 08, 2018

Head stage

The Head stage can have a single input link and a single output link, and it is useful for pulling sample data. It selects the first N rows from each partition of an input data set and copies them to an output data set. You control which rows are copied by setting properties that specify:

The number of rows to copy.

The partition from which the rows are copied.

The location of the rows to copy.

The number of rows to skip before the copying operation begins.

This stage is helpful in testing and debugging applications with large data sets. For example, the Partition property lets you see data from a single partition to determine if the data is being partitioned as you want it to be. The Skip property lets you access a certain portion of a data set.
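The behaviour described above can be pictured with a small Python sketch. This is not DataStage code: the function `head_stage` and its parameters `n`, `skip` and `only_partition` are hypothetical names chosen for illustration, and partitions are modelled as plain lists.

```python
from typing import Dict, List, Optional, Sequence


def head_stage(
    partitions: Sequence[Sequence[object]],
    n: int = 10,
    skip: int = 0,
    only_partition: Optional[int] = None,
) -> Dict[int, List[object]]:
    """Return the first n rows of each partition, after skipping `skip` rows.

    If only_partition is given, rows are taken from that partition alone.
    """
    selected: Dict[int, List[object]] = {}
    for index, rows in enumerate(partitions):
        if only_partition is not None and index != only_partition:
            continue
        # Slicing never raises when the partition is shorter than skip + n.
        selected[index] = list(rows[skip:skip + n])
    return selected


# Three partitions of made-up data.
data = [[1, 2, 3, 4, 5], [10, 20, 30], [100, 200, 300, 400]]

print(head_stage(data, n=2))
print(head_stage(data, n=2, skip=1, only_partition=2))
```

Restricting the output to one partition, as in the second call, mirrors the debugging use mentioned above: it shows which rows ended up in that partition.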

Tail stage

The Tail stage can have a single input link and a single output link, and it is useful for pulling sample data.

The Tail stage selects the last N records from each partition of an input data set and copies the selected records to an output data set.

You control which records are copied by setting properties that specify:

The number of records to copy.

The partition from which the records are copied.
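As a rough illustration only, the same idea can be sketched in Python. The function `tail_stage` and its parameters are hypothetical and not part of DataStage; partitions are modelled as plain lists.

```python
from typing import Dict, List, Optional, Sequence


def tail_stage(
    partitions: Sequence[Sequence[object]],
    n: int = 10,
    only_partition: Optional[int] = None,
) -> Dict[int, List[object]]:
    """Return the last n records of each partition (or of one chosen partition)."""
    selected: Dict[int, List[object]] = {}
    for index, records in enumerate(partitions):
        if only_partition is not None and index != only_partition:
            continue
        # records[-0:] would return everything, so n == 0 is handled separately.
        selected[index] = list(records[-n:]) if n > 0 else []
    return selected


# Three partitions of made-up data.
tail_data = [[1, 2, 3, 4, 5], [10, 20, 30], [100, 200, 300, 400]]

print(tail_stage(tail_data, n=2))
print(tail_stage(tail_data, n=2, only_partition=1))
```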

Peek stage

The Peek stage can have a single input link and any number of output links.

The Peek stage lets you print record column values either to the job log or to a separate output link as the stage copies records from its input data set to one or more output data sets.
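Conceptually, a peek is a pass-through that prints some column values on the way. The Python generator below is a minimal sketch of that idea with a single output. The names `peek`, `columns`, `limit` and `log` are assumptions made for this example and do not correspond to actual stage properties.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Sequence

Row = Dict[str, object]


def peek(
    rows: Iterable[Row],
    columns: Optional[Sequence[str]] = None,
    limit: int = 10,
    log: Callable[[str], None] = print,
) -> Iterator[Row]:
    """Log the chosen columns of the first `limit` rows and pass every row through."""
    for position, row in enumerate(rows):
        if position < limit:
            shown = row if columns is None else {name: row[name] for name in columns}
            log(f"peek row {position}: {shown}")
        # Every row continues downstream unchanged, whether or not it was logged.
        yield row


sample_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
passed_through = list(peek(sample_rows, columns=["id"], limit=2))
```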

Row Generator stage

The Row Generator stage is a Development/Debug stage. It has no input links and a single output link. The Row Generator stage produces a set of mock data that fits the specified metadata. This is useful when you want to test your job but have no real data available to process. The metadata you specify on the output link determines the columns that are generated.

For decimal values the Row Generator stage uses dfloat. As a result, the generated values are subject to the approximate nature of floating point numbers: not all of the values in the valid range of a floating point number are representable. The gaps between representable values widen as values move further from zero and need more significant digits.
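The widening gaps are easy to see in any language that uses IEEE 754 double-precision numbers. The Python sketch below assumes that dfloat behaves like such a double. It uses `math.ulp` (Python 3.9 or later), which returns the distance from a value to the next larger representable double.

```python
import math

# Gap between each value and the next larger representable double.
gaps = {value: math.ulp(value) for value in (1.0, 1_000.0, 1e9, 1e15)}

for value, gap in gaps.items():
    print(f"{value:>24,.1f}  gap to next double: {gap!r}")

# Decimal fractions such as 0.1 are stored as the nearest binary double,
# so exact decimal arithmetic cannot be relied on.
print(0.1 + 0.2 == 0.3)  # False
```

Near 1e15 the gap is already 0.125, so a generated value of that size cannot carry more than a few exact decimal places.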

Column Generator stage

The Column Generator stage can have a single input link and a single output link.

The Column Generator stage adds columns to incoming data and generates mock data for these columns for each data row processed. The new data set is then output.
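A minimal Python sketch of the same idea follows. It is illustrative only: `add_generated_columns` and the generator definitions are hypothetical names, and the mock values are simple counters and cycles, not the generation rules DataStage applies.

```python
import itertools
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def add_generated_columns(
    rows: Iterable[Row], generators: Dict[str, Iterator[object]]
) -> Iterator[Row]:
    """Yield a copy of each input row extended with one mock value per new column."""
    for row in rows:
        extended = dict(row)  # copy, so the incoming row is left unchanged
        for column_name, generator in generators.items():
            extended[column_name] = next(generator)
        yield extended


incoming = [{"customer": "A"}, {"customer": "B"}, {"customer": "C"}]
mock_columns = {
    "row_id": itertools.count(1),          # 1, 2, 3, ...
    "flag": itertools.cycle(["Y", "N"]),   # Y, N, Y, ...
}

generated = list(add_generated_columns(incoming, mock_columns))
print(generated)
```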

Write Range Map stage

The Write Range Map stage allows you to write data to a range map. It can have a single input link and can only run in sequential mode.

The Write Range Map stage takes an input data set produced by sampling and sorting a data set and writes it to a file in a form usable by the range partitioning method. The range partitioning method uses the sampled and sorted data set to determine partition boundaries.

A typical use for the Write Range Map stage is a job that uses the Sample stage to sample a data set, the Sort stage to sort the sample, and the Write Range Map stage to write the range map. The range map can then be used with the range partitioning method to write the original data set to a file set.
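The sample, sort and partition flow can be sketched in Python to show why a sorted sample is enough to choose partition boundaries. This is an illustration under stated assumptions: the functions `build_range_map` and `partition_for` are hypothetical, boundaries are taken at evenly spaced positions in the sorted sample, and the on-disk range map format DataStage writes is not modelled.

```python
import bisect
import random
from typing import List, Sequence


def build_range_map(sample_keys: Sequence[int], num_partitions: int) -> List[int]:
    """Sort the sampled keys and pick num_partitions - 1 boundary keys."""
    ordered = sorted(sample_keys)
    return [
        ordered[(i * len(ordered)) // num_partitions]
        for i in range(1, num_partitions)
    ]


def partition_for(key: int, boundaries: Sequence[int]) -> int:
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(boundaries, key)


random.seed(0)                            # fixed seed so the run is repeatable
full_data = list(range(1000))
sample = random.sample(full_data, 100)    # stands in for the Sample stage
boundaries = build_range_map(sample, 4)   # stands in for sorting and writing the map

# Count how many rows of the full data set land in each of the 4 partitions.
counts = [0, 0, 0, 0]
for key in full_data:
    counts[partition_for(key, boundaries)] += 1
print(boundaries, counts)
```

Because the boundaries come from a sample, the partition sizes are only approximately equal. A larger sample generally gives a more even split.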
