Development & Debug Stages in DataStage
Head stage
The Head stage has a single input link and a single output link. It is
useful for obtaining sample data: the stage selects the first N rows from
each partition of an input data set and copies the selected rows to an
output data set.
You determine which rows are copied by setting properties that let you
specify:
· The number of rows to copy.
· The partition from which the rows are copied.
· The location of the rows to copy.
· The number of rows to skip before the copying operation begins.
This stage is helpful in
testing and debugging applications with large data sets. For example, the
Partition property lets you see data from a single partition to determine if the
data is being partitioned as you want it to be. The Skip property lets you
access a certain portion of a data set.
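The selection logic described above can be sketched in plain Python. This is not DataStage code; it only mimics the behaviour on a list of lists standing in for a partitioned data set. The function name `head_per_partition` and its parameters `n`, `skip`, and `partition` are assumptions made for illustration, loosely mirroring the stage properties.

```python
from itertools import islice

def head_per_partition(partitions, n, skip=0, partition=None):
    """Return the first n rows of each partition, after skipping `skip` rows.

    partitions: list of lists of rows (a stand-in for a partitioned data set).
    partition:  if given, only that partition index is read (hypothetical
                equivalent of the Partition property).
    """
    selected = range(len(partitions)) if partition is None else [partition]
    out = []
    for idx in selected:
        # islice skips `skip` rows, then takes up to n rows
        out.extend(islice(partitions[idx], skip, skip + n))
    return out

partitions = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200]]
print(head_per_partition(partitions, n=2))               # [1, 2, 10, 20, 100, 200]
print(head_per_partition(partitions, n=2, skip=1))       # [2, 3, 20, 30, 200]
print(head_per_partition(partitions, n=2, partition=1))  # [10, 20]
```

Reading a single partition this way is a quick check that rows landed where the partitioning method was expected to put them.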
Tail stage
The Tail stage has a single input link and a single output link. It is
useful for obtaining sample data: the stage selects the last N records from
each partition of an input data set and copies the selected records to an
output data set.
You determine which records are copied by setting properties that let you
specify:
· The number of records to copy.
· The partition from which the records are copied.
This stage is helpful in
testing and debugging applications with large data sets. For example, the
Partition property lets you see data from a single partition to determine if
the data is being partitioned as you want it to be.
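As a rough illustration of the Tail behaviour, the following plain Python sketch keeps the last N records of each partition. It is not DataStage code; the name `tail_per_partition` and the parameters `n` and `partition` are hypothetical.

```python
from collections import deque

def tail_per_partition(partitions, n, partition=None):
    """Return the last n records of each partition.

    partitions: list of lists of records (a stand-in for a partitioned data set).
    partition:  if given, only that partition index is read.
    """
    selected = range(len(partitions)) if partition is None else [partition]
    out = []
    for idx in selected:
        # A bounded deque keeps only the most recent n items it has seen
        out.extend(deque(partitions[idx], maxlen=n))
    return out

partitions = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200]]
print(tail_per_partition(partitions, n=3))               # [2, 3, 4, 20, 30, 40, 100, 200]
print(tail_per_partition(partitions, n=2, partition=0))  # [3, 4]
```

The bounded deque means a partition can be streamed once without holding all of its records in memory.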
Peek stage
The Peek stage has a single input link and any number of output links.
It lets you print record column values either to the job log or to a
separate output link as the stage copies records from its input data set to
one or more output data sets.
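The idea of printing values while records pass through unchanged can be sketched as a Python generator. This is an analogy only, not DataStage code; `peek`, `log_limit`, and the logger name are assumptions chosen for the example.

```python
import logging

def peek(rows, log_limit=10, logger=None):
    """Yield every row unchanged, logging the first `log_limit` rows."""
    logger = logger or logging.getLogger("peek")
    for i, row in enumerate(rows):
        if i < log_limit:
            # Stand-in for writing column values to the job log
            logger.info("row %d: %s", i, row)
        yield row

logging.basicConfig(level=logging.INFO)
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
copied = list(peek(rows, log_limit=2))  # logs two rows, copies all three
```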
Row Generator stage
The Row Generator stage is a Development/Debug stage. It has no input links
and a single output link. The stage produces a set of mock data that fits
the specified metadata, which is useful when you want to test your job but
have no real data available to process. The metadata you specify on the
output link determines the columns that are generated.
For decimal values the Row Generator stage uses dfloat. As a result, the
generated values are subject to the approximate nature of floating point
numbers. Not all of the values in the valid range of a floating point number
are representable. The further a value is from zero, the wider the gaps
between representable values, so values with many significant digits may
not be generated exactly.
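The widening gaps can be observed with any IEEE 754 double. The Python sketch below assumes dfloat behaves like a standard double-precision float, and uses `math.ulp` (Python 3.9 or later) to report the distance from a value to the next representable one.

```python
import math

# Gap to the next representable double grows with magnitude
for x in (1.0, 1e6, 1e15):
    print(f"value={x:g}  gap to next representable value={math.ulp(x):g}")

# Above 2**53 not every integer is representable, so adding 1 is lost
big = float(2 ** 53)
print(big + 1.0 == big)  # True
```

A decimal column with more digits than a double can hold may therefore receive generated values that differ slightly from what an exact decimal type would store.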
Column Generator stage
The Column Generator stage has a single input link and a single output
link. It adds columns to the incoming data and generates mock data for
these columns for each data row processed. The new data set is then output.
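A minimal Python sketch of the same idea follows: each incoming row is copied and extended with generated values. It is not DataStage code; `add_mock_columns`, the `generators` mapping, and the column names `seq_no` and `flag` are hypothetical.

```python
from itertools import count

def add_mock_columns(rows, generators):
    """Yield a copy of each row with extra generated columns.

    generators: dict mapping new column name -> zero-argument function
                that produces the mock value for one row.
    """
    for row in rows:
        new_row = dict(row)  # leave the incoming row untouched
        for name, make_value in generators.items():
            new_row[name] = make_value()
        yield new_row

seq = count(1)
rows = [{"id": "a"}, {"id": "b"}]
out = list(add_mock_columns(rows, {"seq_no": lambda: next(seq), "flag": lambda: "N"}))
print(out)
# [{'id': 'a', 'seq_no': 1, 'flag': 'N'}, {'id': 'b', 'seq_no': 2, 'flag': 'N'}]
```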
Write Range Map stage
The Write Range Map stage lets you write data to a range map. The stage
has a single input link and can only run in sequential mode.
The stage takes an input data set produced by sampling and sorting a data
set and writes it to a file in a form usable by the range partitioning
method. The range partitioning method uses the sampled and sorted data set
to determine partition boundaries.
A typical use for the Write Range Map stage is a job that uses the Sample
stage to sample a data set, the Sort stage to sort the sample, and the Write
Range Map stage to write the range map. The range map can then be used with
the range partitioning method to write the original data set to a file set.
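How a sorted sample can yield partition boundaries is sketched below in plain Python. This illustrates the general range partitioning idea only; the actual range map file format and the algorithm DataStage uses are not modelled here, and `build_range_map` and `partition_for` are hypothetical names.

```python
from bisect import bisect_right

def build_range_map(sorted_sample, num_partitions):
    """Pick num_partitions - 1 boundary keys at evenly spaced positions
    in a sorted sample of the key column."""
    step = len(sorted_sample) / num_partitions
    return [sorted_sample[int(step * i)] for i in range(1, num_partitions)]

def partition_for(key, range_map):
    """Return the index of the partition whose key range contains `key`."""
    return bisect_right(range_map, key)

# Every 5th key of 0..99 stands in for the sampled and sorted data set
sample = list(range(0, 100, 5))
range_map = build_range_map(sample, num_partitions=4)
print(range_map)                     # [25, 50, 75]
print(partition_for(10, range_map))  # 0
print(partition_for(25, range_map))  # 1
print(partition_for(99, range_map))  # 3
```

Because the boundaries come from a sample, the resulting partitions are only as balanced as the sample is representative of the full data set.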