Remove Duplicates Stage in DataStage
The Remove Duplicates stage is a processing stage. It can have a single input link and a single output link. The Remove Duplicates stage takes a single sorted data set as input, removes all duplicate rows, and writes the results to an output data set.
Input data should be sorted for this stage so that all records of having similar key values will be adjacent. Instead of adding separate ‘Sort stage’, we can perform ‘Link Level Sort’.
Remove Duplicates stage: Properties tab
Key - Specifies the key column for the operation. This property can be repeated to specify multiple key columns.
Remove Duplicates stage: Options category
Duplicate to retain - Specifies which of the duplicate columns encountered to retain. Choose between First and Last. It is set to First by default.
Consider below input data
ID Name
10 Joe
11 Marsh
12 Shawn
10 Joe
10 Roger
Step 1: Design job structure as shown below.
Step 2: Sort the data on ID column in sort stage
Step 3: Double click on Remove duplicate stage and define
key = ID and Duplicate to retain = First.
Step 4: Map all the required output column under ‘Output’
tab in Remove duplicate stage.
You will see the below output:
ID Name
10 Joe
11 Marsh
12 Shawn
Duplicate entries will get removed.
How to remove duplicates without using remove duplicate
stage in DataStage?
1. Using hash file stage (Specify the keys and check the
unique checkbox, Unique Key is not allowed duplicate values)
2. Using a sort stage,set property: ALLOW DUPLICATES :false
2. You can do it at any stage. Just do a hash partion of the
input data and check the options stable Sort and Unique.
Ruby on rails online training
ReplyDeleteAbinitio online training
Android online training
Ansible online training
ReplyDeleteI have found great and massive information on
Workday payroll training
Workday financials training
Workday training
Python online training
Mulesoft training
Servicenow Online training
Java training
Salesforce training
Mulesoft Online Training
which helps you also. Thanks for Posting
selenium trainings
ReplyDeletejava training
mulesoft training
etl testing training