Lookup Stage in DataStage
The Lookup stage is a processing stage. It is used to perform lookup operations on a data set read into memory from any other Parallel job stage that can output data.
The Lookup stage is most appropriate when the reference data for all lookup stages in a job is small enough to fit into available physical memory. Each lookup reference requires a contiguous block of shared memory. If the Data Sets are larger than available memory resources, the JOIN or MERGE stage should be used.
Lookup stages do not require data on the input link or reference links to be sorted. Be aware, though, that large in-memory lookup tables will degrade performance because of their paging requirements.
Normal lookup: All the data from the database is read into memory, and then lookup is performed.
Sparse lookup: For each incoming row from the primary link, the SQL is fired on database at run time.
Range Lookup: Range Lookup is going to perform the range checking on selected columns.
For Example: -- If we want to check the range of salary, in order to find the grades of the employee than we can use the range lookup.
Lets look at the example shown below.
Source
Emp ID EmpName Dept
1001 John IT
1002 Russ IT
1003 Marie BS
Reference
Emp ID Salary Dept Quarter
1001 2000 IT Q1
1001 3000 IT Q2
1001 4000 IT Q3
Now if you use the lookup stage the with Emp iD as the key then the output would be as below
EMp ID Salary Dept EmpName Quarter
1001 2000 IT John Q1
But if you have a closer look at the data we can see that the reference table actually has three records for that ID. However your lookup stage actually only retrieved the one record. Now if you need to retrieve all 3 records for that ID then you will have to Go to -> constraints page of the lookup stage -> ‘Multiple rows returned from link’ Select the reference link
This will modify your output as below
EMP ID Salary Dept EmpName Quarter
1001 2000 IT John Q1
1001 3000 IT John Q2
1001 4000 IT John Q3
Only one reference link in the lookup stage can return multiple rows. This can’t be done for more than one reference link and can only be done for in-memory lookups
The Lookup stage is most appropriate when the reference data for all lookup stages in a job is small enough to fit into available physical memory. Each lookup reference requires a contiguous block of shared memory. If the Data Sets are larger than available memory resources, the JOIN or MERGE stage should be used.
Lookup stages do not require data on the input link or reference links to be sorted. Be aware, though, that large in-memory lookup tables will degrade performance because of their paging requirements.
Normal lookup: All the data from the database is read into memory, and then lookup is performed.
Sparse lookup: For each incoming row from the primary link, the SQL is fired on database at run time.
Range Lookup: Range Lookup is going to perform the range checking on selected columns.
For Example: -- If we want to check the range of salary, in order to find the grades of the employee than we can use the range lookup.
Lets look at the example shown below.
Source
Emp ID EmpName Dept
1001 John IT
1002 Russ IT
1003 Marie BS
Reference
Emp ID Salary Dept Quarter
1001 2000 IT Q1
1001 3000 IT Q2
1001 4000 IT Q3
Now if you use the lookup stage the with Emp iD as the key then the output would be as below
EMp ID Salary Dept EmpName Quarter
1001 2000 IT John Q1
But if you have a closer look at the data we can see that the reference table actually has three records for that ID. However your lookup stage actually only retrieved the one record. Now if you need to retrieve all 3 records for that ID then you will have to Go to -> constraints page of the lookup stage -> ‘Multiple rows returned from link’ Select the reference link
This will modify your output as below
EMP ID Salary Dept EmpName Quarter
1001 2000 IT John Q1
1001 3000 IT John Q2
1001 4000 IT John Q3
Only one reference link in the lookup stage can return multiple rows. This can’t be done for more than one reference link and can only be done for in-memory lookups
Comments
Post a Comment