Aggregator Transformation in Informatica PowerCenter
Posted by Admin-Lisa Posted by Aug 25, 2023 in Informatica Powercenter Commonly Asked Questions Answers
The Aggregator transformation is an active and connected transformation.
The Aggregator transform is used to perform aggregate calculations such mean/average, First Count, Last, Maximum, Median, Minimum, Percentile, Standard Deviation, Sum, Variance.
The Integration Service performs aggregate calculations as it reads data groups and rows of data and stores them in the aggregate cache. The Aggregator transformation differs from the Expression transformation, but it is used with the Aggregator transformation to perform group calculations.
Filtering rows based on conditional clauses is more flexible than the SQL language.
Elements of the Aggregator Transformation
Aggregate cache
Aggregate expression
Group by port
Sorted input.
Aggregate cache: This Integration Service saves data in the aggregate cache until the aggregate calculation is complete. The Integration Service stores group values in the index cache and row data stored in the data cache.
Aggregate expression: Enter the expression to the output port. Expressions can also include non-aggregate expressions and conditional clauses for further processing.
Group by port: Specifies how groups are created. You can set the group's input, in-out, output, or variable port. When you group data, the Aggregator transformation outputs the last row of each group unless specified otherwise.
Sorted input: This option has been chosen to improve session performance of mapping. To use sorted input, user must pass data sorted in ascending or descending order by group and port to the Aggregator transformation.
Aggregate Caches:
When the user runs a session with an Aggregator transformation, the Integration Service builds indexes and caches data in memory to process these transformations. The Integration Service stores overflow values in cache files when it needs more disk space.
Users can configure the index and data cache in the aggregator transformation or session properties. Alternatively, users can configure the Integration Service to determine cache size at run time of job.
a. Index Cache:
The index cache stores group information from groups on a per-port basis. If users are using 'group by' on ports like 'EMP_ID', this index cache will store grouped port values like 1, 2, 3, etc.
b. Data Cache:
In general, the size of DATA-CACHE is larger than the size of INDEX-CACHE.
Information is stored
Variable port (if created)
I/O Port (Non-Group)
Ports using aggregate functions.
Do not group by input ports used in non-aggregate output expressions.
Aggregate Expressions.
Aggregate expressions are associated with conditional clauses and non-aggregate functions. Expressions can also include aggregate functions within other aggregate functions in mapping, such as MAX (COUNT (ITEM)).
The result of the aggregated expression depends on how the ports are grouped within the transformation. If you do not define grouping by port, the Integration Service returns only one row as output. The total quantity of products sold is determined as follows:
SUM(SALARY) to calculate the salary of all the employees.
Using the same expression concept and grouping by SALARY port, the Integration Service returns the total quantity of total salary paid to employee. You can create an aggregate expression on each output port and use multiple aggregate ports in your transformation.
You can use one aggregate function within other aggregate functions. If you use any of these functions, you must use them in an expression within the Aggregator transformation.
Aggregate function of Aggregator Transformation.
Mean (Average)
Min
Max
STDDEV
Median
Percentile
Variance
Count First
Last
Sums
Normal Aggregate Functions:
MAX (marks), MIN (marks), AVG (marks), SUM (marks), Count(students)
Nested Aggregate Functions:
Various output ports of the Aggregator transformation can contain multiple single-level functions or multiple nested functions. You must not include both single-level functions and nested functions in the Aggregator transformation.
If one output port of the Aggregator transformation contains a single-level function, another port within that transformation cannot be used as nested functions. If you include single-level functions and nested functions in the same Aggregator transformation, the mapping or mapplet is invalid.
Conditional Clauses:
Use of conditional clauses in aggregate transformation to reduce the no. of rows used in the aggregation. A conditional clause can be any clause that results in TRUE or FALSE.
Null Values in Aggregate Functions:
You must configure your integration service to handle null values in aggregate functions. Null valuesin aggregate functions can be treated as null or zero. By default, the Integration Service treats NULL values in aggregate functions as NULL.
This transformation (Aggregator) lets you define groups for aggregation instead of performing aggregation on all input data of the mappings. The Group By port tells the Integration Service how to create groups. Example: If you want to find out the total sales for your company, you can group the total sales by region.
To define groups of aggregate expressions, go to the Ports tab, select the appropriate input, input/output, output, and variable ports, and select them as groupings for the Aggregate transformation.
The Integration Service creates a row for each group. If you do not group the values, the Integration Service returns one row for each input row. The Integration Service normally returns the last row of each group.
Using Sorted Input
Use sorted input to improve session performance. When using sorted input, the input data must be sorted in ascending or descending order by group and port before passing it to the Aggregator transformation.
Using this sorted input option tells the aggregator transform that the incoming data is already sorted by group and port.
To use this option, go to the Ports tab and select Sorted Input.
If you enable the Sorted Input option and the input data is not sorted before submission. After this the session failed. You are suggested not to use sorted input if any of the following conditions apply:
Aggregate expressions use nested aggregate functions.
If The session uses incremental aggregation.
If you use sorted input and the data is not sorted correctly, the session will fail.
Sorted input condition cannot be used if any of the below mentioned case is true.
1. If nested function is used in an aggregate expression
2. if the session uses incremental aggregation.