Sometimes, data extracted from systems will contain rows that are actually parent line items with child line items underneath.
For example, you might be working with a travel and expense system that lists a parent item like ‘Travel’ and then child items like ‘Car Mileage’, ‘Train Tickets’ and ‘Subsistence’ all in the same column. By using tMemorizeRows in Talend, you can identify and aggregate all of the child items to the parent level and get a sum of cost based on the dates the expenses were logged.
Business Requirements (Mocked up Travel and Expense data):
- Identify and aggregate all the child items to its parent level on a daily basis.
- The resultant data should be a sum of cost, aggregated daily to its parent level.
Expense Type (Child Items) | Parent |
Car Mileage | Travel |
Train Tickets | |
Subsistence |
Sample Dataset:
Talend Job:
Results
Before Aggregation (tLogRow_2)
The image below is a snap shot of the data before aggregation. The key thing to note here is the column ParentNum which uses the expression (see previous table) and the number it has assigned when “Travel” appears in the Expense_Type column. This is then used in the tAggregateRows component for grouping.
After Aggregation (Final Output)
The final output snap shot above shows the data after aggregation. For the purpose of this demo, I haven’t deleted ParentNum column as all the data has been grouped using this number assigned in this column.
The result aligns with the requirements which is to aggregate the data on a daily basis summing the costs of child items to its respective parent item (Travel).
0 Comments