Table aggregation functions

A table aggregation function typically summarizes a table by reducing the number of rows. Consider for example the cost table below with the single key figure costs and with attributes category, product and year:

costs(category, product, year)
COSTS per category, product and year

Assume you want to summarize this table to obtain costs per category:
costs(category)

You can achieve such an aggregation in two steps with the available table functions in finaquant libraries:

Step 1: Column partition table to obtain subtable with fields costs and category:

// get subtable costs per category
tblr = MatrixTable.PartitionColumn(tblr,
	TextVector.CreateVectorWithElements("category", "costs")); 
MatrixTable.View_MatrixTable(tblr, 
"COSTS per category, before aggregation with SUM");

Costs per category, before aggregation

Step 2: Aggregate subtable using the default aggregation function SUM:

// aggregate subtable costs per category
tblr = MatrixTable.AggregateAllKeyFigures(tblr, null);
MatrixTable.View_MatrixTable(tblr, 
	"COSTS per category, after aggregation with SUM")

Costs per category, after aggregation

In some cases however, you may want to see detail costs (i.e. costs per category, product and year) in the same table together with costs per category. In that case, you need to produce a second key figure like costs_per_category without reducing the number of rows in the table.

Our starting point is again the initial cost table: costs(category, product, year)

// aggregate input key figure costs w.r.t. reference attribute "category"
// ... to produce a new output key figure "costs_per_category"
tblr = MatrixTable.AggregateSelectedKeyFigure_B(CostTable,
	RefAttributes: TextVector.CreateVectorWithElements("category"),
	InputKeyFigName: "costs", OutputKeyFigName: "costs_per_category",
	AggrOpt: AggregateOption.nSum, 
	IfSuccess: out IfSuccess, Warnings: out Warnings);

Table with new key figure costs_per_category

Notice that the table aggregation operation above didn’t reduce the number of rows in table. It added a new key figure costs_per_category which is the aggregation of the key figure costs with respect to attribute category. In this case, there is a single reference attribute, namely category. You could also select multiple reference attributes like category and year, to produce an aggregated key figure like costs_per_category_and_year.

In all of the aggregation examples above, we used the default aggregation function SUM. Other standard aggregation functions you can select form are MIN, MAX and AVERAGE.

In some advanced scenarios, you might need some aggregation functions other than the available standard ones sum, min, max or average. In such cases, you need to your own user-defined aggregation functions, and pass them to a table aggregation function as function parameter, as some examples show below.

  1. The aggregation patterns explained above can be summarized as follows:
    Aggregate all key figures of table w.r.t. all attributes without producing new key figures; number of rows are typically reduced.
  2. Aggregate a selected key figure of a table to produce a new aggregated key figure; aggregation can be done either w.r.t. all attributes, or w.r.t. selected attributes of table.
  3. The aggregations (per each key figure) can be done either with standard functions SUM, MIN, MAX or SUM, or with user-defined functions.


Aggregate all key figures w.r.t. all attributes with standard aggregation operations

In the example below, a table with two key figures, costs and weight, is aggregated. The standard aggregation function for costs is SUM, for weight is MAX. Note that the default aggregation function is always SUM if it is not explicitly indicated for any key figure.
Cost & Weight table before aggregation

// define aggregation operation per key figure
var dickey = new Dictionary<string, AggregateOption>();
dickey["costs"] = AggregateOption.nSum;
dickey["weight"] = AggregateOption.nMax;
// aggregate all key figures
tblr = MatrixTable.AggregateAllKeyFigures(tblr, dickey);
MatrixTable.View_MatrixTable(tblr, 
	"Cost/Weight table after aggregation");

Cost & Weight table after aggregation

Aggregate selected key figure w.r.t. all attributes with standard aggregation operation

In the example below, a new aggregated key figure costs_per_product is added to table.
Table before aggregation of selected key figure costs

// aggregate "costs" w.r.t. all attributes: category & product
tblr = MatrixTable.AggregateSelectedKeyFigure_A(tbl7,
	InputKeyFigName: "costs", OutputKeyFigName: "costs_per_product",
	AggrOpt: AggregateOption.nSum, IfSuccess: out IfSuccess, 
	Warnings: out Warnings);
// view table
MatrixTable.View_MatrixTable(tblr, 
"Table after aggregation of selected key figure costs");

Table after aggregation of selected key figure costs

Aggregate selected key figure w.r.t. selected attributes with standard aggregation operation

In this example, the selected key figure is aggregated not w.r.t. all attributes, but w.r.t. selected attributes category and product, leaving year out. The standard aggregation operation is SUM.
Table before aggregation of costs, RefAttributes: category & product

// aggregate "costs" w.r.t. selected attributes: category & product
// i.e. ignore attribute year for aggregation
tblr = MatrixTable.AggregateSelectedKeyFigure_B(tbl5,
	TextVector.CreateVectorWithElements("category", "product"),
	InputKeyFigName: "costs", OutputKeyFigName: "costs_per_product",
	AggrOpt: AggregateOption.nSum, IfSuccess: out IfSuccess, 
	Warnings: out Warnings);
// view table
MatrixTable.View_MatrixTable(tblr, 
"Table after aggregation of costs, RefAttributes: category & product");

Table after aggregation of costs, RefAttributes: category & product

Aggregate all key figures w.r.t. all attributes with user-defined aggregation operation

In this scenario, a non-standard aggregation operation is required; namely population variance. Therefore, the user needs to write her own aggregation function which takes a single vector parameter as input, and produces a scalar value as output, as shown below:

User defined aggregation function population variance

public static double PopulationVariance(KeyVector V)
{
	// check if vector is null or empty
	if (V == null || V.IsEmpty) 
		throw new Exception("Null or empty input vector!");
	// calculate mean (average value)
	Double Mean = V.toArray.Average();
	// calculate population variance
	Double VarSum = 0.0;
	for (int i = 0; i < V.nLength; i++)
	{
		VarSum = VarSum + Math.Pow(V[i] - Mean, 2.0);
	}
	return VarSum / V.nLength;
}

Writing such user-defined functions requires basic programming skills with some experience in the programming language C#. In my opinion, the best way to obtain the necessary skills is buying a beginner book for C#, like Introduction to C# Joes 2 Pros.

The example below shows how the population variance is used as a user-defined aggregation function. Note that the attribute year is excluded from the input table before aggregation. That is, the aggregation is done over the years.
Aggregate all key figures, before aggregation

// available standard delegate functions:
// - Utility.Aggregate_Sum
// - Utility.Aggregate_Avg
// - Utility.Aggregate_Max
// - Utility.Aggregate_Min
// default aggregation function is Utility.Aggregate_Sum
 
// set user defined function for key figure costs
var AggrFunc = new Dictionary<string, AggregateVector>();
AggrFunc["costs"] = UserFunctions.PopulationVariance;
// aggregate all key figures
tblr = MatrixTable.AggregateAllKeyFigures_UserFunc(tbl7, AggrFunc);
MatrixTable.View_MatrixTable(tblr, 
	"(1) Aggregate costs: Cost variance per product");

1) Aggregate costs: Cost variance per product

Aggregate selected key figure w.r.t. all attributes with user-defined aggregation operation

In this example, a new aggregated key figure cost_variation is generated with the same user-defined aggregation function population variation we have formulated above.

// aggregate "costs" w.r.t. all attributes
tblr = MatrixTable.AggregateSelectedKeyFigure_A_UserFunc(tbl7,
	InputKeyFigName: "costs", OutputKeyFigName: "cost_variation",
	AggrFunc: UserFunctions.PopulationVariance, 
	IfSuccess: out IfSuccess, Warnings: out Warnings);
// view table
MatrixTable.View_MatrixTable(tblr, 
	"(2) Aggregate costs: Cost variance per product");

2) Aggregate costs: Cost variance per product

Aggregate selected key figure w.r.t. selected attributes with user-defined aggregation operation

In this example, the selected key figure costs is not aggregated w.r.t. all attributes, but w.r.t. only selected attributes category and product. That is, the numeric attribute year is not considered for the aggregation. Again, population variance is selected as the user-defined aggregation function.

3) Aggregate costs: Cost variance per product - before aggregation

// aggregate "costs" w.r.t. selected attributes: category & product
// i.e. ignore attribute year for aggregation
MatrixTable.View_MatrixTable(tbl5, 
	"(3) Aggregate costs: Cost variance per product - before aggregation");
// aggregate selected key figure
tblr = MatrixTable.AggregateSelectedKeyFigure_B_UserFunc(tbl5,
	TextVector.CreateVectorWithElements("category", "product"),
	InputKeyFigName: "costs", OutputKeyFigName: "cost_variation",
	AggrFunc: UserFunctions.PopulationVariance,
	IfSuccess: out IfSuccess, Warnings: out Warnings);
// view table
MatrixTable.View_MatrixTable(tblr, 
	"(3) Aggregate costs: Cost variance per product - after aggregation");

3) Aggregate costs: Cost variance per product - after aggregation

Copyright secured by Digiprove © 2012 Tunc Ali Kütükcüoglu

2 Responses to Table aggregation functions

  1. swapnil says:

    I couldn’t understand how cost variation is calculated in example of “Aggregate all key figures w.r.t. all attributes with user-defined aggregation operation”.

    • tuncalik tuncalik says:

      Hello swapnil, you are perfectly right in not understanding the function, because a critical information was missing; namely the input table “before aggregation”. We added this input table to the page. Note that without the field “year” we obtain recurring attribute combinations like Computer-HP and Computer-Asus. The aggregation is done for each attribute pair. For example, we get zero variation for the attribute pair Computer-HP because its cost is the same (100) for both pairs.

Leave a Reply