AutoReg

Choosing the number of classes for a variable to group

When we use X or O variables, we need to know the computational load they place on the machine.

The macro variable passo is the parameter with the strongest influence: the value passed to the process expresses the size (as a percentage of the population) of each generated class.
The default value of passo is 10, so the program tries to create classes that each contain 10% of the population: unless the values are concentrated on a few points, the new variable will probably have 9-11 distinct classes.
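
As a rough illustration of how passo shapes the classes, the sketch below bins a list of values into classes of about passo percent of the rows each. It is not AutoReg's own code: the function name percent_classes and the rule of never splitting equal values across two classes are assumptions made only to show why a concentration of values can yield fewer classes than expected.

```python
def percent_classes(values, passo=10):
    """Assign each value a class index so that each class holds about passo% of the rows.

    Hypothetical helper: equal values are never split across two classes (assumption).
    """
    n = len(values)
    size = max(1, round(n * passo / 100))        # target number of rows per class
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    current, count, prev = 0, 0, None
    for i in order:
        v = values[i]
        # Open a new class only when the current one is full and the value changes.
        if count >= size and v != prev:
            current += 1
            count = 0
        classes[i] = current
        count += 1
        prev = v
    return classes


# Evenly spread values: the default passo gives 10 classes.
print(len(set(percent_classes(list(range(100))))))

# 60% of the rows share one value: fewer classes are produced.
print(len(set(percent_classes([0] * 60 + list(range(1, 41))))))
```

In the second call the 60 identical rows end up in a single oversized class, which is the kind of concentration the text refers to.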

The algorithm first uses these classes individually, then merges them step by step. This lets us estimate the potential number of regressions as a function of the number of classes: we count the groupings available at each step of the stepwise method until all classes are merged into a single one, excluding any backward steps.

If, for example, we have a single class (not a good thing in statistics), the procedure calculates only one regression because there are no classes to merge or split.
In the case of two classes (a and b), we have two regressions (the first, a-b, with two separate classes, and the second, a-a, after merging the two classes into one).
If we decide to divide the variable into three classes, the number of potential regressions rises to four (a-b-c / a-a-c / a-b-b / a-a-a); the number increases to seven with four classes (a-b-c-d / a-a-c-d / a-b-b-d / a-b-c-c / a-a-a-d / a-a-c-c / a-a-a-a, with the input parameter passo set to 25-30).
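
The groupings listed above can be reproduced with a small sketch. It assumes that only adjacent classes are merged and that, at each step, one candidate grouping is kept as the starting point for the next step. In the real procedure the choice would depend on the regression results; here the hypothetical choose argument simply picks the first candidate. The function name stepwise_groupings is also an assumption.

```python
def stepwise_groupings(n, choose=lambda candidates: 0):
    """List every grouping tried while merging n classes step by step (n <= 26).

    choose is a hypothetical stand-in for the regression-based selection:
    it returns the index of the candidate kept for the next step.
    """
    labels = "abcdefghijklmnopqrstuvwxyz"
    groups = [[labels[i]] for i in range(n)]     # one group per original class

    def render(gs):
        # Every class is shown with the label of the first class in its group.
        return "-".join(g[0] for g in gs for _ in g)

    tried = [render(groups)]                     # regression with all classes separate
    while len(groups) > 1:
        # One candidate for each pair of adjacent groups.
        candidates = [
            groups[:i] + [groups[i] + groups[i + 1]] + groups[i + 2:]
            for i in range(len(groups) - 1)
        ]
        tried.extend(render(c) for c in candidates)
        groups = candidates[choose(candidates)]
    return tried


print(" / ".join(stepwise_groupings(3)))
print(" / ".join(stepwise_groupings(4)))
print(len(stepwise_groupings(10)))
```

Whichever candidate is kept at each step, the count stays the same: one initial regression plus n-1, n-2, ..., 1 candidates over the following steps.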

As the number of classes increases, the number of potential regressions increases as well; if we think of it as a function of the number of classes, we can see that it follows this rule: f(n) = f(n-1) + (n-1), with f(1) = 1. In other words, the number of potential regressions for a variable with n classes is the number of potential regressions for a variable with n-1 classes, plus (n-1).

The relative growth rate decreases as the number of classes increases, even though each additional class adds more regressions than the previous one. With the default value of the passo parameter (about 10 classes), we have 46 regressions for each X or O variable at each step of the procedure.
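
The recurrence and its growth can be checked with a short sketch. The function names are hypothetical, and the closed form 1 + n(n-1)/2 is not stated in the text: it follows from summing the increments 1 + 2 + ... + (n-1) on top of f(1) = 1.

```python
def potential_regressions(n):
    """f(n) = f(n-1) + (n-1) with f(1) = 1, computed iteratively."""
    total = 1
    for k in range(2, n + 1):
        total += k - 1
    return total


def closed_form(n):
    # Summing the recurrence gives 1 + (1 + 2 + ... + (n-1)) = 1 + n(n-1)/2.
    return 1 + n * (n - 1) // 2


# Classes, regressions, absolute increment, ratio to the previous count.
for n in range(2, 11):
    prev, cur = potential_regressions(n - 1), potential_regressions(n)
    print(n, cur, cur - prev, round(cur / prev, 2))
```

The third column grows by one at each row, while the ratio in the last column shrinks towards 1 as classes are added.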

For a clearer picture of the numbers above, here is a summary table.

Num. classes   passo value   Potential regressions
      1            100                  1
      2             50                  2
      3             34                  4
      4             25                  7
      5             20                 11
      6             17                 16
      7             15                 22
      8             13                 29
      9             11                 37
     10             10                 46
     11              9                 56
     12              8                 67
     13              8                 79
     14              7                 92
     15              7                106
     16              6                121
     17              6                137
     18              6                154
     19              5                172
     20              5                191

Creation date: 17 Sep 2010
Translation date: 30 Dec 2012
Last change: 14 Apr 2013

Translation reviewed by Giulia Di Lallo