The procedure is based on logistic regression. This method has great potential, but relying on it alone
limits the procedure: to lift this restriction you can extend the code yourself
or wait for a future release.

The program is organized in several steps, the core of which is the regression.

The process is described below step by step, with reference to the code as it appears on the
corresponding page (line numbers are quoted from the PDF).

*Utility macros*

- Step 0 - Definition of utility macros (lines 25 - 200):
- definition of the *simpson_c* macro, used in the code to calculate the Simpson pseudo-correlation index (lines 25 - 130)
- definition of the *mod_b_meno_a* macro, used to statistically compare two different models (lines 130 - 200)
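
The source does not say which test *mod_b_meno_a* applies. One common way to compare two nested logistic models is a likelihood-ratio test, and the Python sketch below assumes that approach. The names `chi2_sf` and `likelihood_ratio_test` are hypothetical and are not part of the program.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))
    # ... then apply Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_small, loglik_big, extra_params):
    """p-value for 'the bigger model fits no better than the smaller one'.

    Assumption: the smaller model is nested in the bigger one, which has
    `extra_params` additional parameters.
    """
    statistic = max(0.0, 2.0 * (loglik_big - loglik_small))
    return chi2_sf(statistic, extra_params)
```

A small p-value would suggest that the two models are statistically different, which is the kind of decision the later regression steps rely on.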

- Step 1 - Loading and checking of parameters and input data (lines 200 - 560):
- the program verifies the existence and validity of the information supplied by the user.

- Step 2 - Management and reorganization of input information (lines 560 - 1370):
- data is copied in order to preserve the integrity of the starting dataset (lines 560 - 603)
- nominal variables 'K' are handled (lines 603 - 850)
- ordinal variables 'O' are handled (lines 850 - 1030)
- 'X' variables (that will be categorized) are handled (lines 1030 - 1275)
- single-mode variables (those with only one distinct value, hence statistically irrelevant) are removed (lines 1275 - 1370)
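
The first and last points of Step 2 can be illustrated with a minimal Python sketch. It assumes the dataset is a list of dictionaries with the same keys, and it reads "single mode" as "only one distinct value". The function name `drop_single_mode` is hypothetical.

```python
import copy


def drop_single_mode(rows):
    """Return (working_copy, dropped_columns) without touching `rows`."""
    work = copy.deepcopy(rows)  # work on a copy to preserve the starting dataset
    if not work:
        return work, []
    columns = list(work[0].keys())
    # A column with a single distinct value carries no information for the model.
    dropped = [c for c in columns if len({r[c] for r in work}) <= 1]
    for r in work:
        for c in dropped:
            del r[c]
    return work, dropped
```

For example, a column that holds the same country code on every row would be dropped from the working copy while the original rows stay intact.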

- Step 3 - Correlation analysis (lines 1370 - 1700):
- the correlation between each pair of variables is analyzed according to the scheme below (the matrix summarizes which correlation index is used for each pair of variable types).

| | **O (Ordinal var.)** | **X (Numeric var. to compact)** | **Q (Quantitative var.)** | **C (Qualitative var.)** |
|---|---|---|---|---|
| **O (Ordinal var.)** | Spearman (S) | Spearman (S) | Spearman (S) | Simpson (C) |
| **X (Numeric var. to compact)** | Spearman (S) | Pearson (P) | Pearson (P) | Simpson (C) |
| **Q (Quantitative var.)** | Spearman (S) | Pearson (P) | Pearson (P) | Simpson (C) |
| **C (Qualitative var.)** | Simpson (C) | Simpson (C) | Simpson (C) | Simpson (C) |
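
The matrix above reduces to a simple rule, sketched here in Python: any pair involving a qualitative variable uses Simpson, otherwise any pair involving an ordinal variable uses Spearman, and the remaining pairs use Pearson. The function name `correlation_index` is hypothetical; only the type codes and index names come from the matrix.

```python
VARIABLE_TYPES = {"O", "X", "Q", "C"}


def correlation_index(type_a, type_b):
    """Return the index the matrix assigns to a pair of variable types."""
    for t in (type_a, type_b):
        if t not in VARIABLE_TYPES:
            raise ValueError(f"unknown variable type: {t!r}")
    if "C" in (type_a, type_b):
        return "Simpson"   # any pair involving a qualitative variable
    if "O" in (type_a, type_b):
        return "Spearman"  # ordinal paired with O, X or Q
    return "Pearson"       # X or Q paired with X or Q
```

The rule is symmetric, so `correlation_index("X", "O")` and `correlation_index("O", "X")` select the same index.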

- Step 4 - Regression (lines 1700 - 3445):
- the variables and tables used in the estimation process are initialized (lines 1700 - 1810)
- every candidate variable (one that is not already in the model and is not correlated with a variable already in the model) is tested for inclusion. The variable that yields the best model is added to the model, provided it is statistically significant (stepwise - lines 1810 - 2980)
- the variables in the model are then removed one at a time, and the best-performing reduced model is kept if it is not statistically different from the previous model (backward - lines 2980 - 3393)
- end of the regression cycle (lines 3393 - 3445)
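
The control flow of Step 4 can be sketched in Python as follows. This is an illustration under assumptions, not the program's code: model fitting and the statistical tests are passed in as callables (`score`, `is_different`, `correlated`, all hypothetical names), the backward pass is assumed to run inside each cycle right after the stepwise pass, and `max_rounds` is an added safety bound.

```python
def select_variables(candidates, score, is_different, correlated, max_rounds=100):
    """Forward inclusion followed by backward elimination, repeated in cycles.

    score(variables)         -> float, higher means a better model
    is_different(big, small) -> True if the two models differ significantly
    correlated(a, b)         -> True if variables a and b are correlated
    """
    model = []
    for _ in range(max_rounds):
        # Stepwise pass: only variables not in the model and not correlated
        # with any variable already in the model are eligible.
        eligible = [
            v for v in candidates
            if v not in model and not any(correlated(v, m) for m in model)
        ]
        if not eligible:
            break
        best = max(eligible, key=lambda v: score(model + [v]))
        if not is_different(model + [best], model):
            break  # the best candidate is not significant: stop the cycle
        model.append(best)

        # Backward pass: drop one variable at a time while the best reduced
        # model is not statistically different from the current one.
        while len(model) > 1:
            reduced = max(
                ([m for m in model if m != v] for v in model), key=score
            )
            if is_different(model, reduced):
                break
            model = reduced
    return model
```

Passing the statistics in as functions keeps the sketch independent of any particular logistic regression library.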

- Step 5 - Cleaning of the system and writing on output files (lines 3445 - 3960):
- temporary tables are cleaned up and the output files needed to use the model are created


* Creation date: 17 Sep 2010 *

* Translation date: 30 Dec 2012 *

* Last change: 17 May 2013 *

* Translation reviewed by Giulia Di Lallo *