AutoReg

Output data

The procedure produces a variety of output tables. This page explains how they are named and how to use them.
Almost all of them share a macro prefix, &in, which is the value passed by the user in the input parameter in.

They can be grouped into four areas:

Input tables

&in table
&in is the table of input data. This table is not modified by the process.

&in._var table
The &in._var table is generated directly from the input table with the same name. It can differ from it only in the column "utilizzo", because variables with only one class are excluded, or in the presence of new rows (corresponding to the columns generated for "K" variables).
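The exclusion rule above can be sketched in Python as follows. This is only an illustration, not AutoReg's SAS code: the records are assumed to be plain dictionaries, the function name `single_class_variables` is hypothetical, and a missing value is treated here as a class of its own, which may not match what the procedure does.

```python
def single_class_variables(records, variables):
    # Return the variables that take a single distinct value across all records.
    # Assumption: a missing key (None) counts as a class of its own.
    return [v for v in variables if len({r.get(v) for r in records}) < 2]


# Illustrative data: "b" has only one class, so it would be a candidate for exclusion.
records = [{"a": 1, "b": "x"}, {"a": 2, "b": "x"}]
print(single_class_variables(records, ["a", "b"]))
```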

&in._cond table
Like the previous one, the &in._cond table derives from the input table with the same name: it differs only in the presence of new rows when there are "K" type variables.

&in._esccon table
The &in._esccon table is a copy of the input table with the same name; here too, new rows can appear when there are "K" variables.

&passi table
The &passi table (optional) is identical to the table given as input.



Tables directly originated from input and regression

&in._pt table
The &in._pt table is a copy of the input data to which the program adds some columns used during classification.
In particular, the process generates new columns for K, O and X variables with the following criteria:

&in._dcorr table
The &in._dcorr table is a copy of the previous dataset, with some additional columns derived from the final regression model.
These columns contain the value estimated by the model (column predetti) with its 95% confidence interval (columns inf and sup), the standardized Pearson residual (column residui) and the estimated linear predictor (column xbet), that is, the value on the scale of the link function before it is transformed back to the scale of the response.
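As an illustration of how these columns relate in a generalized linear model, the sketch below assumes a logit link and a 0/1 response (this page does not state which link AutoReg uses). It also assumes that the confidence limits are built on the scale of xbet and then transformed, and it takes as hypothetical inputs the standard error of xbet and the leverage of the observation, neither of which is among the columns listed above.

```python
import math

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal (95% two-sided)


def inv_logit(eta):
    # Inverse of the (assumed) logit link: maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_with_ci(xbet, se_xbet):
    # se_xbet is a hypothetical input: the standard error of the linear predictor.
    # The limits are computed on the xbet scale, then transformed back.
    predetti = inv_logit(xbet)
    inf = inv_logit(xbet - Z_975 * se_xbet)
    sup = inv_logit(xbet + Z_975 * se_xbet)
    return predetti, inf, sup


def std_pearson_residual(y, mu, leverage, phi=1.0):
    # Pearson residual for a 0/1 response: (y - mu) / sqrt(V(mu)), with V(mu) = mu * (1 - mu),
    # standardized by sqrt(phi * (1 - h)), where h is the (hypothetical) leverage.
    pearson = (y - mu) / math.sqrt(mu * (1.0 - mu))
    return pearson / math.sqrt(phi * (1.0 - leverage))
```

Because the limits are transformed through a non-linear function, inf and sup are in general not symmetric around predetti, even though they are symmetric around xbet.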

&in._mcorr table
The &in._mcorr table contains the estimates of the model parameters.
The columns are:
    - Parameter, indicating the model variable,
    - Level1, which contains the class of the variable (for qualitative variables),
    - DF, which indicates the degrees of freedom for the parameter,
    - Estimate, which contains the parameter estimate,
    - StdErr, which contains the standard error of the estimate,
    - LowerWaldCL, indicating the lower limit of the Wald confidence interval for the estimate,
    - UpperWaldCL, indicating the upper limit of the Wald confidence interval for the estimate,
    - ChiSq, which contains the value of the Chi-square statistic used to assess the significance of the parameter,
    - ProbChiSq, which contains the p-value of that statistic, that is, the upper-tail probability of the Chi-square distribution (one minus its cumulative distribution function evaluated at ChiSq).
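For a parameter with one degree of freedom, the Wald columns follow from Estimate and StdErr alone. The sketch below, with a hypothetical function `wald_row`, shows the standard formulas; it is meant for rows with DF = 1, not for rows with DF = 0 or for a scale parameter.

```python
import math

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal


def wald_row(estimate, stderr):
    # 95% Wald confidence limits: estimate +/- z * standard error.
    lower = estimate - Z_975 * stderr
    upper = estimate + Z_975 * stderr
    # Wald chi-square with 1 degree of freedom: (estimate / stderr) squared.
    chisq = (estimate / stderr) ** 2
    # Upper-tail probability of a chi-square with 1 DF: P(X > c) = erfc(sqrt(c / 2)).
    prob = math.erfc(math.sqrt(chisq / 2.0))
    return {"LowerWaldCL": lower, "UpperWaldCL": upper, "ChiSq": chisq, "ProbChiSq": prob}
```

For example, an estimate exactly 1.96 standard errors away from zero gives ChiSq close to 3.84 and ProbChiSq close to 0.05.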

&in._smcorr table
The &in._smcorr table contains some statistical indicators used to measure the goodness of fit of the model (Log-Likelihood, AIC, ...).
The columns that compose this table are:
    - Criterion, which contains the name of the indicator,
    - DF, which contains the degrees of freedom for the indicator,
    - Value, which contains the value of the statistic,
    - ValueDF, which contains the value of the indicator divided by its degrees of freedom.
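The ValueDF column is simply Value divided by DF, left undefined when DF is missing. A minimal sketch, with a hypothetical function name and made-up numbers that are not AutoReg output:

```python
def add_value_df(rows):
    # rows: dicts with Criterion, DF and Value, a hypothetical in-memory copy of &in._smcorr.
    result = []
    for row in rows:
        df = row.get("DF")
        # ValueDF = Value / DF; left as None when DF is missing or zero.
        value_df = row["Value"] / df if df else None
        result.append({**row, "ValueDF": value_df})
    return result


# Illustrative values only.
fit = add_value_df([
    {"Criterion": "Deviance", "DF": 100, "Value": 120.0},
    {"Criterion": "Log Likelihood", "DF": None, "Value": -60.0},
])
```

For binomial or Poisson models, a Deviance/DF or Pearson Chi-Square/DF value well above 1 is commonly read as a sign of overdispersion.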



Utility tables

&in._corr4 table
The &in._corr4 table reports the correlation and pseudo-correlation values calculated between pairs of variables, flagging the values that exceed the threshold set by the user to decide whether two variables are correlated. Note that a pair of variables analyzed using the value derived from the Simpson concentration index appears in this table only if the two variables are correlated.
The columns of this table are:
    - v1, which contains the first variable of the pair,
    - v2, which contains the second variable,
    - corr, which is the calculated correlation value,
    - tipo_corr, which indicates the type of correlation used (for a decoding of this field, see the types of variables),
    - ut_v1, which indicates the type of the first variable,
    - ut_v2, which indicates the type of the second variable,
    - corr2, a dummy variable set to 1 if the two variables are correlated (that is, if the calculated correlation is greater than the threshold set by the user) and to 0 otherwise.
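The rule behind corr2 can be sketched as below. The tuple layout and function names are hypothetical, and the comparison follows the text literally (corr greater than the threshold, without taking absolute values). `simpson_index` computes only the plain Simpson concentration index, the sum of the squared class shares; how AutoReg derives its pseudo-correlation from it is not described on this page.

```python
from collections import Counter


def flag_correlated(pairs, threshold):
    # pairs: (v1, v2, corr) tuples; returns them with corr2 appended.
    # corr2 = 1 when corr is greater than the user threshold, 0 otherwise.
    return [(v1, v2, corr, 1 if corr > threshold else 0) for v1, v2, corr in pairs]


def simpson_index(values):
    # Simpson concentration index: sum of squared class shares (values must be non-empty).
    counts = Counter(values)
    n = len(values)
    return sum((c / n) ** 2 for c in counts.values())
```

The index equals 1 when all observations fall in one class and decreases as they spread evenly over more classes.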

&in._kcl table
The &in._kcl table contains the numeric values used when grouping K variables.
These variables are originally character fields; the procedure converts them into numbers (classified by concentration) so that the new variable can be processed as an X or O type.
The columns are:
    - var_orig, indicating the variable,
    - cl_orig, which contains the original (character) value of the variable,
    - cl_nuova, indicating the numeric value associated with the original character value.
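A possible reading of the conversion, sketched in Python: the page says only that classes are "classified by concentration", so ranking them by frequency (most frequent first, ties broken alphabetically) is an assumption, as is the function name `kcl_rows`.

```python
from collections import Counter


def kcl_rows(var_name, values):
    # Hypothetical ordering rule: most frequent class gets 1, then 2, and so on;
    # ties are broken alphabetically so the result is deterministic.
    counts = Counter(values)
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return [
        {"var_orig": var_name, "cl_orig": c, "cl_nuova": i}
        for i, c in enumerate(ordered, start=1)
    ]


# Illustrative data for a hypothetical character variable "colore".
rows = kcl_rows("colore", ["rosso", "blu", "rosso", "verde", "blu", "rosso"])
```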

&in._kvar table
The &in._kvar table contains the details of the categorized variables that were introduced in the model.
The columns of this table are:
    - var, containing the name of the column,
    - giro, indicating the step in which the program performed the compression,
    - kvar, containing the original category in natural language (non-SAS),
    - kvar_b, which contains the starting value of the class in numeric format,
    - kvar_c, which contains the post-grouping class in numeric format,
    - kvar_d, containing the post-grouping class in natural language (non-SAS).
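Assuming that kvar_b and kvar_c give, for each variable, the class before and after grouping, the table can serve as a lookup to regroup values. The function names are hypothetical, and the sketch ignores giro, so it does not chain groupings performed over several steps.

```python
def build_grouping(rows):
    # rows: dicts with var, kvar_b and kvar_c (hypothetical in-memory copy of &in._kvar).
    return {(row["var"], row["kvar_b"]): row["kvar_c"] for row in rows}


def regroup(mapping, var, value):
    # Post-grouping class of a value; values without an entry are returned unchanged (assumption).
    return mapping.get((var, value), value)


# Illustrative rows: classes 1 and 2 of "colore" are merged into class 1.
mapping = build_grouping([
    {"var": "colore", "kvar_b": 1, "kvar_c": 1},
    {"var": "colore", "kvar_b": 2, "kvar_c": 1},
    {"var": "colore", "kvar_b": 3, "kvar_c": 2},
])
```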

&in._mod table
&in._mod table contains the variables that were introduced in the model.
The columns of this table are:
    - nome, containing the name of the variable,
    - utilizzo, indicating how the column was used.

&in._po table
The &in._po table lists the variables that could be used in the model (presumably not introduced because of their low significance).
The columns of this table are:
    - utilizzo, which describes the type of the variable,
    - nome, which contains the variable name,
    - po, a dummy set to 1 if the column is a potential new variable of the model and to 0 if it cannot be introduced into the model.
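To list the candidate variables that are not yet in the model, &in._po can be compared with the names in &in._mod. A minimal sketch with rows held as dictionaries; the function name and the data are hypothetical:

```python
def candidate_variables(po_rows, model_names):
    # Names flagged po = 1 that do not already appear among the model variables.
    in_model = set(model_names)
    return [r["nome"] for r in po_rows if r["po"] == 1 and r["nome"] not in in_model]


po_rows = [
    {"nome": "eta", "po": 1},
    {"nome": "sesso", "po": 0},
    {"nome": "reddito", "po": 1},
]
print(candidate_variables(po_rows, ["reddito"]))
```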



Summary tables for post-modelization use

&in._passi table
The &in._passi table contains the steps (stepwise/backward) performed by the engine to estimate the final model.
The columns of the table are:
    - passo, which identifies the step,
    - modello, which contains the alphabetical list of variables used in each step.
N.B.: the variables in the modello column may differ from the variables given as input to the process, as explained above.
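Comparing consecutive rows of &in._passi shows which variables entered or left the model at each step. The sketch assumes that modello holds a space-separated list of names; the actual separator is not documented here, and the function name is hypothetical.

```python
def step_changes(steps):
    # steps: (passo, modello) pairs; modello assumed to be space-separated names.
    changes = []
    previous = set()
    for passo, modello in sorted(steps):
        current = set(modello.split())
        entered = sorted(current - previous)
        removed = sorted(previous - current)
        changes.append((passo, entered, removed))
        previous = current
    return changes


# Illustrative steps: "eta" enters, then "sesso" enters, then "eta" is removed.
changes = step_changes([(1, "eta"), (2, "eta sesso"), (3, "sesso")])
```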

&in._zgri table
The &in._zgri table summarizes the model, making it easier to apply to new data.
You can find an example on the page with sample code.
The columns of this table are:
    - nome, which contains the variable name,
    - level1, which contains the category of the variable (valued only for qualitative variables),
    - kvar_d, which contains a non-SAS description of the category,
    - estimate, which is the estimated parameter for this variable (or category),
    - condizione, which defines the SAS condition that identifies the class in the dataset,
    - df, which indicates the degrees of freedom of this variable,
    - utilizzo, which describes the type of variable.
N.B.: with user-defined classes (set by the user in the appropriate table), the conditions might not be mutually exclusive. For this reason the rows of this table are not in random order: the user's classes should be at the end of each group.
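The sketch below shows one way to use &in._zgri to compute the linear predictor for a new record, under several assumptions: each condizione has been translated by hand into a Python predicate, only the intercept and class-type rows are handled (for a continuous variable the estimate would instead multiply the value), and, following the N.B. above, when several rows of the same variable match, the last one wins. The function name, the variable "eta" and the estimates are hypothetical.

```python
def linear_predictor(rows, record):
    # rows: (nome, condition, estimate) tuples in the same order as &in._zgri;
    # condition is a Python predicate standing in for the SAS text in condizione.
    chosen = {}
    for nome, condition, estimate in rows:
        if condition(record):
            # Assumption: a later matching row of the same variable overrides
            # an earlier one, so user-defined classes placed at the end win.
            chosen[nome] = estimate
    return sum(chosen.values())


# Illustrative rows (made-up estimates).
rows = [
    ("Intercept", lambda r: True, -1.0),
    ("eta", lambda r: r["eta"] < 30, 0.5),
    ("eta", lambda r: r["eta"] >= 30, 0.0),
    ("eta", lambda r: r["eta"] < 25, 0.8),  # user-defined class, overlaps the first one
]
```

To obtain the predicted value, the inverse of the model's link function would then be applied to the result.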







Creation date: 17 Sep 2010
Translation date: 30 Dec 2012
Last change: 18 May 2013

Translation reviewed by Giulia Di Lallo