The in parameter has two tasks: it gives the engine the name of the input datasets and defines the prefix used for the names of the output datasets.
The default value of this parameter is a: in this case, the procedure expects as input the datasets a (data), a_cond (specific conditions), a_var (variables in the input dataset) and a_esccon (conditional exclusions added by the user).
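The naming rule can be sketched in a few lines of Python. This is only an illustration of the convention described above, not the macro's own code; the function names and dictionary keys are hypothetical, and the rule for output names is inferred from the output table &in._passi mentioned on this page.

```python
def input_dataset_names(prefix="a"):
    """Return the input datasets expected for a given value of `in`."""
    return {
        "data": prefix,                                  # main data
        "conditions": f"{prefix}_cond",                  # specific conditions
        "variables": f"{prefix}_var",                    # variables in the input dataset
        "conditional_exclusions": f"{prefix}_esccon",    # exclusions added by the user
    }


def output_dataset_name(prefix, suffix):
    """Build an output dataset name, e.g. prefix 'a' and suffix 'passi' -> 'a_passi'."""
    return f"{prefix}_{suffix}"


if __name__ == "__main__":
    print(input_dataset_names("a"))
    print(output_dataset_name("a", "passi"))
```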
The taglio_correlazione parameter sets the threshold beyond which two variables are considered correlated.
The default value is 0.4: if the correlation (regardless of how it is calculated) exceeds 40%, the two variables are considered correlated. So, if one of them is already in the model, the other one will not be used.
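A minimal Python sketch of this exclusion rule follows. It is not the macro's implementation: the function name and the layout of the correlation table are hypothetical, and comparing the absolute value of the correlation with the cutoff is an assumption.

```python
def is_excluded(candidate, in_model, correlation, cutoff=0.4):
    """Return True if `candidate` is correlated with a variable already in the model.

    `correlation` maps an unordered pair of variable names (a frozenset)
    to a correlation value. Missing pairs are treated as uncorrelated.
    """
    for selected in in_model:
        value = correlation.get(frozenset((candidate, selected)), 0.0)
        # Assumption: the sign is ignored, only the strength matters.
        if abs(value) > cutoff:
            return True
    return False


if __name__ == "__main__":
    corr = {
        frozenset(("age", "seniority")): 0.62,
        frozenset(("age", "income")): 0.25,
    }
    print(is_excluded("seniority", ["age"], corr))  # above the 0.4 cutoff
    print(is_excluded("income", ["age"], corr))     # below the 0.4 cutoff
```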
The distribuzione parameter (default value binomial) was originally introduced to make the macro code more generic.
The goal was to let the user choose among a set of distributions for the target variable.
So far the procedure does not support this option (or, rather, has not been structured to support it); a wider choice of distributions may become available in future versions of the code.
The alfa parameter (default value 0.05) is the significance threshold used in the tests that determine whether a model is statistically significant.
With the default value, tests are accepted or rejected at a significance level of 5%.
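As a small illustration, the decision can be written as a comparison between a p-value and alfa. This is only a sketch: the function name is hypothetical, and the usual convention of rejecting the null hypothesis when the p-value is below the threshold is assumed rather than taken from the macro.

```python
def is_significant(p_value, alfa=0.05):
    """Return True when the test result is significant at level `alfa`."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Assumption: significance means a p-value strictly below the threshold.
    return p_value < alfa


if __name__ == "__main__":
    print(is_significant(0.03))             # significant at the default 5% level
    print(is_significant(0.03, alfa=0.01))  # not significant at the 1% level
```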
The passo parameter determines the size (as a percentage of the population) of each class generated for the variables defined as K or X.
The default value of this parameter is 10, so for variables of this type the program tries to create new variables whose classes each contain 10% of the population.
Unless the values are particularly concentrated, the new variables will probably have 9-11 distinct classes. For a better understanding of this parameter, please read the relevant page.
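The idea can be illustrated with a percentile-based binning sketch in Python. It does not reproduce the macro's actual algorithm: the function names are hypothetical, passo is assumed to be a whole number of percentage points, and the nearest-rank percentile rule is an assumption. The sketch also shows why a strong concentration of values yields fewer classes: repeated cut points collapse into one.

```python
import bisect


def class_cut_points(values, passo=10):
    """Return increasing cut points so each class holds about `passo` percent of rows."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    pct = passo
    while pct < 100:
        # Nearest-rank percentile, computed with integer arithmetic.
        rank = max(1, (pct * n + 99) // 100)
        cut = ordered[rank - 1]
        # Identical cut points (concentrated values) are kept only once.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
        pct += passo
    return cuts


def assign_class(value, cuts):
    """Return the class index of `value`: class i covers values up to cuts[i]."""
    return bisect.bisect_left(cuts, value)


if __name__ == "__main__":
    spread = list(range(1, 101))
    print(len(class_cut_points(spread)) + 1)        # number of classes, evenly spread values
    concentrated = [0] * 50 + list(range(1, 51))
    print(len(class_cut_points(concentrated)) + 1)  # fewer classes when half the rows are 0
```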
The log_0 parameter redirects the SAS summary log of the process to an external file.
By default this parameter is set to log: the summary log is written to the standard SAS log window.
The output_log parameter redirects the extended SAS log to an external file. This file can be very large.
The default value is log: in this case, all the information is written to the standard SAS log window.
The max_giri parameter sets the maximum number of stepwise/backward cycles before the program ends. If this parameter is set to -11 or less, the program places no limit on the number of cycles; if it is set to 0 (default value), the limit is calculated as three times the number of potential variables specified by the user.
To impose a fixed limit, set the parameter to a positive number.
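These rules can be summarised in a short Python sketch. It is an illustration only, not the macro's code; the function name is hypothetical. The text does not say how values from -10 to -1 are handled, so the sketch rejects them rather than guessing.

```python
def cycle_limit(max_giri, n_candidates):
    """Return the maximum number of cycles, or None when there is no limit."""
    if max_giri <= -11:
        return None                 # no limit on the number of cycles
    if max_giri == 0:
        return 3 * n_candidates     # default: three times the candidate variables
    if max_giri > 0:
        return max_giri             # fixed limit chosen by the user
    # Values from -10 to -1 are not covered by the documentation.
    raise ValueError("behaviour for this max_giri value is not documented")


if __name__ == "__main__":
    print(cycle_limit(0, 12))    # default limit for 12 candidate variables
    print(cycle_limit(50, 12))   # fixed limit
    print(cycle_limit(-11, 12))  # unlimited
```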
The max_format parameter determines the length of some columns that could be particularly wide.
The variables formatted with max_format are:
The passi parameter gives the program the name of an optional table containing the variables to be tried in the model in an exact sequence.
The table must have the same structure as the output table &in._passi. Please note that the variables are not forced to enter: the program only tries to use them in the model before the others. Therefore, if these variables are not statistically relevant, they will not enter the model.
If the macro variable is set to NIENTE (default value), the procedure does not give priority to any column.
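The priority mechanism can be pictured with the following Python sketch. It is a simplified illustration under stated assumptions, not the macro's logic: the function name is hypothetical, the passi table is reduced to an ordered list of variable names, and the value NIENTE is represented as the string "NIENTE". The variables listed first are only tried first; each one must still pass the usual significance tests to enter the model.

```python
def candidate_order(candidates, passi="NIENTE"):
    """Return the order in which candidate variables are tried in the model."""
    if passi == "NIENTE":
        return list(candidates)     # no variable is given priority
    # Preferred variables come first, in the exact sequence given by `passi`.
    preferred = [name for name in passi if name in candidates]
    remaining = [name for name in candidates if name not in preferred]
    return preferred + remaining


if __name__ == "__main__":
    variables = ["age", "income", "region", "seniority"]
    print(candidate_order(variables))
    print(candidate_order(variables, passi=["region", "age"]))
```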
The simpson parameter is a flag indicating whether to use Simpson's concentration index as a pseudo-correlation (the process is described here).
By default the parameter is set to 1: in this case, the index is used. To disable it, set the parameter to 0.
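For reference, Simpson's concentration index of a categorical variable is the sum of the squared shares of its categories. The Python sketch below computes only that index; the function name is hypothetical, and how the macro turns the index into a pseudo-correlation between two variables is not described on this page, so it is not reproduced here.

```python
from collections import Counter


def simpson_index(values):
    """Return Simpson's concentration index: the sum of squared category shares."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    # 1.0 means all rows fall in one category; lower values mean more dispersion.
    return sum((count / n) ** 2 for count in counts.values())


if __name__ == "__main__":
    print(simpson_index(["a", "a", "b", "b"]))  # two equally frequent categories
    print(simpson_index(["a", "a", "a", "a"]))  # fully concentrated
```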
Creation date: 17 Sep 2010
Translation date: 30 Dec 2012
Last change: 18 May 2013
Translation reviewed by Giulia Di Lallo