AutoReg
Input data
The procedure takes several tables as input. This page explains how they must be named
and structured.
All of these datasets, except the first one listed below (&in),
may be updated during the process.
If you do not want to have to recreate them, we recommend backing up the files before starting.
All table names share a common prefix: this root is passed by the user through the input parameter
in.
&in table
The &in table contains the input data.
Its column names and formats must match what is specified in the other tables.
By default this table is named a; to use a different name, change
the corresponding input parameter.
&in._var table
The &in._var table describes the columns of the input table and defines their
uses and formats.
Its structure is taken directly from the output dataset of SAS PROC CONTENTS.
The columns are:
- name, character column giving the column name in the input table;
- type, numeric column giving the type of the variable (coded as 1 for numeric
and 2 for character);
- format, character column giving the format of the field (if present);
- formatl, numeric column giving the format length of the field
(0 if the column has no specific format);
- formatd, numeric column giving the number of decimal places of the field
(0 if the column has no specific format or no decimals);
- utilizzo, character column indicating how the procedure will use the variable.
The last column is the only one that does not come directly from SAS PROC CONTENTS.
Its possible values are described in the page on the accepted types of variables.
If the input parameter in takes the default value a,
this table will be named a_var.
An example of this table (for a model estimating weight from height and age)
might look as follows:
name          | type | format | formatl | formatd | utilizzo
First name    | 2    |        | 0       | 0       | i
Surname       | 2    |        | 0       | 0       | i
height        | 1    |        | 5       | 0       | x
date_of_birth | 1    | DATE   | 9       | 0       | x
weight        | 1    |        | 1       | 0       | r
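A minimal sketch of how such a table could be prepared, assuming the default prefix a and the example columns above. The usage codes assigned here simply mirror the example, and the length chosen for utilizzo is an assumption, not a documented requirement.

```sas
/* Take the column descriptions from PROC CONTENTS (input table a -> a_var). */
proc contents data=a noprint
    out=a_var(keep=name type format formatl formatd);
run;

/* Add the utilizzo column by hand; the codes below follow the example table. */
data a_var;
    set a_var;
    length utilizzo $ 8;   /* assumed length */
    select (lowcase(name));
        when ('height', 'date_of_birth') utilizzo = 'x';
        when ('weight')                  utilizzo = 'r';
        otherwise                        utilizzo = 'i';
    end;
run;
```

Starting from the PROC CONTENTS output keeps name, type, format, formatl and formatd consistent with the input table, so only utilizzo has to be maintained manually.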
&in._cond table
The &in._cond table defines the user-defined classes into which the process must group the values of a variable
(this applies only to K, O and X variables).
The columns of the table (all character) are:
- variabile, which identifies the column to be grouped;
- condizione, which gives the SAS condition that defines a specific class;
- classe, which gives the name of the new class.
If the input parameter in takes the default value a,
this table will be named a_cond.
An example of this table is:
variabile     | condizione                    | classe
height        | height <= 100 and height ^= . | Under 100
height        | height >= 200                 | Over 200
...           | ...                           | ...
date_of_birth | date_of_birth <= '01JAN1900'd | Date of birth missing
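As an illustration, the rows shown in the example could be loaded with a DATA step like the one below. This is only a sketch: it assumes the default prefix a, the column lengths are arbitrary choices, and the rows elided with "..." in the example are left out.

```sas
data a_cond;
    /* Assumed lengths, chosen to fit the example values. */
    length variabile $ 32 condizione $ 200 classe $ 50;
    /* Use | as the only delimiter so that blanks inside conditions are kept. */
    infile datalines dlm='|' truncover;
    input variabile $ condizione $ classe $;
    datalines;
height|height <= 100 and height ^= .|Under 100
height|height >= 200|Over 200
date_of_birth|date_of_birth <= '01JAN1900'd|Date of birth missing
;
run;
```

Each condizione value is stored as text, so it should be written exactly as it would appear in a SAS WHERE or IF expression.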
&in._esccon table
The &in._esccon table lets the user force the program to treat
two variables as correlated: the procedure then applies a policy of conditional exclusion to the variables
involved.
For example, to make sure that the variables height and date_of_birth
cannot be used in the model at the same time, insert a row like this:
var1   | var2
height | date_of_birth
Therefore, if the variable height is added to our regression, then the procedure
will exclude the variable date_of_birth from the list of potential variables
in the next steps (and vice versa).
If the input parameter takes the default value a,
this table will be named a_esccon.
As in our example, the two columns of the table (both character) are:
- var1, which identifies the first correlated column;
- var2, which identifies the second correlated column.
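The single-row table from the example could be created as sketched below, assuming the default prefix a; the column length is an assumption.

```sas
data a_esccon;
    length var1 var2 $ 32;   /* assumed length */
    /* One row per pair of variables that must not enter the model together. */
    var1 = 'height'; var2 = 'date_of_birth'; output;
run;
```

Further pairs can be added as extra assignment and output lines.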
&passi table
The &passi table (named by the corresponding input parameter)
gives the preferred sequence of variables that the program uses to build the model.
This table is optional: if the macro parameter passi is set to
NIENTE (the default value), the procedure does not use any table of "preferred variables".
The columns of the table are:
- passo, numeric variable that identifies the priority order;
- modello, which gives the model reached at the end of each step.
This structure comes directly from the output table &in._passi,
so you can start from a pre-existing output to generate a new model.
In our example, the table could look like this:
passo | modello
1     | cl_height
2     | cl_height cl_date_of_birth
In this case, the program tries to insert the variable cl_height (derived from the variable height,
as described elsewhere in this documentation) in the first regression step,
and then the variable cl_date_of_birth (derived from date_of_birth).
Only then does the process consider the other input variables.
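The two ways of obtaining such a table can be sketched as follows. The name my_passi is hypothetical, the length of modello is an assumption, and the second step assumes that an output table a_passi from a previous run exists with the columns passo and modello.

```sas
/* Option 1: write the preferred steps by hand. */
data my_passi;
    length modello $ 200;   /* assumed length, long enough for the model string */
    passo = 1; modello = 'cl_height'; output;
    passo = 2; modello = 'cl_height cl_date_of_birth'; output;
run;

/* Option 2: reuse the first two steps of a previous run's output table. */
data my_passi;
    set a_passi(keep=passo modello);
    where passo <= 2;
run;
```

Either way, the resulting table name would be passed through the passi parameter in place of the default NIENTE.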
For a practical example of how the tables described above are used, see the relevant page;
in particular, if you have any doubts about the &passi table, see its
specific section.
Creation date: 17 Sep 2010
Translation date: 30 Dec 2012
Last change: 17 May 2013
Translation reviewed by
Giulia Di Lallo