Guided Tour: Multiple Linear Regression

To start with multiple linear regression (MLR), let's first load the data file BOILPTS.IDT. You may remember from a previous section that this file contains 185 objects of 13 variables each. The data describe the normal boiling points of 185 chemical compounds and some structural descriptors of these compounds. Now let's try to answer the question of whether it is possible to estimate the boiling points from the structural descriptors by using MLR.

For a first trial, select the command Math/Multiple Linear Regression/Calculate Model... (or the corresponding toolbar button). The window that appears provides several command buttons, among them the Calculate button, which is only enabled once you have selected both the independent variables and the target variable. As a first attempt, select variables 4, 6, and 8 (nHetAt, toporad, and nbranch) as input variables, and variable 13 (the boiling points) as the target variable. To do so, first click into the list of descriptors and select the desired variables. Next, click the "Dependent Variable" field and select the boiling point as the target variable. To calculate the regression, press the "Calculate" button. The results are displayed in three switchable windows.
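For readers who want to see what such a calculation involves outside of DataLab, the following is a minimal sketch of an ordinary least-squares fit with three input variables and one target. It is not DataLab's implementation: the function name fit_mlr is hypothetical, the data are synthetic stand-ins (BOILPTS.IDT is not read here), and the residual standard deviation is assumed to be computed with n - k degrees of freedom, which may differ from the definition DataLab uses.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y on the columns of X plus an intercept.

    Hypothetical helper for illustration only; not part of DataLab.
    Returns (coefficients, residual standard deviation, R^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (intercept) followed by the descriptors.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    n, k = A.shape
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    # Assumption: residual standard deviation with n - k degrees of freedom.
    resid_sd = (ss_res / (n - k)) ** 0.5
    r2 = 1.0 - ss_res / ss_tot
    return coef, resid_sd, r2

# Synthetic stand-in data: 185 objects, 3 descriptors, 1 target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(185, 3))
y_demo = 50.0 + X_demo @ np.array([12.0, -4.0, 7.0]) + rng.normal(scale=5.0, size=185)

coef, resid_sd, r2 = fit_mlr(X_demo, y_demo)
print("coefficients (intercept first):", coef)
print("standard deviation of residuals:", resid_sd)
print("coefficient of determination:", r2)
```

The two printed summary values correspond to the kind of figures quoted in this tour (standard deviation of the residuals and coefficient of determination); the numbers themselves come from the synthetic data and say nothing about the boiling point data set.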
You might wonder how to find the best combination of variables, since the number of possible combinations is quite large in our example. In general there are 2^{p} - 1 combinations for p independent variables, which results in 4095 combinations in our particular case (p = 12, since one of the 13 variables serves as the target). In principle, there are several ways of selecting a more or less adequate combination of variables, e.g. stepwise regression, backward elimination, forward selection, or simply trying all possible combinations. DataLab provides all of these methods; use the command Math/Multiple Linear Regression/Variable Selection or the corresponding toolbar button to start the variable selection process.

Now try the forward selection. Specify the target variable by ticking variable 13 (boil. point) in the third column. After you click the start button, a list of submodels is displayed. The best model is indicated by a black bar. This model uses the variables 10, 2, 8, 12, and 5 as independent variables. Now click the button that copies the selected variables into the MLR window, and start the regression calculation once again. The new model delivers much improved results, showing a standard deviation of the residuals of 7.45°C and a coefficient of determination of 0.9767.
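The idea behind forward selection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DataLab's algorithm: the function names adjusted_r2 and forward_selection are hypothetical, the adjusted coefficient of determination is assumed as the selection criterion (the criterion DataLab uses is not stated here), the data are synthetic, and column indices are 0-based rather than the 1-based variable numbers shown in DataLab.

```python
import numpy as np

def adjusted_r2(X, y, cols):
    """Adjusted R^2 of a least-squares model using the given columns of X."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    res = y - A @ coef
    n, k = A.shape
    ss_res = float(res @ res)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1))

def forward_selection(X, y):
    """Greedy forward selection (assumed criterion: adjusted R^2).

    Starts with no variables and repeatedly adds the single variable that
    gives the highest score; stops when no candidate improves the score.
    """
    remaining = list(range(X.shape[1]))
    selected = []
    best_score = -np.inf
    while remaining:
        score, col = max((adjusted_r2(X, y, selected + [c]), c) for c in remaining)
        if score <= best_score:
            break
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score

# Number of non-empty subsets of p = 12 candidate variables.
n_subsets = 2 ** 12 - 1
print("possible combinations:", n_subsets)

# Synthetic stand-in data: 12 descriptors, only columns 1, 4 and 9 matter.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(185, 12))
y_demo = 10.0 * X_demo[:, 1] - 6.0 * X_demo[:, 4] + 8.0 * X_demo[:, 9] \
    + rng.normal(scale=1.0, size=185)

selected, best_score = forward_selection(X_demo, y_demo)
print("selected columns (in order of inclusion):", selected)
print("adjusted R^2 of final model:", best_score)
```

Forward selection fits far fewer models than an exhaustive search over all 4095 subsets, at the price of possibly missing the best combination, because a variable, once included, is never removed again.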

