
## Multiple Regression Model
Suppose you want to create a mathematical model that estimates the boiling points of chemical substances from their structural parameters. Such a model lets you approximate the boiling point of a substance without having physical access to it; even if the substance has not yet been synthesized, you can already estimate its boiling point. For this purpose we need a set of known data containing the structural parameters (which can be calculated from the chemical structure) and the corresponding boiling points. Our sample data set contains the boiling points of 185 substances, each of which is characterized by 12 structural parameters.

When creating the model, one of the most important questions is which of the 12 independent variables (structural parameters) are best suited for setting up the model. DataLab offers the following variable selection methods: forward selection, backward elimination, stepwise regression, and the test of all possible combinations of independent variables.

To perform the variable selection, call the command "Math/Multiple Linear Regression/Variable Selection" (also available as a button in the DataLab toolbar). Next, mark the variable "boil.point" as the target variable and specify the selection mode. A few seconds after clicking the "Start" button, the "best" model is indicated by a black bar at the right side of the dialog window. The listed submodels are characterized by a set of parameters that give hints on the quality of the corresponding model. In our case the model using the variables 10, 2, 8, 12, and 5 shows the best performance.

The variables of this model can be copied to the MLR window with the corresponding button in order to calculate the model. As one can see from the plot of the estimated values against the actual ones, the estimation of the boiling points based on the structural parameters works quite well.
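To make the idea of forward selection more concrete, here is a minimal Python sketch. It is not DataLab's implementation: the selection criterion (adjusted R²), the stopping threshold `min_gain`, the function names, and the synthetic data are all assumptions made for illustration. Only the dimensions (185 objects, 12 candidate variables) are taken from the example above.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, adjusted R^2)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])           # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    p = X.shape[1]                                 # number of input variables
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, adj_r2


def forward_selection(X, y, min_gain=1e-3):
    """Greedily add the variable that improves adjusted R^2 the most.

    min_gain is a hypothetical stopping threshold, not a DataLab setting.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    best_score = -np.inf
    while remaining:
        # Score every candidate model that adds one more variable.
        scores = [(fit_ols(X[:, selected + [j]], y)[1], j) for j in remaining]
        score, j = max(scores)
        if score - best_score < min_gain:
            break                                  # no worthwhile improvement left
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score


# Synthetic stand-in data (NOT the boiling point data set):
# only columns 9, 1 and 7 actually influence the target.
rng = np.random.default_rng(0)
n_objects, n_vars = 185, 12
X = rng.normal(size=(n_objects, n_vars))
y = (5.0 * X[:, 9] - 3.0 * X[:, 1] + 2.0 * X[:, 7]
     + rng.normal(scale=0.5, size=n_objects))

selected, adj_r2 = forward_selection(X, y)
print("selected columns (0-based):", selected)
print("adjusted R^2:", round(adj_r2, 4))
```

Backward elimination works the other way round (start with all variables and drop the least useful one), and stepwise regression combines both moves. Testing all possible combinations guarantees the best subset under the chosen criterion, but with 12 variables it already requires 4095 model fits.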
The standard deviation of the residuals comes close to 7.5 °C. The details of the multiple regression results can be found in the protocol, which is available via the corresponding button:

Multiple Linear Regression: `d:\datalab\data\boilpts.idt`

| Parameter | Value |
|---|---|
| Number of Objects | 185 |
| Number of Input Variables | 5 |
| Degrees of Freedom | 179 |
| Target Variable | [13] boil.point |
| Mean of Target Values | 132.714054 |
| Std.Dev. of Target Values | 48.223876 |
| Mean of Calculated Values | 132.714054 |
| Std.Dev. of Calc. Values | 47.660251 |
| Standard Dev. of Residuals | 7.4533 |
| Quality of Fit | 0.9768 |
| Adjusted Quality of Fit | 0.9761 |
| F-Statistic | 1504.731 (p=0.0000) |
| Durbin-Watson Statistic | 1.2748 |

ANOVA:

| Source | DF | Sum of squares | Mean square | F |
|---|---|---|---|---|
| Regression | 5 | 4.17956E+05 | 8.35912E+04 | 1504.731 |
| Residual | 179 | 9.94385E+03 | 5.55522E+01 | |
| Total | 184 | 4.27900E+05 | | |

Regression coefficients:

| Col-# | Var-Name | Coefficient | Std.Err.(coeff) | t-Test | alpha |
|---|---|---|---|---|---|
| - | INTERCEPT | -7.0960574E+01 | +/- 5.5103328E+00 | -12.878 | 0.0000 |
| 10 | RandicToz | 7.6873275E+00 | +/- 1.1242126E-01 | 68.380 | 0.0000 |
| 2 | O-Atoms | -1.3123226E+01 | +/- 7.9273468E-01 | -16.554 | 0.0000 |
| 8 | n-Branch | -4.6668763E+00 | +/- 1.1711391E+00 | -3.985 | 0.0001 |
| 12 | Topo-J | 7.2078089E+00 | +/- 2.3775368E+00 | 3.032 | 0.0028 |
| 5 | JHET | -8.5553223E-01 | +/- 3.4827518E-01 | -2.456 | 0.0150 |
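The summary figures in the protocol follow directly from the ANOVA sums of squares. The short Python check below recomputes them using the standard textbook formulas. It assumes that "Quality of Fit" denotes the coefficient of determination R² and that the t-Test column is the coefficient divided by its standard error; the variable names are chosen for illustration only.

```python
import math

# Values copied from the protocol above.
n_objects = 185
n_inputs = 5
ss_regression = 4.17956e5
ss_residual = 9.94385e3
ss_total = ss_regression + ss_residual            # protocol: 4.27900E+05

df_residual = n_objects - n_inputs - 1            # protocol: 179
ms_regression = ss_regression / n_inputs
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual              # protocol: 1504.731
r2 = ss_regression / ss_total                     # protocol: 0.9768
adj_r2 = 1 - (1 - r2) * (n_objects - 1) / df_residual   # protocol: 0.9761
resid_sd = math.sqrt(ms_residual)                 # protocol: 7.4533

# t value of a coefficient = coefficient / standard error (intercept shown).
t_intercept = -7.0960574e01 / 5.5103328e00        # protocol: -12.878

print(f"F = {f_stat:.3f}, R^2 = {r2:.4f}, adj. R^2 = {adj_r2:.4f}")
print(f"residual std.dev. = {resid_sd:.4f}, t(intercept) = {t_intercept:.3f}")
```

The recomputed values agree with the protocol to the printed precision, which confirms that the degrees of freedom (185 objects minus 5 variables minus 1 intercept) are accounted for in both the residual standard deviation and the adjusted quality of fit.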