Under usual conditions, this is the value of the dependent variable when all the independent variables are held at zero (that is, the y-intercept). Here, however, the other variables cannot all be held at zero. Therefore, for this work, we will not give it a physical interpretation. The model also reveals that an additional 3-point shot per game is expected to add about 2.24 percentage points to the winning percentage of a team in this group, when all the other variables are held constant.
When we did the marginal analysis we observed a negative correlation, yet here we see a different result, because we have built a multiple regression model. An additional free throw per game is expected to add 1.76 percentage points, holding all the other variables in the model fixed; an additional opponent turnover per game is expected to add 4.14 percentage points, and an additional home rebound per game 2.67 percentage points. Furthermore, an additional turnover per game is expected to reduce the winning percentage by 6.22 percentage points, just as we observed in the marginal regression analysis.
An additional opponent rebound per game is expected to reduce a team's winning percentage by 2.96 percentage points, also confirming what we observed during the marginal analysis. The same goes for an additional opponent 3-point shot per game, which is expected to reduce the winning percentage by 1.72 percentage points. This also agrees with what we observed in the marginal analysis. From the R2 value we understand that this model accounts for 81.1% of the variability in the winning percentage of the teams in this group. This indicates that the model fits the data well. The adjusted R2 is 78.8%, which is fairly close to the unadjusted R2. The gap of 2.3 points might be an indication that the model can be improved with more data or fewer predictor variables. The result also shows that the regression has a standard error of 0.0747588. This means that, about 95% of the time, we can predict the winning percentage of a team in this group to within ±(2 x 0.0747588) = ±0.1495, that is, about 15 percentage points. From the descriptive statistics (see Appendix I), we observe that the dependent variable (Winning Percentage) has a standard deviation of 0.1625, an interquartile range of 0.2696 and a range of 0.7154. Since the standard error is well below this standard deviation, we are convinced that our analysis has reduced the variability in this data.

6.3 Verifying the Assumptions

Regression analysis assumes normality in the error term. To verify this requires plots of the residuals, including a plot of the residuals against each variable. The details of the plots are presented in Appendix V. From the Normal Probability Plot and the Histogram, we observe an approximately normal distribution. The Normal Probability Plot shows a potential outlier. However, the plots of the residuals versus the independent variables do not indicate a departure from the model assumptions.
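The normal probability plot check described in this section can be sketched numerically. The example below is only an illustration under stated assumptions: the residuals are simulated (the actual residuals are in the appendices and are not reproduced here), the plotting positions (i - 0.5)/n are one common convention, and the function names are hypothetical. A correlation close to 1 between the ordered residuals and the theoretical normal quantiles corresponds to points lying near a straight line on the plot.

```python
import random
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each ordered residual with its theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def normal_plot_correlation(residuals):
    """Correlation between theoretical quantiles and ordered residuals."""
    points = normal_plot_points(residuals)
    return pearson([q for q, _ in points], [r for _, r in points])


# Simulated residuals for illustration only (NOT the residuals of this study).
rng = random.Random(0)
residuals = [rng.gauss(0.0, 0.075) for _ in range(60)]
points = normal_plot_points(residuals)
r = normal_plot_correlation(residuals)
print(f"normal probability plot correlation: {r:.3f}")
```

A single outlier, such as the one noted on the Normal Probability Plot, would show up here as a point whose ordered residual lies far from the line through the remaining points.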
The general nature of the residual plots appears cyclical, except for some outliers.

6.4 Model Improvement: Best Subsets

Best subsets regression examines all possible models using selected numbers of the independent variables (starting with one at a time), and then reports, for each number of variables, the two models with the highest value of R2. The Minitab printout is presented in Appendix VI. It appears at first glance that the model made of all seven variables has the highest R2 value, and also the lowest standard error (looking at the combinations that have four or more variables).
We note, however, that the difference between the R2 and adjusted R2 of that model is 2.3 percentage points, signifying that there is room for improvement in our prediction. This difference is smaller in all the other combinations of four or more variables. We choose the combination that has five variables, with a standard error of 0.07979, an R2 value of 77.7% and an adjusted R2 value of 75.9%. The difference between R2 and adjusted R2 is 1.8 percentage points. This selected combination is indicated in Appendix VI with a rectangular dotted line around it.
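The best subsets procedure can be illustrated with a small brute-force sketch. It is not Minitab's implementation; it simply fits an ordinary least squares model for every combination of predictors and keeps the two with the highest R2 for each subset size, alongside adjusted R2 and S. The data are simulated, and the predictor names x1 to x4, the sample size of 40 and all function names are assumptions made for the example.

```python
import random
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_stats(rows, y, cols):
    """Fit OLS with an intercept on the chosen columns; return (R2, adjusted R2, S)."""
    n, p = len(y), len(cols)
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = p + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    y_mean = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    s = (sse / (n - p - 1)) ** 0.5
    return r2, adj_r2, s


def best_subsets(rows, y, names, keep=2):
    """For each subset size, keep the `keep` models with the highest R2."""
    results = {}
    for size in range(1, len(names) + 1):
        fits = []
        for cols in combinations(range(len(names)), size):
            r2, adj_r2, s = fit_stats(rows, y, cols)
            fits.append((r2, adj_r2, s, [names[c] for c in cols]))
        fits.sort(key=lambda fit: fit[0], reverse=True)
        results[size] = fits[:keep]
    return results


# Simulated data for illustration only: y depends on x1 and x2, not on x3 or x4.
rng = random.Random(1)
names = ["x1", "x2", "x3", "x4"]
rows = [[rng.gauss(0, 1) for _ in names] for _ in range(40)]
y = [0.5 + 0.3 * row[0] - 0.2 * row[1] + rng.gauss(0, 0.1) for row in rows]

table = best_subsets(rows, y, names)
for size, fits in table.items():
    for r2, adj_r2, s, cols in fits:
        print(size, cols, f"R-Sq={100 * r2:.1f} R-Sq(adj)={100 * adj_r2:.1f} S={s:.4f}")
```

Because R2 can never decrease when a variable is added, the full model always has the highest R2. The adjusted R2 and S columns are what make a smaller model, like the five-variable combination chosen here, a reasonable candidate.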
When we ran the analysis using these variables, we observed that there were some outliers in the residual plots. The regression details are presented in Appendix VII and the residual plots are presented in Appendix VIII. The analysis of the five variables produced the following regression model:

The regression equation is
Winning percentage = 0.528 + 0.0250 3-point per game - 0.0631 Turn-over, pg + 0.0471 Opponent Turn-over, pg + 0.0349 Home rebound per game - 0.0336 Oppnt rebound per game

S = 0.0797903   R-Sq = 77.7%   R-Sq(adj) = 75.9%

As we said before, the constant value of 0.528 cannot be interpreted physically, since it is not possible to hold all the independent variables in the model at zero at the same time. We also observe that an additional turnover per game is expected to reduce the winning percentage of a team in this group by 6.31 percentage points; a similar effect is observed for an additional opponent rebound per game, which is expected to reduce the winning percentage by 3.36 percentage points.
Furthermore, an additional 3-point shot per game, opponent turnover per game and home rebound per game are expected to add 2.50, 4.71 and 3.49 percentage points, respectively, to the winning percentage. Since S = 0.07979, we can predict the winning percentage of a team in this group to within ±(2 x 0.07979) = ±0.1596, that is, about 16 percentage points, about 95% of the time.
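The interpretation of the coefficients and of the ±2S band can be checked with a short sketch. The coefficients and S are those reported for the five-variable model, with the turnover and opponent rebound coefficients taken as negative, consistent with the interpretation in the text. The dictionary keys, function names and the example team's per-game figures are hypothetical and serve only to show the mechanics.

```python
# Coefficients of the five-variable model as reported in the regression output.
# Turnovers and opponent rebounds enter with negative signs, as interpreted in the text.
INTERCEPT = 0.528
COEFFICIENTS = {
    "three_point_pg": 0.0250,
    "turnover_pg": -0.0631,
    "opp_turnover_pg": 0.0471,
    "home_rebound_pg": 0.0349,
    "opp_rebound_pg": -0.0336,
}
S = 0.0797903  # standard error of the regression


def predict_win_pct(stats):
    """Predicted winning percentage (as a proportion) from per-game statistics."""
    return INTERCEPT + sum(COEFFICIENTS[name] * stats[name] for name in COEFFICIENTS)


def rough_95_band(prediction, s=S):
    """Approximate 95% band: prediction plus or minus two standard errors."""
    half_width = 2 * s
    return prediction - half_width, prediction + half_width


# Hypothetical per-game figures for one team (NOT taken from the study's data).
team = {
    "three_point_pg": 6.0,
    "turnover_pg": 14.0,
    "opp_turnover_pg": 15.0,
    "home_rebound_pg": 36.0,
    "opp_rebound_pg": 35.0,
}
base = predict_win_pct(team)

# One additional turnover per game, everything else held fixed.
more_turnovers = dict(team, turnover_pg=team["turnover_pg"] + 1)
effect = predict_win_pct(more_turnovers) - base

low, high = rough_95_band(base)
print(f"change from one extra turnover: {effect:+.4f}")
print(f"band half-width: {2 * S:.4f}")
```

The change from one extra turnover equals the turnover coefficient, which is what "holding all the other variables fixed" means in practice.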
The standard error of 0.07979 still represents a reduction in variability from the standard deviation of the winning percentage (0.1625). The R2 value of 77.7% means that our model (our choice of variables) accounts for 77.7% of the variation in the winning percentage. The adjusted R2 value of 75.9% tells us that the analysis may still be improved by a change in the data size or in the number and choice of variables.