A MATLAB program related to Regression Models.
Regression analysis is a statistical method for examining the relationships between a dependent variable and independent variables.
Since the exact model that explains the relationship between a dependent variable and the independent variables cannot be known, I fit several candidate regression models and choose the better one based on residual analysis and influence diagnostics.
Definition of residual: A residual is the vertical distance between an observed value and the corresponding fitted value on the regression line. It estimates the unobservable error, which is the distance between the observed value and the true (unknown) regression model.
To study the residuals, we check whether they satisfy the following assumptions:
- Zero mean
- Constant variance
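As a rough illustration of these two checks, the sketch below fits a straight line by ordinary least squares and then inspects the residuals. It uses synthetic data, not fishdata.txt, and the variable names (`meanRes`, `varRatio`) and the split-in-half variance comparison are choices made for this example rather than part of the original program.

```matlab
% Synthetic data for illustration only (not taken from fishdata.txt)
rng(0);                              % fix the seed so the run is repeatable
n = 50;
x = linspace(1, 10, n)';             % single predictor
y = 2 + 3*x + randn(n, 1);           % linear signal plus unit-variance noise

X = [ones(n, 1), x];                 % design matrix with an intercept column
beta = X \ y;                        % ordinary least-squares estimate
yhat = X * beta;                     % fitted values
res = y - yhat;                      % residuals: observed minus fitted

% Zero mean: with an intercept in the model the residuals average to zero
% up to rounding error, so this is mainly a sanity check of the fit.
meanRes = mean(res);

% Constant variance: compare the residual spread in the lower and upper
% halves of the fitted values. A ratio far from 1 hints at non-constant
% variance (heteroscedasticity).
[~, order] = sort(yhat);
half = floor(n / 2);
varRatio = var(res(order(half+1:end))) / var(res(order(1:half)));

fprintf('mean residual = %.2e, variance ratio = %.2f\n', meanRes, varRatio);
```

A plot of `res` against `yhat` (for example with `scatter(yhat, res)`) is the usual visual companion to these numbers: the points should scatter evenly around zero with no funnel or curve.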
Definition of influence diagnostics: Influence diagnostics are measures used to determine whether an observation is an influential point, a leverage point, or an outlier. The measures considered are:
- Leverage
- Cook’s Distance
- Difference in Fitted Values (DFFITS)
- Studentized Residuals
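The four measures above can be computed directly from the hat matrix of an ordinary least-squares fit. The sketch below uses the standard textbook formulas on synthetic data with one deliberately unusual point. The variable names and the cutoffs (2p/n, 4/n, 2*sqrt(p/n), and 2) are common rules of thumb assumed for this example, not values taken from the original program.

```matlab
% Synthetic data for illustration only (not taken from fishdata.txt)
rng(1);
n = 30;
x = [linspace(1, 10, n-1)'; 25];     % last point lies far from the rest in x
y = 2 + 3*x + randn(n, 1);
y(end) = y(end) + 15;                % and is also shifted away from the line

X = [ones(n, 1), x];                 % design matrix with intercept
p = size(X, 2);                      % number of estimated coefficients
beta = X \ y;                        % ordinary least-squares fit
res = y - X*beta;                    % residuals

% Leverage: diagonal of the hat matrix H = X*inv(X'*X)*X',
% obtained from a thin QR factorization without forming H.
[Q, ~] = qr(X, 0);
h = sum(Q.^2, 2);

% Internally studentized residuals and Cook's distance
s2 = sum(res.^2) / (n - p);          % residual variance estimate
rInt = res ./ sqrt(s2 * (1 - h));
cookD = (rInt.^2 / p) .* (h ./ (1 - h));

% Externally studentized residuals (variance re-estimated without point i)
s2del = ((n - p)*s2 - res.^2 ./ (1 - h)) / (n - p - 1);
tExt = res ./ sqrt(s2del .* (1 - h));

% DFFITS: scaled change in the i-th fitted value when point i is deleted
dffits = tExt .* sqrt(h ./ (1 - h));

% Flag observations that exceed any rule-of-thumb cutoff (assumed values)
flagged = find(h > 2*p/n | cookD > 4/n | ...
               abs(dffits) > 2*sqrt(p/n) | abs(tExt) > 2);
disp(flagged');
```

The cutoffs only mark observations for closer inspection; a flagged point is not necessarily an error in the data.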
The file fishdata.txt contains the length, height, width, species, and weight of each fish:
- Length: Length of a fish (cm)
- Height: Height of a fish (cm)
- Width: Width of a fish (cm)
- Species: Species of a fish (from 1 to 6)
- Weight: Weight of a fish (gram)
My program extracts 201 observations (the 665th through the 865th) from fishdata.txt, then fits 3 different regression models and performs residual analysis.
The regression models are based on the following dependent and independent variables:
- Dependent (Response) variable: Weight
- Independent (Explanatory) variables: Length, Height, Width, Species
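To make the setup concrete, here is a minimal sketch of extracting rows 665 to 865 and fitting one candidate model with all four explanatory variables. The layout of fishdata.txt is not described here, so the column order (Length, Height, Width, Species, Weight) is an assumption, and a synthetic matrix stands in for the file so that the example runs on its own. The model shown is a hypothetical candidate, not necessarily one of the three models in the program.

```matlab
% Stand-in for the real file. With the actual data this block would be
% replaced by something like: data = readmatrix('fishdata.txt');
% ASSUMPTION: columns are ordered Length, Height, Width, Species, Weight.
rng(2);
nTotal = 1000;
Length = 10 + 30*rand(nTotal, 1);
Height = 0.3*Length + randn(nTotal, 1);
Width = 0.15*Length + 0.5*randn(nTotal, 1);
Species = randi(6, nTotal, 1);       % species code from 1 to 6
Weight = 20*Length + 10*Height + 15*Width + 5*Species + 20*randn(nTotal, 1);
data = [Length, Height, Width, Species, Weight];

% Extract the 201 observations used in the analysis
rows = 665:865;
sub = data(rows, :);

% Response and design matrix (intercept plus four explanatory variables).
% Species enters as a plain number here for brevity; since it is a code,
% indicator (dummy) columns would be a reasonable alternative.
y = sub(:, 5);
X = [ones(numel(rows), 1), sub(:, 1:4)];

beta = X \ y;                        % ordinary least-squares coefficients
res = y - X*beta;                    % residuals for later diagnostics
R2 = 1 - sum(res.^2) / sum((y - mean(y)).^2);

fprintf('observations = %d, R^2 = %.3f\n', numel(y), R2);
```

Other candidate models can be built the same way by changing which columns go into `X`, after which the residuals of each fit can be compared.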