Among all the techniques used in multivariate analysis, I am going to mention just
linear regression, because it is the simplest technique used in multivariate analysis.
And as a practical example of the use of linear regression,
let's suppose that we want to explain consumers' willingness to pay for strawberries
with lower greenhouse gas emissions from production. Here the explanatory variables are:
consumption level of organic fruits, gender, education level and income level.
This is the mathematical formula of the model. The multivariate analysis in this case consists in estimating a linear regression model,
where the respondent's willingness to pay is the response variable.
So the response variable, or the dependent variable, in this case is
the respondent's willingness to pay for one kilogram of strawberries labeled as having
lower greenhouse gas emissions, and the explanatory variables, or the independent variables, are:
organic, education level, income level and gender.
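Based on the description above, the model can be sketched as follows; the variable names here are my shorthand, and the slide's exact notation may differ:

```latex
\mathrm{WTP}_i = \beta_0 + \beta_1\,\mathrm{Organic}_i + \beta_2\,\mathrm{Education}_i
  + \beta_3\,\mathrm{Income}_i + \beta_4\,\mathrm{Gender}_i + \varepsilon_i
```

Here $\mathrm{WTP}_i$ is respondent $i$'s willingness to pay, and $\varepsilon_i$ is the usual error term.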
The betas are the coefficients to be estimated. The coefficient corresponding to a quantitative explanatory variable
represents the effect on the response variable of an increase of the explanatory variable by one unit.
The interpretation is different in the case of dummy variables or categorical variables.
So, in the case of a dummy variable, the coefficient is interpreted as follows:
it represents the effect on the response variable of a change of the explanatory variable from the level coded as 0 to the level coded as 1.
So, in this table, I am displaying the results of the estimation of the linear regression model.
First, you should look at the fit of your model, given by the R-squared value.
It tells you how much of the variation of the response variable is explained by the explanatory variables considered in the analysis.
Note that the closer the value of the R-squared is to 1, the better the explanatory power of the model.
In our case it is 0.85, so the model fits the data well.
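As a concrete illustration, here is a minimal sketch in Python of fitting such a model by ordinary least squares and computing the R-squared. The data below are synthetic and merely mimic the survey setup; the variable names, coding, and effect sizes are assumptions of mine, not the lesson's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical synthetic data mimicking the survey variables (made up)
organic = rng.uniform(0, 5, n)                  # kg of organic fruit consumed
education = rng.integers(1, 4, n).astype(float)  # education level, coded 1-3
income = rng.integers(1, 4, n).astype(float)     # income level, coded 1-3
gender = rng.integers(0, 2, n).astype(float)     # 1 = female, 0 = male

# True effects chosen to echo the lesson's estimates (assumed, for illustration)
wtp = 2.0 + 1.466 * organic + 0.540 * gender + rng.normal(0, 0.5, n)

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(n), organic, education, income, gender])
beta, *_ = np.linalg.lstsq(X, wtp, rcond=None)

# R-squared: share of the response's variation explained by the model
fitted = X @ beta
ss_res = np.sum((wtp - fitted) ** 2)
ss_tot = np.sum((wtp - wtp.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R-squared: {r_squared:.3f}")
```

Because the synthetic response is built almost entirely from the predictors, the R-squared comes out high, which is what a good fit looks like in practice.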
The significance of the effect of each explanatory variable on the response variable,
is given by the probability value, that you can find in the fifth column of table 8.
The effect of an explanatory variable is significant if the probability value of the test, which is a t-test in this case, is lower than 0.05.
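To make the decision rule concrete, this sketch computes t-statistics and approximate two-sided p-values for each coefficient, again on synthetic, made-up data. I use a normal approximation to the t distribution, which is reasonable here because the degrees of freedom are large:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical data: only 'organic' and 'gender' truly affect WTP (assumed setup)
organic = rng.uniform(0, 5, n)
education = rng.integers(1, 4, n).astype(float)
income = rng.integers(1, 4, n).astype(float)
gender = rng.integers(0, 2, n).astype(float)
wtp = 2.0 + 1.466 * organic + 0.540 * gender + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), organic, education, income, gender])
beta, *_ = np.linalg.lstsq(X, wtp, rcond=None)

# Residual variance and standard errors of the coefficients
resid = wtp - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta / se

# Two-sided p-values via the normal approximation (fine for dof around 195)
p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

names = ["intercept", "organic", "education", "income", "gender"]
for name, p in zip(names, p_values):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.3f}  ({verdict})")
```

With this setup the p-values for 'organic' and 'gender' fall below 0.05, mirroring the pattern reported in the lesson.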
In this example, the results show that only the variables consumption of organic fruits and gender
have a significant effect on respondents' willingness to pay for strawberries with lower greenhouse gas emissions.
So only two variables have a significant effect on the dependent variable,
which in this case is respondents' willingness to pay for one kilogram of strawberries.
The estimated coefficients are interpreted as follows:
for the variable 'organic', the coefficient value of 1.466 indicates that
when a respondent's intake of organic fruits increases by one kilogram, her/his WTP will increase by 1.466 pounds.
For the variable 'gender', the coefficient value of 0.540 indicates that female respondents,
who were coded in the data as 1, are willing to pay 0.540 pounds more for strawberries than male respondents.
This is how we interpret the coefficient corresponding to a dummy variable.
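The dummy-variable interpretation can be checked numerically: holding the other variables fixed and switching the dummy from 0 to 1 changes the prediction by exactly the coefficient. A small sketch using made-up coefficient values that echo the lesson's estimates:

```python
# Hypothetical coefficient values echoing the lesson's estimates (assumed)
beta_0, beta_organic, beta_education, beta_income, beta_gender = 2.0, 1.466, 0.1, 0.05, 0.540

def predicted_wtp(organic_kg, education, income, gender):
    """Predicted WTP in pounds per kg under the hypothetical linear model."""
    return (beta_0 + beta_organic * organic_kg + beta_education * education
            + beta_income * income + beta_gender * gender)

# Same respondent profile, switching only the gender dummy from 0 to 1
diff = predicted_wtp(2.0, 2, 2, 1) - predicted_wtp(2.0, 2, 2, 0)
print(f"Predicted WTP gap (female vs male): {diff:.3f} pounds")
```

The gap equals the gender coefficient, which is why we read a dummy coefficient directly as the effect of moving from the level coded 0 to the level coded 1.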
You can see that the effects of the rest of the explanatory variables
are not significantly different from 0, so they don't have a statistically significant effect on the response variable,
which in this case is consumers' WTP for strawberries labelled as having lower greenhouse gas emissions.
So, before concluding this lesson, I encourage you to answer this question:
what other techniques are commonly used in multivariate analysis? In addition to the
linear regression that I mentioned earlier, they include nonlinear regression,
logistic regression and factor analysis. These are three
examples of models that you can use to conduct a multivariate analysis.