That is, we want to compare Hispanic individuals, the reference group,

to individuals from other ethnic racial groups, the comparison groups,

on number of nicotine dependence symptoms after controlling for

the other explanatory variables in the model.

To do this, we will use the same GLM procedure

that we used to test our earlier multiple regression model.

However, this time we're going to add a line of code to tell SAS

that the ethnicity race explanatory variable is a categorical variable.

We do this using the class statement.

We type class, then the name of our categorical explanatory variable,

then in parentheses we type ref, which tells SAS to use the reference group

parameterization for comparing our groups, then an equal sign, and,

in quotes, the group that we want to designate as our reference group.

In this example, we want to compare the Hispanic group to the other three

ethnic racial groups, so this will be our reference group.

If you remember, our ethnicity race variable is coded 0 for

Hispanic, so put the value 0 in quotes after the equal sign.

If we did not specify the reference group parameterization and

the reference group of interest, SAS, by default, would use reference group

parameterization and would designate the last group as the reference group.

So the SAS default would be to compare the other three groups to the ethnicity race

group representing other ethnicity or racial group.

This is because it has the highest numerically coded value of the four

groups, with a value equal to 3.

So SAS, by default, will consider it the last group of the explanatory variable.
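
For contrast, here is a minimal sketch of what the default would look like, with the class statement written without a ref option. The data set name work.nesarc_sub and the variable names ndsymptoms, majordeplife, and numbercigsmoked_c are placeholders assumed for illustration; only ethrace is named in this lecture.

```sas
/* Sketch only: data set and variable names other than ethrace are assumed placeholders. */
proc glm data=work.nesarc_sub;
  /* No ref= option: as described above, the last level (ethrace = 3) serves as the reference */
  class ethrace;
  /* SOLUTION requests the table of parameter estimates */
  model ndsymptoms = majordeplife numbercigsmoked_c ethrace / solution;
run;
quit;
```

With this version, the coefficients for ethrace 0, 1, and 2 would each be read as a comparison to the group coded 3.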

If we use the default, it's important to know the default parameterization,

because it can be different for other SAS regression procedures and

will have an impact on how we interpret the group comparisons.

If we hadn't used a class statement,

SAS would have assumed that our ethnicity race variable is a quantitative variable,

so it would estimate a single slope for it, and that regression coefficient would make no sense.

Finally, we simply add our ethnicity race variable, which was named ethrace,

to the list of explanatory variables in the model statement.
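
Putting these steps together, the code might look like the sketch below. It assumes the analysis data set is already loaded under the placeholder name work.nesarc_sub, and the names ndsymptoms, majordeplife, and numbercigsmoked_c stand in for the response and the two earlier explanatory variables; only ethrace comes from this lecture.

```sas
/* Sketch only: data set and variable names other than ethrace are assumed placeholders. */
proc glm data=work.nesarc_sub;
  /* Declare ethrace as categorical; ref="0" makes Hispanic (coded 0) the reference group */
  class ethrace (ref="0");
  /* Add ethrace to the explanatory variables; SOLUTION prints the parameter estimates */
  model ndsymptoms = majordeplife numbercigsmoked_c ethrace / solution;
run;
quit;
```

The solution option is included because PROC GLM does not print the parameter estimates table by default when a class statement is present.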

Here's the output.

Basically, it is the same output that we saw earlier with the GLM procedure, but

if we look at our table of parameter estimates, we see that there

are three regression coefficients for our categorical ethnicity race variable.

Note first that our Hispanic reference group, coded 0,

has a regression coefficient of 0 and no estimate of the standard error or p-value.

This is because it is our reference group.

The other three regression coefficients compare our other

ethnicity race groups to the Hispanic group.

So, ethrace 1, the dummy variable with a value of 1 for

the non-Hispanic White group, compares non-Hispanic White to Hispanic,

ethrace 2 compares non-Hispanic Black to Hispanic,

and ethrace 3 compares the other non-Hispanic racial group to Hispanic.
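
One way to see why each coefficient is a comparison to the Hispanic group is to write the fitted model with dummy variables. The symbols below (b_0 through b_5, x_dep, x_cig, D_1 to D_3) are notation introduced here for illustration, not labels from the SAS output; the block assumes the amsmath package.

```latex
\[
\widehat{y} = b_0 + b_1\, x_{\text{dep}} + b_2\, x_{\text{cig}}
            + b_3 D_1 + b_4 D_2 + b_5 D_3,
\qquad
D_k =
\begin{cases}
1 & \text{if ethrace} = k \\
0 & \text{otherwise}
\end{cases}
\]
```

For a Hispanic individual all three dummy variables equal 0, so the ethnicity race terms drop out. For a non-Hispanic White individual only D_1 equals 1, so b_3 is the difference in predicted number of symptoms between non-Hispanic White and Hispanic at the same values of the other explanatory variables. The same reading applies to b_4 and b_5.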

We can see that none of these three groups were significantly different from

the Hispanic group in number of nicotine dependence symptoms,

because the p-values all exceed our alpha level of 0.05.

As with the previous regression analysis, we see that lifetime major depression and

number of cigarettes smoked are positively associated with

number of nicotine dependence symptoms.

If we wanted to make other comparisons, for example, to compare non-Hispanic White

to non-Hispanic Black, then we would simply change our reference group from 0

to 1, from Hispanic to non-Hispanic White in the SAS code, and rerun the analysis.

This would provide a comparison of the three other ethnicity racial groups

to the non-Hispanic White group.

Here's an example of the code in which we change the reference group from 0 to 1.

You can see that it's the same code with the exception

of changing ref="0" to ref="1".
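
As a sketch of that change, with the same assumed placeholder names for the data set and for every variable except ethrace:

```sas
/* Sketch only: data set and variable names other than ethrace are assumed placeholders. */
proc glm data=work.nesarc_sub;
  /* ref="1" makes non-Hispanic White (coded 1) the reference group */
  class ethrace (ref="1");
  model ndsymptoms = majordeplife numbercigsmoked_c ethrace / solution;
run;
quit;
```

In the resulting table, the coefficient for ethrace 2 would be the comparison of non-Hispanic Black to non-Hispanic White.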

And here's the output.