A linear combination of two normally distributed variables is not necessarily normally distributed unless the variables are jointly normally distributed; independence of the two normal variables is one sufficient condition for joint normality.
Both test the association between two categorical variables. The difference is that the Chi-square test requires the expected cell counts in the crosstabulation of the two categorical variables to be at least 5. When this assumption fails, Fisher's exact test is recommended.
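As an illustrative sketch outside the SAS/SPSS workflow described here, the two tests can be run side by side in Python with scipy; the 2x2 table below is hypothetical and chosen so that some expected counts fall below 5.

```python
# Illustrative only: the same 2x2 crosstabulation analysed with both tests.
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table with small cell counts.
table = [[3, 9],
         [7, 2]]

chi2, chi2_p, dof, expected = chi2_contingency(table)
odds_ratio, fisher_p = fisher_exact(table)

print(expected)   # some expected counts are below 5...
print(fisher_p)   # ...so Fisher's exact p-value is the one to report
```

Because the Chi-square assumption fails here, the Fisher p-value is preferred over the Chi-square p-value.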
We begin by assuming there is no association between the two (categorical) variables. In technical terms this is called the null hypothesis. The alternative hypothesis states that the two variables are associated in some way.
The p-value of a Chi-square test or Fisher's exact test is the probability, assuming no association, of obtaining results at least as extreme as those observed. If our assumption of no association is correct, a p-value of 0.01 says the chance of results this extreme is very small. In that case we have evidence that the assumption of no association is wrong, so it is reasonable to claim there is an association between the two variables.
What we usually do is compare the p-value with some pre-specified (a-priori) value which we call α (the significance level). If p is less than α, we reject the null hypothesis and accept the alternative hypothesis.
Of course we could be wrong! It is possible to get very extreme results by chance even though the two variables are not associated at all. This is called a Type I error. The probability of making a Type I error is at most α. Loosely speaking, the smaller the value of p, the stronger the evidence for claiming a significant association.
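The Type I error rate can be seen in a quick simulation (Python/numpy, illustrative only and not part of the SAS/SPSS material here): when two binary variables are generated independently, a Chi-square test at α = 0.05 rejects the true null hypothesis in roughly 5% of samples.

```python
# Simulate many datasets with NO true association and count false rejections.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
alpha, rejections, n_sims = 0.05, 0, 1000

for _ in range(n_sims):
    # Two independent binary variables, 200 observations.
    a = rng.integers(0, 2, 200)
    b = rng.integers(0, 2, 200)
    table = np.array([[np.sum((a == i) & (b == j)) for j in (0, 1)]
                      for i in (0, 1)])
    # correction=False: the uncorrected statistic keeps the rate near alpha.
    _, p, _, _ = chi2_contingency(table, correction=False)
    if p < alpha:
        rejections += 1

print(rejections / n_sims)  # typically close to 0.05
```

Every rejection in this simulation is a Type I error, because the variables are independent by construction.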
Commonly used significance levels are 0.01, 0.05, and 0.10.
In the following example residuals (variable RESID) from a regression model are output to the dataset RESDAT via the OUTPUT statement in the GLM procedure.
To test for normality, use the NORMAL option in the UNIVARIATE procedure. Note that the residuals are read from the RESDAT dataset via the DATA= option in this procedure.
proc glm;
   model Y = X;
   output out=RESDAT r=RESID;
run;

proc univariate data=RESDAT normal;
   var RESID;
run;
If the p-value for the normality test is greater than 0.05 (your a-priori significance level), you may consider the residuals to be normally distributed.
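The same idea can be sketched in Python as an illustrative cross-check (the data and model below are simulated, not from the SAS example): fit a regression of Y on X, extract the residuals, and apply a normality test such as Shapiro-Wilk, which PROC UNIVARIATE also reports.

```python
# Fit Y on X, then test the residuals for normality.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)   # simulated data with normal errors

# Least-squares line and its residuals.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

stat, p = shapiro(resid)
print(p)  # p > 0.05 would be consistent with normally distributed residuals
```

With an intercept in the model, the residuals sum to zero by construction; the normality test asks about their shape, not their mean.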
Levene's test is widely considered the standard homogeneity-of-variance test. Use the GLM procedure and specify the HOVTEST option in the MEANS statement. In this example we test whether the variances of A are equal across the levels of GROUP.
proc glm;
   class GROUP;
   model A = GROUP;
   means GROUP / hovtest;
run;
If p > 0.05 then equal variances may be assumed. Note that the GLM procedure allows homogeneity of variance testing for simple one-way models only. Homogeneity of variance testing for more complex models is a subject of current research.
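As an illustrative cross-check outside SAS, scipy implements the same test (the data below are simulated; center='mean' gives the classic Levene statistic, while scipy's default center='median' is the Brown-Forsythe variant).

```python
# Levene's test: are the variances of A equal across three levels of GROUP?
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(2)
group1 = rng.normal(0, 1.0, 40)   # hypothetical measurements of A
group2 = rng.normal(0, 1.1, 40)
group3 = rng.normal(0, 0.9, 40)

stat, p = levene(group1, group2, group3, center='mean')
print(p)  # p > 0.05 suggests equal variances may be assumed
```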
Edit > Options > Viewer > Display commands in the log
Note that Yates’ corrected chi-square (continuity correction) may only be calculated for tables with two rows and two columns.
Analyze > Descriptive Statistics > Crosstabs > Statistics > Chi-square
In the output viewer Yates’ corrected chi-square is found in the Chi-square Tests table on the line labeled Continuity Correction.
In the output viewer, double-click the scatter plot to bring it into the chart editor. Choose Options from the Chart menu. Click the box beside Total if it is not already checked. Click the Fit Options button and choose Linear regression as the Fit Method.
Create two syntax documents and save them in the same location. One document contains a macro that calculates the canonical correlations. The other document runs this macro via the INCLUDE command.
Suppose you have two sets of variables. One set comprises variables A and B. The other set contains variables C and D.
Run the following syntax
include 'CanCorrRoutine.sps'.
CANCORR set1=A B /set2=C D.
which includes the syntax document CanCorrRoutine.sps defining the macro CANCORR.
Analyze > Scale > Reliability Analysis
and select Alpha as the model.
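Cronbach's alpha is simple enough to compute directly; the following Python sketch (illustrative, not SPSS output) applies the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the total score), where k is the number of items.

```python
# Cronbach's alpha from an items matrix: rows = respondents, columns = items.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Sanity check: three perfectly correlated items give alpha = 1.
scores = np.array([[1, 1, 1],
                   [2, 2, 2],
                   [3, 3, 3],
                   [4, 4, 4]], dtype=float)
print(cronbach_alpha(scores))  # -> 1.0
```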