Chapter 19 main topics. Sociology 360 Statistics for Sociologists I Chapter 19 Two-Sample Problems. Chapter 19 homework assignment

January 1, 2018 | Author: Lynne Lillian Baldwin | Category: N/A



Sociology 360 Statistics for Sociologists I
Chapter 19 Two-Sample Problems

Chapter 19 main topics

• Two-sample t procedures
• Robustness of two-sample t procedures
• Details of the t approximation
• Avoid the pooled two-sample t procedures
• Avoid inference about standard deviations

Topic to omit: The F test for comparing two standard deviations

Chapter 19 homework assignment

Problems: 19.6, .7, .9, .14, .16, .30, .32

One- vs. two-sample t-tests

One-sample test: Is a population mean >, ...

Option 1: HA: µ1 > µ2 or Option 2: HA: µ1 − µ2 > 0

Let µbm = mean calories in Big Macs

In each case, Option 1 is equivalent to Option 2.

Write the null and alternative hypotheses using both methods (option 1 and option 2)

One-tailed (left) Option 1: HA: µ1 < µ2 or Option 2: HA: µ1 − µ2 < 0


Sampling distribution of the difference in means


Our interest centers on the difference between the two population means, µ1 − µ2. To emphasize that this difference is a single numerical value, I will write it within parentheses: (µ1 − µ2). We can estimate (µ1 − µ2) by its sample analog, (x̄1 − x̄2).

µG − µB = 0.4

Since (x̄1 − x̄2) is a number calculated only from sample information, it is a statistic. As a statistic, (x̄1 − x̄2) has a sampling distribution.

The sampling distribution of (x̄1 − x̄2) will be Normal under the right circumstances. And the mean of that sampling distribution will be (µ1 − µ2). All that remains to be discovered about the sampling distribution is its standard error (or estimated standard deviation).


Standard error

The two-sample t statistic follows approximately the t distribution with a standard error SE reflecting variation from both samples.

In fact, the standard error of (x̄1 − x̄2) is simply the square root of the sum of the squared standard errors of each sample considered separately:

\[ SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

[Figure: t curve, labeled with df and µ1 − µ2]

Degrees of freedom

Since we are using a standard error, estimated from the data, rather than a known standard deviation, the procedures will be t rather than z based.

That means we need to have a value for the degrees of freedom of the t distribution.

A conservative approach is to use the smaller of (n1 − 1) and (n2 − 1) as the degrees of freedom. This rule is conservative in that it may give fewer degrees of freedom, and hence a larger critical value t*, than is really appropriate, which leads to wider confidence intervals and larger P-values (meaning we are a bit less likely to reject H0).

You should use this rule for problems done by hand; for example, on the exam.
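The standard error and the conservative degrees-of-freedom rule can be sketched in a few lines of Python. This is only an illustration: the function names and the input values (s1 = 2.0, n1 = 25, s2 = 3.0, n2 = 36) are hypothetical and do not come from the course materials.

```python
import math

def two_sample_se(s1, n1, s2, n2):
    """Unpooled standard error of (x1bar - x2bar): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def conservative_df(n1, n2):
    """Conservative rule: the smaller of (n1 - 1) and (n2 - 1)."""
    return min(n1 - 1, n2 - 1)

# Hypothetical sample standard deviations and sample sizes, for illustration only.
se = two_sample_se(s1=2.0, n1=25, s2=3.0, n2=36)
df = conservative_df(25, 36)
print(f"SE = {se:.3f}, conservative df = {df}")
```

Note that each sample contributes its squared standard error, s²/n, so the less precise sample dominates the combined SE.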

Two-sample t-test

The null hypothesis is that both population means µ1 and µ2 are equal, thus their difference is equal to zero:

H0: (µ1 − µ2) = 0

with either a one-sided or a two-sided alternative hypothesis.

We construct a t statistic via the usual comparison of the observed statistic to the hypothesized value:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{SE} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

This statistic has an approximate t distribution if H0 is true.

Ideal number of children

Do men and women have different beliefs about the ideal number of children in a family?

The 2004 General Social Survey asked, “What do you think is the ideal number of children for a family to have?”

Here is a summary of the responses:

Gender    x̄      s      n
Male      2.58   0.89   374
Female    2.62   0.92   416
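As a rough check on hand calculations, the t statistic for the summary table above can be sketched in Python. The function name is hypothetical. The P-value here uses the standard Normal distribution in place of the t distribution, which is an approximation that is reasonable only because the conservative df (373) is large; with small samples a t table or statistical software would be needed.

```python
import math
from statistics import NormalDist

def two_sample_t(mean1, s1, n1, mean2, s2, n2, null_diff=0.0):
    """Return (t, SE) for H0: (mu1 - mu2) = null_diff, using the unpooled SE."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((mean1 - mean2) - null_diff) / se
    return t, se

# GSS 2004 summary statistics from the table: males first, females second.
t_stat, se = two_sample_t(2.58, 0.89, 374, 2.62, 0.92, 416)

# Conservative degrees of freedom: smaller of (n1 - 1) and (n2 - 1).
df = min(374 - 1, 416 - 1)

# Approximate two-sided P-value, Normal standing in for t with large df.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.2f}, SE = {se:.3f}, df = {df}, approx. P = {p_two_sided:.2f}")
```

A t statistic this close to zero gives a large two-sided P-value, so these data provide little evidence of a difference between men and women.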

Ideal number of children

Gender    x̄      s      n
Male      2.58   0.89   374
Female    2.62   0.92   416

• What are the null and alternative hypotheses?
• Choose an α level.
• Draw a picture of the sampling distribution and the p-value you are looking for.
• Perform the test and evaluate the result.

Note:

\[ \sqrt{\frac{.89^2}{374} + \frac{.92^2}{416}} = .064 \]

Confidence interval

As before, we often supplement a hypothesis test by a CI. (And sometimes we omit the test.) For two-sample problems, the goal is to estimate the difference between the two population means, (µ1 − µ2).

The statistic continues to be (x̄1 − x̄2), and the confidence interval is

\[ CI = (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

95% CI for the example

Gender    x̄      s      n
Male      2.58   0.89   374
Female    2.62   0.92   416

df = min(373, 415) = 373

For C = .95, t*373 ≈ z* = 1.96.

\[ CI = (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

Note:

\[ \sqrt{\frac{.89^2}{374} + \frac{.92^2}{416}} = .064 \]

Effects of Reading Program on Reading Comprehension

• New reading activities for elementary school children
• Randomly assign (RA) 3rd graders to treatment group and control group
• Compare reading comprehension
• Calculate a 95% CI for the effect of the new reading activities on reading comprehension

Note:

\[ \sqrt{\frac{11.01^2}{21} + \frac{17.15^2}{23}} = 4.31 \]
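The 95% CI for the ideal-number-of-children example can be sketched as follows. The function name is hypothetical, and t* = 1.96 is the large-sample value used in the notes. The same function would apply to the reading-program data once the two sample means (not reproduced in these notes) are supplied, with t* taken from a t table for df = min(21 − 1, 23 − 1) = 20.

```python
import math

def two_sample_ci(mean1, s1, n1, mean2, s2, n2, t_star):
    """CI for (mu1 - mu2): (x1bar - x2bar) +/- t* * sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    margin = t_star * se
    return diff - margin, diff + margin

# GSS 2004 summary statistics (males first), with t* = 1.96 as in the notes.
lower, upper = two_sample_ci(2.58, 0.89, 374, 2.62, 0.92, 416, t_star=1.96)
print(f"95% CI for (mu1 - mu2): ({lower:.3f}, {upper:.3f})")

# Standard error for the reading-program example, matching the note above.
se_reading = math.sqrt(11.01**2 / 21 + 17.15**2 / 23)
print(f"Reading-program SE = {se_reading:.2f}")
```

The interval for the gender example includes zero, which agrees with a two-sided test at α = .05 that fails to reject H0.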

Robustness of the two-sample t procedures

We must have an SRS or randomized comparative experiment.

t procedures are only exact if the population distribution is exactly normal.

But, we will consider two-sample t procedures “good enough” approximations in these cases:

1. When n1 + n2 < 15, the data from both samples must be close to normal (roughly symmetric, single peak) and without outliers.
2. When 15 ≤ n1 + n2 < 40, mild skewness is acceptable, but not outliers.
3. When n1 + n2 ≥ 40, the t statistic will be valid even with strong skewness.

Details of the t approximation

The actual distribution of the “two-sample t statistic” is not really t (!).

But it is a distribution that can be very closely approximated by a t distribution with this number of degrees of freedom:

\[ df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{1}{n_1 - 1} \left( \frac{s_1^2}{n_1} \right)^2 + \frac{1}{n_2 - 1} \left( \frac{s_2^2}{n_2} \right)^2} \]

This is known as the Satterthwaite approximation. The formula typically produces a non-integer degree of freedom value.

Computers routinely calculate this approximation. You should recognize it when you see it. But on exams, use the smaller of (n1 − 1) and (n2 − 1) instead.
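To see what software is doing, the Satterthwaite formula above can be sketched directly in Python. The function name is hypothetical; the inputs are the standard deviations and sample sizes from the two examples in these notes.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Approximate df for the two-sample t statistic (Satterthwaite formula)."""
    v1 = s1**2 / n1  # squared standard error of sample 1
    v2 = s2**2 / n2  # squared standard error of sample 2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Ideal-number-of-children example (males, females).
df_children = satterthwaite_df(0.89, 374, 0.92, 416)

# Reading-program example (group sizes 21 and 23).
df_reading = satterthwaite_df(11.01, 21, 17.15, 23)

print(f"children: {df_children:.1f} (conservative rule gives {min(373, 415)})")
print(f"reading:  {df_reading:.2f} (conservative rule gives {min(20, 22)})")
```

In both cases the approximation is a non-integer value that is larger than the conservative hand-calculation rule, which is why the conservative rule yields somewhat wider intervals.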

Avoid the pooled two-sample t procedures

Your textbook’s author, Moore, recommends completely avoiding the pooled two-sample t procedures, and I agree.

Pooled procedures are often the default choice in stat packages (e.g., Stata, including the current version, 10.0). The reasons that the pooled approach is often used are:

1) it was historically easier to calculate;
2) it leads to a smaller estimated standard error when the assumptions are met;
3) it amounts to a special case of a very important technique called the analysis of variance.

But Moore is right to emphasize:

1) the assumption of normality and equal variances can’t be tested effectively when the sample sizes are small (i.e., when the pooled procedure would be most advantageous);
2) the pooled procedure can lead to incorrect inferences when the assumptions aren’t met;
3) the reduction in SE’s is small for large n’s.

So you are asked to know not to accept a default assumption of equal (pooled) variances, and why not!

Avoid inference about standard deviations

In an extension of the ideas behind not using the pooled t procedures, Moore also warns us not to try to make inferences about standard deviations at all, at least in smaller samples, and at least without expert statistical help.

The problem is that it is hard to make a useful test of the hypothesis that the standard deviations in two populations are the same unless we are willing to assume the shapes of the two distributions are the same. (Things are even easier if we assume the shapes are normal.) But when the sample is small there is no easy way to tell if the shapes of two distributions are the same. So, says Moore, avoid testing of hypotheses that standard deviations are the same. My only reservation about this recommendation would be in cases where there are strong reasons to expect normality in both populations.
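To get a feel for how little the pooled and unpooled procedures differ in a large sample, the two standard errors can be compared for the ideal-number-of-children data. This is a sketch under assumptions: the pooled formula below is the standard textbook one (a weighted average of the two sample variances, assuming equal population variances), which these notes do not state, and the function names are hypothetical.

```python
import math

def unpooled_se(s1, n1, s2, n2):
    """SE used in these notes: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_se(s1, n1, s2, n2):
    """Standard textbook pooled SE; assumes equal population variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

# GSS 2004 summary statistics (males, females).
se_u = unpooled_se(0.89, 374, 0.92, 416)
se_p = pooled_se(0.89, 374, 0.92, 416)

print(f"unpooled SE = {se_u:.4f}, pooled SE = {se_p:.4f}")
print(f"pooled df = {374 + 416 - 2}, conservative df = {min(373, 415)}")
```

With samples this large the two standard errors are nearly identical, so nothing of practical value is gained by taking on the equal-variance assumption.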
