Lesson 5: Confidence Intervals

So far, we learned how to collect and summarize data (Lesson 1). Then we learned how to quantify the likelihood of events using probability (Lesson 2). Next, we learned how to model these events as random variables (Lesson 3). In the previous Lesson, we learned how to find the sampling distributions of sample statistics (Lesson 4).

In Lesson 4, the sampling distributions for the sample statistics assumed we knew the population parameters (fantasy land). In real life, we do not know these parameters (or we would not need statistics!). In this lesson, we switch from "fantasy land" to real life. We know what to do when the parameters are known; now let's see how we can use that information when they are unknown.

Objectives

Upon successful completion of this lesson, you should be able to:

Describe the role of statistical inference in estimation in terms of the population and sample.
Explain the general form of a confidence interval and apply it to different statistics and conditions.
Construct a confidence interval to estimate a population mean or proportion.
Given a confidence interval, interpret the meaning in terms of the population.
Identify when to use the t-distribution as opposed to the normal distribution given the sample size and population distribution.
Define and interpret the margin of error.
Given the population standard deviation and a confidence level, calculate the required sample size needed to obtain the desired margin of error.

5.1 - Introduction to Inferences

The real power of statistics comes from applying the concepts of probability to situations where you have data but not necessarily the whole population. The results, called statistical inference, give you probability statements about the population of interest based on that set of data.

Types of Statistical Inference

There are two types of statistical inferences: Estimation and Statistical Tests.

Estimation: Use information from the sample to estimate (or predict) the parameter of interest.

For instance, using the result of a poll about the president's current approval rating to estimate (or predict) his or her true current approval rating nationwide.

Statistical Tests: Use information from the sample to determine whether a certain statement about the parameter of interest is true. Statistical tests are also referred to as hypothesis tests.

For instance, suppose a news station claims that the President’s current approval rating is more than 75%. We want to determine whether that statement is supported by the poll data.

5.2 - Estimation and Confidence Intervals

Estimation

Two common estimation methods are point and interval estimates.

Point Estimate: An estimate for a parameter that is one numerical value. Examples of point estimates are the sample mean and the sample proportion.

Interval estimates give an interval as the estimate for a parameter. This is a new concept and the focus of this lesson. Such intervals are built around point estimates, which is why understanding point estimates is important to understanding interval estimates.

In this course, the interval estimates we find are referred to as confidence intervals.

Confidence Interval: An interval of values computed from sample data that is likely to cover the true parameter of interest.

There are many estimators for population parameters. For example, if we want to know the "center" of a distribution, why use the mean? Could we use the median? How about using the middle value, i.e. (max+min)/2? We choose particular estimators for various reasons with information based on their sampling distributions. Here are some properties of "good" estimators.

Properties of 'Good' Estimators

In determining what makes a good estimator, there are two key features:

The center of the sampling distribution for the estimate is the same as that of the population. When this property is true, the estimate is said to be unbiased. The most often-used measure of the center is the mean.
The estimate has the smallest standard error when compared to other estimators. For example, in the normal distribution, the mean and median are essentially the same. However, the standard error of the median is about 1.25 times the standard error of the mean. We know the standard error of the mean is \(\frac{\sigma}{\sqrt{n}}\). Therefore, in a normal distribution, the SE(median) is about 1.25 times \(\frac{\sigma}{\sqrt{n}}\). This is why the mean is a better estimator than the median when the data are normal (or approximately normal).
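The 1.25 ratio can be checked with a small simulation. This is a minimal sketch, assuming a standard normal population, a sample size of 25, and 4000 repeated samples (all three are illustrative choices, not values from the lesson); the function name `simulated_standard_errors` is hypothetical.

```python
import random
import statistics

def simulated_standard_errors(n=25, reps=4000, seed=1):
    """Approximate SE(mean) and SE(median) by repeated sampling.

    Assumption: the population is standard normal (mean 0, sigma 1).
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    # The standard deviation of a statistic across repeated samples
    # approximates its standard error.
    return statistics.stdev(means), statistics.stdev(medians)

se_mean, se_median = simulated_standard_errors()
ratio = se_median / se_mean
print(f"SE(mean)   ~ {se_mean:.3f} (theory: sigma/sqrt(n) = {1 / 25 ** 0.5:.3f})")
print(f"SE(median) ~ {se_median:.3f}")
print(f"ratio      ~ {ratio:.2f}")
```

The simulated ratio should land near 1.25, although it varies with the seed and with the sample size.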

Note!

We should stop here and explain why we use the estimated standard error and not the standard error itself when constructing a confidence interval. The answer is that, typically, the population values are not known. Take, for example, the standard error of the sample proportion. It is \(\sqrt{\frac{p(1-p)}{n}}\).

If the goal is to estimate \(p\) and \(p\) is unknown, we would also then have to estimate the standard error. In this case, the estimated standard error is \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).

When estimating the population mean, the population standard deviation, \(\sigma\), may also be unknown. When it is unknown, we can estimate it with the sample standard deviation, \(s\). Then the estimated standard error of the sample mean is \(\frac{s}{\sqrt{n}}\).
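The two estimated standard errors in this note can be sketched in code as follows. The function names and the example numbers (a sample proportion of 0.44 from 800 respondents, and a five-value sample) are hypothetical and chosen only for illustration.

```python
import math
import statistics

def estimated_se_proportion(p_hat, n):
    """Estimated standard error of a sample proportion: sqrt(p_hat(1 - p_hat)/n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def estimated_se_mean(sample):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(sample))

# Hypothetical inputs, not taken from the lesson.
se_p = estimated_se_proportion(0.44, 800)
se_m = estimated_se_mean([4, 8, 6, 5, 7])
print(f"estimated SE of the proportion: {se_p:.4f}")
print(f"estimated SE of the mean:       {se_m:.4f}")
```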

General Format of a Confidence Interval

In putting the two properties above together, the center of our interval should be the point estimate for the parameter of interest. With the estimated standard error of the point estimate, we can include a measure of confidence to our estimate by forming a margin of error.

You may have seen this whenever you have heard or read a sample survey result (e.g. a survey of the current approval rating of the President, or of the attitude citizens have toward some new policy). In such surveys, you may hear a reference to "44% of those surveyed approved of the President's reaction" (this is the sample proportion), and "the survey had a 3.5% margin of error, or ± 3.5%." This latter number is the margin of error.

With the point estimate and the margin of error, we have an interval within which the group conducting the survey is confident the parameter value falls (i.e. the proportion of U.S. citizens who approve of the President's reaction). In this example, that interval would be from 40.5% to 47.5%.

This example provides the general construction of a confidence interval:

General form of a confidence interval \(sample\ statistic \pm margin\ of\ error\)

The margin of error will consist of two pieces. One is the standard error of the sample statistic. The other is some multiplier, \(M\), of this standard error, based on how confident we want to be in our estimate. This multiplier will come from the same distribution as the sampling distribution of the point estimate; for example, as we will see with the sample proportion this multiplier will come from the standard normal distribution. The general form of the margin of error is shown below.

General form of the margin of error \(\text{Margin of Error}=M\times \hat{SE}(\text{estimate})\)

*the multiplier, \(M\), depends on our level of confidence
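The general form can be written as two small helper functions. This is a sketch only: the helper names are hypothetical, the 0.44 and 0.035 values come from the survey example above, and the multiplier of 2 with a standard error of 0.0175 is an assumed pair that happens to reproduce the 3.5% margin.

```python
def margin_of_error(multiplier, standard_error):
    """Margin of error = M x estimated standard error of the estimate."""
    return multiplier * standard_error

def confidence_interval(statistic, margin):
    """Confidence interval = sample statistic +/- margin of error."""
    return statistic - margin, statistic + margin

# Survey example: sample proportion 44%, margin of error 3.5%.
lower, upper = confidence_interval(0.44, 0.035)
print(f"{lower:.1%} to {upper:.1%}")  # prints "40.5% to 47.5%"

# Assumed multiplier and standard error that give the same margin.
margin = margin_of_error(2, 0.0175)
```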

Interpretation of a Confidence Interval

The interpretation of a confidence interval has the basic template of: "We are 'some level of percent confident' that the 'population parameter of interest' is from 'lower bound to upper bound'." The phrases in single quotes are replaced with the specific language of the problem. We will discuss more about the interpretation of a confidence interval after we provide a few more examples.

Note!

Some might say, "Why not just be 100% confident?", but that does not make practical sense. For instance, what value comes from me saying I am 100% confident that the approval rating for the President is from 0% to 100%? That is the only interval for which one can be truly certain it will capture the actual proportion. Similarly, if you were to ask your professor what they think your score will be on an exam and they reply, "zero to one hundred", what would you think of that answer?

However, one does want to be as confident as reasonably possible. Most confidence levels used in practice range from 90% to 99%, with 95% being the most widely used. In fact, when you read a report that includes a margin of error, you can usually assume it has 95% confidence attached to it unless otherwise stated.

Moving forward.

We're going to begin by exploring confidence intervals for a single population proportion. The important issue of determining the required sample size to estimate a population proportion will also be discussed in detail in this lesson.

5.3 - Inference for the Population Proportion

Earlier in the lesson, we talked about two types of estimation: point and interval. Let's now apply them to estimate a population proportion from sample data.

Point Estimate for the Population Proportion

The point estimate of the population proportion, \(p\), is:

Point Estimate of the Population Proportion \(\hat{p}=\dfrac{\text{number of successes in the sample}}{n}\)

From our previous lesson on sampling distributions, we know the sampling distribution of the sample proportion under certain conditions. We can use this information to construct a confidence interval for the population proportion.

Confidence Interval for the Population Proportion

If \(np\) and \(n(1-p)\) are greater than five, then \(\hat{p}\) is approximately normal with mean \(p\) and standard error \(\sqrt{\frac{p(1-p)}{n}}\).

Under these conditions, the sampling distribution of the sample proportion, \(\hat{p}\), is approximately Normal. The multiplier used in the confidence interval will come from the Standard Normal distribution.

5.3.1 - Construct and Interpret the CI

Constructing a Confidence Interval for the Population Proportion

To construct a confidence interval we're going to use the following 3 steps:

CHECK CONDITIONS Check all conditions before using the sampling distribution of the sample proportion. We previously used \(np\) and \(n(1-p)\). But \(p\) is not known. Therefore, for the confidence interval, we will use
- \(n\hat{p}>5\) and
- \(n(1-\hat{p})>5\)
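The condition check can be sketched as a short function. Because \(n\hat{p}\) is the number of successes in the sample and \(n(1-\hat{p})\) is the number of failures, the sketch compares those counts directly; the function name and the example counts are hypothetical.

```python
def conditions_met(successes, n, threshold=5):
    """Check n*p_hat > 5 and n*(1 - p_hat) > 5 for a sample proportion.

    n*p_hat equals the number of successes and n*(1 - p_hat) equals the
    number of failures, so the counts are compared directly.
    """
    failures = n - successes
    return successes > threshold and failures > threshold

# Hypothetical samples of size 100.
print(conditions_met(44, 100))  # 44 successes, 56 failures
print(conditions_met(3, 100))   # too few successes
print(conditions_met(98, 100))  # too few failures
```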

What can one do if the conditions are NOT satisfied?

For a confidence interval for a proportion, there are techniques called exact methods. These methods can be used if your software offers them. They are more complicated and are based on the relationship between the binomial distribution and another distribution we will learn about later, called the F-distribution. The Z-method is much simpler and fairly easy to compute. In fact, if you ever come across a published random survey (e.g. a Gallup poll), you can use the methods in this lesson to construct a reliable confidence interval for the proportion rather quickly.

\(\boldsymbol{\left(1-\alpha \right) 100\%}\) confidence interval for the population proportion, \(\boldsymbol{p}\)

\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\) where \(z_{\alpha/2}\) represents a z-value with \(\alpha/2\) area to the right of it.
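A minimal sketch of this interval in Python, using the standard library's `statistics.NormalDist` for the z-value. The function name `proportion_ci` and the example (352 successes out of 800, a sample proportion of 0.44) are hypothetical; the sketch assumes the conditions on \(n\hat{p}\) and \(n(1-\hat{p})\) have already been checked.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """(1 - alpha)100% z-interval for a population proportion.

    Assumes n*p_hat > 5 and n*(1 - p_hat) > 5 have been verified.
    """
    p_hat = successes / n
    alpha = 1 - confidence
    # z-value with area alpha/2 to its right under the standard normal curve
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 352 of 800 respondents approve.
lower, upper = proportion_ci(352, 800)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

With these assumed counts the interval is roughly 0.406 to 0.474, centered at the sample proportion of 0.44.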

The \(\pm\) in the formula above means "plus or minus". It is a shorthand way of writing \(\left(\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)\)
It is centered at the point estimate, \(\hat{p}\).
The width of the interval is determined by the margin of error.
You must determine the multiplier.

Think about it! What terms in the margin of error would change the width of the confidence interval? Do the changes make it narrower or wider?

Derivation of the Confidence Interval

To calculate the confidence interval, we need to know how to find the z-multiplier. So where does this \(z_{\alpha/2}\) come from?
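As a numerical companion to this question, the multiplier for a given confidence level can be looked up with the inverse cumulative distribution function of the standard normal distribution. This is a sketch using `statistics.NormalDist` from the Python standard library; the function name `z_multiplier` is hypothetical.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """z-value with area alpha/2 to its right, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_multiplier(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

A higher confidence level gives a larger multiplier, and therefore a wider interval.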

The confidence interval can be derived from the following fact:
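One standard way to write such a fact is sketched here, assuming the normal approximation for \(\hat{p}\) holds; this is the usual textbook statement and may differ in notation from the form the lesson intends.

```latex
\begin{align*}
P\left(-z_{\alpha/2} \le \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}} \le z_{\alpha/2}\right) &\approx 1-\alpha \\
P\left(\hat{p}-z_{\alpha/2}\sqrt{\tfrac{p(1-p)}{n}} \le p \le \hat{p}+z_{\alpha/2}\sqrt{\tfrac{p(1-p)}{n}}\right) &\approx 1-\alpha
\end{align*}
```

The second line follows from the first by multiplying through by the standard error and isolating \(p\). Because \(p\) is unknown, the standard error is then replaced by its estimate, \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\), which gives the interval \(\hat{p}\pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).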

The figure shows the general confidence interval on the normal curve.