# Sampling

Sampling methods fall into two groups: probability sampling and non-probability sampling. In probability sampling, every member of a population has a known, nonzero chance of being chosen (in the simplest case, an equal chance); for instance, randomly querying people to gauge public opinion or randomly selecting patients from hospitals across the country to participate in a cancer study. In non-probability sampling, members of the population of interest do not all have a known chance of being chosen, and some may have no chance at all; for instance, calling up one's own relatives for their opinions or selecting study participants from the researcher's own clinic.

Non-probability sampling is a haphazard approach that introduces bias and, strictly speaking, prevents the sample statistics from being extrapolated to the population. However, financial or practical constraints often mean that a non-probability sample is the only available option. For instance, it may be impractical to scour every hospital to find patients with a particular type of cancer, or prohibitively expensive to go door-to-door to poll people who do not respond to telephone polls. In such cases, the non-probability sample is usually treated like a probability sample and extrapolated to the population.

## Simple random sampling

A simple random sample is drawn by choosing participants from a population purely at random, so that each member is equally likely to be selected. Dr. Martin Lee provides the example of an aspirin plant. To test for quality control, each bottle that is made is given a number, beginning at 0000 and continuing until 6000 is reached; at this point, the process starts over. A random number generator with a range of 0000 to 6000 is used to choose which bottles are drawn from the batch. The sampled bottles are then examined for quality control.

Simple random sampling works well for a homogeneous population. Sampling without replacement means that once a unit is drawn, it is removed from the pool and its number cannot be drawn again. In sampling with replacement, each drawn unit is returned to the pool (its number stays eligible), so the same unit may be selected more than once.
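The aspirin example can be sketched in a few lines of Python. This is a minimal illustration rather than Dr. Lee's actual procedure: the function name `simple_random_sample`, the sample size of 50, and the fixed seed are all assumptions made for the example, and the bottle numbers are taken to run from 0 to 6000 as in the text.

```python
import random


def simple_random_sample(population, k, replace=False, seed=None):
    """Draw k units from population, each unit equally likely (hypothetical helper)."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    if replace:
        # With replacement: a unit stays eligible after it is drawn,
        # so the same unit can appear more than once.
        return rng.choices(population, k=k)
    # Without replacement: a unit is removed from the pool once drawn,
    # so every unit in the result is distinct.
    return rng.sample(population, k)


# Assumed numbering from the example: bottles labelled 0000 through 6000.
bottle_ids = list(range(0, 6001))

without_replacement = simple_random_sample(bottle_ids, 50, seed=1)
with_replacement = simple_random_sample(bottle_ids, 50, replace=True, seed=1)

print(sorted(without_replacement)[:5])
print(len(set(without_replacement)), len(set(with_replacement)))
```

Without replacement the 50 drawn numbers are always distinct; with replacement, duplicates are possible, although with roughly 6000 bottles and only 50 draws they will be uncommon.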

## Stratified random sampling

Dr. Lee continues his example by posing the question: what if the population of aspirin bottles is not homogeneous? Consider that there are three shifts: the morning shift outputs the best aspirin, while the evening shift makes some mistakes and the overnight shift makes many mistakes. A small simple random sample may overlook such recognizable subdivisions (also known as *strata* or *subpopulations*), for example by drawing few or no bottles from one shift. These strata are internally homogeneous, but when compared to one another they are heterogeneous.

A stratified random sample draws separately from every stratum; in the simplest design, an equal number of units is taken from each one. The population is first subdivided into strata that are homogeneous within each stratum and heterogeneous between strata, and then simple random sampling is performed on each stratum. Stratified random sampling can be more useful than simple random sampling, for instance by allowing statisticians to subdivide populations by political party when studying presidential approval.
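The two steps described above (subdivide into strata, then draw a simple random sample within each) can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper `stratified_sample`, the way bottles are assigned to shifts, and the choice of 10 bottles per shift are hypothetical and are not part of Dr. Lee's example.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Equal-allocation stratified sample (hypothetical helper).

    units: the population
    stratum_of: function mapping a unit to its stratum label
    n_per_stratum: how many units to draw from each stratum
    """
    rng = random.Random(seed)

    # Step 1: subdivide the population into strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Step 2: simple random sample, without replacement, inside each stratum.
    # rng.sample raises ValueError if a stratum has fewer than n_per_stratum units.
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}


# Assumed data: 6000 bottles, each tagged with the shift that produced it.
# The round-robin assignment is made up purely for illustration.
SHIFTS = ("morning", "evening", "overnight")
bottles = [(bottle_id, SHIFTS[bottle_id % 3]) for bottle_id in range(6000)]

sample = stratified_sample(bottles, stratum_of=lambda b: b[1],
                           n_per_stratum=10, seed=0)

for shift, drawn in sample.items():
    print(shift, len(drawn))
```

Equal allocation guarantees that every shift is represented, which a small simple random sample cannot promise. Other allocations, such as drawing from each stratum in proportion to its size, follow the same pattern with a per-stratum sample size.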