Introduction to Statistics

Statistics is a mathematical science including methods of collecting, organizing and analyzing data in such a way that meaningful conclusions can be drawn from them. In general, its investigations and analyses fall into two broad categories called descriptive and inferential statistics.

Descriptive statistics deals with the processing of data without attempting to draw any inferences from it. The data are presented in the form of tables and graphs. The characteristics of the data are described in simple terms. Events that are dealt with include everyday happenings such as accidents, prices of goods, business, incomes, epidemics, sports data, population data.

Inferential statistics is a scientific discipline that uses mathematical tools to make forecasts and projections by analyzing the given data. This is of use to people employed in such fields as engineering, economics, biology, the social sciences, business, agriculture and communications.

Introduction to Population and Sample

A population often consists of a large group of specifically defined elements. For example, the population of a specific country means all the people living within the boundaries of that country.

Usually, it is not possible or practical to measure data for every element of the population under study. We randomly select a small group of elements from the population and call it a sample. Inferences about the population are then made on the basis of several samples.

Example 1: A company is thinking about buying 50,000 electric batteries from a manufacturer. It will buy the batteries if no more than 1% of the batteries are defective. It is not practical to test each battery in the population of 50,000 batteries, since testing takes time and costs money. Instead, the company will select a few samples of 500 batteries each and test them for defects. The results of these tests will then be used to estimate the percentage of defective batteries in the population.
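The battery example can be sketched as a small simulation. This is only an illustration under stated assumptions: the true number of defective batteries (400, i.e. 0.8%), the number of samples (5), and the rule of comparing the averaged estimate with 1% are all hypothetical choices, not part of the original example, and a real acceptance test would also account for sampling error.

```python
import random

def estimate_defect_rate(population, sample_size, n_samples, rng):
    """Return one defect-rate estimate per simple random sample."""
    estimates = []
    for _ in range(n_samples):
        # rng.sample draws without replacement, so no battery is tested twice
        sample = rng.sample(population, sample_size)
        estimates.append(sum(sample) / sample_size)
    return estimates

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 50,000 batteries, 400 of them defective (1 = defective).
population = [1] * 400 + [0] * 49_600
rng.shuffle(population)

estimates = estimate_defect_rate(population, sample_size=500, n_samples=5, rng=rng)
pooled = sum(estimates) / len(estimates)  # average of the per-sample estimates

print("per-sample defect rates:", estimates)
print("pooled estimate:", pooled)
print("meets the 1% criterion:", pooled <= 0.01)
```

Each run with a different seed gives slightly different estimates, which is why several samples are taken rather than one.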

Quantitative and Qualitative Data

Data is quantitative if the observations or measurements made on a given variable of a sample or population have numerical values.

Example: height, weight, number of children, blood pressure, current, voltage.

Data is qualitative if words, groups and categories represent the observations or measurements.

Example: colors, yes-no answers, blood group.

Quantitative data is discrete if its values are separate, countable values (typically counts), and it is continuous if its values can fall anywhere within an interval (typically measurements).

Example of discrete data: number of children, number of cars.

Example of continuous data: speed, distance, time, pressure.
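As a rough illustration of these categories, the sketch below guesses the kind of data from how the values are stored. The function name `classify` and the type-based rule are assumptions made for this example only: the storage type is just a proxy, and a continuous measurement that happens to be recorded in whole numbers would be labelled discrete.

```python
def classify(values):
    """Heuristic: infer the kind of data from the Python type of each value."""
    # Words, categories and yes/no answers are treated as qualitative.
    if all(isinstance(v, (str, bool)) for v in values):
        return "qualitative"
    # Whole-number counts are treated as discrete quantitative data.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative, discrete"
    # Any remaining purely numeric data is treated as continuous.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative, continuous"
    return "mixed"

blood_group = ["A", "O", "AB", "B"]       # categories
number_of_children = [0, 2, 3, 1]         # counts
speed_kmh = [61.5, 72.0, 58.3]            # measurements

for name, data in [("blood group", blood_group),
                   ("number of children", number_of_children),
                   ("speed", speed_kmh)]:
    print(name, "->", classify(data))
```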

Population: the entire collection of events that you are interested in.

Although we wish to make claims about the entire population, it is often too large to deal with.

Two ways of getting around this …

Random Sampling

Choose a subset of the population, ensuring that each member of the population has an equal chance of being sampled

Examine that sample and use your observations to draw inferences about the population

Example: Voting Polls, Television Ratings

Note, however, that the inferences drawn are only as good as the randomness of the sample

If the sample is not random, it may not be representative of the population. When a sample is not representative of its parent population, the external validity of any inferences is called into question.

Example: Most psychology experiments
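The "equal chance" requirement can be checked empirically. The sketch below is a minimal illustration, assuming a hypothetical population of 20 labelled members and samples of size 5: it draws many simple random samples and records how often each member is chosen. With truly random sampling every member should appear in about 5/20 = 25% of the samples.

```python
import random
from collections import Counter

rng = random.Random(42)          # fixed seed so the run is repeatable
population = list(range(20))     # hypothetical population: members labelled 0..19
sample_size = 5
draws = 10_000                   # number of repeated samples

counts = Counter()
for _ in range(draws):
    # Simple random sample without replacement
    counts.update(rng.sample(population, sample_size))

expected = sample_size / len(population)
observed = {member: counts[member] / draws for member in population}

print("expected inclusion rate:", expected)
print("lowest observed rate:", min(observed.values()))
print("highest observed rate:", max(observed.values()))
```

A sampling scheme that favoured some members (for example, only those who volunteer) would show inclusion rates far from the expected value, which is the situation that threatens external validity.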

Random Assignment

When studying the effects of some treatment variable, it is also important to randomly assign subjects to treatments

Random assignment reduces the likelihood that groups differ in some critical way other than the treatment

If random assignment is not used then the internal validity of the experimental results may be compromised

Example: Textbook manipulation across years
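Random assignment can be sketched as shuffling the subjects and then dealing them out to the treatment groups in turn. The helper name `randomly_assign`, the subject labels, and the group names below are hypothetical and chosen only for illustration.

```python
import random

def randomly_assign(subjects, groups, rng):
    """Shuffle the subjects, then deal them round-robin into the groups."""
    shuffled = list(subjects)    # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

rng = random.Random(7)           # fixed seed so the run is repeatable
subjects = [f"subject_{i}" for i in range(1, 13)]
groups = ["treatment", "control"]

assignment = randomly_assign(subjects, groups, rng)
for group, members in assignment.items():
    print(group, members)
```

Because chance alone decides who lands in which group, systematic differences between the groups other than the treatment become unlikely, although they are never impossible in a single small study.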

Variables

Assume we have a random sample of subjects that we have randomly assigned to treatment groups

Example: Stop-smoking study

Now we must select the variables we wish to study, with the term variable referring to a property of an object or event that can take on different values

Examples: # of cigs smoked, abstinence after one week

Note the distinction: # of cigarettes smoked is a quantitative variable (a count, often treated as continuous in analysis), whereas abstinence is a categorical variable

Another distinction related to variables concerns variables we measure (dependent variables) versus variables we manipulate (independent variables)

For example: Whether or not we give a subject the stop-smoking treatment would be the independent variable, and the # of cigarettes smoked would be a dependent variable

What Do We Do With The Data?

Descriptive Statistics are used to describe the data set

Examples: graphing, calculating averages, looking for extreme scores
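A minimal sketch of descriptive statistics using Python's standard `statistics` module. The data are made-up values for cigarettes smoked per day, and the rule for flagging an extreme score (more than two sample standard deviations from the mean) is one common convention assumed here, not the only possible one.

```python
import statistics

# Hypothetical data: cigarettes smoked per day by ten subjects
cigs_per_day = [12, 15, 9, 20, 14, 11, 40, 13, 10, 16]

mean = statistics.mean(cigs_per_day)
median = statistics.median(cigs_per_day)
stdev = statistics.stdev(cigs_per_day)   # sample standard deviation

# Flag scores lying more than two standard deviations from the mean
extremes = [x for x in cigs_per_day if abs(x - mean) > 2 * stdev]

print("mean:", mean)
print("median:", median)
print("standard deviation:", round(stdev, 2))
print("extreme scores:", extremes)
```

Here the single score of 40 pulls the mean above the median, which is the kind of pattern that looking for extreme scores is meant to reveal.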

Inferential Statistics allow you to infer something about the parameters of the population based on the statistics of the sample, and various tests we perform on the sample

Examples: Chi-Square, t-tests, Correlations, ANOVA
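As one illustration of an inferential test, the sketch below computes a two-sample t statistic (the Welch form, which does not assume equal variances) for the stop-smoking example. The group is the independent variable and the number of cigarettes smoked is the dependent variable. The data and the function name `welch_t` are hypothetical. Only the test statistic is computed: turning it into a p-value requires the t distribution, which the standard library does not provide (a statistics package such as SciPy would normally be used for that step).

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical data: cigarettes smoked per day after one week
treatment = [8, 10, 7, 12, 9, 6]       # received the stop-smoking treatment
control = [15, 14, 18, 12, 16, 13]     # did not receive it

t = welch_t(treatment, control)
print("t statistic:", round(t, 2))
```

A t statistic far from zero suggests that the difference between the sample means is large relative to the variability within the groups, which is the basis for inferring a difference in the population.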