Introductory course on basic descriptive statistics

Introduction

In this course we will look at how you can represent your data with a histogram, what the average means, and what its limits are.

Histograms - a definition

In statistics, a histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. A histogram consists of tabular frequencies, shown as adjacent rectangles erected over discrete intervals (bins). The area of each rectangle is equal to the frequency of the observations in the interval, so its height is equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is therefore equal to the number of observations.
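The relationship between frequency, bin width and rectangle height can be checked with a short Python sketch. The function name `histogram`, the equal-width bins and the sample observations are assumptions made for illustration; they are not part of the original page. The sketch also assumes every observation lies between `low` and `high`.

```python
def histogram(data, low, high, bins):
    """Return (edges, counts, densities) for equal-width bins on [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in data:
        # The top edge belongs to the last bin so that x == high is counted.
        index = min(int((x - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    # Height of each rectangle = frequency density = frequency / bin width.
    densities = [c / width for c in counts]
    return edges, counts, densities


# Made-up observations, for illustration only.
observations = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]
edges, counts, densities = histogram(observations, low=0, high=10, bins=5)

# Area of each rectangle = height * width = frequency of that bin,
# so the areas add up to the number of observations.
total_area = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(densities))
print(counts)      # frequencies per bin
print(total_area)  # same as len(observations)
```

With equal-width bins the heights are simply the frequencies scaled by a constant, which is why many histograms plot the raw counts directly.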

The average

In mathematics, an average, or measure of central tendency, of a data set is a measure of the "middle" value of the data set. In the most common case, the data set is a list of numbers. The average of a list of numbers is a single number intended to typify the numbers in the list. If all the numbers in the list are the same, then that number is the average. If the numbers are not the same, the average is calculated by combining the numbers from the list in a specific way to produce a single number.

The best-known kind of average is the arithmetic mean. It is calculated by adding all the values together and dividing the result by the number of observations.

Example: Tim, Bill and Ronald have $2, $8 and $6 respectively. The average "wealth" is calculated by adding 2, 8 and 6 together (= 16) and then dividing this result by the number of values you added together (in this case, 3). The result is 16/3 ≈ 5.33.
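The same calculation can be written as a minimal Python sketch. The function name `arithmetic_mean` and the dictionary `wealth` are illustrative choices, not something defined on this page.

```python
def arithmetic_mean(values):
    """Add all the values and divide by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


# Values taken from the example: Tim, Bill and Ronald.
wealth = {"Tim": 2, "Bill": 8, "Ronald": 6}
average = arithmetic_mean(list(wealth.values()))
print(round(average, 2))  # 5.33

# If all the numbers are the same, the mean is that number.
print(arithmetic_mean([4, 4, 4]))  # 4.0
```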

Now explore these concepts by yourself

At the very end of this page you can generate a histogram by defining how many observations you made and then entering the value of each observation. You can also define the minimum and maximum theoretical values obtainable in your observations. The average is also displayed, in red.
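If the interactive tool is not available, a rough text-only stand-in can be sketched in Python. This is not the page's own tool: the function name `text_chart`, the one-bar-per-observation layout, the `width` parameter and the bounds of 0 and 10 are all assumptions. The values are the ones from the Tim, Bill and Ronald example.

```python
def text_chart(values, minimum, maximum, width=20):
    """Return one text bar per observation, plus a line marking the average."""
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    # Number of characters per unit between the theoretical bounds.
    scale = width / (maximum - minimum)
    lines = []
    for i, v in enumerate(values, start=1):
        bar = "#" * round((v - minimum) * scale)
        lines.append(f"{i:>3} | {bar} {v}")
    average = sum(values) / len(values)
    # The caret sits at the position of the average on the same scale.
    marker = " " * round((average - minimum) * scale) + "^"
    lines.append(f"avg | {marker} {average:.2f}")
    return lines


# Assumed theoretical bounds of 0 and 10, chosen for illustration only.
chart = text_chart([2, 8, 6], minimum=0, maximum=10)
for line in chart:
    print(line)
```

Each bar is drawn to scale between the minimum and maximum, and the caret on the last line marks where the average falls relative to the bars.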

Try with this data:

This data is supposed to represent how many sandwiches were sold in a store during the day. Each value is the number of sandwiches sold per hour.

Input this data:

Food for thought: Looking at the histogram, can you see a pattern? Do you think staying open late is financially viable?

Comments about the average: Do you think the average is a good measure? Do you think that it tells us everything we need to know about our data?

To help you, try to use these two sets of data:

Questions: Were the two averages any different? How do you explain this result? Do you know which statistical tool can help us describe the difference between the two sets of data?


The content of this page was partially taken from: http://en.wikipedia.org/wiki/Histogram and http://en.wikipedia.org/wiki/Average