Part of the reason we analyse data is to see patterns, and patterns are difficult to see without summarising the data in some way. The most common ways to summarise data are to convert it into a summary table, to draw a graph or picture, or to calculate summary measures like the average. The benefit of a summary is that it gives us an instant picture of what is going on in our data set. The drawback is that we often lose the detail of the original data.
Suppose you have collected some data on the number of children in people’s families. The variable we are measuring is ‘number of children in the family’. The values that this variable can take are numbers like 0, 1, 2 or 3. These are discrete data, in that they can only take whole-number values. You can’t measure the number of children ever more accurately, as you can with time or distance. When data can be measured ever more accurately (provided you have an appropriate measuring instrument), we call them continuous data.
Start by turning your data on the variable ‘number of children’ into a frequency distribution. A frequency distribution is a table that shows the values that a variable can take in the left-hand column and the frequency with which we observe each value in the right-hand column. For example, it might look like this.
Notice that the right-hand column is the frequency with which we find each value of our variable. We found 11 families with no children, 12 families with 1 child, etc. In all we have data from 47 families. You can see the pattern already. Most families (36 of the 47) have between 0 and 2 children. Having more than 3 children is quite rare. Note that we can still reconstruct the original data from this table. There would be 11 zeros, 12 ones, 13 twos, etc.
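The steps above can be sketched in a few lines of Python. This is only an illustration: the list `children` is a made-up sample of 10 families, not the 47 families in the table, and the variable names are our own.

```python
from collections import Counter

# Hypothetical raw data: number of children in each of 10 families.
# These values are made up for illustration; they are not the data in the text.
children = [0, 2, 1, 0, 3, 2, 1, 2, 0, 1]

# Count how often each value occurs, then list the values in ascending order.
frequency = Counter(children)
table = sorted(frequency.items())

print("Children  Frequency")
for value, count in table:
    print(f"{value:>8}  {count:>9}")
print(f"{'Total':>8}  {sum(frequency.values()):>9}")

# The table loses no information: expanding it gives back the sorted raw data.
rebuilt = [value for value, count in table for _ in range(count)]
```

Here `rebuilt` holds the same values as the original list in sorted order, which is the sense in which the original data can be recovered from an ungrouped frequency table.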
Suppose you have a much larger data set with hundreds of different values. If you created a table like the one above, it would go on for pages, and you wouldn’t see any patterns. So in these cases, we group the values together into classes to give a shorter table. Look at this table of order values from a sample of 40 orders received by a company. It is called a grouped frequency distribution table.
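Grouping can be sketched in Python as follows. The order values and the class boundaries here are assumptions made up for illustration; they are not the company's 40 orders. Each class is assumed to include its lower bound and exclude its upper bound.

```python
# Hypothetical order values and class boundaries, made up for illustration.
orders = [3, 7, 12, 15, 18, 22, 25, 27, 31, 38, 45, 72]
# Each class runs from its lower bound up to, but not including, its upper bound.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 100)]

# Start every class at zero, then add one for each order that falls inside it.
grouped = {bounds: 0 for bounds in classes}
for value in orders:
    for lower, upper in classes:
        if lower <= value < upper:
            grouped[(lower, upper)] += 1
            break

for (lower, upper), count in grouped.items():
    print(f"{lower:>3} to under {upper:<3}  {count}")
```

Unlike the ungrouped table, the grouped table cannot be expanded back into the original values: it records only how many orders fell into each class.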
If you halve the width of a bar, you must double its height to keep the area the same, because it is the area of the bar that represents the frequency. So you can now see why the height of the bar has been adjusted. You need to decide on one set of bars to plot with the height equal to the frequency. Choose the class width that occurs most often, so that as few bars as possible need adjusting.
So we plot the middle bars, which have a width of £10, at heights equal to their frequencies. All the other bars must be adjusted. For example, the first bar is half the width of the middle bars, so we must double its height (multiply the frequency by 2). The last bar is 6 times wider than the middle bars, so we must divide its height by 6 (or multiply the frequency by 1/6). Try drawing this one for yourself.
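The height adjustment can be checked with a short Python sketch. The class widths below follow the description above (one bar of half the standard width, three of the standard width, one 6 times wider), but the class boundaries and the frequencies are assumptions made up for illustration.

```python
# Hypothetical grouped table: (lower bound, upper bound, frequency).
# The widths mirror the text (5, 10, 10, 10, 60); boundaries and frequencies are made up.
table = [(5, 10, 4), (10, 20, 9), (20, 30, 12), (30, 40, 9), (40, 100, 6)]

standard_width = 10  # the most common class width, plotted at its raw frequency

heights = []
for lower, upper, frequency in table:
    width = upper - lower
    # Scale the height so that bar area (width x height) stays proportional to frequency.
    heights.append(frequency * standard_width / width)

for (lower, upper, frequency), height in zip(table, heights):
    print(f"{lower:>3} to {upper:<3}  frequency {frequency:>2}  height {height:g}")
```

With these made-up figures the first bar's frequency of 4 becomes a height of 8, the middle bars keep their frequencies as heights, and the last bar's frequency of 6 becomes a height of 1.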