Introduction to statistics

Published on INFOVOICE.SE

This web page gives you a brief overview of statistics. You will find many links from this page to other pages providing more information. (Another page describes how to choose inferential statistics suitable for your project.)

You will understand this page best if you have first read the page introduction to research.

Algorithms and mathematics

An algorithm is any set of well-defined instructions or steps to accomplish a goal. It could be how to walk from A to B. Mathematical algorithms are precise instructions for how to solve a defined mathematical problem.

Mathematics uses numbers in various forms to draw conclusions, usually aiming to solve a problem. These can be purely theoretical problems or practical problems involving observations obtained from the real world. Mathematical research explores new pathways for drawing conclusions. Once a pathway to solve a specific problem is established, it is usually defined as a mathematical algorithm. Hence, all statistical methods are mathematical algorithms, but not all mathematical algorithms are statistical methods.

A bird’s-eye view of statistics

Statistics is the method used to find patterns among observations that consist of, or have been transformed into, numbers. Hence, statistics is only used in empirical-atomistic (quantitative) research approaches and is based on mathematics. The domains embraced by statistics are:

[Figure: Bird’s-eye view of inferential statistics]

Descriptive statistics describe the observations using numbers (a measure of central tendency and a measure of dispersion), tables, and graphs.
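The two kinds of summary numbers mentioned here can be sketched with Python's standard library (the height values below are made up for illustration):

```python
import statistics

# Heights (cm) of a small, made-up sample
heights = [162, 168, 170, 171, 173, 175, 178, 181]

mean = statistics.mean(heights)      # measure of central tendency
median = statistics.median(heights)  # robust measure of central tendency
sd = statistics.stdev(heights)       # measure of dispersion (sample SD)

print(f"mean={mean:.2f}, median={median:.1f}, sd={sd:.2f}")
```

Tables and graphs would then present these same observations visually; the numbers alone already condense eight observations into two or three summary figures.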

Inferential statistics condense data and draw conclusions from your observations by a) calculating the probability of being wrong when you reject the null hypothesis, described as the p-value, and b) calculating an effect size such as Cohen's d, an odds ratio, or a hazard ratio.
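Both steps a) and b) can be sketched in a few lines of Python. The two samples below are made up for illustration, and for simplicity the p-value uses a large-sample normal (z) approximation rather than the t distribution that a real analysis of samples this small would call for:

```python
import math
import statistics

# Two illustrative samples (made-up numbers)
group_a = [4.1, 5.0, 5.5, 4.8, 5.2, 4.9, 5.3, 4.6]
group_b = [5.6, 6.1, 5.9, 6.4, 5.8, 6.0, 6.3, 5.7]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)
n_a, n_b = len(group_a), len(group_b)

# b) effect size: Cohen's d with a pooled standard deviation
pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
cohens_d = (mean_b - mean_a) / pooled_sd

# a) p-value: two-sided, large-sample normal (z) approximation
se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
z = (mean_b - mean_a) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

print(f"Cohen's d = {cohens_d:.2f}, p = {p_value:.2g}")
```

The p-value answers "how likely am I to be wrong if I reject the null hypothesis?", while Cohen's d answers "how large is the difference?" – two separate questions that inferential statistics addresses together.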

Analytical statistics is a somewhat broader term, sometimes used less formally. It generally refers to the process of analyzing data to discover patterns, relationships, and insights. This often involves inferential statistical methods, but it can also include descriptive statistics, data visualization, and more complex modeling. Inferential statistics is a narrower term meaning mathematically calculating the p-value, an effect size, or a measure of agreement.

Brief history of statistics

Early descriptive statistics

For thousands of years we had descriptive statistics. The primary motivation was statecraft—the need for rulers to manage their kingdoms effectively.

  • Taxation: Monarchs needed to know how many people lived in their lands and how much wealth they possessed to levy taxes accurately.
  • Military Power: Rulers needed to estimate the number of fighting-age men available for conscription into armies.
  • Resource Management: Governments needed data on food supplies, livestock, and land use to prevent famine and maintain stability.

This gives us the etymology of the word itself: statistics comes from the Italian word statista (“statesman”) and the German word Statistik (“state affairs”). It was originally, quite literally, the “science of the state”. Statistics was initially simply counting. Ancient civilizations, like the Babylonians as early as 3800 BC, used clay tablets to record agricultural yields. The Egyptians conducted detailed censuses to arrange labor for building the pyramids. The Romans were bureaucratic masters: the Roman Empire conducted regular censuses (every 5 years) to register citizens and their property.

Early inferential statistics

The shift from “just counting” to “analyzing patterns” occurred in the 17th century. This is when inferential statistics as a scientific discipline was truly born. John Graunt (1620-1674), a humble London haberdasher (merchant), is often credited as the “father of statistics” because he was the first to analyze the numbers. In 1662, Graunt published Natural and Political Observations Made upon the Bills of Mortality. He studied the weekly reports of deaths in London (originally created to track the bubonic plague). Instead of just reading the lists, he looked for patterns. London had suffered plague outbreaks at intervals, and the King wanted an early-warning system for the threat of fresh outbreaks. Weekly records were kept of mortality and the causes of death in the capital. On the basis of these Bills, Graunt made an estimate of the population of London. He calculated survival rates, noted that more boys were born than girls (but boys died at higher rates), and created the first life table to predict life expectancy. He proved that social phenomena (like death and birth) followed predictable laws, turning statistics into a tool for understanding society, not just taxing it.

A friend of Graunt, Sir William Petty (1623-1687), coined the term “Political Arithmetic”. He argued that government policy should be based on data and quantitative evidence rather than merely rhetoric or intuition. He applied Graunt’s statistical methods to economics, estimating national income and the value of labor. Gottfried Achenwall (1719-1772), a German professor, is often cited for coining the actual term Statistik in 1749. However, he defined it as the general description of the state (geography, politics, economics), rather than the mathematical analysis of data we use today.

Abraham de Moivre (1667-1754) was a French mathematician who worked in London and was a friend of Isaac Newton. De Moivre was the first to state the properties of the normal curve. Thanks to his work, we can state the exact proportion of a population that will lie between any two values. Carl Friedrich Gauss (1777-1855) and Pierre-Simon Laplace (1749-1827) applied probability to errors in astronomical observations, giving us the “Bell Curve” (normal distribution).
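De Moivre's result can be illustrated numerically: given the normal cumulative distribution function, the proportion of a population lying between any two values follows directly. A minimal sketch using Python's standard-library error function:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of the normal curve."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Proportion of a normal population between mu - sigma and mu + sigma
within_one_sd = normal_cdf(1) - normal_cdf(-1)
print(f"{within_one_sd:.4f}")  # about 0.6827, the familiar "68%" rule
```

The same subtraction of two CDF values gives the proportion between any pair of values, which is exactly the property de Moivre first described.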

Adolphe Quetelet (1796-1874) was a Belgian mathematician who, when studying the distribution of people’s characteristics, observed and studied the properties of the normal distribution curve – one of the central concepts in statistics. He applied the Bell Curve to human data. He invented the concept of the “Average Man” (l’homme moyen) and even created the Body Mass Index (BMI). He showed that crime, marriage, and suicide rates were surprisingly constant, suggesting that statistics could study moral and social behavior. Quetelet divided the different heights of the people he studied along a horizontal axis and noted the total number of people of each specific height in columns parallel to the vertical axis. He saw that the highest columns in his diagrams were clustered together around a mid-point. The height of the columns fell away symmetrically on either side of the highest column until, at the extreme values of the range, the columns were very small. He used these observations to suggest that the chances of big deviations in any characteristic were limited. Crucially, too, he saw that the distribution of a characteristic in a population follows the shape of a bell when put into a diagram. The properties of the bell-shaped distribution are Quetelet’s greatest contribution to modern statistics.

Mathematicians like Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665) invented probability theory to solve gambling disputes. Probability became interesting as a discipline when mathematicians of the 17th century began calculating the odds in various games of chance; it was obvious that this knowledge had the potential to be financially rewarding. These gaming theories could then be applied in other contexts. In the 18th and 19th centuries, these two streams – state data collection and mathematical probability – merged.

Modern inferential statistics

William Gosset (1876-1937) joined the Guinness brewery in Dublin as a chemist in 1899 and did important work on statistics. He invented the t-test to handle small samples for quality control in brewing, publishing under the pen name “Student”, which is why it is known as Student’s t-test.

Ronald Fisher (1890-1962) gave a new definition of statistics in 1922. Its purpose was the reduction of data, and he identified three fundamental problems: first, specification of the kind of population the data came from; second, estimation; and third, distribution. Fisher’s contributions included the development of methods suitable for small samples, like those of Gosset, the discovery of the precise distributions of many sample statistics, and the invention of analysis of variance. He introduced the term maximum likelihood and studied hypothesis testing. Fisher is considered one of the founders of modern statistics because of his many important contributions.

Karl Pearson (1857-1936) applied statistics to biological problems of heredity and evolution. From 1893 to 1912 he wrote 18 papers entitled Mathematical Contributions to the Theory of Evolution, which contain his most valuable work, including contributions to regression analysis and the correlation coefficient. Pearson coined the term “standard deviation” in 1893.

Suggested further reading

  1. Observations and variables.
  2. Study design.
  3. Choosing statistical analysis.
  4. Sample size estimations.
  5. Sampling strategies and data collection.
  6. Writing a study protocol.
  7. Significant figures.
  8. Descriptive statistics.
  9. Inferential statistics.
