
Anscombe's quartet

Updated: Wikipedia source

Anscombe's quartet comprises four datasets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed. Each dataset consists of eleven (x, y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data when analyzing it, and the effect of outliers and other influential observations on statistical properties. He described the article as being intended to counter the impression among statisticians that "numerical calculations are exact, but graphs are rough".
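The effect of a single influential observation can be sketched with Dataset III of the quartet. The example below is a minimal illustration, not part of Anscombe's article: the helper `fit_line` and the choice to treat the point (13, 12.74) as the outlier are assumptions made here for demonstration. It fits an ordinary least-squares line to the eleven points, then again with that one point left out.

```python
from statistics import mean


def fit_line(points):
    """Ordinary least-squares fit of y on x.

    Returns (intercept, slope, r), where r is the Pearson correlation.
    Hypothetical helper written for this illustration.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


# Dataset III of the quartet as (x, y) pairs.
DATASET_III = [
    (10, 7.46), (8, 6.77), (13, 12.74), (9, 7.11), (11, 7.81), (14, 8.84),
    (6, 6.08), (4, 5.39), (12, 8.15), (7, 6.42), (5, 5.73),
]

# Assumption for this sketch: this point is treated as the outlier.
OUTLIER = (13, 12.74)

with_outlier = fit_line(DATASET_III)
without_outlier = fit_line([p for p in DATASET_III if p != OUTLIER])

for label, (a, b, r) in [("all 11 points", with_outlier),
                         ("outlier removed", without_outlier)]:
    print(f"{label}: y = {a:.2f} + {b:.3f}x, r = {r:.3f}")
```

With all eleven points the fit has a slope close to 0.5 and a correlation close to 0.816. Leaving out the single point gives a noticeably flatter line and a correlation very close to 1, which is the kind of sensitivity the quartet was built to expose.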

Summary statistics of each dataset

| Property | Value | Accuracy |
|---|---|---|
| Mean of x | 9 | exact |
| Sample variance of x: $s_x^2$ | 11 | exact |
| Mean of y | 7.50 | to 2 decimal places |
| Sample variance of y: $s_y^2$ | 4.125 | ±0.003 |
| Correlation between x and y | 0.816 | to 3 decimal places |
| Linear regression line | y = 3.00 + 0.500x | to 2 and 3 decimal places, respectively |
| Coefficient of determination of the linear regression: $R^2$ | 0.67 | to 2 decimal places |

Anscombe's quartet (Datasets I to IV)

| I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
|---|---|---|---|---|---|---|---|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
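
The summary statistics can be recomputed directly from the data values. The following is a minimal sketch using only the Python standard library; the names `QUARTET` and `summarize` are introduced here for illustration and do not come from the source. It computes the mean, sample variance (divisor n - 1), Pearson correlation, least-squares regression line, and coefficient of determination for each dataset.

```python
from statistics import mean

# x values shared by Datasets I, II and III.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

# Hypothetical container name; values copied from the data table.
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the descriptive statistics listed in the summary table."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)   # sum of squared deviations of x
    syy = sum((b - my) ** 2 for b in y)   # sum of squared deviations of y
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                     # least-squares slope
    r = sxy / (sxx * syy) ** 0.5          # Pearson correlation
    return {
        "mean_x": mx,
        "var_x": sxx / (n - 1),           # sample variance
        "mean_y": my,
        "var_y": syy / (n - 1),
        "r": r,
        "intercept": my - slope * mx,
        "slope": slope,
        "r_squared": r ** 2,              # equals R^2 for simple regression
    }


for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(f"Dataset {name}: mean_y={s['mean_y']:.2f}, var_y={s['var_y']:.3f}, "
          f"r={s['r']:.3f}, y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

The four printed lines should agree with one another to within the accuracies given in the summary table, even though the datasets look very different when plotted.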
