Statistics of Raven's Advanced Progressive Matrices (reported I.Q.'s)

© Dec. 2010 Paul Cooijmans

Scores from this test are sometimes reported as "I.Q." with a standard deviation of 24, and sometimes as raw scores out of 36. This report deals with the "I.Q.'s". In a few cases they were reported with a standard deviation of 16, and those have been converted to 24 as that is the most commonly used scale for this test.
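The conversion between scales can be sketched as follows. The report does not state the formula it used; this sketch assumes the usual linear rescaling of the deviation from a mean of 100, and the function name convert_iq is made up for illustration.

```python
def convert_iq(score, from_sd, to_sd, mean=100.0):
    """Re-express a deviation I.Q. on a scale with another standard deviation.

    Assumption: both scales share the same mean (100) and differ only in
    the number of points per standard deviation.
    """
    z = (score - mean) / from_sd      # distance from the mean in S.D. units
    return mean + z * to_sd

# A score reported on the 16-point scale, moved to the 24-point scale.
print(convert_iq(137, from_sd=16, to_sd=24))   # 155.5

# A score on the 24-point scale expressed on the common 15-point scale.
print(round(convert_iq(155, from_sd=24, to_sd=15)))
```

Under these assumptions, 155 on the 24-point scale comes out at about 134 on a 15-point scale.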

Note that the testees reporting "I.Q.'s" are not the same individuals as those reporting raw scores (although a few report both so there is a small overlap). So the scores in this report are from a different group than those in the report dealing with R.A.P.M. raw scores.

Scores on Raven's Advanced Progressive Matrices (I.Q.)

115 *
148 *******
149 *****
151 *
152 *
153 *
154 *
156 ********************
157 *
160 *******
163 ****
164 **
165 *
171 *
172 ***
174 *
179 *
180 *
187 *

Scores by males

n = 50

115 *
148 ****
149 *****
151 *
153 *
154 *
156 ******************
157 *
160 ******
163 ****
164 *
165 *
171 *
172 *
174 *
179 *
180 *
187 *

Scores by females

n = 7

148 **
152 *
156 *
160 *
172 **

Correlation of Raven's Advanced Progressive Matrices (I.Q.) with other tests by Paul Cooijmans

Test                                                   n      r
Test For Genius - Revision 2004                        6   0.81
Verbal section of Test For Genius - Revision 2004      6   0.78
Spatial section of Test For Genius - Revision 2004     6   0.65
Numbers                                               12   0.17
Short Test For Genius                                  8   0.07
Qoymans Multiple-Choice #2                             5  -0.01
Space, Time, and Hyperspace                           18  -0.09
Analogies of Long Test For Genius                     10  -0.11
The Final Test                                         8  -0.13
Qoymans Multiple-Choice #1                             8  -0.16
Association subtest of Long Test For Genius           10  -0.21
Intelligence Quantifier by assessment                  8  -0.25
Qoymans Multiple-Choice #4                             7  -0.32
Cooijmans Intelligence Test - Form 1                   8  -0.42
Genius Association Test                                5  -0.52
Long Test For Genius                                  10  -0.55
Lieshout International Mesospheric Intelligence Test   5  -0.57
Cooijmans Intelligence Test - Form 2                   8  -0.64
Qoymans Multiple-Choice #3                             8  -0.70

Weighted average of correlations: -0.132

Conservatively estimated minimum g loading: -0.36

Ranking in the above table is based on the unrounded correlations. All available data is present in this table; no tests are left out except for those with fewer than 5 score pairs. All known pairs are used to obtain the true, honest statistics; correlations have not been artificially inflated by leaving out ceiling scores, outliers or other anomalies.
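The weighted average reported above can be approximated from the rows of the table. The report does not say which weights it uses; the sketch below assumes each correlation is weighted by its number of score pairs n. Because it works from the rounded correlations, the result differs slightly from the reported -0.132, which is based on unrounded values.

```python
# (test, n, r) as read from the table of correlations with other tests
# by Paul Cooijmans; r values are the rounded ones shown in the report.
rows = [
    ("Test For Genius - Revision 2004", 6, 0.81),
    ("Verbal section of Test For Genius - Revision 2004", 6, 0.78),
    ("Spatial section of Test For Genius - Revision 2004", 6, 0.65),
    ("Numbers", 12, 0.17),
    ("Short Test For Genius", 8, 0.07),
    ("Qoymans Multiple-Choice #2", 5, -0.01),
    ("Space, Time, and Hyperspace", 18, -0.09),
    ("Analogies of Long Test For Genius", 10, -0.11),
    ("The Final Test", 8, -0.13),
    ("Qoymans Multiple-Choice #1", 8, -0.16),
    ("Association subtest of Long Test For Genius", 10, -0.21),
    ("Intelligence Quantifier by assessment", 8, -0.25),
    ("Qoymans Multiple-Choice #4", 7, -0.32),
    ("Cooijmans Intelligence Test - Form 1", 8, -0.42),
    ("Genius Association Test", 5, -0.52),
    ("Long Test For Genius", 10, -0.55),
    ("Lieshout International Mesospheric Intelligence Test", 5, -0.57),
    ("Cooijmans Intelligence Test - Form 2", 8, -0.64),
    ("Qoymans Multiple-Choice #3", 8, -0.70),
]

def weighted_average(rows):
    """Average of r weighted by n (assumed weighting, not confirmed by the report)."""
    total_pairs = sum(n for _, n, _ in rows)
    return sum(n * r for _, n, r in rows) / total_pairs

result = weighted_average(rows)
print(round(result, 3))   # roughly -0.131 from the rounded inputs
```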

Correlation of Raven's Advanced Progressive Matrices (I.Q.) with tests by others

Test                                                         n      r
Culture Fair Numerical Spatial Examination - Final version   6   0.46
Logima Strictica 36                                         10   0.25
Raven's Advanced Progressive Matrices (raw)                  5   0.24
Sigma Test                                                   7   0.15
W-87                                                         5   0.10
Strict Logic Sequences Exam I                                5  -0.05
Mega Test                                                    7  -0.06
Strict Logic Spatial Exam 48                                 5  -0.21
Nonverbal Cognitive Performance Examination                  7  -0.26
Cattell Culture Fair                                        12  -0.27

Weighted average of correlations: 0.019

Ranking in the above table is based on the unrounded correlations. All available data is present in this table; no tests are left out except for those with fewer than 5 score pairs. All known pairs are used to obtain the true, honest statistics; correlations have not been artificially inflated by leaving out ceiling scores, outliers or other anomalies.

Please be aware that correlations with these external tests are in most cases affected (depressed, typically) by one or more of the following: 1) Little overlap with the object test because of the much lower ceilings and inherent ceiling effects of the tests used in regular psychology; 2) Candidates reporting scores selectively, for instance only the higher ones while withholding lower ones; 3) Candidates reporting, or having been reported by psychometricians, incorrect scores.

Estimated loadings of Raven's Advanced Progressive Matrices (I.Q.) on particular item types

These are estimated g factor loadings, but against homogeneous tests containing only particular item types, as opposed to non-compound heterogeneous tests. Although tending to surprise the lay person, it is not uncommon for tests to have high loadings on item types they do not actually contain themselves. Such loadings reflect the empirical fact that most tests for mental abilities measure primarily g, regardless of their contents; that the major part of test score variance is caused by g, and only a minor part by factors germane to particular item types. It is of key importance to understand that this is a fact of nature, a natural phenomenon, and not something that was built into the tests by the test constructors.

Type           g loading of Raven's Advanced Progressive Matrices (I.Q.) on that type
Verbal         -0.41
Numerical       0.52
Spatial        -0.14
Logical        -0.99
Heterogeneous  -0.46

There is no overlap between the categories in this table as a result of leaving out compound tests.

Balanced g loading = -0.29

Yes, this test has a negative g loading in its upper range. That means higher scores correspond to lower ability levels. This is of course not true for the full range of the test, but only for a tiny segment. It is in fact known from many factor-analytical studies that the Raven matrices tests are among the highest g loaded tests; but such studies deal with the normal range of I.Q., between plus and minus about two standard deviations from 100.

The correlation may also have been influenced by the fact that Raven scores are often reported with an artificial ceiling at 156, the 99th centile (see above; many 156s), but that influence may go both ways; in other words, without the artificial ceiling the negative correlation might have been even larger!
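That 156 sits at about the 99th centile on the 24-point scale can be checked with a few lines. This is only a sketch: it assumes scores follow a normal distribution with mean 100 and standard deviation 24, which is how such norms are conventionally expressed but is not stated in this report.

```python
from statistics import NormalDist

# Assumed norm distribution: mean 100, 24 points per standard deviation.
iq_scale = NormalDist(mu=100, sigma=24)

centile_of_156 = iq_scale.cdf(156) * 100   # percentage scoring below 156
score_at_99th = iq_scale.inv_cdf(0.99)     # score at the 99th centile

print(round(centile_of_156, 1))   # about 99.0
print(round(score_at_99th, 1))    # about 155.8
```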

Also, in the report on R.A.P.M. raw scores (which are not influenced by reporting with artificial ceiling), a similar negative correlation is found, so apparently it is not the score reporting of psychologists that causes it to be negative.

Further study of this negative correlation and the point where it starts can better be based on known raw scores, as there exist many different sets of norms for this test (for different countries, ages, and the renormings to correct for the Flynn effect which has been large on this test), so one cannot rely on the "I.Q.'s" to correspond consistently to particular raw scores.

National medians for Raven's Advanced Progressive Matrices (I.Q.)

Country         n  median score
United States   2  176.0
Finland         3  163.0
France          2  160.5
Brazil          2  160.0
Netherlands     5  160.0
Belgium        10  156.0
Denmark         2  156.0
Sweden          6  156.0
Bulgaria        2  152.0
Greece          4  152.0

For reasons of privacy, only countries with 2 or more candidates are included in this table. Ranking is by median (highest first), and then alphabetical.
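The ranking rule can be illustrated with the table's own figures. The sketch below assumes "by median" means highest median first, with ties broken alphabetically by country name; the variable names are made up for illustration.

```python
# (country, n, median) taken from the table, deliberately listed
# alphabetically here so that the sort has something to do.
medians = [
    ("Belgium", 10, 156.0),
    ("Brazil", 2, 160.0),
    ("Bulgaria", 2, 152.0),
    ("Denmark", 2, 156.0),
    ("Finland", 3, 163.0),
    ("France", 2, 160.5),
    ("Greece", 4, 152.0),
    ("Netherlands", 5, 160.0),
    ("Sweden", 6, 156.0),
    ("United States", 2, 176.0),
]

# Only countries with 2 or more candidates are shown (privacy rule).
shown = [row for row in medians if row[1] >= 2]

# Sort key: negated median for descending order, then name ascending.
ranked = sorted(shown, key=lambda row: (-row[2], row[0]))

for country, n, med in ranked:
    print(f"{country:<15}{n:>2}  {med:.1f}")
```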

Correlation of Raven's Advanced Progressive Matrices (I.Q.) with personal details

Personalia                                 n      r
P.S.I.A. Just                              5   0.83
P.S.I.A. Antisocial                        5   0.67
P.S.I.A. Cruel                             5   0.38
Disorders (own)                           23   0.33
P.S.I.A. Rare                              5   0.29
Mother's educational level                21   0.27
Father's educational level                21   0.23
Disorders (parents and siblings)          22   0.10
P.S.I.A. Neurotic                          5  -0.01
Sex                                       57  -0.02
Observed associative horizon               4  -0.08
Educational level                         22  -0.17
Year of birth                             53  -0.18
Observed behaviour                         7  -0.20
Gifted Adult's Inventory of Aspergerisms   9  -0.24
P.S.I.A. System factor                     6  -0.25
P.S.I.A. Deviance factor                   9  -0.31
P.S.I.A. Ethics factor                     9  -0.35
P.S.I.A. Introverted                       5  -0.35
P.S.I.A. Aspergoid                         5  -0.47
P.S.I.A. Extreme                           5  -0.50
P.S.I.A. Orderly                           5  -0.51
P.S.I.A. True                              5  -0.52
P.S.I.A. Cold                              5  -0.57
P.S.I.A. Rational                          5  -0.72

Remark: As one would expect for a test with negative g loading, its relation to personal details is largely opposite to what it is for most positively g loaded high-range tests.

Correlation with national I.Q.'s of Raven's Advanced Progressive Matrices (I.Q.)

Correlation of this test with national average I.Q.'s published by Lynn and Vanhanen:

Estimated g factor loadings Upward and Downward of specific scores

The number of score pairs on which each estimated g factor loading is based is given in parentheses. The Upward and Downward values are calculated including the pertinent score itself. It is normal that g factor loadings go down when the range is restricted like this, but careful study of the Upward and Downward columns may reveal possible scores below or above which the test loses validity altogether.

Score  Upward (n)     Downward (n)
115    -0.39 (190)    NaN (0)
145    -0.46 (185)    NaN (0)
150    -0.45 (131)    0.74 (10)
155    -0.50 (113)    0.69 (35)
156    -0.50 (113)    0.33 (102)
160    -0.54 (43)     0.33 (130)
165    NaN (0)        -0.21 (177)
170    NaN (0)        -0.21 (177)
175    NaN (0)        -0.34 (184)
180    NaN (0)        -0.34 (185)
187    NaN (0)        -0.39 (190)
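The idea of Upward and Downward values can be sketched as a range-restricted correlation. The report does not describe how its estimated g factor loadings are computed, so a plain Pearson correlation is used here as a stand-in, and the score pairs are invented purely for illustration; they are not data from this report.

```python
from math import sqrt, isnan

def pearson(pairs):
    """Pearson correlation of (x, y) pairs; NaN when it is undefined."""
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:          # no variance in the restricted range
        return float("nan")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sqrt(sxx * syy)

def upward_downward(pairs, score):
    """Correlations using only pairs at or above / at or below `score`."""
    upward = [p for p in pairs if p[0] >= score]      # includes the score itself
    downward = [p for p in pairs if p[0] <= score]    # includes the score itself
    return (pearson(upward), len(upward)), (pearson(downward), len(downward))

# Hypothetical (this test, other test) score pairs, for illustration only.
example_pairs = [
    (148, 140), (149, 144), (152, 150), (156, 155),
    (156, 160), (160, 150), (163, 148), (172, 141),
]

(up_r, up_n), (down_r, down_n) = upward_downward(example_pairs, 156)
print(f"Upward {up_r:.2f} ({up_n})  Downward {down_r:.2f} ({down_n})")
```

In this invented example the Downward value is positive and the Upward value negative, the kind of pattern that would point to a score above which a test stops discriminating.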

It is chilling how accurately this test fails to measure mental ability above about 155 (that is, I.Q. 134). The fact that the female median is the same as the male median on this test, while with high-range tests it is mostly somewhat lower, is consistent with this; the valid ceiling of the test is too low to allow for a sex difference in the high range to show up (that is, too low to allow males to outscore females). This is typical for tests used in regular psychometrics, which are mostly deliberately constructed to be sex-equal by leaving out problems that show a large sex difference (which in effect also means that no truly difficult problems are allowed to be included, as on difficult problems males score higher than females).
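The equality of the male and female medians can be verified from the tallies given earlier in this report. The sketch below simply expands each tally (score followed by number of candidates) into a list of scores; the names used are made up for illustration.

```python
from statistics import median

# score -> number of candidates, copied from the "Scores by males"
# and "Scores by females" tallies in this report.
male_tally = {
    115: 1, 148: 4, 149: 5, 151: 1, 153: 1, 154: 1, 156: 18, 157: 1,
    160: 6, 163: 4, 164: 1, 165: 1, 171: 1, 172: 1, 174: 1, 179: 1,
    180: 1, 187: 1,
}
female_tally = {148: 2, 152: 1, 156: 1, 160: 1, 172: 2}

def expand(tally):
    """Turn a {score: count} tally into a sorted list of individual scores."""
    return [score for score, count in sorted(tally.items()) for _ in range(count)]

male_scores = expand(male_tally)
female_scores = expand(female_tally)

print(len(male_scores), median(male_scores))       # n = 50, median 156
print(len(female_scores), median(female_scores))   # n = 7, median 156
```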

Remarks

This test is one of those which have suffered the most from the Flynn effect, the rise of raw scores over the years. That is probably one of the reasons why psychologists have often reported its scores with an artificial ceiling at the 99th centile. They see before their eyes that far too many have perfect or near-perfect scores, and that the silly norm tables attach astronomical "I.Q.'s" to that, which seem all the more ridiculous because they are given on an idiotic scale with 24 points per standard deviation.

They sense this cannot be right and give all those high scorers "156 or higher" or "higher than 155" or "99th percentile". It is understandable, but it destroys information.

In any case, it seems I.Q.'s from this test are not suitable for admission to higher-I.Q. societies.