The differentiation hypothesis of g tested

© January 2023 Paul Cooijmans

Introduction

A phenomenon observed since the early twentieth century is the apparently decreasing role of g, the general factor in mental ability testing, at higher levels of I.Q. That is, in above-average samples g accounts for a smaller proportion of the score variance than in full-range or below-average samples. Charles Spearman, the discoverer of g, was the first to notice this and spoke of a "law of diminishing returns" with regard to g. Others have replicated his findings, and it has been suggested that in the high range of I.Q., g breaks up into group factors (variance common to some but not all tests) and specificity (variance unique to a particular test). If so, it would be impossible to measure g beyond a certain point or range, or at least I.Q. scores would become meaningless beyond it, since they would no longer largely represent g. The point usually named is the 99th centile, but broadly speaking the breakup is supposed to occur somewhere in the range from the 98th to the 99.9th centile.

It must be noted that there are also studies that fail to confirm, or even contradict, this differentiation of g with increasing I.Q. And in the very low range, that of moderate to severe retardation, g is also found to decrease, apparently as a result of the non-hereditary causes that often underlie very low I.Q.'s (for instance, brain damage).

The differentiation hypothesis of g, and g's possible immeasurability in the high range, were my primary motivation to begin constructing tests specifically for the high range. I wanted to find out whether it is possible to meaningfully test intelligence in the high range at all. I suspected that the decreasing role of g at high I.Q. levels in mainstream testing was simply caused by the absence of really difficult problems in regular I.Q. tests, which makes them invalid for the high range. Later I also came to understand that purposely constructing a test to yield sex-equal scores, as is done with some tests, destroys its high-range validity.

Over the years I have found that the intercorrelations and g loadings of high-range tests are on the whole comparable to those found among mainstream tests, so that, at first sight, either the differentiation hypothesis is not borne out or there is only a small amount of differentiation. Where notably low correlations are found, there tend to be identifiable causes such as poor test construction, floor effects on extremely hard tests, fraud in test-taking, and dishonest or incomplete reporting by candidates of their scores on external tests. Many candidates, when asked to report their scores, report only their highest few and leave out the rest, and/or report retest scores or fraudulent scores, thus depressing, or rendering meaningless, the correlations found with tests by other authors. For the several external tests for which I possess the full data, the correlations and g loadings found are similar to those among my own tests. So it seems that high-range tests do measure the general factor after all, in a range where mainstream tests have lost validity. The next question, then, is: up to what level in the high range do they measure the general factor? Do the g loadings of high-range tests go down as I.Q. goes up, and if so, through what cause?

The method used

In the present study, the goal is to verify whether a decrease of g loadings with increasing I.Q. can be detected within high-range tests. All tests that have received 49 or more submissions are selected; this concerns 42 tests. This high submission threshold is chosen because partial g loadings need much data to be computed; in particular, the area above the 3rd quartile (top 25 %) tends to have too little data for a g loading of its own in tests with fewer submissions. Two tests in this sample had too little data to compute a reliable top 25 % g loading, but they have been left in since these two outliers (one high and one low) do not affect the median. They are Problems In Gentle Slopes of the third degree and Combined Numerical and Spatial sections of Test For Genius - Revision 2010.

Estimated g factor loadings are computed separately for the full range, the bottom half, the top half, and the top 25 % of each test. For each test, the separation points are the raw score corresponding to protonorm 396.5 (I.Q. 138) for the bottom and top halves, this being the median of scores on high-range tests according to the most recent norming of the protonorm table, and the raw score corresponding to protonorm 452.5 (I.Q. 147) for the 3rd quartile. For clarity, this means that separate g loadings are given for the scores below the test-independent median (nominally the bottom 50 %), for those above it (nominally the top 50 %), and for those above the test-independent 3rd quartile (nominally the top 25 %). In a future edition of this report, using the within-test quartiles (including the median, which is also a quartile) as separation points, instead of the test-independent values from the protonorm table, will be considered. Within-test quartiles may prevent or reduce the problem of having too little data on some tests (for instance, 25 % of the data is guaranteed to lie above the within-test 3rd quartile), and the submission threshold could then be lowered.
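The range split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for this report: the function names, the example data, and the convention that a score exactly at a cut point counts toward the upper range are all assumptions. The cut points would be the raw scores that the test's norm table assigns to protonorms 396.5 and 452.5; the within-test alternative mentioned above is shown using Python's standard statistics.quantiles.

```python
import statistics
from typing import Dict, List, Tuple

# A score pair: (raw score on the object test, score on another test).
Pair = Tuple[float, float]


def split_by_range(pairs: List[Pair], median_cut: float, q3_cut: float) -> Dict[str, List[Pair]]:
    """Partition score pairs by the object test's raw score.

    median_cut: raw score on this test corresponding to protonorm 396.5 (I.Q. 138)
    q3_cut:     raw score on this test corresponding to protonorm 452.5 (I.Q. 147)
    Assumption (hypothetical): a score exactly at a cut point goes to the upper range.
    """
    return {
        "full": list(pairs),
        "bottom_half": [p for p in pairs if p[0] < median_cut],
        "top_half": [p for p in pairs if p[0] >= median_cut],
        "top_25": [p for p in pairs if p[0] >= q3_cut],
    }


def within_test_cuts(raw_scores: List[float]) -> Tuple[float, float]:
    """Within-test median and 3rd quartile (the alternative considered for a future edition)."""
    _, median, q3 = statistics.quantiles(raw_scores, n=4)
    return median, q3


# Hypothetical example data; the cut points are made up for illustration.
pairs = [(12, 30), (18, 41), (25, 44), (31, 52), (37, 58), (44, 61)]
subsets = split_by_range(pairs, median_cut=25, q3_cut=37)
for name, subset in subsets.items():
    print(name, len(subset))
```

With test-independent cut points the subsets need not hold 50 % and 25 % of a given test's data, which is exactly the data-shortage problem that within-test quartiles would avoid.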

The mere fact of restricting the range like this depresses the g loading compared to computing it over the test's full range, so the partial values are expected to be lower than the full-range g loading. The point here, however, is to see (1) whether or not the top half and top 25 % loadings are consistently lower than the bottom half ones, and if so, (2) whether this is merely the result of restriction of range or an intrinsic property of g.

Explanation of restriction of range

Perhaps it should be explained here why restriction of range depresses g loadings. g loadings are computed from correlations between tests, and correlations in turn depend on (shared) variance: less variance means lower correlations and therefore lower g loadings. When you look at only part of a test's range, you reduce the variance compared to the full-range variance, and as a result obtain lower correlations between any variables there, and hence lower g loadings. This is the phenomenon of attenuation through restriction of range. It is responsible for the general tendency in social science for above-average samples (such as students or highly educated persons) to yield marked underestimates of the correlations between various behavioural variables, and thus marked underestimates of, for instance, the societal importance of general intelligence.

The attenuation of correlations through restriction of range is a purely statistical phenomenon, not a material, substantial, or intrinsic property of the variables being studied.
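The purely statistical nature of this attenuation can be illustrated with a small simulation. This is a sketch under idealized assumptions that are not taken from the report: two hypothetical tests each load 0.8 on a single normally distributed g, with the same factor structure at every ability level, so there is no differentiation by construction. In a one-factor model the correlation of two tests equals the product of their loadings, so an attenuated correlation implies an attenuated estimated loading. Selection is made on the first test's score, mirroring the split on the object test.

```python
import math
import random
import statistics
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def restricted_correlations(n: int = 100_000, loading: float = 0.8, seed: int = 1) -> Dict[str, float]:
    """Correlate two simulated one-factor tests within sub-ranges of test A.

    The factor structure is identical at every level (no differentiation),
    so any drop within a sub-range is due to restriction of range alone.
    """
    rng = random.Random(seed)
    unique_sd = math.sqrt(1 - loading ** 2)  # specificity plus error, unit total variance
    a: List[float] = []
    b: List[float] = []
    for _ in range(n):
        g = rng.gauss(0, 1)
        a.append(loading * g + unique_sd * rng.gauss(0, 1))
        b.append(loading * g + unique_sd * rng.gauss(0, 1))

    # Within-sample median and 3rd quartile of test A.
    _, median, q3 = statistics.quantiles(a, n=4)
    ranges = {
        "full": lambda x: True,
        "bottom_half": lambda x: x < median,
        "top_half": lambda x: x >= median,
        "top_25": lambda x: x >= q3,
    }
    result = {}
    for name, keep in ranges.items():
        xs = [x for x in a if keep(x)]
        ys = [y for x, y in zip(a, b) if keep(x)]
        result[name] = pearson(xs, ys)
    return result


for name, r in restricted_correlations().items():
    print(f"{name:12s} r = {r:.3f}")
```

For a bivariate normal with a full-range correlation of .64, theory gives roughly .45 within each half and roughly .38 within the top 25 %, and the simulation should land close to those values. In this symmetric, idealized setting the bottom and top halves are attenuated equally, which is why a consistent gap between them, as asked in point (1) above, is the more telling comparison.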

The results per test

For each test in the table, the loadings are based on all other tests in the database for which at least 4 score pairs with the object test exist. The tests are ordered by their full-range g loadings. The numbers of pairs are not given here, to avoid cluttering the table, but they range from many hundreds to over a thousand per test. The statistical reports of particular tests give an impression of the high numbers of score pairs currently available.
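The selection rule for comparison tests can be sketched as a simple filter. The threshold of 4 score pairs comes from the text; the data layout, the test names, and the helper name are hypothetical, and how the retained correlations are then combined into an estimated g loading is not described here, so the sketch stops at the selection step.

```python
from typing import Dict, List, Tuple

Pair = Tuple[float, float]  # (object-test score, other-test score)
MIN_PAIRS = 4  # threshold stated in the report


def usable_comparisons(pairs_by_test: Dict[str, List[Pair]], min_pairs: int = MIN_PAIRS) -> Dict[str, List[Pair]]:
    """Keep only the other tests sharing at least `min_pairs` score pairs with the object test."""
    return {name: pairs for name, pairs in pairs_by_test.items() if len(pairs) >= min_pairs}


# Hypothetical data for one object test.
pairs_by_test = {
    "Test A": [(20, 31), (25, 35), (30, 40), (33, 47), (40, 52)],
    "Test B": [(22, 18), (35, 29)],
    "Test C": [(18, 9), (24, 14), (29, 15), (41, 22)],
}
kept = usable_comparisons(pairs_by_test)
total_pairs = sum(len(p) for p in kept.values())
print(sorted(kept), total_pairs)
```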

Test | Full-range g | Bottom half g | Top half g | Top 25 % g
Test of the Beheaded Man | .87 | .81 | .69 | .69
Test For Genius - Revision 2016 | .87 | .80 | .72 | .70
Reflections In Peroxide | .86 | .83 | .68 | .50
Numerical and spatial sections of The Marathon Test | .86 | .83 | .71 | .58
Numerical section of The Marathon Test | .84 | .80 | .59 | .48
Spatial section of The Marathon Test | .83 | .79 | .66 | .62
The Bonsai Test - Revision 2016 | .83 | .79 | .62 | .59
Cooijmans Intelligence Test - Form 4 | .81 | .69 | .67 | .68
Combined Numerical and Spatial sections of Test For Genius - Revision 2016 | .81 | .76 | .56 | .75
The Test To End All Tests | .80 | .81 | .70 | .72
Test For Genius - Revision 2004 | .80 | .74 | .66 | .71
The Nemesis Test | .79 | .55 | .70 | .66
Problems In Gentle Slopes of the third degree | .79 | .77 | .45 | .85
Cooijmans Intelligence Test - Form 3 | .78 | .75 | .55 | .55
Associative LIMIT | .78 | .71 | .63 | .68
Combined Numerical and Spatial sections of Test For Genius - Revision 2010 | .78 | .86 | .57 | -.31
Cooijmans Intelligence Test 5 | .77 | .44 | .66 | .85
Cartoons of Shock | .77 | .75 | .69 | .58
Space, Time, and Hyperspace - Revision 2016 | .77 | .75 | .38 | .28
Gliaweb Riddled Intelligence Test - Revision 2011 | .77 | .71 | .62 | .61
The Final Test | .76 | .60 | .68 | .66
Long Test For Genius | .75 | .66 | .68 | .63
Cooijmans Intelligence Test - Form 2 | .75 | .71 | .62 | .73
Lieshout International Mesospheric Intelligence Test | .75 | .72 | .52 | .51
Numerical section of Test For Genius - Revision 2010 | .75 | .71 | .65 | .51
Short Test For Genius | .74 | .51 | .78 | .78
Verbal section of Test For Genius - Revision 2004 | .74 | .69 | .63 | .61
Spatial section of Test For Genius - Revision 2004 | .74 | .74 | .54 | .58
Reason Behind Multiple-Choice - Revision 2008 | .74 | .74 | .60 | .52
Analogies of Long Test For Genius | .73 | .59 | .54 | .49
Qoymans Multiple-Choice #5 | .73 | .67 | .57 | .66
The Sargasso Test | .72 | .58 | .52 | .64
Association subtest of Long Test For Genius | .69 | .28 | .62 | .46
Genius Association Test | .69 | .56 | .68 | .67
Reason - Revision 2008 | .65 | .68 | .32 | .26
Cooijmans Intelligence Test - Form 1 | .64 | .57 | .51 | .52
Space, Time, and Hyperspace | .64 | .65 | .53 | .47
Qoymans Multiple-Choice #2 | .60 | .43 | .64 | .52
Qoymans Multiple-Choice #4 | .59 | .53 | .57 | .40
Qoymans Multiple-Choice #3 | .57 | .38 | .40 | .19
Numbers | .62 | .38 | .37 | .38
Qoymans Multiple-Choice #1 | .33 | .28 | .51 | .66

Combined results

Set of tests | Median full-range g | Median bottom half g | Median top half g | Median top 25 % g
All | .755 | .71 | .62 | .585
Heterogeneous | .78 | .725 | .64 | .65
Verbal | .69 | .59 | .62 | .61
Spatial | .75 | .74 | .53 | .51
Numerical | .75 | .71 | .59 | .48
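The combined figures are medians, per column, over the tests in each set. A minimal sketch of that computation follows, assuming plain medians of the per-test values; the helper name is hypothetical, and the example rows are simply the first four rows of the per-test table, not any of the sets above. With an even number of tests the median is the mean of the two middle values, which is how values such as .755 and .725 arise.

```python
import statistics
from typing import Dict, List, Sequence

COLUMNS = ("full", "bottom_half", "top_half", "top_25")


def column_medians(rows: List[Sequence[float]]) -> Dict[str, float]:
    """Median of each g-loading column over a set of tests (one row per test)."""
    return {col: statistics.median([row[i] for row in rows]) for i, col in enumerate(COLUMNS)}


# Illustration only: the first four rows of the per-test table.
rows = [
    (0.87, 0.81, 0.69, 0.69),  # Test of the Beheaded Man
    (0.87, 0.80, 0.72, 0.70),  # Test For Genius - Revision 2016
    (0.86, 0.83, 0.68, 0.50),  # Reflections In Peroxide
    (0.86, 0.83, 0.71, 0.58),  # Numerical and spatial sections of The Marathon Test
]
print({k: round(v, 3) for k, v in column_medians(rows).items()})
```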

Over all tests combined, the top half g loading is lower than the bottom half one, and the top 25 % loading is lower still, which is all consistent with the "law of diminishing returns". However, this effect appears to be due mainly to the spatial and numerical tests; on the verbal tests, the top half loadings are even slightly higher than the bottom half loadings. On the heterogeneous tests, the top half loadings are only moderately lower than the bottom half ones, and in fact these tests have the highest median loadings of all sets in both the top half and the top 25 %. This last fact speaks against the "differentiation hypothesis" of g: it suggests that g does not break up into group factors and specificity at higher levels, and that the observed reduction of g loadings may be due to restriction of range only.

This test of the differentiation hypothesis will be repeated in the future with (even) more tests and more data, possibly with improvements to the method, and the tests themselves will be improved whenever possible. It would also be interesting to verify whether the result observed on the homogeneous spatial and numerical tests (top half and top 25 % loadings much lower than bottom half loadings) is inherent to those test types or unique to the particular set of spatial and numerical tests in this study. Sufficient data for other homogeneous spatial and numerical tests are not yet available to do this.

Conclusions

  1. There is a decrease of g loading going from the bottom half to the top half and top 25 % of the high range;
  2. The decrease of g loading is mainly due to homogeneous spatial and numerical tests, while on verbal tests there is a slight increase;
  3. On heterogeneous tests, the decrease of g loading is only moderate, and in fact heterogeneous tests have the highest median loadings of all test types in the top half and top 25 %;
  4. The "differentiation hypothesis" of g appears not to be supported by these results, which suggest that the decrease of g loading is caused only by the statistical phenomenon of restriction of range, in other words that it is an artefact;
  5. The strong decrease of g loading observed on homogeneous spatial and numerical tests in particular needs to be studied again with more or other spatial and numerical tests, to determine whether it is (A) an intrinsic property of those test types or (B) specific to the particular spatial and numerical tests in this report.