Test data structure

Introduction

To facilitate statistical processing, data from high-range I.Q. tests need to be stored in a structured way. Below, an example of such a structure is described in general terms.

Top level sections

Basic descriptive information records of each particular test or type of personal datum;
Candidate records;
Test submission records;
Test norm records;
Norming scale records.

The tests themselves, their correct answers, and the programs that need to be written to combine and process data from these sections are not considered here.

Brief description of the sections in tabular form

	Section 1.	Section 2.	Section 3.	Section 4.	Section 5.
# Units	1	1	1 for each test	1 for each test	1
# Records	1 for each test or personal datum	1 for each candidate	1 for each test submission	1 for each possible score	1 for each norm
Fields	Contain descriptive information on the test or personal datum	Contain personal information and test scores of the candidate	Contain (personal) information on that test submission, and the submission's item scores	Contain the norm for that score	Contain corresponding values on other scales for that norm
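
As a rough illustration of the table above, the five sections could be held in memory as shown below. This is a minimal sketch in Python, not a prescribed implementation; the class name DataStore, its attribute names, and the sample values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any

Record = dict[str, Any]  # one record: field name -> value


@dataclass
class DataStore:
    """Hypothetical container mirroring the five top-level sections."""
    # Section 1: a single unit, one record per test or personal datum.
    field_info: dict[str, Record] = field(default_factory=dict)
    # Section 2: a single unit, one record per candidate.
    candidates: dict[str, Record] = field(default_factory=dict)
    # Section 3: one unit per test, each holding one record per submission.
    submissions: dict[str, list[Record]] = field(default_factory=dict)
    # Section 4: one unit per test, mapping each possible raw score to a norm.
    test_norms: dict[str, dict[int, float]] = field(default_factory=dict)
    # Section 5: a single unit, one record per norm.
    norm_scales: dict[float, Record] = field(default_factory=dict)


# Invented sample data, for illustration only.
db = DataStore()
db.field_info["Test A"] = {"lowest": 0, "highest": 40}
db.submissions["Test A"] = [{"candidate": "c1", "score": 17}]
```

Sections 3 and 4 are keyed by test because they hold one unit per test; the other three sections are single units.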

The sections in more detail

1. Basic information records per test or personal datum

For each field (test or personal datum) that occurs in the Candidate records (2.), this section (1.) has a record containing descriptive information on that field, needed when producing statistical reports. Examples of data fields of the records of this section:

Title of the field (in 2.) being described;
Possibly a numerical index to help link the field to its units in 3. and 4.;
Lowest possible value (score) of the field;
Highest possible value;
Coded description of the field's (test's) contents type (coded to facilitate automatic processing);
Year or date when the field (test) was taken into use;
Year or date when the field (test) was phased out;
Code to indicate whether the field concerns a test currently in use (this is redundant if the "Year phased out" field is present in all records);
Code to indicate whether the field is compound (is a combination of two or more other fields), and possibly also which other fields it combines if so (compound fields are inherently redundant but may still be useful);
Remarks.

These fields are only filled in where applicable; some records may not need all of these fields.
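
The record layout listed above might be sketched as a Python dataclass in which every field except the title is optional. The class name FieldInfo and all attribute names are assumptions made for this example. The two redundant codes ("currently in use" and "compound") are derived here from other fields instead of being stored, which is only one possible choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldInfo:
    """Hypothetical section 1 record describing one field of section 2."""
    title: str
    index: Optional[int] = None             # optional link to units in 3. and 4.
    lowest: Optional[float] = None          # lowest possible value (score)
    highest: Optional[float] = None         # highest possible value
    contents_type: Optional[str] = None     # coded description of contents type
    year_introduced: Optional[int] = None
    year_phased_out: Optional[int] = None
    components: tuple[str, ...] = ()        # titles of combined fields, if compound
    remarks: str = ""

    @property
    def in_use(self) -> bool:
        # Derived: a field without a phase-out year is treated as still in use.
        return self.year_phased_out is None

    @property
    def is_compound(self) -> bool:
        # Derived: a field is compound when it lists component fields.
        return len(self.components) > 0


# Invented sample records, for illustration only.
test_a = FieldInfo(title="Test A", index=1, lowest=0, highest=40, year_introduced=2001)
combined = FieldInfo(title="Test A + Test B", components=("Test A", "Test B"))
```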

2. Candidate records

For each test candidate that occurs in the Test submission records (3.), this section (2.) has a record containing fields with personal information and a field for each test that has been taken. The latter fields indicate at least whether the test has been taken, but may also contain the score instead (even though the score is redundant as it can also be fetched from the Test submission records, where it in turn is redundant as well). The Candidate records theoretically have several hundred fields (because that many tests exist that may have been taken), but in most records no more than a few dozen fields are occupied (because most candidates have taken no more than a few dozen tests).

Redundant information like the test scores may be included to speed up processing. Statistical computations tend to be complex and require large numbers of simple calculations to be performed recursively. If basic data like scores have to be computed dynamically whenever needed, this may slow down the most complex computations greatly.
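
Because most test fields are empty in most records, one way to represent a candidate record is to store only the occupied fields. The sketch below assumes a plain dictionary per candidate with a "scores" mapping that holds the redundant copies of the scores; the function names, identifiers, and sample values are all hypothetical.

```python
def make_candidate(candidate_id, **personal):
    """Create a candidate record with no test fields occupied yet."""
    return {"id": candidate_id, "personal": personal, "scores": {}}


def record_score(candidate, test, score):
    """Store a redundant copy of the score so it need not be recomputed later."""
    candidate["scores"][test] = score


def has_taken(candidate, test):
    """A test counts as taken when its field is occupied."""
    return test in candidate["scores"]


def scores_on(candidates, test):
    """Collect the stored scores on one test across all candidates."""
    return [c["scores"][test] for c in candidates if test in c["scores"]]


# Invented sample data, for illustration only.
first = make_candidate("c1", sex="F", year_of_birth=1980)
record_score(first, "Test A", 17)
record_score(first, "Test B", 9)
second = make_candidate("c2", sex="M", year_of_birth=1975)
record_score(second, "Test A", 21)
```

With the scores stored in the candidate records, a computation over one test reads them directly and does not have to visit the item scores in section 3.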

Caution: Candidate records are privacy-sensitive, and this cannot be resolved by anonymizing the records, because a candidate's combination of scores on a number of tests is as unique and identifying as a fingerprint. Three scores, and probably even two, suffice to uniquely identify a candidate, even in the absence of any personal information at all.
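
The identification risk can be illustrated with a small sketch: given records stripped of all personal information, anyone who knows a few of a candidate's scores can filter the records on those scores. The function name and the three records below are invented for the example and prove nothing about real data; they only show the mechanism.

```python
def matching_records(records, known_scores):
    """Return the ids of records that agree with every known (test, score) pair."""
    return [
        record_id
        for record_id, scores in records.items()
        if all(scores.get(test) == score for test, score in known_scores.items())
    ]


# Invented "anonymized" records: no personal information, only scores.
anonymized = {
    "r1": {"Test A": 17, "Test B": 9, "Test C": 22},
    "r2": {"Test A": 17, "Test B": 12},
    "r3": {"Test A": 20, "Test B": 9, "Test C": 22},
}

one_known = matching_records(anonymized, {"Test A": 17})
two_known = matching_records(anonymized, {"Test A": 17, "Test B": 9})
```

In this toy data set one known score still leaves two matching records, while two known scores leave exactly one.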

3. Test submission records

This is a complex section which contains a unit (equivalent to a table) for each test. Each unit therefore corresponds to a record in section 1., a unit in section 4., and a field in section 2. Each unit contains a record for each submission to that test. These records contain fields such as:

Candidate (this corresponds to a record in section 2.);
Sex (this is redundant as it occurs also in 2.);
Age when test was taken (theoretically this is redundant as it could also be computed from the date taken in the current record and date of birth in section 2.);
Date when test was taken;
Score (this is redundant as it can also be computed "on the fly" from the item scores, but may be included still to speed up processing);
A field for each item, containing the item score (typically 0 or 1, but anything is possible);
In case of simple multiple-choice tests, the actual given answers may be included in a field for each item; in that case, the item score fields become redundant as the item scores can be computed dynamically if one stores the answer key in a separate unit in this section (but high-range tests are mostly not of this kind; and dynamic scoring while computing statistics may also slow down complex calculations).
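
The redundant fields listed above can each be derived from other data, as the following sketch shows. It assumes that the total score is the plain sum of the item scores and that a multiple-choice item scores 1 when the given answer equals the key and 0 otherwise; other scoring rules are possible. All function names and sample values are hypothetical.

```python
from datetime import date


def total_score(item_scores):
    """Compute the score "on the fly" as the sum of the item scores (assumed rule)."""
    return sum(item_scores)


def score_items(answers, answer_key):
    """Derive item scores from given answers; assumes both lists have equal length."""
    return [1 if given == correct else 0 for given, correct in zip(answers, answer_key)]


def age_when_taken(date_taken, date_of_birth):
    """Compute age in whole years from the date taken and the date of birth."""
    years = date_taken.year - date_of_birth.year
    # Subtract one year if the birthday had not yet occurred in the year taken.
    if (date_taken.month, date_taken.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Invented submission record, for illustration only.
submission = {
    "candidate": "c1",
    "date_taken": date(2010, 5, 1),
    "item_scores": [1, 0, 1, 1],
}
# Redundant copies, stored to avoid recomputation during statistics.
submission["score"] = total_score(submission["item_scores"])
submission["age"] = age_when_taken(submission["date_taken"], date(1980, 6, 15))
```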

4. Test norm records

This complex section contains a unit for each test. Such a unit contains a record for every possible raw score on the test in question, providing a norm, either from a table or by means of a formula, or a combination thereof. Every norm corresponds to a record in 5.
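
A unit of this section might be built as follows. The sketch takes norms from a table where one is given and falls back on a formula elsewhere, which is one way to read "a combination thereof". The function names, the table entries, and the linear formula are invented for the example and have no psychometric meaning.

```python
def make_norm_lookup(table=None, formula=None):
    """Return a function giving the norm for a raw score: table first, then formula."""
    table = dict(table or {})

    def norm_for(raw_score):
        if raw_score in table:
            return table[raw_score]
        if formula is not None:
            return formula(raw_score)
        raise KeyError(f"no norm available for raw score {raw_score}")

    return norm_for


def build_norm_unit(lowest, highest, norm_for):
    """Create one record (raw score -> norm) for every possible raw score."""
    return {raw: norm_for(raw) for raw in range(lowest, highest + 1)}


# Invented values: two tabulated norms, and a made-up linear formula for the rest.
norm_for = make_norm_lookup(table={0: 100, 1: 104}, formula=lambda raw: 100 + 3 * raw)
test_a_norms = build_norm_unit(0, 3, norm_for)
```

The lowest and highest possible scores passed to build_norm_unit would come from the test's record in section 1.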

5. Norming scale records

This section contains a single unit with one record for each norm, whichever type of norm one uses. Each record contains that norm and the corresponding values on a few other scales, such as proportions outscored with regard to certain populations, I.Q., and so on.
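
The link from a raw score through its norm (section 4.) to the values on other scales (section 5.) can be sketched as two lookups. The sketch rests on strong assumptions that are not part of the structure itself: the norm is taken to be a z-score, the I.Q. scale is taken to have mean 100 and standard deviation 15, and the proportion outscored is taken from a normal distribution. All names and values are hypothetical.

```python
from statistics import NormalDist

# Assumption for illustration only: norms are z-scores on a standard normal scale.
STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def scale_record(z):
    """Build one section 5 record: the norm plus its values on other scales."""
    return {
        "norm": z,
        "iq_sd15": 100 + 15 * z,                          # assumed I.Q. scale
        "proportion_outscored": STANDARD_NORMAL.cdf(z),   # assumed normal population
    }


# Section 5: one record per norm (invented norms 0.0 to 4.0).
norm_scales = {float(z): scale_record(float(z)) for z in range(0, 5)}

# Section 4 unit for one test: raw score -> norm (invented values).
test_a_norms = {0: 0.0, 1: 1.0, 2: 2.0}


def scales_for_raw_score(norm_unit, raw_score):
    """Follow the chain raw score -> norm (section 4.) -> scale values (section 5.)."""
    norm = norm_unit[raw_score]
    return norm_scales[norm]


record = scales_for_raw_score(test_a_norms, 1)
```

Storing the scale values once per norm, and not once per test, means that a change to a scale affects only this section.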