Test data structure

© January 2012 Paul Cooijmans

Introduction

To facilitate statistical processing, data from high-range I.Q. tests need to be stored in a structured way. Below, an example of such a structure is described in general terms.

Top level sections

  1. Basic descriptive information records of each particular test or type of personal datum;
  2. Candidate records;
  3. Test submission records;
  4. Test norm records;
  5. Norming scale records.

The tests themselves, their correct answers, and the programs that need to be written to combine and process data from these sections are not considered here.

Brief description of the sections in tabular form

| | Section 1. | Section 2. | Section 3. | Section 4. | Section 5. |
|---|---|---|---|---|---|
| # Units | 1 | 1 | 1 for each test | 1 for each test | 1 |
| # Records | 1 for each test or personal datum | 1 for each candidate | 1 for each test submission | 1 for each possible score | 1 for each norm |
| Fields | Contain descriptive information on the test or personal datum | Contain personal information and test scores of the candidate | Contain (personal) information on that test submission, and the submission's item scores | Contain the norm for that score | Contain corresponding values on other scales for that norm |
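As an illustration only, the five sections could be sketched in Python as follows. All class names, attribute names, and values here are hypothetical and not part of the structure described in this article. The sketch also assumes that items are scored 0 or 1, so that a three-item test has four possible raw scores.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDescriptor:
    """Section 1: one record per test or personal datum."""
    name: str
    description: str = ""


@dataclass
class Candidate:
    """Section 2: one record per candidate."""
    candidate_id: int
    personal: dict = field(default_factory=dict)  # personal datum name -> value
    tests: dict = field(default_factory=dict)     # test name -> score (or just True for "taken")


@dataclass
class Submission:
    """Section 3: one record per test submission."""
    candidate_id: int
    item_scores: list = field(default_factory=list)


@dataclass
class DataStore:
    """The five top level sections (hypothetical container)."""
    descriptors: dict = field(default_factory=dict)  # 1: one unit, field name -> FieldDescriptor
    candidates: dict = field(default_factory=dict)   # 2: one unit, candidate id -> Candidate
    submissions: dict = field(default_factory=dict)  # 3: one unit per test, test name -> [Submission]
    test_norms: dict = field(default_factory=dict)   # 4: one unit per test, test name -> {raw score: norm}
    norm_scales: dict = field(default_factory=dict)  # 5: one unit, norm -> {scale name: value}


# Made-up example content; the norms are placeholders, not real norms.
store = DataStore()
store.descriptors["Example Test"] = FieldDescriptor("Example Test", "hypothetical test")
store.candidates[1] = Candidate(1, tests={"Example Test": 2})
store.submissions["Example Test"] = [Submission(1, [1, 0, 1])]
store.test_norms["Example Test"] = {0: 0, 1: 1, 2: 2, 3: 3}
store.norm_scales = {norm: {} for norm in range(4)}
```

Sections 3 and 4 are keyed by test name, which mirrors the "1 for each test" entries in the # Units row; the other sections each form a single unit.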

The sections in more detail

1. Basic information records per test or personal datum

For each field (test or personal datum) that occurs in the Candidate records (2.), this section (1.) has a record containing descriptive information on that field, needed when producing statistical reports. Examples of data fields of the records of this section:

These fields are only filled in where applicable; some records may not need all of these fields.

2. Candidate records

For each test candidate that occurs in the Test submission records (3.), this section (2.) has a record containing fields with personal information, and a field for each test that has been taken. The latter fields indicate at least whether the test has been taken, but may also contain the score instead (even though the score is redundant, as it can also be fetched from the Test submission records, where it is in turn redundant as well). The Candidate records theoretically have several hundred fields (because that many tests exist that may have been taken), but in most records no more than a few dozen fields are occupied (because most candidates have taken no more than a few dozen tests).

Redundant information like the test scores may be included to speed up processing. Statistical computations tend to be complex and require large numbers of simple calculations to be performed recursively. If basic data like scores have to be computed dynamically whenever needed, this may slow down the most complex computations greatly.
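A minimal sketch of such redundant storage follows. It assumes, purely for illustration, that a raw score is the sum of the item scores; the function names and the dictionary layout are hypothetical. The score is computed once and then copied into both the submission record and the candidate record.

```python
def raw_score(item_scores):
    # Assumption for this sketch: the raw score is the sum of the item scores.
    return sum(item_scores)


def cache_scores(candidates, submissions):
    """Store each submission's score redundantly in sections 3 and 2."""
    for test, records in submissions.items():
        for record in records:
            record["score"] = raw_score(record["item_scores"])   # section 3
            candidates[record["candidate_id"]][test] = record["score"]  # section 2


# Made-up data: two candidates, one hypothetical test.
candidates = {1: {}, 2: {}}
submissions = {
    "Example Test": [
        {"candidate_id": 1, "item_scores": [1, 0, 1, 1]},
        {"candidate_id": 2, "item_scores": [0, 0, 1, 0]},
    ]
}
cache_scores(candidates, submissions)
print(candidates)  # {1: {'Example Test': 3}, 2: {'Example Test': 1}}
```

In this simplified version a later submission by the same candidate to the same test would overwrite the cached score; a real system would need a rule for repeated submissions.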

Caution: Candidate records are privacy-sensitive, and this cannot be resolved by anonymizing the records, because a candidate's combination of scores on a number of tests is as unique and identifying as a fingerprint. Only three, probably even two, scores suffice to uniquely identify a candidate, even in the absence of any personal information at all.
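The mechanism can be illustrated with a small sketch on made-up data. The function name and the scores are hypothetical, and the toy data only show how score combinations single out candidates; they prove nothing about real data. The function returns the candidates whose scores on some combination of k tests are shared with no other candidate.

```python
from collections import Counter
from itertools import combinations


def identifiable(candidates, k):
    """Return ids of candidates singled out by some combination of k scores."""
    tests = sorted({test for scores in candidates.values() for test in scores})
    found = set()
    for combo in combinations(tests, k):
        # Score signature of every candidate who took all tests in this combination.
        signatures = {
            cid: tuple(scores[test] for test in combo)
            for cid, scores in candidates.items()
            if all(test in scores for test in combo)
        }
        counts = Counter(signatures.values())
        found.update(cid for cid, sig in signatures.items() if counts[sig] == 1)
    return found


# Anonymized toy records: no personal information, only scores on tests "A" and "B".
candidates = {
    1: {"A": 10, "B": 20},
    2: {"A": 10, "B": 21},
    3: {"A": 11, "B": 20},
    4: {"A": 11, "B": 21},
}
print(sorted(identifiable(candidates, 1)))  # []
print(sorted(identifiable(candidates, 2)))  # [1, 2, 3, 4]
```

In the toy data no single score identifies anyone, while every pair of scores is unique.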

3. Test submission records

This is a complex section that contains a unit (equivalent to a table) for each test. Each unit therefore corresponds to a record in section 1., a unit in section 4., and a field in section 2. Each unit contains a record for each submission to that test. These records contain fields such as:

4. Test norm records

This complex section contains a unit for each test. Such a unit contains a record for every possible raw score on the test in question, providing a norm, either from a table or by means of a formula, or a combination thereof. Every norm corresponds to a record in 5.
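One way to build such a unit from a table, a formula, or both is sketched below. The function names, the table entries, and the formula are made up for illustration and are not real norms. The sketch assumes integer raw scores from 0 up to a maximum, and lets table entries take precedence over the formula.

```python
def make_norm_lookup(table=None, formula=None):
    """Return a function mapping a raw score to a norm.

    Table entries take precedence; the formula covers the remaining scores.
    """
    table = dict(table or {})

    def norm_for(raw):
        if raw in table:
            return table[raw]
        if formula is not None:
            return formula(raw)
        raise KeyError(f"no norm defined for raw score {raw}")

    return norm_for


def build_norm_unit(max_score, norm_for):
    """One record (raw score -> norm) for every possible raw score."""
    return {raw: norm_for(raw) for raw in range(max_score + 1)}


# Placeholder table and formula, chosen arbitrarily.
norm_for = make_norm_lookup(table={0: 0, 1: 3}, formula=lambda raw: 2 * raw + 1)
unit = build_norm_unit(4, norm_for)
print(unit)  # {0: 0, 1: 3, 2: 5, 3: 7, 4: 9}
```

Materializing the unit once, rather than calling the formula on every lookup, is in line with the earlier remark on storing redundant data to speed up processing.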

5. Norming scale records

This section contains a single unit that holds, for each norm (whichever type of norms one uses), a record containing that norm and the corresponding values on a few other scales, such as proportions outscored with regard to certain populations, I.Q., and so on.
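The link between sections 4 and 5 can be sketched as a two-step lookup, together with a check that every norm used in section 4 has a record in section 5. The function names, scale names, and all values below are hypothetical placeholders.

```python
def missing_norms(test_norms, norm_scales):
    """Norms that occur in section 4 but have no record in section 5."""
    return {
        norm
        for unit in test_norms.values()
        for norm in unit.values()
        if norm not in norm_scales
    }


def scale_values(test, raw, test_norms, norm_scales):
    """Raw score -> norm (section 4) -> values on other scales (section 5)."""
    return norm_scales[test_norms[test][raw]]


# Placeholder data: norm 5 deliberately has no record in section 5.
test_norms = {"Example Test": {0: 0, 1: 3, 2: 5}}
norm_scales = {
    0: {"scale X": 1, "scale Y": 10},
    3: {"scale X": 2, "scale Y": 20},
}
print(missing_norms(test_norms, norm_scales))                    # {5}
print(scale_values("Example Test", 1, test_norms, norm_scales))  # {'scale X': 2, 'scale Y': 20}
```

Running such a check after editing either section helps keep the correspondence between norms and scale records intact.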
