Stats & Glossary
Erich R. Merkle, M.A., M.Ed.
School Psychologist
School Psychology Program
Kent State University



Glossary of Terms & Statistical Properties
Related to Assessing Young Children


Statistical Properties:

  • Reliability:  refers to the consistency of measurements taken by a given test.  Test results need to be reproducible, stable, and meaningful.  Reliability is usually reported as a reliability coefficient, a number ranging from 0.00 to 1.00; the closer a test's reliability coefficient is to 1.00, the more consistent and stable its scores.  There are several different types of reliability, including:

    • Test-Retest Reliability:  this value gives an index of stability over time.  For a preschool test, test-retest reliability should ideally be at least 0.90 over a 2- to 6-week interval.

    • Alternate Form Reliability:  this value, also called equivalent or parallel form reliability, indicates the degree to which two forms of a test are equivalent.

    • Internal Consistency Reliability:  this value indicates the degree to which the items on a test consistently measure the same underlying construct.

  • Validity:  refers to the extent to which a test measures what it is supposed to measure.  Tests are valid only for the specific purposes for which they were designed; validity is a matter of degree and depends on the context in which a test is used.  Like reliability, validity takes several forms, including:

    • Content Validity:  refers to whether items on a test are representative of the domain or attribute the test is supposed to measure.

    • Criterion-Related Validity:  refers to the relationship between the test scores and some other criterion or outcome.  There are two further types of criterion-related validity:

      • Concurrent Validity:  the extent to which scores on a test are related to scores on another measure of the same criterion obtained at about the same time.

      • Predictive Validity:  refers to whether scores obtained on the test accurately predict future performance on a criterion measure.  Ideally, preschool tests should measure the skills that best predict later intellectual or academic abilities.

    • Construct Validity:  refers to the extent to which a test actually measures the theoretical construct or trait (for example, intelligence) it is intended to measure.

    Sources:  Sattler, J. M. (1992).  Assessment of children (Rev. and updated 3rd ed.).  San Diego:  Jerome M. Sattler, Publisher.

    Bracken (1988), adapted by Caven S. McLoughlin.
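
As a rough illustration of how the reliability coefficients above are computed, the Python sketch below estimates test-retest (or alternate-form) reliability as the Pearson correlation between two sets of scores from the same children, and internal consistency as Cronbach's coefficient alpha, one common internal-consistency index. The function names and the sample scores are hypothetical, chosen only for illustration; test publishers may use other estimators (for example, split-half coefficients).

```python
import math
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    For test-retest reliability, x and y are the same children's scores on two
    administrations; for alternate-form reliability, their scores on two forms.
    Assumes the scores in each list vary (otherwise the denominator is zero).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(item_scores):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total-score variance).

    item_scores[i][j] is child i's score on item j. Population variances are
    used for both terms; the ratio is the same with sample variances.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    items = list(zip(*item_scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical scores for five children tested twice, a few weeks apart.
    first = [85, 92, 100, 108, 115]
    second = [88, 90, 102, 105, 118]
    print("test-retest r:", round(pearson_r(first, second), 3))
```

Because alpha compares the sum of the item variances with the variance of the total score, it approaches 1.00 when the items rise and fall together across children, which is what "consistently measure the same construct" means in practice.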


Other Technical Terms:

  • Ability Test (IQ test):  a type of test used to measure ability in a given domain; the most common example is an intelligence (IQ) test, which measures general intellectual ability.

  • Achievement Test:  a type of test used to measure knowledge or skills in one or more academic domains.

  • Age Equivalent:  a type of derived score that represents the chronological age corresponding to a given raw score value.  It does not mean a child is functioning at that age; it only means the child earned the same raw score as the typical child of that age in the norm sample.

  • Ceiling:  the highest score obtainable on a given test; ideally, a preschool test's ceiling extends at least 2 standard deviations above the mean (+2 SD), with +3 or +4 SD preferred.

  • Correlation:  the degree of relationship between two variables, usually expressed as a correlation coefficient ranging from -1.00 to +1.00.  The closer the coefficient is to either -1.00 or +1.00, the stronger the relationship; a value near 0.00 indicates little or no linear relationship.  Correlations can be positive, meaning that a high score on one variable predicts a high score on the other, or negative, meaning that a high score on one variable predicts a low score on the other.

  • Floor:  the lowest score obtainable on a given test; ideally, a preschool test's floor extends at least 2 standard deviations below the mean (-2 SD), with -3 or -4 SD preferred.

  • Grade Equivalent:  a type of derived score that represents the school grade corresponding to a given raw score value.  It does not mean a child is performing at that grade level; it only signifies that the child earned the same raw score as the typical child in that grade in the norm sample.

  • Mean:  the arithmetic average of a distribution of scores, found by adding all the scores and dividing by the number of scores.

  • Normal Curve:  a symmetrical, bell-shaped distribution of scores in which scores cluster most frequently around the mean and become increasingly infrequent toward the tails.  Scores on most norm-referenced tests are assumed to follow a normal distribution.

     
  • Norm Tables:  tables, organized by chronological age or school grade, that list the standard score corresponding to each raw score on a test.  For tables organized by age, preschool tests should ideally use 1- to 2-month age intervals rather than 3- to 4-month intervals.

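  • Percentile Rank:  a type of derived score that indicates a child's relative position within a distribution of scores, expressed as the percentage of the norm group scoring at or below the child's score.  For example, a percentile rank of 75 means the child scored as well as or better than 75% of the norm group.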

  • Standard Deviation:  a measure of how widely scores on a test are spread around the mean, computed as the square root of the average squared difference between each score and the mean.

  • Standard Scores:  raw scores that have been transformed onto a scale with a fixed mean and standard deviation.  This transformation makes it possible to compare scores across different tests.  Most tests report results in one or more of the following types of standard score:

    • Stanford-Binet IQ Scores (Fourth Edition and earlier):  Mean of 100 and Standard Deviation of 16.

    • Scaled Score:  Mean of 10 and Standard Deviation of 3.

    • T-Scores:  Mean of 50 and Standard Deviation of 10.

    • Wechsler IQ Scores:  Mean of 100 and Standard Deviation of 15.

    • Z-Score:  Mean of 0 and Standard Deviation of 1.

  • Stanine:  a contraction of "standard nine"; expresses a score as a whole number from 1 to 9, with a mean of 5 and a standard deviation of 2.

  • Raw Score:  the number of items a child answered correctly on a test.  Because raw scores have little meaning by themselves, they are typically converted into one or more standard scores.
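
The definitions of mean, standard deviation, and standard scores fit together through one linear transformation: a raw score is first expressed as a z-score, z = (raw score - mean) / SD, and the z-score is then rescaled to any other metric as new mean + z × new SD. The Python sketch below illustrates the arithmetic; the names `SCALES`, `z_score`, and `to_standard` and the small norm group are hypothetical, and using the population formula for the standard deviation is a simplifying assumption. In practice, examiners should convert raw scores with the test's published norm tables, which may be based on normalized rather than linearly transformed scores.

```python
from statistics import mean, pstdev

# (mean, standard deviation) of each standard-score metric in the glossary.
SCALES = {
    "z": (0, 1),
    "T": (50, 10),
    "scaled": (10, 3),
    "wechsler_iq": (100, 15),
    "stanford_binet_iq": (100, 16),  # Fourth Edition and earlier
}


def z_score(raw, norm_mean, norm_sd):
    """Number of SDs a raw score lies above (+) or below (-) the norm-group mean."""
    return (raw - norm_mean) / norm_sd


def to_standard(z, scale):
    """Re-express a z-score on another standard-score metric."""
    new_mean, new_sd = SCALES[scale]
    return new_mean + z * new_sd


if __name__ == "__main__":
    # Hypothetical raw scores from a small norm group.
    norm_group = [12, 15, 18, 18, 21, 24]
    m, sd = mean(norm_group), pstdev(norm_group)  # population SD of the group
    z = z_score(24, m, sd)
    for scale in SCALES:
        print(scale, round(to_standard(z, scale), 1))
```

With this transformation the same relative standing reads differently on each scale: a child one standard deviation above the mean earns a z-score of 1, a T-score of 60, a scaled score of 13, and a Wechsler IQ of 115.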

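Under the normal-curve assumption described in the glossary, a z-score can be converted to an approximate percentile rank with the normal cumulative distribution function, and to a stanine using the conventional half-standard-deviation bands (stanine 5 covers z-scores from -0.25 up to +0.25, and stanines 1 and 9 take in the tails). The Python sketch below uses the standard library's `statistics.NormalDist` (Python 3.8 or later); the function names are hypothetical, the handling of scores that fall exactly on a band boundary is an arbitrary choice, and a test's own norm tables should take precedence over these approximations.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percentile_rank(z):
    """Approximate percentage of a normal distribution falling at or below z."""
    return 100 * STANDARD_NORMAL.cdf(z)


def stanine(z):
    """Map a z-score to a stanine (1-9, mean 5, SD 2).

    Stanines 2-8 are half-SD bands centered on the mean (stanine 5 covers
    -0.25 <= z < 0.25); stanines 1 and 9 absorb the tails.
    """
    return max(1, min(9, math.floor(2 * z + 5.5)))


if __name__ == "__main__":
    # A Wechsler-type IQ of 85 lies one SD below the mean.
    z = (85 - 100) / 15
    print(f"percentile rank = {percentile_rank(z):.1f}, stanine = {stanine(z)}")
```

For example, a Wechsler-type IQ of 85 (z = -1) falls at roughly the 16th percentile and in stanine 3.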



These pages were designed to fulfill a course requirement in
Developmental Assessment at Kent State University.
Please contact Erich Merkle with any comments or questions.