Searching for the Truth About
the TIMSS 4th Grade Math Test
Did The United States Deserve
the "Above Average" Ranking?
Although the United States ranked below the
mean for the TIMSS 8th grade math test and near the bottom for the
TIMSS 12th grade math test, our 4th graders appeared to save the day with
an "above average" performance. But a careful reading of TIMSS
documentation casts serious doubt on the validity of this ranking.
The TIMSS 4th Grade Math Test: Background Information
TIMSS refers to the Third International Mathematics and Science Study.
Although conventionally referred to as "the TIMSS 4th grade math test",
the test was targeted at nine-year-olds and actually given to 3rd and
4th graders in most countries, including the United States.
Twenty-six countries participated, with approximately 150 schools per
country.
In the United States 182 schools with 7,296 students participated
for grade 4, and 186 schools with 3,819 students participated for grade
3. Private schools were included. How many? They don't
say.1 (See Reference Notes)
The TIMSS 4th grade math test was given during the period September,
through May, 1995.
There were actually 8 different tests, with some questions occurring in
Test items were grouped into 26 mutually exclusive test item "clusters".
There were 8 test booklets, each containing 7 clusters.
The total test time, for both math and science, was 64 minutes.
Math was emphasized in 4 booklets, with 34, 37, 37, and 38 minutes for math.
Science was emphasized in 4 booklets, with 18, 19, 19, and 24 minutes for math.
For all 8 booklets, the average time for math was 28.25 minutes.
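The 28.25 figure can be checked with a short calculation. This is only a sketch, and it assumes that the minutes listed for the four science-emphasis booklets are minutes of math, since that reading is the one that reproduces the stated average. The variable names are illustrative and do not come from the TIMSS documentation.

```python
# Minutes of math per booklet, as listed above (8 booklets in total).
math_emphasis_minutes = [34, 37, 37, 38]     # booklets that emphasized math
science_emphasis_minutes = [18, 19, 19, 24]  # assumed: math minutes in science-emphasis booklets
TOTAL_TEST_MINUTES = 64                      # math and science combined

all_math_minutes = math_emphasis_minutes + science_emphasis_minutes
average_math_minutes = sum(all_math_minutes) / len(all_math_minutes)

# Whatever time is not spent on math is left for science.
average_science_minutes = TOTAL_TEST_MINUTES - average_math_minutes

print(average_math_minutes)     # 28.25
print(average_science_minutes)  # 35.75
```

On this reading, the average booklet gave less than half of the 64 minutes to math.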
The TIMSS International Study Center, located at Boston College, was responsible
for supervising all aspects of the design and implementation of the 4th
grade math test.
The US TIMSS Steering Committee included Dr. John Dossey and Dr. Mary
both past presidents of the National Council of Teachers of Mathematics
There was one National Research Coordinator (NRC) per country. The
NRC for the United States was Dr. William H. Schmidt, a professor
of counseling, educational psychology, and special education in the College
of Education at Michigan State University.
Note that Professor Schmidt recently attacked the new California math standards
as "most disappointing, a 'back-to-basics' document that emphasizes
memorization and computations." On the other hand, in their
Report on State Math Standards, Ralph Raimi and Larry Braden ranked California's
math standards as the best in the nation, with the only perfect
score of 16 points.
Professor Harold W. Stevenson, co-author of The Learning Gap,
was the Director of the TIMSS Case Studies Project. He was not involved
with the TIMSS 4th grade math test.
United States 100, Singapore 74
No, this doesn't refer to test results. It's the
percentage of TIMSS test questions that each country considered appropriate
to its 4th grade curriculum. Only 74% of the 4th grade math
test questions were "considered appropriate" by Singapore, while
100% were approved by the United States. The percentage for Korea was
lower at 43%. Yet Singapore and Korea ranked first and
second on the TIMSS 4th grade math test, scoring "significantly higher"
than the United States.
It's clear that "constructivist" and "traditional" views clashed
over the development of the pool of questions for the TIMSS 4th grade math
test. Even with such weak criteria as "appropriate for at least
70% of the countries" and "recommended for deletion by less than
30% of the countries", the total number of acceptable test items
was continually found to be inadequate. Finally, due to "very tight
deadlines for test production", The Educational Testing Service (United
States) and SRI International (United States) were both hired to
Above Average? It's All Relative!
For the 8th grade math test, 20 nations scored "significantly higher" than
the United States. Only 12 of these nations competed at the 4th grade level,
and 11 of them again scored higher than the United States.6
What happened to the other 8 "significantly higher" nations?
It's never explained. But perhaps they didn't like test questions
that were 100% approved only by the United States. Whatever the reason,
the fact that they were missing clearly helped to boost the relative ranking
of the United States.
The Spin From the National Center for Education Statistics (NCES)
"TIMSS is a fair comparison of achievements for several reasons.
the test was jointly developed and carefully reviewed by the
countries to ensure that the items reflected curriculum topics
important in all countries, and did not over-emphasize the curriculum
taught in only a few. Second, international monitors carefully
nations' adherence to guidelines to ensure that significant numbers of
students were not excluded from the test process for any reason".7
The Truth About Exclusions: The United States excluded
412 fourth grade students, more than any other nation. Canada,
mesmerized by constructivism, was next with 268. The other
24 nations excluded a combined total of 804 fourth graders. The
nations that scored "significantly higher" than the United States had a
combined total of 11 exclusions, with none for the three highest
scoring countries, Singapore, Korea, and Japan.8
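To put the exclusion counts in proportion, the following sketch computes each group's share of all excluded fourth graders, using only the three figures quoted above. The dictionary keys and variable names are illustrative, and the calculation says nothing about exclusion rates relative to each nation's sample size, which are not given here.

```python
# Excluded fourth graders, as quoted above.
excluded = {
    "United States": 412,
    "Canada": 268,
    "Other 24 nations combined": 804,
}

total_excluded = sum(excluded.values())

# Share of all exclusions, in percent, rounded to one decimal place.
shares = {name: round(100 * count / total_excluded, 1)
          for name, count in excluded.items()}

# Average number of exclusions per nation among the other 24.
average_other = excluded["Other 24 nations combined"] / 24

print(total_excluded)  # 1484
print(shares)
print(average_other)   # 33.5
```

By these figures the United States alone accounts for more than a quarter of all exclusions, and its count is roughly twelve times the average of the other 24 nations.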
"ARE U.S. FOURTH-GRADE TEACHERS BETTER TRAINED THAN THEIR COUNTERPARTS
IN OTHER TIMSS COUNTRIES? The profile of a typical U.S. teacher
of fourth graders is similar to that of teachers in most other TIMSS countries:
a woman at least 40 years old with more than 10 years of teaching
teachers of U.S. fourth graders have more university training than
counterparts in most TIMSS countries." 9 (Caps
and bold emphasis in the original)
The Truth About Teacher Training: There was no mention of
The Spin From the TIMSS International Study Center, Boston College
"Before giving results, I believe that it is important to note that
the TIMSS study was conducted with great attention to quality at
every step of the way. Rigorous procedures were designed
to translate the tests, and numerous regional training sessions were held
in data collection and scoring. Quality control monitors observed
testing sessions (emphasis added) and reported back to the International
Study Center at Boston College". 10 - Dr. Albert
Beaton, TIMSS Study Director
The Truth About Test Observations:
There was one quality control monitor per country.11
Each country selected its own quality control monitor.12
Plans called for visiting 10 schools per country, and "the schools
for classroom observation had to be within easy traveling distance of the
quality control monitor's home".13
In 10 of 26 countries there were no observations of testing sessions. In
particular, "in the United States, the quality control monitor was
indisposed, and was unable to conduct the classroom observations". 14
What About Test Security?
Although there were many lip service efforts at test security, the TIMSS
documentation makes it clear that anyone who wanted an advance copy of the
test would have no difficulty getting it:
There was no attempt to give the test on the same day, either within a
specific country or across all countries.
There is no mention of any attempt to maintain exact counts of test booklets.
Under "Verification of the supply of test books", only 77% of test
had "definitely verified" that there was an "adequate supply
of test books prior to test administration".15
The decision to use booklet seals was left to each participating country,
and more than half of the countries failed to use them.16
"In many cases, test booklets were left on students' desks between
Printers, school officials, and data entry clerks all had easy access
to test questions.
Defects With Booklets and Supplies, But Probably Not in the United States
"Frequent problems reported with respect to the test booklets were
pages, blank pages, and duplicated pages, in addition to problems
with item translation. Several NRCs commented that the timeline did not
allow for a more thorough review of the test booklets before printing
"The Test Administrator had an adequate supply of pencils and other
necessary materials ready for the students in 77% of the
However, in many countries it is the responsibility of the student to bring
pencils, pens, etc. to testing situations, so it is not necessarily the
case that there was an inadequate supply in almost a quarter of the
"In 6% of the observed sessions (emphasis added), defective
booklets were identified and replaced before the session began. In
a further 6% of sessions, defective booklets were found and replaced after
the session began. On most of the occasions where booklets needed
replacement, the Test Administrator replaced them appropriately.
Occasionally booklets were not replaced because of a lack of spare
Difficulties With Translation, But Not in the United States
The TIMSS 4th grade math test was prepared in the United States and
translated from English (United States version) into 17 additional languages
for the 26 participating countries. Although the TIMSS documentation
shows that serious efforts were made to ensure accurate translations and
appropriate cultural adaptations, it's clear that these efforts weren't
totally successful and probably couldn't be totally successful, especially
when one considers the limited resources used, the complex subtlety of
cultural adaptation, and the lack of precision in the source English version.
Each country was supposed to engage two translators for the math test
but "due to limited resources, some countries were unable to engage
more than one translator" for math. 21
Translators were not required to have any knowledge of mathematics. 22
Several National Research Coordinators (NRCs) "commented on the difficulty
of making exact translations of single words or phrases from English into
the country's language".23
"Some NRCs reported that due to a lack of time they could not have
translations verified prior to printing the test booklets."24
Never Mind Translation, What Does It Mean in English?
Here are examples of ambiguous TIMSS 4th grade questions. These
are "free response", not multiple choice questions. (See Reference
3 for all TIMSS 4th grade math questions)
Question T2: What is the smallest whole number that you
can make using the digits 4, 3, 9, and 1? Use each digit only once.
The expected answer is the 4-digit number 1349. But notice that
the question didn't request a 4-digit number. How about 1 = 9 - (4
+ 3 + 1)? The answer "1" is given the incorrect response code
71, and listed as the second possible "incorrect response".
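Both readings of Question T2 can be checked mechanically. The sketch below enumerates every 4-digit arrangement of the digits to confirm the expected answer, then evaluates the alternative expression that also uses each digit exactly once. The variable names are illustrative.

```python
from itertools import permutations

digits = "4391"

# Reading 1: arrange all four digits into a single 4-digit number.
four_digit_numbers = [int("".join(p)) for p in permutations(digits)]
smallest_four_digit = min(four_digit_numbers)

# Reading 2: combine the digits with arithmetic, using each digit once.
alternative = 9 - (4 + 3 + 1)

print(len(four_digit_numbers))  # 24 arrangements
print(smallest_four_digit)      # 1349
print(alternative)              # 1
```

Under the second reading the result is a smaller whole number than 1349, which is the point of the objection.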
Question U5: Addition Fact: 4+4+4+4+4 = 20. Write
this addition fact as a multiplication fact.
The expected answer is 4x5 = 20 or 5x4 = 20. The answer 2x10 =
20 or 10x2 = 20 is given the incorrect response code of 72, and listed
as the third possible "incorrect response".
Question S3: Julie put a box on a shelf that is 96.4 centimeters
long. The box is 33.2 centimeters long. What is the longest
box she could put on the rest of the shelf?
The expected answer is 96.4- 33.2 = 63.2 (with no mention of
But since there is no information given about the width of the shelf or
the width of boxes, the most that can be said is that the length of "the
longest box" is greater than or equal to 63.2 cm, and it could even be
greater than 96.4 cm. Since a written response was required, the
good student might have looked for more than a simple
Only 26% of 4th graders got the "right answer".
Searching For the Math
Beyond the lack of precision in the statement of questions, the students
from many countries must have been puzzled by the non-math questions involving
"eyeball" estimation, pattern recognition, and "probability". (See Reference
3 for all TIMSS 4th grade math questions)
Question K5: About how long is this picture of a pencil?
Note that "measurement instruments (such as graduated rulers or
were NOT permitted for any of the student populations because several
items called for estimation (Cap is original, bold emphasis
The "picture of the pencil" actually measures 9.2 cm (using a measuring
instrument). The answer choices were 5 cm, 10 cm, 20 cm, and 30 cm.
Question L4: These shapes are arranged in a pattern. (Graphic
shows a left-to-right string of symbols as follows: 1 Circle, 1 Triangle,
2 Circles, 2 Triangles, 3 Circles, 3 Triangles)
Question: What set of shapes is arranged in the same pattern?
(Four answer patterns use stars and squares instead of circles and triangles)
Question S5: Here is a paper clip (graphic of a paper
clip is given, with the word "Length" written below and arrow heads pointing
to the left and right to the ends of the paper clip image).
The question: How many lengths of the paper clip is the same as the
length of this line? (A straight line is shown below the statement of the
question). The paper clip actually measures 1 inch, and the line
measures 4.5 inches. Correct answers were 4, 5, and any number
between 4 and 5.5 (Yes that's right, not 3.5 to 5.5). Note that this
is an example of multiple right answers.
Question T5: Craig folded a piece of paper in half and
cut out a shape (a picture of a folded page is given with a rough image
of the number 3 shown as cut out of the folded edge). The student is asked
"to draw a picture to show what the cut-out shape will look like when it
is opened up and flattened out." Multiple correct answers included
a drawing of "the cut-out shape" and drawing of "the remaining
Question U2: Write a fraction that is larger than 2/7.
Multiple correct answers included 3/8.
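For readers who want to confirm that 3/8 is indeed larger than 2/7, here is a minimal check using exact rational arithmetic, alongside the cross-multiplication a student might do by hand. The variable names are illustrative.

```python
from fractions import Fraction

target = Fraction(2, 7)
candidate = Fraction(3, 8)

# Cross-multiplication: 3/8 > 2/7 exactly when 3 * 7 > 2 * 8.
cross_check = 3 * 7 > 2 * 8    # 21 > 16

is_larger = candidate > target
margin = candidate - target    # 21/56 - 16/56

print(cross_check, is_larger)  # True True
print(margin)                  # 5/56
```

The comparison holds, but it requires a common denominator of 56, which is far from the most obvious way to answer the question.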
Question L2: There is only one red marble in each of these
bags (graphic shows 3 bags, one labeled 10 marbles, one labeled 100 marbles,
and one labeled 1000 marbles). Question: Without looking in the bags,
you are to pick a marble out of one of the bags. Which bag would
give you the greatest chance of picking the red marble?
Yes, it's trivial, but only if you have previously seen questions
of this type and understand the concept of "chance".
Only 51% of 4th graders answered correctly.
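The reasoning behind Question L2 can be written out explicitly. This sketch assumes every marble in a bag is equally likely to be drawn, an assumption the question leaves implicit. The bag labels follow the graphic described above, and the variable names are illustrative.

```python
from fractions import Fraction

# Number of marbles in each bag; each bag holds exactly one red marble.
bags = {"10 marbles": 10, "100 marbles": 100, "1000 marbles": 1000}

# Assumed: each marble in a bag is equally likely to be picked.
chances = {label: Fraction(1, count) for label, count in bags.items()}

# The bag with the greatest chance of yielding the red marble.
best_bag = max(chances, key=chances.get)

print(chances["10 marbles"])  # 1/10
print(best_bag)               # 10 marbles
```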
Somewhere in the Directions it Said "No Calculators"!
If calculators were allowed, many TIMSS 4th grade math questions would
After an apparently bitter debate, it was decided that calculators would
not be permitted for the TIMSS 4th grade math test.26
Question I4: What is 3 times 23?
Question I9: 6000 - 2369 = ?
Question J4: 25x18 is more than 24x18. How much more?
Question K2: 6971 + 5291 = ?
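To illustrate how little these four questions ask of anyone holding a calculator, the sketch below evaluates each one directly. The dictionary layout is illustrative; the question labels are the ones listed above.

```python
# Each question reduces to a single arithmetic expression.
answers = {
    "I4": 3 * 23,             # What is 3 times 23?
    "I9": 6000 - 2369,        # 6000 - 2369 = ?
    "J4": 25 * 18 - 24 * 18,  # How much more is 25x18 than 24x18?
    "K2": 6971 + 5291,        # 6971 + 5291 = ?
}

for question, answer in answers.items():
    print(question, answer)
# I4 69
# I9 3631
# J4 18
# K2 12262
```

Question J4 is the only one with a shortcut worth noticing without a calculator: 25x18 exceeds 24x18 by exactly one more 18.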
What a moral dilemma for the constructivist brethren in the United States!
Their first commandment is "calculators in kindergarten". The second
is "the child shall use the calculator whenever the child thinks such use
is appropriate". Now, for this one test, they were expected to set aside
these fundamental beliefs.
We assume that Test Administrators in the United States did the right
thing. Of course, with no quality control monitor observations in
the United States, we'll have to rely on faith.
Reference 1: TIMSS Technical Manual, Volume 1: Design and Development
Reference 2: TIMSS Quality Assurance in Data Collection
Reference 3: TIMSS Released Item Sets - Set for Population 1 (Third and Fourth Grades)
Reference 4: TIMSS Mathematics in the Primary School Years
Reference 5: A TIMSS Primer (Fordham Report by Harold W. Stevenson)
Reference 6: TIMSS Primary School Years News, Statement by Dr. Albert Beaton
Reference 7: US TIMSS Steering Committee Members
Reference 8: Pursuing Excellence NCES (Study of 4th Grade TIMSS)
Numbered References: (See Superscript Numbers in Text)
1. Reference 4, Pages A-13 through A-16 in Appendix A
2. Reference 1, Pages 3-4 through 3-9
3. Reference 7, Page 2
4. Reference 4, Pages B2 and B3 in Appendix B
5. Reference 1, Pages 2-3 through 2-6
6. Reference 5, Tables 3 and 5 (Pages 11 and 15 in hardcopy)
7. Reference 8, Chapter 1, Special Notes On The Test Scores
8. Reference 4, Page A-14 in Appendix A
9. Reference 8, Chapter 2, Are U.S. Fourth-grade Teachers Better
10. Reference 6, Page 1
11. Reference 2, Page 3-1
12. Reference 1, Page 3-1
13. Reference 2, Page 3-8
14. Reference 2, Pages 4-2 and 4-3
15. Reference 2, Page 4-4
16. Reference 2, Page 4-4
17. Reference 2, Page 4-6
18. Reference 2, Page 3-10
19. Reference 2, Page 4-5
20. Reference 2, Page 4-9
21. Reference 2, Page 1-5
22. Reference 1, Page 8-3
23. Reference 2, Page 3-10
24. Reference 2, Page 3-10
25. Reference 1, Page 2-19
26. Reference 1, Page 2-19
William G. Quirk, Ph.D.