Searching for the Truth About
the TIMSS 4th Grade Math Test
Did The United States Deserve
the "Above Average" Ranking?
Although the United States ranked below the
"international
mean" for the TIMSS 8th grade math test and near the bottom for
the
TIMSS 12th grade math test, our 4th graders appeared to save the day
with
an "above average" performance. But a careful reading of TIMSS
documentation casts serious doubt on the validity of this
apparently
positive result.
The TIMSS 4th Grade Math Test: Background Information
-
TIMSS refers to the Third International Mathematics and Science Study.
-
Although conventionally referred to as "the TIMSS 4th grade math test",
the test was targeted at nine-year-olds and actually given to 3rd and
4th
graders in most countries, including the United States.
-
Twenty-six countries participated, with approximately 150 schools per
country.
-
In the United States 182 schools with 7,296 students participated
for grade 4, and 186 schools with 3,819 students participated for grade
3. Private schools were included. How many? They
didn't
say.1 (See Reference Notes)
-
The TIMSS 4th grade math test was given during the period September,
1994
through May, 1995.
-
There were actually 8 different tests, with some questions occurring in
multiple tests.2
-
Test items were grouped into 26 mutually exclusive test item "clusters".
-
There were 8 test booklets, each containing 7 clusters.
-
The total test time, for both math and science, was 64 minutes.
-
Math was emphasized in 4 booklets, with 34, 37, 37, and 38
minutes
for math.
-
Science was emphasized in 4 booklets, with 18, 19, 19, and 24 minutes
for
math.
-
For all 8 booklets, the average time for math was 28.25
minutes.
-
The TIMSS International Study Center, located at Boston College, was
responsible
for supervising all aspects of the design and implementation of the 4th
grade math test.
-
The US TIMSS Steering Committee included Dr. John Dossey and Dr. Mary
Lindquist,
both past presidents of the National Council of Teachers of Mathematics
(NCTM).3
-
There was one National Research Coordinator (NRC) per country.
The
NRC for the United States was Dr. William H. Schmidt, a professor
of counseling, educational psychology, and special education in the
College
of Education at Michigan State University.
-
Note that Professor Schmidt recently attacked the new California math
standards
as "most disappointing, a 'back-to-basics' document that emphasizes
memorization and computations." On the other hand, in their
Fordham
Report on State Math Standards, Ralph Raimi and Larry Braden ranked
California's
math standards as the best in the nation, with the only perfect
numerical
score of 16 points.
-
Professor Harold W. Stevenson, co-author of The Learning Gap,
was the Director of the TIMSS Case Studies Project. He was not
involved
with the TIMSS 4th grade math test.
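The booklet timing figures listed above can be checked with a few lines of arithmetic. This is only a sketch in Python: the variable names are illustrative and not taken from TIMSS documentation, and the minutes are the ones quoted in this article.

```python
# Minutes devoted to math in each of the 8 test booklets, as quoted above.
math_emphasis = [34, 37, 37, 38]      # booklets that emphasized math
science_emphasis = [18, 19, 19, 24]   # booklets that emphasized science

TOTAL_MINUTES = 64  # total test time per booklet, math and science combined

all_booklets = math_emphasis + science_emphasis
average_math = sum(all_booklets) / len(all_booklets)
math_share = average_math / TOTAL_MINUTES

print(f"Average math time: {average_math} minutes")
print(f"Share of the {TOTAL_MINUTES}-minute test: {math_share:.1%}")
```

The eight figures sum to 226 minutes, which confirms the 28.25-minute average. On average, then, a student spent less than half of the 64-minute session on math.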
United States 100, Singapore 74
No, this doesn't refer to test results. It's the
percentage of TIMSS test questions that each country considered
relevant
to its 4th grade curriculum. Only 74% of the 4th grade
math
test questions were "considered appropriate" by Singapore, while
100% were approved by the United States. The percentage for Korea was
even
lower at 43%. Yet Singapore and Korea ranked first
and
second on the TIMSS 4th grade math test, scoring "significantly higher"
than the United States.4
It's clear that there was a "constructivist" vs. "traditional"
battle
over the development of the pool of questions for the TIMSS 4th grade
math
test. Even with such weak criteria as "appropriate for at
least
70% of the countries" and "recommended for deletion by less than
30% of the countries", the total number of acceptable test
questions
was continually found to be inadequate. Finally, due to "very tight
deadlines for test production", the Educational Testing Service
(United
States) and SRI International (United States) were both hired to
add
test items.5
Above Average? It's All Relative!
For the 8th grade math test, 20 nations scored "significantly higher"
than
the United States. Only 12 of these nations competed at the 4th grade
level,
and 11 of them again scored higher than the United States.6
What happened to the other 8 "significantly higher"
nations?
It's never explained. But perhaps they didn't like test
questions
that were 100% approved only by the United States. Whatever the reason,
the fact that they were missing clearly helped to boost the relative
standing
of the United States.
The Spin From the National Center for Education Statistics (NCES)
"TIMSS is a fair comparison of achievements for several reasons.
First,
the test was jointly developed and carefully reviewed by the
participating
countries to ensure that the items reflected curriculum topics
considered
important in all countries, and did not over-emphasize the curriculum
content
taught in only a few. Second, international monitors carefully
reviewed
nations' adherence to guidelines to ensure that significant numbers of
students were not excluded from the test process for any reason".7
The Truth About Exclusions: The United States excluded
412 fourth grade students, more than any other nation. Canada,
also
mesmerized by constructivism, was next with 268. The other
24 nations excluded a combined total of 804 fourth graders. The
seven
nations that scored "significantly higher" than the United States had a
combined total of 11 exclusions, with none for the three highest
scoring countries, Singapore, Korea, and Japan.8
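To put the exclusion counts above in proportion, here is a rough sketch in Python. It uses only the numbers quoted in this article, and the names are illustrative. Note the assumption: these are raw counts, not exclusion rates. The article does not give the number of students sampled in each country, so no per-student rate is computed.

```python
# Fourth grade exclusion counts quoted in the text above.
excluded = {
    "United States": 412,
    "Canada": 268,
    "Other 24 nations combined": 804,
}
TOP_SEVEN_EXCLUDED = 11  # the seven nations that scored "significantly higher"

total_excluded = sum(excluded.values())
shares = {name: count / total_excluded for name, count in excluded.items()}

for name, share in shares.items():
    print(f"{name}: {excluded[name]} exclusions, {share:.1%} of the total")

# Average exclusions per nation among the other 24.
average_other = excluded["Other 24 nations combined"] / 24
print(f"Average for the other 24 nations: {average_other} per nation")
print(f"Seven higher-scoring nations: {TOP_SEVEN_EXCLUDED / total_excluded:.1%} of the total")
```

By these counts, the United States alone accounts for more than a quarter of all 1,484 exclusions. The other 24 nations average 33.5 each.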
"ARE U.S. FOURTH-GRADE TEACHERS BETTER TRAINED THAN THEIR
COLLEAGUES
IN OTHER TIMSS COUNTRIES? The profile of a typical U.S.
teacher
of fourth graders is similar to that of teachers in most other TIMSS
countries:
a woman at least 40 years old with more than 10 years of teaching
experience.
However,
teachers of U.S. fourth graders have more university training than
their
counterparts in most TIMSS countries." 9 (Caps
and bold emphasis in the original)
The Truth About Teacher Training: There was no mention of
education
in mathematics.
The Spin From the TIMSS International Study Center, Boston College
"Before giving results, I believe that it is important to note that
the TIMSS study was conducted with great attention to quality at
every step of the way. Rigorous procedures were designed
specifically
to translate the tests, and numerous regional training sessions were
held
in data collection and scoring. Quality control monitors
observed
testing sessions (emphasis added) and reported back to the
International
Study Center at Boston College". 10 - Dr.
Albert
Beaton, TIMSS Study Director
The Truth About Test Observations:
-
There was one quality control monitor per country.11
-
Each country selected its own quality control monitor.12
-
Plans called for visiting 10 schools per country, and "the schools
selected
for classroom observation had to be within easy traveling distance of
the
quality control monitor's home".13
-
In 10 of 26 countries there were no observations of testing sessions.
In
particular, "in the United States, the quality control monitor
became
indisposed, and was unable to conduct the classroom observations". 14
(emphasis added)
What About Test Security?
Although there were many lip-service efforts at test security, the
TIMSS
documentation makes it clear that anyone who wanted an advance copy of
the
test would have no difficulty getting it:
-
There was no attempt to give the test on the same day, either within a
specific country or across all countries.
-
There is no mention of any attempt to maintain exact counts of test
booklets.
Under "Verification of the supply of test books", only 77% of test
administrators
had "definitely verified" that there was an "adequate supply
of test books prior to test administration".15
-
The decision to use booklet seals was left to each participating
country,
and more than half of the countries failed to use them.16
-
"In many cases, test booklets were left on students' desks between
sessions".17
-
Printers, school officials, and data entry clerks all had easy
access
to test questions.
Defects With Booklets and Supplies, But Probably Not in the United
States
-
"Frequent problems reported with respect to the test booklets were
missing
pages, blank pages, and duplicated pages, in addition to problems
associated
with item translation. Several NRCs commented that the timeline did not
allow for a more thorough review of the test booklets before printing
and
dissemination."18
-
"The Test Administrator had an adequate supply of pencils and other
necessary materials ready for the students in 77% of the
sessions.
However, in many countries it is the responsibility of the student to
bring
pencils, pens, etc. to testing situations, so it is not necessarily the
case that there was an inadequate supply in almost a quarter of the
testing
sessions."19
-
"In 6% of the observed sessions (emphasis added), defective
test
booklets were identified and replaced before the session began.
In
a further 6% of sessions, defective booklets were found and replaced
after
the session began. On most of the occasions where booklets needed
replacement, the Test Administrator replaced them appropriately.
Occasionally booklets were not replaced because of a lack of spare
copies."
20
Difficulties With Translation, But Not in the United States
The TIMSS 4th grade math test was prepared in the United States and
then
translated from English (United States version) into 17 additional
languages
for the 26 participating countries. Although the TIMSS
documentation
shows that serious efforts were made to ensure accurate translations
and
appropriate cultural adaptations, it's clear that these efforts weren't
totally successful and probably couldn't be totally successful,
particularly
when one considers the limited resources used, the complex subtlety of
cultural adaptation, and the lack of precision in the source English
version.
-
Each country was supposed to engage two translators for the math test
questions,
but "due to limited resources, some countries were unable to engage
more than one translator" for math. 21
-
Translators were not required to have any knowledge of mathematics. 22
-
Several National Research Coordinators (NRCs) "commented on the
difficulty
of making exact translations of single words or phrases from English
into
the country's language".23
-
"Some NRCs reported that due to a lack of time they could not have
their
translations verified prior to printing the test booklets."24
Never Mind Translation, What Does It Mean in English?
Here are examples of ambiguous TIMSS 4th grade questions.
These
are "free response", not multiple choice questions. (See Reference
3 for all TIMSS 4th grade math questions)
Question T2: What is the smallest whole number that you
can make using the digits 4, 3, 9, and 1? Use each digit only
once.
The expected answer is the 4-digit number 1349. But notice
that
the question didn't request a 4-digit number. How about 1 = 9 -
(4 + 3 + 1)? The answer "1" is given the incorrect response code
of
71, and listed as the second possible "incorrect response".
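The two readings of Question T2 can be made concrete with a short sketch in Python. The variable names are illustrative. The first reading arranges all four digits into one number, which is what the scoring guide expected. The second is the arithmetic alternative suggested above. It is shown only as one valid use of each digit exactly once, with no claim that it is the smallest value obtainable that way.

```python
from itertools import permutations

digits = "4391"

# Reading 1: arrange the four digits, each used once, into a single number.
smallest_arrangement = min(int("".join(p)) for p in permutations(digits))

# Reading 2: combine the same four digits with arithmetic, each used once.
arithmetic_alternative = 9 - (4 + 3 + 1)

print("Smallest 4-digit arrangement:", smallest_arrangement)
print("Arithmetic alternative:", arithmetic_alternative)
```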
Question U5: Addition Fact: 4+4+4+4+4 = 20.
Write
this addition fact as a multiplication fact.
The expected answer is 4x5 = 20 or 5x4 = 20. The answer 2x10 =
20 or 10x2 = 20 is given the incorrect response code of 72, and
listed
as the third possible "incorrect response".
Question S3: Julie put a box on a shelf that is 96.4
centimeters
long. The box is 33.2 centimeters long. What is the longest
box she could put on the rest of the shelf?
The expected answer is 96.4 - 33.2 = 63.2 (with no mention of
centimeters).
But since there is no information given about the width of the shelf or
the width of boxes, the most that can be said is that the length of
"the
longest box" is greater than or equal to 63.2 cm, and it could even be
greater than 96.4 cm. Since a written response was required, the
good student might have looked for more than a simple
subtraction.
Only 26% of 4th graders got the "right answer".
Searching For the Math
Beyond the lack of precision in the statement of questions, the
children
from many countries must have been puzzled by the non-math questions
involving
"eyeball" estimation, pattern recognition, and "probability". (See Reference
3 for all TIMSS 4th grade math questions)
Question K5: About how long is this picture of a
pencil?
Note that "measurement instruments (such as graduated rulers or
protractors)
were NOT permitted for any of the student populations because several
items called for estimation (Cap is original, bold emphasis
added)".25
The "picture of the pencil" actually measures 9.2 cm (using a
"measuring
instrument"). The answer choices were 5 cm, 10 cm, 20 cm, and 30
cm.
Question L4: These shapes are arranged in a pattern.
(Graphic
shows a left-to-right string of symbols as follows: 1 Circle, 1
Triangle,
2 Circles, 2 Triangles, 3 Circles, 3 Triangles)
Question: What set of shapes is arranged in the same pattern?
(Four answer patterns use stars and squares instead of circles and
triangles).
Question S5: Here is a paper clip (graphic of a
paper
clip is given, with the word "Length" written below and arrow heads
pointing
to the left and right to the ends of the paper clip image).
The question: How many lengths of the paper clip is the same as the
length of this line? (A straight line is shown below the statement of
the
question). The paper clip actually measures 1 inch, and the line
measures 4.5 inches. Correct answers were 4, 5, and any
number
between 4 and 5.5 (Yes, that's right, not 3.5 to 5.5). Note that
this
is an example of multiple right answers.
Question T5: Craig folded a piece of paper in half and
cut out a shape (a picture of a folded page is given with a rough image
of the number 3 shown as cut out of the folded edge). The student is
asked
"to draw a picture to show what the cut-out shape will look like when
it
is opened up and flattened out." Multiple correct answers
included
a drawing of "the cut-out shape" and a drawing of "the remaining
piece
of paper".
Question U2: Write a fraction that is larger than 2/7.
Multiple correct answers included 3/8.
Question L2: There is only one red marble in each of
these
bags (graphic shows 3 bags, one labeled 10 marbles, one labeled 100
marbles,
and one labeled 1000 marbles). Question: Without looking in the
bags,
you are to pick a marble out of one of the bags. Which bag would
give you the greatest chance of picking the red marble?
Yes, it's trivial, but only if you have previously seen
questions
of this type and understand the concept of "chance".
Internationally,
only 51% of 4th graders answered correctly.
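For readers who want the concept of "chance" in Question L2 spelled out, here is a minimal sketch in Python. It assumes each marble in a bag is equally likely to be picked, which the question implies but does not state. The names are illustrative.

```python
from fractions import Fraction

# Each bag holds exactly one red marble; the sizes come from the question.
bag_sizes = {"10 marbles": 10, "100 marbles": 100, "1000 marbles": 1000}

# Chance of drawing the red marble, assuming every marble is equally likely.
chances = {label: Fraction(1, size) for label, size in bag_sizes.items()}
best_bag = max(chances, key=chances.get)

for label, chance in chances.items():
    print(f"{label}: chance of red = {chance}")
print("Best bag:", best_bag)
```

The smallest bag wins because the single red marble is one of the fewest possible picks.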
Somewhere in the Directions it Said "No Calculators"!
If calculators were allowed, many TIMSS 4th grade math questions would
be trivial:
-
Question I4: What is 3 times 23?
-
Question I9: 6000 - 2369 = ?
-
Question J4: 25x18 is more than 24x18. How much more?
-
Question K2: 6971 + 5291 = ?
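To illustrate how little work a calculator would leave, each of the four questions above reduces to a single expression. The sketch below uses Python as a stand-in for a calculator. The question labels come from the list above, and the dictionary name is illustrative.

```python
# Each question is one keystroke sequence on a calculator.
calculator_answers = {
    "I4": 3 * 23,              # What is 3 times 23?
    "I9": 6000 - 2369,         # 6000 - 2369 = ?
    "J4": 25 * 18 - 24 * 18,   # How much more is 25x18 than 24x18?
    "K2": 6971 + 5291,         # 6971 + 5291 = ?
}

for question, answer in calculator_answers.items():
    print(f"Question {question}: {answer}")
```

Question J4 is the one item here that rewards thinking over keystrokes: the difference is simply one more 18.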
After an apparently bitter debate, it was decided that calculators
would
not be permitted for the TIMSS 4th grade math test.26
What a moral dilemma for the constructivist brethren in the United
States!
Their first commandment is "calculators in kindergarten". The
second
is "the child shall use the calculator whenever the child thinks such
use
is appropriate". Now, for this one test, they were expected to
deny
these fundamental beliefs.
We assume that Test Administrators in the United States did the
right
thing. Of course, with no quality control monitor observations in
the United States, we'll have to rely on faith.
TIMSS Documentation
Reference 1. TIMSS Technical Manual, Volume 1: Design and Development
Reference 2. TIMSS Quality Assurance in Data Collection
Reference 3. TIMSS Released Item Sets - Set for Population 1 (Third and Fourth Grades)
Reference 4. TIMSS Mathematics in the Primary School Years
Reference 5. A TIMSS Primer (Fordham Report by Harold W. Stevenson)
Reference 6. TIMSS Primary School Years News, Statement by Dr. Albert Beaton
Reference 7. US TIMSS Steering Committee Members
Reference 8. Pursuing Excellence, NCES (Study of 4th Grade TIMSS)
Numbered References: (See Superscript Numbers in
Text)
1. Reference 4, Pages A-13 through A-16 in Appendix A
2. Reference 1, Pages 3-4 through 3-9
3. Reference 7, Page 2
4. Reference 4, Pages B2 and B3 in Appendix B
5. Reference 1, Pages 2-3 through 2-6
6. Reference 5, Tables 3 and 5 (Pages 11 and 15 in hardcopy)
7. Reference 8, Chapter 1, Special Notes On The Test Scores
8. Reference 4, Page A-14 in Appendix A
9. Reference 8, Chapter 2, Are U.S. Fourth-grade Teachers Better
Trained?
10. Reference 6, Page 1
11. Reference 2, Page 3-1
12. Reference 1, Page 3-1
13. Reference 2, Page 3-8
14. Reference 2, Pages 4-2 and 4-3
15. Reference 2, Page 4-4
16. Reference 2, Page 4-4
17. Reference 2, Page 4-6
18. Reference 2, Page 3-10
19. Reference 2, Page 4-5
20. Reference 2, Page 4-9
21. Reference 2, Page 1-5
22. Reference 1, Page 8-3
23. Reference 2, Page 3-10
24. Reference 2, Page 3-10
25. Reference 1, Page 2-19
26. Reference 1, Page 2-19
Copyright
1998-2011
William G. Quirk, Ph.D.