Here in the States, most of us are now teaching to some sort of standardized test. The original goal of these tests was to provide data and feedback to increase student achievement nationwide. Unfortunately, now, the tests tend to define our curricula and, more often than not, punish schools in economically disadvantaged areas.
What if we actually want to try to salvage some data out of these tests, though? Like them or not, they do represent impressive data stores on our students and provide significant opportunities to address areas of weakness. Besides, if we don't address the areas of the test on which our students don't achieve, then we get "targeted for improvement," students jump to other school districts, enrollments decline, etc., etc. The bottom line? We'd better be looking hard at these data.
I wrote the other day about our own efforts to begin analyzing standardized test data longitudinally; the idea came up again at a school committee meeting tonight when one of the school committee members noted how useful it would be to look over time at a student's or group of students' performance. This was in contrast to the canned reports the state provides us, which simply show changes in a school's performance over time.
The Commonwealth of Massachusetts actually partnered with a software company to give us an online data exploration tool that is fairly useful for quick looks at school data and snapshots of student achievement, but serious data mining requires a better tool. Since MCAS data analysis has now made its way into my job description, I started downloading flat files with all of the test data for each year since 2002, when the tests were fully implemented: one CSV file per test administration (e.g., one for Grade 8 Math in 2003, one for Grade 7 English in 2006, etc.).
This would have been tolerable if the data were easily merged and concatenated, but because the data files are "wide" in nature (one record per student with fields for every value) and because the fieldsets collected each year have evolved over time, assembling a unified data source is painful at best. Hasn't anyone heard of relational databases? They've been around for a while and make it just a wee bit easier to drill down into (and across) data.
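To make the relational point concrete, here is a minimal sketch of what "long" (relational) storage of this kind of data could look like. Everything in it is hypothetical: the field names (`sasid`, `raw_score`, `perf_level`, `growth_pct`) and the sample values are invented stand-ins for whatever the state's flat files actually contain. The idea is to melt each wide record into one row per field, so administrations with different fieldsets land in the same table without any schema surgery.

```python
import sqlite3

# Hypothetical wide records from two administrations; note the 2006
# fieldset has a column (growth_pct) that 2003 did not collect.
admin_2003 = {"year": 2003, "grade": 8, "subject": "Math",
              "rows": [{"sasid": "1001", "raw_score": 42, "perf_level": "P"}]}
admin_2006 = {"year": 2006, "grade": 7, "subject": "ELA",
              "rows": [{"sasid": "1001", "raw_score": 55, "perf_level": "A",
                        "growth_pct": 61}]}

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE results (
    sasid TEXT, year INTEGER, grade INTEGER, subject TEXT,
    field TEXT, value TEXT)""")

def load(admin):
    # Melt each wide record into long form: one row per (student, field).
    # Fields that exist only in some years simply produce no rows in
    # other years -- no placeholder columns required.
    for row in admin["rows"]:
        for field, value in row.items():
            if field == "sasid":
                continue
            conn.execute("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
                         (row["sasid"], admin["year"], admin["grade"],
                          admin["subject"], field, str(value)))

load(admin_2003)
load(admin_2006)

# The longitudinal query the school committee asked about: one
# student's scores across years, regardless of which file they came from.
scores = conn.execute(
    "SELECT year, subject, value FROM results "
    "WHERE sasid = '1001' AND field = 'raw_score' ORDER BY year").fetchall()
print(scores)  # -> [(2003, 'Math', '42'), (2006, 'ELA', '55')]
```

Once the data live in one table like this, "drilling down and across" is a single query rather than a per-file merging exercise.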
Even folks used to working with flat files know that when fields are added or deleted across time points, a few dummy fields acting as placeholders can go a long way toward making data usable. We are supposed to use these data, right?
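The placeholder trick above is cheap to do even with nothing but the standard library. This sketch, again with invented field names and values, writes records from two years against one master fieldset; `csv.DictWriter`'s `restval` parameter fills in the dummy value wherever a year simply didn't collect a field.

```python
import csv
import io

# Hypothetical records from two years; 2006 added a field 2003 lacked.
rows_2003 = [{"sasid": "1001", "raw_score": "42"}]
rows_2006 = [{"sasid": "1001", "raw_score": "55", "growth_pct": "61"}]

# One master fieldset covering every year; restval supplies the
# placeholder wherever a record is missing a field.
fieldnames = ["sasid", "raw_score", "growth_pct"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
writer.writeheader()
for row in rows_2003 + rows_2006:
    writer.writerow(row)

print(buf.getvalue())
```

The result is a single rectangular CSV (blanks where 2003 had no `growth_pct`) that any tool can concatenate or import without complaint.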
Anyone else out there have prettier data coming to them on their standardized tests that are easy to analyze? I'd be especially interested in readers outside the States: Do national or local testing schemes end up providing you with data for some good ol' data-driven instruction?