Exploring Exam Scores with Finite Mixture Models!

The Background:

     I recently taught a session of CJ 292 - Introduction to Research Methods. This was the first time I had taught a class of my own, so I was understandably excited (for my students, it was more of a mixed bag). As part of the class, I tried to collect as much data from my students as possible. Before the class started, I gave them a survey measuring their interest in, and prior knowledge of, research methods and statistics. After the course was finished, I also had all of their graded classwork to examine. In general, I was very pleased with how the class went: everyone passed, and the class average was about 80%. The histograms below show the distribution of scores for each of the three exams.

Scores for exams 1, 2, and 3. With the exception of a few low scorers on the first exam, the scores are pretty consistently around the 75-80% range.

     Many analyses are what we call "between-subjects": the comparison is made across units of analysis, using a single summary (such as a sum or an average) of some variable for each unit. In this case, a between-subjects analysis might simply compare the average exam score across students. While this would be a valid analysis, it isn't particularly interesting. Furthermore, it doesn't tell us much more than the histograms already reveal: there are some high scorers, some low scorers, and a large group of people somewhere in the middle. Informally, we often call this a bell curve.

     Of course, what I was really interested in was how the scores of individual students changed over time. My general expectation was that most students who did well at the beginning of the course would continue to do well, and those who did poorly would continue to do poorly. To look at this, I needed a "within-subjects" design, which examines how each individual's exam score changes over time.
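     To make the distinction concrete, here is a minimal Python sketch using made-up scores for three hypothetical students (not my class data; my actual analysis was done in R). The between-subjects view collapses each student to one average, while the within-subjects view estimates each student's own change per exam with an ordinary least-squares slope.

```python
import numpy as np

# Hypothetical scores (rows = students, columns = exams 1-3); NOT the class data.
scores = np.array([
    [95.0, 88.0, 81.0],   # starts high, drops
    [78.0, 80.0, 82.0],   # middle, rises slightly
    [55.0, 68.0, 70.0],   # starts low, improves
])
exam = np.array([1.0, 2.0, 3.0])

# Between-subjects view: one summary number (the mean) per student.
student_means = scores.mean(axis=1)

# Within-subjects view: each student's own linear change per exam.
# polyfit fits every column of scores.T separately; row 0 holds the slopes.
slopes = np.polyfit(exam, scores.T, deg=1)[0]

print("means: ", student_means)
print("slopes:", slopes)
```

     With these invented numbers, the student with the highest average (88) is also the one losing 7 points per exam, a pattern the average alone hides.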

Mixture Modeling

     In statistics we most often examine data that come from a single probability distribution. For instance, a Poisson regression assumes the counts come from a single data-generating process in which the variance equals the mean. But what do we do if our data contain several sub-populations? In our case, my assumption is that there are several "latent" (unobserved) sub-groups of exam-takers, rather than a single group that performs similarly on all three exams. Finite mixture models are a good way of identifying latent subgroups, and the same idea can be applied longitudinally, to repeated measurements on the same people.
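     In symbols, a finite mixture with K groups says the density of a score y is f(y) = sum over k of pi_k * f_k(y), where each f_k is a component distribution and the mixing proportions pi_k sum to 1. Below is a minimal Python sketch, assuming Gaussian components; `mixture_pdf` is a hypothetical helper, and the proportions, means, and SDs are invented for illustration.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Gaussian density N(y; mu, sigma^2), vectorized with NumPy broadcasting."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(y, weights, means, sds):
    """Finite mixture density: f(y) = sum_k weights[k] * N(y; means[k], sds[k]^2)."""
    y = np.asarray(y, dtype=float)[..., None]          # add an axis for the components
    comps = normal_pdf(y, np.asarray(means), np.asarray(sds))
    return comps @ np.asarray(weights)                 # weighted sum over components

# Hypothetical sub-populations of exam-takers (illustrative numbers only).
weights = [0.3, 0.6, 0.1]    # mixing proportions, must sum to 1
means = [90.0, 78.0, 60.0]
sds = [4.0, 5.0, 8.0]

grid = np.linspace(0.0, 150.0, 15001)
density = mixture_pdf(grid, weights, means, sds)
total = density.sum() * (grid[1] - grid[0])            # Riemann sum, should be close to 1
print(round(total, 4))
```

     Because the weights sum to 1 and each component is a proper density, the mixture is also a proper density; the mixture model's job is to estimate the weights and component parameters from data where group membership is never observed.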

     In criminology, Jones, Nagin, and Roeder (1999) developed a way to identify latent groups over time, generally called "group-based trajectory models." In their case, they looked at the offending behavior of adolescents as they aged. They found that most youth followed a trajectory of increasing offending in the mid-to-late teen years, which then dropped off as they "aged out" of crime. Other researchers have extended this work to identify groups of street blocks with varying rates of crime over time (Weisburd, Bushway, Lum, & Yang, 2004). However, nothing about the method restricts it to criminology. Here I use a similar methodology to look at individual changes in exam scores.

Changes in Exam Scores

     In R, I used the 'lcmm' package to fit a 3-group mixture model to the exam scores. The three time periods corresponded to the three exams, so I had 3 measurements for each student. Because I only had 108 observations (36 students x 3 exams), I used a linear time trend and a linear link function. I also assumed the model residuals would be approximately Gaussian.
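     For readers curious about what a model like this does under the hood, here is a minimal Python sketch. It is not the lcmm code and not my actual analysis: it is a simplified stand-in that fits a mixture of linear trajectories with the EM algorithm, assuming no random effects (every student in a group shares one trajectory), one residual SD shared by all groups, and exams coded as time 0, 1, 2. The data are simulated, and the function name `fit_trajectory_mixture`, the group parameters, and the seed are all made up.

```python
import numpy as np

def fit_trajectory_mixture(Y, t, ng, n_iter=200):
    """EM fit of a mixture of linear trajectories (hypothetical helper, not lcmm).

    Y  : (n_subjects, n_times) array of scores
    t  : (n_times,) time codes, e.g. exams coded 0, 1, 2
    ng : number of latent groups
    Each group has its own intercept and slope; all groups share one residual SD.
    """
    n, T = Y.shape
    X = np.column_stack([np.ones(T), t])        # (T, 2) design: intercept, time
    # Deterministic start: split students into ng blocks ranked by mean score.
    order = np.argsort(-Y.mean(axis=1))
    resp = np.zeros((n, ng))
    for k, block in enumerate(np.array_split(order, ng)):
        resp[block, k] = 1.0
    for _ in range(n_iter):
        # M-step: group proportions, weighted least squares per group, pooled SD.
        pi = resp.mean(axis=0)
        beta = np.zeros((ng, 2))
        sse = 0.0
        for k in range(ng):
            w = resp[:, k]
            beta[k] = np.linalg.solve(w.sum() * (X.T @ X), X.T @ (w @ Y))
            sse += np.sum(w[:, None] * (Y - X @ beta[k]) ** 2)
        sigma2 = sse / (n * T)
        # E-step: posterior probability that each student belongs to each group.
        # Terms shared by all groups (the Gaussian normalizer) cancel when normalizing.
        log_post = np.empty((n, ng))
        for k in range(ng):
            resid = Y - X @ beta[k]
            log_post[:, k] = np.log(pi[k]) - 0.5 * np.sum(resid**2, axis=1) / sigma2
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, beta, np.sqrt(sigma2), resp

# Simulated data: 36 synthetic students x 3 exams, three made-up groups.
rng = np.random.default_rng(292)
t = np.array([0.0, 1.0, 2.0])
true_beta = np.array([[95.0, -6.0], [78.0, 1.0], [55.0, 6.0]])  # (intercept, slope)
labels = np.repeat([0, 1, 2], 12)
Y = (true_beta[labels, 0][:, None] + true_beta[labels, 1][:, None] * t
     + rng.normal(0.0, 3.0, size=(36, 3)))

pi, beta, sigma, resp = fit_trajectory_mixture(Y, t, ng=3)
groups = resp.argmax(axis=1)                    # modal group assignment
print(np.round(beta, 1), round(sigma, 2))
```

     Mixture groups are only defined up to relabeling, so the group numbers are arbitrary; in this sketch they follow the starting split (highest mean score first). Choosing the number of groups is a separate step, and trajectory modelers commonly compare candidate fits with BIC.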

     Success! The model fit fairly well and identified three latent groups. The first picture below shows a group of high-scoring individuals, a group of middle-scoring individuals, and 2 individuals who had low first-exam scores and improved somewhat after that. However, this visualization is a mess, and it makes the trends difficult to see.

Three group mixture model. Each individual line corresponds to a student. The colors correspond to the latent group assigned by the model.
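     The group a student is "assigned" to is typically the group with the highest posterior probability given that student's scores. Here is a minimal Python sketch of that Bayes' rule calculation, assuming Gaussian residuals with a shared SD and linear trajectories; `posterior_membership` is a hypothetical helper, and the group proportions, intercepts, slopes, SD, and student scores are all invented (they are not my model's estimates).

```python
import numpy as np

# Hypothetical fitted values (illustration only, not my model's estimates).
pi = np.array([0.60, 0.33, 0.07])                           # group proportions
beta = np.array([[80.0, 0.5], [92.0, -6.5], [60.0, 6.0]])   # (intercept, slope)
sigma = 5.0                                                 # shared residual SD
t = np.array([0.0, 1.0, 2.0])                               # exams coded 0, 1, 2

def posterior_membership(y, pi, beta, sigma, t):
    """P(group k | y) is proportional to pi_k * prod_j N(y_j; a_k + b_k * t_j, sigma^2)."""
    pred = beta[:, [0]] + beta[:, [1]] * t          # (n_groups, n_times) predicted means
    log_lik = -0.5 * np.sum((y - pred) ** 2, axis=1) / sigma**2
    log_post = np.log(pi) + log_lik                 # shared constants cancel below
    log_post -= log_post.max()                      # guard against underflow
    post = np.exp(log_post)
    return post / post.sum()

student = np.array([92.0, 85.0, 79.0])              # one made-up student
post = posterior_membership(student, pi, beta, sigma, t)
assigned = int(post.argmax())                       # modal (most likely) group
print(np.round(post, 3), assigned)
```

     In this made-up setup the student lands in group index 1, the high-starting, declining trajectory, with a posterior probability of roughly 0.94; students whose posteriors are split more evenly are the ones whose colors in a plot like the one above are least certain.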

     I faceted the groups so we can look at each one individually. Interestingly, the top-scoring group showed a pattern I hadn't considered: while they all had high initial scores, their scores appeared to drop over time. In contrast, the "middle-of-the-road" group stayed relatively stable, or even improved a bit on exam 3! Indeed, when I looked at the model results, group 2 had the highest intercept but also a steeply negative slope, while group 1 had a lower intercept and a slope near 0, indicating little change over time. This left me wondering: why would students who started off so strong have dropping exam scores?
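     With a linear time trend, each group's implied mean score at exam time t is simply intercept + slope * t, so a high intercept paired with a negative slope can erase a large early lead. A minimal Python sketch of that arithmetic, using invented intercepts and slopes that only mimic the shape of groups 1 and 2 (they are not the fitted values), with exams coded 0, 1, 2:

```python
import numpy as np

# Hypothetical group parameters (NOT the fitted values from my model):
# group 1: middling intercept, slope near zero; group 2: high intercept, negative slope.
intercepts = np.array([80.0, 92.0])
slopes = np.array([0.5, -5.0])
t = np.array([0.0, 1.0, 2.0])          # exams 1, 2, 3 coded as 0, 1, 2

# Implied mean score for each group at each exam: intercept + slope * t.
trajectories = intercepts[:, None] + slopes[:, None] * t

# Time at which the two linear trajectories meet: solve a1 + b1*t = a2 + b2*t for t.
t_cross = (intercepts[1] - intercepts[0]) / (slopes[0] - slopes[1])

print(trajectories)
print(round(t_cross, 2))
```

     With these made-up numbers, the high-intercept group is still slightly ahead at exam 3 (82 vs. 81), but the gap shrinks from 12 points to 1, and the lines would cross shortly after the last exam (t of about 2.18).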

Three group mixture model faceted by group type. Group 1 had a mostly middle-of-the-road intercept and didn't change much. Group 2 had high scores that decreased over time. Group 3 represented two students who had some initial difficulties but improved somewhat.

What Does it All Mean?

     Based on this analysis, I went back and looked at the scores of the students in the second group: the ones whose high initial exam scores dropped as the semester went on. To my surprise, these were my top-scoring students! Most of them had perfect or nearly perfect attendance, had aced their assignments, and had done most of the extra credit I had available. So why were they doing worse on their last two exams?

    Looking at the way my class was designed, it seemed that these students pretty much knew they had an 'A' locked in before the last exam. Since the results of that exam were unlikely to affect their final grade, they probably didn't study much for it at all. On the other hand, students who were between grades, or who needed to improve, probably studied more and performed more consistently.

    As an educator, I took this as a sign that I may need to change my incentives so that performing well throughout the class matters more, rather than just front-loading points at the beginning of the semester.