AAPTSM14/PERC2014: Measuring the Learning from Two-Stage Collaborative Group Exams

In an attempt to get back into semi-regular blogging, I am setting aside my pile of half-written posts and am going to share the work that I presented at the 2014 AAPT Summer Meeting and PERC 2014.

  • Poster (link)
  • PERC paper (link to first submission on arXiv)

The quick and dirty version is that I was able to run a study of the effectiveness of group exams in a three-section course that enrolled nearly 800 students. The image below summarizes the study design, which was repeated for each of the two midterms.


The study design...

(First row) The students all take/sit/write their exams individually. (Second row) After all the individual exams have been collected, they self-organize into collaborative groups of 3-4. There were three different versions of the group exam (conditions A, B and C), each having a different subset of the questions from the individual exam. It was designed so that, for each question (1-6), 1/3 of the groups would not see that question on their group exam (control) and the other 2/3 would (treatment). (Bottom row) The students were retested on the end-of-term diagnostic using questions which matched, in terms of application of concept, the original 6 questions from the midterm.
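A minimal sketch of how a counterbalanced design like this could be laid out, in Python. The specific assignment of questions to conditions below is an assumption for illustration only; the post does not say which questions were left off which version, and the function name is hypothetical.

```python
QUESTIONS = [1, 2, 3, 4, 5, 6]
CONDITIONS = ["A", "B", "C"]


def build_group_exams(questions, conditions):
    """Return {condition: questions that appear on that group exam}.

    Assumed rule (not from the study): question number i (0-based position)
    is left off the condition at index i % len(conditions), so every
    question is missing from exactly one version of the group exam.
    """
    omitted = {c: set() for c in conditions}
    for i, q in enumerate(questions):
        omitted[conditions[i % len(conditions)]].add(q)
    return {c: [q for q in questions if q not in omitted[c]] for c in conditions}


design = build_group_exams(QUESTIONS, CONDITIONS)

for q in QUESTIONS:
    treatment = [c for c in CONDITIONS if q in design[c]]
    control = [c for c in CONDITIONS if q not in design[c]]
    # Each question is treatment for 2 of 3 conditions and control for 1 of 3.
    print(f"Q{q}: treatment={treatment}, control={control}")
```

If groups are split roughly evenly across the three conditions, each question ends up with about 1/3 of the groups as control and 2/3 as treatment, and every student serves as control for some questions and treatment for others.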

Results: As previously mentioned, I went through this cycle for each of the two midterms. For midterm 2, which took place only 1-2 weeks prior to the end-of-term diagnostic, students who saw a given question on their group exam outperformed those who did not on the matched questions from the end-of-term diagnostic. Huzzah! However, for midterm 1, which took place 6-7 weeks prior to the end-of-term diagnostic, there were no statistically significant differences on the matched diagnostic questions between those who saw a given question on their group exam and those who did not.

The most likely explanation is that the learning from both control and treatment decays over time, and so does the difference between these groups: after 1-2 weeks there is still a statistically significant difference, but after 6-7 weeks there is not. It could also be due to differences between the questions associated with midterm 1 and those associated with midterm 2. For some of the questions, it is possible that the concepts were not separated well enough, so group exam discussions may have helped students learn concepts for questions that weren’t on their particular group exam. I hope to address these possibilities in a study this upcoming academic year.
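For readers who want to see what a treatment-versus-control comparison on a single matched question might look like, here is a rough sketch in Python. The post does not say which statistical test was used, so the pooled two-proportion z-test below is an assumption, as is the function name. It also ignores the fact that students worked in groups, which a fuller analysis would probably want to account for.

```python
import math


def two_proportion_z_test(correct_t, n_t, correct_c, n_c):
    """Two-sided, pooled two-proportion z-test.

    correct_t / n_t: number correct and total students in treatment
    correct_c / n_c: number correct and total students in control
    Returns (z, p_value).
    """
    p_t = correct_t / n_t
    p_c = correct_c / n_c
    pooled = (correct_t + correct_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:
        raise ValueError("Everyone (or no one) answered correctly; z is undefined.")
    z = (p_t - p_c) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts for illustration only; these are NOT data from the study.
z, p = two_proportion_z_test(correct_t=60, n_t=100, correct_c=50, n_c=100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A positive z means the treatment students did better on the matched diagnostic question; whether the difference counts as statistically significant depends on the threshold chosen for p.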


Will I abandon this pedagogy? Nope. The group exams may not provide a measurable learning effect which lasts all the way to the end of the term for early topics, but I am more than fine with that. There is a short-term learning effect and the affective benefits of the group exams are extremely important:

  • One of the big ones is that these group exams are effectively Peer Instruction in an exam situation. Since we use Peer Instruction in this course, this means that the assessment and the lecture generate buy-in for each other.
  • There are a number of affective benefits, such as increased motivation to study, increased enjoyment of the class, and lower failure rates, which have been shown in previous studies (see arXiv link to my PERC paper for more on this). Despite my study design, which had the students encountering different subsets of the original questions on their group exams, all students participated in the same intervention from the perspective of the affective benefits.


I had some great conversations with old friends, new friends and colleagues. I hope to expand on some of the above based on these conversations and feedback from the referees on the PERC paper, but that will be for another post.


4 Comments on “AAPTSM14/PERC2014: Measuring the Learning from Two-Stage Collaborative Group Exams”

  1. Jared Stang says:

    Hey Joss,

    I like your image to summarize the study. Very nice!

    How surprised were you that you saw no effect for the midterm one questions?


    • Joss Ives says:

      Hi Jared,

      When I designed the study, the possible time effect had not even occurred to me. So I was extremely surprised that there was a difference between the two midterms. In hindsight, work from Ellie Sayre and colleagues using their response curve methodology (e.g., http://dx.doi.org/10.1119/1.4890508) shows that student understanding decays over the time-scale of a course, so the time effect perhaps should have been expected.

  2. ambarr512 says:

    Logistics questions for you: Did you have any students who are granted extra time for exams or exams in reduced distraction environments? If so, how did you do the group portion for those students? Thanks.

    • Joss Ives says:

      Those students typically have a standard fractional increase of time that they are allowed, so we have them join the larger group for the group portion after writing in the reduced distraction environment. We build in enough extra time for them to make their way over as well.

