Summer 2012 Research, Part 1: Immediate feedback during an exam
Posted: July 25, 2012
One of my brief studies, based on data from a recent introductory calculus-based course, looked at the effect of immediate feedback in an exam situation. The results show that, after receiving immediate feedback on their answer to the first of two questions testing the same concept, students showed a statistically significant improvement in performance on the second question.
Although I used immediate feedback for multiple questions on both the term test and final exam in the course, I only set up the experimental conditions discussed below for one question.
The question I used (Figure 1) asked about the sign of the electric potential at two different points. A common student difficulty is to confuse the procedures for finding electric potential (a scalar quantity) and electric field (a vector quantity) for a given charge distribution. The interested reader might wish to read a study by Sayre and Heckler (link to journal, publication page with direct link to pdf).
Experimental design and results
There were three versions of the exam. One version of this question appeared on two of them (Condition 1, 33 students) and the other version appeared on the third (Condition 2, 16 students). In each condition, students first answered the scratch card question (Q1) for one of the points using an IFAT scratch card (Condition 1 = point A; Condition 2 = point B). With the scratch cards, students scratch their chosen answer and see a star if it is correct. If their first choice was incorrect, they could choose a different answer; a correct second try earned half the points, and students who needed a third scratch to find the correct answer received no marks. No matter how they did on the first question, students had learned the correct answer to it before moving on to the second question (Q2), which asked for the potential at the other point (Condition 1 = point B; Condition 2 = point A). The results for each condition and question are shown in Table 1.
| | Q1 (scratch card question) | Q2 (follow-up question) |
|---|---|---|
| Condition 1 | Point A: 24/33 correct = 72.7 ± 7.8% | Point B: 28/33 correct = 84.8 ± 6.2% |
| Condition 2 | Point B: 8/16 correct = 50.0 ± 12.5% | Point A: 10/16 correct = 62.5 ± 12.1% |
Table 1: Results for each of the conditions. In Condition 1, students answered the question for point A and received feedback, using the IFAT scratch card, before moving on to answer the question for point B. In Condition 2, students first answered the question for point B using the scratch card and then moved on to answer the question for point A.
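The uncertainties in Table 1 are consistent with the binomial standard error of a proportion, sqrt(p(1 - p)/n). Below is a minimal sketch that reproduces the table entries from the raw counts, assuming that this is how the uncertainties were computed. The function name `proportion_with_se` is illustrative only.

```python
import math


def proportion_with_se(correct, total):
    """Return (percent correct, standard error in percent) for a proportion.

    Assumption: the uncertainty is the binomial standard error
    sqrt(p * (1 - p) / n), which matches the values in Table 1.
    """
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * se


# Counts taken directly from Table 1.
table1 = {
    ("Condition 1", "Q1, point A"): (24, 33),
    ("Condition 1", "Q2, point B"): (28, 33),
    ("Condition 2", "Q1, point B"): (8, 16),
    ("Condition 2", "Q2, point A"): (10, 16),
}

for (condition, question), (correct, total) in table1.items():
    pct, se = proportion_with_se(correct, total)
    print(f"{condition}, {question}: {correct}/{total} = {pct:.1f} ± {se:.1f}%")
```

With only 16 students in Condition 2, the standard errors there are roughly twice as large as in Condition 1, which is why the two conditions are pooled in the comparisons that follow.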
To pool all students when looking at the improvement from the scratch card question (Q1) to the follow-up question (Q2), I first need to show that there is no statistically significant difference between how the students answered the question for point A and for point B. Figure 2 shows that a two-tailed repeated-measures t-test fails to reject the null hypothesis that the mean performance for points A and B is the same. Thus we have no evidence that these questions are different, which means we can move on to comparing how the students performed on the follow-up question (Q2) as compared to the scratch card question (Q1).
Figure 3 shows a 12.2 percentage point improvement from the scratch card question (Q1) to the follow-up question (Q2). Using a one-tailed repeated-measures t-test (it was assumed that performance on Q2 would be better than on Q1), the null hypothesis is rejected with p = 0.0064. Since I have made two comparisons using these same data, a Bonferroni correction should be applied. With this correction, the significance threshold becomes 0.05/2 = 0.025. Since p = 0.0064 is below this threshold, the improvement from Q1 to Q2 remains statistically significant.
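The arithmetic behind the pooled improvement and the Bonferroni threshold can be sketched in a few lines. This is a minimal sketch using only the counts from Table 1 and the p-value reported above. It does not rerun the t-test, which would need the per-student responses. The helper name `bonferroni_threshold` is illustrative only.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


# Pooled counts from Table 1 (both conditions combined).
n_students = 33 + 16   # 49 students in total
q1_correct = 24 + 8    # correct on the scratch card question (Q1)
q2_correct = 28 + 10   # correct on the follow-up question (Q2)

q1_pct = 100 * q1_correct / n_students
q2_pct = 100 * q2_correct / n_students
improvement = q2_pct - q1_pct  # in percentage points

threshold = bonferroni_threshold(alpha=0.05, n_comparisons=2)
p_value = 0.0064  # reported value from the one-tailed repeated-measures t-test
significant = p_value < threshold

print(f"Q1: {q1_pct:.1f}%  Q2: {q2_pct:.1f}%  improvement: {improvement:.1f} points")
print(f"Bonferroni threshold: {threshold:.3f}  significant: {significant}")
```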
In addition to reproducing these results using multiple questions, I would also like to examine whether these results hold true under different conditions. Additional factors which could be examined include different disciplines, upper-division vs. introductory courses and questions which target different levels of Bloom’s taxonomy.
Note: I found a paper that looks at the effect of feedback on follow-up questions as part of exam preparation, and I discuss it in more detail in this follow-up post.