Summer 2012 Research, Part 1: Immediate feedback during an exam

One of my brief studies, based on data from a recent introductory calculus-based course, was to look at the effect of immediate feedback in an exam situation. The results show that, after being provided with immediate feedback on their answer to the first of two questions which tested the same concept, students had a statistically significant improvement in performance on the second question.

Although I used immediate feedback for multiple questions on both the term test and final exam in the course, I only set up the experimental conditions discussed below for one question.

The question

The question I used (Figure 1) asked about the sign of the electric potential at two different points. A common student difficulty is confusing the procedures for finding the electric potential (a scalar quantity) and the electric field (a vector quantity) for a given charge distribution. The interested reader might wish to read a study by Sayre and Heckler (available via the journal and via a publication page with a direct link to the PDF).

Figure 1. Two insulating bars, each of length L, have charges distributed uniformly upon them. The one on the left has a charge +Q uniformly distributed on it and the one on the right has a charge -Q uniformly distributed on it. Assume that V = 0 infinitely far away from these insulating bars. Is the potential positive, negative or zero at point A? At point B?

Experimental design and results

There were three versions of the exam: one version of this question appeared on two of them (Condition 1, 33 students) and the other version appeared on the third (Condition 2, 16 students). In each condition, students first answered the question for one of the points (Q1) using an IFAT scratch card (Condition 1: point A; Condition 2: point B). With the scratch cards, students scratch off their chosen answer and, if they chose correctly, they see a star. If they were incorrect, they could choose a different answer; if they were correct on their second try, they received half the points. If they had to scratch a third time to find the correct answer, they received no marks. No matter how they did on the first question, they had learned its correct answer before moving on to the second question (Q2), which asked for the potential at the other point (Condition 1: point B; Condition 2: point A). The results for each condition and question are shown in Table 1.

Condition | Q1 (scratch card question) | Q2 (follow-up question)
Condition 1 | Point A: 24/33 correct = 72.7±7.8% | Point B: 28/33 correct = 84.8±6.2%
Condition 2 | Point B: 8/16 correct = 50.0±12.5% | Point A: 10/16 correct = 62.5±12.1%

Table 1: Results are shown for each of the conditions. In condition 1, they answered the question for point A and received feedback, using the IFAT scratch card, before moving on to answer the question for point B. In condition 2, they first answered the question for point B using the scratch card and then moved on to answering the question for point A.
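The ± values in Table 1 appear to match the binomial standard error of a proportion, sqrt(p(1 - p)/N), expressed in percentage points. The post does not say how they were computed, so that formula is my inference. A minimal Python sketch that reproduces the table entries from the raw counts (the function name is my own):

```python
import math


def percent_with_se(correct: int, total: int) -> tuple[float, float]:
    """Return (percent correct, standard error in percentage points).

    ASSUMPTION: the uncertainty is the binomial standard error sqrt(p(1-p)/N).
    """
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * se


# Counts taken from Table 1.
table_1 = {
    ("Condition 1", "Q1, point A"): (24, 33),
    ("Condition 1", "Q2, point B"): (28, 33),
    ("Condition 2", "Q1, point B"): (8, 16),
    ("Condition 2", "Q2, point A"): (10, 16),
}

for (condition, question), (correct, total) in table_1.items():
    pct, se = percent_with_se(correct, total)
    print(f"{condition} {question}: {correct}/{total} = {pct:.1f}±{se:.1f}%")
```

Applied to the pooled Q1 counts, percent_with_se(24 + 8, 49) gives approximately 65.3±6.8%, the Q1 value quoted with Figure 3.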

Before pooling all students to look at the improvement from the scratch card question (Q1) to the follow-up question (Q2), I need to check that there is no statistically significant difference between how the students answered the question for point A and for point B. Figure 2 shows that a two-tailed repeated-measures t-test fails to reject the null hypothesis that mean performance on points A and B is the same. Thus we have no evidence that these two questions differ in difficulty, which means we can move on to comparing how the students performed on the follow-up question (Q2) with the scratch card question (Q1).

Figure 2. A two-sided repeated-measures t-test shows that there is no statistically significant difference in performance on the question for points A and B.
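A repeated-measures (paired) t-test works on each student's difference between their two scores rather than on the two group means. The sketch below uses hypothetical 0/1 scores, not data from this study, and computes only the t statistic and its degrees of freedom with the standard library. The p-value would then come from a t distribution with n - 1 degrees of freedom, for example via SciPy's scipy.stats.ttest_rel, which in recent versions also accepts an alternative argument for one-tailed tests. This is an illustration of the technique, not the analysis actually used to produce Figure 2.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Paired (repeated-measures) t statistic.

    first[i] and second[i] are the same student's scores (1 = correct,
    0 = incorrect) on the two questions. Returns (t, degrees_of_freedom).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need matching score lists for at least two students")
    diffs = [b - a for a, b in zip(first, second)]  # per-student change
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for eight students (not data from this study).
point_a = [1, 0, 1, 1, 0, 1, 0, 1]
point_b = [1, 1, 1, 1, 1, 1, 0, 1]
t, df = paired_t_statistic(point_a, point_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Pairing matters here because every student answered both the point A and point B questions, so each student serves as their own control.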

Figure 3 shows a 12.2 percentage-point improvement from the scratch card question (Q1) to the follow-up question (Q2). Using a one-tailed repeated-measures t-test (it was assumed that performance on Q2 would be better than on Q1), the null hypothesis is rejected with p = 0.0064. Since I have made two comparisons using these same data, a Bonferroni correction should be applied: the significance threshold for each comparison becomes 0.05/2 = 0.025. Because p = 0.0064 is below this corrected threshold, the improvement from Q1 to Q2 remains statistically significant.

Figure 3. A one-sided repeated-measures t-test shows a statistically significant improvement in performance from the scratch card question (Q1, 65.3±6.8%) to the follow-up question (Q2, 77.5±6.0%).
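The Bonferroni correction described above amounts to comparing each p-value against alpha divided by the number of comparisons. A minimal sketch (the function names are my own), fed with the one p-value reported in the post:

```python
def bonferroni_threshold(alpha: float, comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    if comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / comparisons


def is_significant(p_value: float, alpha: float = 0.05, comparisons: int = 1) -> bool:
    """True if p_value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, comparisons)


# Reported Q1 -> Q2 result: p = 0.0064, with two comparisons made on the same data.
print(bonferroni_threshold(0.05, 2))          # 0.025
print(is_significant(0.0064, comparisons=2))  # True
```

The correction is conservative: a p-value between 0.025 and 0.05 would pass an uncorrected test but fail here.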

Future work

In addition to reproducing these results using multiple questions, I would also like to examine whether these results hold under different conditions. Additional factors which could be examined include different disciplines, upper-division vs. introductory courses, and questions which target different levels of Bloom’s taxonomy.

Note: I found a paper that looks at the effect of feedback on follow-up questions as part of exam preparation and discuss it in more detail in this follow-up post.

My BCAPT Presentation on Group Quizzes

I forgot to post this. I gave a talk on group quizzes at the BCAPT AGM (local AAPT chapter) nearly a month ago. It was based on the same data analysis as a poster that I presented the previous year (two-stage group quiz posts 0 and 1), but I added some comparisons to other similar studies.

My presentation:


I’m in the middle of analyzing data I collected during the past year and will be presenting my initial findings at FFPER-PS 2012.

Student opinions on what contributes to their learning in my intro E&M course

We are a couple of weeks away from our one and only term test in my intro calc-based electricity and magnetism course. This test comes in the second-to-last week of the course, and I pitch it to them as practice for the final. The term test is worth 10-20% of their final grade and the final exam 30-40%; these flexible relative weights are meant to maximize each individual student’s grade.
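One reading of "meant to maximize each individual student's grade" is that each student gets whichever weighting in the allowed ranges gives the higher mark. The sketch below assumes, which the post does not state, that the term test and final exam weights always add up to 50% of the course grade. Because the weighted total is linear in the weights, the best choice is always one of the two endpoints, 20/30 or 10/40. The function name is hypothetical.

```python
def best_exam_weights(term_test: float, final_exam: float) -> tuple[int, int, float]:
    """Return (term-test weight, final-exam weight, exam contribution to course grade).

    ASSUMPTION: the two weights sum to 50 (term test 10-20%, final 30-40%).
    Scores are percentages from 0 to 100; the contribution is in percentage points.
    """
    options = []
    for w_test in (10, 20):  # a linear total is maximized at an endpoint
        w_final = 50 - w_test
        contribution = (w_test * term_test + w_final * final_exam) / 100
        options.append((w_test, w_final, contribution))
    return max(options, key=lambda option: option[2])


# A student who did better on the term test gets the 20/30 split.
print(best_exam_weights(80, 60))
```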

Today I asked them how they feel the major course components are contributing to their learning:

How much do you feel that the following course component has contributed to your learning so far in this course?

This is a bit vague, but I told them to vote according to what contributes to their understanding of the physics in this course. It doesn’t necessarily mean what makes them feel the most prepared for the term test, but if that is how they wanted to interpret it, that would be just fine.

For each component below, I briefly describe how it fits into the overall course, so by the end you should have a sense of how the whole course works.

The smartphysics pre-class assignments

The pre-class assignments are the engine that allows my course structure to work the way I want it to, and I have been writing about them a lot lately (see my most recent post in a longer series). My specific implementation is detailed under ‘Reading assignments and other “learning before class” assignments’ in this post. The quick and dirty explanation is that, before coming to class, my students watch multimedia prelectures that have embedded conceptual multiple-choice questions. Afterward they answer 2-4 additional conceptual multiple-choice questions where they are asked to explain the reasoning behind each of their choices. They earn marks for putting in an honest effort to explain their reasoning rather than for choosing the correct answer. Then they show up to class ready to build on what they learned in the pre-class assignment.

Smartphysics pre-class assignments

A) A large contribution to my learning.
B) A small contribution to my learning, so I rarely complete them.
C) A small contribution to my learning, but they are worth marks so I complete them.
D) No contribution to my learning, so I rarely complete them.
E) No contribution to my learning, but they are worth marks so I complete them


The smartphysics online homework

The homework assignments are a combination of “Interactive Examples” and multi-part end-of-chapter-style problems.

The Interactive Examples tend to be fairly long and challenging problems where the online homework system takes the student through multiple steps of qualitative and quantitative analysis to arrive at the final answer. Some students seem to like these questions and others find them frustrating because they managed to figure out 90% of the problem on their own but are forced to step through all the intermediate guiding questions to get to the bit that is giving them trouble.

The multi-part end-of-chapter-style problems require, in theory, conceptual understanding to solve. In practice, I find that a lot of the students simply number mash until the correct answer comes out the other end, and then they don’t bother to step back and try to make sure that they understand why that particular number mashing combination gave them the correct answer. The default for the system (which is the way that I have left it) is that they can have as many tries as they like for each question and are never penalized as long as they find the correct answer. This seems to have really encouraged the mindless number mashing.

This is why their response regarding the learning value of the homework really surprised me. A substantial number of them have admitted that they usually number mash, so I would have expected them to place less learning value on the homework.

The online smartphysics homework

A) A large contribution to my learning.
B) A small contribution to my learning, so I rarely complete them.
C) A small contribution to my learning, but it is worth marks so I complete it.
D) No contribution to my learning, so I rarely complete it.
E) No contribution to my learning, but it is worth marks so I complete it


Studying for quizzes and other review outside of class time

Studying for quizzes and other review outside of class time

A) A large contribution to my learning.
B) A small contribution to my learning, but I do it anyway.
C) A small contribution to my learning so I don’t bother.
D) No contribution to my learning so I don’t bother.


Group quizzes

I have an older post that discusses these in detail, but I will summarize here. Every Friday we have a quiz. They write the quiz individually, hand it in, and then re-write the same quiz in groups. They receive instant feedback on their group quiz answers thanks to IF-AT multiple-choice scratch-and-win sheets and receive partial marks based on how many tries it took them to find the correct answer. Marks are awarded 75% for the individual portion and 25% for the group portion OR 100% for the individual portion if that would give them the better mark.
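The marking rule in the paragraph above, as a minimal Python sketch (scores as percentages; the function name is my own):

```python
def weekly_quiz_mark(individual: float, group: float) -> float:
    """Weekly quiz mark from individual and group scores (0-100 each).

    75% individual + 25% group, or 100% individual if that is higher,
    so the group portion can only raise a student's mark.
    """
    blended = 0.75 * individual + 0.25 * group
    return max(blended, individual)


print(weekly_quiz_mark(60, 100))  # group helps: 70.0
print(weekly_quiz_mark(90, 70))   # group would hurt, so the individual 90 stands
```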

The questions are usually conceptual and often test the exact same conceptual step needed for them to get a correct answer on one of the homework questions (but not always with the same cover story). There are usually a lot of ranking tasks, which the students do not seem to like, but I do.

Group Quizzes

A) A large contribution to my learning.
B) A small contribution to my learning.
C) They don’t contribute to my learning.


Quiz Corrections

I have an older post that discusses these in detail, but I will again summarize here. For the quiz correction assignments they are asked, for each question, to diagnose what went wrong and then to generalize their new understanding of the physics involved. If they complete these assignments in the way I have asked, they earn back half of the marks they lost (e.g. a 60% quiz grade becomes 80%).
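The earn-back rule is simple arithmetic: the corrected grade is the original grade plus half of the marks lost. A one-function sketch (hypothetical name) that reproduces the 60% to 80% example:

```python
def corrected_quiz_grade(original: float) -> float:
    """Quiz grade (0-100) after completed corrections earn back half the lost marks."""
    marks_lost = 100 - original
    return original + marks_lost / 2


print(corrected_quiz_grade(60))  # 80.0
```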

I am delighted to see that 42% of them find that these make a large contribution to their learning. The quizzes are worth 20% of their final grade, so I would have guessed that their perceived learning value would get lost in the quest for points.

Quiz Corrections

A) A large contribution to my learning.
B) A small contribution to my learning, so I rarely complete them.
C) A small contribution to my learning, but they are worth marks so I complete them.
D) No contribution to my learning, so I rarely complete them.
E) No contribution to my learning, but they are worth marks so I complete them.


In-class stuff

I am a full-on interactive engagement guy. I use clickers, in the question-driven instruction paradigm, as the driving force behind what happens during class time. Instead of working examples at the board, I either (A) use clicker questions to step the students through the example so that they consider each of the important steps for themselves instead of me just showing them or (B) get them to work through examples in groups on whiteboards. Although I aspire to have the students report out their solutions in a future version of the course (“board meeting”), what I usually do when they work through an example on their whiteboards is wait until most of the groups are mostly done and then work through the example at the board with lots of their input, often generating clicker questions as we go.

The stuff we do in class

A) A large contribution to my learning.
B) A small contribution to my learning.
C) It doesn't contribute to my learning.


The take home messages

Group quizzes rule! The students like them. I like them. The research tells us they are effective. Everybody wins. And they only take up approximately 10 minutes each week.

I need to step it up in terms of the perceived learning value of what we do in class. That two-thirds figure is somewhere between an accurate estimate and a small overestimate of the fraction of the students in class who, at any moment, are actively engaged with the task at hand. This class is 50% larger than my usual intro courses (54 students in this case) and I have been doing a much poorer job than usual of circulating and engaging individual students or groups during clicker questions and whiteboarding sessions. The other third of the students are a mix of students surfing/working on stuff for other classes (which I decided was something I was not going to fight in a course this size) and students that have adopted the “wait for him to tell us the answer” mentality. Peter Newbury talked about these students in a recent post. I have lots of things in mind to improve both their perception and the actual learning value of what is happening in class. I will sit down and create a coherent plan of attack for the next round of courses.

I’m sure there are lots of other take-home messages that I can pluck out of these data, but I will take my victory (group quizzes) and my needs-improvement item (the in-class stuff) and look forward to continuing to work on course improvement.

Two-Stage Group Quizzes Part 0: Poster Presentation from FFPERPS 2011

This is Part 0 because I am just posting a link to the poster, as I presented it at FFPERPS 2011 (Foundations and Frontiers of Physics Education Research:  Puget Sound):

Click me to get full poster

Part 1 of this planned series of posts is where I go into some detail about the what, how and why (the intro section of the poster).

I am scheduled to present on this topic at the forthcoming April 13, 2011 Global Physics Department meeting, which takes place at 9:30 EDT. Please come join us in our Elluminate session if you are interested (the more the merrier). We also have a Posterous site if you’re interested.

Two-Stage Group Quizzes Part 1: What, How and Why

Note: This is the first in a series of posts that flesh out in more detail a poster that I presented at the fantastic FFPERPS conference last week. The basic points about the pros and cons of group exams, and the related references, borrow very heavily from Ref. 5 (thanks Brett!). All quoted feedback is from my students.


A two-stage group exam is a form of assessment where students learn as part of the assessment. The idea is that the students write an exam individually, hand in their individual exams, and then re-write the same or a similar exam in groups, where learning, volume and even fun are all had.

Instead of doing this for exams, I used the two-stage format for my weekly 20-30 minute quizzes. These quizzes would take place on the same day that the weekly homework was due (Mastering Physics so they had the instant feedback there as well), with the homework being due the moment class started. I used them for the first time in Winter 2011 in an Introductory Calculus-based E&M course with an enrolment of 37 students.

The format of the weekly quiz (20-30 minutes total) was as follows:

  • The students wrote their 3-5 multiple-choice and short-answer question quiz individually and afterward handed these solo quizzes in.
  • The group quiz (typical group size = 3) consisted of most or all of the individual questions, as multiple-choice questions. Most questions had five choices.
  • The marks were weighted 75% for the individual component and 25% for the group component. The group component was not counted if it would lower an individual’s mark.

Immediate Feedback Assessment Technique (IF-AT)

IFAT card

The group quizzes were administered using IF-AT [1] cards, with correct answers indicated by a star when the covering is scratched off. This immediate feedback meant that the students always knew the correct answer by the end of the quiz, which was when they were at their most curious, and thus the correct answer should be most meaningful to them. Typically, students that did not understand why a given answer was correct would call me over and ask for an explanation.

“I feel finding out the answers immediately after the quiz helps clarify where I went wrong instead of letting me think my answer was right over the weekend. It also works the same for right answers.”

My students found it to be a very fair marking system because they received partial marks based on their number of incorrect answers. My system was 2 points if you got it correct on the first try, and that went down by half for each wrong answer. So 1 wrong answer = 1 point credit, 2 wrong answers = 0.5 point credit, and so on.
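The halving scheme works out to 2 × (1/2)^w points for w wrong answers scratched before the star. A short sketch (the function name is my own):

```python
def ifat_credit(wrong_answers: int, full_credit: float = 2.0) -> float:
    """Points for one IF-AT question: full credit on the first scratch,
    halved for each wrong answer scratched before finding the star."""
    if wrong_answers < 0:
        raise ValueError("wrong_answers cannot be negative")
    return full_credit * 0.5 ** wrong_answers


# On a five-choice question a group can scratch at most four wrong answers.
print([ifat_credit(w) for w in range(5)])  # [2.0, 1.0, 0.5, 0.25, 0.125]
```

Unlike an all-or-nothing scheme, credit never drops to zero, which fits the students' sense that the marking was fair.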

The General Benefits of Group Exams

Peer interactions and feedback: Both students benefit when one explains an answer to the other.

“When I think I know what’s going on I can explain it and realize yes, I really do know what I’m talking about…and sometimes vice-versa.”

All students are given a good chance to participate: All students, including the shy or quiet, participate and are given a chance to explain their understanding and reasoning. I gave nine of these two-stage quizzes over the term and don’t remember seeing any students sitting on the sidelines letting the rest of their group figure it out. They were invested because their marks were on the line, and they genuinely (based on feedback) seemed to feel like it was a good opportunity for learning.

Development of collaboration skills [2]: I can’t really comment on this too much. I did not notice a difference in these skills over the course of the term, but I would certainly believe that many students developed them to some tangible degree.

Students enjoy them and there is an increase in overall enjoyment of the course [2]: I can’t comment on the claim from Ref. 2 about an increase in overall enjoyment. I changed many things in this course from the first time I taught it, so I couldn’t pinpoint any one thing which led to the higher student evaluations and more positive classroom culture this time around (note this was only my second time teaching this course). But I can certainly tell you the feedback I got regarding the group quizzes was overwhelmingly positive.

“I (heart) group quizzes!!! I always learn, and it’s nice to argue out your thinking with the group. It really helps me understand. Plus, scratching the boxes is super fun.”

They promote higher-level thinking [3]: This claim from Ref. 3 is another one for which I neither looked for nor saw any evidence.

Increase in student retention of information (maybe) [4]: This is at the heart of a study which I plan to run this coming year. Ref. 4 saw a slight increase in student retention of information on questions for which students completed the group quizzes. But this study did not control for differences in time-on-task between the students that completed just the individual exams and those that completed both individual and group exams. More discussion on my planned study in a future post.

Improved student learning [3,5]: I saw the same sort of evidence for this that is shown in Ref. 5. Groups of students, where none of them had the correct answer for a question on their solo quizzes, were able to get the correct answer on the group quiz nearly half the time. This shift from nobody having the correct answer to the group figuring out the correct answer is a very nice sign of learning.
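The "nearly half the time" result is a conditional fraction: among group-question pairs where no member answered correctly on the solo quiz, how often did the group still get it right? A sketch with a hypothetical record format and made-up example data. It counts a group as correct only if the group found the star on its first scratch, which is my assumption about how "correct" was scored, since with IF-AT every group eventually uncovers the answer.

```python
from dataclasses import dataclass


@dataclass
class GroupQuestion:
    """One group's result on one question (hypothetical record format)."""
    members_correct: list[bool]  # each member's answer on the solo quiz
    group_first_try: bool        # ASSUMPTION: group found the star on its first scratch


def rescue_fraction(results: list[GroupQuestion]) -> float:
    """Fraction of cases with no correct member where the group still got it right."""
    nobody_correct = [r for r in results if not any(r.members_correct)]
    if not nobody_correct:
        raise ValueError("no cases where every group member was wrong")
    return sum(r.group_first_try for r in nobody_correct) / len(nobody_correct)


# Made-up example: three of the four cases had no correct member.
example = [
    GroupQuestion([False, False, False], True),
    GroupQuestion([False, False, False], False),
    GroupQuestion([True, False, False], True),  # excluded: one member was right
    GroupQuestion([False, False], False),
]
print(rescue_fraction(example))
```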

Some Specific Benefits I Saw in My Implementation

The feedback comes when it is most useful to them: Immediate feedback via IF-AT provides them with the correct answer when they are very receptive, after having spent time on their own and in group discussion pondering the question.

“It’s a good time to discuss and it’s the perfect time to learn, ‘cause right after the quiz, the ideas and thoughts stick to your mind. It’s gonna leave a great impression.”

Very high engagement from all students: My students were highly engaged with the questions at hand and were very excited (high-fives and other celebrations were delightfully common) when they got a challenging question correct.

Reduced student anxiety: due to (a) knowing that they could earn marks even if they were incorrect on the individual portion, and (b) knowing that they would come away from the quiz knowing the correct answers. Point (a) is pure speculation on my part. Point (b) was made by multiple students when I asked them to provide feedback on the group quiz process.

Some Drawbacks to Group Exams

Longer exam/quiz time: This really wasn’t a big deal. It was typically less than an extra 10 minutes and the advantages were far too great to not give up that extra little bit of class time.

Some students feel very anxious about group work and interactions: This never came up in the feedback I received from the students, but I have friends and family who have discussed with me how much they dislike group work. Perhaps this specific implementation might have even been to their liking.

Social loafers and non-contributors earn same marks as the rest of the group: To my mind the potential student frustration from this was greatly moderated by all students writing the solo quizzes, as well as the group portion being worth only 25% of total quiz mark. And as I mentioned earlier, I do not remember noticing a non-contributor even once over the term.

Dominant group members can lead the group astray when incorrect: This is another thing which, to my mind, the IF-AT sheets moderate greatly. Dominant group members can potentially minimize the contributions of other group members, but I remember a reassuring point about dominant students made by Jim Sibley when I first learned of IF-AT. Jim Sibley is at the University of British Columbia and is a proponent of Team-Based Learning. In this learning environment they use the IF-AT cards for reading quizzes at the start of a module. He told us that groups often turn to the shy or quiet members as trusted answer sources when dominant group members are repeatedly incorrect.



[2] Stearns, S. (1996). Collaborative Exams as Learning Tools. College Teaching, 44, 111–112.

[3] Yuretich, R., Khan, S. & Leckie, R. (2001). Active-learning methods to improve student performance and scientific interest in a large introductory oceanography course. Journal of Geoscience Education, 49, 111–119.

[4] Cortright, R.N., Collins, H.L., Rodenbaugh, D.W. & DiCarlo, S.T. (2003). Student retention of course content is improved by collaborative-group testing. Advances in Physiology Education, 27, 102–108.

[5] Gilley, B. & Harris, S. (2010). Group quizzes as a learning experience in an introductory lab. Poster presented at the Geological Society of America 2010 Annual Meeting.


March 26, 2011 – Added: “These quizzes would take place on the same day that the weekly homework was due (Mastering Physics so they had the instant feedback there as well), with the homework being due the moment class started.”