# Summer 2012 Research, Part 1: Immediate feedback during an exam

One of my brief studies, based on data from a recent introductory calculus-based course, was to look at the effect of immediate feedback in an exam situation. The results show that, after being provided with immediate feedback on their answer to the first of two questions which tested the same concept, students had a statistically significant improvement in performance on the second question.

Although I used immediate feedback for multiple questions on both the term test and final exam in the course, I only set up the experimental conditions discussed below for one question.

## The question

The question I used (Figure 1) asked about the sign of the electric potential at two different points. A common student difficulty is to confuse the procedures for finding electric potential (a scalar quantity) and electric field (a vector quantity) for a given charge distribution. The interested reader might wish to read a study by Sayre and Heckler (link to journal, publication page with direct link to pdf).

Figure 1. Two insulating bars, each of length L, have charges distributed uniformly upon them. The one on the left has a charge +Q uniformly distributed on it and the one on the right has a charge -Q uniformly distributed on it. Assume that V=0 at a distance that is infinitely far away from these insulating bars. Is the potential positive, negative or zero at point A? At point B?

## Experimental design and results

There were three versions of the exam, with one version of this question appearing on two exams (Condition 1, 33 students) and the other version appearing on the third exam (Condition 2, 16 students). In each condition, students were asked to answer the first question (Q1) for one of the points (Condition 1 = point A; Condition 2 = point B) using an IFAT scratch card. With the scratch cards, students scratch their chosen answer, and if they chose correctly they see a star. If they were incorrect, they could choose a different answer, and if they were correct on their second try they received half the points. If they had to scratch a third time to find the correct answer, they received no marks. No matter how they did on the first question, they had learned the correct answer to it before moving on to the second question (Q2), which asked for the potential at the other point (Condition 1 = point B; Condition 2 = point A). The results for each condition and question are shown in Table 1.
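The scratch-card marking scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `scratch_card_score` and the `full_marks` parameter are my own labels, not part of the IFAT cards or of the exam.

```python
def scratch_card_score(scratches_needed: int, full_marks: float = 1.0) -> float:
    """Marks earned on one scratch-card question.

    scratches_needed is how many boxes the student scratched before
    uncovering the star (1 means correct on the first try).
    The name and signature are hypothetical, for illustration only.
    """
    if scratches_needed < 1:
        raise ValueError("at least one scratch is needed to find the star")
    if scratches_needed == 1:
        return full_marks        # correct on the first try: full marks
    if scratches_needed == 2:
        return full_marks / 2    # correct on the second try: half the points
    return 0.0                   # needed a third scratch: no marks


for n in (1, 2, 3):
    print(f"{n} scratch(es): {scratch_card_score(n)} marks")
```

Whatever the score, the student has seen the star by the end, which is what guarantees that everyone knows the correct answer to Q1 before starting Q2.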

| | Q1 (scratch card question) | Q2 (follow-up question) |
| --- | --- | --- |
| Condition 1 | Point A: 24/33 correct = 72.7±7.8% | Point B: 28/33 correct = 84.8±6.2% |
| Condition 2 | Point B: 8/16 correct = 50.0±12.5% | Point A: 10/16 correct = 62.5±12.1% |

Table 1: Results are shown for each of the conditions. In condition 1, they answered the question for point A and received feedback, using the IFAT scratch card, before moving on to answer the question for point B. In condition 2, they first answered the question for point B using the scratch card and then moved on to answering the question for point A.
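The post does not say how the uncertainties in Table 1 were calculated. They are consistent with the binomial standard error, sqrt(p(1-p)/n), so here is a minimal sketch under that assumption. The helper name `percent_correct` is made up for this example; the counts are taken from Table 1.

```python
import math


def percent_correct(correct: int, total: int) -> tuple[float, float]:
    """Return (percentage correct, binomial standard error in percent).

    Assumes the quoted uncertainty is sqrt(p * (1 - p) / n); this is an
    assumption that happens to match the table, not something stated in it.
    """
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * se


# (number correct, number of students), copied from Table 1.
table_1 = {
    "Condition 1, Q1 (point A)": (24, 33),
    "Condition 1, Q2 (point B)": (28, 33),
    "Condition 2, Q1 (point B)": (8, 16),
    "Condition 2, Q2 (point A)": (10, 16),
}

for label, (correct, total) in table_1.items():
    pct, se = percent_correct(correct, total)
    print(f"{label}: {correct}/{total} correct = {pct:.1f}±{se:.1f}%")
```

The same helper applied to the pooled counts (32/49 for Q1 and 38/49 for Q2) gives the percentages and uncertainties quoted for Figure 3.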

So that I can look at the improvement from all students when going from the scratch card question (Q1) to the follow-up question (Q2), I need to show that there is no statistically significant difference between how the students answered the question for point A and point B. Figure 2 shows that a two-tailed repeated-measures t-test fails to reject the null hypothesis that the mean performance for points A and B is the same. Thus we have no evidence that these questions are different, which means we can move on to comparing how the students performed on the follow-up question (Q2) as compared to the scratch card question (Q1).

Figure 2. A two-sided repeated-measures t-test shows that there is no statistically significant difference in performance on the question for points A and B.

Figure 3 shows an improvement of 12.2 percentage points from the scratch card question (Q1) to the follow-up question (Q2). Using a one-tailed repeated-measures t-test (it was assumed that performance on Q2 would be better than Q1), the null hypothesis is rejected at a level of p = 0.0064. Since I have made two comparisons using these same data, a Bonferroni correction should be applied. With this correction the significance threshold becomes 0.05/2 = 0.025, and since p = 0.0064 is below that threshold, the improvement from Q1 to Q2 was statistically significant.

Figure 3. A one-sided repeated-measures t-test shows that there is a statistically significant improvement in performance from the scratch card question (Q1, 65.3±6.8%) to the follow-up question (Q2, 77.5±6.0%).
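For readers who want to see the mechanics of the one-tailed repeated-measures t-test and the Bonferroni threshold, here is a rough sketch using only the Python standard library. The per-student outcomes are not given in this post, so the split used below (32 students correct on both questions, 6 who improved, 0 who got worse, 11 incorrect on both) is a hypothetical one chosen only because it matches the reported totals of 32/49 and 38/49. The function names are likewise my own, and the tail probability is computed by simple numerical integration rather than with a statistics package.

```python
import math


def paired_t(before, after):
    """Paired (repeated-measures) t statistic and degrees of freedom."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


def t_upper_tail(t, df, steps=2000):
    """One-tailed p-value P(T > t) for t >= 0.

    Integrates the Student t density from 0 to t with Simpson's rule
    (steps must be even) and subtracts the result from 0.5.
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


# HYPOTHETICAL per-student split that reproduces the reported totals.
both_correct, improved, declined, both_wrong = 32, 6, 0, 11
q1 = [1] * both_correct + [0] * improved + [1] * declined + [0] * both_wrong
q2 = [1] * both_correct + [1] * improved + [0] * declined + [0] * both_wrong

t, df = paired_t(q1, q2)
p = t_upper_tail(t, df)
alpha = 0.05 / 2  # Bonferroni correction for two comparisons

print(f"t({df}) = {t:.3f}, one-tailed p = {p:.4f}")
print("significant after Bonferroni correction:", p < alpha)
```

With this assumed split the one-tailed p-value comes out close to the value reported above, but a different split with the same totals (for example, more students improving and some getting worse) would give a different p-value, so the real test needs the actual per-student data.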

## Future work

In addition to reproducing these results using multiple questions, I would also like to examine if these results hold true for some different conditions. Additional factors which could be examined include different disciplines, upper-division vs. introductory courses and questions which target different levels of Bloom’s taxonomy.

Note: I found a paper that looks at the effect of feedback on follow-up questions as part of exam preparation and discuss it in more detail in this follow-up post.

# Kinder, Gentler Oral Exams

Let me start off by saying that, as a student, I found oral exams to be very intimidating and frustrating. I could see their value as assessment tools, but found that in practice they were simply a source of personal dread. Enter 2012 where I am using oral assessments with my own students, but what I have done is try to minimize what I found intimidating and frustrating about oral exams. I have made my oral assessments kinder and gentler.

## The strengths of oral assessments

In my opinion, the strengths of oral assessments are a result of their interactive nature.

If a student is stuck on a minor point, or even a major one, you can give them a hint or use some leading questions to help them along. Compare this to what happens if a student gets stuck on a written exam question and you can see how the oral assessment provides you with a much better assessment of student understanding than an incomplete or nonsensical written response.

Another strength is that no ambiguity need be left unturned. If some sort of ambiguous statement comes out of a student’s mouth, you can ask them to clarify or expand on what they have said instead of dealing with the common grader’s dilemma of sitting in front of a written response trying to make judgement calls related to ambiguous student work.

Some other benefits are that marking is a breeze (I will discuss my specific marking scheme later) and I have also found that I can generate “good” oral exam questions much more quickly than I can written ones.

## My perception of the weaknesses of traditional oral assessments

The following are common, but not universal, characteristics of oral assessments.

Public – Looking dumb in front of me may not be fun, but it is far more comfortable than looking dumb in front of a room full of your peers or discipline experts. Having spent some time on both sides of the desk, I don’t feel that my students ever “look dumb”, but as a student I remember feeling dumb on many occasions (here I will also include comprehensive exams, dissertation defences and question periods after oral presentations in my definition of oral assessments). I guess I’m saying that it feels worse than it looks, but doing it in public makes it feel even worse.

A lack of time to think – This is actually my biggest beef with oral assessments. In a written assessment you can read the question, collect your thoughts, brain-storm, make some mistakes, try multiple paths, and then finally try to put together a cohesive answer. I realize that you can do all these things in an oral assessment as well, but there is a certain time pressure which hangs over your head during an oral assessment. And there is a difference between privately pursuing different paths before coming to a desired one and having people scrutinize your every step while you do this.

Inauthentic – By inauthentic, I mean that oral exams (and for the most part, written ones too) isolate you from resources and come with some sort of urgent time pressure. If we are confronted with a challenging problem or question in the real world, we usually have access to the internet, textbooks, journals and even experts. We are able to use those resources to help build or clarify our understanding before having to present our solution. On the flip side, we can also consider the question period after a presentation as a real-world assessment, and there we are usually expected to have answers at our fingertips without consulting any resources. So arguments can be made both for and against the authenticity of an oral assessment.

## Context (Advanced Lab)

Before I break down my kinder, gentler oral exams, I want to discuss the course in which I was using them. This course was my Advanced Lab (see an earlier post) where students work in pairs on roughly month-long experimental physics projects. One student is asked to be in charge of writing about the background and theory and the other the experimental details, and then on the second project they switch. For their oral assessments I used the same set of questions for both partners, but the actual questions (see below) were very project-specific. My hope was that using the same questions for both partners would force them to pay much closer attention to what the other had written.

It took at most a total of 2 hours to come up with the 6 sets of questions (12 students total in the course) and then 9 hours of actual oral exams which comes out to less than an hour per student. I would say that this is roughly equivalent to the time I would have spent creating and marking that many different written exams, but this was much more pleasant for me than all that marking.

## Kinder, gentler oral exams

I will describe the format that I use and then highlight some of the key changes that I made to improve on what I perceive to be the weaknesses of traditional oral exams.

The key changes:

• Private – I have them come to my office and do the assessment one-on-one instead of in front of the whole class.
• 10 minutes to collect their thoughts and consult resources – It is similar to the perceived safety blanket offered by an open book exam. Students that were well-prepared rarely used the entire time and students that were not well-prepared tried to cram but did not do very well since I would always ask some clarification or follow-up questions. I have some post-course feedback interviews planned to learn more about the student perspective on this, but my perception is that the preparation time was helpful, even for the well-prepared students. It gave them a chance to build some confidence in their answers and I often delighted in how well they were able to answer their questions. I think that time also offered an opportunity to get some minor details straight, which is beneficial in terms of confidence building and improving the quality of their answers. And finally, knowing that they had that 10 minutes of live prep time seemed to reduce their pre-test stress.
• Immediate feedback – Discussing the correct answer with the student immediately after they have answered a question is a potential confidence killer. I suspect that the students would prefer to wait until after they have answered all the questions before discussing the correct answers, and I am interested to see what I will learn in my feedback interviews.
• Grading done as a collaborative process with the student – In practice I would usually suggest a grade for a question and mention some examples from their answer (including how much help they needed from me) and then ask them if they thought that was fair. If they felt they should have earned a higher grade, they were invited to give examples of how their answer fell in the higher rubric category, and there were many occasions where those students received higher grades. However, the problem is that this is a squeaky wheel situation and it is hard to figure out if it is entirely fair to all students. In cases where I asked students to tell me what grade they thought they had earned before saying anything myself, they were far more likely to assess themselves lower than I would have than to assess themselves higher.

The grading rubric used was as follows:

| Grade | Meets expectations? | Criteria |
| --- | --- | --- |
| 100% | Greatly exceeds expectations | The student displayed an understanding which went far beyond the scope of the question. |
| 90% | Exceeds expectations | Everything was explained correctly without leading questions. |
| 75% | Meets expectations | The major points were all explained correctly, but some leading questions were needed to help get there. There may have been a minor point which was not explained correctly. |
| 60% | Approaching expectations | There was a major point or many minor points which were not explained correctly. The student was able to communicate an overall understanding which is correct. |
| 45% | Below expectations | Some of the major points were explained correctly, but the overall explanation was mostly incorrect. |
| 30% | Far below expectations | Some of the minor points were explained correctly, but the overall explanation was mostly incorrect. |
| 0% | x_X | X_x |

## Some example questions

General

• I would pull a figure from their lab report and ask them to explain the underlying physics or experimental details that led to a specific detail in the figure.

Superconductor experiment

• “Run me through the physics of how you were able to get a current into the superconducting loop. Why did you have to have the magnet in place before the superconducting transition?”
• “Describe the physics behind how the Hall sensor gave a voltage output which is proportional (when zeroed) to the external field. How do the external magnetic field and the Hall sensor need to be oriented with respect to each other?”
• “Explain superconductivity to me in a way which a student, just finishing up first-year science, would understand.”

Electron-spin resonance experiment

• “Discuss how the relative alignment between your experiment and the Earth’s magnetic field might affect your results.”

Gamma-ray spectroscopy

• “In what ways did your detector resolution not agree with what was expected according to the lab manual? What are some reasonable steps that you could take to TRY to improve this agreement?”

## Some other directions to take oral assessments

A couple of my blogger buddies have also been writing about using oral assessments, and I really like what they are up to as well.

Andy Rundquist has written quite a bit about oral assessments (one example) because they are quite central to his Standards-Based Grading implementation. One of the things that he has been doing lately is giving a student a question ahead of time and asking them to prepare a page-length solution to the question to bring to class. In class the student projects their solution via doc-cam, Andy studies it a bit, and then he starts asking the student questions. To my mind this is most similar to the question period after a presentation. The student has had some time, in isolation, to put together the pieces to answer the question, and the questions are used to see how well they understood all the pieces required to put together the solution. Another thing that Andy does is get the whole class to publicly participate in determining the student’s overall grade on that assessment. I love that idea, but feel like I have some work to do in terms of creating an appropriate classroom environment to do that.

Bret Benesh wrote a couple of posts (1, 2) discussing his use of oral exams. His format is closer to mine than it is to Andy’s, but Bret’s experience was that even if students knew the exam question ahead of time, he could easily tell the difference between those who understood their answers and those who did not. I really want to try giving my students the questions ahead of time now.

## One final note

I am giving a short AAPT talk on my kinder, gentler oral exams, so any feedback that will help with my presentation will be greatly appreciated. Are there certain points which were not emphasized, but should have been?

# Tier 2 Canada Research Chair – Teaching and Learning at University of the Fraser Valley

Please pass this along to anybody you know that does research in the broad field of teaching and learning and is interested in setting up shop in beautiful British Columbia. I would love to see a strong physics or other science ed researcher get awarded this chair. The competition closes August 3rd.
The meaty bit of the posting (http://www.ufv.ca/es/Careers/Faculty/Re-Post_2011_185.htm):

The successful applicant will hold a doctoral degree (obtained in the last ten years) and will be an outstanding emerging scholar who has demonstrated innovation and a proven ability to cultivate multidisciplinary, collaborative partnerships in local, national, and international research networks. The candidate must possess an original and independent research program in the general area of teaching and learning, the use of new teaching technologies and innovative pedagogical approaches relevant to the post-secondary education level.

The goals of the CRC program (www.chairs-chaires.gc.ca) are to promote leading edge research and the training of highly qualified personnel at universities.