Today Brett Gilley and I ran a workshop “Two Stage Exams: Learning Together?” at the University of British Columbia Okanagan Learning Conference. Great fun was had and many ideas exchanged. Slides below.
Disclosure: my colleague, Georg Rieger, and I are currently in the process of securing post-doc funding to evaluate the effectiveness of Learning Catalytics and that position would be paid in part by Pearson, who owns Learning Catalytics.
I have been using Learning Catalytics, web-based “clickers on steroids” software, in a lecture course and a lab course since the start of September. In this post I want to focus on the logistical side of working with Learning Catalytics in comparison to clickers, and just touch briefly on the pedagogical benefits.
I will briefly summarize my overall pros and cons of using Learning Catalytics before diving into the logistical details:
- Pro: Learning Catalytics enables a lot of types of questions that are not practical to implement using clickers. We have used word clouds, drawing questions (FBDs and directions mostly), numerical response, choose all that apply, and ranking questions. Although all of these question types, aside from the word clouds, are possible as multiple-choice if you have or are able to come up with good distractors, Learning Catalytics lets you collect their actual answers instead of having them give you their best guess from a selection of answers that you give to them.
- Con: Learning Catalytics is clunky. The bulk of this post will discuss these issues, but Learning Catalytics has larger hardware requirements, relies heavily on good wifi and website performance, is more fiddly to run as an instructor, and is less time-efficient than just using clickers (in the same way that using clickers is less time-efficient than using coloured cards).
- Pro: The Learning Catalytics group tool engages reluctant participants in a way that no amount of buy-in or running around the classroom trying to get students to talk to each other seems to be able to do. When you do the group revote portion of Peer Instruction (the “turn to your neighbor” part), Learning Catalytics tells the students exactly who to talk to (talk to Jane Doe, sitting to your right) and matches them up with somebody that answered differently than them. Although this should not be any different than instructing them to “find somebody nearby that answered differently than you did and convince them that you are correct,” there ends up being a huge difference in practice in how quickly they start these discussions and what fraction of the room seems to engage in these discussions.
Honestly, the first two points make it so that I would favour clickers a bit, but the difference in the level of engagement thanks to the group tool is the thing that has sold me on Learning Catalytics. Onto the logistical details.
As you can see from the picture, I use a lot of devices when I teach with Learning Catalytics. You can get away with fewer devices, but this is the solution that meets my needs. I have tried some different iterations, and what I describe here is the one that I have settled on.
- In the middle you will find my laptop, which runs my main slide deck and is permanently projected on one of the two screens in the room. It has a bamboo writing tablet attached to it to mark up slides live and will likely be replaced by a Surface Pro 3 in the very near future.
- At the bottom is my tablet (iPad), which I use to run the instructor version of Learning Catalytics. This is where I start and stop polls, choose when and how to display results to the students and other such instructorly tasks. The screen is never shared with the students and is analogous to the instructor remote with little receiver+display box that I use with iclickers. Since it accesses Learning Catalytics over wifi and is not projected anywhere, I can wander around the room with it in my hand and monitor the polls while talking to students. Very handy! I have also tried to do this from my smartphone when my tablet battery was dead, but the instructor UI is nowhere near as good for the smartphone as it is for larger tablets or regular web browsers.
- At the top is a built-in PC which I use to run the student version of Learning Catalytics. This displays the Learning Catalytics content that students are seeing on their devices at any moment. I want to have this projected for two reasons. First, I like to stomp around and point at things when I am teaching so I want the question currently being discussed or result currently being displayed to be something that I can point at and focus their attention on instead of it just being on the screens of their devices. Second, I need the feedback of what the students see at any moment to make sure that the question or result that I intended to push to their devices has actually been pushed to their devices. For the second point, it is reasonable to flip back and forth between instructor and student view on the device running Learning Catalytics (this is what one of my colleagues does successfully), but I find that a bit clunky and it still doesn’t meet my stomping around and pointing at stuff need. The instructor version of Learning Catalytics pops up a student view and this is what I use here (so technically I am logged in as an instructor on two devices at once). The student view that pops up with the instructor version makes better use of the projected screen real estate (e.g., results are shown along the side instead of at the bottom) than the student version that one gets when logging in using a student account.
The trade-off when going from clickers to Learning Catalytics is that you gain a bunch of additional functionality, but in order to do so you need to take on a somewhat clunky and less time-efficient system. There are additional issues that may not be obvious from just the hardware setup described above.
- I am using 3 computer-type devices instead of a computer and clicker base. Launching Learning Catalytics on a device takes only a bit longer than plugging in my iclicker base and starting the session, but this is still one or two more devices to get going (again, my choice and preference to have this student view). Given the small amount of time that we typically have between gaining access to a room and the time at which we start a class, each extra step in this process introduces another possible delay in starting class on time. With 10 minutes, I find I am often cutting it very close and sometimes not ready quite on time. In two of approximately twelve lectures where I intended to use Learning Catalytics this term, there was a wifi or Learning Catalytics website problem. Once I just switched to clickers (they have them for their next course) and the other time the problem resolved quickly enough that it just cost us a bit of time. When I remember to do so, I can save myself a bit of time by starting the session on my tablet before I leave my office.
- The workflow of running a Learning Catalytics question is very similar to running a clicker question, but after six weeks of using Learning Catalytics, clickers feel like they have a decent-sized advantage in the “it just works” category. There are many more choices with the Learning Catalytics software, and with that a loss of simplicity. Since I did have the experience a few weeks ago of using clickers instead of Learning Catalytics, I can say that the “it just works” aspect of the clickers was reinforced.
- Overall, running a typical Learning Catalytics question feels less time-efficient than a clicker question. It takes slightly longer to start the question, for them to answer and then to display the results. This becomes amplified slightly because many of the questions we are using require the students to have more complicated interactions with the question than just picking one of five answers. All that being said, my lecture TA and I noted last week that it felt like we finally got to a point where running a multiple-choice question in Learning Catalytics felt very similar in time from beginning to end as with clickers. To get to this point, I have had to push the pace quite a bit with these questions, starting my “closing the poll” countdown when barely more than half of the answers are in. So I think I can run multiple choice questions with similar efficiency on both systems now, but I am having to actively force the timing in the case of Learning Catalytics. However, having to force the timing may be a characteristic of the students in the course more than the platform.
- Batteries! Use of Learning Catalytics demands that everybody has a sufficiently charged device or ability to plug their device in, including the instructor. This seems a bit problematic if students are taking multiple courses using the system in rooms where charging is not convenient.
- Preparing for class also has additional overhead. We have been preparing the lecture slides in the same way as usual and then porting any questions we are using from the slides into Learning Catalytics. This process is fairly quick, but still adds time to the course preparation process. Where it can become a bit annoying is that sometimes the slide and Learning Catalytics versions of the question aren’t identical due to a typo or modification that was made on one platform, but accidentally not on the other. There haven’t been a ton of these, but it is one more piece that makes using Learning Catalytics a bit clunky.
- In its current incarnation, it seems like one could use Learning Catalytics to deliver all the slides for a course, not just the questions. This would be non-ideal for me because I like to ink up my slides while I am teaching, but this would allow one to get rid of the need for a device that was projecting the normal slide deck.
An instructor needs to be willing to take on a lot of overhead, inside the class and out, if they want to use Learning Catalytics. For courses where many of the students are reluctant to engage enthusiastically with the peer discussion part of the Peer Instruction cycle, the group tool functionality can make a large improvement in that level of engagement. The additional question types are nice to have, but feel like they are not the make or break feature of the system.
In an attempt to get back into semi-regular blogging, I am setting aside my pile of half-written posts and am going to share the work that I presented at the 2014 AAPT Summer Meeting and PERC 2014.
The quick and dirty version is that I was able to run a study, looking at the effectiveness of group exams, in a three-section course which enrolled nearly 800 students. The image below summarizes the study design, which was repeated for each of the two midterms.
Results: As previously mentioned, I went through this cycle for each of the two midterms. For midterm 2, which took place only 1-2 weeks prior to the end-of-term diagnostic, students that saw a given question on their group exam outperformed those that did not on the matched questions from the end-of-term diagnostic. Huzzah! However, for the end-of-term diagnostic questions matched with the midterm 1 questions, which took place 6-7 weeks prior to the end-of-term diagnostic, there were no statistically significant differences between those that saw the matched questions on their group exams vs. not. The most likely explanation is that the learning from both control and treatment decays over time, thus so does the difference between these groups. After 1-2 weeks, there is still a statistically significant difference, but after 6-7 weeks there is not. It could also be differences in the questions associated with midterm 1 vs midterm 2. For some of the questions, it is possible that the concepts were not well separated enough within the questions so group exam discussions may have helped them improve their learning of concepts for questions that weren’t on their particular group exam. I hope to address these possibilities in a study this upcoming academic year.
Will I abandon this pedagogy? Nope. The group exams may not provide a measurable learning effect which lasts all the way to the end of the term for early topics, but I am more than fine with that. There is a short-term learning effect and the affective benefits of the group exams are extremely important:
- One of the big ones is that these group exams are effectively Peer Instruction in an exam situation. Since we use Peer Instruction in this course, this means that the assessment and the lecture generate buy-in for each other.
- There are a number of affective benefits, such as increased motivation to study, increased enjoyment of the class, and lower failure rates, which have been shown in previous studies (see arXiv link to my PERC paper for more on this). Despite my study design, which had the students encountering different subsets of the original question on their group exams, all students participated in the same intervention from the perspective of the affective benefits.
I had some great conversations with old friends, new friends and colleagues. I hope to expand on some of the above based on these conversations and feedback from the referees on the PERC paper, but that will be for another post.
In my mind it is hard to get students to do pre-class homework (“pre-reading”) with much more than an 80% completion rate when averaged out over the term. It usually starts higher than this, but there is a slow trend toward less completion as the term wears on. After taking a more careful look at the five introductory courses in which I used pre-class assignments, I have discovered that I was able to do much better than 80% in some of the later courses and want to share my data.
Descriptions of the five courses
The table below summarizes some of the key differences between each of the five introductory physics courses in which I used pre-class assignments. It may also be important to note that the majority of the students in Jan 2010 were the same students from Sep 2009, but not much more than half of the Jan 2012 students took my Sep 2011 course. For Jan 2013 only two of the students had previously taken a course with me.
| Course | Textbook | Contribution to overall course grade | Median completion rate (1st and 3rd quartiles in brackets) |
| --- | --- | --- | --- |
| Sep 2009 (Mechanics) | Young & Freedman – University Physics 11e | Worth 8%, but drop 3 worst assignments. No opportunities for late submission or earning back lost marks. | 0.73 (0.62, 0.79) |
| Jan 2010 (E&M) | Young & Freedman – University Physics 11e | Worth 10%, but drop 2 worst assignments. No opportunities for late submission or earning back lost marks. | 0.78 (0.74, 0.89) |
| Sep 2011 (Mechanics) | smartPhysics | Worth 8%. Did not drop any assignments, but they could (re)submit at any point up until the final exam and earn half marks. | 0.98 (0.96, 0.98) |
| Jan 2012 (E&M) | smartPhysics | Worth 8%. Did not drop any assignments, but they could (re)submit at any point up until the final exam and earn half marks. | 0.94 (0.93, 0.98) |
| Jan 2013 (E&M) | Halliday, Resnick & Walker – Fundamentals of Physics 9e & smartPhysics multimedia presentations | Worth 10%. Did not drop any assignments, but they could (re)submit at any point up until the final exam and earn half marks. | 0.93 (0.87, 0.97) |
Overall the style of question used was the same for each course, with the most common type of question being a fairly straight-forward clicker question (I discuss the resources and assignments a bit more in the next paragraph). I have not crunched the numbers, but scanning through results from the Jan 2013 course shows that students are answering the questions correctly somewhere in the 65-90% range and the questions used in that course were a mishmash of the Jan 2010 and Jan 2012 courses. Every question would have an “explain your answer” part. These assignments were graded on completion only, but their explanation had to show a reasonable level of effort to earn these completion marks. Depending on class size, I did not always read their explanations in detail, but always scanned every answer. For the first couple of assignments I always made sure to send some feedback to each student which would include an explanation of the correct answer if they answered incorrectly. Each question would also be discussed in class.
A rundown of how the resources and assignments varied by class:
- For Sept 2009 and Jan 2010 I used a Blackboard assignment to give them the three questions each week and told them which sections of the textbook to read, and I didn’t do much to tell them to skip passages or examples that weren’t directly relevant.
- For Sept 2011 and Jan 2012 I used smartPhysics (link to UIUC PER group page, where they were developed). These consist of multimedia presentations for each chapter/major topic, which have embedded conceptual questions (no student explanations required for these). After they are done with the multimedia presentation, they then answer the actual pre-class questions, which are different from those embedded in the multimedia presentation. For the most part, the questions in their pre-class assignments were similar to the ones I was previously using, except that the smartPhysics ones were often more difficult. Additionally, my one major criticism of smartPhysics is that I don’t feel they are pitched at the appropriate level for a student encountering the material for the first time. For more on this, have a look at the second bullet in the “Random Notes” section of this post I did on pre-class assignments (link). One of the very nice things about smartPhysics is that everything (the regular homework, the pre-class assignments and the multimedia presentations) used the same web system.
- For January 2013, I was back on assigning the pre-class assignments through Blackboard. The preamble for each of the pre-class assignments pointed them toward a smartPhysics multimedia presentation and the relevant sections of the textbook we were using. Students could use one, the other or both of these resources as they felt fit. I don’t think I ever surveyed them on their use of one over the other, but anecdotally I had the sense that way more were using the multimedia presentations.
I present two graphs showing the same data from different perspectives. Figure 1 shows how the fraction of the class completing a given pre-class assignment varies over the course of the term. There is a noticeable downward trend in each course. Figure 2 shows the fraction of assignments completed by each student in each class.
There is clearly a large difference between the first two courses and the final three in terms of the rates at which students were completing these pre-class assignments. The fact that I saw 98% of these assignments completed one term is still shocking to me. I’m not sure how much each of the following factors contributed to the changes, but here are some of the potential factors…
- Multimedia presentations – students seem to find these easier to consume than reading the textbook. There is a study [Phys. Rev. ST Physics Ed. Research 8, 010103 (2012)] from Homeyra Sadaghiani at California State Polytechnic University where she did a controlled study comparing the multimedia presentations to readings in textbooks, and used the same pre-class assignments for both. In addition to finding that the multimedia presentation group did slightly better on the exams, she also found that the students had a favorable attitude toward the usefulness of the multimedia presentations, but that the textbook group had an unfavorable attitude toward the textbook reading assignments. But she also mentions that the multimedia group had a more favorable attitude toward clicker questions than the textbook section, and this alone could explain the difference in test performance as opposed to it having to do with the learning which takes place as part of the pre-class assignments. If the students in one section are buying into how the course is being run more than another, they are going to do a better job of engaging with all of the learning opportunities and as a result should be learning more. There are a variety of reasons why reading the textbook may be preferred to having them watch a video or multimedia presentation, but you can’t argue with the participation results.
- Generating buy-in – I have certainly found that, as time wears on, I have gotten better at generating buy-in for the way that my courses are run. I have gotten better at following up on the pre-class assignments in class and weaving the trends from their submissions into the class. However, for the Sep 2009 and Jan 2010 courses, that was the most personal feedback I have ever sent to students in an intro course on their pre-class assignments so I might have expected that getting better at generating buy-in might cancel out the decreased personal feedback.
- Changes in grading system – This may be a very large one and is tied to generating buy-in. For the first two courses I allowed them to drop their worst 3 or 2 pre-class assignments from their overall grade. In the later courses, I changed the system to being one where they could even submit the assignments late for half credit, but were not allowed to drop any. In the latter method I am clearly communicating to the students that I think it is worth their time to complete all of the assignments.
Having poked around through the UIUC papers and those from Sadaghiani, I can say that the 98% completion rate from my Sept 2011 course is really high, but it is going to be an overestimate of how many people actually engaged with the pre-class material as opposed to trying to bluff their way through it. The smartPhysics system also gave them credit for completing the questions embedded in the multimedia presentations and I’m not presenting those numbers here, but when I scan the gradebooks, those that received credit for doing their pre-class assignments also always received credit for completing the embedded questions in the multimedia presentations. But it is possible to skip slides to get to those, so that doesn’t mean they actually fully consumed those presentations. Based on reviewing their explanations each week (with different degrees of thoroughness) and then docking grades accordingly, I would estimate that maybe 1 or 2 students managed to bluff their way through each week without actually consuming the presentation. That translates to 2-3% of these pre-class assignments.
Sadaghiani reported “78% of the MLM students completed 75% or more of the MLMs”, where MLM is what I have been calling the multimedia presentations. Colleagues of mine at UBC found (link to poster) that students self-reported to read their textbooks regularly in a course that used a quiz-based pre-class assignment (meaning that students were given marks for being correct as opposed to just participating, and in this case were not asked to explain their reasoning). 97% of the students actually took the weekly quizzes, but there is a discrepancy in numbers between those that took the quizzes and those that actually did the preparation.
With everything I have discussed here in mind, it seems that 80% or better is a good rule of thumb number for buy-in for pre-class activities, and that one can do even better than that with some additional effort.
This term I eliminated the weekly homework assignment from my calc-based intro physics course and replaced it with a weekly practice quiz (not for marks in any way), meant to help them prepare for their weekly quiz. There’s a post coming discussing why I have done this and how it has worked, but a la Brian or Mylene, I think it can be valuable to post this student feedback.
I asked a couple of clicker questions related to how they use the practice quizzes and how relevant they find the practice quiz questions in preparing them for the real quizzes. I also handed out index cards and asked for extra comments.
Aside from changing from homework assignments to practice quizzes, the structure of my intro course remains largely the same. I get them to do pre-class assignments, we spend most of our class time doing clicker questions and whiteboard activities, and there is a weekly two-stage quiz (individual then group). I have added a single problem (well, closer to an exercise) to each weekly quiz, where in the past I would infrequently ask them to work a problem on a quiz.
Clicker Question 1
Clicker Question 2
Just from a quick scan of the individual student responses on this one, I saw that the students with the highest quiz averages (so far) tended to answer A or B, whereas the students with the lower quiz averages tended to answer B or C. I will look at the correlations more closely at a later date, but I find that this is a really interesting piece of insight.
Additional Written Feedback
Most of the time I ask the students for some feedback after the first month and then continue to ask them about various aspects of the course every couple of weeks. In some courses I don’t do such a great job with the frequency.
Usually, for this first round of feedback, the additional comments are dominated by frustration toward the online homework system (I have used Mastering Physics and smartPhysics), requests/demands for me to do more examples in class, and some comments on there being a disconnect between the weekly homework and the weekly quiz. As you can see below, there is none of that this time. The practice quizzes, the inclusion of a problem on each weekly quiz, and perhaps the provided learning goals, seem to do a pretty good job of communicating my expectations to them (and thus minimize their frustration).
Student comments (that were somewhat on topic)
- I feel like the practice quizzes would be more helpful if I did them more often. I forget that they have been posted so maybe an extra reminder as class ends would help.
- The wording is kind of confusing then I over think things. I think it’s just me though. Defining the terms and the equations that go with each question help but the quizzes are still really confusing…
- Curveball questions are important. Memorize concepts not questions. Changes how students approach studying.
- The group quizzes are awesome for verbalizing processes to others. I like having the opportunity to have “friendly arguments” about question we disagree on
- I love the way you teach your class Joss! The preclass assignments are sometimes annoying, but they do motivate me to come to class prepared
- I enjoy this teaching style. I feel like I am actually learning physics, as opposed to just memorizing how to answer a question (which has been the case in the past).
- I really enjoy the group quiz section. It gets a debate going and makes us really think about the concepts. Therefore making the material stick a lot better.
Last thought: With this kind of student feedback, I like to figure out a couple of things that I can improve or change and bring them back to the class as things I will work on. It looks like I will need to ask them a weekly feedback question which asks them specifically about areas of potential improvement in the course.
One of my brief studies, based on data from a recent introductory calculus-based course, was to look at the effect of immediate feedback in an exam situation. The results show that, after being provided with immediate feedback on their answer to the first of two questions which tested the same concept, students had a statistically significant improvement in performance on the second question.
Although I used immediate feedback for multiple questions on both the term test and final exam in the course, I only set up the experimental conditions discussed below for one question.
The question I used (Figure 1) asked about the sign of the electric potential at two different points. A common student difficulty is to confuse the procedures of finding electric potential (a scalar quantity) and electric field (a vector quantity) for a given charge distribution. The interested reader might wish to read a study by Sayre and Heckler (link to journal, publication page with direct link to pdf).
Experimental design and results
There were three versions of the exam, with one version of this question appearing on two exams (Condition 1, 33 students) and the other version of this question appearing on the third exam (Condition 2, 16 students). For each condition, they were asked to answer the first question (Q1), using an IFAT scratch card for one of the points (Condition 1 = point A; Condition 2 = point B). With the scratch cards, they scratch their chosen answer and if they chose correctly they will see a star. If they were incorrect, they could choose a different answer and if they were correct on their second try, they received half the points. If they had to scratch a third time to find the correct answer, they received no marks. No matter how they did on the first question, they will have learned the correct answer to that question before moving on to the second question, which asked for the potential at the other point (Cond1 = point B; Cond2 = point A). The results for each condition and question are shown in Table 1.
| | Q1 (scratch card question) | Q2 (follow-up question) |
| --- | --- | --- |
| Condition 1 | Point A: 24/33 correct = 72.7±7.8% | Point B: 28/33 correct = 84.8±6.2% |
| Condition 2 | Point B: 8/16 correct = 50.0±12.5% | Point A: 10/16 correct = 62.5±12.1% |
Table 1: Results are shown for each of the conditions. In condition 1, they answered the question for point A and received feedback, using the IFAT scratch card, before moving on to answer the question for point B. In condition 2, they first answered the question for point B using the scratch card and then moved on to answering the question for point A.
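The uncertainties quoted in Table 1 are consistent with a binomial standard error, sqrt(p(1-p)/n). The post does not say how they were computed, so treat that formula as an assumption; the function name `proportion_with_se` is purely illustrative. A minimal sketch that reproduces the table entries from the raw counts:

```python
import math


def proportion_with_se(correct, total):
    """Return (percent correct, binomial standard error in percent).

    Assumes the quoted uncertainty is sqrt(p * (1 - p) / n); this is an
    assumption that happens to match Table 1, not a stated method.
    """
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * se


# Counts taken directly from Table 1.
table_1 = {
    ("Condition 1", "Q1, point A"): (24, 33),
    ("Condition 1", "Q2, point B"): (28, 33),
    ("Condition 2", "Q1, point B"): (8, 16),
    ("Condition 2", "Q2, point A"): (10, 16),
}

for (condition, question), (correct, total) in table_1.items():
    pct, se = proportion_with_se(correct, total)
    print(f"{condition}, {question}: {correct}/{total} = {pct:.1f} ± {se:.1f}%")
```

With only 16 students in Condition 2, the standard errors there are roughly twice as large as in Condition 1, which is worth keeping in mind when comparing the two rows.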
So that I can look at the improvement from all students when going from the scratch card question (Q1) to the follow-up question (Q2), I need to show that there is no statistically significant difference between how the students answered the question for point A and point B. Figure 2 shows that a two-tailed repeated-measures t-test fails to reject the null hypothesis that the mean performance for point A and point B is the same. Thus we have no evidence that these questions are different, which means we can move on to comparing how the students performed on the follow-up question (Q2) as compared to the scratch card question (Q1).
Figure 3 shows a 12.2% improvement from the scratch card question (Q1) to the follow-up question (Q2). Using a one-tailed repeated-measures t-test (it was assumed that performance on Q2 would be better than Q1), the null hypothesis is rejected at a level of p = 0.0064. Since I have made two comparisons using these same data, a Bonferroni correction should be applied. With this correction the significance threshold becomes 0.05/2 = 0.025, and since p = 0.0064 is below that threshold, the improvement from Q1 to Q2 was statistically significant.
In addition to reproducing these results using multiple questions, I would also like to examine if these results hold true for some different conditions. Additional factors which could be examined include different disciplines, upper-division vs. introductory courses and questions which target different levels of Bloom’s taxonomy.
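The arithmetic behind these numbers can be checked from Table 1. The t-test itself needs the per-student responses, which are not given here, so this sketch only recomputes the pooled improvement (in percentage points) and compares the reported p-value against the Bonferroni-corrected threshold. The variable names are illustrative.

```python
# Counts from Table 1, pooled over both conditions.
q1_correct = 24 + 8     # scratch card question, Conditions 1 and 2
q2_correct = 28 + 10    # follow-up question, Conditions 1 and 2
n_students = 33 + 16

q1_pct = 100 * q1_correct / n_students
q2_pct = 100 * q2_correct / n_students
improvement = q2_pct - q1_pct  # difference in percentage points

# Bonferroni correction: divide the significance level by the number
# of comparisons made on the same data.
alpha = 0.05
n_comparisons = 2
bonferroni_alpha = alpha / n_comparisons

reported_p = 0.0064  # one-tailed p-value reported in the text
significant = reported_p < bonferroni_alpha

print(f"Q1: {q1_pct:.1f}%, Q2: {q2_pct:.1f}%, improvement: {improvement:.1f} points")
print(f"Corrected threshold: {bonferroni_alpha}, significant: {significant}")
```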
Note: I found a paper that looks at the effect of feedback on follow-up questions as part of exam preparation and discuss it in more detail in this follow-up post.
Let me start off by saying that, as a student, I found oral exams to be very intimidating and frustrating. I could see their value as assessment tools, but found that in practice they were simply a source of personal dread. Enter 2012 where I am using oral assessments with my own students, but what I have done is try to minimize what I found intimidating and frustrating about oral exams. I have made my oral assessments kinder and gentler.
The strengths of oral assessments
In my opinion, the strengths of oral assessments are a result of their interactive nature.
If a student is stuck on a minor point, or even a major one, you can give them a hint or use some leading questions to help them along. Compare this to what happens if a student gets stuck on a written exam question and you can see how the oral assessment provides you with a much better assessment of student understanding than an incomplete or nonsensical written response.
Another strength is that no ambiguity need be left unturned. If some sort of ambiguous statement comes out of a student’s mouth, you can ask them to clarify or expand on what they have said instead of dealing with the common grader’s dilemma of sitting in front of a written response trying to make judgement calls related to ambiguous student work.
Some other benefits are that marking is a breeze (I will discuss my specific marking scheme later) and I have also found that I can generate “good” oral exam questions much more quickly than I can written ones.
My perception of the weaknesses of traditional oral assessments
The following are common, but not universal, characteristics of oral assessments.
Public – Looking dumb in front of me may not be fun, but it is far more comfortable than looking dumb in front of a room full of your peers or discipline experts. Having spent some time on both sides of the desk, I don’t feel that my students ever “look dumb”, but as a student I remember feeling dumb on many occasions (here I will also include comprehensive exams, dissertation defences and question periods after oral presentations in my definition of oral assessments). I guess I’m saying that it feels worse than it looks, but doing it in public makes it feel even worse.
A lack of time to think – This is actually my biggest beef with oral assessments. In a written assessment you can read the question, collect your thoughts, brain-storm, make some mistakes, try multiple paths, and then finally try to put together a cohesive answer. I realize that you can do all these things in an oral assessment as well, but there is a certain time pressure which hangs over your head during an oral assessment. And there is a difference between privately pursuing different paths before coming to a desired one and having people scrutinize your every step while you do this.
Inauthentic – By inauthentic, I mean that oral exams (and for the most part, written ones too) isolate you from resources and come with some sort of urgent time pressure. If we are confronted with a challenging problem or question in the real world, we usually have access to the internet, textbooks, journals and even experts. We are able to use those resources to help build or clarify our understanding before having to present our solution. On the flip side, we can also consider the question period after a presentation as a real-world assessment and we are usually expected to have answers at our fingertips without consulting any resources so arguments can be made for and against the authenticity of an oral assessment.
Context (Advanced Lab)
Before I break down my kinder, gentler oral exams, I want to discuss the course in which I was using them. This course was my Advanced Lab (see an earlier post) where students work in pairs on roughly month-long experimental physics projects. One student is asked to be in charge of writing about the background and theory and the other the experimental details, and then on the second project they switch. For their oral assessments I used the same set of questions for both partners, but the actual questions (see below) were very project-specific. My hope was that using the same questions for both partners would force them to pay much closer attention to what the other had written.
It took at most a total of 2 hours to come up with the 6 sets of questions (12 students total in the course) and then 9 hours of actual oral exams which comes out to less than an hour per student. I would say that this is roughly equivalent to the time I would have spent creating and marking that many different written exams, but this was much more pleasant for me than all that marking.
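The time estimate can be checked with a quick back-of-the-envelope calculation. This sketch assumes the 9 hours of exams corresponds to twelve of the 45-minute slots described later in this post; the variable names are just for illustration.

```python
students = 12
slot_minutes = 45            # booked time slot per student
question_writing_hours = 2   # upper bound for writing the 6 sets of questions

# Total time spent in the oral exams themselves.
exam_hours = students * slot_minutes / 60

# Instructor time per student, including question writing.
total_hours = question_writing_hours + exam_hours
minutes_per_student = total_hours * 60 / students

print(f"{exam_hours} h of exams, {total_hours} h total, "
      f"{minutes_per_student} min per student")
```

Under those assumptions the total is 11 hours, or 55 minutes per student.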
Kinder, gentler oral exams
I will describe the format that I use and then highlight some of the key changes that I made to improve on what I perceive to be the weaknesses of traditional oral exams.
I book a 45-minute time slot for each student and they come to my office one at a time. When they show up in my office I have 3 questions for them. They have 10 minutes to gather their thoughts and use whatever resources they brought (including using the internet, but not consulting with somebody) to help formulate some coherent answers. I also give them a nice big whiteboard to use as they see fit. Once their 10 minutes are up (it is not uncommon for them to take a couple of extra minutes if they want that little bit of extra time), they are asked to answer the questions in whatever order they please. For each question I try, not always successfully, to let them get their answer out before I start asking clarification, leading or follow-up questions. If they are on the completely wrong track or get stuck I will step in much earlier. If the leading questions do not help them get to the correct answer, we will discuss the question on the spot until I feel like the student “gets” the answer. Sometimes these discussions immediately follow the question and sometimes I wait until after they have had a chance to answer all three questions. After they have answered all three questions and we have discussed the correct answers, I pull out the rubric (see below) and we try to come to consensus on their grade for each question. They leave my office with a grade and knowledge of the correct answer to all three questions.
The key changes:
- Private – I have them come to my office and do the assessment one-on-one instead of in front of the whole class.
- 10 minutes to collect their thoughts and consult resources – It is similar to the perceived safety blanket offered by an open book exam. Students that were well-prepared rarely used the entire time and students that were not well-prepared tried to cram but did not do very well since I would always ask some clarification or follow-up questions. I have some post-course feedback interviews planned to learn more about the student perspective on this, but my perception is that the preparation time was helpful, even for the well-prepared students. It gave them a chance to build some confidence in their answers and I often delighted in how well they were able to answer their questions. I think that time also offered an opportunity to get some minor details straight, which is beneficial in terms of confidence building and improving the quality of their answers. And finally, knowing that they had that 10 minutes of live prep time seemed to reduce their pre-test stress.
- Immediate feedback – Discussing the correct answer with the student immediately after they have answered a question is a potential confidence killer. I suspect that the students would prefer to wait until after they have answered all the questions before discussing the correct answers, and I am interested to see what I will learn in my feedback interviews.
- Grading done as collaborative process with the student – In practice I would usually suggest a grade for a question and mention some examples from their answer (including how much help they needed from me) and then ask them if they thought that was fair. If they felt they should have earned a higher grade, they were invited to give examples of how their answer fell in the higher rubric category and there were many occasions where those students received higher grades. However, the problem is that this is a squeaky wheel situation and it is hard to figure out if it is entirely fair to all students. For cases where I asked students to tell me what grade they thought they earned before saying anything myself, students were far more likely to self-assess lower than I would have assessed them than to self-assess higher than I would have assessed them.
The grading rubric used was as follows:
| Grade | Level | Description |
|---|---|---|
| 100% | Greatly exceeds expectations | The student displayed an understanding which went far beyond the scope of the question. |
| 90% | Exceeds expectations | Everything was explained correctly without leading questions. |
| 75% | Meets expectations | The major points were all explained correctly, but some leading questions were needed to help get there. There may have been a minor point which was not explained correctly. |
| 60% | Approaching expectations | There was a major point or many minor points which were not explained correctly. The student was able to communicate an overall understanding which is correct. |
| 45% | Below expectations | Some of the major points were explained correctly, but the overall explanation was mostly incorrect. |
| 30% | Far below expectations | Some of the minor points were explained correctly, but the overall explanation was mostly incorrect. |
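For anyone who wants to record grades with this rubric, it can be written as a simple lookup table. This is only a sketch: the names `RUBRIC` and `overall_grade` are hypothetical, and averaging the three question grades into one assessment grade is my assumption for illustration, not something the rubric itself specifies.

```python
# Rubric levels and their percentage grades, as listed in the rubric above.
RUBRIC = {
    "Greatly exceeds expectations": 100,
    "Exceeds expectations": 90,
    "Meets expectations": 75,
    "Approaching expectations": 60,
    "Below expectations": 45,
    "Far below expectations": 30,
}


def overall_grade(levels):
    """Average the percentages for the agreed-upon level of each question.

    Averaging is an assumption made for this example.
    """
    return sum(RUBRIC[level] for level in levels) / len(levels)


# Hypothetical student: one "exceeds" and two "meets" across the 3 questions.
example = overall_grade([
    "Exceeds expectations",
    "Meets expectations",
    "Meets expectations",
])
print(example)  # 80.0
```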
Some example questions
- I would pull a figure from their lab report and ask them to explain the underlying physics or experimental details that led to a specific detail in the figure.
- “Run me through the physics of how you were able to get a current into the superconducting loop. Why did you have to have the magnet in place before the superconducting transition?”
- “Describe the physics behind how the Hall sensor gave a voltage output which is proportional (when zeroed) to the external field. How do the external magnetic field and the Hall sensor need to be oriented with respect to each other?”
- “Explain superconductivity to me in a way which a student, just finishing up first-year science, would understand.”
Electron-spin resonance experiment
- “Discuss how the relative alignment between your experiment and the Earth’s magnetic field might affect your results.”
- “In what ways did your detector resolution not agree with what was expected according to the lab manual? What are some reasonable steps that you could take to TRY to improve this agreement?”
Some other directions to take oral assessments
A couple of my blogger buddies have also been writing about using oral assessments and I really like what they are up to as well.
Andy Rundquist has written quite a bit about oral assessments (one example) because they are quite central to his Standards-Based Grading implementation. One of the things that he has been doing lately is giving a student a question ahead of time and asking them to prepare a page-length solution to the question to bring to class. In class the student projects their solution via doc-cam, Andy studies it a bit, and then he starts asking the student questions. To my mind this is most similar to the question period after a presentation. The student has had some time, in isolation, to put together the pieces to answer the question, and the questions are used to see how well they understood all the pieces required to put together the solution. Another thing that Andy does is get the whole class to publicly participate in determining the student’s overall grade on that assessment. I love that idea, but feel like I have some work to do in terms of creating an appropriate classroom environment to do that.
Bret Benesh wrote a couple of posts (1, 2) discussing his use of oral exams. His format is closer to mine than it is to Andy’s, but Bret’s experience was that even if they knew the exam question ahead of time, he could easily tell the difference between students that understood their answers and those that did not. I really want to try giving them the questions ahead of time now.
One final note
I am giving a short AAPT talk on my kinder, gentler oral exams, so any feedback that will help with my presentation will be greatly appreciated. Are there certain points which were not emphasized, but should have been?