Workshop slides for Gilley & Ives “Two Stage Exams: Learning Together?”

Today Brett Gilley and I ran a workshop “Two Stage Exams: Learning Together?” at the University of British Columbia Okanagan Learning Conference. Great fun was had and many ideas exchanged. Slides below.

Gilley Ives – Two Stage Exam Workshop – UBCO Learning Conference – 2017-05-04


Learning Catalytics workflow

Disclosure: my colleague, Georg Rieger, and I are currently in the process of securing post-doc funding to evaluate the effectiveness of Learning Catalytics and that position would be paid in part by Pearson, who owns Learning Catalytics.


A whole lotta devices!

I have been using Learning Catalytics, web-based “clickers on steroids” software, in a lecture course and a lab course since the start of September. In this post I want to focus on the logistical side of working with Learning Catalytics in comparison to clickers, and just touch briefly on the pedagogical benefits.

I will briefly summarize my overall pros and cons of using Learning Catalytics before diving into the logistical details:

  • Pro: Learning Catalytics enables a lot of types of questions that are not practical to implement using clickers. We have used word clouds, drawing questions (FBDs and directions mostly), numerical response, choose-all-that-apply, and ranking questions. Although all of these question types, aside from the word clouds, can be implemented as multiple-choice if you have or can come up with good distractors, Learning Catalytics lets you collect the students’ actual answers instead of having them pick their best guess from a selection of answers that you give them.
  • Con: Learning Catalytics is clunky. The bulk of this post will discuss these issues, but Learning Catalytics has larger hardware requirements, relies heavily on good wifi and website performance, is more fiddly to run as an instructor, and is less time-efficient than just using clickers (in the same way that using clickers is less time-efficient than using coloured cards).
  • Pro: The Learning Catalytics group tool engages reluctant participants in a way that no amount of buy-in or running around the classroom trying to get students to talk to each other seems to be able to do. When you do the group revote portion of Peer Instruction (the “turn to your neighbor” part), Learning Catalytics tells the students exactly who to talk to (talk to Jane Doe, sitting to your right) and matches them up with somebody that answered differently than they did. Although this should not be any different than instructing them to “find somebody nearby that answered differently than you did and convince them that you are correct,” there ends up being a huge difference in practice in how quickly they start these discussions and what fraction of the room seems to engage in them.

Honestly, the first two points alone would tip me slightly toward clickers, but the difference in the level of engagement thanks to the group tool is the thing that has sold me on Learning Catalytics. On to the logistical details.

 

Hardware setup

As you can see from the picture, I use a lot of devices when I teach with Learning Catalytics. You can get away with fewer devices, but this is the solution that meets my needs. I have tried some different iterations, and what I describe here is the one that I have settled on.

  • In the middle you will find my laptop, which runs my main slide deck and is permanently projected on one of the two screens in the room. It has a Bamboo writing tablet attached to it so that I can mark up slides live, and it will likely be replaced by a Surface Pro 3 in the very near future.
  • At the bottom is my tablet (iPad), which I use to run the instructor version of Learning Catalytics. This is where I start and stop polls, choose when and how to display results to the students, and perform other such instructorly tasks. This screen is never shared with the students and is analogous to the instructor remote with the little receiver+display box that I use with iclickers. Since it accesses Learning Catalytics over wifi and is not projected anywhere, I can wander around the room with it in my hand and monitor the polls while talking to students. Very handy! I have also tried to do this from my smartphone when my tablet battery was dead, but the instructor UI is nowhere near as good on a smartphone as it is on larger tablets or regular web browsers.
  • At the top is a built-in PC which I use to run the student version of Learning Catalytics. This displays the Learning Catalytics content that students are seeing on their devices at any moment. I want to have this projected for two reasons. First, I like to stomp around and point at things when I am teaching so I want the question currently being discussed or result currently being displayed to be something that I can point at and focus their attention on instead of it just being on the screens of their devices. Second, I need the feedback of what the students see at any moment to make sure that the question or result that I intended to push to their devices has actually been pushed to their devices. For the second point, it is reasonable to flip back and forth between instructor and student view on the device running Learning Catalytics (this is what one of my colleagues does successfully), but I find that a bit clunky and it still doesn’t meet my stomping around and pointing at stuff need. The instructor version of Learning Catalytics pops up a student view and this is what I use here (so technically I am logged in as an instructor on two devices at once). The student view that pops up with the instructor version makes better use of the projected screen real estate (e.g., results are shown along the side instead of at the bottom) than the student version that one gets when logging in using a student account.

 

Other logistics

The trade-off when going from clickers to Learning Catalytics is that you gain a bunch of additional functionality, but in order to do so you need to take on a somewhat clunky and less time-efficient system. There are additional issues that may not be obvious from just the hardware setup described above.

  • I am using 3 computer-type devices instead of a computer and clicker base. Launching Learning Catalytics on a device takes only a bit longer than plugging in my iclicker base and starting the session, but this is still one or two more devices to get going (again, my choice and preference to have this student view). Given the small amount of time that we typically have between gaining access to a room and the time at which we start a class, each extra step in this process introduces another possible delay in starting class on time. With 10 minutes, I find I am often cutting it very close and sometimes not ready quite on time. In two of the approximately twelve lectures where I intended to use Learning Catalytics this term, there was a wifi or Learning Catalytics website problem. Once I just switched to clickers (the students have them for their next course) and the other time the problem resolved quickly enough that it only cost us a bit of time. When I remember to do so, I can save myself a bit of time by starting the session on my tablet before I leave my office.
  • The workflow of running a Learning Catalytics question is very similar to running a clicker question, but after six weeks of using Learning Catalytics, clickers feel like they have a decent-sized advantage in the “it just works” category. There are many more choices with the Learning Catalytics software, and with that a loss of simplicity. Since I did have the experience a few weeks ago of using clickers instead of Learning Catalytics, I can say that the “it just works” aspect of the clickers was reinforced.
  • Overall, running a typical Learning Catalytics question feels less time-efficient than a clicker question. It takes slightly longer to start the question, for the students to answer, and then to display the results. This is amplified slightly because many of the questions we are using require the students to have more complicated interactions with the question than just picking one of five answers. All that being said, my lecture TA and I noted last week that it felt like we finally got to a point where running a multiple-choice question in Learning Catalytics takes a similar amount of time from beginning to end as it does with clickers. To get to this point, I have had to push the pace quite a bit with these questions, starting my “closing the poll” countdown when barely more than half of the answers are in. So I think I can run multiple-choice questions with similar efficiency on both systems now, but I am having to actively force the timing in the case of Learning Catalytics. However, having to force the timing may be a characteristic of the students in the course more than the platform.
  • Batteries! Use of Learning Catalytics demands that everybody has a sufficiently charged device or ability to plug their device in, including the instructor. This seems a bit problematic if students are taking multiple courses using the system in rooms where charging is not convenient.
  • Preparing for class also has additional overhead. We have been preparing the lecture slides in the same way as usual and then porting any questions we are using from the slides into Learning Catalytics. This process is fairly quick, but still adds time to the course preparation process. Where it becomes a bit annoying is that sometimes the slide and Learning Catalytics versions of a question aren’t identical due to a typo or a modification that was made on one platform but accidentally not on the other. There haven’t been a ton of these, but it is one more piece that makes using Learning Catalytics a bit clunky.
  • In its current incarnation, it seems like one could use Learning Catalytics to deliver all the slides for a course, not just the questions. This would be non-ideal for me because I like to ink up my slides while I am teaching, but this would allow one to get rid of the need for a device that was projecting the normal slide deck.

 

In closing

An instructor needs to be willing to take on a lot of overhead, inside the class and out, if they want to use Learning Catalytics. For courses where many of the students are reluctant to engage enthusiastically with the peer discussion part of the Peer Instruction cycle, the group tool functionality can make a large improvement in that level of engagement. The additional question types are nice to have, but feel like they are not the make or break feature of the system.


AAPTSM14/PERC2014: Measuring the Learning from Two-Stage Collaborative Group Exams

In an attempt to get back into semi-regular blogging, I am setting aside my pile of half-written posts and am going to share the work that I presented at the 2014 AAPT Summer Meeting and PERC 2014.

  • Poster (link)
  • PERC paper (link to first submission on arXiv)

The quick and dirty version is that I was able to run a study, looking at the effectiveness of group exams, in a three-section course which enrolled nearly 800 students. The image below summarizes the study design, which was repeated for each of the two midterms.

 

The study design...

(First row) The students all take/sit/write their exams individually. (Second row) After all the individual exams have been collected, they self-organize into collaborative groups of 3-4. There were three different versions of the group exam (conditions A, B and C), each having a different subset of the questions from the individual exam. It was designed so that, for each question (1-6), 1/3 of the groups would not see that question on their group exam (control) and the other 2/3rds would (treatment). (Bottom row) They were retested on the end-of-term diagnostic using questions which matched, in terms of application of concept, the original 6 questions from the midterm. 
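To make the balancing of conditions concrete, here is a minimal sketch of how the three group-exam versions can be laid out so that each question is left off exactly one version. This is my own hypothetical layout, not the actual exam generator, and the particular split of omitted questions is assumed:

```python
# Hypothetical layout of the three group-exam versions: each of the six
# individual-exam questions is omitted from exactly one version, so for every
# question roughly 1/3 of the groups serve as control.
questions = [1, 2, 3, 4, 5, 6]
omitted = {"A": {1, 2}, "B": {3, 4}, "C": {5, 6}}  # assumed split of omissions

versions = {cond: [q for q in questions if q not in skip]
            for cond, skip in omitted.items()}

for q in questions:
    control = [c for c, skip in omitted.items() if q in skip]
    treatment = [c for c in omitted if q not in omitted[c]]
    print(f"Q{q}: treatment versions = {treatment}, control version = {control}")

print(versions)  # the question list appearing on each version
```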

Results: As previously mentioned, I went through this cycle for each of the two midterms. For midterm 2, which took place only 1-2 weeks prior to the end-of-term diagnostic, students that saw a given question on their group exam outperformed those that did not on the matched questions from the end-of-term diagnostic. Huzzah! However, for midterm 1, which took place 6-7 weeks prior to the end-of-term diagnostic, there were no statistically significant differences on the matched diagnostic questions between those that saw the corresponding questions on their group exams and those that did not. The most likely explanation is that the learning from both control and treatment decays over time, and thus so does the difference between these groups: after 1-2 weeks there is still a statistically significant difference, but after 6-7 weeks there is not. It could also be due to differences between the questions associated with midterm 1 and midterm 2. For some of the questions, it is possible that the concepts were not well enough separated between questions, so group exam discussions may have helped students improve their learning of concepts even for questions that weren’t on their particular group exam. I hope to address these possibilities in a study this upcoming academic year.

 

Will I abandon this pedagogy? Nope. The group exams may not provide a measurable learning effect which lasts all the way to the end of the term for early topics, but I am more than fine with that. There is a short-term learning effect and the affective benefits of the group exams are extremely important:

  • One of the big ones is that these group exams are effectively Peer Instruction in an exam situation. Since we use Peer Instruction in this course, this means that the assessment and the lecture generate buy-in for each other.
  • There are a number of affective benefits, such as increased motivation to study, increased enjoyment of the class, and lower failure rates, which have been shown in previous studies (see the arXiv link to my PERC paper for more on this). Despite my study design, which had the students encountering different subsets of the original questions on their group exams, all students participated in the same intervention from the perspective of the affective benefits.

 

I had some great conversations with old friends, new friends and colleagues. I hope to expand on some of the above based on these conversations and feedback from the referees on the PERC paper, but that will be for another post.


Pre-class homework completion rates in my first-year courses

In my mind it is hard to get students to do pre-class homework (“pre-reading”) with much more than an 80% completion rate when averaged over the term. It usually starts higher than this, but there is a slow trend toward less completion as the term wears on. After taking a more careful look at the five introductory courses in which I used pre-class assignments, I have discovered that I was able to do much better than 80% in some of the later courses and want to share my data.

Descriptions of the five courses

The table below summarizes some of the key differences between each of the five introductory physics courses in which I used pre-class assignments. It may also be important to note that the majority of the students in Jan 2010 were the same students from Sep 2009, but not much more than half of the Jan 2012 students took my Sep 2011 course. For Jan 2013, only two of the students had previously taken a course with me.

| Course | Textbook | Contribution to overall course grade | Median completion rate (1st and 3rd quartiles in brackets) |
| --- | --- | --- | --- |
| Sep 2009 (Mechanics) | Young & Freedman – University Physics 11e | Worth 8%, but drop 3 worst assignments. No opportunities for late submission or earning back lost marks. | 0.73 (0.62, 0.79) |
| Jan 2010 (E&M) | Young & Freedman – University Physics 11e | Worth 10%, but drop 2 worst assignments. No opportunities for late submission or earning back lost marks. | 0.78 (0.74, 0.89) |
| Sep 2011 (Mechanics) | smartPhysics | Worth 8%. Did not drop any assignments, but they could (re)submit at any point up until the final exam and earn half marks. | 0.98 (0.96, 0.98) |
| Jan 2012 (E&M) | smartPhysics | Worth 8%. Did not drop any assignments, but they could (re)submit at any point up until the final exam and earn half marks. | 0.94 (0.93, 0.98) |
| Jan 2013 (E&M) | Halliday, Resnick & Walker – Fundamentals of Physics 9e & smartPhysics multimedia presentations | Worth 10%. Did not drop any assignments, but they could (re)submit at any point up until the final exam and earn half marks. | 0.93 (0.87, 0.97) |

Overall the style of question used was the same for each course, with the most common type of question being a fairly straightforward clicker question (I discuss the resources and assignments a bit more in the next paragraph). I have not crunched the numbers, but scanning through results from the Jan 2013 course shows that students were answering the questions correctly somewhere in the 65-90% range; the questions used in that course were a mishmash of those from the Jan 2010 and Jan 2012 courses. Every question had an “explain your answer” part. These assignments were graded on completion only, but the explanation had to show a reasonable level of effort to earn the completion marks. Depending on class size, I did not always read their explanations in detail, but I always scanned every answer. For the first couple of assignments I always made sure to send some feedback to each student, which would include an explanation of the correct answer if they answered incorrectly. Each question would also be discussed in class.

A rundown of how the resources and assignments varied by class:

  • For Sept 2009 and Jan 2010 I used a Blackboard assignment to give them the three questions each week and told them which sections of the textbook to read, and I didn’t do much to tell them to skip passages or examples that weren’t directly relevant.
  • For Sept 2011 and Jan 2012 I used smartPhysics (link to UIUC PER group page, where they were developed). These consist of multimedia presentations for each chapter/major topic, which have embedded conceptual questions (no student explanations required for these). After they finish the multimedia presentation, students then answer the actual pre-class questions, which are different from those embedded in the multimedia presentation. For the most part, the questions in these pre-class assignments were similar to the ones I was previously using, except that the smartPhysics ones were often more difficult. Additionally, my one major criticism of smartPhysics is that I don’t feel the questions are pitched at the appropriate level for a student encountering the material for the first time. For more on this, have a look at the second bullet in the “Random Notes” section of this post I did on pre-class assignments (link). One of the very nice things about smartPhysics is that everything (the regular homework, the pre-class assignments and the multimedia presentations) uses the same web system.
  • For January 2013, I was back to assigning the pre-class assignments through Blackboard. The preamble for each of the pre-class assignments pointed them toward a smartPhysics multimedia presentation and the relevant sections of the textbook we were using. Students could use one, the other or both of these resources as they saw fit. I don’t think I ever surveyed them on their use of one over the other, but anecdotally I had the sense that far more were using the multimedia presentations.

The data

I present two graphs showing the same data from different perspectives. Figure 1 shows how the fraction of the class completing a given pre-class assignment varies over the course of the term. There is a noticeable downward trend in each course. Figure 2 shows the fraction of assignments completed by each student in each class.


Figure 1. For the first five graphs, the x-axis represents time going from the start of the course (left side) to the end of the course (right side). The box and whiskers plot compares the five courses according to the previously used colours, where the line in the boxes shows the median and the boxes show the 1st and 3rd quartiles. The whiskers are the matplotlib default; they extend to the most extreme data point within the 1.5*(75%-25%) range.


Figure 2. Histograms showing the fraction of assignments completed by each student. An ambiguous-looking artifact appears in the 0 bin for Sept 2011, but all students in that course completed 70% or more of the pre-class assignments.
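For anybody wanting to make similar figures, here is a minimal matplotlib sketch of the box-and-whisker comparison described in the Figure 1 caption. The completion fractions below are illustrative placeholders only (loosely consistent with the medians in the table above), not the actual data, and the whiskers are left at the matplotlib default of 1.5×(Q3−Q1):

```python
import matplotlib.pyplot as plt

# Placeholder per-assignment completion fractions for each course
# (illustrative values only -- not the data behind Figure 1).
completion = {
    "Sep 2009": [0.79, 0.75, 0.73, 0.70, 0.62],
    "Jan 2010": [0.89, 0.80, 0.78, 0.75, 0.74],
    "Sep 2011": [0.99, 0.98, 0.98, 0.97, 0.96],
    "Jan 2012": [0.98, 0.96, 0.94, 0.93, 0.93],
    "Jan 2013": [0.97, 0.95, 0.93, 0.90, 0.87],
}

fig, ax = plt.subplots()
ax.boxplot(list(completion.values()), whis=1.5)  # whiskers at 1.5*IQR (matplotlib default)
ax.set_xticklabels(list(completion.keys()))
ax.set_ylabel("Fraction of class completing each pre-class assignment")
plt.show()
```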

Discussion

There is clearly a large difference between the first two courses and the final three in terms of the rates at which students were completing these pre-class assignments. The fact that I saw 98% of these assignments completed one term is still shocking to me. I’m not sure how much each of the following factors contributed to the changes, but here are some of the potential factors…

  1. Multimedia presentations – students seem to find these easier to consume than reading the textbook. There is a controlled study [Phys. Rev. ST Physics Ed. Research 8, 010103 (2012)] by Homeyra Sadaghiani at California State Polytechnic University comparing the multimedia presentations to textbook readings, using the same pre-class assignments for both. In addition to finding that the multimedia presentation group did slightly better on the exams, she also found that the students had a favorable attitude toward the usefulness of the multimedia presentations, whereas the textbook group had an unfavorable attitude toward the textbook reading assignments. But she also mentions that the multimedia group had a more favorable attitude toward clicker questions than the textbook section, and this alone could explain the difference in test performance, as opposed to it having to do with the learning which takes place as part of the pre-class assignments. If the students in one section are buying into how the course is being run more than another, they are going to do a better job of engaging with all of the learning opportunities and as a result should be learning more. There are a variety of reasons why having them read the textbook may be preferable to having them watch a video or multimedia presentation, but you can’t argue with the participation results.
  2. Generating buy-in – I have certainly found that, as time wears on, I have gotten better at generating buy-in for the way that my courses are run. I have gotten better at following up on the pre-class assignments in class and weaving the trends from their submissions into the class. However, the Sep 2009 and Jan 2010 courses involved the most personal feedback on pre-class assignments that I have ever sent to students in an intro course, so it may be that getting better at generating buy-in simply cancelled out the decrease in personal feedback.
  3. Changes in grading system – This may be a very large one and is tied to generating buy-in. For the first two courses I allowed them to drop their worst 3 or 2 pre-class assignments from their overall grade. In the later courses, I changed to a system where they could submit the assignments late for half credit but were not allowed to drop any. With the latter method I am clearly communicating to the students that I think it is worth their time to complete all of the assignments.

Poking around through the UIUC papers and those from Sadaghiani suggests that the 98% completion rate from my Sept 2011 course is really high, but it is going to be an overestimate of how many people actually engaged with the pre-class material as opposed to trying to bluff their way through it. The smartPhysics system also gave them credit for completing the questions embedded in the multimedia presentations. I’m not presenting those numbers here, but when I scan the gradebooks, those that received credit for doing their pre-class assignments also always received credit for completing the embedded questions in the multimedia presentations. That said, it is possible to skip slides to get to those questions, so this doesn’t mean they actually fully consumed the presentations. Based on reviewing their explanations each week (with varying degrees of thoroughness) and then docking grades accordingly, I would estimate that maybe 1 or 2 students managed to bluff their way through each week without actually consuming the presentation. That translates to 2-3% of these pre-class assignments.

Sadaghiani reported that “78% of the MLM students completed 75% or more of the MLMs”, where MLM is what I have been calling the multimedia presentations. Colleagues of mine at UBC found (link to poster) that students self-reported reading their textbooks regularly in a course that used a quiz-based pre-class assignment (meaning that students were given marks for being correct as opposed to just participating, and in this case were not asked to explain their reasoning). 97% of the students actually took the weekly quizzes, but there is a discrepancy between the number that took the quizzes and the number that actually did the preparation.

With everything I have discussed here in mind, it seems that 80% or better is a good rule of thumb number for buy-in for pre-class activities, and that one can do even better than that with some additional effort.


Student feedback on having weekly practice quizzes (instead of homework)

This term I eliminated the weekly homework assignment from my calc-based intro physics course and replaced it with a weekly practice quiz (not for marks in any way), meant to help them prepare for their weekly quiz. There’s a post coming discussing why I have done this and how it has worked, but a la Brian or Mylene, I think it can be valuable to post this student feedback.

I asked a couple of clicker questions related to how they use the practice quizzes and how relevant they find the practice quiz questions in preparing them for the real quizzes. I also handed out index cards and asked for extra comments.

Aside from changing from homework assignments to practice quizzes, the structure of my intro course remains largely the same. I get them to do pre-class assignments, we spend most of our class time doing clicker questions and whiteboard activities, and there is a weekly two-stage quiz (individual then group). I have added a single problem (well, closer to an exercise) to each weekly quiz, where in the past I would infrequently ask them to work a problem on a quiz.

Clicker Question 1


 

Clicker Question 2

Just from a quick scan of the individual student responses on this one, I saw that the students with the highest quiz averages (so far) tended to answer A or B, while the students with the lower quiz averages tended to answer B or C. I will look at the correlations more closely at a later date, but I find this a really interesting piece of insight.


 

Additional Written Feedback

Most of the time I ask the students for some feedback after the first month and then continue to ask them about various aspects of the course every couple of weeks. In some courses I don’t do such a great job with the frequency.

Usually, for this first round of feedback, the additional comments are dominated by frustration toward the online homework system (I have used Mastering Physics and smartPhysics), requests/demands for me to do more examples in class, and some comments on there being a disconnect between the weekly homework and the weekly quiz. As you can see below, there is none of that this time. The practice quizzes, the inclusion of a problem on each weekly quiz, and perhaps the provided learning goals, seem to do a pretty good job of communicating my expectations to them (and thus minimize their frustration).

Student comments (that were somewhat on topic)

  • I feel like the practice quizzes would be more helpful if I did them more often. I forget that they have been posted so maybe an extra reminder as class ends would help.
  • The wording is kind of confusing then I over think things. I think it’s just me though. Defining the terms and the equations that go with each question help but the quizzes are still really confusing…
  • Curveball questions are important. Memorize concepts not questions. Changes how students approach studying.
  • The group quizzes are awesome for verbalizing processes to others. I like having the opportunity to have “friendly arguments” about question we disagree on
  • I love the way you teach your class Joss! The preclass assignments are sometimes annoying, but they do motivate me to come to class prepared
  • I enjoy this teaching style. I feel like I am actually learning physics, as opposed to just memorizing how to answer a question (which has been the case in the past).
  • I really enjoy the group quiz section. It gets a debate going and makes us really think about the concepts. Therefore making the material stick a lot better.

Last thought: With this kind of student feedback, I like to figure out a couple of things that I can improve or change and bring them back to the class as things I will work on. It looks like I will need to ask them a weekly feedback question which asks them specifically about areas of potential improvement in the course.


Summer 2012 Research, Part 1: Immediate feedback during an exam

One of my brief studies, based on data from a recent introductory calculus-based course, was to look at the effect of immediate feedback in an exam situation. The results show that, after being provided with immediate feedback on their answer to the first of two questions which tested the same concept, students had a statistically significant improvement in performance on the second question.

Although I used immediate feedback for multiple questions on both the term test and final exam in the course, I only set up the experimental conditions discussed below for one question.

The question

The question I used (Figure 1) asked about the sign of the electric potential at two different points. A common student difficulty is to confuse the procedures for finding electric potential (a scalar quantity) and electric field (a vector quantity) for a given charge distribution. The interested reader might wish to read a study by Sayre and Heckler (link to journal; publication page with direct link to pdf).

Figure 1. Two insulating bars, each of length L, have charges distributed uniformly upon them. The one on the left has a charge +Q uniformly distributed on it and the one on the right has a charge -Q uniformly distributed on it. Assume that V=0 at a distance that is infinitely far away from these insulating bars. Is the potential positive, negative or zero at point A? At point B?

Experimental design and results

There were three versions of the exam, with one version of this question appearing on two exams (Condition 1, 33 students) and the other version appearing on the third exam (Condition 2, 16 students). For each condition, students answered the first question (Q1) for one of the points (Condition 1 = point A; Condition 2 = point B) using an IFAT scratch card. With the scratch cards, they scratch their chosen answer and, if they chose correctly, they see a star. If they were incorrect, they could choose a different answer, and if they were correct on their second try they received half the points. If they had to scratch a third time to find the correct answer, they received no marks. No matter how they did on the first question, they had learned the correct answer to that question before moving on to the second question, which asked for the potential at the other point (Condition 1 = point B; Condition 2 = point A). The results for each condition and question are shown in Table 1.

| | Q1 (scratch card question) | Q2 (follow-up question) |
| --- | --- | --- |
| Condition 1 | Point A: 24/33 correct = 72.7±7.8% | Point B: 28/33 correct = 84.8±6.2% |
| Condition 2 | Point B: 8/16 correct = 50.0±12.5% | Point A: 10/16 correct = 62.5±12.1% |

Table 1: Results are shown for each of the conditions. In condition 1, they answered the question for point A and received feedback, using the IFAT scratch card, before moving on to answer the question for point B. In condition 2, they first answered the question for point B using the scratch card and then moved on to answering the question for point A.
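As a small aside, here is a sketch of the scratch-card scoring rule described in the experimental design above (a hypothetical helper function of my own, not code from the actual gradebook):

```python
def ifat_score(scratches_needed, full_marks=1.0):
    """Score one IF-AT question: full marks if the star appears on the first
    scratch, half marks on the second scratch, nothing if a third is needed."""
    if scratches_needed == 1:
        return full_marks
    elif scratches_needed == 2:
        return 0.5 * full_marks
    return 0.0

print(ifat_score(1), ifat_score(2), ifat_score(3))  # 1.0 0.5 0.0
```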

So that I can look at the improvement from all students when going from the scratch card question (Q1) to the follow-up question (Q2), I need to show that there is no statistically significant difference between how the students answered the question for point A and the question for point B. Figure 2 shows that a two-tailed repeated-measures t-test fails to reject the null hypothesis that the mean performance for points A and B is the same. Thus we have no evidence that these questions differ in difficulty, which means we can move on to comparing how the students performed on the follow-up question (Q2) as compared to the scratch card question (Q1).

Figure 2. A two-sided repeated-measures t-test shows that there is no statistically significant difference in performance on the question for points A and B.

Figure 3 shows a 12.2% improvement from the scratch card question (Q1) to the follow-up question (Q2). Using a one-tailed repeated-measures t-test (it was assumed that performance on Q2 would be better than on Q1), the null hypothesis is rejected at a level of p = 0.0064. Since I have made two comparisons using these same data, a Bonferroni correction should be applied. With this correction, the threshold for significance becomes p = 0.05/2 = 0.025, so the improvement from Q1 to Q2 remains statistically significant.
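For concreteness, here is a rough sketch of how these two comparisons could be run in Python with scipy. The arrays of 0/1 scores are random placeholders, not the actual exam data, and treating the “repeated-measures t-test” as scipy’s paired ttest_rel is my assumption about the analysis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_students = 49  # 33 in condition 1 plus 16 in condition 2

# Placeholder 0/1 scores for illustration only -- not the actual exam data.
point_a = rng.integers(0, 2, size=n_students)  # correct on the point-A question?
point_b = rng.integers(0, 2, size=n_students)  # correct on the point-B question?
q1 = rng.integers(0, 2, size=n_students)       # correct on the scratch-card question?
q2 = rng.integers(0, 2, size=n_students)       # correct on the follow-up question?

# Check 1 (cf. Figure 2): two-tailed paired t-test that the point-A and
# point-B questions are equally difficult.
t_ab, p_ab = stats.ttest_rel(point_a, point_b)

# Check 2 (cf. Figure 3): one-tailed paired t-test that performance improves
# from the scratch-card question (Q1) to the follow-up question (Q2).
t_q, p_q = stats.ttest_rel(q2, q1, alternative="greater")

# Bonferroni correction: two comparisons on the same data, so require
# p < 0.05 / 2 = 0.025 for significance.
alpha_corrected = 0.05 / 2
print(f"A vs B: p = {p_ab:.3f}")
print(f"Q1 -> Q2: p = {p_q:.4f}, significant after correction: {p_q < alpha_corrected}")
```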

Figure 3. A one-sided repeated-measures t-test shows that there is a statistically significant improvement in performance on the scratch card (Q1, 65.3±6.8%) and follow-up (Q2, 77.5±6.0%) questions.

Future work

In addition to reproducing these results using multiple questions, I would also like to examine whether these results hold true for some different conditions. Additional factors which could be examined include different disciplines, upper-division vs. introductory courses, and questions which target different levels of Bloom’s taxonomy.

Note: I found a paper that looks at the effect of feedback on follow-up questions as part of exam preparation and discuss it in more detail in this follow-up post.


Kinder, Gentler Oral Exams

Let me start off by saying that, as a student, I found oral exams to be very intimidating and frustrating. I could see their value as assessment tools, but found that in practice they were simply a source of personal dread. Enter 2012 where I am using oral assessments with my own students, but what I have done is try to minimize what I found intimidating and frustrating about oral exams. I have made my oral assessments kinder and gentler.

The strengths of oral assessments

In my opinion, the strengths of oral assessments are a result of their interactive nature.

If a student is stuck on a minor point, or even a major one, you can give them a hint or use some leading questions to help them along. Compare this to what happens if a student gets stuck on a written exam question and you can see how the oral assessment provides you with a much better assessment of student understanding than an incomplete or nonsensical written response.

Another strength is that no ambiguity needs to be left unresolved. If some sort of ambiguous statement comes out of a student’s mouth, you can ask them to clarify or expand on what they have said, instead of dealing with the common grader’s dilemma of sitting in front of a written response trying to make judgement calls about ambiguous student work.

Some other benefits are that marking is a breeze (I will discuss my specific marking scheme later) and I have also found that I can generate “good” oral exam questions much more quickly than I can written ones.

My perception of the weaknesses of traditional oral assessments

The following are common, but not universal characteristics of oral assessments.

Public – Looking dumb in front of me may not be fun, but it is far more comfortable than looking dumb in front of a room full of your peers or discipline experts. Having spent some time on both sides of the desk, I don’t feel that my students ever “look dumb”, but as a student I remember feeling dumb on many occasions (here I will also include comprehensive exams, dissertation defences and question periods after oral presentations in my definition of oral assessments). I guess I’m saying that it feels worse than it looks, but doing it in public makes it feel even worse.

A lack of time to think – This is actually my biggest beef with oral assessments. In a written assessment you can read the question, collect your thoughts, brain-storm, make some mistakes, try multiple paths, and then finally try to put together a cohesive answer. I realize that you can do all these things in an oral assessment as well, but there is a certain time pressure which hangs over your head during an oral assessment. And there is a difference between privately pursuing different paths before coming to a desired one and having people scrutinize your every step while you do this.

Inauthentic – By inauthentic, I mean that oral exams (and for the most part, written ones too) isolate you from resources and come with some sort of urgent time pressure. If we are confronted with a challenging problem or question in the real world, we usually have access to the internet, textbooks, journals and even experts. We are able to use those resources to help build or clarify our understanding before having to present our solution. On the flip side, we can also consider the question period after a presentation as a real-world assessment and we are usually expected to have answers at our fingertips without consulting any resources so arguments can be made for and against the authenticity of an oral assessment.

Context (Advanced Lab)

Before I break down my kinder, gentler oral exams, I want to discuss the course in which I was using them. This course was my Advanced Lab (see an earlier post) where students work in pairs on roughly month-long experimental physics projects. One student is asked to be in charge of writing about the background and theory and the other the experimental details, and then on the second project they switch. For their oral assessments I used the same set of questions for both partners, but the actual questions (see below) were very project-specific. My hope was that using the same questions for both partners would force them to pay much closer attention to what the other had written.

It took at most a total of 2 hours to come up with the 6 sets of questions (12 students total in the course) and then 9 hours of actual oral exams, which comes out to less than an hour per student. I would say that this is roughly equivalent to the time I would have spent creating and marking that many different written exams, but it was much more pleasant for me than all that marking.

Kinder, gentler oral exams

I will describe the format that I use and then highlight some of the key changes that I made to improve on what I perceive to be the weaknesses of traditional oral exams.

I book a 45-minute time slot for each student and they come to my office one at a time. When they show up in my office I have 3 questions for them. They have 10 minutes to gather their thoughts and use whatever resources that they brought (including using the internet, but not consulting with somebody) to help formulate some coherent answers. I also give them a nice big whiteboard to use how they see fit. Once their 10 minutes are up (it is not uncommon for them to take a couple extra minutes if they want that little bit of extra time), they are asked to answer the questions in whatever order would please them. For each question I try, but not always successfully, to let them get their answer out before I start asking clarification, leading or follow-up questions. If they are on the completely wrong track or get stuck I will step in much earlier. If the leading questions do not help them get to the correct answer, we will discuss the question on the spot until I feel like the student “gets” the answer. Sometimes these discussions would immediately follow the question and sometimes I would wait until after they have had a chance to answer all three questions. After they have answered all three questions and we have discussed the correct answers, I pull out the rubric (see below) and we try to come to consensus on their grade for each question. They leave my office with a grade and knowledge of the correct answer to all three questions.

The key changes:

  • Private – I have them come to my office and do the assessment one-on-one instead of in front of the whole class.
  • 10 minutes to collect their thoughts and consult resources – It is similar to the perceived safety blanket offered by an open book exam. Students that were well-prepared rarely used the entire time and students that were not well-prepared tried to cram but did not do very well since I would always ask some clarification or follow-up questions. I have some post-course feedback interviews planned to learn more about the student perspective on this, but my perception is that the preparation time was helpful, even for the well-prepared students. It gave them a chance to build some confidence in their answers and I often delighted in how well they were able to answer their questions. I think that time also offered an opportunity to get some minor details straight, which is beneficial in terms of confidence building and improving the quality of their answers. And finally, knowing that they had that 10 minutes of live prep time seemed to reduce their pre-test stress.
  • Immediate feedback – Discussing the correct answer with the student immediately after they have answered a question is a potential confidence killer. I suspect that the students would prefer to wait until after they have answered all the questions before discussing the correct answers, and I am interested to see what I will learn in my feedback interviews.
  • Grading done as collaborative process with the student – In practice I would usually suggest a grade for a question and mention some examples from their answer (including how much help they needed from me) and then ask them if they thought that was fair. If they felt they should have earned a higher grade, they were invited to give examples of how their answer fell in the higher rubric category and there were many occasions where those students received higher grades. However, the problem is that this is a squeaky wheel situation and it is hard to figure out if it is entirely fair to all students. For cases where I asked students to tell me what grade they thought they earned before saying anything myself, students were far more likely to self-assess lower than I would have assessed them than to self-assess higher than I would have assessed them.

Grading rubric

The grading rubric used was as follows:

| Grade | Meets expectations? | Criteria |
| --- | --- | --- |
| 100% | Greatly exceeds expectations | The student displayed an understanding which went far beyond the scope of the question. |
| 90% | Exceeds expectations | Everything was explained correctly without leading questions. |
| 75% | Meets expectations | The major points were all explained correctly, but some leading questions were needed to help get there. There may have been a minor point which was not explained correctly. |
| 60% | Approaching expectations | There was a major point or many minor points which were not explained correctly, but the student was able to communicate an overall understanding which is correct. |
| 45% | Below expectations | Some of the major points were explained correctly, but the overall explanation was mostly incorrect. |
| 30% | Far below expectations | Some of the minor points were explained correctly, but the overall explanation was mostly incorrect. |
| 0% | x_X | X_x |

Some example questions

General

  • I would pull a figure from their lab report and ask them to explain the underlying physics or experimental details that led to a specific detail in the figure.

Superconductor experiment

  • “Run me through the physics of how you were able to get a current into the superconducting loop. Why did you have to have the magnet in place before the superconducting transition?”
  • “Describe the physics behind how the Hall sensor gave a voltage output which is proportional (when zeroed) to the external field. How do the external magnetic field and the Hall sensor need to be oriented with respect to each other?”
  • “Explain superconductivity to me in a way which a student, just finishing up first-year science, would understand.”

Electron-spin resonance experiment

  • “Discuss how the relative alignment between your experiment and the Earth’s magnetic field might affect your results.”

Gamma-ray spectroscopy

  • “In what ways did your detector resolution not agree with what was expected according to the lab manual? What are some reasonable steps that you could take to TRY to improve this agreement?”

Some other directions to take oral assessments

A couple of my blogger buddies have also been writing about using oral assessments, and I really like what they are up to as well.

Andy Rundquist has written quite a bit about oral assessments (one example) because they are quite central to his Standards-Based Grading implementation. One of the things that he has been doing lately is giving a student a question ahead of time and asking them to prepare a page-length solution to bring to class. In class the student projects their solution via doc-cam, Andy studies it a bit, and then he starts asking the student questions. To my mind this is most similar to the question period after a presentation: the student has had some time, in isolation, to put together the pieces to answer the question, and the questions are used to see how well they understood all the pieces required to put together the solution. Another thing that Andy does is to get the whole class to publicly participate in determining the student’s overall grade on that assessment. I love that idea, but feel like I have some work to do in terms of creating an appropriate classroom environment before I can do that.

Bret Benesh wrote a couple of posts (1, 2) discussing his use of oral exams. His format is closer to mine than it is to Andy’s, but Bret’s experience was that even when students knew the exam question ahead of time, he could easily tell the difference between students that understood their answers and those that did not. I really want to try giving them the questions ahead of time now.

One final note

I am giving a short AAPT talk on my kinder, gentler oral exams, so any feedback that will help with my presentation will be greatly appreciated. Are there certain points which were not, but should have been emphasized?


My BCAPT Presentation on Group Quizzes

I forgot to post this. I gave a talk on group quizzes at the BCAPT AGM (local AAPT chapter) nearly a month ago. It was based on the same data analysis as a poster that I presented the previous year (two-stage group quiz posts 0 and 1), but I added some comparisons to other similar studies.

My presentation:


I’m in the middle of some data analysis for data I collected during the past year and will be presenting my initial findings at FFPER-PS 2012.


Student opinions on what contributes to their learning in my intro E&M course

We are a couple of weeks away from our one and only term test in my intro calc-based electricity and magnetism course. This test comes in the second-last week of the course and I pitch it to them as practice for the final. The term test is worth 10-20% of their final grade and the final exam 30-40%; these relative weights are chosen to maximize each individual student’s grade.

Today I asked them how they feel the major course components are contributing to their learning:

How much do you feel that the following course component has contributed to your learning so far in this course?

This is a bit vague, but I told them to vote according to what contributes to their understanding of the physics in this course. It doesn’t necessarily mean what makes them feel the most prepared for the term test, but if that is how they wanted to interpret it, that would be just fine.

For each component below, I will briefly describe how it fits into the overall course, and by the end you should have a sense of how the whole course works.

The smartphysics pre-class assignments

The pre-class assignments are the engine that allows my course structure to work the way I want it to, and I have been writing about them a lot lately (see my most recent post in a longer series). My specific implementation is detailed under ‘Reading assignments and other “learning before class” assignments’ in this post. The quick and dirty explanation is that, before coming to class, my students watch multimedia prelectures that have embedded conceptual multiple-choice questions. Afterward they answer 2-4 additional conceptual multiple-choice questions where they are asked to explain the reasoning behind each of their choices. They earn marks for putting in an honest effort to explain their reasoning as opposed to choosing the correct answer. Then they show up to class ready to build on what they learned in the pre-class assignment.


Smartphysics pre-class assignments

A) A large contribution to my learning.
B) A small contribution to my learning, so I rarely complete them.
C) A small contribution to my learning, but they are worth marks so I complete them.
D) No contribution to my learning, so I rarely complete them.
E) No contribution to my learning, but they are worth marks so I complete them

 

The smartphysics online homework

The homework assignments are a combination of “Interactive Examples” and multi-part end-of-chapter-style problems.

The Interactive Examples tend to be fairly long and challenging problems where the online homework system takes the student through multiple steps of qualitative and quantitative analysis to arrive at the final answer. Some students seem to like these questions and others find them frustrating because they managed to figure out 90% of the problem on their own but are forced to step through all the intermediate guiding questions to get to the bit that is giving them trouble.

The multi-part end-of-chapter-style problems require, in theory, conceptual understanding to solve. In practice, I find that a lot of the students simply number mash until the correct answer comes out the other end, and then they don’t bother to step back and try to make sure that they understand why that particular number mashing combination gave them the correct answer. The default for the system (which is the way that I have left it) is that they can have as many tries as they like for each question and are never penalized as long as they find the correct answer. This seems to have really encouraged the mindless number mashing.

This is why their response regarding the learning value of the homework really surprised me. A sufficient number of them have admitted that they usually number mash, so I would have expected them not to place so much learning value on the homework.

The online smartphysics homework

A) A large contribution to my learning.
B) A small contribution to my learning, so I rarely complete them.
C) A small contribution to my learning, but it is worth marks so I complete it.
D) No contribution to my learning, so I rarely complete it.
E) No contribution to my learning, but it is worth marks so I complete it

 

Studying for quizzes and other review outside of class time

Studying for quizzes and other review outside of class time

A) A large contribution to my learning.
B) A small contribution to my learning, but I do it anyway.
C) A small contribution to my learning so I don’t bother.
D) No contribution to my learning so I don’t bother.

 

Group quizzes

I have an older post that discusses these in detail, but I will summarize here. Every Friday we have a quiz. They write the quiz individually, hand it in, and then re-write the same quiz in groups. They receive instant feedback on their group quiz answers thanks to IF-AT multiple-choice scratch-and-win sheets and receive partial marks based on how many tries it took them to find the correct answer. Marks are awarded 75% for the individual portion and 25% for the group portion OR 100% for the individual portion if that would give them the better mark.
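As a minimal sketch of the better-of-two-weightings rule just described (a hypothetical helper of my own, not the actual gradebook formula):

```python
def quiz_mark(individual, group):
    """Combine individual and group quiz scores (each as a fraction 0-1):
    75% individual + 25% group, or 100% individual if that is better."""
    return max(0.75 * individual + 0.25 * group, individual)

# A student scoring 60% individually and 90% with their group:
print(quiz_mark(0.60, 0.90))  # 0.675, better than 0.60 alone
# A student whose group does worse than they did individually keeps their own mark:
print(quiz_mark(0.80, 0.50))  # 0.80
```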

The questions are usually conceptual and often test the exact same conceptual step needed for them to get a correct answer on one of the homework questions (but not always with the same cover story). There are usually a lot of ranking tasks, which the students do not seem to like, but I do.

Group Quizzes

A) A large contribution to my learning.
B) A small contribution to my learning.
C) They don’t contribute to my learning.

 

Quiz Corrections

I have an older post that discusses these in detail, but I will again summarize here. For the quiz correction assignments they are asked, for each question, to diagnose what went wrong and then to generalize their new understanding of the physics involved. If they complete these assignments in the way I have asked, they earn back half of the marks they lost (e.g. a 60% quiz grade becomes 80%).
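The earn-back rule works out to a simple formula; here is my sketch of it (a hypothetical helper, not the actual gradebook calculation):

```python
def corrected_quiz_grade(original):
    """Quiz-correction rule described above: earn back half of the marks
    lost on the original quiz (grades expressed as fractions 0-1)."""
    return original + 0.5 * (1.0 - original)

print(corrected_quiz_grade(0.60))  # 0.80, matching the example above
```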

I am delighted to see that 42% of them find that these have a large contribution to their learning. The quizzes are worth 20% of their final grade, so I would have guessed that their perceived learning value would get lost in the quest for points.

Quiz Corrections

A) A large contribution to my learning.
B) A small contribution to my learning, so I rarely complete them.
C) A small contribution to my learning, but they are worth marks so I complete them.
D) No contribution to my learning, so I rarely complete them.
E) No contribution to my learning, but they are worth marks so I complete them.

 

In-class stuff

I am a full-on interactive engagement guy. I use clickers, in the question-driven instruction paradigm, as the driving force behind what happens during class time. Instead of working examples at the board, I either (A) use clicker questions to step the students through the example so that they are considering each of the important steps for themselves instead of me just showing them, or (B) get them to work through examples in groups on whiteboards. Although I aspire to have the students report out their solutions in a future version of the course (“board meeting”), what I usually do when they work through an example on their whiteboards is wait until the majority of the groups are mostly done and then work through the example at the board with lots of their input, often generating clicker questions as we go.

The stuff we do in class

A) A large contribution to my learning.
B) A small contribution to my learning.
C) It doesn't contribute to my learning.

 

The take home messages

Group quizzes rule! The students like them. I like them. The research tells us they are effective. Everybody wins. And they only take up approximately 10 minutes each week.

I need to step it up in terms of the perceived learning value of what we do in class. That 2/3rds number is somewhere between an accurate estimate and a small overestimate of the fraction of the students in class that are actively engaged with the task at hand at any given moment. This class is 50% larger than my usual intro courses (54 students in this case) and I have been doing a much poorer job than usual of circulating and engaging individual students or groups during clicker questions and whiteboarding sessions. The other 1/3 of the students are a mix of students surfing or working on stuff for other classes (which I decided was something I was not going to fight in a course this size) and students that have adopted the “wait for him to tell us the answer” mentality. Peter Newbury talked about these students in a recent post. I have lots of things in mind to improve both their perception and the actual learning value of what is happening in class. I will sit down and create a coherent plan of attack for the next round of courses.

I’m sure there are lots of other take-home messages that I can pluck out of these data, but I will take my victory (group quizzes) and my “needs improvement” (working on the in-class stuff) and look forward to continuing to work on course improvement.


Nonlinear narratives in the inverted classroom

I have temporarily taken over an introductory E&M course from one of my colleagues. I’m teaching the course using his format (and notes) which means that I am (A) lecturing and (B) not using pre-class assignments for the first time since 2006. In addition to his format, I am using the odd clicker question here and there.

The thing that has been most interesting about lecturing in a non-inverted class has been the difference in narrative. In my regular courses, I assume that the students have had contact with all the major ideas from a given chapter or section before I even ask them the first clicker question. Because of this we are able to bring all the relevant ideas from a chapter to bear on each question if needed. This is what I am used to.

My colleague’s notes develop the ideas in a nice linear fashion and are very easy to lecture from, but I just can’t stop myself from bringing in ideas that are multiple sections down the road. I am having a ton of trouble, even with a set of notes in front of me, letting the story develop according to a well laid-out narrative. It has simply been too long since I have presented material in this sort of structured way. Even when I give a talk at a conference, it takes me a ton of practice to massage the talk I have prepared into something I can deliver with a nice linear narrative; even when it is nicely laid out, I will jump ahead to other ideas if I don’t spend some serious time practicing not doing that.

It has been really interesting being the one completely responsible for the narrative instead of sharing that responsibility with the resources that I provide for my pre-class assignments.

It has also been weird not having the inmates run the asylum.