Comparing Summaries: A Generative AI Class Assignment
Make students evaluate the outputs
I have been experimenting with ungraded GenAI exercises in class since Spring 2023 and have also talked with a variety of student groups about GenAI inside and outside my university during this time. But in Summer 2024, I decided to go deep on this in a graduate class as part of what I came to describe as my “Hot AI Summer.”
The class was a graduate elective in Marketing and Strategy, and comprised 18 students from a variety of programs (MS Management, MBA, MS Business Analytics) and levels of business experience (zero to lots). My impression is that we are hitting an inflection point in student use (or at least acknowledged use!) of GenAI. I assigned 12% of the final grade in this class across three GenAI exercises. These were obviously business applications, but I think the concept is generally applicable across a broad range of disciplines. Here is one exercise that worked.
Comparing GenAI Summaries Is a Critical Thinking Exercise
Just as we might have previously assigned students to compare articles on the same topic, having students compare GenAI summaries is a way of helping students engage with different views on the same topic. Having to weigh those views requires (a) some understanding of the topic and (b) deeper thinking about how to weigh different aspects of the views presented (cf. wildly different news coverage of the same topic).
Beyond this, what I like about comparing GenAI summaries as an assignment is that it starkly highlights for students that GenAI does not produce the answer to a question. Despite its “never in doubt” tone, it produces an answer to a question.
Could students upload the two summaries and ask an AI to compare them? Of course! But now they are more aware that the comparison they get is not the comparison but a comparison. That should make them (properly) skeptical. (As far as I could tell from the submissions, students did not try this.)
Finally, summarization is a commonly touted use case for GenAI. (Disclosure: I use GenAI for summarizing long podcast and YouTube transcripts to see if I want to sit through the piece.) Let’s give students some experience of this in practice.
The Assignment
For my class, I assigned a blog post as preparation for a normal in-class discussion of an online travel aggregator (OTA) such as Expedia. Students were expected to read the article and then discuss in class what it meant for the OTA.
At the same time, I gave Perplexity (using the Claude 3.5 Sonnet model) the following prompt:
Visit the following website and summarize the following article. What insights does the article provide that would be useful to an online travel website that enables users to book flights around the world and collect points from any airline: https://zoftify.com/blog/travel-customer-journey
It produced a summary that was surprisingly awful, riddled with factual errors. I hit the rewrite button and it produced a second summary. I hit rewrite again and got a third. The second and third summaries were good enough, and different enough (more on that below), that they were useful.
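If you would rather script this step than click rewrite in a browser, the sketch below shows one way to generate two independent summaries of the same article. To be clear, this is a minimal sketch and not what I actually did (I used Perplexity’s web interface): it assumes the requests, beautifulsoup4, and anthropic Python packages, an ANTHROPIC_API_KEY environment variable, and an illustrative Claude model name, and it fetches the article text itself because the plain Claude API does not browse the web the way Perplexity does.

```python
# Minimal sketch (not my actual workflow): fetch the article text, then
# ask an LLM for two independent summaries using the same prompt.
# Assumes the `requests`, `beautifulsoup4`, and `anthropic` packages and
# an ANTHROPIC_API_KEY environment variable; the model name is illustrative.
import requests
from bs4 import BeautifulSoup
import anthropic

URL = "https://zoftify.com/blog/travel-customer-journey"

PROMPT = (
    "Summarize the following article. What insights does the article provide "
    "that would be useful to an online travel website that enables users to "
    "book flights around the world and collect points from any airline?\n\n"
    "{article}"
)


def fetch_article_text(url: str) -> str:
    """Download the page and crudely strip it to plain text."""
    html = requests.get(url, timeout=30).text
    return BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)


def summarize(article: str) -> str:
    """Ask the model for one summary of the article text."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model name
        max_tokens=1024,
        temperature=1.0,  # nonzero temperature: repeated calls differ, like "rewrite"
        messages=[{"role": "user", "content": PROMPT.format(article=article)}],
    )
    return response.content[0].text


if __name__ == "__main__":
    article = fetch_article_text(URL)
    for i in (1, 2):
        print(f"--- Summary {i} ---\n{summarize(article)}\n")
```

The nonzero temperature is what stands in for the rewrite button: repeated calls with the same prompt will produce different summaries to compare.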
At the end of the class discussion I released the two useful summaries and gave students the following instructions:
“On the following two pages, you will find two summaries of the Zoftify blog post assigned for the Travelogo case on July 16. The same prompt was used to generate both summaries:
Visit the following website and summarize the following article. What insights does the article provide that would be useful to an online travel website that enables users to book flights around the world and collect points from any airline: https://zoftify.com/blog/travel-customer-journey
Assignment Instructions:
Write a short (< 1 page) essay identifying which of the two summaries is better and why. Criteria you should consider:
· Is the summary an accurate representation of the post?
· Does the summary provide useful insights to a company like Travelogo?
· What does the summary miss that was important during our in-class discussion of the post?
Submit your essay to the Travelogo AI link as a Word or PDF file.”
The Results
This, I must admit, was one of the weirdest grading experiences of my 30-year career.
It turned out I needed to define what “accurate” meant.
Summary #2 drew out two very interesting and useful points. I read them and thought, “those are good ideas.” So did students. But when I went back to the underlying post, I realized the article didn’t actually make these points.
Students almost uniformly preferred Summary #2. On a usefulness criterion, this was very reasonable. But they also almost uniformly claimed Summary #2 was the more accurate representation of the post. As I was grading the submissions, I ended up repeating some version of “the summary doesn’t say this” for about two-thirds of the class. Grades on these were unsurprisingly low. (I ended up curving grades on this assignment up on my philosophy that if a couple of students get something wrong, that’s their problem, but if two-thirds of the class get something wrong, that’s my problem.)
It was so strange that I opened the next class discussion with the grading and talked with students about how they evaluated the summaries. This turned out to be highly educational for all of us.
As best I can tell, students applied a criterion of what I will call “spiritual accuracy” to the summaries. As one good student noted in their submission:
While the original article didn’t explicitly mention these technologies, it did suggest using chatbots or travel advisors during the planning stage. This demonstrates a deeper analysis beyond simple summarization, adding valuable insights.
Essentially, the argument seemed to be that this was a logical implication of the article, something the author could have said. That’s not my definition of an accurate summary. But it was certainly a more useful summary, and it is not surprising that smart business students would gravitate to the most useful ideas for a business. I did not ask this question, but if they had to pick a summary to show their boss, Summary #2 might well be the one.
The Feedback
As these were experiments — and presented as such to students — I collected feedback via an anonymous survey at the end of the course. (Note: my official teaching ratings did not appear to be influenced by these experiments.) Sixteen students responded. Twelve of them rated this assignment “useful” or “very useful” on a five-point scale, and no student rated it “not useful at all.”
Discussion
This was a business exercise, but I would think it could work any time an online article discusses a topic relevant to your course. I will be running versions of this exercise with both graduate and undergraduate audiences this fall (and will probably improve the instructions!).
I will note there is an ethics aspect to this assignment that could be discussed. There has been much commentary online about the injustice of GenAI models scraping vast amounts of human-produced text and images without consent or compensation. Neither the blog site nor the author of the article consented to my use of their article in this fashion (and in fact I did not ask permission to have students read the article for class, either). My position is that since the article was unpaywalled and freely available, it was fair both to have students read it and to have an AI summarize it. In fact, if I had paid a human research assistant to summarize the article for me, that would be no different from having Perplexity do it.
That said, this is a position, and it could be discussed. The issue of copyright is fraught regarding GenAI, and some thoughtful students are particularly sensitive to the impact of GenAI on artists and creatives when someone else profits from the artist’s work. I did not explicitly charge students for this exercise, so in that sense my university made no money from it. But it did make money from the tuition that paid for the class overall. When I assemble coursepacks using the Harvard Business School website, Harvard is paid for every article I assign. That did not happen here.
With that, I will wish all teachers and professors very good luck this fall…
Resources
· Ethan Mollick has become an important resource for educational change for business schools in this area, and he frequently reviews new developments in GenAI. His Substack is well worth a look if you are interested in more ideas and examples: https://www.oneusefulthing.org/.
· I will also recommend Jason Gulya (Substack: https://higherai.substack.com/) and Mike Kentz (Substack: https://mikekentz.substack.com/) for thoughtful ideas on integrating AI in the classroom. (Disclosure: Mike and I corresponded about some of my exercises this summer — which doesn’t mean he endorses them!)
Bruce Clark is an Associate Professor of Marketing at the D’Amore-McKim School of Business at Northeastern University where he has been a teaching mentor for both online and on-ground teaching. He researches, writes, speaks, and consults on managerial decision-making, especially regarding marketing and branding strategy, and how managers learn about their markets, though increasingly he is engaged with GenAI in business and higher ed. You can find him on LinkedIn at https://www.linkedin.com/in/bruceclarkprof/.
Beyond the headline image, no AI was used in the writing of this article!