Markstrat Bob: A Generative AI Simulation Coach
A cautionary tale on using GPTs as coaches
I have been experimenting with ungraded GenAI exercises in class since Spring 2023, and during that time I have also talked with a variety of student groups about GenAI inside and outside my university. But in Summer 2024, I decided to go deep on this in a graduate class as part of what I came to describe as my “Hot AI Summer.”
The class was a graduate elective in Marketing and Strategy, comprising 18 students from a variety of programs (MS Management, MBA, MS Business Analytics) and levels of business experience (zero to lots). My impression is that we are hitting an inflection point in student use (or at least acknowledged use!) of GenAI. I assigned 12% of the final grade in this class across three GenAI exercises. These were obviously business applications, but I think the concepts are generally applicable across a broad range of disciplines.
Here is the exercise that did not work very well.
Custom GPT as simulation coach
OpenAI’s GPTs promise a GenAI application customized to a particular use case. If you have a free or paid account, go to the ChatGPT website and you will see an option in the left-hand menu labeled “Explore GPTs”. Click on this and you will be in the OpenAI equivalent of an app store, with various tools classified under categories such as Writing, Productivity, and Education. Some of these are produced by companies (Khan Academy has authored a couple as of this writing) and some by individual users. Some require accounts or logins beyond your ChatGPT account; some are free.
For many years, I have taught my MBA elective partly by using a simulation called Markstrat. It is produced by StratX (www.stratx.com) and is one of the most widely used marketing strategy simulations in higher ed. Markstrat puts four to six teams of students in competition with one another and/or computerized players in a fictional consumer durables industry (think consumer electronics). Teams compete over several periods for sales and profit, making decisions about targeting, product portfolio, pricing, distribution, and advertising.
In the Summer 2024 course I ran five teams competing over eight periods. Students were graded on three deliverables in the simulation: a Competitor Analysis due with Period 4 decisions (10% of final grade), a Learning Analysis due with Period 8 decisions (20% of final grade), and Team Performance in the simulation (5% of final grade).
Markstrat is complicated. One of the things I do as part of teaching with the simulation is to try to coach students in their decision making when needed. Students often struggle with tasks such as production forecasting or interpreting their studies and financial reports. Markstrat is a dynamic competitive environment as well — performance is governed heavily by the decisions competing teams make in the context of market potential. I try to help students think through decisions and trade-offs without giving them answers.
Coaching works (though not perfectly, as I cannot predict for one team what competing teams will do), but it assumes (a) students ask me for help, which they can be reluctant to do, and (b) I am available at the time they need help. As my most ambitious activity of Summer 2024, I decided to create my own GPT coach for the simulation, which I thought could address both problems. I also hoped it would help students learn more in the simulation and avoid basic mistakes.
I am deeply experienced with Markstrat but shallowly experienced with creating and using GPTs. I have worked with GPTs and created some as tests, but this is the first one I developed and fully deployed with students. (I’ll come back to that point.) I sought out guidance on the OpenAI website and a couple of independent websites. My goal was to create an imitation of my approach: a coach who would help students with tasks, decisions, and trade-offs without giving them answers.
I began by uploading my own PowerPoints and class lecture transcripts on Markstrat to the GPT, which I initially called “Markstrat Coach.” I then uploaded materials from independent public links that StratX includes on its website. What I did not do was upload the Markstrat manual, since that is a paywalled, copyrighted document.
After this, I “configured” the GPT with instructions. This turned out to be an iterative process: as I went through testing, I found I needed to add more instructions. ChatGPT itself assisted in this with a chatbot in the configuration window. I restricted the GPT from accessing the web, limiting it to the uploaded materials. The role I assigned the GPT was as follows:
You are a helpful assistant for business students participating in the Markstrat simulation. You specialize in interpreting and analyzing simulation financial and market research reports and explaining key results. Your role is to help students interpret and analyze their results, and suggest possible options for improvement.
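For readers who want to see what this configuration amounts to in code, below is a rough sketch of how a role like Bob’s maps onto the OpenAI API. I built Bob entirely in the ChatGPT configuration window, not in code, so treat this as an illustration: the model name, the added constraint lines, and the Python SDK usage are my assumptions, not a record of my actual setup.

```python
# A rough sketch (not my actual build) of expressing Bob's role through the
# OpenAI API rather than the ChatGPT configuration window. The model name
# and the extra constraint lines are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

BOB_INSTRUCTIONS = """
You are a helpful assistant for business students participating in the
Markstrat simulation. You specialize in interpreting and analyzing
simulation financial and market research reports and explaining key results.
Your role is to help students interpret and analyze their results, and
suggest possible options for improvement.
Do not browse the web; rely only on the materials you have been given.
Help students think through decisions and trade-offs; do not give answers.
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": BOB_INSTRUCTIONS},
        {"role": "user", "content": "Hi Bob"},
    ],
)
print(response.choices[0].message.content)
```

In the GPT builder, the equivalent of that system message is the instruction text in the Configure tab; the iterative process I describe above amounted to repeatedly appending constraints to it.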
Developing and testing took about a week (part-time, as I was teaching the class and doing other work). Within a few days, it became obvious that “Markstrat Coach” was too ambitious a name for the GPT. There were two problems. First, it regularly ignored some of my instructions (leading to more instructions, eventually totaling 420 words). Second, and more importantly, it hallucinated on multiple occasions, suggesting actions that were not possible in the simulation (e.g., run a social media campaign). It was often useful, but not reliably so.
I renamed the GPT “Markstrat Bob”.
(Disclosures: While I am acquainted with the Markstrat founders and some personnel, I neither consulted with nor was compensated by Markstrat in this venture. They are finding out about it at pretty much the same time you are.)
The Assignment
After four rounds of the simulation, I gave all students access to Bob. I did this on the principle that they needed to know enough about Markstrat to be able to ask questions and evaluate the answers. All students had free or paid ChatGPT accounts as a requirement of the course. They had used a different GPT earlier in the term, so they knew how to access one.
After testing, I felt the need to give students a lot of tips on the GPT. The instructions I gave them follow:
Markstrat Bob Instructions
Here is a link to Markstrat Bob. It will work the same way as clicking on the link to the Framework Finder GPT we used earlier in the term: XXXX
I have assigned Bob a role as follows: “You specialize in interpreting and analyzing simulation financial and market research reports and explaining key results.” I have trained Bob on all of my ppt presentations and a variety of public resources on the StratX website. (Note this does not include the manual, as that is a copyrighted document.)
Open your GPT account and then click the link above. If you click on (or type) Hi Bob, Bob will introduce himself.
I would like everyone as individuals to at least try Bob out. If nothing else, ask him something about the simulation that you already know the answer to from playing it over the first four periods.
As teams, you may find Bob helpful in assisting with analyses to improve your decisions in the second half of the simulation. I cannot guarantee this will work (any more than I can guarantee my own advice will work), but I have been testing and refining Bob for about a week now, and will give you some tips below. Bob may hallucinate, but the tips will reduce hallucinations. You still need to be confident Bob is giving you a good answer.
Bob is not supposed to give you specific recommendations, but he may. If he happens to do this, ask him for a second recommendation or to identify pros and cons of his recommendation. He may be wrong. “Bob told us to do it” will not be an excuse for poor simulation performance.
As teams, I will ask you in your Learning Analysis assignment (to come) to either recommend or not recommend Bob to future management with a brief explanation as to why. This will be a (very small) part of your grade on that assignment. I will also ask you to individually reflect on Bob as part of your final reflection assignment (to come).
Please do not share Bob with people outside our class. I will be deactivating him at the end of the term.
Tips
· You may want to review the Simple Prompting Strategies infographic I posted under the Course Materials Module. Bob already has a role, but the other four S’s all apply. Bob will be as good as the prompt you give him. If you don’t like what he’s giving you, try a different prompt.
· Bob cannot access the simulation, but you may upload your team report or any other Markstrat file you have produced to him. He will be able to read your file as a pdf, for example. I would encourage you to do this before you ask him questions, as it will help him understand your team’s situation.
· Bob is best at specific questions. For example, I asked Bob to perform a competitor analysis on one of your teams and it was awful. However, if you ask Bob to help you with a specific question about the simulation or your results, he is much better (if not perfect).
· Bob is better if you take him step-by-step through a problem (as are all LLMs). If you would like Bob to perform an analysis, ask him to first extract data from Study X and produce a table to confirm he has done this correctly. Now ask him to perform Analysis Y on this data. If the analysis has multiple steps, take him through one step at a time (which will also allow you to check he is doing it right). (A code sketch of this pattern appears after these tips.)
· Bob can perform analyses or produce tables that combine information from different studies, which is one of his more useful applications. First ask him to extract the data you want from Study X, then ask him to extract the data you want from Study Y. Now ask him to combine the data in the way you wish.
· Bob is good at producing arguments around options, but be specific in the option and limit his responses. For example, I have had good results from asking him “What are three pros and three cons of doing [Specific Thing].” You can then ask Bob to help you think through one of those pros or cons.
· Ask Bob for multiple ways of thinking about an issue. Ask him to give you a better answer or suggest a different way of addressing your question. This is a way of reducing the impact of any hallucinations.
· I recommend uploading one file to Bob at a time. He sometimes gets confused if you give him multiple files at once. Start with File A, have Bob do something. Then upload File B, and have Bob do something. Then he can work with the two outputs.
· Bob sometimes forgets he can pull numbers out of documents. If he does, simply remind him what data you want from which study.
· Bob can help you with doing things in the simulation, e.g., “How do we specify [action] in Markstrat?” Ask him for step-by-step instructions. Then you can open the simulation in a separate window, try his suggestion, and quickly correct if he makes a mistake.
· If you recognize Bob has done something wrong, point it out and ask him to correct the mistake and answer the question again.
· You may copy any table Bob produces and paste it into another document. Highlight the table and hit Ctrl-C to copy it, then paste it where you wish.
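Stepping outside the assignment text for a moment: here is a rough sketch of the step-by-step tip in code, for readers who want to see how the conversation accumulates one checked step at a time. Students worked in the ChatGPT interface, not the API, so the model name, file name, study, and questions below are all hypothetical illustrations of the pattern, not anything Bob actually ran.

```python
# A sketch of the step-by-step tip: keep the whole conversation in a list so
# each new question builds on an answer you have already checked.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(messages, question):
    """Send one step, record the reply in the running history, and return it."""
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

messages = [{"role": "system",
             "content": "You help business students interpret Markstrat reports."}]

# Hypothetical report text; in the GPT, students uploaded a pdf instead.
report = open("team_report.txt").read()

# Step 1: extract data only, and check the table before going further.
table = ask(messages, f"Here is our team report:\n{report}\n\n"
                      "Extract brand market share by segment as a table. "
                      "Do not analyze anything yet.")
print(table)  # verify against the report before moving to Step 2

# Step 2: only after confirming the table, ask for the analysis.
analysis = ask(messages, "Using that table, which segments look most "
                         "attractive for our brands, and why?")
print(analysis)
```

The point of the pattern is the checkpoint between steps: because each reply stays in the history, an error caught at Step 1 never contaminates Step 2.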
Learning Analysis Instructions (excerpt)
Following are the instructions in the section of the Learning Analysis addressing Bob:
III. History and Learning — discuss the history of your firm’s activities and outcomes to date. Do not give a period-by-period description of all your decisions, but identify the major turning points in the industry and your firm’s activities from your perspective. Give future management advice on what you have learned about competing in the Markstrat environment, in terms of strategy, tactics, analysis tools, decision processes, etc. that were especially helpful or harmful. In this context, please indicate whether you would recommend Markstrat Bob to future management and why or why not.
Course Reflection Instructions
Finally, here are the course AI reflection instructions, in which students could reflect on Bob:
For your final AI assignment, I would like you to reflect on your experience with Generative AI throughout the course. This should include both the role of GenAI in our in-class discussions (e.g., the Perplexity case) and your use of GenAI out of class (e.g., Markstrat Bob). Please address the following three questions:
1. How, if at all, has your experience in this course changed your attitude towards GenAI? Explain why.
2. How, if at all, has your experience in this course changed your use of GenAI? Explain why.
3. Going forward, what is something you would experiment with using GenAI for? What is something you would not experiment with using GenAI for? Explain why.
Please submit no more than one single-spaced page of reflection. This assignment will count for approximately 4% of your final grade in the class.
The Results
Objective results are difficult to measure in this exercise because different teams likely used the GPT in different ways. Unlike other assignments in this course, I did not give students a uniform starter prompt that would ensure some comparability; teams could ask Bob whatever questions they liked, just as they might ask me.
There are also reasons to expect that Bob would not make an objective difference in team performance. First, since all teams had the same tool, there is no competitive advantage in having it. Second, different teams have different situations. By the end of Period 4, there were three larger firms and two smaller firms. It can be difficult for smaller firms to improve because they have fewer resources than the larger firms. In any case, access to Bob did not change the competitive situation: by the end of Period 8, the same firms were larger and smaller.
The results I have are the Learning Analysis and Course Reflection comments on Bob. Teams and individuals described Bob as a mixed experience. These findings should be taken with a grain of salt, as there are likely demand effects: my instructions had previewed that Bob might be a mixed experience, which could prime student answers. And given the professor has indicated “Bob may be mixed,” do students conclude the professor wants to hear “Bob may be mixed”?
In the Learning Analyses, four out of five teams recommended Bob for at least some uses. The four recommending teams all suggested Bob could be useful for narrower questions such as interpreting studies. At the same time, many teams experienced errors in calculation or recommendations outside the simulation’s scope (as I had in testing). The highest performing team admitted it had only used Bob because I had required it for the Learning Analysis; by Period 4 they had a well-functioning decision process they did not want to disrupt. They tested Bob, however, by asking it strategic questions about their situation and came away unimpressed. The team that did not recommend Bob simply found his recommendations too generic.
I did not issue a separate grade for the Learning Analysis paragraph on Bob, but generally speaking I looked for a clear recommendation to future management, evidence of usage and thought (what did they ask and what did they get back), and specific examples.
In their final course AI reflections, 14 of 18 students mentioned Bob in some way (they were not required to). Several remarked on its usefulness for organizing and interpreting information. Two students noted that using Bob encouraged them to think about scenario planning, though the course also featured a session on this.
Other students found Bob generic, vague, and prone to hallucinations and errors. One student frustrated with errors decided to ask Bob for formulas rather than calculations so that she could make the correct calculations herself.
On the performance issue identified at the beginning of this section, a student indicated that despite what seemed like insightful suggestions, she did not see much improvement in her team’s fortunes. Another student made the sharp observation that given Bob makes errors, it is possible that multiple teams using Bob could make errors that would compound in the market, lowering industry performance overall.
Again, there was no separate “Bob grade” in the final reflection. Generally I looked for specific examples and thoughtful commentary.
The Feedback
As these were experiments — and presented as such to students — I collected feedback via an anonymous survey at the end of the course. (Note my official teaching ratings did not appear influenced by these experiments.) Sixteen students responded.
This exercise was the least popular of the summer: only eight of sixteen rated the exercise either “useful” or “very useful” on a five-point scale (all others were neutral or negative). Informal feedback suggested it was a mixed experience as well. One comment raised the issue of whether a custom model can beat a general model: a student said to me, “Perplexity knows a lot about Markstrat,” and wondered why we needed a custom GPT.
Discussion
This was a business exercise, but as AI tutors and coaches are very much in the news for education in general, I hope it is useful even as a cautionary tale. Given this did not go very well, let me reflect on some possible reasons why.
“You Did it Wrong”
Some might wonder if I am simply not very good at creating GPTs. Point taken. But I am a relatively sophisticated GenAI user and consulted the OpenAI website, its chatbot, and independent guides in constructing the tool. If I cannot do this well with those resources, that suggests it may be hard to do. This is not great news for experimenting faculty, though it may be better news for companies that want to throw resources at creating a good GPT. Things might have been different, for example, if I had uploaded the Markstrat manual. I have shared this article with a colleague at StratX. . .
“Wrong Domain”
This came up in conversation with my fellow AI explorer Mike Kentz, who is referenced at the end of this article. Brainstorming, we wondered if GPTs may simply not work very well on sprawling, complicated problems in dynamic domains. If true, this is bad news for all the vendors who are telling us GenAI will solve all the world’s problems . . .
“Once Burned, Twice Shy”
Many teams experienced at least one hallucination from Bob, as I warned them they might. But those warnings and experiences might simply lead students to distrust any recommendation they received from Bob.
I’m reminded of a PhD course I took decades ago that used a brand-new textbook. The textbook turned out to be riddled with proofreading errors. Once the professor corrected all the errors in the first class, all my subsequent confusion about the material provoked the question “am I just stupid, or is this another typo?” I neither trusted nor liked the textbook, and I remember the experience 30 years later.
In this sense, hallucinations may have both short- and long-term effects. In the short term, a wrong answer may have substantial consequences. In the long term, every wrong answer I get downgrades my opinion of the worth of GenAI products. In their final reflections, some students remarked that their dissatisfaction with one AI experience or another in my class made them less likely to use GenAI going forward.
“It Doesn’t Work . . . Yet”
The siren song of all technology advances is that if it doesn’t work now, it will in the future. Ethan Mollick, referenced at the end of this article, sometimes reminds people that the current AI we have is the worst AI we will ever have. I think he’s right about this, but that doesn’t change the distrust that poor early experiences may engender.
More specific to this assignment, one might reasonably wonder if performance of a GPT now (Dec 2024) might be better than five months earlier. I do not have a current class using Markstrat to experiment with, but I have revisited my GPT this week.
I repeated the test prompts from my summer testing. Overall, I would say that with the current (as of Dec 2024) version of GPT-4o, the GPT performs marginally better. Presentation is definitely improved, and it does better at answering basic questions. However, like the summer version, it falters in more complicated analyses, occasionally giving errant analysis and poor advice. In some responses it is better than the summer version, in some worse.
A different “yet” might be that I used the wrong tool. As a second test, I created a “Markstrat Bob” using the free version of Google’s NotebookLM, which according to Google is now powered by its latest model as of this writing, Gemini 2.0.
I uploaded the same files as for my GPT and then used the same test prompts with the December 2024 version of NotebookLM. Perhaps because I could not give it specific instructions as I had with the GPT, NotebookLM markedly underperformed both the summer and the December GPT. Its answers to basic questions were brief and somewhat generic. In more detailed analysis, it had some truly spectacular hallucinations (extracting the wrong data from a report, making up data for an entire exhibit).
While I cannot determine why this experiment fell short of my (and student) desires, I hope that my tour through this experience is thought provoking and perhaps helpful as instructors contemplate their own use of GenAI tools.
Things I might do differently
One thing I did not do was have teams or students submit their chats with Bob, as I did in other aspects of the course. With multiple possible chats over four periods and four weeks, this seemed like overkill for them and for me. However, this would have both encouraged more use (“we have to hand something in”) and given me a sense of the volume of use. Both would have allowed some qualitative inferences as to whether higher quality or quantity of usage was associated with higher performance.
Students also suggested some interesting ideas in their feedback. One suggested more structure around quantitative exercises with Bob. Another thought I should do more to teach students to use Bob. A third wanted to know how I made Bob, which is an interesting thing I may share if there is a next time.
With that, I will wish all teachers and professors very good luck this Spring. . .
Resources
· Ethan Mollick has become an important resource for educational change for business schools in this area, and he frequently reviews new developments in GenAI. His substack is well worth a look if you are interested in more ideas and examples: https://www.oneusefulthing.org/.
· I will also recommend Jason Gulya (substack: https://higherai.substack.com/ ) and Mike Kentz (substack: https://mikekentz.substack.com/ ) for thoughtful ideas on integrating AI in the classroom. (Disclosure: Mike and I corresponded about some of my exercises this summer — which doesn’t mean he endorses them!)
Bruce Clark is an Associate Professor of Marketing at the D’Amore-McKim School of Business at Northeastern University where he has been a teaching mentor for both online and on-ground teaching. He researches, writes, speaks, and consults on managerial decision-making, especially regarding marketing and branding strategy, and how managers learn about their markets, though increasingly he is engaged with GenAI in business and higher ed. You can find him on LinkedIn at https://www.linkedin.com/in/bruceclarkprof/.
Beyond the headline image, no AI was used in the writing of this article!