News Rapid Rundown - December and January's AI news


Dan Bowen and Ray Fleming

02 February 2024

This week's episode is an absolute bumper edition. We paused our Rapid Rundown of the news and research in AI for the Australian summer holidays - and to bring you more of the recent interviews. So this episode we've got two months to catch up on! We also started mentioning Ray's AI Workshop in Sydney on 20th February: three hours of exploring AI through the lens of organisational leaders, capped off with a Design Thinking exercise to help you apply your new knowledge in company with a small group. Details & tickets here: https://www.innovategpt.com.au/event

And now, all the links to every news article and research paper we discussed:

News stories

The Inside Story of Microsoft's Partnership with OpenAI
https://www.newyorker.com/magazine/2023/12/11/the-inside-story-of-microsofts-partnership-with-openai
All about the drama that unfolded at OpenAI and Microsoft from 17th November, when OpenAI CEO Sam Altman was suddenly fired. And because it's 10,000 words, I got ChatGPT to write me the one-paragraph summary: This article offers a gripping look at the unexpected drama that unfolded inside Microsoft, a real tech-world thriller that's as educational as it is enthralling. It's a tale of high-stakes decisions and the unexpected firing of a key figure that nearly upended a crucial partnership in the tech industry. It's an excellent read to understand how big tech companies handle crises and the complexities of partnerships in the fast-paced world of AI.

MinterEllison sets up own AI Copilot to enhance productivity
https://www.itnews.com.au/news/minterellison-sets-up-own-ai-copilot-603200
This is interesting because it's a firm of highly skilled white-collar professionals, and the Chief Digital Officer gave some statistics on the productivity changes they'd seen since starting to use Microsoft's Copilots: "at least half the group suggests that from using Copilot, they save two to five hours per day," "One-fifth suggest they're saving at least five hours a day.
Nine out of 10 would recommend Copilot to a colleague." "Finally, 89 percent suggest it's intuitive to use, which you never see with the technology, so it's been very easy to drive that level of adoption." Greg Adler also said "Outside of Copilot, we've also started building our own Gen AI toolsets to improve the productivity of lawyers and consultants."

Cheating Fears Over Chatbots Were Overblown, New Research Suggests
https://www.nytimes.com/2023/12/13/technology/chatbot-cheating-schools-students.html
Although this is US news, let's celebrate that the New York Times reports that Stanford education researchers have found that AI chatbots have not boosted overall cheating rates in schools. Hurrah! Maybe the punchline is that, in their survey, the cheating rate has stayed about the same - at 60-70%. Also interesting in the story is the datapoint that 32% of US teens hadn't heard of ChatGPT, and less than a quarter had heard a lot about it.

Game-changing use of AI to test the student experience
https://www.mlive.com/news/grand-rapids/2024/01/your-classmate-could-be-an-ai-student-at-this-michigan-university.html
Ferris State University is enrolling two 'AI students' (Ann and Fry) into classes. They will sit (virtually) alongside the students to attend lectures, take part in discussions and write assignments, as more students take the non-traditional route into and through university. "The goal of the AI student experiment is for Ferris State staff to learn what the student experience is like today." "Researchers will set up computer systems and microphones in Ann and Fry's classrooms so they can listen to their professor's lectures and any classroom discussions, Thompson said. At first, Ann and Fry will only be able to observe the class, but the goal is for the AI students to soon be able to speak during classroom discussions and have two-way conversations with their classmates, Thompson said.
The AI students won't have a physical, robotic form that will be walking the hallways of Ferris State – for now, at least. Ferris State does have roving bots, but right now researchers want to focus on the classroom experience before they think about adding any mobility to Ann and Fry, Thompson said." "Researchers plan to monitor Ann and Fry's experience daily to learn what it's like being a student today, from the admissions and registration process, to how it feels being a freshman in a new school. Faculty and staff will then use what they've learned to find ways to make higher education more accessible."

Research Papers

Towards Accurate Differential Diagnosis with Large Language Models
https://arxiv.org/pdf/2312.00164.pdf
There has been a lot of past work trying to use AI to help with medical decision-making, but it often used other forms of AI, not LLMs. Now Google has trained an LLM specifically for diagnosis, and in a randomized trial with 20 clinicians and 302 real-world medical cases, the AI correctly diagnosed 59% of hard cases. Doctors only got 33% right, even when they had access to Search and medical references. (Interestingly, doctors and AI working together did well, but not as well as the AI alone.) The LLM's assistance was especially beneficial in challenging cases, hinting at its potential for specialist-level support.

How to Build an AI Tutor that Can Adapt to Any Course and Provide Accurate Answers Using Large Language Model and Retrieval-Augmented Generation
https://arxiv.org/ftp/arxiv/papers/2311/2311.17696.pdf
The researcher, from the Education University of Hong Kong, used OpenAI's GPT-4 in November to create a chatbot tutor that was fed with course guides and materials so it could tutor a student in a natural conversation. He describes the strengths as the natural conversation and human-like responses, and the ability to cover any topic as long as domain knowledge documents are available.
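If you want a feel for the retrieval-augmented pattern the paper describes - answer a student's question by first retrieving the relevant course material, then grounding the LLM's answer in it - here is a minimal sketch. This is an illustration, not the paper's actual code: the retrieval step uses a naive keyword-overlap score (real systems typically use embedding search), and the call to GPT-4 is left as a comment rather than invoking any API.

```python
def score(query: str, doc: str) -> float:
    """Naive relevance: fraction of query words that appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / (len(q_words) or 1)

def retrieve(query: str, course_docs: list[str], k: int = 2) -> list[str]:
    """Return the k course-material snippets most relevant to the question."""
    return sorted(course_docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_tutor_prompt(question: str, course_docs: list[str]) -> str:
    """The 'RAG' step: ground the tutor's answer in retrieved course material."""
    context = "\n".join(retrieve(question, course_docs))
    return (
        "You are a course tutor. Answer using ONLY the course material below.\n"
        f"Course material:\n{context}\n\n"
        f"Student question: {question}"
    )

if __name__ == "__main__":
    docs = [
        "Photosynthesis converts light energy into chemical energy in plants.",
        "The mitochondrion is the site of cellular respiration.",
    ]
    prompt = build_tutor_prompt("What does photosynthesis convert?", docs)
    print(prompt)
    # This prompt would then be sent to the LLM (GPT-4 in the paper) via its
    # chat API, and the reply returned to the student as the tutor's answer.
```

As the paper notes, answer quality depends on both the student's question and the ingested course materials - exactly the two inputs this sketch combines.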
The downsides highlighted are the accuracy risks, and that the performance depends on the quality and clarity of the student's question and the quality of the course materials. In fact, on accuracy they conclude "Therefore, the AI tutor's answers should be verified and validated by the instructor or other reliable sources before being accepted as correct" - which isn't really that helpful, to be honest. This is more of a project description than a research paper, but a good read nonetheless to give confidence in AI tutors, and it provides design outlines that others might find useful.

Harnessing Large Language Models to Enhance Self-Regulated Learning via Formative Feedback
https://arxiv.org/abs/2311.13984
Researchers at German universities created an open-access tool, or platform, called LEAP to provide formative feedback to students, to support self-regulated learning in Physics. They found it stimulated students' thinking and promoted deeper learning. It's also interesting that, between development and publication, the release of new features in ChatGPT means you can now create a tutor yourself with some of the capabilities of LEAP. The paper includes examples of the prompts they use, which means you can replicate this work yourself - or ask them to use their platform.

ChatGPT in the Classroom: Boon or Bane for Physics Students' Academic Performance?
https://arxiv.org/abs/2312.02422
These Colombian researchers let half of the students on a course loose with the help of ChatGPT, while the other half didn't have access. Both groups got the lecture, blackboard video and simulation teaching. The result? Lower performance for the ones who had ChatGPT, and a concern over reduced critical thinking and independent learning. If you don't want to do anything with generative AI in your classroom, or a colleague doesn't, then this is the research they might quote!
The one thing that made me sit up and take notice was that they included a histogram of the grades for students in the two groups. Whilst the students in the control group had a pretty normal distribution and a spread across the grades, almost every single student in the ChatGPT group got exactly the same grade. That makes me think they all used ChatGPT for the assessment as well, which would explain why they were all just above average. So perhaps the experiment led them to switch off learning AND switch off doing the assessment - and perhaps not a surprising result after all. And perhaps, if instead of using the free version they'd used the paid GPT-4, they might all have aced the exam too!

Multiple papers on ChatGPT in Education
There's been a rush of papers in journals in early December, produced by university researchers right across Asia, about the use of AI in Nursing Education, Teacher Professional Development, setting Maths questions, setting questions after reading textbooks, and in Higher Education - in the Tamansiswa International Journal in Education and Science, the International Conference on Design and Digital Communication, Qatar University and Universitas Negeri Malang in Indonesia. One group of Brazilian researchers tested it in elementary schools. And a group of 7 researchers from the University of Michigan Medical School and 4 Japanese universities discovered that GPT-4 significantly beat 2nd-year medical residents in Japan's General Medicine In-Training Examination (in Japanese!), with the humans scoring 56% and GPT-4 scoring 70%. Also fascinating in this research is that they classified all the questions as easy, normal or difficult - and GPT-4 did worse than the humans on the easy problems (17% worse!), but 25% better on the normal and difficult problems. All these papers come to similar conclusions: things are changing, and there are upsides - and potential downsides to be managed.
Imagine the downside of AI being better than humans at passing exams the harder they get!

ChatGPT for generating questions and assessments based on accreditations
https://arxiv.org/abs/2312.00047