FAQ
AI Text is a Moodle question type that uses an external Large Language Model/AI System to evaluate student responses to quiz questions. This raises both teaching and technical issues.
TL;DR: mostly. LLM systems are inherently unreliable: they deliver responses built on statistical analysis of publicly available data. This data will include "well known" but sometimes entirely inaccurate information. They will also embed widely held prejudice, bias and misinformation. For this reason responses should always be regarded as preliminary, with the expectation that some will be misleading or wrong.
Thanks to Tom F for this question
https://moodle.org/mod/forum/discuss.php?d=455612#p1889767
This is probably an issue with the LLM (Large Language Model) that is being accessed. LLMs are "non-deterministic", meaning the same input can result in different outputs. One reason for the response "I'm sorry, I can't assist with that request." is that the guard rails interpret the prompt as something the model should not answer. I have seen it give this response to the most innocuous prompting.
It would be useful to get a list of typical prompts that get that sort of response. This feature has not yet been implemented in the main branch, but it could help gather information on this:
https://github.com/marcusgreen/moodle-qtype_aitext/wiki/Prompt-Logging
Also changing to an "uncensored" model might help (not available from OpenAI/ChatGPT to my knowledge).
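One pragmatic mitigation, given that the same prompt can succeed on a later attempt, is to detect a likely refusal and simply retry. The sketch below is illustrative only and is not part of the plugin; `send_prompt` is a hypothetical function standing in for whatever client call actually contacts the LLM, and the refusal phrases are examples, not an exhaustive list.

```python
# Illustrative sketch: detect a likely guard-rail refusal and retry.
# send_prompt is a hypothetical callable returning the model's text response.

REFUSAL_MARKERS = (
    "i'm sorry, i can't assist",
    "i cannot help with that",
)

def looks_like_refusal(text: str) -> bool:
    """Heuristic substring check for a guard-rail refusal."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def evaluate_with_retry(send_prompt, prompt: str, max_attempts: int = 3) -> str:
    """Retry a few times; a non-deterministic model may answer on a later try."""
    response = ""
    for _ in range(max_attempts):
        response = send_prompt(prompt)
        if not looks_like_refusal(response):
            return response
    return response  # give up and return the last (refusing) response
```

Because the heuristic is only a substring match, logging the prompts that trigger it (as proposed above) is still the better long-term source of information.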
Two of the possible benefits of LLMs for feedback are that it comes quickly and is mostly correct, and it can be made more correct with improved prompting. This raises the question of whether a quick but possibly incorrect answer is better than no answer at all, or than a delayed correct answer.
This tool should never be used for "high stakes" summative assessments. It is designed to promote learning and to offer quick feedback. If the evaluation of learning is high stakes, e.g. it decides some significant benefit to a student, this tool should not be used.
What is and what is not cheating is subjective and depends on context. Since the dawn of Educational Technology students have attempted to shortcut the need to learn content by doing things such as writing answers on their skin, using calculators for maths and copying from web sites. This tool is no different.
Anecdote: My bill for 12 months' use of OpenAI ChatGPT, including making a set of questions available on a public website with instant sign up, is approximately USD 50. I have also made extensive use of Groq cloud, which is a high performance LLM system that offers access without financial cost (or at least I am not aware of any way to pay for it).
Once the cost of installation and maintenance has been covered there is an ongoing cost for the LLM/Inference system. I do not have direct experience with measuring the costs of large numbers of students using Inference systems but I am in contact with people who do and I will update this section as I get more information.
Typically Inference systems (e.g. OpenAI/ChatGPT) charge in units of millions of tokens. There are some high cost leading edge systems, but so far costs per student seem "reasonable", i.e. within historical ranges of putting a computer on a desk, providing textbooks and the fractionalised cost of having a teacher in a classroom.
However, unlike those costs, Inference costs have fallen dramatically over the last two years and it is likely they will fall further. There is huge investment in alternatives to the GPU approach to Inference that is likely to significantly change the economics of inference.
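Because providers bill per million tokens, a rough cost estimate is simple arithmetic. The sketch below uses illustrative prices only (not quoted from any provider; check your provider's current price list), and the token counts per question are assumptions for the sake of the example.

```python
# Back-of-envelope inference cost for a cohort, in USD.
# All prices and token counts below are illustrative assumptions.

def cohort_cost(students, questions_per_student,
                input_tokens_per_question, output_tokens_per_question,
                input_price_per_million, output_price_per_million):
    """Estimate total cost: tokens used, scaled by price per million tokens."""
    total_input = students * questions_per_student * input_tokens_per_question
    total_output = students * questions_per_student * output_tokens_per_question
    return (total_input / 1_000_000 * input_price_per_million
            + total_output / 1_000_000 * output_price_per_million)

# Example: 500 students, 10 questions each, ~600 prompt tokens and
# ~200 response tokens per question, at $0.50/$1.50 per million tokens.
print(f"${cohort_cost(500, 10, 600, 200, 0.50, 1.50):.2f}")  # prints $3.00
```

Note that prompt (input) and response (output) tokens are usually priced differently, and the marking prompt is sent with every student response, so a long prompt multiplies across the whole cohort.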
AI systems can impose a limit on the number of requests they will process within a set amount of time, e.g. X thousand requests per second. Because of the burst-like nature of quiz responses, this could result in hitting this sort of barrier.
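A common client-side response to hitting such a limit is exponential backoff with jitter, so a burst of quiz submissions spreads out instead of repeatedly hammering the limit. The sketch below is generic: the `RateLimitError` class is a stand-in defined here for illustration, not the exception any particular provider's library actually raises.

```python
# Illustrative sketch: exponential backoff with jitter for rate-limited calls.
import random
import time

class RateLimitError(Exception):
    """Stand-in for whatever error a provider raises when a burst is rejected."""

def call_with_backoff(request, max_attempts=5, base_delay=1.0):
    """Retry a request, waiting 1s, 2s, 4s, ... (plus jitter) between tries."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the failure
            # Double the delay each time, with up to 1s of random jitter
            # so simultaneous quiz submissions do not retry in lockstep.
            time.sleep(base_delay * 2 ** attempt + random.random())
```

In a quiz context this only smooths short bursts; if an entire class submits at once, queueing the evaluations server-side is the more robust approach.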