Here's my understanding of it: an LLM trained exclusively on Khan Academy is going to have a nice fuzzy cloud of "wrong answers" and "things to say to wrong answers." A website with a test on it is going to have a hard-coded, verifiable, 100% accurate "right answer/wrong answer" matrix for any test it runs. The mistake is in thinking the LLM can be trained to grade papers. It's going to give you a fuzzy cloud of "here's the vicinity of the right answer."

That fuzzy cloud, however, beats the tar out of "nothing" when we're talking about automated instruction. The solution space of "right answers" and "wrong answers" leans heavily towards correct approaches when the problem is "provide a list of helpful tips to a seventh grader struggling with solving for x in a simple algebra problem" or "how would you instruct a student who is confusing the ordinate and the abscissa?" Online tutorials are pretty shit right now and have been since the dawn of acoustic coupler modems.

But realistically? AI should be Clippy. AI was born to be Clippy: that annoying thing you can ignore when it's out of its depth, but which can do things like "looks like you're trying to make a table out of this data, would you like me to take a crack at it and then you can beat it to fit/paint it to match?" Just don't let it judge college admissions essays.
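To make the contrast concrete, here's a toy sketch. The answer key and hint table below are entirely made up for illustration: the grader is the website's hard-coded, verifiable matrix (exact match, no fuzz), while the keyword-matched hint lookup stands in for the LLM's fuzzy cloud of "things to say to wrong answers" (a real model would generate that text; this table just fakes the shape of it).

```python
# Hypothetical answer key for a toy algebra quiz (illustrative only).
ANSWER_KEY = {"q1": "7", "q2": "-3"}

def grade(submission: dict) -> dict:
    """The website's side: a hard-coded right/wrong matrix via exact match."""
    return {q: submission.get(q) == answer for q, answer in ANSWER_KEY.items()}

# Stand-in for the LLM's fuzzy cloud: canned hints keyed on mistake keywords.
HINTS = {
    "sign": "Check whether you moved a term across the equals sign "
            "without flipping its sign.",
    "axes": "Remember: the abscissa is the horizontal (x) axis, "
            "the ordinate is the vertical (y) axis.",
}

def tutor_hint(mistake_description: str) -> str:
    """The LLM's side: something in the vicinity of helpful, never a verdict."""
    text = mistake_description.lower()
    for keyword, hint in HINTS.items():
        if keyword in text:
            return hint
    return "Walk through the problem one step at a time and show your work."

results = grade({"q1": "7", "q2": "3"})  # q2 wrong: student dropped the sign
print(results)                           # {'q1': True, 'q2': False}
print(tutor_hint("dropped a sign when isolating x"))
```

The point of the split: `grade` is allowed to judge because it's verifiable, while `tutor_hint` only ever offers help, which is exactly the Clippy role where a near-miss costs nothing.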