4

There was this question I came up with that was very good at inducing hallucinations in what, at the time, I thought was a *lobotomized* LLM.

I can't recall the exact wording right now, but in essence you asked it to implement OpenGL batched draw calls in straight x86_64 assembly. It would begin writing seemingly correct code, quickly run out of registers, and then immediately start making up register names instead of spilling data to memory.
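
For the record, here's roughly what the honest move looks like: a minimal stack-spill sketch in NASM syntax, reconstructed by me rather than quoted from any model's output. Keep in mind that x86_64 has exactly sixteen general-purpose registers (rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8-r15); anything past r15 is pure invention.

```nasm
; spill.asm -- illustrative sketch only, not what the model wrote.
; Build/run (Linux): nasm -felf64 spill.asm && ld spill.o -o spill && ./spill
        global  _start
        section .text
_start:
        sub     rsp, 8          ; reserve one 8-byte scratch slot on the stack
        mov     r15, 42         ; pretend r15 holds the last live value
        mov     [rsp], r15      ; spill: free the register, keep the data
        mov     r15, 7          ; now r15 can be reused for something else
        mov     r15, [rsp]      ; restore the spilled value when it's needed
        add     rsp, 8          ; release the scratch slot
        mov     rax, 60         ; sys_exit
        xor     edi, edi        ; exit status 0
        syscall
```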

You may say: big deal, it has nowhere to pull from to answer such an arcane fucking riddle, so of course it's going to bullshit you. That's not the point. The point is that it cannot realize it's running out of registers, and, more importantly, that the multitude of made-up register names _will_ degrade the context by introducing absolute fabrications, so the error keeps propagating even after you clearly point out the obvious mistake.

Basically, my thought process went as follows: if it breaks on something this fundamental, then it __will__ break in every other situation, in ways either subtle or overt.

Which raised the question: is this a trait of _this_ model in particular, or does it apply to LLMs in general?

I felt I was on to something, but I couldn't be sure because, again, I was under the impression that the model I tested this on was too old and stupid for these results to count as significant proof of anything; AI is certainly not my field, so I had to entertain the idea that I could be wrong, albeit begrudgingly -- for obvious reasons, I want at least "plausible based on my observations" rather than just "I can feel it in my balls".

So, as time went on, I ran similar tests on other models whenever I got the chance, and, full disclosure, I spent no money on this, so you may utilize that fact in your doomed attempt to disprove me lmao. Anyway, it's been a long enough while, I think, and I have a feeling you folks can guess the final answer already:

(**SLIGHTLY OMINOUS DRUM ROLL**)

The "lobotomy" in question was merely a low cap on context tokens (~4000), which I never went over in the first place; newer/"more advanced" models don't fare any better, and I have been _very_ lenient in what I consider a passable answer.

So that's that, I'm starting to think: I was right all along, and went through the burdensome hurdle of sincerely questioning the immaculate intuition of my balls entirely for naught -- learn from my mistake and never question your own mystical seniority. Just kidding, but not really.

The problem with the force of belief is that it cuts both ways: belief that I could be wrong is the reason I bothered looking further into this, whereas belief to the contrary very much compels me to dismiss doubt entirely. I don't need that, I need certainty, dammit. And though I cannot in good faith say that I am _certain_, "sufficiently convinced" will have to do for the time being.

TL;DR I don't know, but the more I see, the shittier it seems.

Comments
  • 4
    I'm glad you're talking about the context window and the model. So many people use these things as black boxes and have no idea how they work.

    They can't "hallucinate" or "lie" -- they literally have no intent. What you have discovered is the truth: they are random next-word (really, next-token) generators. Anything they get right is literally by accident. People seem fine with things being accidentally right 80% of the time... just wait for the structural engineer who uses AI, gets a load-bearing calculation wrong, and eight people die in a bridge collapse.

    3Blue1Brown has some excellent videos on how LLMs work, including the vector-space addition, the transformers, the perceptron blocks, and the attention blocks. There's a good 7-minute layman's video too. I'd recommend going through them. They're eye-opening.
  • 2
    Your last paragraphs sound kinda like AI hallucinations; I didn't understand anything
  • 1
    "AI"s are complex prediction engines. Try making the same request in C++ or another language where it is commonly implemented, and you'll most likely get a far better result
  • 1
    Please always mention the models you used. For context windows, Gemini is the best right now; it can hold a book of around 1500 pages in its context window.
  • 2
    @Pogromist I was trying to see for myself whether using any of these for coding was a good or bad idea; a lot of folks will say something along the lines of "it's just another tool", but my intuition, and now my own experience, tell me that it's a pretty awful one. That's more or less what I was trying to say.

    @BordedDev Yep, I wasn't so much interested in whether they could handle drudgework like fetching the correct boilerpaste as in whether prediction was good enough to handle solving an actual problem. It isn't.

    @tamagotchi I've made a point of not naming them since, in the end, they all have the exact same Achilles' heel; the conclusion being that it just doesn't matter which model you use.

    I've read many comments that very much read like this: "just try [a fortune-teller], it's way better than [a fortune-teller]". Sorry, but that's just missing the point entirely.
  • 1
    @Liebranca I can assure you, it makes a lot of difference whether you take Claude Opus or anything else. And let users decide and test them themselves by reproducing things. Right now it's like I'm complaining that Snake runs slow on all models when the fact is that Snake only runs slow on a Nokia 3330. The Nokia 3310 had the best snek.
  • 1
    @tamagotchi A problem inextricable from a type of system does not magically go away when you move from one instance to another.
  • 1
    @Liebranca now you just invented a word. Admit it.
  • 1
    @Liebranca @tamagotchi I wonder what the results would be if you disassembled a bunch of games and trained them on that. They are just tools after all, ones that greatly reduce the Google search loop. I definitely think they will take over a lot of "grunt" dev, aka web, and probably a good chunk of game dev. Just think about how many open source (or, let's be honest, source available -- not like that'll make a difference when training) projects there are. I bet they're also great at making Minecraft mods.