Is it just me, or do the clocks frequently break or change appearance without the page being refreshed?
Edit: never mind, I skipped past the sentence explaining that every minute, the site prompts LLMs for a new solution. It’s hilariously sad how LLMs aren’t able to be consistent from one prompt to another.
It’s the expected result if your big ol’ artificial intelligence wannabe is ultimately just a stochastic word combinator.
If every single token is, in the end, chosen by a random dice roll (and it is), then this is exactly what you’d expect.
That’s a massive oversimplification.
Not really. If the system outputs a probability distribution, then by definition, you’re picking somewhat randomly. So it’s not really an oversimplification.
Apologies for the late reply, but it turns out I can’t let that sit. Sorry for the rant, but I work in RL and saying “it’s just dice rolls” is insulting to my entire line of work. :(
A probability distribution is not the same as a random dice roll. Dice rolls are uniformly and independently random, whereas the probability distributions for LLMs are conditional on the context and the model’s learned parameters. Additionally, modern LLMs are typically run with top-k and top-p sampling, which filters the probability distribution down to only the high-confidence tokens, so the probability of it choosing anything from the discarded low-probability tail is exactly zero.
The issues with LLMs have nothing to do with their sampling from random distributions. That’s just a minor part of their training, and some LLMs don’t even do random sampling since they use tree search. The issues with LLMs are the result of people trying to teach them intelligence using behavior cloning on a corpus of human words and images. Words can’t encode wisdom, only knowledge. Wisdom can only be gained through lived experience.
How well do you think you would perform if you were born into a cave, forced to read a thousand dictionaries in order with no context, and then your only interaction with the outside world was a single question from a single human, and then you died? If you ask me, the LLMs are doing surprisingly well given their “lived experiences”.
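For anyone following along, here is a minimal sketch of the top-k and top-p filtering mentioned in the comment above. It is not taken from any particular model or library: the `vocab`, the `logits`, and the helper names `softmax` and `filter_top_k_top_p` are all made up for illustration, and real implementations differ in details such as tie handling.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, k, p):
    """Keep at most the k likeliest tokens, then the shortest prefix of
    those whose cumulative probability reaches p, and renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


# Hypothetical toy vocabulary and scores (assumptions, not real model output).
vocab = ["12", "3", "6", "9", "banana"]
logits = [2.0, 1.5, 1.0, 0.5, -3.0]

probs = softmax(logits)
filtered = filter_top_k_top_p(probs, k=4, p=0.9)

# Sampling is still random, but weighted, and filtered-out tokens have weight 0.
rng = random.Random(0)
token = rng.choices(vocab, weights=filtered, k=1)[0]
print(token, filtered)
```

In this toy setup the unlikely token ends up with probability exactly zero after filtering, while the choice among the remaining tokens is still a weighted random draw, which is roughly the point both sides of the argument are circling.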
“conditional on the context and the model’s learned parameters.” You seem to be under the wrong impression that “random dice roll” == “random dice roll from a uniform distribution”. I didn’t say that. If it outputs a probability distribution, which it does, then you sample from it randomly according to that distribution, not a uniform one.
As for your last paragraph: I wasn’t, I didn’t do that, and if that’s all the system can do then people should stop claiming it is even remotely intelligent. Whatever the excuses, the systems aren’t (and won’t be getting) there. If you’re trying to get me to empathize with a couple of matrices, then you’re not going to succeed.
I don’t care if you get offended because someone else doesn’t like your line of work. I think what you do is actively harmful to humanity. I also dislike weapons manufacturers; how they feel about it is irrelevant. You’re no different.
Typically that’s configurable. For a chatbot, you’d want it to give the same or similar results for a given question, whereas with a character creator, you might want the results to vary so you can re-run until you get something you like.
Of course that wouldn’t be as funny here.
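As a rough illustration of what “configurable” can mean here, the sketch below uses a temperature setting and a fixed random seed. This is an assumption about the usual knobs, not a description of how the clock site is set up; `sample_token` and the `logits` values are hypothetical.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index. Temperature 0 is treated as greedy argmax;
    higher temperatures flatten the distribution and add variety."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores for four candidate tokens.
logits = [2.0, 1.5, 1.0, 0.5]

# "Chatbot" style: greedy decoding gives the same pick every time.
greedy = [sample_token(logits, 0, random.Random()) for _ in range(5)]

# "Character creator" style: temperature 1.0 varies between picks,
# but reusing the same seed reproduces the same sequence.
rng_a = random.Random(42)
run_a = [sample_token(logits, 1.0, rng_a) for _ in range(20)]
rng_b = random.Random(42)
run_b = [sample_token(logits, 1.0, rng_b) for _ in range(20)]

print(greedy)
print(run_a == run_b)
```

Hosted models may still vary a little even at temperature 0, so treat this as the idea rather than a guarantee.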
When I tried it, Kimi K2 was surprisingly consistent and not even as bad as the others. Occasionally the numbers or hands (I couldn’t really tell which) were positioned a bit off; for example, the seconds hand will appear to be horizontal but the 9 or 3 will be slightly below or slightly above the hand. But whoever can center a div may throw the first stone, and it’s not going to be me for sure.
You can click on the button in the top right corner (with a question mark) to see an explanation. The clocks are refreshed every minute.