AI Breakthrough: LLM reasoning in complex math signals new enterprise applications

Published on 07/20/2025
Trend Spotting / Early Adopter Signals

A Reddit post reporting a 'Breakthrough in LLM reasoning on complex math problems' is a notable early signal. If the claim holds up, it suggests LLMs are becoming more capable at logical and analytical tasks beyond simple text generation. That could open commercial opportunities in fields that depend on complex problem-solving, such as scientific research, engineering, financial modeling, and advanced education. Companies can explore integrating these enhanced LLM capabilities into their products, offering specialized AI-driven solutions, or developing training programs for the new tools.

Origin Reddit Post

r/futurology

Breakthrough in LLM reasoning on complex math problems

Posted by u/Similar-Document9690 on 07/20/2025
Wow

Top Comments

u/fuku_visit
Think of it this way.... It can currently provide outputs which meet IMO levels to be considered correct. If you didn't know it was AI you'd think it was very very impressive. I just think i
u/GepardenK
No, that part would actually be fine. If LLMs really could formulate novel proofs, then who cares if it got it wrong most of the time. You could just check each and discard the ones that didn
u/robotlasagna
The thing i would counter with is: 1. What is creativity? 2. What is your thesis on why creativity must be a uniquely biological thing? Right now the discussion is people was "well LLM'
u/Dear-Mix-5841
Yeah buddy, because A.I. uses a capitalized “And” to start sentences.
u/kyriosity-at-github
The keyword is "claims" and a kiddish illustration as the state of the things.
u/Andy12_
Those performance drops were reported on a pair of math benchmarks that are basically "here's a bunch of numbers. We need to solve equation X. The answer is a single number". With that type o
u/charmcharmcharm
I don’t think that’s the comparison that is being made, GenericFatGuy.
u/NinjaLanternShark
Like I said, *more* right answers than the last version. I know "the answer" isn't in the training set but that's always been the difference between an LLM and a Google search. I'm just t
u/Disastrous-Form-3613
"Ugh, I've been telling everybody that LLMs can't reason and here's a proof that they can. How do I downplay this to still look good?"
u/ZERV4N
Yeah, but how exactly does that work? The LLM can do the tools work itself? Has it learned to become like an algorithmic Swiss Army knife using natural language or it's just "predicting" the
u/not_mig
As my previous submission was taken off due to not meeting a character count minimum I just want to say that I do not believe the claims until the code is out. Too much bootstrapping goes on
u/daronjay
Wow, what a collection of new goal posts!
u/Similar-Document9690
Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure language model, solving IMO-level math problems that normally require hours of deep, a
u/xt-89
A lot of those papers weren’t focusing on the latest and greatest models for reasoning. Or, they had a definition of reasoning that was unfair in that humans wouldn’t live to that definition.
u/GenericFatGuy
The difference is that now the marketing departments of the AI world have a new tool in their tool belt to fleece investors of their money.
u/talligan
It's an irony that a sub about futurology has knee-jerk reactions against completely wild tech like AI. It's not that I expect everyone to be pro AI or whatever, but I would expect stronger an
u/DrBimboo
I don't think so. We didn't have problems identifying what reasoning is, until some people who are waaaaay overconfident in their understanding of a thing that displays reasoning, had an agenda
u/GepardenK
Yes, but unless actual calculation on part of the AI was involved, we are still talking about a glorified search engine that takes an input and tries to predict what output we would like to s
u/CorruptedFlame
You won't believe it until the code is out? Umm, I hate to break it to you, but the code won't be answering any questions lol. The whole point of deep learning stuff like this is that the fin
u/talligan
It's very likely your parents did (hopefully, if you had decent ones) when growing up, however.
u/Fr00stee
well you would hope that the proof is actually correct the vast majority of the time otherwise it's not useful in real life if the accuracy is like 75/25 correct
u/not_mig
That's fine. Show the exact inputs, show that all it took was training a specific nn topology on an appropriate data set. I doubt they did that. No reason to believe that they didn't go heavy
u/azhder
Will not be surprised it’s the same grifters that could no longer push crypto stuff by muddying the waters that are now pushing the AI that isn’t AI.
u/Qcconfidential
I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.
u/Mirar
It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.
u/GepardenK
It doesn't "solve" them in the traditional sense of the word. It is being led to something that is likely to resemble the answer by following the input against the weights provided by its tr
u/Joke_of_a_Name
Depending on the artists in the future, we're gonna need serious ballad solutions.
u/SeriousGeorge2
>I'm still not sure what's fundamentally different here other than "got the right answer more often than before..." The difference is that the model is getting the answers at all. It doe
u/SleepyCorgiPuppy
Sadly the root of a lot of these problems is humans themselves. Unless AI just takes over and keeps us as pets.
u/Daniel1827
What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc
u/krefik
It's quite trivial to get rid of all the above. There were multiple books and movies about that solution. In many cases generated by ai 
u/ZERV4N
Those have been solved. We know how to undo all of that stuff. but rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.
u/cwright017
You need to reason to figure out the correct sequence of steps. For example if I say I want 3 lengths of wood at 1m each but they are sold at 1.5m lengths. Without any reasoning of the prob
u/TheMadWho
well if you could use that to prove things that haven’t been proved before, it would still be quite useful no matter how it got there
u/Etroarl55
Does this mean CS is even more giga cooked now 😭
u/FreeNumber49
She-it…all the tech billionaires have to do is address ONE of those things and I’m back on board. Bill Gates is the only one who has managed to do something like this, yet he still gets atta
u/Revolutionary-Bag-52
No, because that's literally what an LLM is. If its goal is not predicting what the next set of words might be, we are not talking about an LLM but about different models
u/GenericFatGuy
They've been looking for a hot new buzzword to take hold for years. 99% of these stories about how revolutionary AI is becoming are written or backed by entities that have a direct stake in c
u/Affectionate-Rain495
what did you expect from r/futurology, people hate technology here 
u/hollowgram
How does this square with this other research showing LLM math reasoning is worse than what has been reported? https://www.reddit.com/r/OpenAI/comments/1m3ovkt/new_research_exposes_how_ai_mo
u/michael-65536
Sure, you feel that way. But did you think, reason, creatively problem-solve, have original ideas about it etc? Seems like you might have just used a statistical model of your training data
u/ColdStorageParticle
But still it solved an already solved math problem right? It did not solve something that is not solved yet?
u/SupermarketIcy4996
Now if you could explain that to all the people who keep saying it's just a different kind of Google search.
u/Dear-Mix-5841
All I see in the comments are people dismissing this. This is truly revolutionary - especially as it demonstrates its ability to come up with goals and benchmarks in a non-verifiable environm
u/SupermarketIcy4996
This adult world is so vague. How could we simplify it to infant level.
u/spryes
Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso
u/MachinationMachine
How exactly would you "benchmark hack" the IMO? Every problem is entirely new and unique and requires original reasoning to solve. 
u/azhder
It's deliberately vague. If you have an algorithm that re-writes itself, then it's definitely "new way". If you have a large context, far larger than what current LLM's are using, and you hav
u/NinjaLanternShark
I'm not telling anyone I've made a "breakthrough" from who I was last week.
u/Lokon19
I think too many people still have an outdated view of AI. Like when you mention AI they think about what ChatGPT 1 was capable of doing. The newest models have come a long long ways.
u/ItsAConspiracy
If it accomplishes tasks that, for humans, require thinking, reason, creativity, problem solving, or original ideas, then I don't see why we wouldn't use the same terms for whatever the AI is
u/Sad-Reality-9400
How would you define AI?
u/ExplorerNo1496
Well how will this change AI practically especially for research
u/al-Assas
Oh, no. This does sound like a genuine improvement of the neural network itself. Progress should have plateaued out by now. This is not going to end well.
u/yblad
Journal paper or it didn't happen. A tweet isn't evidence that something has been done.
u/CorruptedFlame
Google "breakthrough".
u/d7sg
We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?
u/azhder
To make it simple for you: the same way you would AGI. To answer correctly: - artificial means using some artistry i.e. deliberate human made, not something that comes natural like making
u/michael-65536
Corporations run by horses, and gecko billionaires? Or...
u/NinjaLanternShark
Ok, good. So even Google search does this a tiny bit -- if you search for "apple when harvest" it doesn't indiscriminately give you information about when computers are available, and it does
u/GepardenK
>If you didnt know it was AI you'd think it was very very impressive. Yes, I would have been impressed, all the way up until the point I got to know the answer was searched rather than so
u/a_brain
Because they have offered no information on the methodology nor have they released the model to anyone else to try, it’s impossible to say whether this is actually meaningful or just more ben
u/marrow_monkey
It is just predicting the next token /s
u/NinjaLanternShark
That's steps. What's the difference between steps and reasoning?
u/FreeNumber49
Let me know when hunger, crime, disease, climate change, environmental destruction, inequality, racism, sexism, religious extremism, discrimination, homophobia, asteroid avoidance, volcano er
u/Affectionate-Rain495
It could literally be coming up with novel scientific breakthroughs, but it still wouldn't be "newsworthy" to these people
u/Fr00stee
I mean... the entire point of the LLM is to guess what is the most likely answer for something that isn't in the training set otherwise it's just a worse version of google
u/Javamac8
From what I can gather, prior to this, these types of math problems required the LLM to outsource at least parts of the problem to other tools. Now the capability is baked into the LLM itself
u/ExplorerNo1496
Man I really want to know how they've done it
u/azhder
Concrete isn't the opposite of vague. Sorry about that. The definition will remain abstract. If you're trying to make it precise, that's up to you. I have given you my definition. You aren't
u/Sad-Reality-9400
Thank you for the explanation. So how are you thinking about "a new way"? What does that mean to you?
u/Lucky_Yam_1581
Will labs release models that can get an IMO gold medal and world’s second best in coding at the same time? If we do get access, what a common folk like me should do with it? 
u/FuturologyBot
The following submission statement was provided by /u/Similar-Document9690: --- Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure lang
u/NinjaLanternShark
I feel like terms like thinking, reasoning, creativity, problem solving, original ideas, etc are overused and overly vague for describing AI systems. I'm still not sure what's fundamentally d
u/michael-65536
Okay, now you've cleared up what you didn't say, (and what I didn't say you said). I take that to mean you're not willing to think about or respond to what I actually did say? Your prerogat
u/Alternative-Soil2576
What are you trying to prove?
u/SFanatic
I’ll trust in the power of LLMs when one can make me a 7 pointed star
u/EnlightenedSinTryst
Addressed meaning what?
u/cwright017
Well reasoning models can output their reasoning. It doesn’t just spit out the answer, it will detail the steps it takes to getting there. Hey go build me a house, ok well to build a house
u/MachinationMachine
For pure maths I'd wager impressive original research solutions generated primarily by AI will be coming within the next year or two.
u/robotlasagna
That’s a fair assessment and something like creativity is something humans like to attribute to themselves and not LLMs. The problem is creativity is already seen in other animals so it’s not un
u/fuku_visit
Don't you think calling it a glorified search engine is a bit reductionist given it can solve IMO problems?
u/snoee
Here you go, friend: https://chatgpt.com/share/687caac7-73d4-800f-b4f1-3a8072e9b6ed
u/Alternative-Soil2576
What are you trying to prove?
u/daronjay
Wow, what a collection of new goal posts!
u/Similar-Document9690
Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure language model, solving IMO-level math problems that normally require hours of deep, a
u/FreeNumber49
Let me know when hunger, crime, disease, climate change, environmental destruction, inequality, racism, sexism, religious extremism, discrimination, homophobia, asteroid avoidance, volcano er
u/TheMadWho
well if you could use that prove things that haven’t been proved before, it would still be quite useful no matter how it got there
u/SupermarketIcy4996
Now if you could explain that to all the people who keep saying it's just a different kind of Google search.
u/talligan
Its an irony that a sub about futurology has knee jerk reactions against completely wild tech like AI. It's not that I expect everyone to be pro AI or whatever, but I would expect stronger an
u/Affectionate-Rain495
what did you expect from r/futurology, people hate technology here 
u/NinjaLanternShark
I feel like terms like thinking, reasoning, creativity, problem solving, original ideas, etc are overused and overly vague for describing AI systems. I'm still not sure what's fundamentally d
u/Fr00stee
I mean... the entire point of the LLM is to guess what is the most likely answer for something that isn't in the training set otherwise it's just a worse version of google
u/FreeNumber49
She-it…all the tech billionaires have to do is address ONE of those things and I’m back on board. Bill Gates is the only one who has managed to do something like this, yet he still gets atta
u/Sad-Reality-9400
Well your definition isn't very useful then is it?
u/Sad-Reality-9400
Right ..I'm trying to make it more concrete so we're not just waving our hands. How would a larger context be different in kind than what we have now rather than just different in complexity?
u/woodenanteater
Now if only your comment didn't ring of AI either...
u/FuturologyBot
The following submission statement was provided by /u/Similar-Document9690: --- Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure lang
u/yblad
Journal paper or it didn't happen. A tweet isn't evidence that something has been done.
u/lostinspaz
the only new thing here is that it has been noticed to do this for math. GPT in deep research mode has been doing this kind of behavior (and spelling out its reasoning and backtracking steps
u/spryes
Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso
u/ItsAConspiracy
If it accomplishes tasks that, for humans, require thinking, reason, creativity, problem solving, or original ideas, then I don't see why we wouldn't use the same terms for whatever the AI is
u/azhder
To make it simple for you: the same way you would AGI. To answer correctly: - artificial means using some artistry i.e. deliberate human made, not something that comes natural like making
u/cwright017
You need to reason to figure out the correct sequence of steps. For example if I say I want 3 lengths of wood at 1m each but they are sold at 1.5m lengths. Without any reasoning of the prob
u/SupermarketIcy4996
AI denialists sound awfully lot like climate change denialists.
u/d7sg
We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?
u/NinjaLanternShark
I'm not telling anyone I've made a "breakthrough" from who I was last week.
u/fuku_visit
LLMs might share fundamental core aspects of functionality of a search-engine, but they really are not glorified search-engines. That's like saying that a laptop is a glorified AND gate.
u/Daniel1827
What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc
u/Joke_of_a_Name
Depending on the artists in the future, we're gonna need serious ballad solutions.
u/GenericFatGuy
They've been looking for a hot new buzzword to take hold for years. 99% of these stories about how revolutionary AI is becoming are written or backed by entities that have a direct stake in c
u/krefik
It's quite trivial to get rid of all the above. There were multiple books and movies about that solution. In many cases generated by ai 
u/abyssazaur
You know an answer to an IMO problem is a 10 page proof right? And it did make headlines? Ergo not an incremental breakthrough. I literally don't know what else it could take to count as ne
u/play_yr_part
all of those will be solved when we're all paperclips
u/Affectionate-Rain495
It could literally be coming up with novel scientific breakthroughs, but it still wouldn't be "newsworthy" to these people
u/hollowgram
How does this square with this other research showing LLM math reasoning is worse than what has been reported? https://www.reddit.com/r/OpenAI/comments/1m3ovkt/new_research_exposes_how_ai_mo
u/SFanatic
I’ll trust in the power of LLMs when one can make me a 7 pointed star
u/Daniel1827
Reliably scoring gold is very impressive, and a lot more impressive than reliably scoring silver. Getting gold on a one off is impressive, but how impressive it is depends on how it was achie
u/EnlightenedSinTryst
Addressed meaning what?
u/Dear-Mix-5841
All I see in the comments are people dismissing this. This is truly revolutionary - especially as it demonstrates its ability to come up with goals and benchmarks in a non-verifiable environm
u/Mirar
It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.
u/cwright017
Well reasoning models can output their reasoning. It doesn’t just spit out the answer, it will detail the steps it takes to getting there. Hey go build me a house, ok well to build a house
u/NinjaLanternShark
Like I said, *more* right answers than the last version. I know "the answer" isn't in the training set but that's always been the difference between an LLM and a Google search. I'm just t
u/fuku_visit
OK... I'm talking to someone who is comparing Google maps to AI.......
u/Etroarl55
Does this mean CS is even more giga cooked now 😭
u/ExplorerNo1496
Well how will this change AI practically especially for research
u/ExplorerNo1496
Man I really want to know how they've done it
u/ColdStorageParticle
But still it solved an already solved math problem right? It did not solve something that is not solved yet?
u/Revolutionary-Bag-52
No, because that's literally what an LLM is. If its goal is not predicting what the next set of words might be, we are not talking about an LLM, but about different models
u/wiztard
I don't disagree with your conclusion but your reasoning doesn't make sense. We are related to all life we know of and it makes sense that we have a lot of similarities with other animals. LLM
u/fuku_visit
Don't you think calling it a glorified search engine is a bit reductionist given it can solve IMO problems?
u/a_brain
Because they have offered no information on the methodology nor have they released the model to anyone else to try, it’s impossible to say whether this is actually meaningful or just more ben
u/MachinationMachine
How exactly would you "benchmark hack" the IMO? Every problem is entirely new and unique and requires original reasoning to solve. 
u/Lokon19
I think too many people still have an outdated view of AI. Like when you mention AI they think about what ChatGPT 1 was capable of doing. The newest models have come a long long ways.
u/ZERV4N
Those have been solved. We know how to undo all of that stuff. but rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.
u/michael-65536
Sure, you feel that way. But did you think, reason, creatively problem-solve, have original ideas about it etc? Seems like you might have just used a statistical model of your training data
u/Javamac8
From what I can gather, prior to this, these types of math problems required the LLM to outsource at least parts of the problem to other tools. Now the capability is baked into the LLM itself
u/NinjaLanternShark
That's steps. What's the difference between steps and reasoning?
u/not_mig
That's fine. Show the exact inputs, show that all it took was training a specific nn topology on an apropriate data set. I doubt they did that. No reason to believe that they didn't go heavy
u/gannex
LLMs with better mathematical reasoning will be very, very useful. LLMs can be quite helpful for deriving equations, but their limitations tend to show fairly quickly and you have to guide th
u/talligan
It's very likely your parents did (hopefully, if you had decent ones) when growing up, however.
u/GepardenK
Yes, but unless actual calculation on part of the AI was involved, we are still talking about a glorified search engine that takes an input and tries to predict what output we would like to s
u/SeriousGeorge2
>I'm still not sure what's fundamentally different here other than "got the right answer more often than before..." The difference is that the model is getting the answers at all. It doe
u/ZERV4N
Yeah, but how exactly does that work? The LLM can do the tools work itself? Has it learned to become like an algorithmic Swiss Army knife using natural language or it's just "predicting" the
u/Sad-Reality-9400
Thank you for the explanation. So how are you thinking about "a new way"? What does that mean to you?
u/robotlasagna
That’s a fair assessment and something like creativity is something humans like to attribute to themselves and not LLMs. The problem is creativity is already seen in other animals so it’s not un
u/FreeNumber49
Except most studies show that humans aren’t responsible, it’s the corporations and billionaires fighting government regulation who are to blame. But of course you knew that already, you just
u/marrow_monkey
It is just predicting the next token /s
u/ElectronicMoo
But it's not creativity "thinking", and that's what folks are on about. An llm, from word to word, doesn't have the foggiest what it's saying to you. It's a very powerful engine in pattern
u/CorruptedFlame
Google "breakthrough".
u/Disastrous-Form-3613
"Ugh, I've been telling everybody that LLMs can't reason and here's a proof that they can. How do I downplay this to still look good?"
u/MachinationMachine
For pure maths I'd wager impressive original research solutions generated primarily by AI will be coming within the next year or two.
u/GepardenK
>If you didnt know it was AI you'd think it was very very impressive. Yes, I would have been impressed, all the way up until the point I got to know the answer was searched rather than so
u/azhder
Will not be surprised it’s the same grifters that could no longer push crypto stuff by muddying the waters that are now pushing the AI that isn’t AI.
u/azhder
Concrete isn't the opposite of vague. Sorry about that. The definition will remain abstract. If you're trying to make it precise, that's up to you. I have given you my definition. You aren't
u/Sad-Reality-9400
How would you define AI?
u/GepardenK
It doesn't "solve" them in the traditional sense of the word. It is being led to something that is likely to resemble the answer by following the input against the weights provided by its tr
u/Exciting-Position716
It's simply inevitable, whether one likes it or not. I for one see the positives in A.I. Yes it will drastically and radically alter our entire world, our society, entire industries, et
u/Qcconfidential
I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.
u/GenericFatGuy
The difference is that now the marketing departments of the AI world have a new tool in their tool belt to fleece investors of their money.
u/Lucky_Yam_1581
Reminds me of Ilya’s quote that if you feed an LLM with a detective novel and hide the ending and ask it to guess the ending. If it nails the ending then it understands and not just memorizin
u/michael-65536
Corporations run by horses, and gecko billionaires? Or...
u/FreeNumber49
Except most studies show that humans aren’t responsible, it’s the corporations and billionaires fighting government regulation who are to blame. But of course you knew that already, you just
u/michael-65536
Sure, you feel that way. But did you think, reason, creatively problem-solve, have original ideas about it etc? Seems like you might have just used a statistical model of your training data
u/NinjaLanternShark
Ok, good. So even Google search does this a tiny bit -- if you search for "apple when harvest" it doesn't indiscriminately give you information about when computers are available, and it does
u/Mirar
It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.
u/ExplorerNo1496
Man I really want to know how they've done it
u/azhder
Will not be surprised it’s the same grifters that could no longer push crypto stuff by muddying the waters that are now pushing the AI that isn’t AI.
u/Andy12_
Those performance drops were reported on a pair of math benchmarks that are basically "here's a bunch of numbers. We need to solve equation X. The answer is a single number". With that type o
u/krefik
It's quite trivial to get rid of all the above. There were multiple books and movies about that solution. In many cases generated by ai 
u/NinjaLanternShark
That's steps. What's the difference between steps and reasoning?
u/Javamac8
From what I can gather, prior to this, these types of math problems required the LLM to outsource at least parts of the problem to other tools. Now the capability is baked into the LLM itself
u/Lokon19
I think too many people still have an outdated view of AI. Like when you mention AI they think about what ChatGPT 1 was capable of doing. The newest models have come a long long ways.
u/GenericFatGuy
Comparing AI to climate change isn't the own you think it is.
u/snoee
Here you go, friend: https://chatgpt.com/share/687caac7-73d4-800f-b4f1-3a8072e9b6ed
u/Dear-Mix-5841
Yeah buddy, because A.I. uses a capitalized “And” to start sentences.
u/Sad-Reality-9400
How would you define AI?
u/MachinationMachine
How exactly would you "benchmark hack" the IMO? Every problem is entirely new and unique and requires original reasoning to solve. 
u/robotlasagna
That’s a fair assessment and something like creativity is something humans like to attribute to themselves and not LLMs. The problem is creativity is already seen in other animals so it’s not un
u/Sad-Reality-9400
Thank you for the explanation. So how are you thinking about "a new way"? What does that mean to you?
u/SleepyCorgiPuppy
Sadly the root of a lot of these problems is humans themselves. Unless AI just takes over and keeps us as pets.
u/Alternative-Soil2576
What are you trying to prove?
u/Exciting-Position716
It's simply inevitable, whether one likes it or not.  I for one see the positives in A.I.  Yes it will drastically and radically alter our entire world, our society, entire industries, et
u/MachinationMachine
For pure maths I'd wager impressive original research solutions generated primarily by AI will be coming within the next year or two.
u/wiztard
I don't disagree with your conclusion but your reasoning doesn't make sense. We are related to all life we know of and it makes sense that we have a lot of similarities with other animals. LLM
u/gannex
LLMs with better mathematical reasoning will be very, very useful. LLMs can be quite helpful for deriving equations, but their limitations tend to show fairly quickly and you have to guide th
u/azhder
To make it simple for you: the same way you would AGI. To answer correctly: - artificial means using some artistry i.e. deliberate human made, not something that comes natural like making
u/kyriosity-at-github
The keyword is "claims" and a kiddish illustration as the state of the things.
u/NinjaLanternShark
I'm not telling anyone I've made a "breakthrough" from who I was last week.
u/ElectronicMoo
But it's not creativity "thinking", and that's what folks are on about. An llm, from word to word, doesn't have the foggiest what it's saying to you. It's a very powerful engine in pattern
u/ColdStorageParticle
But still it solved an already solved math problem right? It did not solve something that is not solved yet?
u/GepardenK
It doesn't "solve" them in the traditional sense of the word. It is being led to something that is likely to resemble the answer by following the input against the weights provided by its tr
u/a_brain
Because they have offered no information on the methodology nor have they released the model to anyone else to try, it’s impossible to say whether this is actually meaningful or just more ben
u/NinjaLanternShark
I feel like terms like thinking, reasoning, creativity, problem solving, original ideas, etc are overused and overly vague for describing AI systems. I'm still not sure what's fundamentally d
u/GepardenK
Yes, but unless actual calculation on part of the AI was involved, we are still talking about a glorified search engine that takes an input and tries to predict what output we would like to s
u/spryes
Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso
u/al-Assas
Oh, no. This does sound like a genuine improvement of the neural network itself. Progress should have plateaued out by now. This is not going to end well.
u/ZERV4N
Yeah, but how exactly does that work? The LLM can do the tools work itself? Has it learned to become like an algorithmic Swiss Army knife using natural language or it's just "predicting" the
u/cwright017
Well reasoning models can output their reasoning. It doesn’t just spit out the answer, it will detail the steps it takes to getting there. Hey go build me a house, ok well to build a house
u/Fr00stee
I mean... the entire point of the LLM is to guess what is the most likely answer for something that isn't in the training set otherwise it's just a worse version of google
u/NinjaLanternShark
Like I said, *more* right answers than the last version. I know "the answer" isn't in the training set but that's always been the difference between an LLM and a Google search. I'm just t
u/TheMadWho
Well, if you could use that to prove things that haven’t been proved before, it would still be quite useful no matter how it got there.
u/[deleted]
[deleted]
u/fuku_visit
Think of it this way.... It can currently provide outputs which meet IMO levels to be considered correct. If you didn't know it was AI you'd think it was very very impressive. I just think i
u/CorruptedFlame
You won't believe it until the code is out? Umm, I hate to break it to you, but the code won't be answering any questions lol. The whole point of deep learning stuff like this is that the fin
u/marrow_monkey
It is just predicting the next token /s
u/Similar-Document9690
Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure language model, solving IMO-level math problems that normally require hours of deep, a
u/GenericFatGuy
The difference is that now the marketing departments of the AI world have a new tool in their tool belt to fleece investors of their money.
u/azhder
It's deliberately vague. If you have an algorithm that re-writes itself, then it's definitely "new way". If you have a large context, far larger than what current LLM's are using, and you hav
u/Fr00stee
well you would hope that the proof is actually correct the vast majority of the time otherwise it's not useful in real life if the accuracy is like 75/25 correct
u/Affectionate-Rain495
what did you expect from r/futurology, people hate technology here 
u/SFanatic
I’ll trust in the power of LLMs when one can make me a 7 pointed star
u/daronjay
Wow, what a collection of new goal posts!
u/talligan
It's an irony that a sub about futurology has knee jerk reactions against completely wild tech like AI. It's not that I expect everyone to be pro AI or whatever, but I would expect stronger an
u/Daniel1827
What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc
u/hollowgram
How does this square with this other research showing LLM math reasoning is worse than what has been reported? https://www.reddit.com/r/OpenAI/comments/1m3ovkt/new_research_exposes_how_ai_mo
u/not_mig
As my previous submission was taken off due to not meeting a character count minimum I just want to say that I do not believe the claims until the code is out. Too much bootstrapping goes on
u/woodenanteater
Now if only your comment didn't ring of AI either...
u/FuturologyBot
The following submission statement was provided by /u/Similar-Document9690: --- Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure lang
u/fuku_visit
Don't you think calling it a glorified search engine is a bit reductionist given it can solve IMO problems?
u/SeriousGeorge2
>I'm still not sure what's fundamentally different here other than "got the right answer more often than before..." The difference is that the model is getting the answers at all. It doe
u/Sad-Reality-9400
Well your definition isn't very useful then is it?
u/CorruptedFlame
Google "breakthrough".
u/Revolutionary-Bag-52
No, because that's literally what an LLM is. If its goal is not predicting what the next set of words might be, we are not talking about an LLM, but about different models.
u/fuku_visit
OK... I'm talking to someone who is comparing Google maps to AI.......
u/ZERV4N
Those have been solved. We know how to undo all of that stuff. but rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.
u/Disastrous-Form-3613
"Ugh, I've been telling everybody that LLMs can't reason and here's a proof that they can. How do I downplay this to still look good?"
u/Affectionate-Rain495
It could literally be coming up with novel scientific breakthroughs, but it still wouldn't be "newsworthy" to these people
u/Qcconfidential
I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.
u/michael-65536
Seemed like that's exactly what you were doing in that first comment, but whatever.
u/play_yr_part
all of those will be solved when we're all paperclips
u/d7sg
We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?
u/robotlasagna
The thing i would counter with is: 1. What is creativity? 2. What is your thesis on why creativity must be a uniquely biological thing? Right now the discussion is people was "well LLM'
u/Lucky_Yam_1581
Will labs release models that can get an IMO gold medal and rank as the world’s second best in coding at the same time? If we do get access, what should common folk like me do with it?
u/Daniel1827
Reliably scoring gold is very impressive, and a lot more impressive than reliably scoring silver. Getting gold on a one off is impressive, but how impressive it is depends on how it was achie
u/not_mig
That's fine. Show the exact inputs, show that all it took was training a specific NN topology on an appropriate data set. I doubt they did that. No reason to believe that they didn't go heavy
u/Etroarl55
Does this mean CS is even more giga cooked now 😭
u/ColdStorageParticle
But still it solved an already solved math problem right? It did not solve something that is not solved yet?
u/michael-65536
Corporations run by horses, and gecko billionaires? Or...
u/ExplorerNo1496
Man I really want to know how they've done it
u/play_yr_part
all of those will be solved when we're all paperclips
u/Lucky_Yam_1581
Will labs release models that can get an IMO gold medal and world’s second best in coding at the same time? If we do get access, what a common folk like me should do with it? 
u/NinjaLanternShark
That's steps. What's the difference between steps and reasoning?
u/GenericFatGuy
The difference is that now the marketing departments of the AI world have a new tool in their tool belt to fleece investors of their money.
u/CorruptedFlame
Google "breakthrough".
u/fuku_visit
OK... I'm talking to someone who is comparing Google maps to AI.......
u/azhder
To make it simple for you: the same way you would AGI. To answer correctly: - artificial means using some artistry i.e. deliberate human made, not something that comes natural like making
u/play_yr_part
all of those will be solved when we're all paperclips
u/Disastrous-Form-3613
"Ugh, I've been telling everybody that LLMs can't reason and here's a proof that they can. How do I downplay this to still look good?"
u/SFanatic
I’ll trust in the power of LLMs when one can make me a 7 pointed star
u/robotlasagna
That’s a fair assessment and something like creativity is something humans like to attribute to themselves and not LLMs. The problem is creativity is already seen in other animals so it’s not un
u/Daniel1827
What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc
u/[deleted]
[deleted]
u/Alternative-Soil2576
What are you trying to prove?
u/Affectionate-Rain495
what did you expect from r/futurology, people hate technology here 
u/marrow_monkey
It is just predicting the next token /s
u/cwright017
Well reasoning models can output their reasoning. It doesn’t just spit out the answer, it will detail the steps it takes to getting there. Hey go build me a house, ok well to build a house
u/daronjay
Wow, what a collection of new goal posts!
u/Qcconfidential
I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.
u/talligan
It's an irony that a sub about futurology has knee-jerk reactions against completely wild tech like AI. It's not that I expect everyone to be pro AI or whatever, but I would expect stronger an
u/GepardenK
>If you didnt know it was AI you'd think it was very very impressive. Yes, I would have been impressed, all the way up until the point I got to know the answer was searched rather than so
u/charmcharmcharm
I don’t think that’s the comparison that is being made, GenericFatGuy.
u/azhder
Concrete isn't the opposite of vague. Sorry about that. The definition will remain abstract. If you're trying to make it precise, that's up to you. I have given you my definition. You aren't
u/wiztard
I don't disagree with your conclusion but your reasoning doesn't make sense. We are related to all life we know of and it makes sense that we have a lot of similarities with other animals. LLM
u/SupermarketIcy4996
Now if you could explain that to all the people who keep saying it's just a different kind of Google search.
u/NinjaLanternShark
I feel like terms like thinking, reasoning, creativity, problem solving, original ideas, etc are overused and overly vague for describing AI systems. I'm still not sure what's fundamentally d
u/a_brain
Because they have offered no information on the methodology nor have they released the model to anyone else to try, it’s impossible to say whether this is actually meaningful or just more ben
u/lostinspaz
The only new thing here is that it has been noticed doing this for math. GPT in deep research mode has been doing this kind of behavior (and spelling out its reasoning and backtracking steps
u/gannex
LLMs with better mathematical reasoning will be very, very useful. LLMs can be quite helpful for deriving equations, but their limitations tend to show fairly quickly and you have to guide th
u/DrBimboo
I don't think so. We didn't have problems identifying what reasoning is, until some people who are waaaaay overconfident in their understanding of a thing that displays reasoning, had an agenda
u/michael-65536
Sure, you feel that way. But did you think, reason, creatively problem-solve, have original ideas about it etc? Seems like you might have just used a statistical model of your training data
u/hollowgram
How does this square with this other research showing LLM math reasoning is worse than what has been reported? https://www.reddit.com/r/OpenAI/comments/1m3ovkt/new_research_exposes_how_ai_mo
u/Sad-Reality-9400
How would you define AI?
u/Joke_of_a_Name
Depending on the artists in the future, we're gonna need serious ballad solutions.
u/woodenanteater
Now if only your comment didn't ring of AI either...
u/Sad-Reality-9400
Well your definition isn't very useful then is it?
u/NinjaLanternShark
Ok, good. So even Google search does this a tiny bit -- if you search for "apple when harvest" it doesn't indiscriminately give you information about when computers are available, and it does
u/ZERV4N
Those have been solved. We know how to undo all of that stuff. but rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.
u/SleepyCorgiPuppy
Sadly, the root of a lot of these problems is humans themselves. Unless AI just takes over and keeps us as pets.
u/fuku_visit
Don't you think calling it a glorified search engine is a bit reductionist given it can solve IMO problems?
u/GenericFatGuy
They've been looking for a hot new buzzword to take hold for years. 99% of these stories about how revolutionary AI is becoming are written or backed by entities that have a direct stake in c
u/cwright017
You need to reason to figure out the correct sequence of steps. For example if I say I want 3 lengths of wood at 1m each but they are sold at 1.5m lengths. Without any reasoning of the prob
u/FreeNumber49
Except most studies show that humans aren’t responsible, it’s the corporations and billionaires fighting government regulation who are to blame. But of course you knew that already, you just
u/fuku_visit
Think of it this way.... It can currently provide outputs which meet IMO levels to be considered correct. If you didnt know it was AI you'd think it was very very impressive. I just think i
u/kyriosity-at-github
The keyword is "claims" and a kiddish illustration as the state of the things.
u/GepardenK
It doesn't "solve" them in the traditional sense of the word. It is being led to something that is likely to resemble the answer by following the input against the weights provided by its tr
u/talligan
It's very likely your parents did (hopefully, if you had decent ones) when growing up, however.
u/ItsAConspiracy
If it accomplishes tasks that, for humans, require thinking, reason, creativity, problem solving, or original ideas, then I don't see why we wouldn't use the same terms for whatever the AI is
u/MachinationMachine
For pure maths I'd wager impressive original research solutions generated primarily by AI will be coming within the next year or two.
u/Lucky_Yam_1581
Reminds me of Ilya’s quote that if you feed an LLM with a detective novel and hide the ending and ask it to guess the ending. If it nails the ending then it understands and not just memorizin
u/TheMadWho
Well, if you could use that to prove things that haven’t been proved before, it would still be quite useful no matter how it got there.
u/al-Assas
Oh, no. This does sound like a genuine improvement of the neural network itself. Progress should have plateaued out by now. This is not going to end well.
u/xt-89
A lot of those papers weren’t focusing on the latest and greatest models for reasoning. Or, they had a definition of reasoning that was unfair in that humans wouldn’t live to that definition.
u/abyssazaur
You know an answer to an IMO problem is a 10 page proof right? And it did make headlines? Ergo not an incremental breakthrough. I literally don't know what else it could take to count as ne
u/ExplorerNo1496
Well how will this change AI practically especially for research
u/Mirar
It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.
u/ZERV4N
Yeah, but how exactly does that work? The LLM can do the tools work itself? Has it learned to become like an algorithmic Swiss Army knife using natural language or it's just "predicting" the
u/snoee
Here you go, friend: https://chatgpt.com/share/687caac7-73d4-800f-b4f1-3a8072e9b6ed
u/OriginalCompetitive
It’s also new that this achievement is benchmarked against the smartest young people on the planet. 
u/michael-65536
Seemed like that's exactly what you were doing in that first comment, but whatever.
u/spryes
Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso
u/EnlightenedSinTryst
Addressed meaning what?
u/Andy12_
Those performance drops were reported on a pair of math benchmarks that are basically "here's a bunch of numbers. We need to solve equation X. The answer is a single number". With that type o
u/NinjaLanternShark
I'm not interested in convincing you I'm different from an AI, so let's just call it a night.
u/FreeNumber49
Let me know when hunger, crime, disease, climate change, environmental destruction, inequality, racism, sexism, religious extremism, discrimination, homophobia, asteroid avoidance, volcano er
u/azhder
It's deliberately vague. If you have an algorithm that re-writes itself, then it's definitely "new way". If you have a large context, far larger than what current LLM's are using, and you hav
u/Revolutionary-Bag-52
No, because that's literally what an LLM is. If its goal is not predicting what the next set of words might be, we are not talking about an LLM, but about different models.
u/FuturologyBot
The following submission statement was provided by /u/Similar-Document9690: --- Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure lang
u/not_mig
As my previous submission was taken off due to not meeting a character count minimum I just want to say that I do not believe the claims until the code is out. Too much bootstrapping goes on
u/michael-65536
Okay, now you've cleared up what you didn't say, (and what I didn't say you said). I take that to mean you're not willing to think about or respond to what I actually did say? Your prerogat
u/fuku_visit
LLMs might share fundamental core aspects of functionality of a search-engine, but they really are not glorified search-engines. That's like saying that a laptop is a glorified AND gate.
u/GepardenK
No, that part would actually be fine. If LLMs really could formulate novel proofs, then who cares if it got it wrong most of the time. You could just check each and discard the ones that didn
u/Affectionate-Rain495
It could literally be coming up with novel scientific breakthroughs, but it still wouldn't be "newsworthy" to these people
u/NinjaLanternShark
I'm not telling anyone I've made a "breakthrough" from who I was last week.
u/Fr00stee
Well, you would hope that the proof is actually correct the vast majority of the time; otherwise it's not useful in real life if the accuracy is like 75/25 correct.
u/Sad-Reality-9400
Thank you for the explanation. So how are you thinking about "a new way"? What does that mean to you?
u/robotlasagna
The thing i would counter with is: 1. What is creativity? 2. What is your thesis on why creativity must be a uniquely biological thing? Right now the discussion is people was "well LLM'
u/Dear-Mix-5841
Yeah buddy, because A.I. uses a capitalized “And” to start sentences.
u/SupermarketIcy4996
This adult world is so vague. How could we simplify it to infant level.
u/d7sg
We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?
u/CorruptedFlame
You won't believe it until the code is out? Umm, I hate to break it to you, but the code won't be answering any questions lol. The whole point of deep learning stuff like this is that the fin
u/azhder
I would not be surprised if it’s the same grifters who could no longer push crypto stuff by muddying the waters that are now pushing the AI that isn’t AI.
u/Dear-Mix-5841
All I see in the comments are people dismissing this. This is truly revolutionary - especially as it demonstrates its ability to come up with goals and benchmarks in a non-verifiable environm
u/Sad-Reality-9400
Right... I'm trying to make it more concrete so we're not just waving our hands. How would a larger context be different in kind than what we have now rather than just different in complexity?
u/talligan
It's very likely your parents did (hopefully, if you had decent ones) when growing up, however.
u/d7sg
We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?
u/Qcconfidential
I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.
u/[deleted]
[deleted]
u/al-Assas
Oh, no. This does sound like a genuine improvement of the neural network itself. Progress should have plateaued out by now. This is not going to end well.
u/ZERV4N
Those have been solved. We know how to undo all of that stuff. but rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.
u/fuku_visit
LLMs might share fundamental core aspects of functionality of a search-engine, but they really are not glorified search-engines. That's like saying that a laptop is a glorified AND gate.
u/Sad-Reality-9400
Well your definition isn't very useful then is it?
u/not_mig
As my previous submission was taken off due to not meeting a character count minimum, I just want to say that I do not believe the claims until the code is out. Too much bootstrapping goes on
u/DrBimboo
I don't think so. We didn't have problems identifying what reasoning is until some people, who are waaaaay overconfident in their understanding of a thing that displays reasoning, had an agenda
u/Daniel1827
Reliably scoring gold is very impressive, and a lot more impressive than reliably scoring silver. Getting gold on a one off is impressive, but how impressive it is depends on how it was achie
u/gannex
LLMs with better mathematical reasoning will be very, very useful. LLMs can be quite helpful for deriving equations, but their limitations tend to show fairly quickly and you have to guide th
u/spryes
Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso
u/Lokon19
I think too many people still have an outdated view of AI. Like when you mention AI they think about what ChatGPT 1 was capable of doing. The newest models have come a long long ways.
u/EnlightenedSinTryst
Addressed meaning what?
u/snoee
Here you go, friend: https://chatgpt.com/share/687caac7-73d4-800f-b4f1-3a8072e9b6ed
u/Mirar
It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.
u/GepardenK
Yes, but unless actual calculation on the part of the AI was involved, we are still talking about a glorified search engine that takes an input and tries to predict what output we would like to s
u/azhder
It's deliberately vague. If you have an algorithm that rewrites itself, then it's definitely a "new way". If you have a large context, far larger than what current LLMs are using, and you hav
u/FreeNumber49
Except most studies show that humans aren’t responsible, it’s the corporations and billionaires fighting government regulation who are to blame. But of course you knew that already, you just
u/SupermarketIcy4996
AI denialists sound an awful lot like climate change denialists.
u/Daniel1827
What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc
