AI Breakthrough: LLM reasoning in complex math signals new enterprise applications

Original Reddit Post

r/futurology

Breakthrough in LLM reasoning on complex math problems

Posted by u/Similar-Document9690•07/20/2025

Wow

Top Comments

u/fuku_visit

Think of it this way.... It can currently provide outputs which meet IMO levels to be considered correct. If you didnt know it was AI you'd think it was very very impressive. I just think i

u/GepardenK

No, that part would actually be fine. If LLMs really could formulate novel proofs, then who cares if it got it wrong most of the time. You could just check each and discard the ones that didn

u/robotlasagna

The thing i would counter with is: 1. What is creativity? 2. What is your thesis on why creativity must be a uniquely biological thing? Right now the discussion is people was "well LLM'

u/Dear-Mix-5841

Yeah buddy, because A.I. uses a capitalized “And” to start sentences.

u/kyriosity-at-github

The keyword is "claims" and a kiddish illustration as the state of the things.

u/Andy12_

Those performance drops were reported on a pair of math benchmarks that are basically "here's a bunch of numbers. We need to solve equation X. The answer is a single number". With that type o

u/charmcharmcharm

I don’t think that’s the comparison that is being made, GenericFatGuy.

u/NinjaLanternShark

Like I said, *more* right answers than the last version. I know "the answer" isn't in the training set but that's always been the difference between an LLM and a Google search. I'm just t

u/Disastrous-Form-3613

"Ugh, I've been telling everybody that LLMs can't reason and here's a proof that they can. How do I downplay this to still look good?"

u/ZERV4N

Yeah, but how exactly does that work? The LLM can do the tools work itself? Has it learned to become like an algorithmic Swiss Army knife using natural language or it's just "predicting" the

u/not_mig

As my previous submission was taken off due to not meeting a character count minimum I just want to say that I do not believe the claims until the code is out. Too much bootstrapping goes on

u/daronjay

Wow, what a collection of new goal posts!

u/Similar-Document9690

Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure language model, solving IMO-level math problems that normally require hours of deep, a

u/xt-89

A lot of those papers weren’t focusing on the latest and greatest models for reasoning. Or, they had a definition of reasoning that was unfair in that humans wouldn’t live up to that definition.

u/GenericFatGuy

The difference is that now the marketing departments of the AI world have a new tool in their tool belt to fleece investors of their money.

u/talligan

It's an irony that a sub about futurology has knee-jerk reactions against completely wild tech like AI. It's not that I expect everyone to be pro AI or whatever, but I would expect stronger an

u/DrBimboo

I dont think so. We didnt have problems identifying what reasoning is, until some people who are waaaaay overconfident in their understanding of a thing that displays reasoning, had an agenda

u/GepardenK

Yes, but unless actual calculation on part of the AI was involved, we are still talking about a glorified search engine that takes an input and tries to predict what output we would like to s

u/CorruptedFlame

You won't believe it until the code is out? Umm, I hate to break it to you, but the code won't be answering any questions lol. The whole point of deep learning stuff like this is that the fin

u/talligan

It's very likely your parents did (hopefully, if you had decent ones) when growing up, however.

u/Fr00stee

well you would hope that the proof is actually correct the vast majority of the time otherwise it's not useful in real life if the accuracy is like 75/25 correct

u/not_mig

That's fine. Show the exact inputs, show that all it took was training a specific NN topology on an appropriate data set. I doubt they did that. No reason to believe that they didn't go heavy

u/azhder

Will not be surprised it’s the same grifters that could no longer push crypto stuff by muddying the waters that are now pushing the AI that isn’t AI.

u/Qcconfidential

I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.

u/Mirar

It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.

u/GepardenK

It doesn't "solve" them in the traditional sense of the word. It is being led to something that is likely to resemble the answer by following the input against the weights provided by its tr

u/Joke_of_a_Name

Depending on the artists in the future, we're gonna need serious ballad solutions.

u/SeriousGeorge2

>I'm still not sure what's fundamentally different here other than "got the right answer more often than before..." The difference is that the model is getting the answers at all. It doe

u/SleepyCorgiPuppy

Sadly the root of a lot of these problems is humans themselves. Unless AI just takes over and keeps us as pets.

u/Daniel1827

What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc

u/krefik

It's quite trivial to get rid of all the above. There were multiple books and movies about that solution. In many cases generated by AI.

u/ZERV4N

Those have been solved. We know how to undo all of that stuff. But rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.

u/cwright017

You need to reason to figure out the correct sequence of steps. For example if I say I want 3 lengths of wood at 1m each but they are sold at 1.5m lengths. Without any reasoning of the prob

u/TheMadWho

Well, if you could use that to prove things that haven’t been proved before, it would still be quite useful no matter how it got there.

u/Etroarl55

Does this mean CS is even more giga cooked now 😭

u/FreeNumber49

She-it…all the tech billionaires have to do is address ONE of those things and I’m back on board. Bill Gates is the only one who has managed to do something like this, yet he still gets atta

u/Revolutionary-Bag-52

No, because that's literally what an LLM is. If its goal is not predicting what the next set of words might be, we are not talking about an LLM, but about different models.

u/GenericFatGuy

They've been looking for a hot new buzzword to take hold for years. 99% of these stories about how revolutionary AI is becoming are written or backed by entities that have a direct stake in c

u/Affectionate-Rain495

what did you expect from r/futurology, people hate technology here

u/hollowgram

How does this square with this other research showing LLM math reasoning is worse than what has been reported? https://www.reddit.com/r/OpenAI/comments/1m3ovkt/new_research_exposes_how_ai_mo

u/michael-65536

Sure, you feel that way. But did you think, reason, creatively problem-solve, have original ideas about it etc? Seems like you might have just used a statistical model of your training data

u/ColdStorageParticle

But still it solved an already solved math problem right? It did not solve something that is not solved yet?

u/SupermarketIcy4996

Now if you could explain that to all the people who keep saying it's just a different kind of Google search.

u/Dear-Mix-5841

All I see in the comments are people dismissing this. This is truly revolutionary - especially as it demonstrates its ability to come up with goals and benchmarks in a non-verifiable environm

u/SupermarketIcy4996

This adult world is so vague. How could we simplify it to infant level.

u/spryes

Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso

u/MachinationMachine

How exactly would you "benchmark hack" the IMO? Every problem is entirely new and unique and requires original reasoning to solve.

u/azhder

It's deliberately vague. If you have an algorithm that re-writes itself, then it's definitely "new way". If you have a large context, far larger than what current LLM's are using, and you hav

u/NinjaLanternShark

I'm not telling anyone I've made a "breakthrough" from who I was last week.

u/Lokon19

I think too many people still have an outdated view of AI. Like when you mention AI they think about what ChatGPT 1 was capable of doing. The newest models have come a long long ways.

u/ItsAConspiracy

If it accomplishes tasks that, for humans, require thinking, reason, creativity, problem solving, or original ideas, then I don't see why we wouldn't use the same terms for whatever the AI is

u/Sad-Reality-9400

How would you define AI?

u/ExplorerNo1496

Well, how will this change AI practically, especially for research?

u/al-Assas

Oh, no. This does sound like a genuine improvement of the neural network itself. Progress should have plateaued out by now. This is not going to end well.

u/yblad

Journal paper or it didn't happen. A tweet isn't evidence that something has been done.

u/CorruptedFlame

Google "breakthrough".

u/d7sg

We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?

u/azhder

To make it simple for you: the same way you would AGI. To answer correctly: - artificial means using some artistry i.e. deliberate human made, not something that comes natural like making

u/michael-65536

Corporations run by horses, and gecko billionaires? Or...

u/NinjaLanternShark

Ok, good. So even Google search does this a tiny bit -- if you search for "apple when harvest" it doesn't indiscriminately give you information about when computers are available, and it does

u/GepardenK

>If you didnt know it was AI you'd think it was very very impressive. Yes, I would have been impressed, all the way up until the point I got to know the answer was searched rather than so

u/a_brain

Because they have offered no information on the methodology nor have they released the model to anyone else to try, it’s impossible to say whether this is actually meaningful or just more ben

u/marrow_monkey

It is just predicting the next token /s

u/NinjaLanternShark

That's steps. What's the difference between steps and reasoning?

u/FreeNumber49

Let me know when hunger, crime, disease, climate change, environmental destruction, inequality, racism, sexism, religious extremism, discrimination, homophobia, asteroid avoidance, volcano er

u/Affectionate-Rain495

It could literally be coming up with novel scientific breakthroughs, but it still wouldn't be "newsworthy" to these people

u/Fr00stee

I mean... the entire point of the LLM is to guess what is the most likely answer for something that isn't in the training set otherwise it's just a worse version of google

u/Javamac8

From what I can gather, prior to this, these types of math problems required the LLM to outsource at least parts of the problem to other tools. Now the capability is baked into the LLM itself

u/ExplorerNo1496

Man I really want to know how they've done it

u/azhder

Concrete isn't the opposite of vague. Sorry about that. The definition will remain abstract. If you're trying to make it precise, that's up to you. I have given you my definition. You aren't

u/Sad-Reality-9400

Thank you for the explanation. So how are you thinking about "a new way"? What does that mean to you?

u/Lucky_Yam_1581

Will labs release models that can get an IMO gold medal and be the world’s second best in coding at the same time? If we do get access, what should common folk like me do with it?

u/FuturologyBot

The following submission statement was provided by /u/Similar-Document9690: --- Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure lang

u/NinjaLanternShark

I feel like terms like thinking, reasoning, creativity, problem solving, original ideas, etc are overused and overly vague for describing AI systems. I'm still not sure what's fundamentally d

u/michael-65536

Okay, now you've cleared up what you didn't say, (and what I didn't say you said). I take that to mean you're not willing to think about or respond to what I actually did say? Your prerogat

u/Alternative-Soil2576

What are you trying to prove?

u/SFanatic

I’ll trust in the power of LLMs when one can make me a 7 pointed star

u/EnlightenedSinTryst

Addressed meaning what?

u/cwright017

Well reasoning models can output their reasoning. It doesn’t just spit out the answer, it will detail the steps it takes to getting there. Hey go build me a house, ok well to build a house

u/MachinationMachine

For pure maths I'd wager impressive original research solutions generated primarily by AI will be coming within the next year or two.

u/robotlasagna

That’s a fair assessment, and something like creativity is something humans like to attribute to themselves and not LLMs. The problem is creativity is already seen in other animals so it’s not un

u/fuku_visit

Don't you think calling it a glorified search engine is a bit reductionist given it can solve IMO problems?

u/snoee

Here you go, friend: https://chatgpt.com/share/687caac7-73d4-800f-b4f1-3a8072e9b6ed

u/Sad-Reality-9400

Well your definition isn't very useful then is it?

u/Sad-Reality-9400

Right... I'm trying to make it more concrete so we're not just waving our hands. How would a larger context be different in kind from what we have now, rather than just different in complexity?

u/woodenanteater

Now if only your comment didn't ring of AI either...

u/FuturologyBot

The following submission statement was provided by /u/Similar-Document9690: --- Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure lang

u/Dear-Mix-5841

Yeah buddy, because A.I. uses a capitalized “And” to start sentences.

u/yblad

Journal paper or it didn't happen. A tweet isn't evidence that something has been done.

u/lostinspaz

The only new thing here is that it has been noticed to do this for math. GPT in deep research mode has been doing this kind of behavior (and spelling out its reasoning and backtracking steps

u/spryes

Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso

u/ItsAConspiracy

If it accomplishes tasks that, for humans, require thinking, reason, creativity, problem solving, or original ideas, then I don't see why we wouldn't use the same terms for whatever the AI is

u/azhder

To make it simple for you: the same way you would AGI. To answer correctly: - artificial means using some artistry i.e. deliberate human made, not something that comes natural like making

u/cwright017

You need to reason to figure out the correct sequence of steps. For example if I say I want 3 lengths of wood at 1m each but they are sold at 1.5m lengths. Without any reasoning of the prob

u/SupermarketIcy4996

AI denialists sound awfully lot like climate change denialists.

u/d7sg

We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?

u/NinjaLanternShark

I'm not telling anyone I've made a "breakthrough" from who I was last week.

u/fuku_visit

LLMs might share fundamental core aspects of functionality of a search-engine, but they really are not glorified search-engines. That's like saying that a laptop is a glorified AND gate.

u/Daniel1827

What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc

u/Joke_of_a_Name

Depending on the artists in the future, we're gonna need serious ballad solutions.

u/GenericFatGuy

They've been looking for a hot new buzzword to take hold for years. 99% of these stories about how revolutionary AI is becoming are written or backed by entities that have a direct stake in c

u/krefik

It's quite trivial to get rid of all the above. There were multiple books and movies about that solution. In many cases generated by ai

u/fuku_visit

Think of it this way... It can currently provide outputs that meet the IMO standard for being considered correct. If you didn't know it was AI, you'd think it was very, very impressive. I just think i

u/abyssazaur

You know an answer to an IMO problem is a 10 page proof right? And it did make headlines? Ergo not an incremental breakthrough. I literally don't know what else it could take to count as ne

u/play_yr_part

all of those will be solved when we're all paperclips

u/robotlasagna

The thing I would counter with is: 1. What is creativity? 2. What is your thesis on why creativity must be a uniquely biological thing? Right now the discussion is people was "well LLM'

u/Affectionate-Rain495

It could literally be coming up with novel scientific breakthroughs, but it still wouldn't be "newsworthy" to these people

u/hollowgram

How does this square with this other research showing LLM math reasoning is worse than what has been reported? https://www.reddit.com/r/OpenAI/comments/1m3ovkt/new_research_exposes_how_ai_mo

u/Andy12_

Those performance drops were reported on a pair of math benchmarks that are basically "here's a bunch of numbers. We need to solve equation X. The answer is a single number". With that type o

u/SFanatic

I’ll trust in the power of LLMs when one can make me a 7 pointed star

u/Daniel1827

Reliably scoring gold is very impressive, and a lot more impressive than reliably scoring silver. Getting gold on a one off is impressive, but how impressive it is depends on how it was achie

u/DrBimboo

I don't think so. We didn't have problems identifying what reasoning is until some people, who are waaaaay overconfident in their understanding of a thing that displays reasoning, had an agenda

u/EnlightenedSinTryst

Addressed meaning what?

u/Dear-Mix-5841

All I see in the comments are people dismissing this. This is truly revolutionary - especially as it demonstrates its ability to come up with goals and benchmarks in a non-verifiable environm

u/Mirar

It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.

u/cwright017

Well, reasoning models can output their reasoning. They don’t just spit out the answer; they detail the steps they take to get there. Hey go build me a house, ok well to build a house

u/NinjaLanternShark

Like I said, *more* right answers than the last version. I know "the answer" isn't in the training set but that's always been the difference between an LLM and a Google search. I'm just t

u/fuku_visit

OK... I'm talking to someone who is comparing Google Maps to AI...

u/Etroarl55

Does this mean CS is even more giga cooked now 😭

u/ExplorerNo1496

Well how will this change AI practically especially for research

u/ExplorerNo1496

Man I really want to know how they've done it

u/ColdStorageParticle

But still it solved an already solved math problem right? It did not solve something that is not solved yet?

u/Revolutionary-Bag-52

No, because that's literally what an LLM is. If its goal is not predicting what the next set of words might be, we are not talking about an LLM but about a different kind of model.

u/GepardenK

No, that part would actually be fine. If LLMs really could formulate novel proofs, then who cares if it got it wrong most of the time. You could just check each and discard the ones that didn

u/wiztard

I don't disagree with your conclusion, but your reasoning doesn't make sense. We are related to all life we know of and it makes sense that we have a lot of similarities with other animals. LLM

u/fuku_visit

Don't you think calling it a glorified search engine is a bit reductionist given it can solve IMO problems?

u/a_brain

Because they have offered no information on the methodology nor have they released the model to anyone else to try, it’s impossible to say whether this is actually meaningful or just more ben

u/MachinationMachine

How exactly would you "benchmark hack" the IMO? Every problem is entirely new and unique and requires original reasoning to solve.

u/Lokon19

I think too many people still have an outdated view of AI. Like when you mention AI they think about what ChatGPT 1 was capable of doing. The newest models have come a long long ways.

u/ZERV4N

Those have been solved. We know how to undo all of that stuff. But rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.

u/michael-65536

Sure, you feel that way. But did you think, reason, creatively problem-solve, have original ideas about it etc? Seems like you might have just used a statistical model of your training data

u/Javamac8

From what I can gather, prior to this, these types of math problems required the LLM to outsource at least parts of the problem to other tools. Now the capability is baked into the LLM itself

u/NinjaLanternShark

That's steps. What's the difference between steps and reasoning?

u/not_mig

That's fine. Show the exact inputs, show that all it took was training a specific nn topology on an appropriate data set. I doubt they did that. No reason to believe that they didn't go heavy

u/gannex

LLMs with better mathematical reasoning will be very, very useful. LLMs can be quite helpful for deriving equations, but their limitations tend to show fairly quickly and you have to guide th

u/talligan

It's very likely your parents did (hopefully, if you had decent ones) when growing up, however.

u/GepardenK

Yes, but unless actual calculation on part of the AI was involved, we are still talking about a glorified search engine that takes an input and tries to predict what output we would like to s

u/SeriousGeorge2

>I'm still not sure what's fundamentally different here other than "got the right answer more often than before..." The difference is that the model is getting the answers at all. It doe

u/ZERV4N

Yeah, but how exactly does that work? The LLM can do the tools work itself? Has it learned to become like an algorithmic Swiss Army knife using natural language or it's just "predicting" the

u/Sad-Reality-9400

Thank you for the explanation. So how are you thinking about "a new way"? What does that mean to you?

u/robotlasagna

That’s a fair assessment, and creativity is something humans like to attribute to themselves and not to LLMs. The problem is that creativity is already seen in other animals, so it’s not un

u/FreeNumber49

Except most studies show that humans aren’t responsible, it’s the corporations and billionaires fighting government regulation who are to blame. But of course you knew that already, you just

u/marrow_monkey

It is just predicting the next token /s

u/ElectronicMoo

But it's not creative "thinking", and that's what folks are on about. An LLM, from word to word, doesn't have the foggiest what it's saying to you. It's a very powerful engine in pattern

u/CorruptedFlame

Google "breakthrough".

u/Disastrous-Form-3613

"Ugh, I've been telling everybody that LLMs can't reason and here's a proof that they can. How do I downplay this to still look good?"

u/MachinationMachine

For pure maths I'd wager impressive original research solutions generated primarily by AI will be coming within the next year or two.

u/GepardenK

>If you didnt know it was AI you'd think it was very very impressive. Yes, I would have been impressed, all the way up until the point I got to know the answer was searched rather than so

u/azhder

Will not be surprised it’s the same grifters that could no longer push crypto stuff by muddying the waters that are now pushing the AI that isn’t AI.

u/azhder

Concrete isn't the opposite of vague. Sorry about that. The definition will remain abstract. If you're trying to make it precise, that's up to you. I have given you my definition. You aren't

u/Sad-Reality-9400

How would you define AI?

u/kyriosity-at-github

The keyword is "claims" and a kiddish illustration as the state of the things.

u/GepardenK

It doesn't "solve" them in the traditional sense of the word. It is being led to something that is likely to resemble the answer by following the input against the weights provided by its tr

u/Exciting-Position716

It's simply inevitable, whether one likes it or not. I for one see the positives in A.I. Yes it will drastically and radically alter our entire world, our society, entire industries, et

u/Qcconfidential

I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.

u/GenericFatGuy

The difference is that now the marketing departments of the AI world have a new tool in their tool belt to fleece investors of their money.

u/Lucky_Yam_1581

Reminds me of Ilya’s quote: feed an LLM a detective novel, hide the ending, and ask it to guess the ending. If it nails the ending, then it understands and is not just memorizin

u/michael-65536

Corporations run by horses, and gecko billionaires? Or...

u/FreeNumber49

Except most studies show that humans aren’t responsible, it’s the corporations and billionaires fighting government regulation who are to blame. But of course you knew that already, you just

u/michael-65536

Sure, you feel that way. But did you think, reason, creatively problem-solve, have original ideas about it etc? Seems like you might have just used a statistical model of your training data

u/NinjaLanternShark

Ok, good. So even Google search does this a tiny bit -- if you search for "apple when harvest" it doesn't indiscriminately give you information about when computers are available, and it does

u/Mirar

It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.

u/ExplorerNo1496

Man I really want to know how they've done it

u/azhder

Will not be surprised it’s the same grifters that could no longer push crypto stuff by muddying the waters that are now pushing the AI that isn’t AI.

u/Andy12_

Those performance drops were reported on a pair of math benchmarks that are basically "here's a bunch of numbers. We need to solve equation X. The answer is a single number". With that type o

u/krefik

It's quite trivial to get rid of all the above. There were multiple books and movies about that solution. In many cases generated by ai

u/NinjaLanternShark

That's steps. What's the difference between steps and reasoning?

u/Javamac8

From what I can gather, prior to this, these types of math problems required the LLM to outsource at least parts of the problem to other tools. Now the capability is baked into the LLM itself

u/Lokon19

I think too many people still have an outdated view of AI. Like when you mention AI they think about what ChatGPT 1 was capable of doing. The newest models have come a long long ways.

u/GenericFatGuy

Comparing AI to climate change isn't the own you think it is.

u/snoee

Here you go, friend: https://chatgpt.com/share/687caac7-73d4-800f-b4f1-3a8072e9b6ed

u/Dear-Mix-5841

Yeah buddy, because A.I. uses a capitalized “And” to start sentences.

u/Sad-Reality-9400

How would you define AI?

u/MachinationMachine

How exactly would you "benchmark hack" the IMO? Every problem is entirely new and unique and requires original reasoning to solve.

u/robotlasagna

That’s a fair assessment and something like creativity is something humans like to attribute to themselves and not LLMs. The problem is creativity is already seen in other animals so it’s not un

u/Sad-Reality-9400

Thank you for the explanation. So how are you thinking about "a new way"? What does that mean to you?

u/SleepyCorgiPuppy

Sadly the root of a lot of these problems is humans themselves. Unless AI just takes over and keeps us as pets.

u/Alternative-Soil2576

What are you trying to prove?

u/Exciting-Position716

It's simply inevitable, whether one likes it or not. I for one see the positives in A.I. Yes it will drastically and radically alter our entire world, our society, entire industries, et

u/MachinationMachine

For pure maths I'd wager impressive original research solutions generated primarily by AI will be coming within the next year or two.

u/wiztard

I don't disagree with your conclusion but your reasoning doesn't make sense. We are related to all life we know of and it makes sense that we have a lot of similarities with other animals. LLM

u/gannex

LLMs with better mathematical reasoning will be very, very useful. LLMs can be quite helpful for deriving equations, but their limitations tend to show fairly quickly and you have to guide th

u/azhder

To make it simple for you: the same way you would AGI. To answer correctly: - artificial means using some artistry i.e. deliberate human made, not something that comes natural like making

u/kyriosity-at-github

The keyword is "claims" and a kiddish illustration as the state of the things.

u/NinjaLanternShark

I'm not telling anyone I've made a "breakthrough" from who I was last week.

u/ElectronicMoo

But it's not creativity "thinking", and that's what folks are on about. An LLM, from word to word, doesn't have the foggiest what it's saying to you. It's a very powerful engine in pattern

u/ColdStorageParticle

But still it solved an already solved math problem right? It did not solve something that is not solved yet?

u/GepardenK

It doesn't "solve" them in the traditional sense of the word. It is being led to something that is likely to resemble the answer by following the input against the weights provided by its tr

u/a_brain

Because they have offered no information on the methodology nor have they released the model to anyone else to try, it’s impossible to say whether this is actually meaningful or just more ben

u/NinjaLanternShark

I feel like terms like thinking, reasoning, creativity, problem solving, original ideas, etc are overused and overly vague for describing AI systems. I'm still not sure what's fundamentally d

u/GepardenK

Yes, but unless actual calculation on part of the AI was involved, we are still talking about a glorified search engine that takes an input and tries to predict what output we would like to s

u/spryes

Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso

u/Lucky_Yam_1581

Reminds me of Ilya’s quote that if you feed an LLM with a detective novel and hide the ending and ask it to guess the ending. If it nails the ending then it understands and not just memorizin

u/al-Assas

Oh, no. This does sound like a genuine improvement of the neural network itself. Progress should have plateaued out by now. This is not going to end well.

u/ZERV4N

Yeah, but how exactly does that work? The LLM can do the tools work itself? Has it learned to become like an algorithmic Swiss Army knife using natural language or it's just "predicting" the

u/cwright017

Well, reasoning models can output their reasoning. It doesn’t just spit out the answer; it will detail the steps it takes to get there. Hey go build me a house, ok well to build a house

u/Fr00stee

I mean... the entire point of the LLM is to guess what is the most likely answer for something that isn't in the training set otherwise it's just a worse version of google

u/NinjaLanternShark

Like I said, *more* right answers than the last version. I know "the answer" isn't in the training set but that's always been the difference between an LLM and a Google search. I'm just t

u/TheMadWho

Well, if you could use that to prove things that haven’t been proved before, it would still be quite useful no matter how it got there.

u/[deleted]

[deleted]

u/fuku_visit

Think of it this way.... It can currently provide outputs which meet IMO levels to be considered correct. If you didn't know it was AI you'd think it was very very impressive. I just think i

u/CorruptedFlame

You won't believe it until the code is out? Umm, I hate to break it to you, but the code won't be answering any questions lol. The whole point of deep learning stuff like this is that the fin

u/marrow_monkey

It is just predicting the next token /s

u/Similar-Document9690

Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure language model, solving IMO-level math problems that normally require hours of deep, a

u/GenericFatGuy

The difference is that now the marketing departments of the AI world have a new tool in their tool belt to fleece investors of their money.

u/azhder

It's deliberately vague. If you have an algorithm that re-writes itself, then it's definitely a "new way". If you have a large context, far larger than what current LLMs are using, and you hav

u/Fr00stee

well you would hope that the proof is actually correct the vast majority of the time otherwise it's not useful in real life if the accuracy is like 75/25 correct

u/Affectionate-Rain495

what did you expect from r/futurology, people hate technology here

u/SFanatic

I’ll trust in the power of LLMs when one can make me a 7 pointed star

u/daronjay

Wow, what a collection of new goal posts!

u/talligan

It's an irony that a sub about futurology has knee-jerk reactions against completely wild tech like AI. It's not that I expect everyone to be pro-AI or whatever, but I would expect stronger an

u/Daniel1827

What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc

u/hollowgram

How does this square with this other research showing LLM math reasoning is worse than what has been reported? https://www.reddit.com/r/OpenAI/comments/1m3ovkt/new_research_exposes_how_ai_mo

u/not_mig

As my previous submission was taken off due to not meeting a character count minimum, I just want to say that I do not believe the claims until the code is out. Too much bootstrapping goes on

u/woodenanteater

Now if only your comment didn't ring of AI either...

u/FuturologyBot

The following submission statement was provided by /u/Similar-Document9690: --- Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure lang

u/fuku_visit

Don't you think calling it a glorified search engine is a bit reductionist given it can solve IMO problems?

u/SeriousGeorge2

>I'm still not sure what's fundamentally different here other than "got the right answer more often than before..." The difference is that the model is getting the answers at all. It doe

u/Sad-Reality-9400

Well your definition isn't very useful then is it?

u/CorruptedFlame

Google "breakthrough".

u/Revolutionary-Bag-52

No, because that's literally what an LLM is. If its goal is not predicting what the next set of words might be, we are not talking about an LLM, but about different models.

u/fuku_visit

OK... I'm talking to someone who is comparing Google maps to AI.......

u/ZERV4N

Those have been solved. We know how to undo all of that stuff. But rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.

u/Disastrous-Form-3613

"Ugh, I've been telling everybody that LLMs can't reason and here's a proof that they can. How do I downplay this to still look good?"

u/Affectionate-Rain495

It could literally be coming up with novel scientific breakthroughs, but it still wouldn't be "newsworthy" to these people

u/Qcconfidential

I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.

u/michael-65536

Seemed like that's exactly what you were doing in that first comment, but whatever.

u/play_yr_part

all of those will be solved when we're all paperclips

u/d7sg

We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?

u/robotlasagna

The thing I would counter with is: 1. What is creativity? 2. What is your thesis on why creativity must be a uniquely biological thing? Right now the discussion is people was "well LLM'

u/Lucky_Yam_1581

Will labs release models that can get an IMO gold medal and rank as the world’s second best in coding at the same time? If we do get access, what should common folk like me do with it?

u/Daniel1827

Reliably scoring gold is very impressive, and a lot more impressive than reliably scoring silver. Getting gold on a one off is impressive, but how impressive it is depends on how it was achie

u/not_mig

That's fine. Show the exact inputs, show that all it took was training a specific nn topology on an appropriate data set. I doubt they did that. No reason to believe that they didn't go heavy

u/Etroarl55

Does this mean CS is even more giga cooked now 😭

u/ColdStorageParticle

But still it solved an already solved math problem right? It did not solve something that is not solved yet?

u/michael-65536

Corporations run by horses, and gecko billionaires? Or...

u/ExplorerNo1496

Man I really want to know how they've done it

u/play_yr_part

all of those will be solved when we're all paperclips

u/Lucky_Yam_1581

Will labs release models that can get an IMO gold medal and world’s second best in coding at the same time? If we do get access, what a common folk like me should do with it?

u/NinjaLanternShark

That's steps. What's the difference between steps and reasoning?

u/GenericFatGuy

The difference is that now the marketing departments of the AI world have a new tool in their tool belt to fleece investors of their money.

u/CorruptedFlame

Google "breakthrough".

u/fuku_visit

OK... I'm talking to someone who is comparing Google maps to AI.......

u/azhder

To make it simple for you: the same way you would AGI. To answer correctly: - artificial means using some artistry i.e. deliberate human made, not something that comes natural like making

u/Disastrous-Form-3613

"Ugh, I've been telling everybody that LLMs can't reason and here's a proof that they can. How do I downplay this to still look good?"

u/SFanatic

I’ll trust in the power of LLMs when one can make me a 7 pointed star

u/robotlasagna

That’s a fair assessment and something like creativity is something humans like to attribute to themselves and not LLMs. The problem is creativity is already seen in other animals so it’s not un

u/Daniel1827

What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc

u/Alternative-Soil2576

What are you trying to prove?

u/Affectionate-Rain495

what did you expect from r/futurology, people hate technology here

u/marrow_monkey

It is just predicting the next token /s

u/cwright017

Well, reasoning models can output their reasoning. It doesn’t just spit out the answer, it will detail the steps it takes to get there. Hey go build me a house, ok well to build a house

u/daronjay

Wow, what a collection of new goal posts!

u/Qcconfidential

I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.

u/talligan

It's an irony that a sub about futurology has knee-jerk reactions against completely wild tech like AI. It's not that I expect everyone to be pro AI or whatever, but I would expect stronger an

u/GepardenK

>If you didnt know it was AI you'd think it was very very impressive. Yes, I would have been impressed, all the way up until the point I got to know the answer was searched rather than so

u/charmcharmcharm

I don’t think that’s the comparison that is being made, GenericFatGuy.

u/azhder

Concrete isn't the opposite of vague. Sorry about that. The definition will remain abstract. If you're trying to make it precise, that's up to you. I have given you my definition. You aren't

u/wiztard

I don't disagree with your conclusion but your reasoning doesn't make sense. We are related to all life we know of and it makes sense that we have a lot of similarities with other animals. LLM

u/SupermarketIcy4996

Now if you could explain that to all the people who keep saying it's just a different kind of Google search.

u/NinjaLanternShark

I feel like terms like thinking, reasoning, creativity, problem solving, original ideas, etc are overused and overly vague for describing AI systems. I'm still not sure what's fundamentally d

u/a_brain

Because they have offered no information on the methodology nor have they released the model to anyone else to try, it’s impossible to say whether this is actually meaningful or just more ben

u/lostinspaz

The only new thing here is that it has been noticed to do this for math. GPT in deep research mode has been doing this kind of behavior (and spelling out its reasoning and backtracking steps

u/gannex

LLMs with better mathematical reasoning will be very, very useful. LLMs can be quite helpful for deriving equations, but their limitations tend to show fairly quickly and you have to guide th

u/DrBimboo

I don't think so. We didn't have problems identifying what reasoning is, until some people who are waaaaay overconfident in their understanding of a thing that displays reasoning, had an agenda

u/michael-65536

Sure, you feel that way. But did you think, reason, creatively problem-solve, have original ideas about it etc? Seems like you might have just used a statistical model of your training data

u/hollowgram

How does this square with this other research showing LLM math reasoning is worse than what has been reported? https://www.reddit.com/r/OpenAI/comments/1m3ovkt/new_research_exposes_how_ai_mo

u/Sad-Reality-9400

How would you define AI?

u/Joke_of_a_Name

Depending on the artists in the future, we're gonna need serious ballad solutions.

u/woodenanteater

Now if only your comment didn't ring of AI either...

u/Sad-Reality-9400

Well your definition isn't very useful then is it?

u/NinjaLanternShark

Ok, good. So even Google search does this a tiny bit -- if you search for "apple when harvest" it doesn't indiscriminately give you information about when computers are available, and it does

u/ZERV4N

Those have been solved. We know how to undo all of that stuff. But rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.

u/SleepyCorgiPuppy

Sadly, the root of a lot of these problems is humans themselves. Unless AI just takes over and keeps us as pets.

u/fuku_visit

Don't you think calling it a glorified search engine is a bit reductionist given it can solve IMO problems?

u/GenericFatGuy

They've been looking for a hot new buzzword to take hold for years. 99% of these stories about how revolutionary AI is becoming are written or backed by entities that have a direct stake in c

u/cwright017

You need to reason to figure out the correct sequence of steps. For example if I say I want 3 lengths of wood at 1m each but they are sold at 1.5m lengths. Without any reasoning of the prob

u/FreeNumber49

Except most studies show that humans aren’t responsible, it’s the corporations and billionaires fighting government regulation who are to blame. But of course you knew that already, you just

u/fuku_visit

Think of it this way.... It can currently provide outputs which meet IMO levels to be considered correct. If you didnt know it was AI you'd think it was very very impressive. I just think i

u/kyriosity-at-github

The keyword is "claims" and a kiddish illustration as the state of the things.

u/GepardenK

It doesn't "solve" them in the traditional sense of the word. It is being led to something that is likely to resemble the answer by following the input against the weights provided by its tr

u/talligan

It's very likely your parents did (hopefully, if you had decent ones) when growing up, however.

u/ItsAConspiracy

If it accomplishes tasks that, for humans, require thinking, reason, creativity, problem solving, or original ideas, then I don't see why we wouldn't use the same terms for whatever the AI is

u/MachinationMachine

For pure maths I'd wager impressive original research solutions generated primarily by AI will be coming within the next year or two.

u/Lucky_Yam_1581

Reminds me of Ilya’s quote that if you feed an LLM with a detective novel and hide the ending and ask it to guess the ending. If it nails the ending then it understands and not just memorizin

u/TheMadWho

Well, if you could use that to prove things that haven’t been proved before, it would still be quite useful no matter how it got there.

u/al-Assas

Oh, no. This does sound like a genuine improvement of the neural network itself. Progress should have plateaued out by now. This is not going to end well.

u/xt-89

A lot of those papers weren’t focusing on the latest and greatest models for reasoning. Or, they had a definition of reasoning that was unfair in that humans wouldn’t live up to that definition.

u/abyssazaur

You know an answer to an IMO problem is a 10 page proof right? And it did make headlines? Ergo not an incremental breakthrough. I literally don't know what else it could take to count as ne

u/ExplorerNo1496

Well how will this change AI practically especially for research

u/Mirar

It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.

u/ZERV4N

Yeah, but how exactly does that work? The LLM can do the tools work itself? Has it learned to become like an algorithmic Swiss Army knife using natural language or it's just "predicting" the

u/snoee

Here you go, friend: https://chatgpt.com/share/687caac7-73d4-800f-b4f1-3a8072e9b6ed

u/OriginalCompetitive

It’s also new that this achievement is benchmarked against the smartest young people on the planet.

u/michael-65536

Seemed like that's exactly what you were doing in that first comment, but whatever.

u/spryes

Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso

u/EnlightenedSinTryst

Addressed meaning what?

u/Andy12_

Those performance drops were reported on a pair of math benchmarks that are basically "here's a bunch of numbers. We need to solve equation X. The answer is a single number". With that type o

u/NinjaLanternShark

I'm not interested in convincing you I'm different from an AI, so let's just call it a night.

u/FreeNumber49

Let me know when hunger, crime, disease, climate change, environmental destruction, inequality, racism, sexism, religious extremism, discrimination, homophobia, asteroid avoidance, volcano er

u/azhder

It's deliberately vague. If you have an algorithm that re-writes itself, then it's definitely a "new way". If you have a large context, far larger than what current LLMs are using, and you hav

u/Revolutionary-Bag-52

No, because that's literally what an LLM is. If its goal is not predicting what the next set of words might be, we are not talking about an LLM, but about different models.

u/FuturologyBot

The following submission statement was provided by /u/Similar-Document9690: --- Submission statement: This wasn’t an AI using tools, plug-ins, or external calculators. This was a pure lang

u/not_mig

As my previous submission was taken off due to not meeting a character count minimum I just want to say that I do not believe the claims until the code is out. Too much bootstrapping goes on

u/michael-65536

Okay, now you've cleared up what you didn't say, (and what I didn't say you said). I take that to mean you're not willing to think about or respond to what I actually did say? Your prerogat

u/fuku_visit

LLMs might share fundamental core aspects of functionality of a search-engine, but they really are not glorified search-engines. That's like saying that a laptop is a glorified AND gate.

u/GepardenK

No, that part would actually be fine. If LLMs really could formulate novel proofs, then who cares if it got it wrong most of the time. You could just check each and discard the ones that didn

u/Affectionate-Rain495

It could literally be coming up with novel scientific breakthroughs, but it still wouldn't be "newsworthy" to these people

u/NinjaLanternShark

I'm not telling anyone I've made a "breakthrough" from who I was last week.

u/Fr00stee

Well, you would hope that the proof is actually correct the vast majority of the time; otherwise it's not useful in real life if the accuracy is like 75/25 correct.

u/Sad-Reality-9400

Thank you for the explanation. So how are you thinking about "a new way"? What does that mean to you?

u/robotlasagna

The thing I would counter with is: 1. What is creativity? 2. What is your thesis on why creativity must be a uniquely biological thing? Right now the discussion is people was "well LLM'

u/Dear-Mix-5841

Yeah buddy, because A.I. uses a capitalized “And” to start sentences.

u/SupermarketIcy4996

This adult world is so vague. How could we simplify it to infant level.

u/d7sg

We hear a lot about how good AI is at maths, but when will we start to see journal-published research on AI-based solutions to real problems?

u/CorruptedFlame

You won't believe it until the code is out? Umm, I hate to break it to you, but the code won't be answering any questions lol. The whole point of deep learning stuff like this is that the fin

u/azhder

I will not be surprised if it’s the same grifters that could no longer push crypto stuff by muddying the waters that are now pushing the AI that isn’t AI.

u/Dear-Mix-5841

All I see in the comments are people dismissing this. This is truly revolutionary - especially as it demonstrates its ability to come up with goals and benchmarks in a non-verifiable environm

u/Sad-Reality-9400

Right... I'm trying to make it more concrete so we're not just waving our hands. How would a larger context be different in kind than what we have now rather than just different in complexity?

u/talligan

It's very likely your parents did (hopefully, if you had decent ones) when growing up, however.

u/d7sg

We hear a lot about how good AI is at maths but when will we start to see journal published research of AI based solutions to real problems?

u/Qcconfidential

I see more posts about AI on this sub than anything else. If AI is actually our future we are done as a species. Does no one else realize this? The whole thing is insanely cynical.

u/al-Assas

Oh, no. This does sound like a genuine improvement of the neural network itself. Progress should have plateaued out by now. This is not going to end well.

u/ZERV4N

Those have been solved. We know how to undo all of that stuff. But rich people would just rather hoard their wealth and great machines to help them do more of it while they kill the poor.

u/fuku_visit

LLMs might share fundamental core aspects of functionality of a search-engine, but they really are not glorified search-engines. That's like saying that a laptop is a glorified AND gate.

u/Sad-Reality-9400

Well your definition isn't very useful then is it?

u/not_mig

As my previous submission was taken off due to not meeting a character count minimum I just want to say that I do not believe the claims until the code is out. Too much bootstrapping goes on

u/DrBimboo

I don't think so. We didn't have problems identifying what reasoning is, until some people who are waaaaay overconfident in their understanding of a thing that displays reasoning, had an agenda

u/GepardenK

No, that part would actually be fine. If LLMs really could formulate novel proofs, then who cares if it got it wrong most of the time. You could just check each and discard the ones that didn

u/Daniel1827

Reliably scoring gold is very impressive, and a lot more impressive than reliably scoring silver. Getting gold on a one off is impressive, but how impressive it is depends on how it was achie

u/robotlasagna

The thing I would counter with is: 1. What is creativity? 2. What is your thesis on why creativity must be a uniquely biological thing? Right now the discussion is people was "well LLM'

u/gannex

LLMs with better mathematical reasoning will be very, very useful. LLMs can be quite helpful for deriving equations, but their limitations tend to show fairly quickly and you have to guide th

u/spryes

Yes. For that you need to wait for an AI system to solve one of the Millennium Prize problems. This is still fairly groundbreaking for automating labor though because it seems that the reaso

u/Lokon19

I think too many people still have an outdated view of AI. Like when you mention AI they think about what ChatGPT 1 was capable of doing. The newest models have come a long long ways.

u/EnlightenedSinTryst

Addressed meaning what?

u/snoee

Here you go, friend: https://chatgpt.com/share/687caac7-73d4-800f-b4f1-3a8072e9b6ed

u/Mirar

It's math, though. Not just counting. Basically you have to write a mathematical proof and show your reasoning at this level.

u/GepardenK

Yes, but unless actual calculation on the part of the AI was involved, we are still talking about a glorified search engine that takes an input and tries to predict what output we would like to s

u/azhder

It's deliberately vague. If you have an algorithm that re-writes itself, then it's definitely "new way". If you have a large context, far larger than what current LLM's are using, and you hav

u/Dear-Mix-5841

Yeah buddy, because A.I. uses a capitalized “And” to start sentences.

u/FreeNumber49

Except most studies show that humans aren’t responsible, it’s the corporations and billionaires fighting government regulation who are to blame. But of course you knew that already, you just

u/SupermarketIcy4996

AI denialists sound an awful lot like climate change denialists.

u/Daniel1827

What does "benchmark hacking" involve here? I find it hard to imagine that there is much that can be done to make IMO problems easier for LLMs. Even if they specifically optimised for IMO, sc
