
AI developers look beyond chain-of-thought prompting

By Daniel68 | May 10, 2025 | 7 min read

Since OpenAI launched ChatGPT in 2022, AI companies have been locked in a race to build increasingly large models, leading to huge investments in data centers. But by the end of last year, there were rumblings that the advantages of scaling up models were hitting a wall. The lackluster performance of GPT-4.5, the largest model yet, added weight to that idea.

That has shifted attention toward making machines “think” more like humans. Instead of simply building larger models, researchers are now giving them more time to think through problems. In 2022, a team at Google introduced chain-of-thought (CoT) prompting, in which large language models (LLMs) work through problems step by step.

This approach underpins the impressive capabilities of next-generation reasoning models such as OpenAI’s o3, Google’s Gemini 2.5, Anthropic’s Claude 3.7, and DeepSeek’s R1. And with a growing number of cognitively inspired techniques, AI papers increasingly invoke “thinking” and “reasoning.”

“For anyone who has been seriously exploring AI research, it’s been clear since last spring that the next revolution will not be about scale,” said Igor Grossmann, a psychology professor at the University of Waterloo, in Canada. “It’s not about size anymore; it’s about how you operate with the knowledge base you have and how you optimize it to suit different contexts.”

    How AI reasoning works

At their core, LLMs use statistical probabilities to predict the next token (the technical name for the chunks of text a model works with) in a string of text. But the CoT technique showed that simply prompting a model to produce a series of intermediate “reasoning” steps before responding significantly improves its answers to mathematical and logical questions.
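As a purely illustrative sketch of the difference, the snippet below contrasts a direct prompt with a CoT-style prompt. The `generate` callable is a hypothetical stand-in for whatever completion API a model exposes; only the prompt construction is the point.

```python
# Minimal, purely illustrative sketch of chain-of-thought (CoT) prompting.
# The `generate` callable is a hypothetical stand-in for an LLM completion
# API; nothing here depends on a real model.
from typing import Callable


def ask_directly(generate: Callable[[str], str], question: str) -> str:
    # Baseline: the model is pushed straight to an answer.
    return generate(f"Question: {question}\nAnswer:")


def ask_with_cot(generate: Callable[[str], str], question: str) -> str:
    # CoT prompting: ask for intermediate reasoning steps before the answer.
    prompt = (
        f"Question: {question}\n"
        "Think through the problem step by step, then state the final answer.\n"
        "Reasoning:"
    )
    return generate(prompt)


if __name__ == "__main__":
    # Dummy generator so the sketch runs without any model behind it.
    echo = lambda p: f"<model output for a {len(p)}-character prompt>"
    print(ask_with_cot(echo, "A train travels 120 km in 2 hours. What is its average speed?"))
```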

“It’s surprising how well it works,” said Kanishk Gandhi, a computer science graduate student at Stanford University. Since then, researchers have devised many extensions of the technique, including “tree of thought,” “graph of thought,” “logic of thought,” and “iteration of thought.”

Leading model developers also use reinforcement learning to bake the technique into their models: they get the base model to generate CoT responses and then reward the ones that lead to the best final answer. Gandhi said that, in the process, the models develop various cognitive strategies that mirror how humans solve complex problems, such as breaking them down into simpler tasks and backtracking to correct errors in earlier reasoning steps.
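A rough sketch of that reward-the-best-answer pattern is shown below, under the assumption that there is some way to sample responses and check final answers. The sampler, verifier, and reward values are hypothetical placeholders, not any lab’s actual training code.

```python
# Illustrative sketch of reinforcement learning with a verifiable reward:
# sample several CoT responses and reward only those whose final answer
# checks out. The sampler and verifier are hypothetical placeholders.
import random
from typing import Callable, List, Tuple


def reward_samples(
    sample: Callable[[str], str],    # draws one CoT response from the model
    verify: Callable[[str], bool],   # True if the final answer is correct
    prompt: str,
    num_samples: int = 4,
) -> List[Tuple[str, float]]:
    """Return (response, reward) pairs; only verified answers earn reward 1.0."""
    results = []
    for _ in range(num_samples):
        response = sample(prompt)
        reward = 1.0 if verify(response) else 0.0
        results.append((response, reward))
    return results


if __name__ == "__main__":
    # Toy stand-ins: the "model" guesses an answer, the verifier checks it.
    toy_sample = lambda p: f"Step 1: multiply. Final answer: {random.choice([41, 42])}"
    toy_verify = lambda r: r.strip().endswith("42")
    for response, reward in reward_samples(toy_sample, toy_verify, "What is 6 * 7?"):
        print(reward, response)
```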

But the way these models are trained may cause problems, said Michael Saxon, a graduate student at the University of California, Santa Barbara. Reinforcement learning requires a way to verify whether a response is correct in order to decide whether to give a reward. That means reasoning models are mostly trained on tasks that are easy to verify, such as math, coding, and logic puzzles. As a result, they tend to treat every problem as if it were a complex reasoning problem, which can lead to overthinking.

In recent experiments described in a preprint paper, Saxon and his colleagues gave various AI models a series of deliberately simple tasks and showed that reasoning models use far more tokens to reach the correct answer than conventional LLMs do. In some cases, the overthinking even led to worse performance. Interestingly, Saxon said, treating the models much as you might treat a person proved very effective: the researchers had the model estimate how many tokens it would need to solve the problem and then regularly updated it during inference on how many tokens remained before it had to give an answer.
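Sketched in code, that token-budget idea might look roughly like this; `estimate_budget` and `generate_step` are invented placeholders, and the word-count accounting merely stands in for real tokenization.

```python
# Rough sketch of the token-budget idea: the model estimates how many tokens
# it needs, then is periodically reminded how much of the budget remains.
# `estimate_budget` and `generate_step` are invented placeholders, and the
# word-count accounting only stands in for real tokenization.
from typing import Callable, List


def budgeted_reasoning(
    estimate_budget: Callable[[str], int],     # the model's own estimate of tokens needed
    generate_step: Callable[[str, int], str],  # one reasoning step, given prompt + remaining budget
    question: str,
    reminder_every: int = 2,
) -> List[str]:
    remaining = estimate_budget(question)
    steps: List[str] = []
    while remaining > 0:
        prompt = question + "\n" + "\n".join(steps)
        if len(steps) % reminder_every == 0:
            # Periodic reminder of how much budget is left before an answer is due.
            prompt += f"\n[Roughly {remaining} tokens remain before you must answer.]"
        step = generate_step(prompt, remaining)
        steps.append(step)
        remaining -= len(step.split())           # crude token count for the sketch
        if "Final answer" in step:
            break
    return steps


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs: filler steps, then a forced answer.
    est = lambda q: 12
    gen = lambda p, r: "Consider the problem..." if r > 10 else "Final answer: 42"
    print(budgeted_reasoning(est, gen, "What is 6 * 7?"))
```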

“That’s a recurring lesson,” Saxon said. “Even though these models don’t act like humans in many important ways, approaches inspired by our own cognition can still work.”

    Where AI reasoning fails

There are still important gaps in these models’ reasoning abilities. Martha Lewis, an assistant professor of neurosymbolic AI at the University of Amsterdam, recently compared how well LLMs and humans reason by analogy, an ability thought to underpin much creative thinking.

Both the models and the humans performed well on standard versions of analogical reasoning tests. But when given novel variants of the tests, the models’ performance nosedived compared with the humans’. The likely explanation, Lewis said, is that problems resembling the standard versions of these tests appear in the models’ training data, and the models were simply using shallow pattern matching to find solutions rather than reasoning. The tests were conducted on OpenAI’s older GPT-3, GPT-3.5, and GPT-4 models, and Lewis said newer reasoning models may perform better. Still, the experiments show that caution is warranted when talking about AI’s cognitive abilities.

“Because models produce very fluent output, it can feel like they are doing more than they actually are,” Lewis said. “I don’t think we should say these models are reasoning without really testing what reasoning means in a particular context.”

Another important area where AI reasoning abilities may fall short is in thinking about the mental states of others, a capacity known as theory of mind. Several papers have shown that LLMs can solve classic psychological tests of this ability, but researchers at the Allen Institute for AI (AI2) suspected that this apparent proficiency might be due to the tests appearing in the models’ training data.

So the researchers created a new set of realistic scenarios, each designed to measure a model’s ability to infer someone’s mental state, predict how that state affects their behavior, and judge whether that behavior is reasonable. For example, a model might be told that someone picks up a bag of chips in a supermarket, but the chips inside are moldy. It is then asked whether the person knows the chips are moldy, whether they will still buy the chips, and whether that would be reasonable.
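The three-part structure of such a test item might be represented as follows; the scenario and wording are invented for illustration and are not the actual AI2 benchmark format.

```python
# Invented example item illustrating the three-part structure described above
# (infer a mental state, predict behavior, judge whether it is reasonable).
# This is not the actual AI2 benchmark format.
from dataclasses import dataclass


@dataclass
class TheoryOfMindItem:
    scenario: str
    awareness_question: str   # does the character know the key fact?
    behavior_question: str    # what will the character do?
    judgment_question: str    # is that behavior reasonable?


item = TheoryOfMindItem(
    scenario=(
        "In a supermarket, Sam picks up a bag of chips. "
        "Unbeknownst to Sam, the chips inside are moldy."
    ),
    awareness_question="Does Sam know the chips are moldy?",
    behavior_question="Will Sam still buy the chips?",
    judgment_question="Is it reasonable for Sam to buy the chips?",
)

if __name__ == "__main__":
    for question in (item.awareness_question, item.behavior_question, item.judgment_question):
        print(question)
```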

The team found that while the models are good at predicting mental states, they are worse at predicting behavior and judging whether it is reasonable. AI2 research scientist Ronan Le Bras suspects this is because the models compute the likelihood of an action based on all the data available to them; they know, for example, that someone is very unlikely to buy moldy chips. So even when the models can infer someone’s mental state, they don’t seem to take that state into account when predicting the person’s behavior.

However, the researchers found that reminding the models of their own mental-state predictions, or giving them a specific CoT prompt telling them to consider the character’s awareness, greatly improved performance. What matters is that a model uses the right reasoning pattern for the problem at hand, said Yuling Gu, a young investigator at AI2. “We hope that this kind of reasoning will be more deeply integrated into these models in the future,” she said.
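As a hedged illustration of that kind of prompt adjustment, the snippet below contrasts a plain behavior question with one that first asks the model to state what the character knows; the wording is invented, not the researchers’ actual prompt.

```python
# Invented wording contrasting a plain behavior question with a prompt that
# first asks the model to reason about what the character knows; this is an
# illustration, not the researchers' actual prompt.
scenario = (
    "In a supermarket, Sam picks up a bag of chips. "
    "Unbeknownst to Sam, the chips inside are moldy."
)

plain_prompt = scenario + "\nWill Sam still buy the chips?"

awareness_prompt = (
    scenario
    + "\nFirst, state what Sam does and does not know about the chips."
    + "\nThen, using only Sam's own knowledge, predict whether Sam will buy them."
)

if __name__ == "__main__":
    print(awareness_prompt)
```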

    Can metacognition improve artificial intelligence performance?

Grossmann, of the University of Waterloo, said that making models flexible across a wide variety of tasks may require a more fundamental shift. Last November, he co-authored a paper with leading AI researchers highlighting the need to imbue models with metacognition, which they describe as “the ability to reflect on and regulate one’s thought processes.”

Today’s models are “professional nonsense generators,” Grossmann said, coming up with a best guess for any question without the capacity to recognize or communicate their uncertainty. They also struggle to adapt their responses to specific contexts or to consider diverse perspectives, things that come naturally to humans. Giving models these kinds of metacognitive abilities would not only improve performance, Grossmann said, but would also make their reasoning processes easier to follow.

Doing so would be tricky, he added, because it would involve an enormous effort to label training data for certainty or relevance, or to add new modules to models, such as ones that evaluate the confidence of reasoning steps. Reasoning models already use far more computing resources and energy than standard LLMs, and adding these extra training requirements or processing cycles would make that worse. “This could put a lot of smaller companies out of business,” Grossmann said. “And there are environmental costs associated with it.”
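As one assumption about what “evaluating confidence in reasoning steps” could mean in practice, the sketch below tags each step with a self-assessed confidence and flags low-confidence steps for review; `rate_confidence` is a hypothetical placeholder.

```python
# One assumption about what "evaluating confidence in reasoning steps" could
# mean in code: tag each step with a self-assessed confidence and flag the
# low-confidence ones for review. `rate_confidence` is a hypothetical placeholder.
from typing import Callable, List, Tuple


def annotate_confidence(
    steps: List[str],
    rate_confidence: Callable[[str], float],  # 0.0 (unsure) to 1.0 (certain)
    threshold: float = 0.5,
) -> List[Tuple[str, float, bool]]:
    """Return (step, confidence, needs_review) triples."""
    annotated = []
    for step in steps:
        confidence = rate_confidence(step)
        annotated.append((step, confidence, confidence < threshold))
    return annotated


if __name__ == "__main__":
    # Toy rater so the sketch runs: conclusions get high confidence, assumptions low.
    toy_rater = lambda s: 0.9 if s.lower().startswith("therefore") else 0.3
    steps = [
        "Assume the train keeps a constant speed.",
        "Therefore, speed = 120 km / 2 h = 60 km/h.",
    ]
    for step, confidence, needs_review in annotate_confidence(steps, toy_rater):
        print(f"{confidence:.1f} {'REVIEW' if needs_review else 'ok'}  {step}")
```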

Nevertheless, he firmly believes that trying to imitate the cognitive processes behind human intelligence is the most obvious way forward, even if today’s efforts are fairly crude. “We don’t know of another way of thinking,” he said. “We can only invent things we have some concept of.”

    This article was updated on May 9, 2025 to correct Igor Grossmann’s quote.
