Within just a week, more than 2 million users joined the waiting list to access China's Manus general AI agent. Manus AI is being touted as the "second DeepSeek moment" coming from China, even though the agent is still in closed beta and can only be accessed via an invitation.

The frenzy has taken hold, with many calling Manus AI a "breakthrough" and a fitting response to OpenAI's Deep Research agent, especially as China continues to deliver new AI innovations at a lower cost. However, the hype surrounding Manus is blown out of proportion, partly because of AI influencers making big claims on social media. Here's why I think Manus AI is a promising start, but not a breakthrough.

Why Manus AI is Not a Breakthrough

The reason DeepSeek was a breakthrough is that it finally replicated OpenAI's RL method to deliver performance along the lines of the o-series reasoning models. Moreover, the DeepSeek team did it on a shoestring budget compared to OpenAI's training costs. Later on, DeepSeek introduced and open-sourced its GRPO training method, which helped other labs train frontier-class reasoning models.

These were all fresh innovations, and the DeepSeek team from China achieved them despite the GPU constraints imposed by the US. The Manus general AI agent, on the other hand, combines Anthropic's Claude 3.5 Sonnet model with several fine-tuned Qwen models, and relies on the Browser Use open-source project.

Image: Manus AI agent homepage

Image Credit: Manus AI via YouTube

Better integration and tooling are genuine advantages, but the real breakthrough lies in pioneering frontier-class models optimized for agentic tasks. Anthropic's Claude 3.5 Sonnet is one of the best AI models for agentic tasks and coding. In fact, the team behind Manus is internally testing the new Claude 3.7 Sonnet unified model and finds it "promising."

Basically, building capable AI models is still the moat, and it will continue to be in the near future. That said, the Manus AI team must be commended for chaining a lot of tools and environments to complete a task. As I said above, it’s a promising start toward an agentic future.

Manus AI Agent Stumbles

We do not have access to Manus AI, but some X users got early access and have shared their experiences. Derya Unutmaz, a biomedical scientist, shared his results on X after running Manus and OpenAI's Deep Research agent side by side.

He found that Deep Research completed the task in 15 minutes, but Manus ran for 50 minutes and failed to complete the task. He also stated that Manus doesn’t reference sources like Deep Research.

Deep Research finished in under 15 minutes. Unfortunately, Manus AI failed after 50 minutes at step 18/20! 😑 It was performing quite well-I was watching Manus’ output & it seemed excellent. However, running the same prompt a second time is a bit frustrating as it takes too long! https://t.co/bGtmOI65CP — Derya Unutmaz, MD (@DeryaTR_) March 8, 2025

Similarly, X user teortaxesTex tried the Manus agent and said it's better at regurgitating information, like an LLM, than at performing agentic tasks. Another X user, TheXeophon, also shared his findings after using the Manus agent, which completely failed to mention the Nintendo Switch after researching the gaming console market.

In fact, the viral video showing the Manus AI agent automating 50 tasks turned out to be fake. Yichao 'Peak' Ji, the chief scientist of Manus, said, "this video is definitely NOT Manus," with a laughing emoji.

Despite initial stumbles, we must remember that Manus is still in its closed beta phase, and writing it off would be premature. However, it’s equally important to be measured while trying out new AI products. Manus may not be a breakthrough, but it’s an ambitious start in the right direction.

As AI models continue to get better at agentic tasks, new products built on top of them will also see improvement. The Manus AI team has already stated that the agent will be improved significantly before a wider public release. Now, whether it lives up to the hype remains to be seen, but it’s surely a notable development to watch out for.

Manus AI Is Not China's Second DeepSeek Moment; See Beyond the Hype

Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant to solve everyday computing problems.


OpenAI Releases GPT-4.5 AI Model to ChatGPT Pro Users
  • OpenAI has finally unveiled the GPT-4.5 model and it’s rolling out to ChatGPT Pro subscribers. ChatGPT Plus users will get GPT-4.5 next week.
  • It’s not a frontier model, and doesn’t outperform o-series reasoning models, but delivers better performance than GPT-4o.
  • OpenAI says GPT-4.5 has a thoughtful personality and excels at creative writing. It also exhibits fewer hallucinations.

OpenAI introduced GPT-4o, a non-reasoning model, to ChatGPT users back in May 2024. Now, over 10 months later, the hot AI startup has unveiled its next-generation GPT-4.5 AI model, codenamed 'Orion'. GPT-4.5 is the last non-reasoning model from OpenAI, as the upcoming GPT-5 will merge the o3 reasoning model to create a unified AI system.

OpenAI says GPT-4.5 is the “largest and most knowledgeable language model” developed by the company so far, but it’s not a frontier model. It’s designed to be more general-purpose than STEM-focused o-series reasoning models.

This means GPT-4.5 excels at creative writing, natural conversation, and practical problem-solving, and offers a broader knowledge base. Note that it's a multimodal model, so it can process images and files too.

What is interesting is that GPT-4.5 exhibits fewer hallucinations than GPT-4o. Its hallucination rate dropped to 37.1% from GPT-4o's 61.8%, and its accuracy improved to 62.5% from GPT-4o's 38.2%. Apart from that, early testers say that GPT-4.5 is "warm, intuitive, and natural" during conversations.

Image: GPT-4.5 hallucination rate

Image Credit: OpenAI

As for benchmarks, GPT-4.5 outperforms GPT-4o on MMLU across 14 languages. Next, on SWE-bench Verified, which evaluates the ability to solve real-world software issues, GPT-4.5 achieves 38% while GPT-4o gets 30.7%. That said, it performs worse than the o1, o3, and o3-mini reasoning models.

On the new SWE-Lancer benchmark developed by OpenAI, which evaluates performance on real-world, economically valuable software engineering tasks, GPT-4.5 solved 32.6% of the tasks, compared to GPT-4o's 23.3%. On GPQA (Science), GPT-4.5 scored 71.4% while GPT-4o got 53.6%.

Image: GPT-4.5 benchmark scores

Image Credit: OpenAI

As for availability, GPT-4.5 is rolling out to ChatGPT Pro users starting today. OpenAI says that starting next week, GPT-4.5 will be available to ChatGPT Plus, Team, and Edu users.

All in all, it appears scaling LLMs via pre-training has hit a wall, and that's why OpenAI says GPT-4.5 will be the last non-reasoning model. From the benchmark numbers, it's clear that o-series reasoning models perform exceptionally well, even on older base models.

Nevertheless, in every aspect, GPT-4.5 performs better than GPT-4o while being 10x more efficient. It has a refined personality, produces superior writing, and has broader world knowledge. Now, anticipation builds for the unified GPT-5 AI system, which will integrate the o3 reasoning model and is likely to be released in May this year.
