China's DeepSeek launches next-gen AI model. Here's what makes it different

Chinese startup DeepSeek's latest experimental model promises to increase efficiency and improve AI's ability to handle a lot of information at a fraction of the cost, but questions remain over how effective and safe the architecture is.  

DeepSeek sent Silicon Valley into a frenzy when it launched its R1 reasoning model out of nowhere in January 2025, showing that it's possible to train large language models (LLMs) quickly, on less powerful chips, using fewer resources.

On Monday, the company released DeepSeek-V3.2-Exp, an experimental version of its current model, DeepSeek-V3.1-Terminus. The release builds further on its mission to increase efficiency in AI systems, according to a post on the AI platform Hugging Face.

"DeepSeek V3.2 continues the focus on efficiency, cost reduction, and open-source sharing," Adina Yakefu, Chinese community lead at Hugging Face, told CNBC. "The big improvement is a new feature called DSA (DeepSeek Sparse Attention), which makes the AI better at handling long documents and conversations. It also cuts the cost of running the AI in half compared to the previous version."

"It's significant because it should make the model faster and more cost-effective to use without a noticeable drop in performance," said Nick Patience, vice president and practice lead for AI at The Futurum Group. "This makes powerful AI more accessible to developers, researchers, and smaller companies, potentially leading to a wave of new and innovative applications."

The pros and cons of sparse attention 

An AI model makes decisions based on its training data and new information, such as a prompt. Say an airline wants to find the best route from A to B. While there are many options, not all are feasible. By filtering out the less viable routes, you dramatically reduce the amount of time, fuel and, ultimately, money needed to make the journey. That is exactly what sparse attention does: it only factors in the data it thinks is important given the task at hand, as opposed to the dense attention used by most models so far, which weighs every piece of the input against every other.
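To make the idea concrete, the sketch below compares standard (dense) attention with one generic form of sparse attention, top-k selection, for a single query. It is an illustration under stated assumptions, not DeepSeek's DSA: the function names, the toy vectors and the value of `k` are all hypothetical. For simplicity the sketch still scores every key with a full dot product before discarding most of them; practical sparse-attention systems generally use a cheaper step to decide which tokens to keep.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: turns scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], values: List[Vector]) -> Vector:
    """Blend the value vectors according to the attention weights."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def dense_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Dense attention: the query weighs every key in the input."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    return weighted_sum(softmax(scores), values)


def topk_sparse_attention(
    query: Vector, keys: List[Vector], values: List[Vector], k: int
) -> Vector:
    """Hypothetical top-k sparse attention: keep only the k best-scoring keys
    and ignore the rest entirely."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Indices of the k highest scores; every other key is dropped.
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in kept])
    return weighted_sum(weights, [values[i] for i in kept])


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    print("dense: ", dense_attention(query, keys, values))
    print("sparse:", topk_sparse_attention(query, keys, values, k=2))
```

With `k=2`, the two keys pointing away from the query are discarded, so the output is built only from the first two values, while the dense version still mixes in a little of every value. Over a context of L tokens, dense attention compares roughly L × L pairs, whereas keeping k tokens per query cuts that to roughly L × k, which is where the savings come from. The same selection step is also where the risk lies: if the scoring wrongly ranks an important token outside the top k, the model never sees it.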

"So basically, you cut out things that you think are not important," said Ekaterina Almasque, the cofounder and managing partner of new venture capital fund BlankPage Capital.

Sparse attention is a boon for efficiency and for scaling AI, since fewer resources are needed, but one concern is that it could make models less reliable because there is little oversight of how and why it discounts information.

"The reality is, they [sparse attention models] have lost a lot of nuances," said Almasque, who was an early supporter of Dataiku and Darktrace, and an investor in Graphcore. "And then the real question is, did they have the right mechanism to exclude not important data, or is there a mechanism excluding really important data, and then the outcome will be much less relevant?"

This could be particularly problematic for AI safety and inclusivity, the investor noted, adding that it may not be "the optimal one or the safest" AI model to use compared with competitors or traditional architectures. 

DeepSeek, however, says the experimental model works on par with its V3.1-Terminus.

Despite speculation of a bubble forming, AI remains at the centre of geopolitical competition, with the U.S. and China vying for the winning spot. Yakefu noted that DeepSeek's models work "right out of the box" with Chinese-made AI chips, such as Ascend and Cambricon, meaning they can run locally on domestic hardware without any extra setup.

DeepSeek also shared the actual programming code and tools needed to use the experimental model, she said. "This means other people can learn from it and build their own improvements."

But for Almasque, the very nature of this means the tech may not be defensible. "The approach is not super new," she said, noting the industry has been "talking about sparse models since 2015" and that DeepSeek cannot patent its technology because it is open source. DeepSeek's competitive edge, therefore, must lie in how it decides what information to include, she added.

The company itself acknowledges V3.2-Exp is an "intermediate step toward our next-generation architecture," per the Hugging Face post.

As Patience pointed out, "this is DeepSeek's value prop all over: efficiency is becoming as important as raw power."

"DeepSeek is playing the long game to keep the community invested in their progress," Yakefu added. "People will always go for what is cheap, reliable, and effective."
