DeepSeek: A Small Chinese Company Is Shaking Up AI Titans

Details

Date Published
28 Jan 2025
Priority Score
3
Australian
No
Created
8 Mar 2025, 02:41 pm

Authors (1)

Tongliang Liu

Description

Small Chinese tech company DeepSeek has shaken the artificial intelligence industry. Here's how it did it and what that means.

Summary

The article highlights the achievements of DeepSeek, a Chinese AI company that has released highly efficient AI models capable of competing with U.S. giants like OpenAI and Anthropic. Despite using significantly fewer resources, DeepSeek's models have shown impressive performance, notably its V3 and R1 models, which excel in problem-solving and reasoning tasks. These advancements have created a significant impact in the AI industry, prompting a reassessment of market values and highlighting the potential for more democratized AI research and usage through its open-source licensing. The implications for AI safety and governance are not directly addressed, although more efficient models could indirectly make AI safety research more accessible.

Body

DeepSeek: how a small Chinese AI company is shaking up US tech heavyweights

Tongliang Liu
Jan 28, 2025, updated Jan 29, 2025

The federal government has banned the use of DeepSeek services on government devices.

Chinese artificial intelligence (AI) company DeepSeek has sent shockwaves through the tech community with the release of extremely efficient AI models that can compete with cutting-edge products from US companies such as OpenAI and Anthropic.

Founded in 2023, DeepSeek has achieved its results with a fraction of the cash and computing power of its competitors.

DeepSeek’s “reasoning” R1 model, released last week, provoked excitement among researchers, shock among investors, and responses from AI heavyweights. The company followed up on January 28 with a model that can work with images as well as text.

So what has DeepSeek done, and how did it do it?

What DeepSeek did

In December, DeepSeek released its V3 model. This is a very powerful “standard” large language model that performs at a similar level to OpenAI’s GPT-4o and Anthropic’s Claude 3.5.

While these models are prone to errors and sometimes make up their own facts, they can carry out tasks such as answering questions, writing essays and generating computer code. On some tests of problem-solving and mathematical reasoning, they score better than the average human.

V3 was trained at a reported cost of about US$5.58 million ($9 million). This is dramatically cheaper than GPT-4, for example, which cost more than US$100 million ($160 million) to develop.

DeepSeek also claims to have trained V3 using around 2,000 specialised computer chips, specifically H800 GPUs made by NVIDIA. This is far fewer than other companies, which may have used up to 16,000 of the more powerful H100 chips.

On January 20, DeepSeek released another model, called R1. This is a so-called “reasoning” model, which tries to work through complex problems step by step. These models seem to be better at many tasks that require context and have multiple interrelated parts, such as reading comprehension and strategic planning.

The R1 model is a tweaked version of V3, modified with a technique called reinforcement learning. R1 appears to work at a similar level to OpenAI’s o1, released last year.

DeepSeek also used the same technique to make “reasoning” versions of small open-source models that can run on home computers.

This release has sparked a huge surge of interest in DeepSeek, driving up the popularity of its V3-powered chatbot app and triggering a massive price crash in tech stocks as investors re-evaluate the AI industry. At the time of writing, chipmaker NVIDIA has lost around US$600 billion in value.

How DeepSeek did it

DeepSeek’s breakthroughs have been in achieving greater efficiency: getting good results with fewer resources. In particular, DeepSeek’s developers have pioneered two techniques that may be adopted by AI researchers more broadly.

The first has to do with a mathematical idea called “sparsity”. AI models have a lot of parameters that determine their responses to inputs (V3 has around 671 billion), but only a small fraction of these parameters is used for any given input.

However, predicting which parameters will be needed isn’t easy. DeepSeek used a new technique to do this, and then trained only those parameters. As a result, its models needed far less training than a conventional approach.

The other trick has to do with how V3 stores information in computer memory.
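The sparsity idea described above — using only a small fraction of a model's parameters for any given input — is commonly implemented with mixture-of-experts-style routing, where a small "router" picks which parameter blocks to activate per input. The following is a minimal illustrative sketch with toy sizes; the names, shapes and routing rule are assumptions for illustration, not DeepSeek's published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # parameter blocks ("experts") available
TOP_K = 2         # experts actually activated per input
DIM = 16          # toy hidden dimension

# In this sketch each "expert" is just a weight matrix.
experts = [rng.standard_normal((DIM, DIM)) * 0.1 for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((DIM, NUM_EXPERTS)) * 0.1

def sparse_forward(x):
    """Route input x through only TOP_K of the NUM_EXPERTS experts."""
    scores = x @ router_w              # router score for each expert
    top = np.argsort(scores)[-TOP_K:]  # indices of the best-scoring experts
    gates = np.exp(scores[top])
    gates /= gates.sum()               # normalise gate weights over the chosen experts
    # Only the chosen experts' parameters are touched for this input.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.standard_normal(DIM)
y = sparse_forward(x)
print(y.shape)  # (16,)
# Only TOP_K / NUM_EXPERTS = 25% of the expert parameters were used for this input.
```

Because each input touches only a couple of experts, both the compute per token and the gradients needed during training shrink, which is the efficiency effect the article describes.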
DeepSeek has found a clever way to compress the relevant data, so it is easier to store and access quickly.

What it means

DeepSeek’s models and techniques have been released under the free MIT License, which means anyone can download and modify them.

While this may be bad news for some AI companies – whose profits might be eroded by the existence of freely available, powerful models – it is great news for the broader AI research community.

At present, a lot of AI research requires access to enormous amounts of computing resources. Researchers like myself who are based at universities (or anywhere except large tech companies) have had limited ability to carry out tests and experiments.

More efficient models and techniques change the situation. Experimentation and development may now be significantly easier for us.

For consumers, access to AI may also become cheaper. More AI models may be run on users’ own devices, such as laptops or phones, rather than running “in the cloud” for a subscription fee.

For researchers who already have a lot of resources, more efficiency may have less of an effect. It is unclear whether DeepSeek’s approach will help to make models with better performance overall, or simply models that are more efficient.

Tongliang Liu, Associate Professor of Machine Learning and Director of the Sydney AI Centre, University of Sydney

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Topics: Artificial Intelligence, China