Posted by z3d in ArtInt (edited)

A Chinese lab has created what appears to be one of the most powerful “open” AI models to date.

The model, DeepSeek V3, was developed by the AI firm DeepSeek and was released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.

DeepSeek V3 can handle a range of text-based workloads and tasks, like coding, translating, and writing essays and emails from a descriptive prompt.

According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both downloadable, “openly” available models and “closed” AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek V3 outperforms other models, including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.

DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are the small chunks of raw data a model actually reads; 1 million tokens equals about 750,000 words of English text.
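To make the token figure concrete, here is a minimal sketch of counting tokens with the Hugging Face transformers library. The model ID deepseek-ai/DeepSeek-V3 is the repo the article alludes to on Hugging Face; the same pattern works with any tokenizer, and the tokens-per-word ratio it prints is an illustration, not an official figure.

```python
# Minimal sketch: counting tokens the way training-data sizes are measured.
# Assumes the `transformers` library is installed and that the
# "deepseek-ai/DeepSeek-V3" repo on Hugging Face exposes a tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-V3", trust_remote_code=True
)

text = "DeepSeek V3 can handle coding, translating, and writing tasks."
token_ids = tokenizer.encode(text)

print(f"{len(token_ids)} tokens for {len(text.split())} words")
# English prose typically comes out around 1.3 tokens per word, which is
# where the "1 million tokens ~ 750,000 words" rule of thumb comes from.
```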

It’s not just the training set that’s massive. DeepSeek V3 is enormous in size: 671 billion parameters, or 685 billion on AI dev platform Hugging Face. (Parameters are the internal variables models use to make predictions or decisions.) That’s around 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.
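As for the parameter figure, a parameter count is simply the total number of elements across all of a model's weight tensors. A minimal PyTorch sketch of that tally, using a tiny stand-in network (loading the real 671-billion-parameter model would take hundreds of gigabytes of memory):

```python
# Minimal sketch: how counts like "671 billion parameters" are tallied.
# Sums the elements of every weight tensor in the model.
import torch.nn as nn

model = nn.Sequential(      # tiny stand-in for a real LLM
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # weights + biases across all layers
```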
