On November 11, 1918, the First World War ended with the signing of an armistice by Germany with the Allies. It was signed in Compiègne, France, and all fighting stopped at 11 AM on the 11th day of the 11th month, and this day is known as Armistice Day. Fast forward to 1 September 1939, when Germany invaded Poland, which marked the commencement of the 2nd World War. From the End of WWI to the start of WWII, it was 20 years, 9 months, 21 days,
DeepSeek launched its flagship product, DeepSeek v3, on 28 January 2025, surpassing the accolades won by OpenAI with its ChatGPT 4-o.
In recent days, several notable AI language models (LLMs) have been released, each introducing advancements over their predecessors. Below is a summary of these launches:
Over the past week, several notable AI language models (LLMs) have been released, each introducing advancements over previous iterations. Below is a summary of these launches:
Date | Model | Company | Key Improvements |
---|---|---|---|
January 27, 2025 | Janus-Pro | DeepSeek | Janus-Pro is an open-source multimodal AI model designed to handle both text and images. It features decoupled visual encoding for improved understanding and generation tasks, and an enhanced training process that includes extended learning on visual data and refined integration of text and visuals. These improvements enable Janus-Pro to outperform models like OpenAI’s DALL-E 3 in various benchmarks. businessinsider.com |
January 28, 2025 | Qwen2.5-1M | Alibaba | Qwen2.5-1M is an AI model capable of handling longer inputs, making it suitable for applications with high memory demands. This development allows the model to be deployed for AI agent applications requiring extended context handling. ft.com |
January 29, 2025 | Qwen 2.5-Max | Alibaba | Qwen 2.5-Max is an advanced version of Alibaba’s AI model, released unusually during the Lunar New Year. Alibaba claims that this model surpasses DeepSeek’s V3 across various benchmarks, indicating significant improvements in performance and capabilities. reuters.com |
January 30, 2025 | Tülu 3 | Allen Institute for AI (Ai2) | Tülu 3 is a fully open-source model that matches the capabilities of OpenAI’s GPT-4o and surpasses DeepSeek’s V3 in certain benchmarks. It introduces an advanced post-training methodology combining supervised fine-tuning, preference learning, and a novel reinforcement learning approach, enhancing its performance in complex reasoning tasks. venturebeat.com |
These releases reflect rapid advancements in AI language models, with each company introducing unique improvements to enhance performance, scalability, and applicability across various tasks.
These developments highlight the rapid progression in AI research, with companies striving to enhance performance, efficiency, and accessibility in their language models. Of course, they are not done after others published their LLMs; rather, it is a continuous effort by the teams who worked relentlessly. Some could be inspired politically – well, there must be a reason why Trump asked to meet with the founder of Nvidia. The question is when to pop the Champagne (the launch). The last (as of the time I wrote this article) were, not was, by the Chinese(s) , DeepSeek and Qwen-Alibaba (https://youtu.be/-t9tFsLlg04 ). Even OpenAi’s Sam Altman has to admit “OpenAI must learn from DeepSeek”.
Alibaba has recently unveiled its latest AI model, Qwen 2.5-Max, which it claims surpasses leading models such as DeepSeek V3, OpenAI’s GPT-4o, and Meta’s Llama-3.1-405B in various benchmarks. This model is designed to handle complex tasks across multiple domains, including natural language processing, coding, and reasoning. Notably, Qwen 2.5-Max employs a Mixture-of-Experts (MoE) architecture, activating only relevant parts of the model for specific tasks, thereby enhancing efficiency and scalability.
The release of Qwen 2.5-Max signifies a significant advancement in AI capabilities, particularly in its open-source accessibility, allowing developers to fine-tune and adapt the model for various applications. This development is expected to intensify competition in the AI industry, prompting other companies to accelerate their AI initiatives. The open-source nature of Qwen 2.5-Max also facilitates transparency and collaboration within the AI community, potentially leading to further innovations and applications across diverse sectors.
Following on Qwen’s path, Ai2’s Tülu 3 405B, a massive open-source AI model, has outperformed DeepSeek V3, GPT-4o, and Llama 3.1 405B on key benchmarks like PopQA, GSM8K, and MATH, proving that open models can rival top proprietary systems. Trained using 256 GPUs in parallel, Tülu 3 405B leverages advanced reinforcement learning techniques like RLVR to enhance accuracy in math, reasoning, and instruction-following.
What is the next?
Dr. Mak Wai Keong
No responses yet