    Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase

    March 7, 2025

    The Qwen team at Alibaba has unveiled QwQ-32B, a 32-billion-parameter AI model whose performance rivals that of the much larger DeepSeek-R1. The result highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.

    The Qwen team has also integrated agent capabilities into the reasoning model, enabling it to think critically, utilise tools, and adapt its reasoning based on environmental feedback.

    “Scaling RL has the potential to enhance model performance beyond conventional pretraining and post-training methods,” the team stated. “Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.”

    QwQ-32B achieves performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated), a testament to the effectiveness of RL when applied to robust foundation models pretrained on extensive world knowledge. This remarkable outcome underscores the potential of RL to bridge the gap between model size and performance.
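    To put the size gap in concrete terms, the short sketch below works through the parameter counts quoted above. It assumes QwQ-32B is a dense model in which all 32 billion parameters are active per token; the article does not state this, and the variable names are illustrative only.

```python
# Parameter counts quoted in the article, in billions.
QWQ_PARAMS_B = 32          # QwQ-32B (assumed dense: all parameters active)
R1_TOTAL_PARAMS_B = 671    # DeepSeek-R1, total parameters
R1_ACTIVE_PARAMS_B = 37    # DeepSeek-R1, parameters activated per token

# How many times larger DeepSeek-R1 is by total parameter count.
total_ratio = R1_TOTAL_PARAMS_B / QWQ_PARAMS_B

# The same comparison using only the parameters DeepSeek-R1 activates.
active_ratio = R1_ACTIVE_PARAMS_B / QWQ_PARAMS_B

print(f"Total parameters:  DeepSeek-R1 is {total_ratio:.1f}x QwQ-32B")
print(f"Active parameters: DeepSeek-R1 is {active_ratio:.2f}x QwQ-32B")
```

    By total parameters the gap is roughly 21 to 1, while by activated parameters the two models are much closer in size, which is worth keeping in mind when reading the comparison.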

    The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities.

    The results highlight QwQ-32B’s performance in comparison to other leading models, including DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.

    Benchmark results:

    • AIME24: QwQ-32B scored 79.5, just behind DeepSeek-R1-671B’s 79.8 and well ahead of OpenAI-o1-mini’s 63.6 and the distilled models.
    • LiveCodeBench: QwQ-32B scored 63.4, close behind DeepSeek-R1-671B’s 65.9 and ahead of the distilled models and OpenAI-o1-mini’s 53.8.
    • LiveBench: QwQ-32B scored 73.1, ahead of DeepSeek-R1-671B’s 71.6, the distilled models, and OpenAI-o1-mini’s 57.5.
    • IFEval: QwQ-32B scored 83.9, narrowly ahead of DeepSeek-R1-671B’s 83.3 and well ahead of the distilled models and OpenAI-o1-mini’s 59.1.
    • BFCL: QwQ-32B scored 66.4, ahead of DeepSeek-R1-671B’s 62.8, the distilled models, and OpenAI-o1-mini’s 49.3.
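    The head-to-head picture against DeepSeek-R1 is easier to see as per-benchmark gaps. The sketch below simply tabulates the scores listed above; the dictionary layout and variable names are illustrative, and scores for the distilled models are omitted because the article does not give them.

```python
# Scores as reported in the article: (QwQ-32B, DeepSeek-R1, o1-mini).
scores = {
    "AIME24":        (79.5, 79.8, 63.6),
    "LiveCodeBench": (63.4, 65.9, 53.8),
    "LiveBench":     (73.1, 71.6, 57.5),
    "IFEval":        (83.9, 83.3, 59.1),
    "BFCL":          (66.4, 62.8, 49.3),
}

# Gap between QwQ-32B and DeepSeek-R1; positive means QwQ-32B leads.
gaps = {name: round(qwq - r1, 1) for name, (qwq, r1, _) in scores.items()}

# Benchmarks on which QwQ-32B scores higher than DeepSeek-R1.
qwq_leads = [name for name, gap in gaps.items() if gap > 0]

for name, gap in gaps.items():
    print(f"{name:<14} {gap:+.1f}")
print(f"QwQ-32B leads on {len(qwq_leads)} of {len(scores)} benchmarks")
```

    On these figures QwQ-32B is ahead of DeepSeek-R1 on LiveBench, IFEval, and BFCL, and behind on AIME24 and LiveCodeBench, with every gap under four points.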

    The Qwen team’s approach involved a cold-start checkpoint and a multi-stage RL process driven by outcome-based rewards. The initial stage focused on scaling RL for math and coding tasks, utilising accuracy verifiers and code execution servers. The second stage expanded to general capabilities, incorporating rewards from general reward models and rule-based verifiers.
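    An outcome-based reward scores only the final result, not the reasoning that led to it. The sketch below illustrates the idea for the two first-stage task types. It is not the Qwen team's implementation, which the article does not describe: the function names `math_reward` and `code_reward`, the answer normalisation, and the use of a local subprocess in place of a code execution server are all assumptions made for illustration.

```python
import subprocess
import sys


def math_reward(model_answer: str, reference: str) -> float:
    """Hypothetical accuracy verifier: 1.0 only if the final answer matches."""
    def normalize(text: str) -> str:
        # Ignore surrounding whitespace, a trailing full stop, and inner spaces.
        return text.strip().rstrip(".").replace(" ", "")

    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0


def code_reward(solution: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Hypothetical execution check: 1.0 only if every test assertion passes."""
    program = solution + "\n" + test_code
    try:
        # Run the candidate and its tests in a separate Python process.
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0.0  # A solution that does not finish earns no reward.
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    print(math_reward(" 42. ", "42"))
    print(code_reward("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"))
```

    A production system would need sandboxing before executing model-generated code; the subprocess call here only isolates the run in a separate process and enforces a timeout.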

    “We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding,” the team explained.

    QwQ-32B is open-weight and available on Hugging Face and ModelScope under the Apache 2.0 license, and is also accessible via Qwen Chat. The Qwen team views this as an initial step in scaling RL to enhance reasoning capabilities and aims to further explore the integration of agents with RL for long-horizon reasoning.

    “As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI),” the team stated.
