🔴 High Significance

Model Releases

🔴 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare | score 95 | Sources: huggingface

Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updates. In practice, large group sizes are not feasible due to computational limits, which biases learning toward trajectories that are already likely. Smaller group
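As a rough illustration of group-based advantage estimation, the sketch below normalizes each sampled reward against the mean and standard deviation of its own group. This is the widely used GRPO-style baseline, not the F-GRPO method from the paper; the function name `group_advantages` and the `eps` parameter are hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (illustrative only)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


# If no sampled rollout is rewarded, every advantage is zero,
# so the group contributes no learning signal.
print(group_advantages([0.0, 0.0, 0.0, 0.0]))

# A single rewarded rollout gets a positive advantage; the rest are negative.
print(group_advantages([1.0, 0.0, 0.0, 0.0]))
```

With a small group, the all-zero case becomes more likely whenever correct trajectories are rare, which is one way to picture the bias toward already-likely trajectories that the abstract describes.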

Developer Tools

🔴 AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders | score 80 | Sources: huggingface

Sparse Autoencoders (SAEs) are powerful tools for interpreting neural representations, yet their use in audio remains underexplored. We train SAEs across all encoder layers of Whisper and HuBERT, provide an extensive evaluation of their stability and interpretability, and show their practical utility.

🔴 On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models | score 80 | Sources: huggingface

Entropy serves as a critical metric for measuring the diversity of outputs generated by large language models (LLMs), providing valuable insights into their exploration capabilities. While recent studies increasingly focus on monitoring and adjusting entropy to better balance exploration and exploit
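For readers unfamiliar with the metric, here is a minimal sketch of Shannon entropy over a next-token probability distribution. It is a generic illustration, not the analysis from the paper; the function name `shannon_entropy` and the example distributions are assumptions made for this example.

```python
import math


def shannon_entropy(probs):
    """Entropy in nats of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# A uniform distribution over four tokens is maximally diverse.
uniform = [0.25, 0.25, 0.25, 0.25]
# A peaked distribution concentrates nearly all mass on one token.
peaked = [0.97, 0.01, 0.01, 0.01]

print(shannon_entropy(uniform))
print(shannon_entropy(peaked))
```

Higher entropy means the model spreads probability over more candidate tokens, which is why it is often read as a proxy for exploration.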

🟡 Notable

Model Releases

🟡 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making | score 60 | Sources: huggingface

We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipel

🟡 Testing ads in ChatGPT | score 50 | Sources: lab_blog/OpenAI

OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control.

🟡 Bringing ChatGPT to GenAI.mil | score 50 | Sources: lab_blog/OpenAI

OpenAI for Government announces the deployment of a custom ChatGPT on GenAI.mil, bringing secure, safety-forward AI to U.S. defense teams.

🟡 Accelerating Mathematical and Scientific Discovery with Gemini Deep Think | score 50 | Sources: lab_blog/DeepMind

Research papers point to the growing impact of Deep Think across fields.

Developer Tools

🟡 OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions | score 60 | Sources: huggingface

The rapid advancement of Large Language Models (LLMs) has catalyzed the development of autonomous agents capable of navigating complex environments. However, existing evaluations primarily adopt a deductive paradigm, where agents execute tasks based on explicitly provided rules and static goals, oft

🟡 Pisets: A Robust Speech Recognition System for Lectures and Interviews | score 45 | Sources: huggingface

This work presents a speech-to-text system "Pisets" for scientists and journalists which is based on a three-component architecture aimed at improving speech recognition accuracy while minimizing errors and hallucinations associated with the Whisper model. The architecture comprises primary recognit

🟢 Incremental

Model Releases

🟢 MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration | score 25 | Sources: huggingface

Training instability remains a critical challenge in large language model (LLM) pretraining, often manifesting as sudden gradient explosions that waste significant computational resources. We study training failures in a 5M-parameter NanoGPT model scaled via μP, identifying two key phenomena precedi

🟢 Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math | score 5 | Sources: huggingface

Recent progress in reasoning models suggests that generating plausible attempts for research-level mathematics may be within reach, but verification remains a bottleneck, consuming scarce expert time. We hypothesize that a meaningful solution should contain enough method-level information that, when

Developer Tools

🟢 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos | score 35 | Sources: huggingface

Being able to simulate the outcomes of actions in varied environments will revolutionize the development of generalist agents at scale. However, modeling these world dynamics, especially for dexterous robotics tasks, poses significant challenges due to limited data coverage and scarce action labels.

🟢 Self-Improving World Modelling with Latent Actions | score 15 | Sources: huggingface

Internal modelling of the world -- predicting transitions between previous states X and next states Y under actions Z -- is essential to reasoning and planning for LLMs and VLMs. Learning such models typically requires costly action-labelled trajectories. We propose SWIRL, a self-improvement framewo

📄 New Papers

| Title | Category | Score | Link |
| --- | --- | --- | --- |
| F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare | model_release | 77 | Open |
| AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders | developer_tool | 66 | Open |
| On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models | developer_tool | 66 | Open |
| Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making | model_release | 64 | Open |
| OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions | developer_tool | 64 | Open |
| Charting the Future of AI-supported Science Education: A Human-Centered Vision | cs.AI | 0 | Open |
| Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning | cs.AI | 0 | Open |
| Physiologically Informed Deep Learning: A Multi-Scale Framework for Next-Generation PBPK Modeling | cs.AI | 0 | Open |
| Agent Mars: Multi-Agent Simulation for Multi-Planetary Life Exploration and Settlement | cs.AI | 0 | Open |
| Nexus: Inferring Join Graphs from Metadata Alone via Iterative Low-Rank Matrix Completion | cs.AI | 0 | Open |
| Large Language Models in Peer-Run Community Behavioral Health Services: Understanding Peer Specialists and Service Users' Perspectives on Opportunities, Risks, and Mitigation Strategies | cs.AI | 0 | Open |
| Dreaming in Code for Curriculum Learning in Open-Ended Worlds | cs.AI | 0 | Open |
| ConceptRM: The Quest to Mitigate Alert Fatigue through Consensus-Based Purity-Driven Data Cleaning for Reflection Modelling | cs.AI | 0 | Open |
| DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning | cs.AI | 0 | Open |
| RECUR: Resource Exhaustion Attack via Recursive-Entropy Guided Counterfactual Utilization and Reflection | cs.AI | 0 | Open |

๐Ÿข Lab Blog Posts