🔴 High Significance
Model Releases
🔴 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare - score 95
Sources: huggingface
Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estimate advantages and stabilize policy updates. In practice, large group sizes are not feasible due to computational limits, which biases learning toward trajectories that are already likely. Smaller group
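As a rough illustration of the group-sampling idea in this abstract, the sketch below standardizes each sampled response's reward against the mean and standard deviation of its own group, which is the common GRPO-style estimate. It is not F-GRPO itself; the function name, the `eps` constant, and the reward values are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its own sampling group.

    Common GRPO-style estimate (an assumption here, not the paper's method):
    advantage_i = (r_i - mean(group)) / (std(group) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verifiable rewards for one prompt (1 = correct, 0 = incorrect).
mixed_group = [1.0, 0.0, 0.0, 0.0]    # one rare correct sample was drawn
uniform_group = [0.0, 0.0, 0.0, 0.0]  # the rare correct sample was missed

mixed_adv = group_relative_advantages(mixed_group)
uniform_adv = group_relative_advantages(uniform_group)

print(mixed_adv)    # positive for the correct sample, negative for the rest
print(uniform_adv)  # all zeros: this group carries no learning signal
```

With a small group, a rare correct trajectory is often not sampled at all, so the group looks like `uniform_group` and contributes nothing to the update.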
Developer Tools
🔴 AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders - score 80
Sources: huggingface
Sparse Autoencoders (SAEs) are powerful tools for interpreting neural representations, yet their use in audio remains underexplored. We train SAEs across all encoder layers of Whisper and HuBERT, provide an extensive evaluation of their stability and interpretability, and show their practical utility.
🔴 On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models - score 80
Sources: huggingface
Entropy serves as a critical metric for measuring the diversity of outputs generated by large language models (LLMs), providing valuable insights into their exploration capabilities. While recent studies increasingly focus on monitoring and adjusting entropy to better balance exploration and exploit
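For readers less familiar with the metric, here is a minimal sketch of Shannon entropy over a single next-token distribution, H = -Σ p·log p. The two distributions are made-up examples, and the paper's own definitions and estimators may differ.

```python
import math


def shannon_entropy(probs):
    """Entropy in nats of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical next-token distributions over a 4-token vocabulary.
peaked = [0.97, 0.01, 0.01, 0.01]   # nearly deterministic policy
uniform = [0.25, 0.25, 0.25, 0.25]  # maximally diverse policy

print(shannon_entropy(peaked))
print(shannon_entropy(uniform))  # equals log(4), the maximum for 4 outcomes
```

A lower value means the policy concentrates its probability on few tokens, which is the loss of output diversity that entropy monitoring is meant to catch.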
🟡 Notable
Model Releases
🟡 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making - score 60
Sources: huggingface
We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipel
🟡 Testing ads in ChatGPT - score 50
Sources: lab_blog/OpenAI
OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control.
🟡 Bringing ChatGPT to GenAI.mil - score 50
Sources: lab_blog/OpenAI
OpenAI for Government announces the deployment of a custom ChatGPT on GenAI.mil, bringing secure, safety-forward AI to U.S. defense teams.
🟡 Accelerating Mathematical and Scientific Discovery with Gemini Deep Think - score 50
Sources: lab_blog/DeepMind
Research papers point to the growing impact of Deep Think across fields.
Developer Tools
🟡 OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions - score 60
Sources: huggingface
The rapid advancement of Large Language Models (LLMs) has catalyzed the development of autonomous agents capable of navigating complex environments. However, existing evaluations primarily adopt a deductive paradigm, where agents execute tasks based on explicitly provided rules and static goals, oft
🟡 Pisets: A Robust Speech Recognition System for Lectures and Interviews - score 45
Sources: huggingface
This work presents a speech-to-text system "Pisets" for scientists and journalists which is based on a three-component architecture aimed at improving speech recognition accuracy while minimizing errors and hallucinations associated with the Whisper model. The architecture comprises primary recognit
🟢 Incremental
Model Releases
🟢 MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration - score 25
Sources: huggingface
Training instability remains a critical challenge in large language model (LLM) pretraining, often manifesting as sudden gradient explosions that waste significant computational resources. We study training failures in a 5M-parameter NanoGPT model scaled via μP, identifying two key phenomena precedi
🟢 Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math - score 5
Sources: huggingface
Recent progress in reasoning models suggests that generating plausible attempts for research-level mathematics may be within reach, but verification remains a bottleneck, consuming scarce expert time. We hypothesize that a meaningful solution should contain enough method-level information that, when
Developer Tools
🟢 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos - score 35
Sources: huggingface
Being able to simulate the outcomes of actions in varied environments will revolutionize the development of generalist agents at scale. However, modeling these world dynamics, especially for dexterous robotics tasks, poses significant challenges due to limited data coverage and scarce action labels.
🟢 Self-Improving World Modelling with Latent Actions - score 15
Sources: huggingface
Internal modelling of the world -- predicting transitions between previous states X and next states Y under actions Z -- is essential to reasoning and planning for LLMs and VLMs. Learning such models typically requires costly action-labelled trajectories. We propose SWIRL, a self-improvement framewo
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare | model_release | 77 | Open |
| AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders | developer_tool | 66 | Open |
| On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models | developer_tool | 66 | Open |
| Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making | model_release | 64 | Open |
| OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions | developer_tool | 64 | Open |
| Charting the Future of AI-supported Science Education: A Human-Centered Vision | cs.AI | 0 | Open |
| Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning | cs.AI | 0 | Open |
| Physiologically Informed Deep Learning: A Multi-Scale Framework for Next-Generation PBPK Modeling | cs.AI | 0 | Open |
| Agent Mars: Multi-Agent Simulation for Multi-Planetary Life Exploration and Settlement | cs.AI | 0 | Open |
| Nexus: Inferring Join Graphs from Metadata Alone via Iterative Low-Rank Matrix Completion | cs.AI | 0 | Open |
| Large Language Models in Peer-Run Community Behavioral Health Services: Understanding Peer Specialists and Service Users' Perspectives on Opportunities, Risks, and Mitigation Strategies | cs.AI | 0 | Open |
| Dreaming in Code for Curriculum Learning in Open-Ended Worlds | cs.AI | 0 | Open |
| ConceptRM: The Quest to Mitigate Alert Fatigue through Consensus-Based Purity-Driven Data Cleaning for Reflection Modelling | cs.AI | 0 | Open |
| DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning | cs.AI | 0 | Open |
| RECUR: Resource Exhaustion Attack via Recursive-Entropy Guided Counterfactual Utilization and Reflection | cs.AI | 0 | Open |