🔴 High Significance

Model Releases

🔴 From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models - score 85 Sources: huggingface

As Large Multimodal Models (LMMs) scale up and reinforcement learning (RL) methods mature, LMMs have made notable progress in complex reasoning and decision making. Yet training still relies on static data and fixed recipes, making it difficult to diagnose capability blind spots or provide dynamic,

🔴 MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios - score 75 Sources: huggingface

Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse rout

Developer Tools

🔴 The Trinity of Consistency as a Defining Principle for General World Models - score 95 Sources: huggingface

The construction of World Models capable of learning, simulating, and reasoning about objective physical laws constitutes a foundational challenge in the pursuit of Artificial General Intelligence. Recent advancements represented by video generation models like Sora have demonstrated the potential o

🟡 Notable

Model Releases

🟡 Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock - score 50 Sources: lab_blog/OpenAI

Stateful Runtime for Agents in Amazon Bedrock brings persistent orchestration, memory, and secure execution to multi-step AI workflows powered by OpenAI.

Developer Tools

🟡 OmniGAIA: Towards Native Omni-Modal AI Agents - score 65 Sources: huggingface

Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to interact with the world. However, current multi-modal LLMs are primarily confined to bi-modal interactions (e.g., vision-language), lacking the unified c

🟡 OpenAI and Amazon announce strategic partnership - score 50 Sources: lab_blog/OpenAI

OpenAI and Amazon announce a strategic partnership bringing OpenAI’s Frontier platform to AWS, expanding AI infrastructure, custom models, and enterprise AI agents.

🟡 Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization - score 45 Sources: huggingface

Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. While prior methods exploit pretrained knowledge, they fail in environments requiring the discovery of novel states. We propose Exploratory Memory-Augmented On- and Off-Policy Optimization (EM

Infrastructure & Compute

🟡 Scaling AI for everyone - score 50 Sources: lab_blog/OpenAI

Today we’re announcing $110B in new investment at a $730B pre-money valuation. This includes $30B from SoftBank, $30B from NVIDIA, and $50B from Amazon.

Other Signals

🟡 Imagination Helps Visual Reasoning, But Not Yet in Latent Space - score 55 Sources: huggingface

Latent visual reasoning aims to mimic humans' imagination process by meditating through hidden states of Multimodal Large Language Models. While recognized as a promising paradigm for visual reasoning, the underlying mechanisms driving its effectiveness remain unclear. Motivated to demystify the tru

🟡 Joint Statement from OpenAI and Microsoft - score 50 Sources: lab_blog/OpenAI

Microsoft and OpenAI continue to work closely across research, engineering, and product development, building on years of deep collaboration and shared success.

🟡 An update on our mental health-related work - score 50 Sources: lab_blog/OpenAI

OpenAI shares updates on its mental health safety work, including parental controls, trusted contacts, improved distress detection, and recent litigation developments.

🟢 Incremental

Model Releases

🟢 AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning - score 35 Sources: huggingface

While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information generated by individual participants. Current solutions often resort to rigid structural engineering or expensive fine-tuning, limiting their deployability and adaptability. We

Developer Tools

🟢 Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization - score 25 Sources: huggingface

Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose Search More, Think Less

🟢 MediX-R1: Open Ended Medical Reinforcement Learning - score 15 Sources: huggingface

We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a compos

Infrastructure & Compute

🟢 VGG-T^3: Offline Feed-Forward 3D Reconstruction at Scale - score 5 Sources: huggingface

We present a scalable 3D reconstruction model that addresses a critical limitation in offline feed-forward methods: their computational and memory requirements grow quadratically w.r.t. the number of input images. Our approach is built on the key insight that this bottleneck stems from the varying-l

📄 New Papers

| Title | Category | Score | Link |
|---|---|---|---|
| The Trinity of Consistency as a Defining Principle for General World Models | developer_tool | 206 | Open |
| From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models | model_release | 155 | Open |
| MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios | model_release | 111 | Open |
| OmniGAIA: Towards Native Omni-Modal AI Agents | developer_tool | 57 | Open |
| Imagination Helps Visual Reasoning, But Not Yet in Latent Space | other | 47 | Open |
| Flowette: Flow Matching with Graphette Priors for Graph Generation | cs.AI | 0 | Open |
| Evidential Neural Radiance Fields | cs.AI | 0 | Open |
| CycleBEV: Regularizing View Transformation Networks via View Cycle Consistency for Bird's-Eye-View Semantic Segmentation | cs.AI | 0 | Open |
| Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem | cs.AI | 0 | Open |
| BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation | cs.AI | 0 | Open |
| SDMixer: Sparse Dual-Mixer for Time Series Forecasting | cs.AI | 0 | Open |
| Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression | cs.AI | 0 | Open |
| Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning | cs.AI | 0 | Open |
| Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models | cs.AI | 0 | Open |
| Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics | cs.AI | 0 | Open |

🏒 Lab Blog Posts