🔴 High Significance
Model Releases
🔴 From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models - score 85
Sources: huggingface
As Large Multimodal Models (LMMs) scale up and reinforcement learning (RL) methods mature, LMMs have made notable progress in complex reasoning and decision making. Yet training still relies on static data and fixed recipes, making it difficult to diagnose capability blind spots or provide dynamic,
🔴 MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios - score 75
Sources: huggingface
Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse rout
Developer Tools
🔴 The Trinity of Consistency as a Defining Principle for General World Models - score 95
Sources: huggingface
The construction of World Models capable of learning, simulating, and reasoning about objective physical laws constitutes a foundational challenge in the pursuit of Artificial General Intelligence. Recent advancements represented by video generation models like Sora have demonstrated the potential o
🟡 Notable
Model Releases
🟡 Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock - score 50
Sources: lab_blog/OpenAI
Stateful Runtime for Agents in Amazon Bedrock brings persistent orchestration, memory, and secure execution to multi-step AI workflows powered by OpenAI.
Developer Tools
🟡 OmniGAIA: Towards Native Omni-Modal AI Agents - score 65
Sources: huggingface
Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to interact with the world. However, current multi-modal LLMs are primarily confined to bi-modal interactions (e.g., vision-language), lacking the unified c
🟡 OpenAI and Amazon announce strategic partnership - score 50
Sources: lab_blog/OpenAI
OpenAI and Amazon announce a strategic partnership bringing OpenAI's Frontier platform to AWS, expanding AI infrastructure, custom models, and enterprise AI agents.
🟡 Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization - score 45
Sources: huggingface
Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. While prior methods exploit pretrained knowledge, they fail in environments requiring the discovery of novel states. We propose Exploratory Memory-Augmented On- and Off-Policy Optimization (EM
Infrastructure & Compute
🟡 Scaling AI for everyone - score 50
Sources: lab_blog/OpenAI
Today we're announcing $110B in new investment at a $730B pre-money valuation. This includes $30B from SoftBank, $30B from NVIDIA, and $50B from Amazon.
Other Signals
🟡 Imagination Helps Visual Reasoning, But Not Yet in Latent Space - score 55
Sources: huggingface
Latent visual reasoning aims to mimic the human imagination process by meditating through the hidden states of Multimodal Large Language Models. While recognized as a promising paradigm for visual reasoning, the underlying mechanisms driving its effectiveness remain unclear. Motivated to demystify the tru
🟡 Joint Statement from OpenAI and Microsoft - score 50
Sources: lab_blog/OpenAI
Microsoft and OpenAI continue to work closely across research, engineering, and product development, building on years of deep collaboration and shared success.
🟡 An update on our mental health-related work - score 50
Sources: lab_blog/OpenAI
OpenAI shares updates on its mental health safety work, including parental controls, trusted contacts, improved distress detection, and recent litigation developments.
🟢 Incremental
Model Releases
🟢 AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning - score 35
Sources: huggingface
While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information generated by individual participants. Current solutions often resort to rigid structural engineering or expensive fine-tuning, limiting their deployability and adaptability. We
Developer Tools
🟢 Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization - score 25
Sources: huggingface
Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose Search More, Think Less
🟢 MediX-R1: Open Ended Medical Reinforcement Learning - score 15
Sources: huggingface
We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a compos
Infrastructure & Compute
🟢 VGG-T^3: Offline Feed-Forward 3D Reconstruction at Scale - score 5
Sources: huggingface
We present a scalable 3D reconstruction model that addresses a critical limitation in offline feed-forward methods: their computational and memory requirements grow quadratically w.r.t. the number of input images. Our approach is built on the key insight that this bottleneck stems from the varying-l
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| The Trinity of Consistency as a Defining Principle for General World Models | developer_tool | 206 | Open |
| From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models | model_release | 155 | Open |
| MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios | model_release | 111 | Open |
| OmniGAIA: Towards Native Omni-Modal AI Agents | developer_tool | 57 | Open |
| Imagination Helps Visual Reasoning, But Not Yet in Latent Space | other | 47 | Open |
| Flowette: Flow Matching with Graphette Priors for Graph Generation | cs.AI | 0 | Open |
| Evidential Neural Radiance Fields | cs.AI | 0 | Open |
| CycleBEV: Regularizing View Transformation Networks via View Cycle Consistency for Bird's-Eye-View Semantic Segmentation | cs.AI | 0 | Open |
| Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem | cs.AI | 0 | Open |
| BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation | cs.AI | 0 | Open |
| SDMixer: Sparse Dual-Mixer for Time Series Forecasting | cs.AI | 0 | Open |
| Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression | cs.AI | 0 | Open |
| Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning | cs.AI | 0 | Open |
| Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models | cs.AI | 0 | Open |
| Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics | cs.AI | 0 | Open |