🔴 High Significance

Model Releases

🔴 HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning - score 85 Sources: huggingface

VLMs show strong multimodal capabilities, but they still struggle with fine-grained vision-language reasoning. We find that long CoT reasoning exposes diverse failure modes, including perception, reasoning, knowledge, and hallucination errors, which can compound across intermediate steps. However, m

Developer Tools

🔴 Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models - score 95 Sources: huggingface

Distilled autoregressive (AR) video models enable efficient streaming generation but frequently misalign with human visual preferences. Existing reinforcement learning (RL) frameworks are not naturally suited to these architectures, typically requiring either expensive re-distillation or solver-coup

🔴 TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation - score 75 Sources: huggingface

Vision-language models (VLMs) have shown promise in earth observation (EO), yet they struggle with tasks that require grounding complex spatial reasoning in precise pixel-level visual representations. To address this problem, we introduce TerraScope, a unified VLM that delivers pixel-grounded geospa

🟡 Notable

Model Releases

🟡 ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models - score 55 Sources: huggingface

Effective collaboration begins with knowing when to ask for help. For example, when trying to identify an occluded object, a human would ask someone to remove the obstruction. Can MLLMs exhibit a similar "proactive" behavior by requesting simple user interventions? To investigate this, we introduce

Developer Tools

🟡 Hyperagents - score 65 Sources: huggingface

Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limiting how fast such systems can improve. The Darwi

🟡 Creating with Sora Safely - score 50 Sources: lab_blog/OpenAI

To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we've built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.

🟡 The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus - score 45 Sources: huggingface

LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and recursively solving subproblems. Yet existing RLMs depend on an open-ended read-eval-print loop (REPL)

🟢 Incremental

Model Releases

🟢 A Subgoal-driven Framework for Improving Long-Horizon LLM Agents - score 5 Sources: huggingface

Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly cha

Developer Tools

🟢 FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow - score 35 Sources: huggingface

Scene generation has extensive industrial applications, demanding both high realism and precise control over geometry and appearance. Language-driven retrieval methods compose plausible scenes from a large object database, but overlook object-level control and often fail to enforce scene-level style

🟢 LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation - score 20 Sources: huggingface

Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods

🟢 Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck - score 20 Sources: huggingface

Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods, which reduce cost via fine-tuning with heuristic length penalties, suppress both essential reasoning and redundant filler. We recast efficient rea

📄 New Papers

| Title | Category | Score | Link |
| --- | --- | --- | --- |
| Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models | developer_tool | 116 | Open |
| HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning | model_release | 114 | Open |
| TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation | developer_tool | 56 | Open |
| Hyperagents | developer_tool | 55 | Open |
| ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models | model_release | 45 | Open |
| When Documents Disagree: Measuring Institutional Variation in Transplant Guidance with Retrieval-Augmented Language Models | cs.AI | 0 | Open |
| DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment | cs.AI | 0 | Open |
| Beyond Correlation: Refutation-Validated Aspect-Based Sentiment Analysis for Explainable Energy Market Returns | cs.AI | 0 | Open |
| Unified-MAS: Universally Generating Domain-Specific Nodes for Empowering Automatic Multi-Agent Systems | cs.AI | 0 | Open |
| Alignment as Institutional Design: From Behavioral Correction to Transaction Structure in Intelligent Systems | cs.AI | 0 | Open |
| Effective Strategies for Asynchronous Software Engineering Agents | cs.AI | 0 | Open |
| RuntimeSlicer: Towards Generalizable Unified Runtime State Representation for Failure Management | cs.AI | 0 | Open |
| A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical Systems | cs.AI | 0 | Open |
| Implicit Humanization in Everyday LLM Moral Judgments | cs.AI | 0 | Open |
| Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks | cs.AI | 0 | Open |

๐Ÿข Lab Blog Posts