🔴 High Significance
Model Releases
🔴 HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning - score 85
Sources: huggingface
VLMs show strong multimodal capabilities, but they still struggle with fine-grained vision-language reasoning. We find that long CoT reasoning exposes diverse failure modes, including perception, reasoning, knowledge, and hallucination errors, which can compound across intermediate steps. However, m
Developer Tools
🔴 Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models - score 95
Sources: huggingface
Distilled autoregressive (AR) video models enable efficient streaming generation but frequently misalign with human visual preferences. Existing reinforcement learning (RL) frameworks are not naturally suited to these architectures, typically requiring either expensive re-distillation or solver-coup
🔴 TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation - score 75
Sources: huggingface
Vision-language models (VLMs) have shown promise in earth observation (EO), yet they struggle with tasks that require grounding complex spatial reasoning in precise pixel-level visual representations. To address this problem, we introduce TerraScope, a unified VLM that delivers pixel-grounded geospa
🟡 Notable
Model Releases
🟡 ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models - score 55
Sources: huggingface
Effective collaboration begins with knowing when to ask for help. For example, when trying to identify an occluded object, a human would ask someone to remove the obstruction. Can MLLMs exhibit a similar "proactive" behavior by requesting simple user interventions? To investigate this, we introduce
Developer Tools
🟡 Hyperagents - score 65
Sources: huggingface
Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limiting how fast such systems can improve. The Darwi
🟡 Creating with Sora Safely - score 50
Sources: lab_blog/OpenAI
To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we've built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.
🟡 The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus - score 45
Sources: huggingface
LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and recursively solving subproblems. Yet existing RLMs depend on an open-ended read-eval-print loop (REPL)
🟢 Incremental
Model Releases
🟢 A Subgoal-driven Framework for Improving Long-Horizon LLM Agents - score 5
Sources: huggingface
Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly cha
Developer Tools
🟢 FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow - score 35
Sources: huggingface
Scene generation has extensive industrial applications, demanding both high realism and precise control over geometry and appearance. Language-driven retrieval methods compose plausible scenes from a large object database, but overlook object-level control and often fail to enforce scene-level style
🟢 LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation - score 20
Sources: huggingface
Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods
🟢 Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck - score 20
Sources: huggingface
Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods, which reduce cost via fine-tuning with heuristic length penalties, suppress both essential reasoning and redundant filler. We recast efficient rea
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models | developer_tool | 116 | Open |
| HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning | model_release | 114 | Open |
| TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation | developer_tool | 56 | Open |
| Hyperagents | developer_tool | 55 | Open |
| ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models | model_release | 45 | Open |
| When Documents Disagree: Measuring Institutional Variation in Transplant Guidance with Retrieval-Augmented Language Models | cs.AI | 0 | Open |
| DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment | cs.AI | 0 | Open |
| Beyond Correlation: Refutation-Validated Aspect-Based Sentiment Analysis for Explainable Energy Market Returns | cs.AI | 0 | Open |
| Unified-MAS: Universally Generating Domain-Specific Nodes for Empowering Automatic Multi-Agent Systems | cs.AI | 0 | Open |
| Alignment as Institutional Design: From Behavioral Correction to Transaction Structure in Intelligent Systems | cs.AI | 0 | Open |
| Effective Strategies for Asynchronous Software Engineering Agents | cs.AI | 0 | Open |
| RuntimeSlicer: Towards Generalizable Unified Runtime State Representation for Failure Management | cs.AI | 0 | Open |
| A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical Systems | cs.AI | 0 | Open |
| Implicit Humanization in Everyday LLM Moral Judgments | cs.AI | 0 | Open |
| Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks | cs.AI | 0 | Open |
Lab Blog Posts
- OpenAI: Creating with Sora Safely