🔴 High Significance

Developer Tools

🔴 Does Your Reasoning Model Implicitly Know When to Stop Thinking? | score 95 | Sources: huggingface

Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through long chains of thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time…

🔴 VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training | score 85 | Sources: huggingface

Training stability remains a central challenge in reinforcement learning (RL) for large language models (LLMs). Policy staleness, asynchronous training, and mismatches between training and inference engines all cause the behavior policy to diverge from the current policy, risking training collapse.
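To make the divergence concrete, here is a minimal sketch of the quantity at stake in off-policy training: the sequence-level importance ratio between the current and behavior policies, computed from the per-token log-probabilities each policy assigns to the same sampled response. It also shows plain truncated importance sampling, a common generic stabiliser. This is standard background under our own assumptions, not VESPO's variational objective (which the excerpt does not describe); the function names and the cap value are hypothetical.

```python
import math
from typing import Sequence


def sequence_log_ratio(current_logprobs: Sequence[float],
                       behavior_logprobs: Sequence[float]) -> float:
    """log[pi_current(y|x) / pi_behavior(y|x)] for one sampled response y.

    Both arguments hold per-token log-probabilities of the same sampled tokens,
    so the sequence-level log-ratio is the sum of per-token differences.
    """
    if len(current_logprobs) != len(behavior_logprobs):
        raise ValueError("log-prob sequences must have equal length")
    return sum(c - b for c, b in zip(current_logprobs, behavior_logprobs))


def sequence_ratio(current_logprobs: Sequence[float],
                   behavior_logprobs: Sequence[float]) -> float:
    """Importance weight pi_current(y|x) / pi_behavior(y|x)."""
    return math.exp(sequence_log_ratio(current_logprobs, behavior_logprobs))


def truncated_ratio(ratio: float, cap: float = 2.0) -> float:
    """Truncated importance sampling: cap the weight to bound its variance.

    The default cap is an arbitrary illustrative value (hypothetical).
    """
    return min(ratio, cap)


# Usage: a small per-token mismatch compounds over a long response.
current = [-0.50] * 200
behavior = [-0.55] * 200
w = sequence_ratio(current, behavior)
print(f"raw weight: {w:.1f}, truncated: {truncated_ratio(w)}")
```

With a per-token log-probability gap of only 0.05 over a 200-token response, the sequence weight is exp(10), roughly 22,000, which illustrates why modest per-token policy mismatch can destabilise sequence-level off-policy updates.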

🔴 Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control | score 75 | Sources: huggingface

Extended reality (XR) demands generative models that respond to users' tracked real-world motion, yet current video world models accept only coarse control signals such as text or keyboard input, limiting their utility for embodied interaction. We introduce a human-centric video world model that is…

🟡 Notable

Model Releases

🟡 Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers | score 55 | Sources: huggingface

Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off…
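As background for the samplers named in the title, the sketch below implements the conventional top-k and top-p (nucleus) truncation rules plus a temperature softmax. One well-known instance of the optimisation view is that softmax(z/T) is the maximiser of ⟨p, z⟩ + T·H(p) over the probability simplex, where H is Shannon entropy. The paper's own objective and its Best-of-K sampler are not given in the excerpt, so they are not reproduced here; all function names are illustrative.

```python
import math
from typing import List, Set


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature softmax; the maximiser of <p, z> + T*H(p) on the simplex."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalise(probs: List[float], keep: Set[int]) -> List[float]:
    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(masked)
    return [p / total for p in masked]


def _by_prob_desc(probs: List[float]) -> List[int]:
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k(probs: List[float], k: int) -> List[float]:
    """Keep only the k most probable tokens, then renormalise."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return _renormalise(probs, set(_by_prob_desc(probs)[:k]))


def top_p(probs: List[float], p: float) -> List[float]:
    """Keep the smallest most-probable prefix whose mass reaches p."""
    keep: Set[int] = set()
    mass = 0.0
    for i in _by_prob_desc(probs):
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return _renormalise(probs, keep)


probs = softmax([2.0, 1.5, 0.5, -1.0])
print(top_k(probs, 2))
print(top_p(probs, 0.9))
```

Top-k fixes the support size regardless of how peaked the distribution is, while top-p adapts the support to the distribution's shape; both are hard truncations followed by renormalisation.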

Developer Tools

🟡 EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots | score 65 | Sources: huggingface

Humans can rearrange objects in cluttered environments using egocentric perception, navigating occlusions without global coordinates. Inspired by this capability, we study long-horizon multi-object non-prehensile rearrangement for mobile robots using a single egocentric camera. We introduce EgoPush…

🟡 OpenAI announces Frontier Alliance Partners | score 50 | Sources: lab_blog/OpenAI

OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments.

Infrastructure & Compute

🟡 Why we no longer evaluate SWE-bench Verified | score 50 | Sources: lab_blog/OpenAI

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.

🟡 Spanning the Visual Analogy Space with a Weight Basis of LoRAs | score 45 | Sources: huggingface

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet {a, a', b}, the goal is to generate b' such that a : a' :: b : b'. Recent methods adapt…
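The title suggests composing edits from a basis of LoRA weight updates. As a generic illustration only, assuming the usual LoRA form ΔW = B·A per adapted weight matrix, the sketch mixes several such low-rank updates into one matrix with scalar coefficients: W' = W + Σ c_i B_i A_i. How the paper builds the basis or derives coefficients from {a, a', b} is not in the excerpt; `combine_lora_basis` and its arguments are hypothetical, and plain lists stand in for tensors to keep the example self-contained.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x r) and an (r x n) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def combine_lora_basis(base: Matrix,
                       basis: Sequence[Tuple[Matrix, Matrix]],
                       coeffs: Sequence[float]) -> Matrix:
    """Return W + sum_i c_i * (B_i @ A_i) for a basis of LoRA factor pairs.

    Hypothetical helper: each basis element is a (B, A) pair whose product
    has the same shape as `base`; `base` itself is left unmodified.
    """
    if len(basis) != len(coeffs):
        raise ValueError("need one coefficient per basis element")
    out = [row[:] for row in base]  # copy so the base weights stay intact
    for (b_mat, a_mat), c in zip(basis, coeffs):
        delta = matmul(b_mat, a_mat)  # rank-r update B @ A
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                out[i][j] += c * v
    return out


# Two rank-1 updates touching different entries of a 2x2 weight matrix.
W = [[0.0, 0.0], [0.0, 0.0]]
basis = [([[1.0], [0.0]], [[1.0, 0.0]]),
         ([[0.0], [1.0]], [[0.0, 1.0]])]
print(combine_lora_basis(W, basis, [2.0, 3.0]))
```

Because each update is linear in its coefficient, any point in the span of the basis can be reached by choosing the c_i, which is the sense in which a basis "spans" a space of edits.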

🟢 Incremental

Developer Tools

🟢 VidEoMT: Your ViT is Secretly Also a Video Segmentation Model | score 20 | Sources: huggingface

Existing online video segmentation models typically combine a per-frame segmenter with complex specialized tracking modules. While effective, these modules introduce significant architectural complexity and computational overhead. Recent studies suggest that plain Vision Transformer (ViT) encoders…

🟢 SARAH: Spatially Aware Real-time Agentic Humans | score 20 | Sources: huggingface

As embodied agents become central to VR, telepresence, and digital human applications, their motion must go beyond speech-aligned gestures: agents should turn toward users, respond to their movement, and maintain natural gaze. Current methods lack this spatial awareness. We close this gap with the…

🟢 Sink-Aware Pruning for Diffusion Language Models | score 5 | Sources: huggingface

Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that…
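For context on the heuristic being questioned, here is one simple way a pruning pass might flag attention-sink positions: average the attention each key position receives across all query rows and mark positions above a threshold. Both the averaging rule and the 0.3 threshold are our assumptions, not the paper's criterion, and the sketch says nothing about how sinks behave in DLMs; the function names are hypothetical.

```python
from typing import List

Attention = List[List[float]]  # attn[q][k]: weight from query q to key k


def attention_received(attn: Attention) -> List[float]:
    """Mean attention each key position receives across all query rows.

    Assumes each row is already softmax-normalised (sums to 1).
    """
    n_q, n_k = len(attn), len(attn[0])
    return [sum(attn[q][k] for q in range(n_q)) / n_q for k in range(n_k)]


def find_sink_tokens(attn: Attention, threshold: float = 0.3) -> List[int]:
    """Key positions whose mean received attention meets the threshold.

    The threshold is a hypothetical illustrative value.
    """
    return [k for k, r in enumerate(attention_received(attn)) if r >= threshold]


# Toy head where every query attends heavily to position 0.
attn = [[0.70, 0.10, 0.10, 0.10],
        [0.60, 0.20, 0.10, 0.10],
        [0.80, 0.05, 0.10, 0.05]]
print(find_sink_tokens(attn))
```

An AR-style pruning heuristic would then exempt the flagged positions (or the weights that serve them) from removal; the paper's claim concerns whether that exemption is justified for DLMs.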

Infrastructure & Compute

🟢 DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning | score 35 | Sources: huggingface

Reinforcement Learning with Verifiable Rewards (RLVR) has been shown to be effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or recombination of prior…
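In RLVR, the "verifiable" reward is typically a programmatic check of the model's final answer against a ground-truth label. The sketch below assumes answers are wrapped in \boxed{...} and compares them as exact rationals where possible, falling back to string equality. The answer format, the binary 0/1 reward, and the function names are assumptions for illustration, not DeepVision-103K's actual checker.

```python
import re
from fractions import Fraction
from typing import Optional

# Assumed answer format: the final answer appears inside \boxed{...}
# with no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the completion, if any."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer, no reward
    try:
        # Exact rational comparison, so "3/4" and "0.75" count as equal.
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to exact string matching.
        return 1.0 if answer == reference.strip() else 0.0


print(verifiable_reward(r"... so the shaded area is \boxed{3/4}", "0.75"))
```

The regex deliberately ignores nested braces, so answers such as \boxed{\frac{1}{2}} would need a proper parser or a symbolic-math equivalence check in practice.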

📄 New Papers

Title | Category | Score | Link
Does Your Reasoning Model Implicitly Know When to Stop Thinking? | developer_tool | 275 | Open
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training | developer_tool | 229 | Open
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control | developer_tool | 35 | Open
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots | developer_tool | 24 | Open
Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers | model_release | 20 | Open
Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement | cs.AI | 0 | Open
Hilbert-Augmented Reinforcement Learning for Scalable Multi-Robot Coverage and Exploration | cs.AI | 0 | Open
Model Merging in the Essential Subspace | cs.AI | 0 | Open
Redefining the Down-Sampling Scheme of U-Net for Precision Biomedical Image Segmentation | cs.AI | 0 | Open
IR³: Contrastive Inverse Reinforcement Learning for Interpretable Detection and Mitigation of Reward Hacking | cs.AI | 0 | Open
SIGMAS: Second-Order Interaction-based Grouping for Overlapping Multi-Agent Swarms | cs.AI | 0 | Open
FinSight-Net: A Physics-Aware Decoupled Network with Frequency-Domain Compensation for Underwater Fish Detection in Smart Aquaculture | cs.AI | 0 | Open
OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents | cs.AI | 0 | Open
When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests | cs.AI | 0 | Open
Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments | cs.AI | 0 | Open

๐Ÿข Lab Blog Posts