🔴 High Significance
Developer Tools
🔴 Does Your Reasoning Model Implicitly Know When to Stop Thinking? (score 95)
Sources: huggingface
Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time…
🔴 VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training (score 85)
Sources: huggingface
Training stability remains a central challenge in reinforcement learning (RL) for large language models (LLMs). Policy staleness, asynchronous training, and mismatches between training and inference engines all cause the behavior policy to diverge from the current policy, risking training collapse.
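To make the policy mismatch concrete: a standard way to measure how far a sampled response has drifted is the sequence-level importance ratio, the product of per-token probability ratios between the current policy and the behavior policy that generated the tokens. The sketch below computes it in log space. The function names and toy inputs are illustrative assumptions; this is the generic off-policy quantity, not VESPO's objective.

```python
import math
from typing import Sequence


def sequence_log_ratio(logp_current: Sequence[float], logp_behavior: Sequence[float]) -> float:
    """Log of pi_current(y|x) / pi_behavior(y|x) for one sampled response y.

    Both arguments are per-token log-probabilities of the same tokens, scored
    by the current policy and by the (possibly stale) behavior policy.
    """
    if len(logp_current) != len(logp_behavior):
        raise ValueError("both sequences must score the same tokens")
    # The product of per-token ratios becomes a sum of per-token log-ratios.
    return sum(c - b for c, b in zip(logp_current, logp_behavior))


def sequence_ratio(logp_current: Sequence[float], logp_behavior: Sequence[float]) -> float:
    """Sequence-level importance weight (may overflow/underflow for long responses)."""
    return math.exp(sequence_log_ratio(logp_current, logp_behavior))


if __name__ == "__main__":
    behavior = [-1.0, -0.5, -2.0, -0.1]  # toy log-probs under the behavior policy
    current = [-0.9, -0.6, -1.5, -0.1]   # same tokens under the current policy
    print(sequence_log_ratio(current, behavior))
    print(sequence_ratio(current, behavior))
```

Because the weight is a product over tokens, small per-token drift compounds over long responses, which is one reason sequence-level off-policy corrections can become high-variance.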
🔴 Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control (score 75)
Sources: huggingface
Extended reality (XR) demands generative models that respond to users' tracked real-world motion, yet current video world models accept only coarse control signals such as text or keyboard input, limiting their utility for embodied interaction. We introduce a human-centric video world model that is…
🟡 Notable
Model Releases
🟡 Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers (score 55)
Sources: huggingface
Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off mode…
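For reference, the two classic samplers named in the title both map a next-token distribution to a truncated subset and then renormalise, so the output is again a point on the probability simplex. The sketch below shows the standard top-k and top-p (nucleus) filters in plain Python; the function names are illustrative, and this does not implement the paper's regularised-optimisation formulation.

```python
from typing import List


def _renormalise(probs: List[float], keep: List[bool]) -> List[float]:
    """Zero out dropped tokens and rescale the rest so they sum to 1."""
    kept = [p if k else 0.0 for p, k in zip(probs, keep)]
    total = sum(kept)
    return [p / total for p in kept]


def _rank(probs: List[float]) -> List[int]:
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_filter(probs: List[float], k: int) -> List[float]:
    """Keep the k most probable tokens."""
    if k < 1:
        raise ValueError("k must be >= 1")
    keep_idx = set(_rank(probs)[:k])
    return _renormalise(probs, [i in keep_idx for i in range(len(probs))])


def top_p_filter(probs: List[float], p: float) -> List[float]:
    """Keep the smallest most-probable set whose cumulative mass reaches p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    keep_idx, cumulative = set(), 0.0
    for i in _rank(probs):
        keep_idx.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalise(probs, [i in keep_idx for i in range(len(probs))])


if __name__ == "__main__":
    probs = [0.5, 0.2, 0.15, 0.1, 0.05]  # toy next-token distribution
    print(top_k_filter(probs, 2))   # two tokens survive
    print(top_p_filter(probs, 0.8))  # three tokens reach 0.8 cumulative mass
```

Top-k fixes the support size regardless of how peaked the distribution is, while top-p adapts the support size to the distribution's shape; framing both as choices of a subset on the simplex is what makes them comparable.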
Developer Tools
🟡 EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots (score 65)
Sources: huggingface
Humans can rearrange objects in cluttered environments using egocentric perception, navigating occlusions without global coordinates. Inspired by this capability, we study long-horizon multi-object non-prehensile rearrangement for mobile robots using a single egocentric camera. We introduce EgoPush, …
🟡 OpenAI announces Frontier Alliance Partners (score 50)
Sources: lab_blog/OpenAI
OpenAI announces Frontier Alliance Partners to help enterprises move from AI pilots to production with secure, scalable agent deployments.
Infrastructure & Compute
🟡 Why we no longer evaluate SWE-bench Verified (score 50)
Sources: lab_blog/OpenAI
SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.
🟡 Spanning the Visual Analogy Space with a Weight Basis of LoRAs (score 45)
Sources: huggingface
Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet {a, a', b}, the goal is to generate b' such that a : a' :: b : b'. Recent methods adapt text-t…
🟢 Incremental
Developer Tools
🟢 VidEoMT: Your ViT is Secretly Also a Video Segmentation Model (score 20)
Sources: huggingface
Existing online video segmentation models typically combine a per-frame segmenter with complex specialized tracking modules. While effective, these modules introduce significant architectural complexity and computational overhead. Recent studies suggest that plain Vision Transformer (ViT) encoders, …
🟢 SARAH: Spatially Aware Real-time Agentic Humans (score 20)
Sources: huggingface
As embodied agents become central to VR, telepresence, and digital human applications, their motion must go beyond speech-aligned gestures: agents should turn toward users, respond to their movement, and maintain natural gaze. Current methods lack this spatial awareness. We close this gap with the f…
🟢 Sink-Aware Pruning for Diffusion Language Models (score 5)
Sources: huggingface
Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that this…
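An attention sink is a token that absorbs a disproportionate share of attention mass. One simple way to spot candidates is to average, for each key position, the attention it receives across all queries and compare that with the uniform share 1/n. The sketch below does this for a single attention map; the function name and the `ratio` threshold are hypothetical, and this is not the paper's pruning criterion.

```python
from typing import List


def find_sink_tokens(attn: List[List[float]], ratio: float = 2.0) -> List[int]:
    """Flag key positions that receive far more than a uniform share of attention.

    attn[q][k] is the weight query q places on key k (each row sums to 1).
    A key is flagged when its attention, averaged over queries, exceeds
    ratio * (1 / n_keys). The default ratio is a hypothetical choice.
    """
    n_queries, n_keys = len(attn), len(attn[0])
    threshold = ratio / n_keys
    sinks = []
    for k in range(n_keys):
        mean_received = sum(row[k] for row in attn) / n_queries
        if mean_received > threshold:
            sinks.append(k)
    return sinks


# Toy bidirectional attention map (no causal mask, as in a DLM), 3 queries x 4 keys.
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.80, 0.10, 0.05, 0.05],
    [0.60, 0.20, 0.10, 0.10],
]
print(find_sink_tokens(attn))  # prints [0]
```

The function makes no causal-mask assumption, so it applies equally to the full bidirectional attention maps a diffusion LM produces at each denoising step.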
Infrastructure & Compute
🟢 DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning (score 35)
Sources: huggingface
Reinforcement Learning with Verifiable Rewards (RLVR) has been shown to be effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or recombination of prior r…
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Does Your Reasoning Model Implicitly Know When to Stop Thinking? | developer_tool | 275 | Open |
| VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training | developer_tool | 229 | Open |
| Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control | developer_tool | 35 | Open |
| EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots | developer_tool | 24 | Open |
| Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers | model_release | 20 | Open |
| Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement | cs.AI | 0 | Open |
| Hilbert-Augmented Reinforcement Learning for Scalable Multi-Robot Coverage and Exploration | cs.AI | 0 | Open |
| Model Merging in the Essential Subspace | cs.AI | 0 | Open |
| Redefining the Down-Sampling Scheme of U-Net for Precision Biomedical Image Segmentation | cs.AI | 0 | Open |
| IR$^3$: Contrastive Inverse Reinforcement Learning for Interpretable Detection and Mitigation of Reward Hacking | cs.AI | 0 | Open |
| SIGMAS: Second-Order Interaction-based Grouping for Overlapping Multi-Agent Swarms | cs.AI | 0 | Open |
| FinSight-Net: A Physics-Aware Decoupled Network with Frequency-Domain Compensation for Underwater Fish Detection in Smart Aquaculture | cs.AI | 0 | Open |
| OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents | cs.AI | 0 | Open |
| When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests | cs.AI | 0 | Open |
| Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments | cs.AI | 0 | Open |