🔴 High Significance
Model Releases
🔴 Video-CoE: Reinforcing Video Event Prediction via Chain of Events - score 75
Sources: huggingface
Despite advances in the application of MLLMs for various video tasks, video event prediction (VEP) remains relatively underexplored. VEP requires the model to perform fine-grained temporal modeling of videos and establish logical relationships between videos and future events, which current MLLMs st
Developer Tools
🔴 Efficient Reasoning with Balanced Thinking - score 95
Sources: huggingface
Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to ineffic
🔴 MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild - score 85
Sources: huggingface
Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. O
🟡 Notable
Developer Tools
🟡 MosaicMem: Hybrid Spatial Memory for Controllable Video World Models - score 65
Sources: huggingface
Video diffusion models are moving beyond short, plausible clips toward world simulators that must remain consistent under camera motion, revisits, and intervention. Yet spatial memory remains a key bottleneck: explicit 3D structures can improve reprojection-based consistency but struggle to depict m
🟡 How we monitor internal coding agents for misalignment - score 50
Sources: lab_blog/OpenAI
How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents, analyzing real-world deployments to detect risks and strengthen AI safety safeguards.
🟡 OpenAI to acquire Astral - score 50
Sources: lab_blog/OpenAI
Accelerates Codex growth to power the next generation of Python developer tools
🟡 Complementary Reinforcement Learning - score 45
Sources: huggingface
Reinforcement Learning (RL) has emerged as a powerful paradigm for training LLM-based agents, yet remains limited by low sample efficiency, stemming not only from sparse outcome feedback but also from the agent's inability to leverage prior experience across episodes. While augmenting agents with hi
Infrastructure & Compute
🟡 Alignment Makes Language Models Normative, Not Descriptive - score 55
Sources: huggingface
Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pairs on more than 10,000 real human decisions in multi-round strategic games - bargaining, persuasion, neg
🟢 Incremental
Developer Tools
🟢 V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning - score 35
Sources: huggingface
We present V-JEPA 2.1, a family of self-supervised models that learn dense, high-quality visual representations for both images and videos while retaining strong global scene understanding. The approach combines four key components. First, a dense predictive loss uses a masking-based objective in wh
🟢 When AI Navigates the Fog of War - score 25
Sources: huggingface
Can AI reason about a war before its trajectory becomes historically obvious? Analyzing this capability is difficult because retrospective geopolitical prediction is heavily confounded by training-data leakage. We address this challenge through a temporally grounded case study of the early stages of
🟢 GigaWorld-Policy: An Efficient Action-Centered World--Action Model - score 5
Sources: huggingface
World-Action Models (WAM) initialized from pre-trained video generation backbones have demonstrated remarkable potential for robot policy learning. However, existing approaches face two critical bottlenecks that hinder performance and deployment. First, jointly reasoning over future visual dynamics
Other Signals
🟢 LoST: Level of Semantics Tokenization for 3D Shapes - score 15
Sources: huggingface
Tokenization is a fundamental technique in the generative modeling of various modalities. In particular, it plays a critical role in autoregressive (AR) models, which have recently emerged as a compelling option for 3D generation. However, optimal tokenization of 3D shapes remains an open question.
New Papers
| Title | Category | Score |
|---|---|---|
| Efficient Reasoning with Balanced Thinking | developer_tool | 153 |
| MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild | developer_tool | 144 |
| Video-CoE: Reinforcing Video Event Prediction via Chain of Events | model_release | 94 |
| MosaicMem: Hybrid Spatial Memory for Controllable Video World Models | developer_tool | 92 |
| Alignment Makes Language Models Normative, Not Descriptive | infrastructure | 50 |
| PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching | cs.AI | 0 |
| To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs | cs.AI | 0 |
| PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents | cs.AI | 0 |
| Soft-Label Governance for Distributional Safety in Multi-Agent Systems | cs.AI | 0 |
| From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents | cs.AI | 0 |
| Evolutionarily Stable Stackelberg Equilibrium | cs.AI | 0 |
| Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization | cs.AI | 0 |
| An SO(3)-equivariant reciprocal-space neural potential for long-range interactions | cs.AI | 0 |
| SutureAgent: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space | cs.AI | 0 |
| Stress Classification from ECG Signals Using Vision Transformer | cs.AI | 0 |