🔴 High Significance

Model Releases

🔴 Video-CoE: Reinforcing Video Event Prediction via Chain of Events - score 75 | Sources: huggingface

Despite advances in the application of MLLMs for various video tasks, video event prediction (VEP) remains relatively underexplored. VEP requires the model to perform fine-grained temporal modeling of videos and establish logical relationships between videos and future events, which current MLLMs st

Developer Tools

🔴 Efficient Reasoning with Balanced Thinking - score 95 | Sources: huggingface

Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to ineffic

🔴 MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild - score 85 | Sources: huggingface

Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. O

🟡 Notable

Developer Tools

🟡 MosaicMem: Hybrid Spatial Memory for Controllable Video World Models - score 65 | Sources: huggingface

Video diffusion models are moving beyond short, plausible clips toward world simulators that must remain consistent under camera motion, revisits, and intervention. Yet spatial memory remains a key bottleneck: explicit 3D structures can improve reprojection-based consistency but struggle to depict m

🟡 How we monitor internal coding agents for misalignment - score 50 | Sources: lab_blog/OpenAI

How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents, analyzing real-world deployments to detect risks and strengthen AI safety safeguards.

🟡 OpenAI to acquire Astral - score 50 | Sources: lab_blog/OpenAI

Accelerates Codex growth to power the next generation of Python developer tools

🟡 Complementary Reinforcement Learning - score 45 | Sources: huggingface

Reinforcement Learning (RL) has emerged as a powerful paradigm for training LLM-based agents, yet remains limited by low sample efficiency, stemming not only from sparse outcome feedback but also from the agent's inability to leverage prior experience across episodes. While augmenting agents with hi

Infrastructure & Compute

🟡 Alignment Makes Language Models Normative, Not Descriptive - score 55 | Sources: huggingface

Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pairs on more than 10,000 real human decisions in multi-round strategic games - bargaining, persuasion, neg

🟢 Incremental

Developer Tools

🟢 V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning - score 35 | Sources: huggingface

We present V-JEPA 2.1, a family of self-supervised models that learn dense, high-quality visual representations for both images and videos while retaining strong global scene understanding. The approach combines four key components. First, a dense predictive loss uses a masking-based objective in wh

🟢 When AI Navigates the Fog of War - score 25 | Sources: huggingface

Can AI reason about a war before its trajectory becomes historically obvious? Analyzing this capability is difficult because retrospective geopolitical prediction is heavily confounded by training-data leakage. We address this challenge through a temporally grounded case study of the early stages of

🟢 GigaWorld-Policy: An Efficient Action-Centered World-Action Model - score 5 | Sources: huggingface

World-Action Models (WAM) initialized from pre-trained video generation backbones have demonstrated remarkable potential for robot policy learning. However, existing approaches face two critical bottlenecks that hinder performance and deployment. First, jointly reasoning over future visual dynamics

Other Signals

🟢 LoST: Level of Semantics Tokenization for 3D Shapes - score 15 | Sources: huggingface

Tokenization is a fundamental technique in the generative modeling of various modalities. In particular, it plays a critical role in autoregressive (AR) models, which have recently emerged as a compelling option for 3D generation. However, optimal tokenization of 3D shapes remains an open question.

📄 New Papers

Title | Category | Score | Link
Efficient Reasoning with Balanced Thinking | developer_tool | 153 | Open
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild | developer_tool | 144 | Open
Video-CoE: Reinforcing Video Event Prediction via Chain of Events | model_release | 94 | Open
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models | developer_tool | 92 | Open
Alignment Makes Language Models Normative, Not Descriptive | infrastructure | 50 | Open
PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching | cs.AI | 0 | Open
To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs | cs.AI | 0 | Open
PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents | cs.AI | 0 | Open
Soft-Label Governance for Distributional Safety in Multi-Agent Systems | cs.AI | 0 | Open
From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents | cs.AI | 0 | Open
Evolutionarily Stable Stackelberg Equilibrium | cs.AI | 0 | Open
Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization | cs.AI | 0 | Open
An SO(3)-equivariant reciprocal-space neural potential for long-range interactions | cs.AI | 0 | Open
SutureAgent: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space | cs.AI | 0 | Open
Stress Classification from ECG Signals Using Vision Transformer | cs.AI | 0 | Open

🏢 Lab Blog Posts