🔴 High Significance

Model Releases

🔴 Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders - score 95 Sources: huggingface

Vision Language Model (VLM) development has largely relied on scaling model size, which hinders deployment on compute-constrained mobile and edge devices such as smartphones and robots. In this work, we explore the performance limits of compact (e.g., 2B and 8B) VLMs. We challenge the prevailing pra

Developer Tools

🔴 BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning - score 85 Sources: huggingface

Proximal constraints are fundamental to the stability of Large Language Model reinforcement learning. While the canonical clipping mechanism in PPO serves as an efficient surrogate for trust regions, we identify a critical bottleneck: fixed bounds strictly constrain the upward update margin of l
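For context, a minimal sketch of the canonical PPO clipped surrogate that the summary refers to. It illustrates standard PPO clipping only, not BandPO's method; the function names and the default `eps=0.2` are illustrative assumptions. Because the probability ratio is capped at `1 + eps`, the absolute room a probability has to grow before the objective stops rewarding further increases scales with its old value.

```python
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """Canonical PPO clipped surrogate for a single action.

    ratio = new_prob / old_prob. The clip keeps the ratio inside
    [1 - eps, 1 + eps]; taking the min makes the bound pessimistic.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def max_rewarded_prob(old_prob, eps=0.2):
    """Largest new probability that still increases the objective
    when the advantage is positive (hypothetical helper)."""
    return min(1.0, old_prob * (1.0 + eps))


if __name__ == "__main__":
    for old_prob in (0.01, 0.5, 0.9):
        new_cap = max_rewarded_prob(old_prob)
        print(f"old={old_prob:.2f}  cap={new_cap:.3f}  margin={new_cap - old_prob:.3f}")
```

With a fixed `eps`, an action at probability 0.01 can gain only about 0.002 before the clip engages, while an action at 0.5 can gain about 0.1.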

🔴 Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model - score 75 Sources: huggingface

World models provide a powerful framework for simulating environment dynamics conditioned on actions or instructions, enabling downstream tasks such as action planning or policy learning. Recent approaches leverage world models as learned simulators, but their application to decision-time planning rem

🟡 Notable

Model Releases

🟡 Progressive Residual Warmup for Language Model Pretraining - score 55 Sources: huggingface

Transformer architectures serve as the backbone for most modern Large Language Models, so their pretraining stability and convergence speed are of central concern. Motivated by the logical dependency of sequentially stacked layers, we propose Progressive Residual Warmup (ProRes) for language

🟡 Reasoning Models Struggle to Control their Chains of Thought - score 45 Sources: huggingface

Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviors and understanding the motivations of modern reasoning models. However, if models can control what they verbalize in their CoT, it could undermine CoT monitorability. To measure this undesirable capability -- CoT control

Developer Tools

🟡 WildActor: Unconstrained Identity-Preserving Video Generation - score 65 Sources: huggingface

Production-ready human video generation requires digital actors to maintain strictly consistent full-body identities across dynamic shots, viewpoints and motions, a setting that remains challenging for existing methods. Prior methods often suffer from face-centric behavior that neglects body-level c

🟡 OpenAI to acquire Promptfoo - score 50 Sources: lab_blog/OpenAI

OpenAI is acquiring Promptfoo, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development.

Other Signals

🟡 From games to biology and beyond: 10 years of AlphaGo’s impact - score 50 Sources: lab_blog/DeepMind

Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.

🟢 Incremental

Model Releases

🟢 Physical Simulator In-the-Loop Video Generation - score 15 Sources: huggingface

Recent advances in diffusion-based video generation have achieved remarkable visual realism but still struggle to obey basic physical laws such as gravity, inertia, and collision. Generated objects often move inconsistently across frames, exhibit implausible dynamics, or violate physical constraints

🟢 HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel - score 5 Sources: huggingface

Sequential LLM agents fail on long-horizon planning with hard constraints like budgets and diversity requirements. As planning progresses and context grows, these agents drift from global constraints. We propose HiMAP-Travel, a hierarchical multi-agent framework that splits planning into strategic c

Developer Tools

🟢 RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies - score 35 Sources: huggingface

Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluati

🟢 Dynamic Chunking Diffusion Transformer - score 25 Sources: huggingface

Diffusion Transformers process images as fixed-length sequences of tokens produced by a static patchify operation. While effective, this design spends uniform compute on low- and high-information regions alike, ignoring that images contain regions of varying detail and that the denoising process pro
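As a rough illustration of the static patchify step described above, the sketch below splits an image into non-overlapping square patches in pure Python. It is a generic example, not the paper's code; `patchify` and its arguments are hypothetical names. The token count depends only on the image and patch sizes, so a blank image and a detailed one yield sequences of the same length.

```python
def patchify(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping
    patch_size x patch_size patches, scanned row-major."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            tokens.append([
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ])
    return tokens


if __name__ == "__main__":
    flat = [[0] * 4 for _ in range(4)]                           # no detail
    detailed = [[4 * r + c for c in range(4)] for r in range(4)]  # every pixel differs
    print(len(patchify(flat, 2)), len(patchify(detailed, 2)))
```

Both calls produce the same number of tokens, which is the uniform-compute property the summary points to.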

📄 New Papers

| Title | Category | Score | Link |
| --- | --- | --- | --- |
| Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders | model_release | 124 | Open |
| BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning | developer_tool | 63 | Open |
| Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model | developer_tool | 44 | Open |
| WildActor: Unconstrained Identity-Preserving Video Generation | developer_tool | 43 | Open |
| Progressive Residual Warmup for Language Model Pretraining | model_release | 41 | Open |
| SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans | cs.AI | 0 | Open |
| Slumbering to Precision: Enhancing Artificial Neural Network Calibration Through Sleep-like Processes | cs.AI | 0 | Open |
| Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models | cs.AI | 0 | Open |
| Learning When to Trust in Contextual Bandits | cs.AI | 0 | Open |
| CCR-Bench: A Comprehensive Benchmark for Evaluating LLMs on Complex Constraints, Control Flows, and Real-World Cases | cs.AI | 0 | Open |
| Joint Return and Risk Modeling with Deep Neural Networks for Portfolio Construction | cs.AI | 0 | Open |
| Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference | cs.AI | 0 | Open |
| VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning? | cs.AI | 0 | Open |
| Visualizing Coalition Formation: From Hedonic Games to Image Segmentation | cs.AI | 0 | Open |
| A Reliability Evaluation of Hybrid Deterministic-LLM Based Approaches for Academic Course Registration PDF Information Extraction | cs.AI | 0 | Open |

🏢 Lab Blog Posts