🔴 High Significance
Model Releases
🔴 Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders - score 95
Sources: huggingface
Vision Language Model (VLM) development has largely relied on scaling model size, which hinders deployment on compute-constrained mobile and edge devices such as smartphones and robots. In this work, we explore the performance limits of compact (e.g., 2B and 8B) VLMs. We challenge the prevailing pra
Developer Tools
🔴 BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning - score 85
Sources: huggingface
Proximal constraints are fundamental to the stability of Large Language Model reinforcement learning. While the canonical clipping mechanism in PPO serves as an efficient surrogate for trust regions, we identify a critical bottleneck: fixed bounds strictly constrain the upward update margin of l
🔴 Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model - score 75
Sources: huggingface
World models provide a powerful framework for simulating environment dynamics conditioned on actions or instructions, enabling downstream tasks such as action planning or policy learning. Recent approaches leverage world models as learned simulators, but their application to decision-time planning rem
🟡 Notable
Model Releases
🟡 Progressive Residual Warmup for Language Model Pretraining - score 55
Sources: huggingface
Transformer architectures serve as the backbone for most modern Large Language Models, so their pretraining stability and convergence speed are of central concern. Motivated by the logical dependency of sequentially stacked layers, we propose Progressive Residual Warmup (ProRes) for language
🟡 Reasoning Models Struggle to Control their Chains of Thought - score 45
Sources: huggingface
Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviors and understanding the motivations of modern reasoning models. However, if models can control what they verbalize in their CoT, it could undermine CoT monitorability. To measure this undesirable capability -- CoT control
Developer Tools
🟡 WildActor: Unconstrained Identity-Preserving Video Generation - score 65
Sources: huggingface
Production-ready human video generation requires digital actors to maintain strictly consistent full-body identities across dynamic shots, viewpoints and motions, a setting that remains challenging for existing methods. Prior methods often suffer from face-centric behavior that neglects body-level c
🟡 OpenAI to acquire Promptfoo - score 50
Sources: lab_blog/OpenAI
OpenAI is acquiring Promptfoo, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development.
Other Signals
🟡 From games to biology and beyond: 10 years of AlphaGo's impact - score 50
Sources: lab_blog/DeepMind
Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.
🟢 Incremental
Model Releases
🟢 Physical Simulator In-the-Loop Video Generation - score 15
Sources: huggingface
Recent advances in diffusion-based video generation have achieved remarkable visual realism but still struggle to obey basic physical laws such as gravity, inertia, and collision. Generated objects often move inconsistently across frames, exhibit implausible dynamics, or violate physical constraints
🟢 HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel - score 5
Sources: huggingface
Sequential LLM agents fail on long-horizon planning with hard constraints like budgets and diversity requirements. As planning progresses and context grows, these agents drift from global constraints. We propose HiMAP-Travel, a hierarchical multi-agent framework that splits planning into strategic c
Developer Tools
🟢 RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies - score 35
Sources: huggingface
Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluati
🟢 Dynamic Chunking Diffusion Transformer - score 25
Sources: huggingface
Diffusion Transformers process images as fixed-length sequences of tokens produced by a static patchify operation. While effective, this design spends uniform compute on low- and high-information regions alike, ignoring that images contain regions of varying detail and that the denoising process pro
New Papers
| Title | Category | Score |
|---|---|---|
| Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders | model_release | 124 |
| BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning | developer_tool | 63 |
| Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model | developer_tool | 44 |
| WildActor: Unconstrained Identity-Preserving Video Generation | developer_tool | 43 |
| Progressive Residual Warmup for Language Model Pretraining | model_release | 41 |
| SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans | cs.AI | 0 |
| Slumbering to Precision: Enhancing Artificial Neural Network Calibration Through Sleep-like Processes | cs.AI | 0 |
| Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models | cs.AI | 0 |
| Learning When to Trust in Contextual Bandits | cs.AI | 0 |
| CCR-Bench: A Comprehensive Benchmark for Evaluating LLMs on Complex Constraints, Control Flows, and Real-World Cases | cs.AI | 0 |
| Joint Return and Risk Modeling with Deep Neural Networks for Portfolio Construction | cs.AI | 0 |
| Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference | cs.AI | 0 |
| VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning? | cs.AI | 0 |
| Visualizing Coalition Formation: From Hedonic Games to Image Segmentation | cs.AI | 0 |
| A Reliability Evaluation of Hybrid Deterministic-LLM Based Approaches for Academic Course Registration PDF Information Extraction | cs.AI | 0 |