🔴 High Significance

Model Releases

🔴 Reasoning Models Struggle to Control their Chains of Thought · score 92 · Sources: huggingface

Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviors and understanding the motivations of modern reasoning models. However, if models can control what they verbalize in their CoT, it could undermine CoT monitorability. To measure this undesirable capability (CoT control)…

Developer Tools

🔴 RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies · score 75 · Sources: huggingface

Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluation…

🟡 Notable

Model Releases

🟡 Physical Simulator In-the-Loop Video Generation · score 42 · Sources: huggingface

Recent advances in diffusion-based video generation have achieved remarkable visual realism but still struggle to obey basic physical laws such as gravity, inertia, and collision. Generated objects often move inconsistently across frames, exhibit implausible dynamics, or violate physical constraints…

Developer Tools

🟡 Dynamic Chunking Diffusion Transformer · score 58 · Sources: huggingface

Diffusion Transformers process images as fixed-length sequences of tokens produced by a static patchify operation. While effective, this design spends uniform compute on low- and high-information regions alike, ignoring that images contain regions of varying detail and that the denoising process pro…
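To make the "static patchify" point concrete, here is a minimal pure-Python sketch, assuming a single-channel image stored as nested lists and a hypothetical `patchify` helper (illustrative only, not code from the paper). It shows that the token count depends only on image size and patch size, so a flat image and a highly detailed one receive exactly the same number of tokens. It does not implement the paper's dynamic chunking.

```python
from typing import List

Image = List[List[float]]  # hypothetical: one grayscale channel, rows of pixel values


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles, one flattened token per tile.

    The tile grid is fixed by the image size and patch size alone; pixel content
    never changes how many tokens come out.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten the tile row by row into a single token vector.
            token = [image[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            tokens.append(token)
    return tokens


flat = [[0.0] * 8 for _ in range(8)]  # uniform, low-information image
detailed = [[float((r * 8 + c) % 7) for c in range(8)] for r in range(8)]  # varied pixels
print(len(patchify(flat, 2)), len(patchify(detailed, 2)))  # 16 16
```

By the same rule, a 32×32 input with 2×2 patches always yields 256 tokens, however much of it is empty background.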

🟢 Incremental

Developer Tools

🟢 FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling · score 25 · Sources: huggingface

Long-context modeling is a pivotal capability for Large Language Models, yet the quadratic complexity of attention remains a critical bottleneck, particularly during the compute-intensive prefilling phase. While various sparse attention mechanisms have been explored, they typically suffer from either…
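The quadratic bottleneck is easy to see with a little arithmetic. The sketch below assumes standard dense causal self-attention, in which token i is scored against tokens 0..i; the helper name is hypothetical, and the code does not model FlashPrefill's pattern discovery or thresholding.

```python
def causal_attention_scores(n_tokens: int) -> int:
    """Query-key dot products one dense causal attention head computes during prefill.

    Token i attends to tokens 0..i, so the total is 1 + 2 + ... + n = n(n + 1) / 2,
    which grows quadratically with the prompt length.
    """
    return n_tokens * (n_tokens + 1) // 2


for n in (8_192, 16_384, 32_768, 131_072):
    print(f"{n:>7} tokens -> {causal_attention_scores(n):,} scores per head per layer")
```

Doubling the prompt from 8,192 to 16,384 tokens multiplies the score count by almost exactly four; sparse prefilling methods try to skip most of these entries.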

🟢 PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction · score 8 · Sources: huggingface

We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unified…

📄 New Papers

Title | Category | Score
Reasoning Models Struggle to Control their Chains of Thought | model_release | 39
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies | developer_tool | 32
Dynamic Chunking Diffusion Transformer | developer_tool | 16
Generating Hierarchical JSON Representations of Scientific Sentences Using LLMs | cs.AI | 0
Sparsity and Out-of-Distribution Generalization | cs.AI | 0
AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions | cs.AI | 0
Attribution-Guided Model Rectification of Unreliable Neural Network Behaviors | cs.AI | 0
Adaptive Capacity Allocation for Vision Language Action Fine-tuning | cs.AI | 0
UnSCAR: Universal, Scalable, Controllable, and Adaptable Image Restoration | cs.AI | 0
Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety | cs.AI | 0
Machine Learning for the Internet of Underwater Things: From Fundamentals to Implementation | cs.AI | 0
Context Channel Capacity: An Information-Theoretic Framework for Understanding Catastrophic Forgetting | cs.AI | 0
Dynamic Vehicle Routing Problem with Prompt Confirmation of Advance Requests | cs.AI | 0
What on Earth is AlphaEarth? Hierarchical structure and functional interpretability for global land cover | cs.AI | 0
AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation | cs.AI | 0