🔴 High Significance
Model Releases
🔴 Reasoning Models Struggle to Control their Chains of Thought (score 92)
Sources: huggingface
Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviors and understanding the motivations of modern reasoning models. However, if models can control what they verbalize in their CoT, it could undermine CoT monitorability. To measure this undesirable capability -- CoT control...
Developer Tools
🔴 RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies (score 75)
Sources: huggingface
Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluati...
🟡 Notable
Model Releases
🟡 Physical Simulator In-the-Loop Video Generation (score 42)
Sources: huggingface
Recent advances in diffusion-based video generation have achieved remarkable visual realism but still struggle to obey basic physical laws such as gravity, inertia, and collision. Generated objects often move inconsistently across frames, exhibit implausible dynamics, or violate physical constraints...
Developer Tools
🟡 Dynamic Chunking Diffusion Transformer (score 58)
Sources: huggingface
Diffusion Transformers process images as fixed-length sequences of tokens produced by a static patchify operation. While effective, this design spends uniform compute on low- and high-information regions alike, ignoring that images contain regions of varying detail and that the denoising process pro...
🟢 Incremental
Developer Tools
🟢 FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling (score 25)
Sources: huggingface
Long-context modeling is a pivotal capability for Large Language Models, yet the quadratic complexity of attention remains a critical bottleneck, particularly during the compute-intensive prefilling phase. While various sparse attention mechanisms have been explored, they typically suffer from eithe...
🟢 PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction (score 8)
Sources: huggingface
We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unifie...
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Reasoning Models Struggle to Control their Chains of Thought | model_release | 39 | Open |
| RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies | developer_tool | 32 | Open |
| Dynamic Chunking Diffusion Transformer | developer_tool | 16 | Open |
| Generating Hierarchical JSON Representations of Scientific Sentences Using LLMs | cs.AI | 0 | Open |
| Sparsity and Out-of-Distribution Generalization | cs.AI | 0 | Open |
| AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions | cs.AI | 0 | Open |
| Attribution-Guided Model Rectification of Unreliable Neural Network Behaviors | cs.AI | 0 | Open |
| Adaptive Capacity Allocation for Vision Language Action Fine-tuning | cs.AI | 0 | Open |
| UnSCAR: Universal, Scalable, Controllable, and Adaptable Image Restoration | cs.AI | 0 | Open |
| Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety | cs.AI | 0 | Open |
| Machine Learning for the Internet of Underwater Things: From Fundamentals to Implementation | cs.AI | 0 | Open |
| Context Channel Capacity: An Information-Theoretic Framework for Understanding Catastrophic Forgetting | cs.AI | 0 | Open |
| Dynamic Vehicle Routing Problem with Prompt Confirmation of Advance Requests | cs.AI | 0 | Open |
| What on Earth is AlphaEarth? Hierarchical structure and functional interpretability for global land cover | cs.AI | 0 | Open |
| AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation | cs.AI | 0 | Open |