🔴 High Significance
Developer Tools
🔴 CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty (score 95)
Sources: huggingface
Existing benchmarks for Large Language Model (LLM) agents focus on task completion under idealized settings but overlook reliability in real-world, user-facing applications. In domains such as in-car voice assistants, users often issue incomplete or ambiguous requests, creating intrinsic uncertain
🔴 DFlash: Block Diffusion for Flash Speculative Decoding (score 85)
Sources: huggingface
Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the targ
🔴 Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening (score 75)
Sources: huggingface
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefin
🟡 Notable
Model Releases
🟡 MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents (score 65)
Sources: huggingface
Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory, making them rigid under diverse interaction patterns and inefficient on long
Developer Tools
🟡 Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR (score 55)
Sources: huggingface
Recent applications of Reinforcement Learning with Verifiable Rewards (RLVR) to Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated significant success in enhancing reasoning capabilities for complex tasks. During RLVR training, an increase in response length is often re
🟡 Context Forcing: Consistent Autoregressive Video Generation with Long Context (score 45)
Sources: huggingface
Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5
Other Signals
🟡 Making AI work for everyone, everywhere: our approach to localization (score 50)
Sources: lab_blog/OpenAI
OpenAI shares its approach to AI localization, showing how globally shared frontier models can be adapted to local languages, laws, and cultures without compromising safety.
🟢 Incremental
Model Releases
🟢 Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations (score 25)
Sources: huggingface
High-quality kernels are critical for scalable AI systems, and enabling LLMs to generate such code would advance AI development. However, training LLMs for this task requires sufficient data and a robust environment, and the process is often vulnerable to reward hacking and lazy optimization. In these ca
Developer Tools
🟢 Reinforced Attention Learning (score 35)
Sources: huggingface
Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs (MLLMs) through verbose rationales yields limited gains for perception and can even degrade performance. We
🟢 RISE-Video: Can Video Generators Decode Implicit World Rules? (score 10)
Sources: huggingface
While generative video models have achieved remarkable visual fidelity, their capacity to internalize and reason over implicit world rules remains a critical yet under-explored frontier. To bridge this gap, we present RISE-Video, a pioneering reasoning-oriented benchmark for Text-Image-to-Video (TI2
🟢 Reinforcement World Model Learning for LLM-based Agents (score 10)
Sources: huggingface
Large language models (LLMs) have achieved strong performance in language-centric tasks. However, in agentic settings, LLMs often struggle to anticipate action consequences and adapt to environment dynamics, highlighting the need for world-modeling capabilities in LLM-based agents. We propose Reinfo
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty | developer_tool | 91 | Open |
| DFlash: Block Diffusion for Flash Speculative Decoding | developer_tool | 75 | Open |
| Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening | developer_tool | 73 | Open |
| MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents | model_release | 67 | Open |
| Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR | developer_tool | 57 | Open |
| One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models | cs.AI | 0 | Open |
| Pro-ZD: A Transferable Graph Neural Network Approach for Proactive Zero-Day Threats Mitigation | cs.AI | 0 | Open |
| Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making | cs.AI | 0 | Open |
| Toward generative machine learning for boosting ensembles of climate simulations | cs.AI | 0 | Open |
| Robust Pre-Training of Medical Vision-and-Language Models with Domain-Invariant Multi-Modal Masked Reconstruction | cs.AI | 0 | Open |
| LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning | cs.AI | 0 | Open |
| Accelerating Vision Transformers on Brain Processing Unit | cs.AI | 0 | Open |
| Attention's Gravitational Field: A Power-Law Interpretation of Positional Correlation | cs.AI | 0 | Open |
| CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models | cs.AI | 0 | Open |
| The Condensate Theorem: Transformers are $O(n)$, Not $O(n^2)$ | cs.AI | 0 | Open |