🔴 High Significance
Model Releases
🔴 On Data Engineering for Scaling LLM Terminal Capabilities (score 95)
Sources: huggingface
Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contr…
Developer Tools
🔴 Query-focused and Memory-aware Reranker for Long Context Processing (score 85)
Sources: huggingface
Built upon the existing analysis of retrieval heads in large language models, we propose an alternative reranking framework that trains models to estimate passage-query relevance using the attention scores of selected heads. This approach provides a listwise solution that leverages holistic informat…
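A minimal sketch of the scoring idea, assuming we already have attention weights from the query tokens to every context token for a few selected heads, with all candidate passages concatenated into one context (which is what makes the ranking listwise). The averaging rule and the names `head_relevance_scores` and `rerank` are illustrative assumptions, not the paper's method or training procedure.

```python
from typing import Dict, List, Sequence

# attention[h][q][t]: weight from query token q to context token t in selected head h.
Attention = Sequence[Sequence[Sequence[float]]]


def head_relevance_scores(attention: Attention, passage_spans: Dict[str, range]) -> Dict[str, float]:
    """Score each passage by the attention mass the query tokens place on its tokens,
    averaged over the selected heads and over query tokens (hypothetical scoring rule)."""
    num_heads = len(attention)
    num_query_tokens = len(attention[0])
    scores: Dict[str, float] = {}
    for passage_id, span in passage_spans.items():
        mass = 0.0
        for head in attention:
            for query_row in head:
                mass += sum(query_row[t] for t in span)
        scores[passage_id] = mass / (num_heads * num_query_tokens)
    return scores


def rerank(attention: Attention, passage_spans: Dict[str, range]) -> List[str]:
    """Return passage ids ordered from most to least relevant."""
    scores = head_relevance_scores(attention, passage_spans)
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)


# Toy example: 2 heads, 1 query token, 4 context tokens; passage A = tokens 0-1, B = tokens 2-3.
toy_attention = [
    [[0.1, 0.1, 0.4, 0.4]],
    [[0.2, 0.0, 0.3, 0.5]],
]
toy_spans = {"A": range(0, 2), "B": range(2, 4)}
print(rerank(toy_attention, toy_spans))  # ['B', 'A']
```

Because every passage is scored from the same forward pass, relevance is judged relative to the other candidates in context rather than one passage at a time.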
🔴 Test-Time Training with KV Binding Is Secretly Linear Attention (score 75)
Sources: huggingface
Test-time training (TTT) with KV binding as a sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these f…
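A minimal sketch of the simplest case of this equivalence, under our own simplifying assumptions (not necessarily the paper's setup): the fast weight is a linear map `W` starting at zero, the binding loss is the inner product `-v·(W k)`, and each token takes one gradient step with learning rate `lr`. That step adds `lr * v kᵀ` to `W`, so reading out with a query reproduces causal, unnormalized linear attention. With a squared-error loss the same step would instead give the delta rule, `W += lr * (v - W k) kᵀ`.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def ttt_outputs(keys: List[Vector], values: List[Vector], queries: List[Vector], lr: float) -> List[Vector]:
    """TTT layer with a linear fast weight W (d_v x d_k), starting from zero.
    Per token: one gradient step on L(W) = -v . (W k), whose gradient is -v k^T,
    then read out o_t = W q_t."""
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += lr * v[i] * k[j]  # W <- W - lr * grad = W + lr * v k^T
        outputs.append([dot(row, q) for row in W])
    return outputs


def linear_attention_outputs(keys: List[Vector], values: List[Vector], queries: List[Vector], lr: float) -> List[Vector]:
    """Causal, unnormalized linear attention: o_t = sum_{s<=t} lr * (k_s . q_t) * v_s."""
    outputs = []
    for t, q in enumerate(queries):
        o = [0.0] * len(values[0])
        for s in range(t + 1):
            weight = lr * dot(keys[s], q)
            for i in range(len(o)):
                o[i] += weight * values[s][i]
        outputs.append(o)
    return outputs


keys = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
values = [[2.0, 1.0, 0.0], [0.0, -1.0, 3.0], [1.0, 1.0, 1.0]]
queries = [[0.3, 0.7], [1.0, -1.0], [2.0, 0.5]]
a = ttt_outputs(keys, values, queries, lr=0.1)
b = linear_attention_outputs(keys, values, queries, lr=0.1)
print(all(abs(x - y) < 1e-9 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))  # True
```

The two functions differ only in evaluation order: the TTT loop accumulates `sum_s lr * v_s k_sᵀ` into `W` before multiplying by `q_t`, while linear attention multiplies first and sums afterwards.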
🟡 Notable
Developer Tools
🟡 PyVision-RL: Forging Open Agentic Vision Models via RL (score 65)
Sources: huggingface
Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn reasoning, limiting the benefits of agentic behavior. We introduce PyVision-RL, a reinforcement learning framework for open-weight multimodal models th…
🟡 From Perception to Action: An Interactive Benchmark for Vision Reasoning (score 55)
Sources: huggingface
Understanding the physical structure is essential for real-world applications such as embodied agents, interactive design, and long-horizon manipulation. Yet, prevailing Vision-Language Model (VLM) evaluations still center on structure-agnostic, single-turn setups (e.g., VQA), which fail to assess a…
🟡 Multi-Vector Index Compression in Any Modality (score 45)
Sources: huggingface
We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage costs grow linearly with document length, making it costly for…
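For context, late interaction typically scores a query against a document with MaxSim (as popularized by ColBERT): each query vector takes its best dot product over the document's vectors, and these maxima are summed. Because a document is stored as one vector per token or patch, index size grows linearly with its length. The sketch below pairs MaxSim with a deliberately naive compression baseline, mean-pooling runs of consecutive vectors; the pooling rule and the `window` parameter are our illustrative assumptions, not the compression method the paper proposes.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: sum over query vectors of the best dot product with any document vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def mean_pool(doc_vecs: List[Vector], window: int) -> List[Vector]:
    """Naive compression baseline: replace each run of `window` consecutive vectors with their mean,
    shrinking storage from n vectors to ceil(n / window)."""
    pooled = []
    for start in range(0, len(doc_vecs), window):
        chunk = doc_vecs[start:start + window]
        pooled.append([sum(column) / len(chunk) for column in zip(*chunk)])
    return pooled


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
compressed = mean_pool(doc, window=2)
print(len(doc), len(compressed))                      # 4 2
print(maxsim(query, doc), maxsim(query, compressed))  # 2.0 1.0
```

In this toy case pooling halves storage but also halves the score, which illustrates why compressing multi-vector indexes without losing retrieval quality is non-trivial.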
Other Signals
🟡 Disrupting malicious uses of AI | February 2026 (score 50)
Sources: lab_blog/OpenAI
Our latest threat report examines how malicious actors combine AI models with websites and social platforms, and what it means for detection and defense.
🟢 Incremental
Model Releases
🟢 DREAM: Deep Research Evaluation with Agentic Metrics (score 10)
Sources: huggingface
Deep Research Agents generate analyst-grade reports, yet evaluating them remains challenging due to the absence of a single ground truth and the multidimensional nature of research quality. Recent benchmarks propose distinct methodologies, yet they suffer from the Mirage of Synthesis, where strong s…
Developer Tools
🟢 QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models (score 35)
Sources: huggingface
Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger backbones. To address these bottlenec…
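The excerpt does not describe QuantVLA's scale-calibration scheme, so the sketch below shows only generic symmetric, per-tensor post-training quantization to int8, with the scale calibrated from the largest magnitude seen in a small calibration set. The max-abs calibration rule and the function names are illustrative assumptions, not the paper's method.

```python
from typing import List, Sequence


def calibrate_scale(calibration: Sequence[float], num_bits: int = 8) -> float:
    """Symmetric scale: map the largest observed magnitude onto the largest integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(x) for x in calibration)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(xs: Sequence[float], scale: float, num_bits: int = 8) -> List[int]:
    """Round to the nearest integer level and clip to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(x / scale))) for x in xs]


def dequantize(qs: Sequence[int], scale: float) -> List[float]:
    """Map integer levels back to approximate real values."""
    return [q * scale for q in qs]


calibration_acts = [-2.54, 1.0, 0.3, 0.773]
scale = calibrate_scale(calibration_acts)
restored = dequantize(quantize(calibration_acts, scale), scale)
# In-range values are recovered to within half a quantization step.
print(max(abs(x - r) for x, r in zip(calibration_acts, restored)) <= scale / 2 + 1e-12)  # True
# Values outside the calibrated range are clipped to the extreme level.
print(quantize([5.0], scale))  # [127]
```

A max-abs scale is the simplest choice; a single outlier in the calibration data stretches the scale and costs resolution for every other value, which is why calibration strategy matters in practice.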
🟢 LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces (score 25)
Sources: huggingface
Recent advances in AI-assisted programming have empowered agents to execute complex workflows via command-line interfaces; however, existing benchmarks are limited by short task horizons, data contamination from GitHub scraping, and a lack of fine-grained evaluation metrics, and so fail to rigorously evalu…
🟢 See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis (score 10)
Sources: huggingface
Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. Although more thorough pre-training and bigger models might reduce artifacts, there is no assurance that they can be completely eliminated, which makes artifact mitigation a…
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| On Data Engineering for Scaling LLM Terminal Capabilities | model_release | 106 | Open |
| Query-focused and Memory-aware Reranker for Long Context Processing | developer_tool | 62 | Open |
| Test-Time Training with KV Binding Is Secretly Linear Attention | developer_tool | 34 | Open |
| PyVision-RL: Forging Open Agentic Vision Models via RL | developer_tool | 33 | Open |
| From Perception to Action: An Interactive Benchmark for Vision Reasoning | developer_tool | 26 | Open |
| Adversarial Robustness of Deep Learning-Based Thyroid Nodule Segmentation in Ultrasound | cs.AI | 0 | Open |
| Revisiting Text Ranking in Deep Research | cs.AI | 0 | Open |
| A Knowledge-Driven Approach to Music Segmentation, Music Source Separation and Cinematic Audio Source Separation | cs.AI | 0 | Open |
| Poisoned Acoustics | cs.AI | 0 | Open |
| GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning | cs.AI | 0 | Open |
| Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information | cs.AI | 0 | Open |
| Training Generalizable Collaborative Agents via Strategic Risk Aversion | cs.AI | 0 | Open |
| One Brain, Omni Modalities: Towards Unified Non-Invasive Brain Decoding with Large Language Models | cs.AI | 0 | Open |
| LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies | cs.AI | 0 | Open |
| ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning | cs.AI | 0 | Open |