πŸ”΄ High Significance

Model Releases

πŸ”΄ On Data Engineering for Scaling LLM Terminal Capabilities β€” score 95 Sources: huggingface

Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions…

Developer Tools

πŸ”΄ Query-focused and Memory-aware Reranker for Long Context Processing β€” score 85 Sources: huggingface

Building on existing analyses of retrieval heads in large language models, we propose an alternative reranking framework that trains models to estimate passage-query relevance using the attention scores of selected heads. This approach provides a listwise solution that leverages holistic information…
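The sketch below illustrates the general idea of attention-based reranking under simplifying assumptions. It assumes you already have the attention matrices of a few pre-selected heads as nested lists, with rows for query tokens and columns for context tokens. It also assumes that each passage's token span in the context is known. The function `rerank_by_attention` and its aggregation rule (sum the attention mass on a passage's span, then average over heads and query tokens) are hypothetical choices. They are not the paper's trained scoring method.

```python
from typing import Dict, List, Sequence, Tuple

# head_attn[h][q][t]: attention weight from query token q to context token t
# in selected head h (hypothetical input layout).
Attention = Sequence[Sequence[Sequence[float]]]


def rerank_by_attention(
    head_attn: Attention, spans: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, float]]:
    """Score every passage by the attention mass query tokens place on it."""
    # Number of (head, query token) rows, used to average the mass.
    n_rows = sum(len(head) for head in head_attn) or 1
    scores: Dict[str, float] = {}
    for pid, (start, end) in spans.items():
        # Sum attention over the passage's [start, end) token span,
        # across all selected heads and all query tokens.
        mass = sum(sum(row[start:end]) for head in head_attn for row in head)
        scores[pid] = mass / n_rows
    # Listwise output: all passages ranked in one pass, highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: two selected heads, two query tokens, a 4-token context
# holding passage p1 (tokens 0-1) and passage p2 (tokens 2-3).
head_attn = [
    [[0.1, 0.1, 0.6, 0.2], [0.0, 0.2, 0.5, 0.3]],
    [[0.2, 0.0, 0.7, 0.1], [0.1, 0.1, 0.4, 0.4]],
]
spans = {"p1": (0, 2), "p2": (2, 4)}
ranked = rerank_by_attention(head_attn, spans)
print([pid for pid, _ in ranked])  # → ['p2', 'p1']
```

All passages are scored from a single forward pass over the shared context. This is what makes the approach listwise rather than scoring each passage in isolation.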

πŸ”΄ Test-Time Training with KV Binding Is Secretly Linear Attention β€” score 75 Sources: huggingface

Test-time training (TTT) with KV binding as a sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these…
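The simplest form of the TTT and linear-attention connection can be checked numerically. The sketch below assumes a linear inner model `W` initialized to zero. It takes one gradient step per token on the dot-product binding loss -vᵀWk, with learning rate `lr`, and then reads out `W q`. Under exactly these assumptions, the output equals unnormalized causal linear attention, Σᵢ lr·(kᵢ·q)·vᵢ.

With the squared reconstruction loss common in TTT layers, the update gains an extra -lr·(Wk)kᵀ term (the delta rule), so this token-by-token identity no longer holds. Treat the code as an illustration of the general idea, not as the paper's analysis. The function names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def ttt_dot_product_loss(
    keys: List[Vector], values: List[Vector], queries: List[Vector], lr: float
) -> List[Vector]:
    """TTT layer with a linear inner model W, trained online on loss -v^T W k."""
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]  # inner weights, start at zero
    outputs = []
    for k, v, q in zip(keys, values, queries):
        # Gradient of -v^T W k w.r.t. W is -v k^T, so one SGD step adds lr * v k^T.
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += lr * v[i] * k[j]
        # Read out the updated inner model with the query: o_t = W_t q_t.
        outputs.append([dot(row, q) for row in W])
    return outputs


def causal_linear_attention(
    keys: List[Vector], values: List[Vector], queries: List[Vector], lr: float
) -> List[Vector]:
    """Unnormalized causal linear attention: o_t = sum_{i<=t} lr * (k_i . q_t) v_i."""
    outputs = []
    for t, q in enumerate(queries):
        o = [0.0] * len(values[0])
        for k, v in zip(keys[: t + 1], values[: t + 1]):
            weight = lr * dot(k, q)  # raw score, no softmax
            for i in range(len(o)):
                o[i] += weight * v[i]
        outputs.append(o)
    return outputs


keys = [[1.0, 0.0], [0.5, -1.0], [0.2, 0.3]]
values = [[2.0, 1.0], [0.0, 3.0], [1.0, -1.0]]
queries = [[0.3, 0.7], [1.0, 1.0], [-0.5, 2.0]]
ttt_out = ttt_dot_product_loss(keys, values, queries, lr=0.1)
attn_out = causal_linear_attention(keys, values, queries, lr=0.1)
gap = max(abs(x - y) for a, b in zip(ttt_out, attn_out) for x, y in zip(a, b))
print(f"max |TTT - linear attention| = {gap:.2e}")
```

The two functions agree up to floating-point rounding. After t steps, the accumulated weights are Σᵢ lr·vᵢkᵢᵀ, which is exactly the key-value state that linear attention maintains.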

🟡 Notable

Developer Tools

🟡 PyVision-RL: Forging Open Agentic Vision Models via RL · score 65 · Sources: huggingface

Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn reasoning, limiting the benefits of agentic behavior. We introduce PyVision-RL, a reinforcement learning framework for open-weight multimodal models…

🟡 From Perception to Action: An Interactive Benchmark for Vision Reasoning · score 55 · Sources: huggingface

Understanding physical structure is essential for real-world applications such as embodied agents, interactive design, and long-horizon manipulation. Yet prevailing Vision-Language Model (VLM) evaluations still center on structure-agnostic, single-turn setups (e.g., VQA), which fail to assess…

🟡 Multi-Vector Index Compression in Any Modality · score 45 · Sources: huggingface

We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage costs grow linearly with document length, making it costly for…
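Late interaction stores one vector per document token and scores a query with the ColBERT-style MaxSim operator. For each query vector, MaxSim takes the best dot product over the document vectors and then sums these maxima. This is why index size grows with document length. The sketch below shows MaxSim together with one naive way to cap storage: mean-pooling contiguous token vectors into a fixed budget of `k` vectors. The pooling scheme and the helper names are illustrative assumptions, not the compression method proposed in the paper.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: each query vector keeps its best document match."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def pool_to_budget(doc_vecs: List[Vector], k: int) -> List[Vector]:
    """Naive compression: average k contiguous chunks of token vectors."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n, dim = len(doc_vecs), len(doc_vecs[0])
    k = min(k, n)  # never produce more vectors than tokens
    pooled = []
    for c in range(k):
        chunk = doc_vecs[c * n // k : (c + 1) * n // k]  # non-empty since k <= n
        pooled.append([sum(v[i] for v in chunk) / len(chunk) for i in range(dim)])
    return pooled


# 4 token vectors -> index cost is 4 vectors; pooled to a budget of 2.
doc = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
query = [[1.0, 0.0], [0.0, 1.0]]
compressed = pool_to_budget(doc, 2)
print(len(doc), maxsim(query, doc))
print(len(compressed), maxsim(query, compressed))
```

In this toy case, halving the stored vectors lowers the query's MaxSim score from 2.0 to 1.9. How much retrieval quality a given compression scheme costs in practice depends on the method and the data.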

Other Signals

🟡 Disrupting malicious uses of AI | February 2026 · score 50 · Sources: lab_blog/OpenAI

Our latest threat report examines how malicious actors combine AI models with websites and social platforms, and what it means for detection and defense.

🟢 Incremental

Model Releases

🟢 DREAM: Deep Research Evaluation with Agentic Metrics · score 10 · Sources: huggingface

Deep Research Agents generate analyst-grade reports, yet evaluating them remains challenging due to the absence of a single ground truth and the multidimensional nature of research quality. Recent benchmarks propose distinct methodologies, yet they suffer from the Mirage of Synthesis, where strong…

Developer Tools

🟢 QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models · score 35 · Sources: huggingface

Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger backbones. To address these bottlenecks…
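In generic post-training quantization (PTQ), each tensor is mapped to low-bit integers using a scale factor estimated from a small calibration set, with no retraining. The sketch below shows a standard symmetric int8 baseline whose per-tensor scale is calibrated from the largest absolute value observed. It only illustrates what "calibrating a scale" means. It is not QuantVLA's calibration procedure, and the function names are hypothetical.

```python
from typing import List, Sequence


def calibrate_scale(samples: Sequence[Sequence[float]], n_bits: int = 8) -> float:
    """Symmetric per-tensor scale: map the largest |x| seen to the top int level."""
    qmax = 2 ** (n_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(x) for s in samples for x in s)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(x: Sequence[float], scale: float, n_bits: int = 8) -> List[int]:
    qmax = 2 ** (n_bits - 1) - 1
    # Round to the nearest level (Python rounds exact ties to even), then clip.
    return [max(-qmax, min(qmax, round(v / scale))) for v in x]


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    return [v * scale for v in q]


# Calibration batch: the largest magnitude is 1.27, so scale is about 0.01.
calib = [[0.5, -1.27, 0.3], [1.0, -0.2, 0.8]]
scale = calibrate_scale(calib)
x = [0.25, -0.5, 2.0]  # 2.0 lies outside the calibrated range and is clipped
q = quantize(x, scale)
print(q, dequantize(q, scale))
```

The clipped 2.0 comes back as roughly 1.27, which shows why calibration matters. A scale fitted to unrepresentative data discards large values, while a scale stretched by rare outliers wastes resolution on small ones.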

🟢 LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces · score 25 · Sources: huggingface

Recent advances in AI-assisted programming have empowered agents to execute complex workflows via command-line interfaces; however, existing benchmarks are limited by short task horizons, data contamination from GitHub scraping, and a lack of fine-grained evaluation metrics, and they fail to rigorously evaluate…

🟢 See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis · score 10 · Sources: huggingface

Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. Although more thorough pre-training and bigger models might reduce artifacts, there is no assurance that they can be completely eliminated, which makes artifact mitigation…

πŸ“„ New Papers

Title | Category | Score | Link
On Data Engineering for Scaling LLM Terminal Capabilities | model_release | 106 | Open
Query-focused and Memory-aware Reranker for Long Context Processing | developer_tool | 62 | Open
Test-Time Training with KV Binding Is Secretly Linear Attention | developer_tool | 34 | Open
PyVision-RL: Forging Open Agentic Vision Models via RL | developer_tool | 33 | Open
From Perception to Action: An Interactive Benchmark for Vision Reasoning | developer_tool | 26 | Open
Adversarial Robustness of Deep Learning-Based Thyroid Nodule Segmentation in Ultrasound | cs.AI | 0 | Open
Revisiting Text Ranking in Deep Research | cs.AI | 0 | Open
A Knowledge-Driven Approach to Music Segmentation, Music Source Separation and Cinematic Audio Source Separation | cs.AI | 0 | Open
Poisoned Acoustics | cs.AI | 0 | Open
GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning | cs.AI | 0 | Open
Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information | cs.AI | 0 | Open
Training Generalizable Collaborative Agents via Strategic Risk Aversion | cs.AI | 0 | Open
One Brain, Omni Modalities: Towards Unified Non-Invasive Brain Decoding with Large Language Models | cs.AI | 0 | Open
LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies | cs.AI | 0 | Open
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning | cs.AI | 0 | Open

🏢 Lab Blog Posts