AI Watchtower

🔴 High Significance

Developer Tools

🔴 Adam's Law: Textual Frequency Law on Large Language Models — score 95 · Sources: huggingface

While textual frequency has been validated as relevant to human cognition in reading speed, its relatedness to Large Language Models (LLMs) is seldom studied. We propose a novel research direction in terms of textual data frequency, which is an understudied topic, to the best of our knowledge. Our f

🔴 OpenWorldLib: A Unified Codebase and Definition of Advanced World Models — score 85 · Sources: huggingface

World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the

Infrastructure & Compute

🔴 MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale — score 75 · Sources: huggingface

Current document parsing methods compete primarily on model architecture innovation, while systematic engineering of training data remains underexplored. Yet SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, sugges

🟡 Notable

Developer Tools

🟡 LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models — score 55 · Sources: huggingface

Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in downstream robotic settings, they are typically fine-tuned with limited data, leading to overfitting to specific instruction formulations and leavin

🟡 Memory Intelligence Agent — score 45 · Sources: huggingface

Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while s

Infrastructure & Compute

🟡 TriAttention: Efficient Long Reasoning with Trigonometric KV Compression — score 65 · Sources: huggingface

Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few,

🟢 Incremental

Model Releases

🟢 AURA: Always-On Understanding and Real-Time Assistance via Video Streams — score 35 · Sources: huggingface

Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress,

🟢 ClawArena: Benchmarking AI Agents in Evolving Information Environments — score 5 · Sources: huggingface

AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface thro

Developer Tools

🟢 Can LLMs Learn to Reason Robustly under Noisy Supervision? — score 25 · Sources: huggingface

Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis

🟢 FileGram: Grounding Agent Personalization in File-System Behavioral Traces — score 15 · Sources: huggingface

Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent

📄 New Papers

Title	Category	Score	Link
Adam's Law: Textual Frequency Law on Large Language Models	developer_tool	509	Open
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models	developer_tool	215	Open
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale	infrastructure	126	Open
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression	infrastructure	118	Open
LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models	developer_tool	87	Open
Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking	cs.AI	0	Open
Simulating the Evolution of Alignment and Values in Machine Intelligence	cs.AI	0	Open
Spec Kit Agents: Context-Grounded Agentic Workflows	cs.AI	0	Open
Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition	cs.AI	0	Open
$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models	cs.AI	0	Open
Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code	cs.AI	0	Open
Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning	cs.AI	0	Open
LLMs Should Express Uncertainty Explicitly	cs.AI	0	Open
From Exposure to Internalization: Dual-Stream Calibration for In-context Clinical Reasoning	cs.AI	0	Open
Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills	cs.AI	0	Open

AI Watchtower Briefing — 2026-04-07
