🔴 High Significance

Developer Tools

🔴 Adam's Law: Textual Frequency Law on Large Language Models | score 95 | Sources: huggingface

While textual frequency has been validated as relevant to human cognition in reading speed, its relatedness to Large Language Models (LLMs) is seldom studied. We propose a novel research direction in terms of textual data frequency, which is an understudied topic, to the best of our knowledge. Our f

🔴 OpenWorldLib: A Unified Codebase and Definition of Advanced World Models | score 85 | Sources: huggingface

World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the

Infrastructure & Compute

🔴 MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale | score 75 | Sources: huggingface

Current document parsing methods compete primarily on model architecture innovation, while systematic engineering of training data remains underexplored. Yet SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, sugges

🟡 Notable

Developer Tools

🟡 LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models | score 55 | Sources: huggingface

Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in downstream robotic settings, they are typically fine-tuned with limited data, leading to overfitting to specific instruction formulations and leavin

🟡 Memory Intelligence Agent | score 45 | Sources: huggingface

Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while s

Infrastructure & Compute

🟡 TriAttention: Efficient Long Reasoning with Trigonometric KV Compression | score 65 | Sources: huggingface

Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few,
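As a rough illustration of the baseline this summary describes (not of TriAttention itself), the sketch below applies RoPE to a query, shows that the same raw query yields different post-RoPE vectors at different positions, and scores keys by the average attention they receive from a few recent queries. The function names `rope` and `kv_importance`, the interleaved pairing of dimensions, and the toy vectors are assumptions made for this example; they do not come from the paper.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of an even-length vector by position-dependent angles."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # frequency for this pair of dimensions
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kv_importance(recent_queries, keys):
    """Hypothetical baseline: mean softmax attention each key gets from recent queries."""
    d = len(keys[0])
    totals = [0.0] * len(keys)
    for q in recent_queries:
        logits = [dot(q, k) / math.sqrt(d) for k in keys]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        for j, e in enumerate(exps):
            totals[j] += e / z
    return [t / len(recent_queries) for t in totals]

q = [1.0, 0.0, 0.5, -0.5]
k = [0.3, 0.7, -0.2, 0.1]

# The same raw query looks different after RoPE at different positions.
q_at_5, q_at_50 = rope(q, 5), rope(q, 50)

# The query-key score depends only on the relative offset (3 in both cases).
s_near = dot(rope(q, 10), rope(k, 7))
s_far = dot(rope(q, 110), rope(k, 107))

# Score three cached keys using the two most recent post-RoPE queries.
keys = [rope(k, p) for p in range(3)]
importance = kv_importance([rope(q, 3), rope(q, 4)], keys)
```

Because each post-RoPE query is tied to its own position, only a small window of recent queries can stand in for future ones, which is the limitation the summary points to.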

🟢 Incremental

Model Releases

🟢 AURA: Always-On Understanding and Real-Time Assistance via Video Streams | score 35 | Sources: huggingface

Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress,

🟢 ClawArena: Benchmarking AI Agents in Evolving Information Environments | score 5 | Sources: huggingface

AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface thro

Developer Tools

🟢 Can LLMs Learn to Reason Robustly under Noisy Supervision? | score 25 | Sources: huggingface

Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis

🟢 FileGram: Grounding Agent Personalization in File-System Behavioral Traces | score 15 | Sources: huggingface

Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent

📄 New Papers

| Title | Category | Score | Link |
|---|---|---|---|
| Adam's Law: Textual Frequency Law on Large Language Models | developer_tool | 509 | Open |
| OpenWorldLib: A Unified Codebase and Definition of Advanced World Models | developer_tool | 215 | Open |
| MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale | infrastructure | 126 | Open |
| TriAttention: Efficient Long Reasoning with Trigonometric KV Compression | infrastructure | 118 | Open |
| LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models | developer_tool | 87 | Open |
| Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking | cs.AI | 0 | Open |
| Simulating the Evolution of Alignment and Values in Machine Intelligence | cs.AI | 0 | Open |
| Spec Kit Agents: Context-Grounded Agentic Workflows | cs.AI | 0 | Open |
| Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition | cs.AI | 0 | Open |
| $S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models | cs.AI | 0 | Open |
| Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code | cs.AI | 0 | Open |
| Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning | cs.AI | 0 | Open |
| LLMs Should Express Uncertainty Explicitly | cs.AI | 0 | Open |
| From Exposure to Internalization: Dual-Stream Calibration for In-context Clinical Reasoning | cs.AI | 0 | Open |
| Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills | cs.AI | 0 | Open |