🔴 High Significance
Developer Tools
🔴 Adam's Law: Textual Frequency Law on Large Language Models - score 95
Sources: huggingface
While textual frequency has been validated as relevant to human cognition in reading speed, its relatedness to Large Language Models (LLMs) is seldom studied. We propose a novel research direction in terms of textual data frequency, which is an understudied topic, to the best of our knowledge. Our f
🔴 OpenWorldLib: A Unified Codebase and Definition of Advanced World Models - score 85
Sources: huggingface
World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the
Infrastructure & Compute
🔴 MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale - score 75
Sources: huggingface
Current document parsing methods compete primarily on model architecture innovation, while systematic engineering of training data remains underexplored. Yet SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, sugges
🟡 Notable
Developer Tools
🟡 LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models - score 55
Sources: huggingface
Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in downstream robotic settings, they are typically fine-tuned with limited data, leading to overfitting to specific instruction formulations and leavin
🟡 Memory Intelligence Agent - score 45
Sources: huggingface
Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while s
Infrastructure & Compute
🟡 TriAttention: Efficient Long Reasoning with Trigonometric KV Compression - score 65
Sources: huggingface
Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position during RoPE, making representative queries very few,
🟢 Incremental
Model Releases
🟢 AURA: Always-On Understanding and Real-Time Assistance via Video Streams - score 35
Sources: huggingface
Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress,
🟢 ClawArena: Benchmarking AI Agents in Evolving Information Environments - score 5
Sources: huggingface
AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface thro
Developer Tools
🟢 Can LLMs Learn to Reason Robustly under Noisy Supervision? - score 25
Sources: huggingface
Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis
🟢 FileGram: Grounding Agent Personalization in File-System Behavioral Traces - score 15
Sources: huggingface
Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints, as strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Adam's Law: Textual Frequency Law on Large Language Models | developer_tool | 509 | Open |
| OpenWorldLib: A Unified Codebase and Definition of Advanced World Models | developer_tool | 215 | Open |
| MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale | infrastructure | 126 | Open |
| TriAttention: Efficient Long Reasoning with Trigonometric KV Compression | infrastructure | 118 | Open |
| LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models | developer_tool | 87 | Open |
| Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking | cs.AI | 0 | Open |
| Simulating the Evolution of Alignment and Values in Machine Intelligence | cs.AI | 0 | Open |
| Spec Kit Agents: Context-Grounded Agentic Workflows | cs.AI | 0 | Open |
| Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition | cs.AI | 0 | Open |
| $S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models | cs.AI | 0 | Open |
| Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code | cs.AI | 0 | Open |
| Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning | cs.AI | 0 | Open |
| LLMs Should Express Uncertainty Explicitly | cs.AI | 0 | Open |
| From Exposure to Internalization: Dual-Stream Calibration for In-context Clinical Reasoning | cs.AI | 0 | Open |
| Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills | cs.AI | 0 | Open |