🔴 High Significance

Model Releases

🔴 CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents - score 95; Sources: huggingface

Computer-use agents (CUAs) hold great promise for automating complex desktop workflows, yet progress toward general-purpose agents is bottlenecked by the scarcity of continuous, high-quality human demonstration videos. Recent work emphasizes that continuous video, not sparse screenshots, is the crit

🔴 Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? - score 85; Sources: huggingface

Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of ep

Developer Tools

🔴 UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience - score 75; Sources: huggingface

Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI

🟡 Notable

Model Releases

🟡 T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search - score 55; Sources: huggingface

While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protoco

🟡 Gemini 3.1 Flash Live: Making audio AI more natural and reliable - score 50; Sources: lab_blog/DeepMind

Our latest voice model has improved precision and lower latency to make voice interactions more fluid, natural and precise.

Developer Tools

🟡 EVA: Efficient Reinforcement Learning for End-to-End Video Agent - score 65; Sources: huggingface

Video understanding with multimodal large language models (MLLMs) remains challenging due to the long token sequences of videos, which contain extensive temporal dependencies and redundant frames. Existing approaches typically treat MLLMs as passive recognizers, processing entire videos or uniformly

🟡 When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning - score 45; Sources: huggingface

Recent progress in multimodal large language models has led to strong performance on reasoning tasks, but these improvements largely rely on high-quality annotated data or teacher-model distillation, both of which are costly and difficult to scale. To address this, we propose an unsupervised self-ev

🟢 Incremental

Developer Tools

🟢 GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents - score 30; Sources: huggingface

Multimodal LLMs are increasingly deployed as perceptual backbones for autonomous agents in 3D environments, from robotics to virtual worlds. These applications require agents to perceive rapid state changes, attribute actions to the correct entities, and reason about concurrent multi-agent behaviors

🟢 Understanding the Challenges in Iterative Generative Optimization with LLMs - score 30; Sources: huggingface

Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in practice remains brittle: despite active research, only 9% of surveyed agents

🟢 The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics - score 15; Sources: huggingface

While recent generative video models have achieved remarkable visual realism and are being explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these

🟢 4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video - score 5; Sources: huggingface

We introduce 4DGS360, a diffusion-free framework for 360° dynamic object reconstruction from casual monocular video. Existing methods often fail to reconstruct consistent 360° geometry, as their heavy reliance on 2D-native priors causes initial points to overfit to visible surface in eac

📄 New Papers

Title | Category | Score | Link
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents | model_release | 103 | Open
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? | model_release | 61 | Open
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience | developer_tool | 51 | Open
EVA: Efficient Reinforcement Learning for End-to-End Video Agent | developer_tool | 48 | Open
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search | model_release | 40 | Open
EvoForest: A Novel Machine-Learning Paradigm via Open-Ended Evolution of Computational Graphs | cs.AI | 0 | Open
Surrogates, Spikes, and Sparsity: Performance Analysis and Characterization of SNN Hyperparameters on Hardware | cs.AI | 0 | Open
LogSigma at SemEval-2026 Task 3: Uncertainty-Weighted Multitask Learning for Dimensional Aspect-Based Sentiment Analysis | cs.AI | 0 | Open
Sovereign AI at the Front Door of Care: A Physically Unidirectional Architecture for Secure Clinical Intelligence | cs.AI | 0 | Open
On the Foundations of Trustworthy Artificial Intelligence | cs.AI | 0 | Open
Explaining, Verifying, and Aligning Semantic Hierarchies in Vision-Language Model Embeddings | cs.AI | 0 | Open
Integrated Multi-Drone Task Allocation, Sequencing, and Optimal Trajectory Generation in Obstacle-Rich 3D Environments | cs.AI | 0 | Open
Shaping the Future of Mathematics in the Age of AI | cs.AI | 0 | Open
LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics | cs.AI | 0 | Open
Decoding Market Emotions in Cryptocurrency Tweets via Predictive Statement Classification with Machine Learning and Transformers | cs.AI | 0 | Open

๐Ÿข Lab Blog Posts