🔴 High Significance
Model Releases
🔴 CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents - score 95
Sources: huggingface
Computer-use agents (CUAs) hold great promise for automating complex desktop workflows, yet progress toward general-purpose agents is bottlenecked by the scarcity of continuous, high-quality human demonstration videos. Recent work emphasizes that continuous video, not sparse screenshots, is the crit
🔴 Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? - score 85
Sources: huggingface
Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of ep
Developer Tools
🔴 UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience - score 75
Sources: huggingface
Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI
🟡 Notable
Model Releases
🟡 T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search - score 55
Sources: huggingface
While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protoco
🟡 Gemini 3.1 Flash Live: Making audio AI more natural and reliable - score 50
Sources: lab_blog/DeepMind
Our latest voice model has improved precision and lower latency to make voice interactions more fluid, natural and precise.
Developer Tools
🟡 EVA: Efficient Reinforcement Learning for End-to-End Video Agent - score 65
Sources: huggingface
Video understanding with multimodal large language models (MLLMs) remains challenging due to the long token sequences of videos, which contain extensive temporal dependencies and redundant frames. Existing approaches typically treat MLLMs as passive recognizers, processing entire videos or uniformly
🟡 When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning - score 45
Sources: huggingface
Recent progress in multimodal large language models has led to strong performance on reasoning tasks, but these improvements largely rely on high-quality annotated data or teacher-model distillation, both of which are costly and difficult to scale. To address this, we propose an unsupervised self-ev
🟢 Incremental
Developer Tools
🟢 GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents - score 30
Sources: huggingface
Multimodal LLMs are increasingly deployed as perceptual backbones for autonomous agents in 3D environments, from robotics to virtual worlds. These applications require agents to perceive rapid state changes, attribute actions to the correct entities, and reason about concurrent multi-agent behaviors
🟢 Understanding the Challenges in Iterative Generative Optimization with LLMs - score 30
Sources: huggingface
Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in practice remains brittle: despite active research, only 9% of surveyed agents
🟢 The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics - score 15
Sources: huggingface
While recent generative video models have achieved remarkable visual realism and are being explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these
🟢 4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video - score 5
Sources: huggingface
We introduce 4DGS360, a diffusion-free framework for 360° dynamic object reconstruction from casual monocular video. Existing methods often fail to reconstruct consistent 360° geometry, as their heavy reliance on 2D-native priors causes initial points to overfit to visible surface in eac
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents | model_release | 103 | Open |
| Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? | model_release | 61 | Open |
| UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience | developer_tool | 51 | Open |
| EVA: Efficient Reinforcement Learning for End-to-End Video Agent | developer_tool | 48 | Open |
| T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search | model_release | 40 | Open |
| EvoForest: A Novel Machine-Learning Paradigm via Open-Ended Evolution of Computational Graphs | cs.AI | 0 | Open |
| Surrogates, Spikes, and Sparsity: Performance Analysis and Characterization of SNN Hyperparameters on Hardware | cs.AI | 0 | Open |
| LogSigma at SemEval-2026 Task 3: Uncertainty-Weighted Multitask Learning for Dimensional Aspect-Based Sentiment Analysis | cs.AI | 0 | Open |
| Sovereign AI at the Front Door of Care: A Physically Unidirectional Architecture for Secure Clinical Intelligence | cs.AI | 0 | Open |
| On the Foundations of Trustworthy Artificial Intelligence | cs.AI | 0 | Open |
| Explaining, Verifying, and Aligning Semantic Hierarchies in Vision-Language Model Embeddings | cs.AI | 0 | Open |
| Integrated Multi-Drone Task Allocation, Sequencing, and Optimal Trajectory Generation in Obstacle-Rich 3D Environments | cs.AI | 0 | Open |
| Shaping the Future of Mathematics in the Age of AI | cs.AI | 0 | Open |
| LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics | cs.AI | 0 | Open |
| Decoding Market Emotions in Cryptocurrency Tweets via Predictive Statement Classification with Machine Learning and Transformers | cs.AI | 0 | Open |