🔴 High Significance

Developer Tools

🔴 Unified Latents (UL): How to train your latents - score 95. Sources: huggingface

We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight u

🔴 Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents - score 85. Sources: huggingface

The paper introduces GUI-Owl-1.5, the latest native GUI agent model that features instruct/thinking variants in multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge collaboration and real-time interaction. GUI-Owl-1.5 achieves

🔴 SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning - score 75. Sources: huggingface

Many training-free sparse attention methods are effective for accelerating diffusion models. Recently, several works suggest that making sparse attention trainable can further increase sparsity while preserving generation quality. We study three key questions: (1) when do the two common masking rule
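As a rough illustration of the two masking rules named in the title, the sketch below applies top-k and top-p selection to a single row of attention probabilities. It is a generic pure-Python example and not the paper's implementation: the function names are hypothetical, and combining the two masks by union is an assumption, since the excerpt above does not say how the hybrid rule is built.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_mask(probs, k):
    """Keep the k largest entries of the row."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(probs))]


def top_p_mask(probs, p):
    """Keep the smallest set of largest entries whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [i in keep for i in range(len(probs))]


def hybrid_mask(probs, k, p):
    """Assumption: an entry survives if either rule keeps it (union)."""
    return [a or b for a, b in zip(top_k_mask(probs, k), top_p_mask(probs, p))]


def sparse_attention_row(scores, k, p):
    """Zero out masked entries and renormalize the surviving weights."""
    probs = softmax(scores)
    mask = hybrid_mask(probs, k, p)
    kept = [pr if m else 0.0 for pr, m in zip(probs, mask)]
    total = sum(kept)
    return [x / total for x in kept]
```

The two rules behave differently depending on how concentrated the row is. On a peaked row such as `[0.5, 0.3, 0.1, 0.1]`, top-p with `p=0.7` keeps two entries; on a uniform row it keeps almost everything, while top-k keeps a fixed count either way.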

🟡 Notable

Model Releases

🟡 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 - score 65. Sources: huggingface

To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As the general capabilities of Large Language Models (LLMs) rapidly evolve and th

Developer Tools

🟡 Computer-Using World Model - score 50. Sources: huggingface

Agents operating in complex software environments benefit from reasoning about the consequences of their actions, as even a single incorrect user interface (UI) operation can derail long, artifact-preserving workflows. This challenge is particularly acute for computer-using scenarios, where real exe

Infrastructure & Compute

🟡 Arcee Trinity Large Technical Report - score 50. Sources: huggingface

We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts model with 400B total parameters and 13B activated per token. Additionally, we report on Trinity Nano and Trinity Mini, with Trinity Nano having 6B total parameters with 1B activated per token, Trinity Mini having 2
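To put the sparsity figures above in proportion, here is a small worked calculation of the share of parameters activated per token. It uses only the counts quoted in the excerpt for Trinity Large and Trinity Nano; Trinity Mini is left out because its figures are cut off. The helper name `active_fraction` is hypothetical.

```python
def active_fraction(total_billions, active_billions):
    """Share of a Mixture-of-Experts model's parameters used per token."""
    return active_billions / total_billions


# (total parameters, activated per token), in billions, as quoted above.
models = {
    "Trinity Large": (400, 13),
    "Trinity Nano": (6, 1),
}

for name, (total, active) in models.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.2%} of parameters active per token")
```

By these numbers, Trinity Large activates 3.25% of its parameters per token and Trinity Nano about 16.67%.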

Other Signals

🟡 Our First Proof submissions - score 50. Sources: lab_blog/OpenAI

We share our AI model's proof attempts for the First Proof math challenge, testing research-grade reasoning on expert-level problems.

🟢 Incremental

Developer Tools

🟢 Discovering Multiagent Learning Algorithms with Large Language Models - score 35. Sources: huggingface

Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid

🟢 Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents - score 25. Sources: huggingface

LLMs are increasingly being used for complex problems which are not necessarily resolved in a single response, but require interacting with an environment to acquire information. In these scenarios, LLMs must reason about inherent cost-uncertainty tradeoffs in when to stop exploring and commit to an

🟢 "What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing - score 15. Sources: huggingface

Agentic AI assistants that autonomously perform multi-step tasks raise open questions for user experience: how should such systems communicate progress and reasoning during extended operations, especially in attention-critical contexts such as driving? We investigate feedback timing and verbosity fr

🟢 DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers - score 5. Sources: huggingface

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in image and video generation, but their success comes at the cost of heavy computation. This inefficiency is largely due to the fixed tokenization process, which uses constant-sized patches throughout the entire denoising phas

📄 New Papers

Title | Category | Score
Unified Latents (UL): How to train your latents | developer_tool | 64
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents | developer_tool | 54
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning | developer_tool | 50
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 | model_release | 33
Arcee Trinity Large Technical Report | infrastructure | 22
Computer-Using World Model | developer_tool | 22
Games That Teach, Chats That Convince: Comparing Interactive and Static Formats for Persuasive Learning | cs.AI | 0
Improving Neural Topic Modeling with Semantically-Grounded Soft Label Distributions | cs.AI | 0
Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems | cs.AI | 0
Condition-Gated Reasoning for Context-Dependent Biomedical Question Answering | cs.AI | 0
From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents | cs.AI | 0
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance | cs.AI | 0
Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning | cs.AI | 0
Causal Neighbourhood Learning for Invariant Graph Representations | cs.AI | 0
Optimizing Graph Causal Classification Models: Estimating Causal Effects and Addressing Confounders | cs.AI | 0

🏢 Lab Blog Posts