AI Watchtower

🔴 High Significance

Developer Tools

🔴 Unified Latents (UL): How to train your latents — score 95 Sources: huggingface

We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight u

🔴 Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents — score 85 Sources: huggingface

The paper introduces GUI-Owl-1.5, the latest native GUI agent model that features instruct/thinking variants in multiple sizes (2B/4B/8B/32B/235B) and supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge collaboration and real-time interaction. GUI-Owl-1.5 achieves

🔴 SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning — score 75 Sources: huggingface

Many training-free sparse attention methods are effective for accelerating diffusion models. Recently, several works suggest that making sparse attention trainable can further increase sparsity while preserving generation quality. We study three key questions: (1) when do the two common masking rule

🟡 Notable

Model Releases

🟡 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 — score 65 Sources: huggingface

To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As the general capabilities of Large Language Models (LLMs) rapidly evolve and th

Developer Tools

🟡 Computer-Using World Model — score 50 Sources: huggingface

Agents operating in complex software environments benefit from reasoning about the consequences of their actions, as even a single incorrect user interface (UI) operation can derail long, artifact-preserving workflows. This challenge is particularly acute for computer-using scenarios, where real exe

Infrastructure & Compute

🟡 Arcee Trinity Large Technical Report — score 50 Sources: huggingface

We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts model with 400B total parameters and 13B activated per token. Additionally, we report on Trinity Nano and Trinity Mini, with Trinity Nano having 6B total parameters with 1B activated per token, Trinity Mini having 2

Other Signals

🟡 Our First Proof submissions — score 50 Sources: lab_blog/OpenAI

We share our AI model’s proof attempts for the First Proof math challenge, testing research-grade reasoning on expert-level problems.

🟢 Incremental

Developer Tools

🟢 Discovering Multiagent Learning Algorithms with Large Language Models — score 35 Sources: huggingface

Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid

🟢 Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents — score 25 Sources: huggingface

LLMs are increasingly being used for complex problems which are not necessarily resolved in a single response, but require interacting with an environment to acquire information. In these scenarios, LLMs must reason about inherent cost-uncertainty tradeoffs in when to stop exploring and commit to an

🟢 "What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing — score 15 Sources: huggingface

Agentic AI assistants that autonomously perform multi-step tasks raise open questions for user experience: how should such systems communicate progress and reasoning during extended operations, especially in attention-critical contexts such as driving? We investigate feedback timing and verbosity fr

🟢 DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers — score 5 Sources: huggingface

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in image and video generation, but their success comes at the cost of heavy computation. This inefficiency is largely due to the fixed tokenization process, which uses constant-sized patches throughout the entire denoising phas

📄 New Papers

Title	Category	Score
Unified Latents (UL): How to train your latents	developer_tool	64
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents	developer_tool	54
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning	developer_tool	50
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5	model_release	33
Arcee Trinity Large Technical Report	infrastructure	22
Computer-Using World Model	developer_tool	22
Games That Teach, Chats That Convince: Comparing Interactive and Static Formats for Persuasive Learning	cs.AI	0
Improving Neural Topic Modeling with Semantically-Grounded Soft Label Distributions	cs.AI	0
Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems	cs.AI	0
Condition-Gated Reasoning for Context-Dependent Biomedical Question Answering	cs.AI	0
From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents	cs.AI	0
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance	cs.AI	0
Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning	cs.AI	0
Causal Neighbourhood Learning for Invariant Graph Representations	cs.AI	0
Optimizing Graph Causal Classification Models: Estimating Causal Effects and Addressing Confounders	cs.AI	0

🏢 Lab Blog Posts

OpenAI: Our First Proof submissions

AI Watchtower Briefing — 2026-02-20
