AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration — score 95 Sources: huggingface

As high-quality public text approaches exhaustion, a phenomenon known as the Data Wall, pre-training is shifting from more tokens to better tokens. However, existing methods either rely on heuristic static filters that ignore training dynamics, or use dynamic yet optimizer-agnostic criteria based on

🔴 Code2World: A GUI World Model via Renderable Code Generation — score 85 Sources: huggingface

Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. As a virtual sandbox, the GUI World model empowers agents with human-like foresight by enabling action-conditioned prediction. However, existing text- and pixel-based approaches struggle to simultaneousl

Developer Tools

🔴 UI-Venus-1.5 Technical Report — score 75 Sources: huggingface

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world

🟡 Notable

Model Releases

🟡 Chain of Mindset: Reasoning with Adaptive Cognitive Modes — score 55 Sources: huggingface

Human problem-solving is never the repetition of a single mindset, by which we mean a distinct mode of cognitive processing. When tackling a specific task, we do not rely on a single mindset; instead, we integrate multiple mindsets within a single solution process. However, existing LLM reasoning

🟡 P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads — score 45 Sources: huggingface

The transition from symbolic manipulation to science-grade reasoning represents a pivotal frontier for Large Language Models (LLMs), with physics serving as the critical test anchor for binding abstract logic to physical reality. Physics demands that a model maintain physical consistency with the la

Developer Tools

🟡 SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning — score 65 Sources: huggingface

Large Language Model (LLM) agents have shown stunning results in complex tasks, yet they often operate in isolation, failing to learn from past experiences. Existing memory-based methods primarily store raw trajectories, which are often redundant and noise-heavy. This prevents agents from extracting

🟡 Harness engineering: leveraging Codex in an agent-first world — score 50 Sources: lab_blog/OpenAI

By Ryan Lopopolo, Member of the Technical Staff

🟢 Incremental

Developer Tools

🟢 Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning — score 35 Sources: huggingface

Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent

🟢 Prism: Spectral-Aware Block-Sparse Attention — score 25 Sources: huggingface

Block-sparse attention is promising for accelerating long-context LLM pre-filling, yet identifying relevant blocks efficiently remains a bottleneck. Existing methods typically employ coarse-grained attention as a proxy for block importance estimation, but often resort to expensive token-level search

🟢 Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling — score 15 Sources: huggingface

We study instruction-based image editing under professional workflows and identify three persistent challenges: (i) editors often over-edit, modifying content beyond the user's intent; (ii) existing models are largely single-turn, while multi-turn edits can alter object faithfulness; and (iii) evalu

🟢 DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents — score 5 Sources: huggingface

Recently, Diffusion Large Language Models (dLLMs) have demonstrated unique efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Meanwhile, despite the rapid advancement of Search Agents, their practical deployment is constrained by a fundam

📄 New Papers

Title	Category	Score	Link
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration	model_release	355	Open
Code2World: A GUI World Model via Renderable Code Generation	model_release	205	Open
UI-Venus-1.5 Technical Report	developer_tool	161	Open
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning	developer_tool	78	Open
Chain of Mindset: Reasoning with Adaptive Cognitive Modes	model_release	77	Open
The Alignment Bottleneck in Decomposition-Based Claim Verification	cs.AI	0	Open
Capture Timing-Attention of Events in Clinical Time Series	cs.AI	0	Open
Making Databases Faster with LLM Evolutionary Sampling	cs.AI	0	Open
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs	cs.AI	0	Open
Affordances Enable Partial World Modeling with LLMs	cs.AI	0	Open
Modular Multi-Task Learning for Chemical Reaction Prediction	cs.AI	0	Open
AI-rithmetic	cs.AI	0	Open
Equivariant Evidential Deep Learning for Interatomic Potentials	cs.AI	0	Open
AIvilization v0: Toward Large-Scale Artificial Social Simulation with a Unified Agent Architecture and Adaptive Agent Profiles	cs.AI	0	Open
Breaking the Curse of Repulsion: Optimistic Distributionally Robust Policy Optimization for Off-Policy Generative Recommendation	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: Harness engineering: leveraging Codex in an agent-first world

AI Watchtower Briefing — 2026-02-11
