AI Watchtower

🔴 High Significance

Model Releases

🔴 Multimodal OCR: Parse Anything from Documents — score 85 Sources: huggingface

We present Multimodal OCR (MOCR), a document parsing paradigm that jointly parses text and graphics into unified textual representations. Unlike conventional OCR systems that focus on text recognition and leave graphical regions as cropped pixels, our method, termed dots.mocr, treats visual elements

Developer Tools

🔴 LMEB: Long-horizon Memory Embedding Benchmark — score 95 Sources: huggingface

Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage retrieval and fail to assess models' ability to handle long-horizon memory retrieval tasks involving f

🔴 Can Vision-Language Models Solve the Shell Game? — score 75 Sources: huggingface

Visual entity tracking is an innate cognitive ability in humans, yet it remains a critical bottleneck for Vision-Language Models (VLMs). This deficit is often obscured in existing video benchmarks by visual shortcuts. We introduce VET-Bench, a synthetic diagnostic testbed featuring visually identica

🟡 Notable

Model Releases

🟡 Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation — score 65 Sources: huggingface

A recent cutting-edge topic in multimodal modeling is to unify visual comprehension and generation within a single model. However, the two tasks demand mismatched decoding regimes and visual representations, making it non-trivial to jointly optimize within a shared feature space. In this work, we pr

🟡 OmniForcing: Unleashing Real-time Joint Audio-Visual Generation — score 55 Sources: huggingface

Recent joint audio-visual diffusion models achieve remarkable generation quality but suffer from high latency due to their bidirectional attention dependencies, hindering real-time applications. We propose OmniForcing, the first framework to distill an offline, dual-stream bidirectional diffusion mo

🟡 Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously — score 40 Sources: huggingface

Online Video Large Language Models (VideoLLMs) play a critical role in supporting responsive, real-time interaction. Existing methods focus on streaming perception, lacking a synchronized logical reasoning stream. However, directly applying test-time scaling methods incurs unacceptable response late

🟡 daVinci-Env: Open SWE Environment Synthesis at Scale — score 40 Sources: huggingface

Training capable software engineering (SWE) agents demands large-scale, executable, and verifiable environments that provide dynamic feedback loops for iterative code editing, test execution, and solution refinement. However, existing open-source datasets remain limited in scale and repository diver

Developer Tools

🟡 Why Codex Security Doesn’t Include a SAST Report — score 50 Sources: lab_blog/OpenAI

A deep dive into why Codex Security doesn’t rely on traditional static application security testing (SAST), instead using AI-driven constraint reasoning and validation to find real vulnerabilities with fewer false positives.

🟢 Incremental

Model Releases

🟢 Visual-ERM: Reward Modeling for Visual Equivalence — score 25 Sources: huggingface

Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations with high visual fidelity. While recent Large Vision Language Models (LVLMs) achieve strong results via supervised fine-tuning, reinforcement l

Developer Tools

🟢 MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning — score 15 Sources: huggingface

Multimodal Large Language Models (MLLMs) are increasingly used to carry out visual workflows such as navigating GUIs, where the next step depends on verified visual compositional conditions (e.g., "if a permission dialog appears and the color of the interface is green, click Allow") and the process

🟢 Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents — score 5 Sources: huggingface

Test-time scaling has become a dominant paradigm for improving LLM agent reliability, yet current approaches treat compute as an abundant resource, allowing agents to exhaust token and tool budgets on redundant steps or dead-end trajectories. Existing budget-aware methods either require expensive fi

📄 New Papers

Title	Category	Score
LMEB: Long-horizon Memory Embedding Benchmark	developer_tool	79
Multimodal OCR: Parse Anything from Documents	model_release	49
Can Vision-Language Models Solve the Shell Game?	developer_tool	42
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation	model_release	41
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation	model_release	35
MVHOI: Bridge Multi-view Condition to Complex Human-Object Interaction Video Reenactment via 3D Foundation Model	cs.AI	0
AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems	cs.AI	0
Applications of Intuitionistic Temporal Logic to Temporal Answer Set Programming	cs.AI	0
Robust Building Damage Detection in Cross-Disaster Settings Using Domain Adaptation	cs.AI	0
Beyond Local Code Optimization: Multi-Agent Reasoning for Software System Optimization	cs.AI	0
AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers	cs.AI	0
Transition Flow Matching	cs.AI	0
GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation	cs.AI	0
Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems	cs.AI	0
Gauge-Equivariant Intrinsic Neural Operators for Geometry-Consistent Learning of Elliptic PDE Maps	cs.AI	0

🏢 Lab Blog Posts

OpenAI: Why Codex Security Doesn’t Include a SAST Report

AI Watchtower Briefing — 2026-03-16
