AI Watchtower

🔴 High Significance

Developer Tools

🔴 Demystifing Video Reasoning — score 95 Sources: huggingface

Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. Prior work attributes this to a Chain-of-Frames (CoF) mechanism, where reasoning is assumed to unfold sequentially across video frames. In this work, we…

🔴 InCoder-32B: Code Foundation Model for Industrial Scenarios — score 85 Sources: huggingface

Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address…

🔴 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models — score 75 Sources: huggingface

Omni-modal large language models (OLMs) redefine human-machine interaction by natively integrating audio, vision, and text. However, existing OLM benchmarks remain anchored to static, accuracy-centric tasks, leaving a critical gap in assessing social interactivity, the fundamental capacity to navigate…

🟡 Notable

Model Releases

🟡 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification — score 65 Sources: huggingface

We present MiroThinker-1.7, a new research agent designed for complex long-horizon reasoning tasks. Building on this foundation, we further introduce MiroThinker-H1, which extends the agent with heavy-duty reasoning capabilities for more reliable multi-step problem solving. In particular, MiroThinker…

🟡 Qianfan-OCR: A Unified End-to-End Model for Document Intelligence — score 55 Sources: huggingface

We present Qianfan-OCR, a 4B-parameter end-to-end vision-language model that unifies document parsing, layout analysis, and document understanding within a single architecture. It performs direct image-to-Markdown conversion and supports diverse prompt-driven tasks including table extraction, chart…

Developer Tools

🟡 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding — score 45 Sources: huggingface

Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we observe that transition words (e.g., because, however, and wait) are closely associated with hallucinations and tend to exhibit high-entropy states. We…
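
The abstract above is cut off before it describes the decoding method, so the following is not the paper's algorithm. It is only a minimal sketch of what a "high-entropy state" means for a next-token distribution, assuming Shannon entropy in nats. The example distributions and the threshold of 1.0 are hypothetical values chosen for illustration.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def is_high_entropy(probs, threshold=1.0):
    """Flag a decoding step as uncertain; the threshold is an assumed value."""
    return token_entropy(probs) > threshold

# Hypothetical distributions over a 4-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]  # one token dominates
uncertain = [0.25, 0.25, 0.25, 0.25]  # uniform, maximally unsure

print(is_high_entropy(confident))
print(is_high_entropy(uncertain))
```

A uniform distribution over four tokens reaches the maximum entropy of ln 4, while the peaked distribution stays well below the assumed threshold.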

🟢 Incremental

Developer Tools

🟢 Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation — score 35 Sources: huggingface

Simulating robot-world interactions is a cornerstone of Embodied AI. Recently, a few works have shown promise in leveraging video generations to transcend the rigid visual/physical constraints of traditional simulators. However, they primarily operate in 2D space or are guided by static environmental…

🟢 WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation — score 20 Sources: huggingface

Recent advances in video diffusion transformers have enabled interactive gaming world models that allow users to explore generated environments over extended horizons. However, existing approaches struggle with precise action control and long-horizon 3D consistency. Most prior works treat user action…

🟢 Online Experiential Learning for Language Models — score 20 Sources: huggingface

The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables…

🟢 TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas — score 5 Sources: huggingface

Text-to-SQL parsing has achieved remarkable progress under the Full Schema Assumption. However, this premise fails in real-world enterprise environments where databases contain hundreds of tables with massive noisy metadata. Rather than injecting the full schema upfront, an agent must actively identify…
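
The abstract is truncated before it describes the agent's tools, so the sketch below is not TRUST-SQL's interface. It only illustrates the general idea of inspecting a schema on demand instead of loading it all upfront, using Python's standard `sqlite3` module. The helper names `list_tables` and `describe_table` and the two-table database are hypothetical.

```python
import sqlite3

def list_tables(conn):
    """Return table names only, without any column metadata."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

def describe_table(conn, table):
    """Return (column, declared_type) pairs for one table, fetched on demand."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(row[1], row[2]) for row in rows]

# Hypothetical two-table database standing in for a large enterprise schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

print(list_tables(conn))
print(describe_table(conn, "orders"))
```

In this sketch an agent would call `list_tables` first and then request column details only for the tables that look relevant to the question, which keeps unrelated metadata out of its context.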

📄 New Papers

Title	Category	Score	Link
Demystifing Video Reasoning	developer_tool	378	Open
InCoder-32B: Code Foundation Model for Industrial Scenarios	developer_tool	315	Open
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models	developer_tool	250	Open
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification	model_release	189	Open
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence	model_release	158	Open
From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs	cs.AI	0	Open
KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference	cs.AI	0	Open
Draft-and-Prune: Improving the Reliability of Auto-formalization for Logical Reasoning	cs.AI	0	Open
Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models	cs.AI	0	Open
Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients	cs.AI	0	Open
Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures	cs.AI	0	Open
Pathology-Aware Multi-View Contrastive Learning for Patient-Independent ECG Reconstruction	cs.AI	0	Open
On the Fragility of AI Agent Collusion	cs.AI	0	Open
DANCE: Dynamic 3D CNN Pruning: Joint Frame, Channel, and Feature Adaptation for Energy Efficiency on the Edge	cs.AI	0	Open
CentaurTA Studio: A Self-Improving Human-Agent Collaboration System for Thematic Analysis	cs.AI	0	Open

AI Watchtower Briefing — 2026-03-18
