AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills — score 75 Sources: huggingface

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow paramet

Developer Tools

🔴 ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling — score 95 Sources: huggingface

Multi-shot video generation is crucial for long narrative storytelling, yet current bidirectional architectures suffer from limited interactivity and high latency. We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame gener

🔴 Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models — score 85 Sources: huggingface

Video world models have shown immense potential in simulating the physical world, yet existing memory mechanisms primarily treat environments as static canvases. When dynamic subjects hide out of sight and later re-emerge, current methods often struggle, leading to frozen, distorted, or vanishing su

🟡 Notable

Model Releases

🟡 MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies — score 45 Sources: huggingface

Currently, evaluating vision-language models (VLMs) in medical imaging tasks oversimplifies clinical reality by relying on pre-selected 2D images that demand significant manual labor to curate. This setup misses the core challenge of real-world diagnostics: a true clinical agent must actively navigat

Developer Tools

🟡 PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference — score 65 Sources: huggingface

Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that eff

🟡 Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models — score 55 Sources: huggingface

As the paradigm of AI shifts from text-based LLMs to Speech Language Models (SLMs), there is a growing demand for full-duplex systems capable of real-time, natural human-computer interaction. However, the development of such models is constrained by the scarcity of high-quality, multi-speaker conver

🟢 Incremental

Model Releases

🟢 RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation — score 35 Sources: huggingface

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To address this gap, we introduce RealChart2Code, a new

Developer Tools

🟢 Natural-Language Agent Harnesses — score 25 Sources: huggingface

Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific object. We ask whether the high-level control logic of an agent harness can instead

🟢 Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models — score 10 Sources: huggingface

Recent advances in 3D generation have improved the fidelity and geometric details of synthesized 3D assets. However, due to the inherent ambiguity of single-view observations and the lack of robust global structural priors caused by limited 3D training data, the unseen regions generated by existing

🟢 LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset — score 10 Sources: huggingface

In real-world domains such as self-driving, generalization to rare scenarios remains a fundamental challenge. To address this, we introduce a new dataset designed for end-to-end driving that focuses on long-tail driving events. We provide multi-view video data, trajectories, high-level instructions,

📄 New Papers

Title	Category	Score	Link
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling	developer_tool	161	Open
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models	developer_tool	160	Open
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills	model_release	66	Open
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference	developer_tool	56	Open
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models	developer_tool	38	Open
ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing	cs.AI	0	Open
Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey	cs.AI	0	Open
A Learning-Based Cooperative Coevolution Framework for Heterogeneous Large-Scale Global Optimization	cs.AI	0	Open
Beyond Message Passing: A Semantic View of Agent Communication Protocols	cs.AI	0	Open
GEAKG: Generative Executable Algorithm Knowledge Graphs	cs.AI	0	Open
Physics-Guided Transformer (PGT): Physics-Aware Attention Mechanism for PINNs	cs.AI	0	Open
JaWildText: A Benchmark for Vision-Language Models on Japanese Scene Text Understanding	cs.AI	0	Open
CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference	cs.AI	0	Open
CARV: A Diagnostic Benchmark for Compositional Analogical Reasoning in Multimodal LLMs	cs.AI	0	Open
SARL: Label-Free Reinforcement Learning by Rewarding Reasoning Topology	cs.AI	0	Open

AI Watchtower Briefing — 2026-03-30
