AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Helios: Real Real-Time Long Video Generation Model — score 85 Sources: huggingface

We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly u

🔴 T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning — score 75 Sources: huggingface

Think about how humans handle complex reading tasks: marking key points, inferring their relationships, and structuring information to guide understanding and responses. Likewise, can a large language model benefit from text structure to enhance text-processing performance? To explore it, in this wo

Developer Tools

🔴 Heterogeneous Agent Collaborative Reinforcement Learning — score 95 Sources: huggingface

We introduce Heterogeneous Agent Collaborative Reinforcement Learning (HACRL), a new learning paradigm that addresses the inefficiencies of isolated on-policy optimization. HACRL enables collaborative optimization with independent execution: heterogeneous agents share verified rollouts during traini

🟡 Notable

Model Releases

🟡 Introducing GPT-5.4 — score 50 Sources: lab_blog/OpenAI

Introducing GPT-5.4, OpenAI’s most capable and efficient frontier model for professional work, with state-of-the-art coding, computer use, tool search, and 1M-token context.

🟡 GPT-5.4 Thinking System Card — score 50 Sources: lab_blog/OpenAI

🟡 Introducing the Adoption news channel — score 50 Sources: lab_blog/OpenAI

Practical insights and frameworks to turn AI progress into business advantage.

🟡 Introducing ChatGPT for Excel and new financial data integrations — score 50 Sources: lab_blog/OpenAI

OpenAI introduces ChatGPT for Excel and new financial app integrations, powered by GPT-5.4 to accelerate modeling, research, and analysis in regulated environments.

🟡 VfL Wolfsburg turns ChatGPT into a club-wide capability — score 50 Sources: lab_blog/OpenAI

By focusing on people, not pilots, the Bundesliga club is scaling efficiency, creativity, and knowledge—without losing its football identity.

Developer Tools

🟡 Proact-VL: A Proactive VideoLLM for Real-Time AI Companions — score 65 Sources: huggingface

Proactive and real-time interactive experiences are essential for human-like AI companions, yet face three key challenges: (1) achieving low-latency inference under continuous streaming inputs, (2) autonomously deciding when to respond, and (3) controlling both quality and quantity of generated cont

🟡 MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning — score 55 Sources: huggingface

As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between cost and accuracy. Simple storage methods often fail to retrieve relevant information, while complex i

🟡 ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors — score 45 Sources: huggingface

Synthesizing physically plausible articulated human-object interactions (HOI) without 3D/4D supervision remains a fundamental challenge. While recent zero-shot approaches leverage video diffusion models to synthesize human-object interactions, they are largely confined to rigid-object manipulation a

Research Papers

🟡 Reasoning Models Struggle to Control their Chains of Thought — score 60 Sources: arxiv/cs.AI · lab_blog/OpenAI

Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviors and understanding the motivations of modern reasoning models. However, if models can control what they verbalize in their CoT, it could undermine CoT monitorability. To measure this undesirable capability -- CoT control

Other Signals

🟡 Ensuring AI use in education leads to opportunity — score 50 Sources: lab_blog/OpenAI

OpenAI shares new tools, certifications, and measurement resources to help schools and universities close AI capability gaps and expand opportunity.

🟡 The five AI value models driving business reinvention — score 50 Sources: lab_blog/OpenAI

Five AI value models show how leaders can sequence AI from workforce fluency to process reinvention and build durable business advantage.

🟢 Incremental

Model Releases

🟢 Phi-4-reasoning-vision-15B Technical Report — score 30 Sources: huggingface

We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed its development. Our goal is to contribute practical insight to the research community on building smaller, efficient multimoda

Developer Tools

🟢 Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory — score 30 Sources: huggingface

Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the cont

🟢 Specificity-aware reinforcement learning for fine-grained open-world classification — score 15 Sources: huggingface

Classifying fine-grained visual concepts under open-world settings, i.e., without a predefined label set, requires models to be both accurate and specific. Recent reasoning Large Multimodal Models (LMMs) exhibit strong visual understanding capability but tend to produce overly generic predictions whe

🟢 V_1: Unifying Generation and Self-Verification for Parallel Reasoners — score 5 Sources: huggingface

Test-time scaling for complex reasoning tasks shows that leveraging inference-time compute, by methods such as independently sampling and aggregating multiple solutions, results in significantly better task outcomes. However, a critical bottleneck is verification: sampling is only effective if corre
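The "independently sample, then aggregate" baseline this abstract refers to is often implemented as majority voting over final answers (self-consistency). The sketch below shows only that generic baseline; it is not V_1's method, whose generation/self-verification details are cut off above. The names `majority_vote` and `sample_and_aggregate` are hypothetical, and the sampler is a toy stand-in for calls to a reasoning model.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent final answer.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that was sampled first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def sample_and_aggregate(sample_fn: Callable[[int], str], n: int) -> str:
    """Draw n independent solutions and aggregate them by majority vote.

    sample_fn is a hypothetical stand-in for one model call that returns
    a final answer string; the int argument is just the sample index.
    """
    answers = [sample_fn(i) for i in range(n)]
    return majority_vote(answers)


if __name__ == "__main__":
    # Toy "model" outputs: three of five samples agree on "42".
    toy_answers = ["42", "41", "42", "40", "42"]
    print(sample_and_aggregate(lambda i: toy_answers[i], len(toy_answers)))  # prints 42
```

Majority voting only works when answers can be compared exactly. That is also where the verification bottleneck the abstract mentions comes in: if the correct answer is rare among the samples, voting will not surface it.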

📄 New Papers

Title	Category	Score
Heterogeneous Agent Collaborative Reinforcement Learning	developer_tool	202
Helios: Real Real-Time Long Video Generation Model	model_release	192
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning	model_release	125
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions	developer_tool	45
Reasoning Models Struggle to Control their Chains of Thought	cs.AI	100
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning	developer_tool	35
Why the Brain Consolidates: Predictive Forgetting for Optimal Generalisation	cs.AI	0
Spatial Competence Benchmark	cs.AI	0
Hate Speech Detection using Large Language Models with Data Augmentation and Feature Enhancement	cs.AI	0
Detection of Illicit Content on Online Marketplaces using Large Language Models	cs.AI	0
When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper	cs.AI	0
Probabilistic Dreaming for World Models	cs.AI	0
AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments	cs.AI	0
Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models	cs.AI	0
From Offline to Periodic Adaptation for Pose-Based Shoplifting Detection in Real-world Retail Security	cs.AI	0

🏢 Lab Blog Posts

OpenAI: Introducing GPT-5.4
OpenAI: GPT-5.4 Thinking System Card
OpenAI: Ensuring AI use in education leads to opportunity
OpenAI: Introducing the Adoption news channel
OpenAI: Introducing ChatGPT for Excel and new financial data integrations
OpenAI: The five AI value models driving business reinvention
OpenAI: VfL Wolfsburg turns ChatGPT into a club-wide capability

AI Watchtower Briefing — 2026-03-05