AI Watchtower

🔴 High Significance

Model Releases

🔴 VLANeXt: Recipes for Building Strong VLA Models — score 75 Sources: huggingface

Following the rise of large foundation models, Vision-Language-Action models (VLAs) emerged, leveraging strong visual and language understanding for general-purpose policy learning. Yet, the current VLA landscape remains fragmented and exploratory. Although many groups have proposed their own VLA mo

Developer Tools

🔴 A Very Big Video Reasoning Suite — score 95 Sources: huggingface

Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence in spatiotemporally consistent visual environments that go beyond what text can naturally capture, enabling intuitive reasoning over spatiote

🔴 SkillOrchestra: Learning to Route Agents via Skill Transfer — score 85 Sources: huggingface

Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trai

🟡 Notable

Model Releases

🟡 TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics — score 45 Sources: huggingface

While Vision-Language-Action (VLA) models have seen rapid progress in pretraining, their advancement in Reinforcement Learning (RL) remains hampered by low sample efficiency and sparse rewards in real-world settings. Developing generalizable process reward models is essential for providing the fine-

Developer Tools

🟡 Agents of Chaos — score 65 Sources: huggingface

We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under b

🟡 ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation — score 55 Sources: huggingface

Sequential recommendation increasingly employs latent multi-step reasoning to enhance test-time computation. Despite empirical gains, existing approaches largely drive intermediate reasoning states via target-dominant objectives without imposing explicit feasibility constraints. This results in late

Other Signals

🟡 Arvind KC appointed Chief People Officer — score 50 Sources: lab_blog/OpenAI

OpenAI appoints Arvind KC as Chief People Officer to help scale the company, strengthen its culture, and lead how work evolves in the age of AI.

🟢 Incremental

Developer Tools

🟢 Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device — score 35 Sources: huggingface

Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too heavy for deployment on edge devices. We present Mobile-O, a compact vision-language-diffusion model that brings unified multimodal intelligenc

🟢 Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction — score 15 Sources: huggingface

We study the task of establishing object-level visual correspondence across different viewpoints in videos, focusing on the challenging egocentric-to-exocentric and exocentric-to-egocentric scenarios. We propose a simple yet effective framework based on conditional binary segmentation, where an obje

🟢 DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning — score 5 Sources: huggingface

Reinforcement learning with verifiers (RLVR) is a central paradigm for improving large language model (LLM) reasoning, yet existing methods often suffer from limited exploration. Policies tend to collapse onto a few reasoning patterns and prematurely stop deep exploration, while conventional entropy

Infrastructure & Compute

🟢 SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation — score 25 Sources: huggingface

The ability to manipulate tools significantly expands the set of tasks a robot can perform. Yet, tool manipulation represents a challenging class of dexterity, requiring grasping thin objects, in-hand object rotations, and forceful interactions. Since collecting teleoperation data for these behavior

📄 New Papers

Title	Category	Score
A Very Big Video Reasoning Suite	developer_tool	525
SkillOrchestra: Learning to Route Agents via Skill Transfer	developer_tool	64
VLANeXt: Recipes for Building Strong VLA Models	model_release	56
Agents of Chaos	developer_tool	38
ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation	developer_tool	33
Can AI be a Teaching Partner? Evaluating ChatGPT, Gemini, and DeepSeek across Three Teaching Strategies	cs.AI	0
Imputation of Unknown Missingness in Sparse Electronic Health Records	cs.AI	0
Protein Language Models Diverge from Natural Language: Comparative Analysis and Improved Inference	cs.AI	0
PreScience: A Benchmark for Forecasting Scientific Contributions	cs.AI	0
ERA: Evidence-based Reliability Alignment for Honest Retrieval-Augmented Generation	cs.AI	0
Elimination-compensation pruning for fully-connected neural networks	cs.AI	0
VINA: Variational Invertible Neural Architectures	cs.AI	0
Hybrid LLM-Embedded Dialogue Agents for Learner Reflection: Designing Responsive and Theory-Driven Interactions	cs.AI	0
Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA	cs.AI	0
KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning	cs.AI	0

🏢 Lab Blog Posts

OpenAI: Arvind KC appointed Chief People Officer

AI Watchtower Briefing — 2026-02-24
