AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents — score 85 Sources: huggingface

Executing complex terminal tasks remains a significant challenge for open-weight LLMs, constrained by two fundamental limitations. First, high-fidelity, executable training environments are scarce: environments synthesized from real-world repositories are not diverse and scalable, while trajectories

🔴 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining — score 75 Sources: huggingface

Financial markets are noisy and non-stationary, making alpha mining highly sensitive to noise in backtesting results and sudden market regime shifts. While recent agentic frameworks improve alpha mining automation, they often lack controllable multi-round search and reliable reuse of validated exper

Developer Tools

🔴 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger — score 95 Sources: huggingface

As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative s
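The saturation effect this teaser describes can be illustrated with a generic sketch. It assumes standard softmax cross-entropy training, where the gradient with respect to the logits is softmax(z) minus the one-hot target. It is not the paper's method; the 3-class logits and the margin values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad_wrt_logits(logits, target):
    """Gradient of cross-entropy w.r.t. logits: softmax(z) - onehot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def grad_norm(logits, target):
    """Euclidean norm of the cross-entropy gradient."""
    return math.sqrt(sum(g * g for g in ce_grad_wrt_logits(logits, target)))


# Hypothetical 3-class logits: class 0 is the target and its margin grows.
margins = [0.0, 2.0, 5.0, 10.0]
norms = [grad_norm([m, 0.0, 0.0], target=0) for m in margins]

for m, n in zip(margins, norms):
    p = softmax([m, 0.0, 0.0])[0]
    print(f"margin={m:4.1f}  target_prob={p:.4f}  grad_norm={n:.6f}")
```

As the target probability approaches 1, the gradient norm shrinks toward zero, which is one simple way to see why further reinforcement of an already confident prediction yields diminishing returns.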

🟡 Notable

Model Releases

🟡 MOVA: Towards Scalable and Synchronized Video-Audio Generation — score 65 Sources: huggingface

Audio is indispensable for real-world video, yet generation models have largely overlooked audio components. Current approaches to producing audio-visual content often rely on cascaded pipelines, which increase cost, accumulate errors, and degrade overall quality. While systems such as Veo 3 and Sor

Developer Tools

🟡 Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models — score 55 Sources: huggingface

Despite the success of multimodal contrastive learning in aligning visual and linguistic representations, a persistent geometric anomaly, the Modality Gap, remains: embeddings of distinct modalities expressing identical semantics occupy systematically offset regions. Prior approaches to bridge this
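One common way to quantify such an offset is the distance between the centroids of the two embedding sets. The sketch below assumes that measure and uses synthetic data: the embeddings, the `offset` vector, and the helper names are all hypothetical, and nothing here is taken from the paper's training paradigm.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def modality_gap(image_embs, text_embs):
    """Euclidean distance between the centroids of two embedding sets."""
    ci, ct = centroid(image_embs), centroid(text_embs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ci, ct)))


rng = random.Random(0)
dim, n = 8, 200

# Shared "semantic" content, pushed in opposite directions per modality
# to imitate a systematic offset (assumption made for illustration).
shared = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
offset = [3.0] + [0.0] * (dim - 1)

image_embs = [normalize([s + o for s, o in zip(v, offset)]) for v in shared]
text_embs = [normalize([s - o for s, o in zip(v, offset)]) for v in shared]

gap = modality_gap(image_embs, text_embs)
no_gap = modality_gap(image_embs, image_embs)
print(f"gap between modalities: {gap:.3f}, gap with itself: {no_gap:.3f}")
```

A centroid distance only captures a constant shift between modalities; it says nothing about differences in shape or spread, so it is a coarse diagnostic rather than a full description of the gap.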

🟡 AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents — score 45 Sources: huggingface

LLM agents hold significant promise for advancing scientific research. To accelerate this progress, we introduce AIRS-Bench (the AI Research Science Benchmark), a suite of 20 tasks sourced from state-of-the-art machine learning papers. These tasks span diverse domains, including language modeling, m

🟢 Incremental

Developer Tools

🟢 InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery — score 35 Sources: huggingface

We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery across computational and empirical domains. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. These subsystems are supporte

🟢 LLaDA2.1: Speeding Up Text Diffusion via Token Editing — score 25 Sources: huggingface

While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, a paradigm shift designed to transcend this trad

🟢 Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning — score 15 Sources: huggingface

Current Vision-Language-Action (VLA) models rely on fixed computational depth, expending the same amount of compute on simple adjustments and complex multi-step manipulation. While Chain-of-Thought (CoT) prompting enables variable computation, it scales memory linearly and is ill-suited for continuo

🟢 RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI — score 5 Sources: huggingface

Online policy learning directly in the physical world is a promising yet challenging direction for embodied intelligence. Unlike simulation, real-world systems cannot be arbitrarily accelerated, cheaply reset, or massively replicated, which makes scalable data collection, heterogeneous deployment, a

📄 New Papers

Title	Category	Score
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger	developer_tool	296
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents	model_release	212
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining	model_release	193
MOVA: Towards Scalable and Synchronized Video-Audio Generation	model_release	164
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models	developer_tool	148
X-Mark: Saliency-Guided Robust Dataset Ownership Verification for Medical Imaging	cs.AI	0
Human Control Is the Anchor, Not the Answer: Early Divergence of Oversight in Agentic AI Communities	cs.AI	0
Empowering Contrastive Federated Sequential Recommendation with LLMs	cs.AI	0
Don't Shoot The Breeze: Topic Continuity Model Using Nonlinear Naive Bayes With Attention	cs.AI	0
Clarifying Shampoo: Adapting Spectral Descent to Stochasticity and the Parameter Trajectory	cs.AI	0
Learning to Select Like Humans: Explainable Active Learning for Medical Imaging	cs.AI	0
A Deep Multi-Modal Method for Patient Wound Healing Assessment	cs.AI	0
SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints	cs.AI	0
GAFR-Net: A Graph Attention and Fuzzy-Rule Network for Interpretable Breast Cancer Image Classification	cs.AI	0
Beyond Uniform Credit: Causal Credit Assignment for Policy Optimization	cs.AI	0

AI Watchtower Briefing — 2026-02-10
