AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Kimi K2.5: Visual Agentic Intelligence — score 85 Sources: huggingface

We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that the two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vis

🔴 Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models — score 75 Sources: huggingface

Multimodal large language models (MLLMs) have achieved remarkable success across a broad range of vision tasks. However, constrained by the capacity of their internal world knowledge, prior work has proposed augmenting MLLMs by "reasoning-then-tool-call" for visual and textual search engines to ob

Developer Tools

🔴 Green-VLA: Staged Vision-Language-Action Model for Generalist Robots — score 95 Sources: huggingface

We introduce Green-VLA, a staged Vision-Language-Action (VLA) framework for real-world deployment on the Green humanoid robot while maintaining generalization across diverse embodiments. Green-VLA follows a five-stage curriculum: (L0) foundational VLMs, (L1) multimodal grounding, (R0) multi-embodime

🟡 Notable

Model Releases

🟡 Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models — score 65 Sources: huggingface

Multimodal Large Language Models (MLLMs) have advanced VQA and now support Vision-DeepResearch systems that use search engines for complex visual-textual fact-finding. However, evaluating these visual and textual search abilities is still difficult, and existing benchmarks have two major limitations

🟡 The Sora feed philosophy — score 50 Sources: lab_blog/OpenAI

Discover the Sora feed philosophy—built to spark creativity, foster connections, and keep experiences safe with personalized recommendations, parental controls, and strong guardrails.

Developer Tools

🟡 Closing the Loop: Universal Repository Representation with RPG-Encoder — score 55 Sources: huggingface

Current repository agents encounter a reasoning disconnect due to fragmented representations, as existing methods rely on isolated API documentation or dependency graphs that lack semantic depth. We consider repository comprehension and generation to be inverse processes within a unified cycle: gene

🟡 UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing — score 45 Sources: huggingface

Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, and typically treat text-to-image generation and image editing as isolated capabilities rather than interconnected reasoning steps. To address this, we propose UniReason, a unified framework that harmon

🟢 Incremental

Model Releases

🟢 SWE-Universe: Scale Real-World Verifiable Environments to Millions — score 35 Sources: huggingface

We propose SWE-Universe, a scalable and efficient framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs). To overcome the prevalent challenges of automatic building, such as low production yield, weak verifiers, and proh

Developer Tools

🟢 FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents — score 25 Sources: huggingface

Deep research is emerging as a representative long-horizon task for large language model (LLM) agents. However, long trajectories in deep research often exceed model context limits, compressing token budgets for both evidence collection and report writing, and preventing effective test-time scaling.

🟢 SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning — score 15 Sources: huggingface

Progressive Learning (PL) reduces pre-training computational overhead by gradually increasing model scale. While prior work has extensively explored depth expansion, width expansion remains significantly understudied, with the few existing methods limited to the early stages of training. However, ex

🟢 PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss — score 5 Sources: huggingface

Pixel diffusion generates images directly in pixel space in an end-to-end manner, avoiding the artifacts and bottlenecks introduced by VAEs in two-stage latent diffusion. However, it is challenging to optimize high-dimensional pixel manifolds that contain many perceptually irrelevant signals, leavin

📄 New Papers

Title	Category	Score	Link
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots	developer_tool	332	Open
Kimi K2.5: Visual Agentic Intelligence	model_release	273	Open
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models	model_release	160	Open
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models	model_release	121	Open
Closing the Loop: Universal Repository Representation with RPG-Encoder	developer_tool	87	Open
RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection	cs.AI	0	Open
Equal Access, Unequal Interaction: A Counterfactual Audit of LLM Fairness	cs.AI	0	Open
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning	cs.AI	0	Open
UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers	cs.AI	0	Open
Synthetic Data Augmentation for Medical Audio Classification: A Preliminary Evaluation	cs.AI	0	Open
Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control	cs.AI	0	Open
Generative Engine Optimization: A VLM and Agent Framework for Pinterest Acquisition Growth	cs.AI	0	Open
DDL2PropBank Agent: Benchmarking Multi-Agent Frameworks' Developer Experience Through a Novel Relational Schema Mapping Task	cs.AI	0	Open
Where Norms and References Collide: Evaluating LLMs on Normative Reasoning	cs.AI	0	Open
Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: The Sora feed philosophy

AI Watchtower Briefing — 2026-02-03
