AI Watchtower

🔴 High Significance

Model Releases

🔴 SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing — score 85 Sources: huggingface

Current instruction-guided video editing models struggle to simultaneously balance precise semantic modifications with faithful motion preservation. While existing approaches rely on injecting explicit external priors (e.g., VLM features or structural conditions) to mitigate these issues, this relia

🔴 Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation — score 75 Sources: huggingface

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight

Developer Tools

🔴 Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding — score 95 Sources: huggingface

While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which a

🟡 Notable

Model Releases

🟡 FASTER: Rethinking Real-Time Flow VLAs — score 55 Sources: huggingface

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in act

Developer Tools

🟡 Memento-Skills: Let Agents Design Agents — score 55 Sources: huggingface

We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with stateful

🟡 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model — score 55 Sources: huggingface

Creating dynamic, view-consistent videos of customized subjects is highly sought after for a wide range of emerging applications, including immersive VR/AR, virtual production, and next-generation e-commerce. However, despite rapid progress in subject-driven video generation, existing methods predom

🟢 Incremental

Model Releases

🟢 F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World — score 5 Sources: huggingface

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on p

Developer Tools

🟢 Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer — score 35 Sources: huggingface

Prior motion generation largely follows two paradigms: continuous diffusion models that excel at kinematic control, and discrete token-based generators that are effective for semantic conditioning. To combine their strengths, we propose a three-stage framework comprising condition feature extraction

🟢 MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction — score 25 Sources: huggingface

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression uns

🟢 Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens — score 15 Sources: huggingface

Visual generation with discrete tokens has gained significant attention as it enables a unified token prediction paradigm shared with language models, promising seamless multimodal architectures. However, current discrete generation methods remain limited to low-dimensional latent tokens (typically

📄 New Papers

Title	Category	Score
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding	developer_tool	100
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing	model_release	72
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation	model_release	68
FASTER: Rethinking Real-Time Flow VLAs	model_release	61
Memento-Skills: Let Agents Design Agents	developer_tool	61
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model	developer_tool	61
Leveraging Machine Learning Techniques to Investigate Media and Information Literacy Competence in Tackling Disinformation	cs.AI	0
FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Generic Drug Assessment	cs.AI	0
When Agents Disagree: The Selection Bottleneck in Multi-Agent LLM Pipelines	cs.AI	0
Subspace Kernel Learning on Tensor Sequences	cs.AI	0
AI in Work-Based Learning: Understanding the Purposes and Effects of Intelligent Tools Among Student Interns	cs.AI	0
Plagiarism or Productivity? Students Moral Disengagement and Behavioral Intentions to Use ChatGPT in Academic Writing	cs.AI	0
Probing the Latent World: Emergent Discrete Symbols and Physical Structure in Latent Representations	cs.AI	0
Optimal Scalar Quantization for Matrix Multiplication: Closed-Form Density and Phase Transition	cs.AI	0
Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search	cs.AI	0

AI Watchtower Briefing — 2026-03-20
