🔴 High Significance

Developer Tools

🔴 From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company — score 95 Sources: huggingface

Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a principled organisati

🔴 World-R1: Reinforcing 3D Constraints for Text-to-Video Generation — score 85 Sources: huggingface

Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a fra

Infrastructure & Compute

🔴 Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation — score 75 Sources: huggingface

Unified multimodal models typically rely on pretrained vision encoders and use separate visual representations for understanding and generation, creating misalignment between the two tasks and preventing fully end-to-end optimization from raw pixels. We introduce Tuna-2, a native unified multimodal

🟡 Notable

Model Releases

🟡 ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning — score 65 Sources: huggingface

Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such annotations are trea

🟡 Apr 28, 2026 Announcements Claude for Creative Work — score 50 Sources: lab_blog/Anthropic

🟡 Apr 27, 2026 Announcements Anthropic names Theo Hourmouzis General Manager of Australia & New Zealand and officially opens Sydney office — score 50 Sources: lab_blog/Anthropic

🟡 Apr 24, 2026 Announcements An update on our election safeguards — score 50 Sources: lab_blog/Anthropic

🟡 Apr 24, 2026 Announcements Anthropic and NEC collaborate to build Japan’s largest AI engineering workforce — score 50 Sources: lab_blog/Anthropic

Omitted 10 additional model release items from the main section; see the raw data and source-specific sections below.

Developer Tools

🟡 Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms — score 55 Sources: huggingface

Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language,

🟢 Incremental

Developer Tools

🟢 Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis — score 10 Sources: huggingface

Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remains underexplored. In this work, we first present an empirical

🟢 Sapiens2 — score 10 Sources: huggingface

We present Sapiens2, a model family of high-resolution transformers for human-centric vision focused on generalization, versatility, and high-fidelity outputs. Our model sizes range from 0.4 to 5 billion parameters, with native 1K resolution and hierarchical variants that support 4K. Sapiens2 substa

Infrastructure & Compute

🟢 Why Fine-Tuning Encourages Hallucinations and How to Fix It — score 25 Sources: huggingface

Large language models are prone to hallucinating factually incorrect statements. A key source of these errors is exposure to new factual information through supervised fine-tuning (SFT), which can increase hallucinations w.r.t. knowledge acquired during pre-training. In this work, we explore whether

📄 New Papers

| Title | Category | Score | Link |
| --- | --- | --- | --- |
| From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company | developer_tool | 121 | Open |
| World-R1: Reinforcing 3D Constraints for Text-to-Video Generation | developer_tool | 117 | Open |
| Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation | infrastructure | 70 | Open |
| ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning | model_release | 65 | Open |
| Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms | developer_tool | 46 | Open |
| Evaluating Risks in Weak-to-Strong Alignment: A Bias-Variance Perspective | cs.AI | 0 | Open |
| Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization | cs.AI | 0 | Open |
| Optimally Auditing Adversarial Agents | cs.AI | 0 | Open |
| Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest | cs.AI | 0 | Open |
| Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling | cs.AI | 0 | Open |
| Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills | cs.AI | 0 | Open |
| Knowledge Distillation Must Account for What It Loses | cs.AI | 0 | Open |
| M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering | cs.AI | 0 | Open |
| Towards Unified Multi-task EEG Analysis with Low-Rank Adaptation | cs.AI | 0 | Open |
| Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment | cs.AI | 0 | Open |

🏢 Lab Blog Posts