🔴 High Significance

Developer Tools

🔴 SLA2: Sparse-Linear Attention with Learnable Routing and QAT · score 95 · Sources: huggingface

Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or linear branch based on attention-weight magnitude, which can…
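
The magnitude-based split described above can be sketched in NumPy for intuition only. The sketch makes several assumptions. Routing is two-way. Key blocks are ranked by mean-pooled block attention weights, and the top `sparse_ratio` fraction per query block goes to exact sparse softmax attention. All remaining blocks go to an elu+1 linear-attention branch. The two branch outputs are simply summed, and the code uses a dense masked form for readability rather than linear cost. `heuristic_block_split`, `sparse_ratio`, and the pooling scheme are hypothetical choices, not the actual SLA or SLA2 kernels.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def heuristic_block_split(q, k, block=4, sparse_ratio=0.25):
    """Boolean [query_blocks, key_blocks] mask: True -> sparse branch, False -> linear branch."""
    n, d = q.shape
    assert n % block == 0, "sequence length must be a multiple of the block size"
    qb = q.reshape(n // block, block, d).mean(axis=1)  # mean-pooled query blocks
    kb = k.reshape(n // block, block, d).mean(axis=1)  # mean-pooled key blocks
    p = softmax(qb @ kb.T / np.sqrt(d))                # estimated block attention weights
    keep = max(1, int(round(sparse_ratio * p.shape[1])))
    top = np.argsort(-p, axis=1)[:, :keep]             # largest-weight key blocks per row
    mask = np.zeros(p.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask

def sparse_linear_attention(q, k, v, block=4, sparse_ratio=0.25):
    n, d = q.shape
    mask = heuristic_block_split(q, k, block, sparse_ratio)
    token_mask = np.kron(mask, np.ones((block, block))).astype(bool)  # block mask -> token mask
    # Sparse branch: exact softmax attention restricted to the selected blocks.
    scores = np.where(token_mask, q @ k.T / np.sqrt(d), -np.inf)
    out_sparse = softmax(scores) @ v
    # Linear branch: kernelized attention with phi(x) = elu(x) + 1 over the remaining blocks.
    # (Written densely for clarity; a real kernel would exploit associativity for linear cost.)
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))
    a = (phi(q) @ phi(k).T) * ~token_mask
    out_linear = (a @ v) / (a.sum(axis=1, keepdims=True) + 1e-6)
    return out_sparse + out_linear
```

With `sparse_ratio=1.0`, every block is routed to the sparse branch and the output reduces to ordinary softmax attention, which makes a convenient sanity check.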

🔴 AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines · score 85 · Sources: huggingface

The performance of autonomous Web GUI agents heavily relies on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive and difficult to verify. The underlying state transitions are hidden…
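
The following toy Python sketch shows why an explicit finite state machine makes trajectories checkable. The site, its states, and its action names (`TRANSITIONS`, `replay`, `verify`, `synthesize`) are hypothetical and do not reflect AutoWebWorld's actual format. Because every transition is known, a recorded action sequence can be replayed and verified against a goal state, and valid trajectories can be enumerated by breadth-first search.

```python
from collections import deque

# Hypothetical toy website: state -> {action: next_state}
TRANSITIONS = {
    "home": {"click_login": "login_form", "click_cart": "cart_empty"},
    "login_form": {"submit_valid": "dashboard", "submit_invalid": "login_error"},
    "login_error": {"retry": "login_form"},
    "dashboard": {"add_item": "cart_full", "logout": "home"},
    "cart_full": {"checkout": "order_confirmed"},
    "cart_empty": {"back": "home"},
    "order_confirmed": {},
}

def replay(start, actions, transitions=TRANSITIONS):
    """Replay actions; return the final state, or None if an action is invalid where taken."""
    state = start
    for action in actions:
        nxt = transitions[state].get(action)
        if nxt is None:
            return None
        state = nxt
    return state

def verify(start, actions, goal):
    """A trajectory is correct iff replaying it ends exactly in the goal state."""
    return replay(start, actions) == goal

def synthesize(start, goal, transitions=TRANSITIONS, max_len=10):
    """Enumerate loop-free action sequences from start to goal with BFS."""
    found = []
    queue = deque([(start, [], {start})])
    while queue:
        state, path, seen = queue.popleft()
        if state == goal:
            found.append(path)
            continue
        if len(path) >= max_len:
            continue
        for action, nxt in transitions[state].items():
            if nxt not in seen:  # skip revisits to keep paths loop-free
                queue.append((nxt, path + [action], seen | {nxt}))
    return found
```

Dropping the `seen` check would let the same machine emit many more distinct, still-verifiable trajectories that include loops, bounded here only by `max_len`.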

🔴 RynnBrain: Open Embodied Foundation Models · score 75 · Sources: huggingface

Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatial-temporal dynamics. We introduce RynnBrain, an open-source spatiotemporal foundation model…

🟡 Notable

Model Releases

🟡 Gemini 3.1 Pro: A smarter model for your most complex tasks · score 50 · Sources: lab_blog/DeepMind

3.1 Pro is designed for tasks where a simple answer isn't enough.

Developer Tools

🟡 CADEvolve: Creating Realistic CAD via Program Evolution · score 65 · Sources: huggingface

Computer-Aided Design (CAD) delivers rapid, editable modeling for engineering and manufacturing. Recent AI progress now makes full automation feasible for various CAD tasks. However, progress is bottlenecked by data: public corpora mostly contain sketch-extrude sequences, lack complex operations…

🟡 Multi-agent cooperation through in-context co-player inference · score 45 · Sources: huggingface

Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their co-players. However, existing…

Infrastructure & Compute

🟡 Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation · score 55 · Sources: huggingface

Visual loco-manipulation of arbitrary objects in the wild with humanoid robots requires accurate end-effector (EE) control and a generalizable understanding of the scene via visual inputs (e.g., RGB-D images). Existing approaches are based on real-world imitation learning and exhibit limited generalization…

Business & Funding

🟡 Advancing independent research on AI alignment · score 50 · Sources: lab_blog/OpenAI

OpenAI commits $7.5M to The Alignment Project to fund independent AI alignment research, strengthening global efforts to address AGI safety and security risks.

🟢 Incremental

Model Releases

🟢 MAEB: Massive Audio Embedding Benchmark · score 35 · Sources: huggingface

We introduce the Massive Audio Embedding Benchmark (MAEB), a large-scale benchmark covering 30 tasks across speech, music, environmental sounds, and cross-modal audio-text reasoning in 100+ languages. We evaluate 50+ models and find that no single model dominates across all tasks: contrastive audio-…

🟢 Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality · score 25 · Sources: huggingface

Standard factuality evaluations of LLMs treat all errors alike, obscuring whether failures arise from missing knowledge (empty shelves) or from limited access to encoded facts (lost keys). We propose a behavioral framework that profiles factual knowledge at the level of facts rather than questions…

Developer Tools

🟢 World Action Models are Zero-shot Policies · score 15 · Sources: huggingface

State-of-the-art Vision-Language-Action (VLA) models excel at semantic generalization but struggle to generalize to unseen physical motions in novel environments. We introduce DreamZero, a World Action Model (WAM) built upon a pretrained video diffusion backbone. Unlike VLAs, WAMs learn physical dynamics…

🟢 Towards a Science of AI Agent Reliability · score 5 · Sources: huggingface

AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a single…
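
A toy calculation shows how a single accuracy number can hide differences in reliability. The per-run outcomes are invented for illustration. The "solved every time" and "solved at least once" measures, in the spirit of pass^k and pass@k, are generic choices and not necessarily the metrics this paper proposes.

```python
# Hypothetical outcomes: task -> success (1) or failure (0) on each of 4 repeated runs.
agent_a = {"t1": [1, 1, 1, 1], "t2": [1, 1, 1, 1], "t3": [0, 0, 0, 0], "t4": [0, 0, 0, 0]}
agent_b = {"t1": [1, 0, 1, 0], "t2": [0, 1, 0, 1], "t3": [1, 0, 0, 1], "t4": [0, 1, 1, 0]}

def mean_accuracy(runs):
    """Average per-run success rate across tasks: the usual single headline number."""
    return sum(sum(r) / len(r) for r in runs.values()) / len(runs)

def solved_every_time(runs):
    """Fraction of tasks solved on all runs (pass^k-style consistency)."""
    return sum(all(r) for r in runs.values()) / len(runs)

def solved_at_least_once(runs):
    """Fraction of tasks solved on at least one run (pass@k-style capability)."""
    return sum(any(r) for r in runs.values()) / len(runs)

for name, runs in [("A", agent_a), ("B", agent_b)]:
    print(name, mean_accuracy(runs), solved_every_time(runs), solved_at_least_once(runs))
# A 0.5 0.5 0.5
# B 0.5 0.0 1.0
```

Both agents score 0.5 on mean accuracy. Agent A reliably solves half the tasks. Agent B succeeds on every task at least once but solves none of them consistently.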

📄 New Papers

Title | Category | Score | Link
SLA2: Sparse-Linear Attention with Learnable Routing and QAT | developer_tool | 63 | Open
AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines | developer_tool | 53 | Open
RynnBrain: Open Embodied Foundation Models | developer_tool | 49 | Open
CADEvolve: Creating Realistic CAD via Program Evolution | developer_tool | 32 | Open
Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation | infrastructure | 29 | Open
A Unified Framework for Locality in Scalable MARL | cs.AI | 0 | Open
Early-Warning Signals of Grokking via Loss-Landscape Geometry | cs.AI | 0 | Open
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers | cs.AI | 0 | Open
HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing | cs.AI | 0 | Open
Fundamental Limits of Black-Box Safety Evaluation: Information-Theoretic and Computational Barriers from Latent Context Conditioning | cs.AI | 0 | Open
Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation | cs.AI | 0 | Open
Exploring LLMs for User Story Extraction from Mockups | cs.AI | 0 | Open
Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases | cs.AI | 0 | Open
Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History | cs.AI | 0 | Open
OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration | cs.AI | 0 | Open

🏢 Lab Blog Posts