🔴 High Significance
Developer Tools
🔴 SLA2: Sparse-Linear Attention with Learnable Routing and QAT - score 95
Sources: huggingface
Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or linear branch based on attention-weight magnitude, which can
🔴 AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines - score 85
Sources: huggingface
The performance of autonomous Web GUI agents heavily relies on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive and difficult to verify. The underlying state transitions are hidden,
🔴 RynnBrain: Open Embodied Foundation Models - score 75
Sources: huggingface
Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatial-temporal dynamics. We introduce RynnBrain, an open-source spatiotemporal f
🟡 Notable
Model Releases
🟡 Gemini 3.1 Pro: A smarter model for your most complex tasks - score 50
Sources: lab_blog/DeepMind
3.1 Pro is designed for tasks where a simple answer isn't enough.
Developer Tools
🟡 CADEvolve: Creating Realistic CAD via Program Evolution - score 65
Sources: huggingface
Computer-Aided Design (CAD) delivers rapid, editable modeling for engineering and manufacturing. Recent AI progress now makes full automation feasible for various CAD tasks. However, progress is bottlenecked by data: public corpora mostly contain sketch-extrude sequences, lack complex operations, mu
🟡 Multi-agent cooperation through in-context co-player inference - score 45
Sources: huggingface
Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their co-players. However, existing
Infrastructure & Compute
🟡 Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation - score 55
Sources: huggingface
Visual loco-manipulation of arbitrary objects in the wild with humanoid robots requires accurate end-effector (EE) control and a generalizable understanding of the scene via visual inputs (e.g., RGB-D images). Existing approaches are based on real-world imitation learning and exhibit limited general
Business & Funding
🟡 Advancing independent research on AI alignment - score 50
Sources: lab_blog/OpenAI
OpenAI commits $7.5M to The Alignment Project to fund independent AI alignment research, strengthening global efforts to address AGI safety and security risks.
🟢 Incremental
Model Releases
🟢 MAEB: Massive Audio Embedding Benchmark - score 35
Sources: huggingface
We introduce the Massive Audio Embedding Benchmark (MAEB), a large-scale benchmark covering 30 tasks across speech, music, environmental sounds, and cross-modal audio-text reasoning in 100+ languages. We evaluate 50+ models and find that no single model dominates across all tasks: contrastive audio-
🟢 Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality - score 25
Sources: huggingface
Standard factuality evaluations of LLMs treat all errors alike, obscuring whether failures arise from missing knowledge (empty shelves) or from limited access to encoded facts (lost keys). We propose a behavioral framework that profiles factual knowledge at the level of facts rather than questions,
Developer Tools
🟢 World Action Models are Zero-shot Policies - score 15
Sources: huggingface
State-of-the-art Vision-Language-Action (VLA) models excel at semantic generalization but struggle to generalize to unseen physical motions in novel environments. We introduce DreamZero, a World Action Model (WAM) built upon a pretrained video diffusion backbone. Unlike VLAs, WAMs learn physical dyn
🟢 Towards a Science of AI Agent Reliability - score 5
Sources: huggingface
AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a s
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| SLA2: Sparse-Linear Attention with Learnable Routing and QAT | developer_tool | 63 | Open |
| AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines | developer_tool | 53 | Open |
| RynnBrain: Open Embodied Foundation Models | developer_tool | 49 | Open |
| CADEvolve: Creating Realistic CAD via Program Evolution | developer_tool | 32 | Open |
| Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation | infrastructure | 29 | Open |
| A Unified Framework for Locality in Scalable MARL | cs.AI | 0 | Open |
| Early-Warning Signals of Grokking via Loss-Landscape Geometry | cs.AI | 0 | Open |
| DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers | cs.AI | 0 | Open |
| HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing | cs.AI | 0 | Open |
| Fundamental Limits of Black-Box Safety Evaluation: Information-Theoretic and Computational Barriers from Latent Context Conditioning | cs.AI | 0 | Open |
| Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation | cs.AI | 0 | Open |
| Exploring LLMs for User Story Extraction from Mockups | cs.AI | 0 | Open |
| Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases | cs.AI | 0 | Open |
| Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History | cs.AI | 0 | Open |
| OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration | cs.AI | 0 | Open |