🔴 High Significance

Developer Tools

🔴 SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model - score 95 | Sources: huggingface

SkyReels V4 is a unified multimodal video foundation model for joint video-audio generation, inpainting, and editing. The model adopts a dual-stream Multimodal Diffusion Transformer (MMDiT) architecture, where one branch synthesizes video and the other generates temporally aligned audio, while shar

🔴 MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models - score 85 | Sources: huggingface

Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle

Infrastructure & Compute

🔴 HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation - score 75 | Sources: huggingface

Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to limited state capacity, while softmax attention suffers from

🟡 Notable

Model Releases

🟡 DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference - score 65 | Sources: huggingface

The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cache from external storage creates a fundamental imbalance: storage NICs on prefill engines become bandwid

🟡 DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation - score 55 | Sources: huggingface

Recent advancements in foundation models have revolutionized joint audio-video generation. However, existing approaches typically treat human-centric tasks including reference-based audio-video generation (R2AV), video editing (RV2AV) and audio-driven video animation (RA2V) as isolated objectives. F

🟡 OpenAI Codex and Figma launch seamless code-to-design experience - score 50 | Sources: lab_blog/OpenAI

OpenAI and Figma launch a new Codex integration that connects code and design, enabling teams to move between implementation and the Figma canvas to iterate and ship faster.

Developer Tools

🟡 Pacific Northwest National Laboratory and OpenAI partner to accelerate federal permitting - score 50 | Sources: lab_blog/OpenAI

OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting, showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.

🟡 Solaris: Building a Multiplayer Video World Model in Minecraft - score 45 | Sources: huggingface

Existing action-conditioned video generation models (video world models) are limited to single-agent perspectives, failing to capture the multi-agent interactions of real-world environments. We introduce Solaris, a multiplayer video world model that simulates consistent multi-view observations. To e

Enterprise Adoption

🟡 Nano Banana 2: Combining Pro capabilities with lightning-fast speed - score 50 | Sources: lab_blog/DeepMind

Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.

🟢 Incremental

Model Releases

🟢 GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL - score 15 | Sources: huggingface

Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations: a shortage of high-quality, action-aligned reasoning data, and the direct adoption of generic post-training pipelines that overlook the unique challenges of GUI

Developer Tools

🟢 ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning - score 35 | Sources: huggingface

Agentic reinforcement learning (ARL) has rapidly gained attention as a promising paradigm for training agents to solve complex, multi-step interactive tasks. Despite encouraging early results, ARL remains highly unstable, often leading to training collapse. This instability limits scalability to lar

🟢 Image Generation with a Sphere Encoder - score 25 | Sources: huggingface

We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and competing with many-step diffusion models using fewer than five steps. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent spac

🟢 World Guidance: World Modeling in Condition Space for Action Generation - score 5 | Sources: huggingface

Leveraging future observation modeling to facilitate action generation presents a promising avenue for enhancing the capabilities of Vision-Language-Action (VLA) models. However, existing approaches struggle to strike a balance between maintaining efficient, predictable future representations and pr

📄 New Papers

Title | Category | Score | Link
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model | developer_tool | 63 | Open
MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models | developer_tool | 60 | Open
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation | infrastructure | 59 | Open
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference | model_release | 58 | Open
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation | model_release | 43 | Open
From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference | cs.AI | 0 | Open
A Survey on Quantitative Modeling of Trust in Online Social Networks | cs.AI | 0 | Open
Reinforcement-aware Knowledge Distillation for LLM Reasoning | cs.AI | 0 | Open
Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models | cs.AI | 0 | Open
Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models | cs.AI | 0 | Open
SignVLA: A Gloss-Free Vision-Language-Action Framework for Real-Time Sign Language-Guided Robotic Manipulation | cs.AI | 0 | Open
A Mathematical Theory of Agency and Intelligence | cs.AI | 0 | Open
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing | cs.AI | 0 | Open
Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents | cs.AI | 0 | Open
Iterative Prompt Refinement for Dyslexia-Friendly Text Summarization Using GPT-4o | cs.AI | 0 | Open

🏢 Lab Blog Posts