🔴 High Significance

Model Releases

🔴 Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models - score 95 Sources: huggingface

Video-based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentall

🔴 OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis - score 75 Sources: huggingface

Training deep research agents requires long-horizon trajectories that interleave search, evidence aggregation, and multi-step reasoning. However, existing data collection pipelines typically rely on proprietary web APIs, making large-scale trajectory synthesis costly, unstable, and difficult to repr

Developer Tools

🔴 Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model - score 85 Sources: huggingface

We present daVinci-MagiHuman, an open-source audio-video generative foundation model for human-centric generation. daVinci-MagiHuman jointly generates synchronized video and audio using a single-stream Transformer that processes text, video, and audio within a unified token sequence via self-attenti

🟡 Notable

Model Releases

🟡 Helping developers build safer AI experiences for teens - score 50 Sources: lab_blog/OpenAI

OpenAI releases prompt-based teen safety policies for developers using gpt-oss-safeguard, helping moderate age-specific risks in AI systems.

🟡 Powering product discovery in ChatGPT - score 50 Sources: lab_blog/OpenAI

ChatGPT introduces richer, visually immersive shopping powered by the Agentic Commerce Protocol, enabling product discovery, side-by-side comparisons, and merchant integration.

Developer Tools

🟡 Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs - score 65 Sources: huggingface

Vision-language models (VLMs) typically process images at a native high-resolution, forcing a trade-off between accuracy and computational efficiency: high-resolution inputs capture fine details but incur significant computational costs, while low-resolution inputs advocate for efficiency, they pote

🟡 LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning - score 55 Sources: huggingface

We introduce LongCat-Flash-Prover, a flagship 560-billion-parameter open-source Mixture-of-Experts (MoE) model that advances Native Formal Reasoning in Lean4 through agentic tool-integrated reasoning (TIR). We decompose the native formal reasoning task into three independent formal capabilities, i.

🟡 VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding - score 45 Sources: huggingface

Long video understanding remains challenging for multimodal large language models (MLLMs) due to limited context windows, which necessitate identifying sparse query-relevant video segments. However, existing methods predominantly localize clues based solely on the query, overlooking the video's intr

Business & Funding

🟡 Update on the OpenAI Foundation - score 50 Sources: lab_blog/OpenAI

The OpenAI Foundation announces plans to invest at least $1 billion in curing diseases, economic opportunity, AI resilience, and community programs.

🟢 Incremental

Developer Tools

🟢 SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning - score 35 Sources: huggingface

Despite the remarkable success of large-scale pre-trained image representation models (i.e., vision encoders) across various vision tasks, they are predominantly trained on 2D image data and therefore often fail to capture 3D spatial relationships between objects and backgrounds in the real world, c

🟢 Repurposing Geometric Foundation Models for Multi-view Diffusion - score 25 Sources: huggingface

While recent advances in generative latent spaces have driven substantial progress in single-image generation, the optimal latent space for novel view synthesis (NVS) remains largely unexplored. In particular, NVS requires geometrically consistent generation across viewpoints, but existing approache

🟢 mSFT: Addressing Dataset Mixtures Overfiting Heterogeneously in Multi-task SFT - score 15 Sources: huggingface

Current language model training commonly applies multi-task Supervised Fine-Tuning (SFT) using a homogeneous compute budget across all sub-datasets. This approach is fundamentally sub-optimal: heterogeneous learning dynamics cause faster-learning tasks to overfit early while slower ones remain under

🟢 Manifold-Aware Exploration for Reinforcement Learning in Video Generation - score 5 Sources: huggingface

Group Relative Policy Optimization (GRPO) methods for video generation like FlowGRPO remain far less reliable than their counterparts for language models and images. This gap arises because video generation has a complex solution space, and the ODE-to-SDE conversion used for exploration can inject e

📄 New Papers

Title | Category | Score | Link
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models | model_release | 136 | Open
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model | developer_tool | 131 | Open
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis | model_release | 98 | Open
Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs | developer_tool | 93 | Open
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning | developer_tool | 81 | Open
Benchmarking Multi-Agent LLM Architectures for Financial Document Processing: A Comparative Study of Orchestration Patterns, Cost-Accuracy Tradeoffs and Production Scaling Strategies | cs.AI | 0 | Open
Generalizing Dynamics Modeling More Easily from Representation Perspective | cs.AI | 0 | Open
Vision-based Deep Learning Analysis of Unordered Biomedical Tabular Datasets via Optimal Spatial Cartography | cs.AI | 0 | Open
MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation | cs.AI | 0 | Open
Detecting Corporate AI-Washing via Cross-Modal Semantic Inconsistency Learning | cs.AI | 0 | Open
WiFi2Cap: Semantic Action Captioning from Wi-Fi CSI via Limb-Level Semantic Alignment | cs.AI | 0 | Open
PopResume: Causal Fairness Evaluation of LLM/VLM Resume Screeners with Population-Representative Dataset | cs.AI | 0 | Open
Bitboard version of Tetris AI | cs.AI | 0 | Open
HyFI: Hyperbolic Feature Interpolation for Brain-Vision Alignment | cs.AI | 0 | Open
Beyond Binary Correctness: Scaling Evaluation of Long-Horizon Agents on Subjective Enterprise Tasks | cs.AI | 0 | Open

๐Ÿข Lab Blog Posts