🔴 High Significance
Model Releases
🔴 Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models - score 95
Sources: huggingface
Video-based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and text-video alignment for generative models, or rely on static 3D reconstruction metrics that fundamentall
🔴 OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis - score 75
Sources: huggingface
Training deep research agents requires long-horizon trajectories that interleave search, evidence aggregation, and multi-step reasoning. However, existing data collection pipelines typically rely on proprietary web APIs, making large-scale trajectory synthesis costly, unstable, and difficult to repr
Developer Tools
🔴 Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model - score 85
Sources: huggingface
We present daVinci-MagiHuman, an open-source audio-video generative foundation model for human-centric generation. daVinci-MagiHuman jointly generates synchronized video and audio using a single-stream Transformer that processes text, video, and audio within a unified token sequence via self-attenti
🟡 Notable
Model Releases
🟡 Helping developers build safer AI experiences for teens - score 50
Sources: lab_blog/OpenAI
OpenAI releases prompt-based teen safety policies for developers using gpt-oss-safeguard, helping moderate age-specific risks in AI systems.
🟡 Powering product discovery in ChatGPT - score 50
Sources: lab_blog/OpenAI
ChatGPT introduces richer, visually immersive shopping powered by the Agentic Commerce Protocol, enabling product discovery, side-by-side comparisons, and merchant integration.
Developer Tools
🟡 Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs - score 65
Sources: huggingface
Vision-language models (VLMs) typically process images at a native high-resolution, forcing a trade-off between accuracy and computational efficiency: high-resolution inputs capture fine details but incur significant computational costs, while low-resolution inputs advocate for efficiency, they pote
🟡 LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning - score 55
Sources: huggingface
We introduce LongCat-Flash-Prover, a flagship 560-billion-parameter open-source Mixture-of-Experts (MoE) model that advances Native Formal Reasoning in Lean4 through agentic tool-integrated reasoning (TIR). We decompose the native formal reasoning task into three independent formal capabilities, i.
🟡 VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding - score 45
Sources: huggingface
Long video understanding remains challenging for multimodal large language models (MLLMs) due to limited context windows, which necessitate identifying sparse query-relevant video segments. However, existing methods predominantly localize clues based solely on the query, overlooking the video's intr
Business & Funding
🟡 Update on the OpenAI Foundation - score 50
Sources: lab_blog/OpenAI
The OpenAI Foundation announces plans to invest at least $1 billion in curing diseases, economic opportunity, AI resilience, and community programs.
🟢 Incremental
Developer Tools
🟢 SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning - score 35
Sources: huggingface
Despite the remarkable success of large-scale pre-trained image representation models (i.e., vision encoders) across various vision tasks, they are predominantly trained on 2D image data and therefore often fail to capture 3D spatial relationships between objects and backgrounds in the real world, c
🟢 Repurposing Geometric Foundation Models for Multi-view Diffusion - score 25
Sources: huggingface
While recent advances in generative latent spaces have driven substantial progress in single-image generation, the optimal latent space for novel view synthesis (NVS) remains largely unexplored. In particular, NVS requires geometrically consistent generation across viewpoints, but existing approache
🟢 mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT - score 15
Sources: huggingface
Current language model training commonly applies multi-task Supervised Fine-Tuning (SFT) using a homogeneous compute budget across all sub-datasets. This approach is fundamentally sub-optimal: heterogeneous learning dynamics cause faster-learning tasks to overfit early while slower ones remain under
🟢 Manifold-Aware Exploration for Reinforcement Learning in Video Generation - score 5
Sources: huggingface
Group Relative Policy Optimization (GRPO) methods for video generation like FlowGRPO remain far less reliable than their counterparts for language models and images. This gap arises because video generation has a complex solution space, and the ODE-to-SDE conversion used for exploration can inject e
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models | model_release | 136 | Open |
| Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model | developer_tool | 131 | Open |
| OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis | model_release | 98 | Open |
| Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs | developer_tool | 93 | Open |
| LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning | developer_tool | 81 | Open |
| Benchmarking Multi-Agent LLM Architectures for Financial Document Processing: A Comparative Study of Orchestration Patterns, Cost-Accuracy Tradeoffs and Production Scaling Strategies | cs.AI | 0 | Open |
| Generalizing Dynamics Modeling More Easily from Representation Perspective | cs.AI | 0 | Open |
| Vision-based Deep Learning Analysis of Unordered Biomedical Tabular Datasets via Optimal Spatial Cartography | cs.AI | 0 | Open |
| MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation | cs.AI | 0 | Open |
| Detecting Corporate AI-Washing via Cross-Modal Semantic Inconsistency Learning | cs.AI | 0 | Open |
| WiFi2Cap: Semantic Action Captioning from Wi-Fi CSI via Limb-Level Semantic Alignment | cs.AI | 0 | Open |
| PopResume: Causal Fairness Evaluation of LLM/VLM Resume Screeners with Population-Representative Dataset | cs.AI | 0 | Open |
| Bitboard version of Tetris AI | cs.AI | 0 | Open |
| HyFI: Hyperbolic Feature Interpolation for Brain-Vision Alignment | cs.AI | 0 | Open |
| Beyond Binary Correctness: Scaling Evaluation of Long-Horizon Agents on Subjective Enterprise Tasks | cs.AI | 0 | Open |