🔴 High Significance

Model Releases

🔴 FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization - score 95. Sources: huggingface

We present Future-KL Influenced Policy Optimization (FIPO), a reinforcement learning algorithm designed to overcome reasoning bottlenecks in large language models. While GRPO-style training scales effectively, it typically relies on outcome-based rewards (ORM) that distribute a global advantage unif

🔴 CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence - score 85. Sources: huggingface

The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain

Developer Tools

🔴 LongCat-Next: Lexicalizing Modalities as Discrete Tokens - score 75. Sources: huggingface

The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented arch

🟡 Notable

Model Releases

🟡 GEMS: Agent-Native Multimodal Generation with Memory and Skills - score 65. Sources: huggingface

Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downstream tasks. Inspired by the success of advanced agent frameworks such as Claude Code, we propose GEMS (Agent-Native Mult

🟡 Gradient Labs gives every bank customer an AI account manager - score 50. Sources: lab_blog/OpenAI

Gradient Labs uses GPT-4.1 and GPT-5.4 mini and nano to power AI agents that automate banking support workflows with low latency and high reliability.

🟡 Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development - score 45. Sources: huggingface

Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the proliferation of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, curating and assembling such datasets is highly challenging due to the re

Developer Tools

🟡 Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells - score 55. Sources: huggingface

Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the dist

🟢 Incremental

Developer Tools

🟢 All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models - score 35. Sources: huggingface

Recent studies have demonstrated that Reinforcement Learning (RL), notably Group Relative Policy Optimization (GRPO), can intrinsically elicit and enhance the reasoning capabilities of Vision-Language Models (VLMs). However, despite the promise, the underlying mechanisms that drive the effectiveness

🟢 VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward - score 25. Sources: huggingface

Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compr

🟢 CutClaw: Agentic Hours-Long Video Editing via Music Synchronization - score 15. Sources: huggingface

Editing video content in alignment with audio has become a form of digital, human-made art on social media. However, the time-consuming and repetitive nature of manual video editing has long been a challenge for filmmakers and professional content creators alike. In this paper, we introduce CutClaw, an a

🟢 Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis - score 5. Sources: huggingface

Unified multimodal models provide a natural and promising architecture for understanding diverse and complex real-world knowledge while generating high-quality images. However, they still rely primarily on frozen parametric knowledge, which makes them struggle with real-world image generation involv

📄 New Papers

| Title | Category | Score | Link |
| --- | --- | --- | --- |
| FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization | model_release | 356 | Open |
| CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence | model_release | 344 | Open |
| LongCat-Next: Lexicalizing Modalities as Discrete Tokens | developer_tool | 150 | Open |
| GEMS: Agent-Native Multimodal Generation with Memory and Skills | model_release | 89 | Open |
| Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells | developer_tool | 84 | Open |
| Go Big or Go Home: Simulating Mobbing Behavior with Braitenbergian Robots | cs.AI | 0 | Open |
| Signals: Trajectory Sampling and Triage for Agentic Interactions | cs.AI | 0 | Open |
| In harmony with gpt-oss | cs.AI | 0 | Open |
| Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs | cs.AI | 0 | Open |
| RAGShield: Detecting Numerical Claim Manipulation in Government RAG Systems | cs.AI | 0 | Open |
| EvolveTool-Bench: Evaluating the Quality of LLM-Generated Tool Libraries as Software Artifacts | cs.AI | 0 | Open |
| Deep Networks Favor Simple Data | cs.AI | 0 | Open |
| Improving Generalization of Deep Learning for Brain Metastases Segmentation Across Institutions | cs.AI | 0 | Open |
| COTTA: Context-Aware Transfer Adaptation for Trajectory Prediction in Autonomous Driving | cs.AI | 0 | Open |
| Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model | cs.AI | 0 | Open |

๐Ÿข Lab Blog Posts