🔴 High Significance

Developer Tools

🔴 Demystifying Video Reasoning · score 95 · Sources: huggingface

Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. Prior work attributes this to a Chain-of-Frames (CoF) mechanism, where reasoning is assumed to unfold sequentially across video frames. In this work, we…

🔴 InCoder-32B: Code Foundation Model for Industrial Scenarios · score 85 · Sources: huggingface

Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address…

🔴 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models · score 75 · Sources: huggingface

Omni-modal large language models (OLMs) redefine human-machine interaction by natively integrating audio, vision, and text. However, existing OLM benchmarks remain anchored to static, accuracy-centric tasks, leaving a critical gap in assessing social interactivity, the fundamental capacity to navigate…

🟡 Notable

Model Releases

🟡 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification · score 65 · Sources: huggingface

We present MiroThinker-1.7, a new research agent designed for complex long-horizon reasoning tasks. Building on this foundation, we further introduce MiroThinker-H1, which extends the agent with heavy-duty reasoning capabilities for more reliable multi-step problem solving. In particular, MiroThinker…

🟡 Qianfan-OCR: A Unified End-to-End Model for Document Intelligence · score 55 · Sources: huggingface

We present Qianfan-OCR, a 4B-parameter end-to-end vision-language model that unifies document parsing, layout analysis, and document understanding within a single architecture. It performs direct image-to-Markdown conversion and supports diverse prompt-driven tasks including table extraction, chart…

Developer Tools

🟡 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding · score 45 · Sources: huggingface

Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we observe that transition words (e.g., because, however, and wait) are closely associated with hallucinations and tend to exhibit high-entropy states. …
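The abstract links hallucinations to high-entropy states at transition words. As a rough illustration of what "high entropy" means at a single decoding step (this is not the paper's Latent Entropy-Aware Decoding, whose details the excerpt does not give), the sketch below computes the Shannon entropy of each step's next-token distribution from raw logits and flags steps above a cutoff. The helper names, the toy 4-word vocabulary, and the 1.0-nat threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one decoding step's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy, in nats, of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def flag_high_entropy(
    tokens: Sequence[str],
    step_logits: Sequence[Sequence[float]],
    threshold: float = 1.0,  # hypothetical cutoff in nats, not from the paper
) -> List[Tuple[str, float]]:
    """Return (token, entropy) pairs for steps whose entropy exceeds threshold."""
    flagged = []
    for tok, logits in zip(tokens, step_logits):
        h = token_entropy(logits)
        if h > threshold:
            flagged.append((tok, h))
    return flagged


# Toy logits over a 4-word vocabulary (illustrative values only).
tokens = ["The", "cat", "however"]
step_logits = [
    [5.0, 0.0, 0.0, 0.0],  # peaked distribution: low entropy
    [4.0, 0.0, 0.0, 0.0],  # peaked distribution: low entropy
    [1.0, 1.0, 1.0, 1.0],  # uniform distribution: maximal entropy, ln(4)
]
flagged = flag_high_entropy(tokens, step_logits)
print(flagged)
```

Only the uniform step crosses the cutoff here. Output-distribution entropy is used because it is the simplest reading of "high-entropy state"; the paper's title indicates its method works on latent representations, which this sketch does not model.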

🟢 Incremental

Developer Tools

🟢 Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation · score 35 · Sources: huggingface

Simulating robot-world interactions is a cornerstone of Embodied AI. Recently, a few works have shown promise in leveraging video generations to transcend the rigid visual/physical constraints of traditional simulators. However, they primarily operate in 2D space or are guided by static environmental…

🟢 WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation · score 20 · Sources: huggingface

Recent advances in video diffusion transformers have enabled interactive gaming world models that allow users to explore generated environments over extended horizons. However, existing approaches struggle with precise action control and long-horizon 3D consistency. Most prior works treat user action…

🟢 Online Experiential Learning for Language Models · score 20 · Sources: huggingface

The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables…

🟢 TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas · score 5 · Sources: huggingface

Text-to-SQL parsing has achieved remarkable progress under the Full Schema Assumption. However, this premise fails in real-world enterprise environments where databases contain hundreds of tables with massive noisy metadata. Rather than injecting the full schema upfront, an agent must actively identify…

📄 New Papers

Title | Category | Score
Demystifying Video Reasoning | developer_tool | 378
InCoder-32B: Code Foundation Model for Industrial Scenarios | developer_tool | 315
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models | developer_tool | 250
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification | model_release | 189
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence | model_release | 158
From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs | cs.AI | 0
KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference | cs.AI | 0
Draft-and-Prune: Improving the Reliability of Auto-formalization for Logical Reasoning | cs.AI | 0
Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models | cs.AI | 0
Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients | cs.AI | 0
Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures | cs.AI | 0
Pathology-Aware Multi-View Contrastive Learning for Patient-Independent ECG Reconstruction | cs.AI | 0
On the Fragility of AI Agent Collusion | cs.AI | 0
DANCE: Dynamic 3D CNN Pruning: Joint Frame, Channel, and Feature Adaptation for Energy Efficiency on the Edge | cs.AI | 0
CentaurTA Studio: A Self-Improving Human-Agent Collaboration System for Thematic Analysis | cs.AI | 0