🔴 High Significance
Developer Tools
🔴 Demystifying Video Reasoning - score 95
Sources: huggingface
Recent advances in video generation have revealed an unexpected phenomenon: diffusion-based video models exhibit non-trivial reasoning capabilities. Prior work attributes this to a Chain-of-Frames (CoF) mechanism, where reasoning is assumed to unfold sequentially across video frames. In this work, w
🔴 InCoder-32B: Code Foundation Model for Industrial Scenarios - score 85
Sources: huggingface
Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To addre
🔴 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models - score 75
Sources: huggingface
Omni-modal large language models (OLMs) redefine human-machine interaction by natively integrating audio, vision, and text. However, existing OLM benchmarks remain anchored to static, accuracy-centric tasks, leaving a critical gap in assessing social interactivity, the fundamental capacity to naviga
🟡 Notable
Model Releases
🟡 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification - score 65
Sources: huggingface
We present MiroThinker-1.7, a new research agent designed for complex long-horizon reasoning tasks. Building on this foundation, we further introduce MiroThinker-H1, which extends the agent with heavy-duty reasoning capabilities for more reliable multi-step problem solving. In particular, MiroThinke
🟡 Qianfan-OCR: A Unified End-to-End Model for Document Intelligence - score 55
Sources: huggingface
We present Qianfan-OCR, a 4B-parameter end-to-end vision-language model that unifies document parsing, layout analysis, and document understanding within a single architecture. It performs direct image-to-Markdown conversion and supports diverse prompt-driven tasks including table extraction, chart
Developer Tools
🟡 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding - score 45
Sources: huggingface
Recent advancements in multimodal large reasoning models (MLRMs) have significantly improved performance in visual question answering. However, we observe that transition words (e.g., because, however, and wait) are closely associated with hallucinations and tend to exhibit high-entropy states. We a
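To make the "high-entropy states" in this abstract concrete, here is a minimal sketch of how the Shannon entropy of a next-token distribution can be measured and used to flag uncertain steps. It is not the paper's decoding method, which the truncated abstract does not describe. The function names, the toy logits, and the threshold of 1.0 nats are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def flag_high_entropy(steps, threshold):
    """Return the tokens whose predictive entropy exceeds the threshold.

    `steps` is a list of (token, logits) pairs; both the structure and
    the threshold are hypothetical and chosen only for illustration.
    """
    return [tok for tok, logits in steps if token_entropy(logits) > threshold]


# Toy data: one confident step and one maximally uncertain step
# over a four-token vocabulary.
steps = [
    ("cat", [8.0, 0.0, 0.0, 0.0]),      # peaked distribution, low entropy
    ("because", [1.0, 1.0, 1.0, 1.0]),  # uniform distribution, entropy = ln 4
]
flagged = flag_high_entropy(steps, threshold=1.0)
print(flagged)
```

In a real model the logits would come from the decoder at each generation step. What a decoder should do once a step is flagged is a design choice that this sketch leaves open.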
🟢 Incremental
Developer Tools
🟢 Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation - score 35
Sources: huggingface
Simulating robot-world interactions is a cornerstone of Embodied AI. Recently, a few works have shown promise in leveraging video generations to transcend the rigid visual/physical constraints of traditional simulators. However, they primarily operate in 2D space or are guided by static environmenta
🟢 WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation - score 20
Sources: huggingface
Recent advances in video diffusion transformers have enabled interactive gaming world models that allow users to explore generated environments over extended horizons. However, existing approaches struggle with precise action control and long-horizon 3D consistency. Most prior works treat user actio
🟢 Online Experiential Learning for Language Models - score 20
Sources: huggingface
The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables
🟢 TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas - score 5
Sources: huggingface
Text-to-SQL parsing has achieved remarkable progress under the Full Schema Assumption. However, this premise fails in real-world enterprise environments where databases contain hundreds of tables with massive noisy metadata. Rather than injecting the full schema upfront, an agent must actively ident
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Demystifying Video Reasoning | developer_tool | 378 | Open |
| InCoder-32B: Code Foundation Model for Industrial Scenarios | developer_tool | 315 | Open |
| SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models | developer_tool | 250 | Open |
| MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification | model_release | 189 | Open |
| Qianfan-OCR: A Unified End-to-End Model for Document Intelligence | model_release | 158 | Open |
| From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs | cs.AI | 0 | Open |
| KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference | cs.AI | 0 | Open |
| Draft-and-Prune: Improving the Reliability of Auto-formalization for Logical Reasoning | cs.AI | 0 | Open |
| Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models | cs.AI | 0 | Open |
| Deployment and Evaluation of an EHR-integrated, Large Language Model-Powered Tool to Triage Surgical Patients | cs.AI | 0 | Open |
| Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures | cs.AI | 0 | Open |
| Pathology-Aware Multi-View Contrastive Learning for Patient-Independent ECG Reconstruction | cs.AI | 0 | Open |
| On the Fragility of AI Agent Collusion | cs.AI | 0 | Open |
| DANCE: Dynamic 3D CNN Pruning: Joint Frame, Channel, and Feature Adaptation for Energy Efficiency on the Edge | cs.AI | 0 | Open |
| CentaurTA Studio: A Self-Improving Human-Agent Collaboration System for Thematic Analysis | cs.AI | 0 | Open |