🔴 High Significance

Model Releases

🔴 EXAONE 4.5 Technical Report | score 75 | Sources: huggingface

This technical report introduces EXAONE 4.5, the first open-weight vision language model released by LG AI Research. EXAONE 4.5 is architected by integrating a dedicated visual encoder into the existing EXAONE 4.0 framework, enabling native multimodal pretraining over both visual and textual modalit

Developer Tools

🔴 FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios | score 85 | Sources: huggingface

The manufacturing sector is increasingly adopting Multimodal Large Language Models (MLLMs) to transition from simple perception to autonomous execution, yet current evaluations fail to reflect the rigorous demands of real-world manufacturing environments. Progress is hindered by data scarcity and a

Infrastructure & Compute

🔴 WildDet3D: Scaling Promptable 3D Detection in the Wild | score 95 | Sources: huggingface

Understanding objects in 3D from a single image is a cornerstone of spatial intelligence. A key step toward this goal is monocular 3D object detection: recovering the extent, location, and orientation of objects from an input RGB image. To be practical in the open world, such a detector must general

🟡 Notable

Model Releases

🟡 Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI | score 50 | Sources: lab_blog/OpenAI

Cloudflare brings OpenAI's GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security.

🟡 Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning | score 50 | Sources: lab_blog/DeepMind

Gemini Robotics ER 1.6: Enhancing spatial reasoning and multi-view understanding for autonomous robotics.

Developer Tools

🟡 Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory | score 65 | Sources: huggingface

With the advancement of interactive video generation, diffusion models have increasingly demonstrated their potential as world models. However, existing approaches still struggle to simultaneously achieve memory-enabled long-term temporal consistency and high-resolution real-time generation, limitin

🟡 RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details | score 55 | Sources: huggingface

We introduce region-specific image refinement as a dedicated problem setting: given an input image and a user-specified region (e.g., a scribble mask or a bounding box), the goal is to restore fine-grained details while keeping all non-edited pixels strictly unchanged. Despite rapid progress in imag

🟡 Multi-User Large Language Model Agents | score 45 | Sources: huggingface

Large language models (LLMs) and LLM-based agents are increasingly deployed as assistants in planning and decision making, yet most existing systems are implicitly optimized for a single-principal interaction paradigm, in which the model is designed to satisfy the objectives of one dominant user who

🟢 Incremental

Developer Tools

🟢 ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion | score 35 | Sources: huggingface

Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision-language models (VLMs) suffer from high inference latency due to sequential token decoding. Diffusion-based models offer a promising alternative th

🟢 ELT: Elastic Looped Transformers for Visual Generation | score 25 | Sources: huggingface

We introduce Elastic Looped Transformers (ELT), a highly parameter-efficient class of visual generative models based on a recurrent transformer architecture. While conventional generative models rely on deep stacks of unique transformer layers, our approach employs iterative, weight-shared transform

🟢 AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents | score 15 | Sources: huggingface

As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critical bottleneck. Existing context management methods typically commit to a single fixed strategy throughout the entire trajectory. Such static designs

🟢 Envisioning the Future, One Step at a Time | score 5 | Sources: huggingface

Accurately anticipating how complex, diverse scenes will evolve requires models that represent uncertainty, simulate along extended interaction chains, and efficiently explore many plausible futures. Yet most existing approaches rely on dense video or latent-space prediction, expending substantial c

📄 New Papers

Title | Category | Score | Link
WildDet3D: Scaling Promptable 3D Detection in the Wild | infrastructure | 249 | Open
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios | developer_tool | 100 | Open
EXAONE 4.5 Technical Report | model_release | 72 | Open
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory | developer_tool | 51 | Open
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details | developer_tool | 46 | Open
Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering | cs.AI | 0 | Open
A Quantitative Definition of Intelligence | cs.AI | 0 | Open
AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis | cs.AI | 0 | Open
Compliant But Unsatisfactory: The Gap Between Auditing Standards and Practices for Probabilistic Genotyping Software | cs.AI | 0 | Open
DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation | cs.AI | 0 | Open
Ambiguity Detection and Elimination in Automated Executable Process Modeling | cs.AI | 0 | Open
Product Review Based on Optimized Facial Expression Detection | cs.AI | 0 | Open
Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models | cs.AI | 0 | Open
ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval | cs.AI | 0 | Open
CASK: Core-Aware Selective KV Compression for Reasoning Traces | cs.AI | 0 | Open

🏢 Lab Blog Posts