🔴 High Significance
Model Releases
🔴 EXAONE 4.5 Technical Report - score 75
Sources: huggingface
This technical report introduces EXAONE 4.5, the first open-weight vision language model released by LG AI Research. EXAONE 4.5 is architected by integrating a dedicated visual encoder into the existing EXAONE 4.0 framework, enabling native multimodal pretraining over both visual and textual modalit
Developer Tools
🔴 FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios - score 85
Sources: huggingface
The manufacturing sector is increasingly adopting Multimodal Large Language Models (MLLMs) to transition from simple perception to autonomous execution, yet current evaluations fail to reflect the rigorous demands of real-world manufacturing environments. Progress is hindered by data scarcity and a
Infrastructure & Compute
🔴 WildDet3D: Scaling Promptable 3D Detection in the Wild - score 95
Sources: huggingface
Understanding objects in 3D from a single image is a cornerstone of spatial intelligence. A key step toward this goal is monocular 3D object detection: recovering the extent, location, and orientation of objects from an input RGB image. To be practical in the open world, such a detector must general
🟡 Notable
Model Releases
🟡 Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI - score 50
Sources: lab_blog/OpenAI
Cloudflare brings OpenAI's GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security.
🟡 Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning - score 50
Sources: lab_blog/DeepMind
Gemini Robotics-ER 1.6: Enhancing spatial reasoning and multi-view understanding for autonomous robotics.
Developer Tools
🟡 Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory - score 65
Sources: huggingface
With the advancement of interactive video generation, diffusion models have increasingly demonstrated their potential as world models. However, existing approaches still struggle to simultaneously achieve memory-enabled long-term temporal consistency and high-resolution real-time generation, limitin
🟡 RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details - score 55
Sources: huggingface
We introduce region-specific image refinement as a dedicated problem setting: given an input image and a user-specified region (e.g., a scribble mask or a bounding box), the goal is to restore fine-grained details while keeping all non-edited pixels strictly unchanged. Despite rapid progress in imag
🟡 Multi-User Large Language Model Agents - score 45
Sources: huggingface
Large language models (LLMs) and LLM-based agents are increasingly deployed as assistants in planning and decision making, yet most existing systems are implicitly optimized for a single-principal interaction paradigm, in which the model is designed to satisfy the objectives of one dominant user who
🟢 Incremental
Developer Tools
🟢 ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion - score 35
Sources: huggingface
Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision-language models (VLMs) suffer from high inference latency due to sequential token decoding. Diffusion-based models offer a promising alternative th
🟢 ELT: Elastic Looped Transformers for Visual Generation - score 25
Sources: huggingface
We introduce Elastic Looped Transformers (ELT), a highly parameter-efficient class of visual generative models based on a recurrent transformer architecture. While conventional generative models rely on deep stacks of unique transformer layers, our approach employs iterative, weight-shared transform
🟢 AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents - score 15
Sources: huggingface
As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critical bottleneck. Existing context management methods typically commit to a single fixed strategy throughout the entire trajectory. Such static designs
🟢 Envisioning the Future, One Step at a Time - score 5
Sources: huggingface
Accurately anticipating how complex, diverse scenes will evolve requires models that represent uncertainty, simulate along extended interaction chains, and efficiently explore many plausible futures. Yet most existing approaches rely on dense video or latent-space prediction, expending substantial c
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| WildDet3D: Scaling Promptable 3D Detection in the Wild | infrastructure | 249 | Open |
| FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios | developer_tool | 100 | Open |
| EXAONE 4.5 Technical Report | model_release | 72 | Open |
| Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory | developer_tool | 51 | Open |
| RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details | developer_tool | 46 | Open |
| Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering | cs.AI | 0 | Open |
| A Quantitative Definition of Intelligence | cs.AI | 0 | Open |
| AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis | cs.AI | 0 | Open |
| Compliant But Unsatisfactory: The Gap Between Auditing Standards and Practices for Probabilistic Genotyping Software | cs.AI | 0 | Open |
| DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation | cs.AI | 0 | Open |
| Ambiguity Detection and Elimination in Automated Executable Process Modeling | cs.AI | 0 | Open |
| Product Review Based on Optimized Facial Expression Detection | cs.AI | 0 | Open |
| Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models | cs.AI | 0 | Open |
| ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval | cs.AI | 0 | Open |
| CASK: Core-Aware Selective KV Compression for Reasoning Traces | cs.AI | 0 | Open |