🔴 High Significance
Model Releases
🔴 DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models - score 95
Sources: huggingface
Data-centric training has emerged as a promising direction for improving large language models (LLMs) by optimizing not only model parameters but also the selection, composition, and weighting of training data during optimization. However, existing approaches to data selection, data mixture optimiza
Developer Tools
🔴 The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook - score 85
Sources: huggingface
Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent spa
🔴 Generative World Renderer - score 75
Sources: huggingface
Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex AAA games. Using a no
🟡 Notable
Developer Tools
🟡 SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization - score 65
Sources: huggingface
Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidanc
🟡 VOID: Video Object and Interaction Deletion - score 55
Sources: huggingface
Existing video object removal methods excel at inpainting content "behind" the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions, such as collisions with other objects, current models fail to correct t
🟡 CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery - score 45
Sources: huggingface
Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We prese
🟢 Incremental
Developer Tools
🟢 Steerable Visual Representations - score 35
Sources: huggingface
Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the most salient visual cues in the image, with no way
🟢 EgoSim: Egocentric World Simulator for Embodied Interaction Generation - score 25
Sources: huggingface
We introduce EgoSim, a closed-loop egocentric world simulator that generates spatially consistent interaction videos and persistently updates the underlying 3D scene state for continuous simulation. Existing egocentric simulators either lack explicit 3D grounding, causing structural drift under view
🟢 Therefore I am. I Think - score 15
Sources: huggingface
We consider the question: when a large language reasoning model makes a choice, did it think first and then decide to, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a
🟢 LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model - score 5
Sources: huggingface
Unified models (UMs) hold promise for their ability to understand and generate content across heterogeneous modalities. Compared to merely generating visual content, the use of UMs for interleaved cross-modal reasoning is more promising and valuable, e.g., for solving understanding problems that req
New Papers
| Title | Category | Score |
|---|---|---|
| DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models | model_release | 368 |
| The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook | developer_tool | 151 |
| Generative World Renderer | developer_tool | 106 |
| SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization | developer_tool | 104 |
| VOID: Video Object and Interaction Deletion | developer_tool | 59 |
| Moondream Segmentation: From Words to Masks | cs.AI | 0 |
| Explorable Theorems: Making Written Theorems Explorable by Grounding Them in Formal Representations | cs.AI | 0 |
| LitPivot: Developing Well-Situated Research Ideas Through Dynamic Contextualization and Critique within the Literature Landscape | cs.AI | 0 |
| AICCE: AI Driven Compliance Checker Engine | cs.AI | 0 |
| Do Audio-Visual Large Language Models Really See and Hear? | cs.AI | 0 |
| AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models | cs.AI | 0 |
| OntoKG: Ontology-Oriented Knowledge Graph Construction with Intrinsic-Relational Routing | cs.AI | 0 |
| Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents | cs.AI | 0 |
| Smart Transfer: Leveraging Vision Foundation Model for Rapid Building Damage Mapping with Post-Earthquake VHR Imagery | cs.AI | 0 |
| Toys that listen, talk, and play: Understanding Children's Sensemaking and Interactions with AI Toys | cs.AI | 0 |