🔴 High Significance

Model Releases

🔴 dLLM: Simple Diffusion Language Modeling (score 95; sources: huggingface)

Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations, making them difficult to reproduce or extend. As the field accel

🔴 CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation (score 85; sources: huggingface)

GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large language models (LLMs) remain uncompetitive with compiler-based systems such as torch.compile for CUDA kern

Developer Tools

🔴 Enhancing Spatial Understanding in Image Generation via Reward Modeling (score 75; sources: huggingface)

Recent progress in text-to-image generation has greatly advanced visual fidelity and creativity, but it has also imposed higher demands on prompt complexity, particularly in encoding intricate spatial relationships. In such cases, achieving satisfactory results often requires multiple sampling attemp

🟡 Notable

Model Releases

🟡 Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets (score 65; sources: huggingface)

The reliability of multilingual Large Language Model (LLM) evaluation is currently compromised by the inconsistent quality of translated benchmarks. Existing resources often suffer from semantic drift and context loss, which can lead to misleading performance metrics. In this work, we present a full

Developer Tools

🟡 Mode Seeking meets Mean Seeking for Fast Long Video Generation (score 55; sources: huggingface)

Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form data is scarce and limited to narrow domains. To address this, we propose a training paradigm where Mode Seeking meets Mean Seeking, decoupling local

🟡 LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding (score 45; sources: huggingface)

Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is significantly determined by the acceptance rate, yet standard training minimizes
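To make the role of the acceptance rate concrete, here is a minimal sketch of the standard speculative sampling acceptance rule on a toy vocabulary. It is not the LK losses proposed in the paper. The distributions `p` (target) and `q` (draft) and the function names are illustrative assumptions. A draft token `x` sampled from `q` is accepted with probability `min(1, p[x] / q[x])`, so the expected acceptance rate is the sum over tokens of `min(p[x], q[x])`.

```python
import random


def acceptance_rate(p, q):
    """Expected probability that a token drawn from the draft q is accepted."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def speculative_step(p, q, rng):
    """One draft-then-verify step over a toy vocabulary.

    Returns (token, accepted). The returned token is distributed
    according to the target distribution p.
    """
    vocab = range(len(p))
    # Draft model proposes a token.
    x = rng.choices(vocab, weights=q)[0]
    # Target model accepts it with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    x = rng.choices(vocab, weights=residual)[0]
    return x, False


# Hypothetical target and draft distributions over three tokens.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(acceptance_rate(p, q))  # roughly 0.7 for these toy distributions
```

The closer the draft distribution is to the target, the closer this sum is to 1 and the fewer draft tokens are discarded.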

🟢 Incremental

Developer Tools

🟢 CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era (score 35; sources: huggingface)

Scientific research relies on accurate citation for attribution and integrity, yet large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. Such hallucinated citations have already been observed in submissions and accepted

🟢 How to Take a Memorable Picture? Empowering Users with Actionable Feedback (score 25; sources: huggingface)

Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision either as a passive prediction task, with models regressing a scalar score, or with generative methods altering the visual input to boost the image's likelihood of being remembered. Yet

🟢 Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models (score 15; sources: huggingface)

Compositional generalization, the ability to recognize familiar parts in novel contexts, is a defining property of intelligent systems. Although modern models are trained on massive datasets, they still cover only a tiny fraction of the combinatorial space of possible inputs, raising the question of

🟢 InfoNCE Induces Gaussian Distribution (score 5; sources: huggingface)

Contrastive learning has become a cornerstone of modern representation learning, allowing training with massive unlabeled data for both task-specific and general (foundation) models. A prototypical loss in contrastive training is InfoNCE and its variants. In this work, we show that the InfoNCE objec
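For readers unfamiliar with the loss named above, the following is a minimal pure-Python sketch of a common form of InfoNCE: a cross-entropy over cosine similarities in which row `i` of `positives` is the positive for row `i` of `anchors` and the other rows act as negatives. The function name, the `temperature` default, and the toy inputs are assumptions for illustration and are not taken from the paper.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of (anchor, positive) pairs."""

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    b = [normalize(v) for v in positives]

    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity to every candidate.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true positive.
        total += log_denom - logits[i]
    return total / len(a)


# Toy batch: each anchor matches its own positive exactly.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Toy batch: every positive looks the same, so the loss equals log(batch size).
uninformative = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
print(aligned, uninformative)
```

The loss approaches zero when each anchor is far more similar to its own positive than to the others, and equals the log of the batch size when the similarities carry no information.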

📄 New Papers

| Title | Category | Score |
| --- | --- | --- |
| dLLM: Simple Diffusion Language Modeling | model_release | 159 |
| CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation | model_release | 101 |
| Enhancing Spatial Understanding in Image Generation via Reward Modeling | developer_tool | 63 |
| Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets | model_release | 46 |
| Mode Seeking meets Mean Seeking for Fast Long Video Generation | developer_tool | 45 |
| MetaState: Persistent Working Memory Enhances Reasoning in Discrete Diffusion Language Models | cs.AI | 0 |
| Provable and Practical In-Context Policy Optimization for Self-Improvement | cs.AI | 0 |
| URAG: A Benchmark for Uncertainty Quantification in Retrieval-Augmented Large Language Models | cs.AI | 0 |
| SubstratumGraphEnv: Reinforcement Learning Environment (RLE) for Modeling System Attack Paths | cs.AI | 0 |
| PanCanBench: A Comprehensive Benchmark for Evaluating Large Language Models in Pancreatic Oncology | cs.AI | 0 |
| UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification | cs.AI | 0 |
| Constructing Synthetic Instruction Datasets for Improving Reasoning in Domain-Specific LLMs: A Case Study in the Japanese Financial Domain | cs.AI | 0 |
| ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context | cs.AI | 0 |
| MixerCSeg: An Efficient Mixer Architecture for Crack Segmentation via Decoupled Mamba Attention | cs.AI | 0 |
| Align and Filter: Improving Performance in Asynchronous On-Policy RL | cs.AI | 0 |