🔴 High Significance
Model Releases
🔴 dLLM: Simple Diffusion Language Modeling - score 95
Sources: huggingface
Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations, making them difficult to reproduce or extend. As the field accel
🔴 CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation - score 85
Sources: huggingface
GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large language models (LLMs) remain uncompetitive with compiler-based systems such as torch.compile for CUDA kern
Developer Tools
🔴 Enhancing Spatial Understanding in Image Generation via Reward Modeling - score 75
Sources: huggingface
Recent progress in text-to-image generation has greatly advanced visual fidelity and creativity, but it has also imposed higher demands on prompt complexity, particularly in encoding intricate spatial relationships. In such cases, achieving satisfactory results often requires multiple sampling attemp
🟡 Notable
Model Releases
🟡 Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets - score 65
Sources: huggingface
The reliability of multilingual Large Language Model (LLM) evaluation is currently compromised by the inconsistent quality of translated benchmarks. Existing resources often suffer from semantic drift and context loss, which can lead to misleading performance metrics. In this work, we present a full
Developer Tools
🟡 Mode Seeking meets Mean Seeking for Fast Long Video Generation - score 55
Sources: huggingface
Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form data is scarce and limited to narrow domains. To address this, we propose a training paradigm where Mode Seeking meets Mean Seeking, decoupling local
🟡 LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding - score 45
Sources: huggingface
Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is significantly determined by the acceptance rate, yet standard training minimizes
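The acceptance rate mentioned in this abstract can be made concrete with a small sketch. It assumes the standard speculative sampling rule, in which a token drawn from the draft distribution q is accepted with probability min(1, p(x)/q(x)) under the target distribution p, so the expected per-token acceptance rate is the sum over tokens of min(p(x), q(x)), which equals one minus the total variation distance. The function names and the toy distributions are hypothetical, and this is not the LK loss itself, since the abstract is cut off before the loss is described.

```python
def acceptance_rate(p, q):
    """Expected probability that one draft token is accepted.

    p: target-model next-token probabilities
    q: draft-model next-token probabilities (same vocabulary order)
    Under the standard rule accept-with-prob min(1, p(x)/q(x)),
    the expectation over x ~ q is sum_x min(p(x), q(x)).
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same vocabulary")
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Toy three-token vocabulary (illustrative numbers, not from the paper).
target = [0.5, 0.3, 0.2]
draft = [0.4, 0.4, 0.2]

alpha = acceptance_rate(target, draft)
tv = total_variation(target, draft)
print(f"acceptance rate = {alpha:.3f}, 1 - TV = {1 - tv:.3f}")
```

The two printed quantities coincide, which is why a draft model that is closer to the target in total variation yields a higher acceptance rate.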
🟢 Incremental
Developer Tools
🟢 CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era - score 35
Sources: huggingface
Scientific research relies on accurate citation for attribution and integrity, yet large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. Such hallucinated citations have already been observed in submissions and accepted
🟢 How to Take a Memorable Picture? Empowering Users with Actionable Feedback - score 25
Sources: huggingface
Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision either as a passive prediction task, with models regressing a scalar score, or with generative methods altering the visual input to boost the image likelihood of being remembered. Yet
🟢 Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models - score 15
Sources: huggingface
Compositional generalization, the ability to recognize familiar parts in novel contexts, is a defining property of intelligent systems. Although modern models are trained on massive datasets, they still cover only a tiny fraction of the combinatorial space of possible inputs, raising the question of
🟢 InfoNCE Induces Gaussian Distribution - score 5
Sources: huggingface
Contrastive learning has become a cornerstone of modern representation learning, allowing training with massive unlabeled data for both task-specific and general (foundation) models. A prototypical loss in contrastive training is InfoNCE and its variants. In this work, we show that the InfoNCE objec
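For readers unfamiliar with the loss named in this abstract, the following is a minimal pure-Python sketch of one common form of InfoNCE: for each anchor, a cross-entropy over cosine similarities to all positives in the batch, with the matching positive as the correct class. The function names, the temperature of 0.1, and the toy embeddings are assumptions chosen for illustration; the paper may analyze a different variant.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] is the positive for anchors[i],
    and every other positives[j] acts as a negative for anchors[i]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy 2-D embeddings (illustrative only).
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
collapsed = info_nce([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
print(aligned < swapped)
```

Correctly matched pairs give a loss near zero, mismatched pairs give a large loss, and fully collapsed embeddings give exactly log(batch size), since every candidate is then equally likely.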
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| dLLM: Simple Diffusion Language Modeling | model_release | 159 | Open |
| CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation | model_release | 101 | Open |
| Enhancing Spatial Understanding in Image Generation via Reward Modeling | developer_tool | 63 | Open |
| Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets | model_release | 46 | Open |
| Mode Seeking meets Mean Seeking for Fast Long Video Generation | developer_tool | 45 | Open |
| MetaState: Persistent Working Memory Enhances Reasoning in Discrete Diffusion Language Models | cs.AI | 0 | Open |
| Provable and Practical In-Context Policy Optimization for Self-Improvement | cs.AI | 0 | Open |
| URAG: A Benchmark for Uncertainty Quantification in Retrieval-Augmented Large Language Models | cs.AI | 0 | Open |
| SubstratumGraphEnv: Reinforcement Learning Environment (RLE) for Modeling System Attack Paths | cs.AI | 0 | Open |
| PanCanBench: A Comprehensive Benchmark for Evaluating Large Language Models in Pancreatic Oncology | cs.AI | 0 | Open |
| UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification | cs.AI | 0 | Open |
| Constructing Synthetic Instruction Datasets for Improving Reasoning in Domain-Specific LLMs: A Case Study in the Japanese Financial Domain | cs.AI | 0 | Open |
| ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context | cs.AI | 0 | Open |
| MixerCSeg: An Efficient Mixer Architecture for Crack Segmentation via Decoupled Mamba Attention | cs.AI | 0 | Open |
| Align and Filter: Improving Performance in Asynchronous On-Policy RL | cs.AI | 0 | Open |