🔴 High Significance
Model Releases
🔴 Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration - score 75
Sources: huggingface
In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks
Developer Tools
🔴 Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale - score 95
Sources: huggingface
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its
🔴 PixelSmile: Toward Fine-Grained Facial Expression Editing - score 85
Sources: huggingface
Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability,
🟡 Notable
Model Releases
🟡 Voxtral TTS - score 60
Sources: huggingface
We introduce Voxtral TTS, an expressive multilingual text-to-speech model that generates natural speech from as little as 3 seconds of reference audio. Voxtral TTS adopts a hybrid architecture that combines auto-regressive generation of semantic speech tokens with flow-matching for acoustic tokens.
🟡 STADLER reshapes knowledge work at a 230-year-old company - score 50
Sources: lab_blog/OpenAI
Learn how STADLER uses ChatGPT to transform knowledge work, saving time and accelerating productivity across 650 employees.
Developer Tools
🟡 RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models - score 60
Sources: huggingface
Image restoration under real-world degradations is critical for downstream tasks such as autonomous driving and object detection. However, existing restoration models are often limited by the scale and distribution of their training data, resulting in poor generalization to real-world scenarios. Rec
🟡 MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens - score 45
Sources: huggingface
Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information remains a long-standing pursuit in the field. Due to the constraints of full-attention architectures, the effective context length of large language models (LLMs) is typically limited to 1M
🟢 Incremental
Model Releases
🟢 MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data - score 35
Sources: huggingface
Generating images conditioned on multiple visual references is critical for real-world applications such as multi-subject composition, narrative illustration, and novel view synthesis, yet current models suffer from severe performance degradation as the number of input references grows. We identify
🟢 AVControl: Efficient Framework for Training Audio-Visual Controls - score 15
Sources: huggingface
Controlling video and audio generation requires diverse modalities, from depth and pose to camera trajectories and audio transformations, yet existing approaches either train a single monolithic model for a fixed set of controls or introduce costly architectural changes for each new modality. We int
🟢 VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models - score 5
Sources: huggingface
Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized ver
Developer Tools
🟢 SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks - score 25
Sources: huggingface
Software development is iterative, yet agentic coding benchmarks overwhelmingly evaluate single-shot solutions against complete specifications. Code can pass the test suite but become progressively harder to extend. Recent iterative benchmarks attempt to close this gap, but constrain the agent's des
New Papers
| Title | Category | Score |
|---|---|---|
| Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale | developer_tool | 140 |
| PixelSmile: Toward Fine-Grained Facial Expression Editing | developer_tool | 121 |
| Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration | model_release | 73 |
| Voxtral TTS | model_release | 62 |
| RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models | developer_tool | 62 |
| Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations | cs.LG | 0 |
| Clinical Reasoning AI for Oncology Treatment Planning: A Multi-Specialty Case-Based Evaluation | cs.LG | 0 |
| QuitoBench: A High-Quality Open Time Series Forecasting Benchmark | cs.LG | 0 |
| Central-to-Local Adaptive Generative Diffusion Framework for Improving Gene Expression Prediction in Data-Limited Spatial Transcriptomics | cs.LG | 0 |
| GLU: Global-Local-Uncertainty Fusion for Scalable Spatiotemporal Reconstruction and Forecasting | cs.LG | 0 |
| Identification of Bivariate Causal Directionality Based on Anticipated Asymmetric Geometries | cs.LG | 0 |
| Constitutive parameterized deep energy method for solid mechanics problems with random material parameters | cs.LG | 0 |
| Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models | cs.LG | 0 |
| H-Node Attack and Defense in Large Language Models | cs.LG | 0 |
| Asymptotic Optimism for Tensor Regression Models with Applications to Neural Network Compression | cs.LG | 0 |