🔴 High Significance
Model Releases
🔴 Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text - score 85
Sources: huggingface
Reinforcement Learning with Verifiable Rewards (RLVR) has become a cornerstone for unlocking complex reasoning in Large Language Models (LLMs). Yet, scaling up RL is bottlenecked by limited existing verifiable data, where improvements increasingly saturate over prolonged training. To overcome this,
🔴 ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas - score 70
Sources: huggingface
Large language models (LLMs) are increasingly used as tool-augmented agents for multi-step decision making, yet training robust tool-using agents remains challenging. Existing methods still require manual intervention, depend on non-verifiable simulated environments, rely exclusively on either super
Developer Tools
🔴 PaperBanana: Automating Academic Illustration for AI Scientists - score 95
Sources: huggingface
Despite rapid advances in autonomous AI scientists powered by language models, generating publication-ready illustrations remains a labor-intensive bottleneck in the research workflow. To lift this burden, we introduce PaperBanana, an agentic framework for automated generation of publication-ready a
🔴 Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation - score 70
Sources: huggingface
The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow, for the first time, end-to-end fully-quantized pre-training of massive models such as LLMs. Yet, existing quantized training methods still sacrifice some of the representation capacity of this format
🟡 Notable
Model Releases
🟡 THINKSAFE: Self-Generated Safety Alignment for Reasoning Models - score 55
Sources: huggingface
Large reasoning models (LRMs) achieve remarkable performance by leveraging reinforcement learning (RL) on reasoning tasks to generate long chain-of-thought (CoT) reasoning. However, this over-optimization often prioritizes compliance, making models vulnerable to harmful prompts. To mitigate this saf
🟡 Introducing the Codex app - score 50
Sources: lab_blog/OpenAI
Introducing the Codex app for macOS, a command center for AI coding and software development with multiple agents, parallel workflows, and long-running tasks.
Developer Tools
🟡 Snowflake and OpenAI partner to bring frontier intelligence to enterprise data - score 50
Sources: lab_blog/OpenAI
OpenAI and Snowflake partner in a $200M agreement to bring frontier intelligence into enterprise data, enabling AI agents and insights directly in Snowflake.
🟡 ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought - score 40
Sources: huggingface
While Chain-of-Thought (CoT) significantly enhances the performance of Large Language Models (LLMs), explicit reasoning chains introduce substantial computational redundancy. Recent latent reasoning methods attempt to mitigate this by compressing reasoning processes into latent space, but often suff
🟡 TTCS: Test-Time Curriculum Synthesis for Self-Evolving - score 40
Sources: huggingface
Test-Time Training offers a promising way to improve the reasoning ability of large language models (LLMs) by adapting the model using only the test questions. However, existing methods struggle with difficult reasoning problems for two reasons: raw test questions are often too difficult to yield hi
🟢 Incremental
Developer Tools
🟢 Causal World Modeling for Robot Control - score 25
Sources: huggingface
This work highlights that video world modeling, alongside vision-language pre-training, establishes a fresh and independent foundation for robot learning. Intuitively, video world models provide the ability to imagine the near future by understanding the causality between actions and visual dynamics
🟢 Do Reasoning Models Enhance Embedding Models? - score 15
Sources: huggingface
State-of-the-art embedding models are increasingly derived from decoder-only Large Language Model (LLM) backbones adapted via contrastive learning. Given the emergence of reasoning models trained via Reinforcement Learning with Verifiable Rewards (RLVR), a natural question arises: do enhanced reason
🟢 MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning - score 5
Sources: huggingface
Long-horizon agentic reasoning necessitates effectively compressing growing interaction histories into a limited context window. Most existing memory systems serialize history as text, where token-level cost is uniform and scales linearly with length, often spending scarce budget on low-value detail
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| PaperBanana: Automating Academic Illustration for AI Scientists | developer_tool | 240 | Open |
| Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text | model_release | 117 | Open |
| Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation | developer_tool | 65 | Open |
| ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas | model_release | 65 | Open |
| THINKSAFE: Self-Generated Safety Alignment for Reasoning Models | model_release | 42 | Open |
| OpInf-LLM: Parametric PDE Solving with LLMs via Operator Inference | cs.AI | 0 | Open |
| Draw2Learn: A Human-AI Collaborative Tool for Drawing-Based Science Learning | cs.AI | 0 | Open |
| Governance at the Edge of Architecture: Regulating NeuroAI and Neuromorphic Systems | cs.AI | 0 | Open |
| Harnessing Flexible Spatial and Temporal Data Center Workloads for Grid Regulation Services | cs.AI | 0 | Open |
| MarkCleaner: High-Fidelity Watermark Removal via Imperceptible Micro-Geometric Perturbation | cs.AI | 0 | Open |
| White-Box Neural Ensemble for Vehicular Plasticity: Quantifying the Efficiency Cost of Symbolic Auditability in Adaptive NMPC | cs.AI | 0 | Open |
| Qrita: High-performance Top-k and Top-p Algorithm for GPUs using Pivot-based Truncation and Selection | cs.AI | 0 | Open |
| You Need an Encoder for Native Position-Independent Caching | cs.AI | 0 | Open |
| A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning | cs.AI | 0 | Open |
| Toward a Machine Bertin: Why Visualization Needs Design Principles for Machine Cognition | cs.AI | 0 | Open |