🔴 High Significance
Developer Tools
🔴 Recursive Multi-Agent Systems – score 95
Sources: huggingface
Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems and ask: can agent collaboration itself be scaled…
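The abstract is cut off before it describes the method, so the sketch below only illustrates the general looped-computation idea it starts from, under loud assumptions: each "agent" is a hypothetical toy map `h -> tanh(w*h + b)` on a small latent vector, and "recursion" means passing one shared latent state through the same fixed agents for several rounds. The names `make_agent` and `recursive_collaboration` are illustrative and do not come from the paper.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Agent = Callable[[Vector], Vector]


def make_agent(weight: float, bias: Sequence[float]) -> Agent:
    """Hypothetical toy agent: h -> tanh(weight * h + bias), elementwise.

    With |weight| < 1 this map is a contraction, so repeated use settles down.
    """
    def step(h: Vector) -> Vector:
        return [math.tanh(weight * hi + bi) for hi, bi in zip(h, bias)]
    return step


def recursive_collaboration(agents: Sequence[Agent], h0: Vector, rounds: int) -> Vector:
    """Pass one shared latent state through the same agents `rounds` times.

    Computation depth grows with `rounds` while the agents themselves stay
    fixed: the looped-model idea, applied here to a group of toy agents.
    """
    h = list(h0)
    for _ in range(rounds):
        for agent in agents:
            h = agent(h)
    return h


if __name__ == "__main__":
    agents = [make_agent(0.5, [0.1, -0.2]), make_agent(0.3, [0.0, 0.4])]
    print(recursive_collaboration(agents, [1.0, 1.0], rounds=5))
```

Raising `rounds` deepens the computation without adding parameters. Because every toy agent here has |w| < 1, extra rounds settle toward a fixed point instead of diverging; real looped models use learned network blocks, not this scalar toy.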
🔴 DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios – score 75
Sources: huggingface
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet existing benchmarks often suffer from code-sandbox confinement, single-language, creation-only tasks, and an assumption of perfect intent. To bridge these gaps, we…
Infrastructure & Compute
🔴 Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora – score 85
Sources: huggingface
Reliably transferring specialized human knowledge from text into large language models remains a fundamental challenge in artificial intelligence. Fine-tuning on domain corpora has enabled substantial capability gains, but the process operates without feedback: when a model fails on a domain task,…
🟡 Notable
Model Releases
🟡 AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery – score 65
Sources: huggingface
Autonomous scientific research has advanced significantly thanks to the development of AI agents. One key step in this process is finding the right scientific literature, whether to explore existing knowledge for a research problem or to acquire evidence for verifying assumptions and supporting…
🟡 Meta-CoT: Enhancing Granularity and Generalization in Image Editing – score 55
Sources: huggingface
Unified multimodal understanding and generation models have shown improved image editing performance by incorporating fine-grained understanding into their Chain-of-Thought (CoT) process. However, a critical question remains underexplored: what forms of CoT and what training strategy can jointly enhance…
🟡 Where the goblins came from – score 50
Sources: lab_blog/OpenAI
How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.
Developer Tools
🟡 Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models – score 45
Sources: huggingface
Unified multimodal models (UMMs) integrate visual understanding and generation within a single framework. For text-to-image (T2I) tasks, this unified capability allows UMMs to refine outputs after their initial generation, potentially extending the performance upper bound. Current UMM-based…
Infrastructure & Compute
🟡 Building the compute infrastructure for the Intelligence Age – score 50
Sources: lab_blog/OpenAI
OpenAI scales Stargate to build the compute infrastructure powering AGI, adding new data center capacity to meet growing AI demand.
Other Signals
🟡 Cybersecurity in the Intelligence Age – score 50
Sources: lab_blog/OpenAI
OpenAI outlines a five-part action plan for strengthening cybersecurity in the Intelligence Age, focused on democratizing AI-powered cyber defense and protecting critical systems.
🟢 Incremental
Developer Tools
🟢 Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation – score 35
Sources: huggingface
In this work, we propose Mutual Forcing, a framework for fast autoregressive audio-video generation with long-horizon audio-video synchronization. Our approach addresses two key challenges: joint audio-video modeling and fast autoregressive generation. To ease joint audio-video optimization, we…
🟢 Co-Director: Agentic Generative Video Storytelling – score 25
Sources: huggingface
While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present…
🟢 Step-Audio-R1.5 Technical Report – score 15
Sources: huggingface
Recent advances in large audio language models have extended Chain-of-Thought (CoT) reasoning into the auditory domain, enabling models to tackle increasingly complex acoustic and spoken tasks. To elicit and sustain these extended reasoning chains, the prevailing paradigm, driven by the…
🟢 Toward Scalable Terminal Task Synthesis via Skill Graphs – score 5
Sources: huggingface
Terminal agents have demonstrated strong potential for autonomous command-line execution, yet their training remains constrained by the scarcity of high-quality and diverse execution trajectories. Existing approaches mitigate this bottleneck by synthesizing large-scale terminal task instances for…
New Papers
| Title | Category | Score |
|---|---|---|
| Recursive Multi-Agent Systems | developer_tool | 239 |
| Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora | infrastructure | 86 |
| DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios | developer_tool | 45 |
| AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery | model_release | 29 |
| Meta-CoT: Enhancing Granularity and Generalization in Image Editing | model_release | 28 |
| Option-Order Randomisation Reveals a Distributional Position Attractor in Prompted Sandbagging | cs.AI | 0 |
| Agent Name Service (ANS): A Proof-of-Concept Trust Layer for Secure AI Agent Discovery, Identity, and Governance in Kubernetes | cs.AI | 0 |
| Breaking the Autoregressive Chain: Hyper-Parallel Decoding for Efficient LLM-Based Attribute Value Extraction | cs.AI | 0 |
| OMEGA: Optimizing Machine Learning by Evaluating Generated Algorithms | cs.AI | 0 |
| Qvine: Vine Structured Quantum Circuits for Loading High Dimensional Distributions | cs.AI | 0 |
| Seeking Consensus: Geometric-Semantic On-the-Fly Recalibration for Open-Vocabulary Remote Sensing Semantic Segmentation | cs.AI | 0 |
| DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation | cs.AI | 0 |
| Persuadability and LLMs as Legal Decision Tools | cs.AI | 0 |
| LATTICE: Evaluating Decision Support Utility of Crypto Agents | cs.AI | 0 |
| Apriori-based Analysis of Learned Helplessness in Mathematics Tutoring: Behavioral Patterns by Level, Intervention, and Outcome | cs.AI | 0 |