🔴 High Significance
Model Releases
🔴 AI Can Learn Scientific Taste - score 95
Sources: huggingface
Great scientists have strong judgement and foresight, closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most relative research focuses on improving an AI scientist's executive capabil
Developer Tools
🔴 Grounding World Simulation Models in a Real-World Metropolis - score 75
Sources: huggingface
What if a world simulation model could render not an imagined environment but a city that actually exists? Prior generative world models synthesize visually plausible yet artificial environments by imagining all content. We present Seoul World Model (SWM), a city-scale world model grounded in the re
Infrastructure & Compute
🔴 Attention Residuals - score 85
Sources: huggingface
Residual connections with PreNorm are standard in modern LLMs, yet they accumulate all layer outputs with fixed unit weights. This uniform aggregation causes uncontrolled hidden-state growth with depth, progressively diluting each layer's contribution. We propose Attention Residuals (AttnRes), which
🟡 Notable
Model Releases
🟡 Introducing GPT-5.4 mini and nano - score 50
Sources: lab_blog/OpenAI
GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.
🟡 OpenAI Japan announces Japan Teen Safety Blueprint to put teen safety first - score 50
Sources: lab_blog/OpenAI
OpenAI Japan announces the Japan Teen Safety Blueprint, introducing stronger age protections, parental controls, and well-being safeguards for teens using generative AI.
🟡 Equipping workers with insights about compensation - score 50
Sources: lab_blog/OpenAI
New research shows Americans send nearly 3 million daily messages to ChatGPT asking about compensation and earnings, helping close the wage information gap.
🟡 Measuring progress toward AGI: A cognitive framework - score 50
Sources: lab_blog/DeepMind
We're introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.
🟡 EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings - score 45
Sources: huggingface
Large language models are shifting from passive information providers to active agents intended for complex workflows. However, their deployment as reliable AI workers in enterprise is stalled by benchmarks that fail to capture the intricacies of professional environments, specifically, the need for
Developer Tools
🟡 OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data - score 65
Sources: huggingface
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet the development of high-performance search agents remains dominated by industrial giants due to a lack of transparent, high-quality training data. This persistent data scarcity has fu
🟡 HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions - score 55
Sources: huggingface
We present HSImul3R, a unified framework for simulation-ready 3D reconstruction of human-scene interactions (HSI) from casual captures, including sparse-view images and monocular videos. Existing methods suffer from a perception-simulation gap: visually plausible reconstructions often violate physic
🟢 Incremental
Model Releases
🟢 Mixture-of-Depths Attention - score 35
Sources: huggingface
Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers. We introduce mixtur
🟢 Effective Distillation to Hybrid xLSTM Architectures - score 25
Sources: huggingface
There have been numerous attempts to distill quadratic attention-based large language models (LLMs) into sub-quadratic linearized architectures. However, despite extensive research, such distilled models often fail to match the performance of their teacher LLMs on various downstream tasks. We set ou
🟢 Safe and Scalable Web Agent Learning via Recreated Websites - score 5
Sources: huggingface
Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning re
Developer Tools
🟢 Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models - score 15
Sources: huggingface
Vision-Language Models (VLMs) frequently "hallucinate" - generate plausible yet factually incorrect statements - posing a critical barrier to their trustworthy deployment. In this work, we propose a new paradigm for diagnosing hallucinations, recasting them from static output errors into dynamic pat
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| AI Can Learn Scientific Taste | model_release | 437 | Open |
| Attention Residuals | infrastructure | 189 | Open |
| Grounding World Simulation Models in a Real-World Metropolis | developer_tool | 157 | Open |
| OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data | developer_tool | 155 | Open |
| HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions | developer_tool | 154 | Open |
| Interpretable Context Methodology: Folder Structure as Agentic Architecture | cs.AI | 0 | Open |
| EMA Is Not All You Need: Mapping the Boundary Between Structure and Content in Recurrent Context | cs.AI | 0 | Open |
| Residual Stream Duality in Modern Transformer Architectures | cs.AI | 0 | Open |
| Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition | cs.AI | 0 | Open |
| Enhancing Linguistic Generalization of VLA: Fine-Tuning OpenVLA via Synthetic Instruction Augmentation | cs.AI | 0 | Open |
| POaaS: Minimal-Edit Prompt Optimization as a Service to Lift Accuracy and Cut Hallucinations on On-Device sLLMs | cs.AI | 0 | Open |
| A Context Alignment Pre-processor for Enhancing the Coherence of Human-LLM Dialog | cs.AI | 0 | Open |
| ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning | cs.AI | 0 | Open |
| Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models | cs.AI | 0 | Open |
| Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models | cs.AI | 0 | Open |