🔴 High Significance

Model Releases

🔴 Near-Future Policy Optimization | score 85 | Sources: huggingface

Reinforcement learning with verifiable rewards (RLVR) has become a core post-training recipe. Introducing suitable off-policy trajectories into on-policy exploration accelerates RLVR convergence and raises the performance ceiling, yet finding a source of such trajectories remains the key challenge.
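
The abstract does not say how Near-Future Policy Optimization obtains its off-policy trajectories. The sketch below only illustrates the general setting, not the paper's method. It scores completions with a binary verifiable reward, mixes a pool of off-policy completions into an on-policy group, and computes GRPO-style group-normalized advantages. The helper names `verify`, `build_group` and `group_advantages`, the last-token answer check, and the 25% mixing ratio are all hypothetical.

```python
import random
from statistics import mean, pstdev

def verify(completion: str, answer: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the last token equals the reference answer."""
    tokens = completion.strip().split()
    return 1.0 if tokens and tokens[-1] == answer else 0.0

def build_group(on_policy, off_policy, off_policy_ratio=0.25, rng=None):
    """Hypothetical mixing rule: replace a fraction of on-policy samples with off-policy ones."""
    rng = rng or random.Random(0)
    n_off = min(len(off_policy), int(len(on_policy) * off_policy_ratio))
    kept = on_policy[: len(on_policy) - n_off]
    injected = rng.sample(off_policy, n_off)
    return kept + injected

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward normalized by the group mean and std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy case: the current policy never finds the answer, so on-policy rewards are all 0.
on_policy = ["the answer is 5", "the answer is 6", "the answer is 8", "the answer is 9"]
off_policy = ["so the answer is 7"]  # a correct trajectory from some other source
group = build_group(on_policy, off_policy)
rewards = [verify(c, answer="7") for c in group]
advantages = group_advantages(rewards)
```

With only on-policy samples, every reward is 0, so every advantage is 0 and the group gives no learning signal. A single correct injected trajectory makes the advantages non-zero. This is one intuitive reason suitable off-policy data can speed up RLVR.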

🔴 DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data | score 75 | Sources: huggingface

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open data by improving both data quality and data utilization…

Developer Tools

🔴 LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model | score 95 | Sources: huggingface

We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. …

🟡 Notable

Model Releases

🟡 OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis | score 55 | Sources: huggingface

Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about…

🟡 Introducing GPT-5.5 | score 50 | Sources: lab_blog/OpenAI

Introducing GPT-5.5, our smartest model yet: faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.

🟡 GPT-5.5 System Card | score 50 | Sources: lab_blog/OpenAI

🟡 GPT-5.5 Bio Bug Bounty | score 50 | Sources: lab_blog/OpenAI

Explore the GPT-5.5 Bio Bug Bounty: a red-teaming challenge to find universal jailbreaks for bio safety risks, with rewards up to $25,000.

Developer Tools

🟡 Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges | score 65 | Sources: huggingface

Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multimodal large language models (MLLMs) toward human-preferred behaviors. However, these approaches introduce a systemic vulnerability: reward hacking, …
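
Here is a toy illustration of reward hacking; it is not taken from the survey. A hypothetical proxy reward model partly rewards length and a confident keyword. Picking the best-of-n candidate under that proxy selects a padded, wrong answer, which the true objective scores at zero. Both reward functions and the candidate texts are invented for illustration.

```python
def true_reward(response: str, answer: str) -> float:
    """What we actually want: the response states the correct answer."""
    return 1.0 if answer in response.split() else 0.0

def proxy_reward(response: str) -> float:
    """Hypothetical learned reward model with spurious preferences
    for longer text and for the word 'definitely'."""
    words = response.split()
    confidence_bonus = 2.0 if "definitely" in words else 0.0
    return 0.1 * len(words) + confidence_bonus

candidates = [
    "42",
    "The answer is 42",
    "The answer is definitely 41 and I have carefully checked every step twice",
]
# Best-of-n selection: optimizing against the proxy versus against the true objective
best_by_proxy = max(candidates, key=proxy_reward)
best_by_truth = max(candidates, key=lambda r: true_reward(r, "42"))
```

Stronger optimization against the proxy, such as larger n or RL fine-tuning, pushes further toward whatever the proxy mistakenly rewards. This is the failure mode the survey groups under reward hacking.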

🟡 What is Codex? | score 50 | Sources: lab_blog/OpenAI

Learn how Codex helps you go beyond chat by automating tasks, connecting tools, and producing real outputs like docs and dashboards.

🟡 How to get started with Codex | score 50 | Sources: lab_blog/OpenAI

Learn how to get started with Codex by setting up projects, creating threads, and completing your first tasks with step-by-step guidance.

🟡 Codex settings | score 50 | Sources: lab_blog/OpenAI

Learn how to configure Codex settings, including personalization, detail level, and permissions, to run tasks smoothly and customize your workflow.

🟡 Working with Codex | score 50 | Sources: lab_blog/OpenAI

Learn how to set up your Codex workspace, create threads and projects, manage files, and start completing tasks with step-by-step guidance.

Omitted 4 additional developer-tool items from the main section; see the raw data and source-specific sections below.

Business & Funding

🟡 From building Applied Intuition from YC-era autonomy tooling into a $15B physical AI company, Qasar Younis and Peter Ludwig have spent the last decade living through the full arc of autonomy: from simulation and … | score 65 | Sources: newsletter/Latent Space

From building Applied Intuition from YC-era autonomy tooling into a $15B physical AI company, Qasar Younis and Peter Ludwig have spent the last decade living through the full arc of autonomy: from simulation and data infrastructure for robotaxi companies, to operating systems for safety-critical machines, …

Other Signals

🟡 Applied Intuition's mission: building physical AI for a safer, more prosperous world, powering cars, trucks, construction and mining equipment, agriculture, defense, and other moving machines | score 65 | Sources: newsletter/Latent Space

Why physical AI is different from screen-based AI: learned systems can make mistakes in chat or coding, but safety-critical machines like driverless trucks, autonomous vehicles, and robots need much higher reliability.

🟢 Incremental

Developer Tools

🟢 Exploring Spatial Intelligence from a Generative Perspective | score 35 | Sources: huggingface

Spatial intelligence is essential for multimodal large language models, yet current benchmarks largely assess it only from an understanding perspective. We ask whether modern generative or unified multimodal models also possess generative spatial intelligence (GSI), the ability to respect and manipulate…

🟢 A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression | score 20 | Sources: huggingface

As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial…
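
The abstract is cut off before it describes the framework's compression mechanism. Here is a minimal sketch of the general idea, under a simple assumed rule. Agent actions and the most recent observation stay verbatim, and older raw terminal output is truncated to a short stub with an omission marker. The `Turn` type, `compress_history`, `keep_recent` and `max_chars` are hypothetical. The paper's self-evolving method is likely more involved than a fixed truncation rule.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Turn:
    role: str     # "action" (agent command) or "observation" (terminal output)
    content: str

def compress_history(history, keep_recent=1, max_chars=40):
    """Keep actions and the last `keep_recent` observations verbatim;
    truncate older, long observations to `max_chars` characters plus a marker."""
    obs_indices = [i for i, t in enumerate(history) if t.role == "observation"]
    recent = set(obs_indices[-keep_recent:]) if keep_recent > 0 else set()
    compressed = []
    for i, turn in enumerate(history):
        if turn.role == "observation" and i not in recent and len(turn.content) > max_chars:
            dropped = len(turn.content) - max_chars
            stub = f"{turn.content[:max_chars]}... [{dropped} chars omitted]"
            compressed.append(replace(turn, content=stub))
        else:
            compressed.append(turn)
    return compressed

history = [
    Turn("action", "pip install -r requirements.txt"),
    Turn("observation", "Collecting numpy\n" * 50),  # 850 chars of install log
    Turn("action", "pytest -q"),
    Turn("observation", "3 passed in 0.42s"),
]
compact = compress_history(history)
```

Keeping the latest observation intact preserves what the agent needs for its next step. Older logs are where repeated retention inflates the context.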

🟢 Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts | score 20 | Sources: huggingface

Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with…
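
This pure-Python sketch shows how sparse routing decouples total parameters from per-token computation. It uses standard top-k MoE routing, not the paper's expert-upcycling procedure. It assumes a softmax router over the selected experts and scalar toy experts; real experts are feed-forward networks over vectors. All eight experts hold parameters, but each token evaluates only k = 2 of them. The function and variable names are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_weights, experts, k=2):
    """Route scalar input x to the top-k experts and mix their outputs.

    router_weights[i] scores expert i (logit = w_i * x); experts[i] is a callable.
    Returns the mixed output and the indices of the experts actually evaluated."""
    logits = [w * x for w in router_weights]
    top_k = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top_k])  # renormalize over selected experts only
    output = sum(g * experts[i](x) for g, i in zip(gates, top_k))
    return output, top_k

# Eight toy "experts", each a scalar linear map with its own parameter a
experts = [lambda x, a=a: a * x for a in range(1, 9)]
router_weights = [0.1 * i for i in range(8)]
y, used = moe_forward(1.0, router_weights, experts, k=2)
```

Adding more experts grows the total parameter count, while each token still costs only k expert evaluations. That is the decoupling the abstract describes.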

🟢 Image Generators are Generalist Vision Learners | score 5 | Sources: huggingface

Recent works show that image and video generators exhibit zero-shot visual understanding behaviors, in a way reminiscent of how LLMs develop emergent capabilities of language understanding and reasoning from generative pretraining. While it has long been conjectured that the ability to create visual…

📄 New Papers

Title | Category | Score
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model | developer_tool | 240
Near-Future Policy Optimization | model_release | 76
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data | model_release | 54
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges | developer_tool | 34
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis | model_release | 30
Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation | cs.AI | 0
TAPO-Description Logic for Information Behavior: Refined OBoxes, Inference, and Categorical Semantics | cs.AI | 0
Scaling of Gaussian Kolmogorov--Arnold Networks | cs.AI | 0
Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery | cs.AI | 0
How VLAs (Really) Work In Open-World Environments | cs.AI | 0
Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models | cs.AI | 0
On Reasoning Behind Next Occupation Recommendation | cs.AI | 0
IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review | cs.AI | 0
Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management | cs.AI | 0
A Demonstration of SQLyzr: A Platform for Fine-Grained Text-to-SQL Evaluation and Analysis | cs.AI | 0

🏢 Lab Blog Posts

📰 Newsletter Roundup

Items surfaced by newsletter editors that were not merged with primary sources: