🔴 High Significance

Model Releases

🔴 SkillClaw: Let Skills Evolve Collectively with Agentic Evolver - score 85 | Sources: huggingface

Large language model (LLM) agents such as OpenClaw rely on reusable skills to perform complex tasks, yet these skills remain largely static after deployment. As a result, similar workflows, tool usage patterns, and failure modes are repeatedly rediscovered across users, preventing the system from im

🔴 ClawBench: Can AI Agents Complete Everyday Online Tasks? - score 75 | Sources: huggingface

AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that

Infrastructure & Compute

🔴 Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability - score 95 | Sources: huggingface

A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jo

🟡 Notable

Model Releases

🟡 HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents - score 65 | Sources: huggingface

We introduce HY-Embodied-0.5, a family of foundation models specifically designed for real-world embodied agents. To bridge the gap between general Vision-Language Models (VLMs) and the demands of embodied agents, our models are developed to enhance the core capabilities required by embodied intelli

🟡 ChatGPT for customer success teams - score 50 | Sources: lab_blog/OpenAI

Learn how customer success teams use ChatGPT to manage accounts, improve communication, reduce churn, and drive adoption and renewals.

🟡 Using skills - score 50 | Sources: lab_blog/OpenAI

Learn how to create and use ChatGPT skills to build reusable workflows, automate recurring tasks, and ensure consistent, high-quality outputs.

🟡 Healthcare - score 50 | Sources: lab_blog/OpenAI

Explore how clinicians use ChatGPT to support diagnosis, documentation, and patient care with secure, HIPAA-compliant AI tools.

🟡 Using projects in ChatGPT - score 50 | Sources: lab_blog/OpenAI

Learn how to use projects in ChatGPT to organize chats, files, and instructions, manage ongoing work, and collaborate more effectively.

Omitted 19 additional model-release items from the main section; see the raw data and source-specific sections below.

Developer Tools

🟡 When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models - score 55 | Sources: huggingface

Text-to-video diffusion models have enabled open-ended video synthesis, but often struggle to generate the number of objects specified in a prompt. We introduce NUMINA, a training-free identify-then-guide framework for improved numerical alignment. NUMINA identifies prompt-layout inconsis

🟡 MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping - score 45 | Sources: huggingface

In this paper, we introduce MegaStyle, a novel and scalable data curation pipeline that constructs an intra-style consistent, inter-style diverse and high-quality style dataset. We achieve this by leveraging the consistent text-to-image style mapping capability of current large generative models, wh

Other Signals

🟡 Our response to the Axios developer tool compromise - score 50 | Sources: lab_blog/OpenAI

OpenAI responds to the Axios supply chain attack by rotating macOS code signing certificates, updating apps, and confirming no user data was compromised.

🟢 Incremental

Developer Tools

🟢 LPM 1.0: Video-based Character Performance Model - score 35 | Sources: huggingface

Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve

🟢 DMax: Aggressive Parallel Decoding for dLLMs - score 25 | Sources: huggingface

We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition,

🟢 Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering - score 15 | Sources: huggingface

Large language model (LLM) agents are increasingly built less by changing model weights than by reorganizing the runtime around them. Capabilities that earlier systems expected the model to recover internally are now externalized into memory stores, reusable skills, interaction protocols, and the su

🟢 OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks - score 5 | Sources: huggingface

Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challeng

📄 New Papers

Title | Category | Score | Link
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability | infrastructure | 330 | Open
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver | model_release | 293 | Open
ClawBench: Can AI Agents Complete Everyday Online Tasks? | model_release | 266 | Open
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents | model_release | 192 | Open
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models | developer_tool | 119 | Open
Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs | cs.AI | 0 | Open
MPAC: A Multi-Principal Agent Coordination Protocol for Interoperable Multi-Agent Collaboration | cs.AI | 0 | Open
Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching | cs.AI | 0 | Open
ECM Contracts: Contract-Aware, Versioned, and Governable Capability Interfaces for Embodied Agents | cs.AI | 0 | Open
Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations | cs.AI | 0 | Open
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks | cs.AI | 0 | Open
AI-Induced Human Responsibility (AIHR) in AI-Human teams | cs.AI | 0 | Open
AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models | cs.AI | 0 | Open
MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification | cs.AI | 0 | Open
Semantic Channel Theory: Deductive Compression and Structural Fidelity for Multi-Agent Communication | cs.AI | 0 | Open

๐Ÿข Lab Blog Posts