🔴 High Significance
Model Releases
🔴 SkillClaw: Let Skills Evolve Collectively with Agentic Evolver (score 85)
Sources: huggingface
Large language model (LLM) agents such as OpenClaw rely on reusable skills to perform complex tasks, yet these skills remain largely static after deployment. As a result, similar workflows, tool usage patterns, and failure modes are repeatedly rediscovered across users, preventing the system from im
🔴 ClawBench: Can AI Agents Complete Everyday Online Tasks? (score 75)
Sources: huggingface
AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that
Infrastructure & Compute
🔴 Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability (score 95)
Sources: huggingface
A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jo
🟡 Notable
Model Releases
🟡 HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents (score 65)
Sources: huggingface
We introduce HY-Embodied-0.5, a family of foundation models specifically designed for real-world embodied agents. To bridge the gap between general Vision-Language Models (VLMs) and the demands of embodied agents, our models are developed to enhance the core capabilities required by embodied intelli
🟡 ChatGPT for customer success teams (score 50)
Sources: lab_blog/OpenAI
Learn how customer success teams use ChatGPT to manage accounts, improve communication, reduce churn, and drive adoption and renewals.
🟡 Using skills (score 50)
Sources: lab_blog/OpenAI
Learn how to create and use ChatGPT skills to build reusable workflows, automate recurring tasks, and ensure consistent, high-quality outputs.
🟡 Healthcare (score 50)
Sources: lab_blog/OpenAI
Explore how clinicians use ChatGPT to support diagnosis, documentation, and patient care with secure, HIPAA-compliant AI tools.
🟡 Using projects in ChatGPT (score 50)
Sources: lab_blog/OpenAI
Learn how to use projects in ChatGPT to organize chats, files, and instructions, manage ongoing work, and collaborate more effectively.
Omitted 19 additional model release items from the main section; see the raw data and source-specific sections below.
Developer Tools
🟡 When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models (score 55)
Sources: huggingface
Text-to-video diffusion models have enabled open-ended video synthesis, but often struggle with generating the correct number of objects specified in a prompt. We introduce NUMINA, a training-free identify-then-guide framework for improved numerical alignment. NUMINA identifies prompt-layout incons
🟡 MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping (score 45)
Sources: huggingface
In this paper, we introduce MegaStyle, a novel and scalable data curation pipeline that constructs an intra-style consistent, inter-style diverse and high-quality style dataset. We achieve this by leveraging the consistent text-to-image style mapping capability of current large generative models, wh
Other Signals
🟡 Our response to the Axios developer tool compromise (score 50)
Sources: lab_blog/OpenAI
OpenAI responds to the Axios supply chain attack by rotating macOS code signing certificates, updating apps, and confirming no user data was compromised.
🟢 Incremental
Developer Tools
🟢 LPM 1.0: Video-based Character Performance Model (score 35)
Sources: huggingface
Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve
🟢 DMax: Aggressive Parallel Decoding for dLLMs (score 25)
Sources: huggingface
We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition,
🟢 Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering (score 15)
Sources: huggingface
Large language model (LLM) agents are increasingly built less by changing model weights than by reorganizing the runtime around them. Capabilities that earlier systems expected the model to recover internally are now externalized into memory stores, reusable skills, interaction protocols, and the su
🟢 OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks (score 5)
Sources: huggingface
Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challeng
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability | infrastructure | 330 | Open |
| SkillClaw: Let Skills Evolve Collectively with Agentic Evolver | model_release | 293 | Open |
| ClawBench: Can AI Agents Complete Everyday Online Tasks? | model_release | 266 | Open |
| HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents | model_release | 192 | Open |
| When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models | developer_tool | 119 | Open |
| Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs | cs.AI | 0 | Open |
| MPAC: A Multi-Principal Agent Coordination Protocol for Interoperable Multi-Agent Collaboration | cs.AI | 0 | Open |
| Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching | cs.AI | 0 | Open |
| ECM Contracts: Contract-Aware, Versioned, and Governable Capability Interfaces for Embodied Agents | cs.AI | 0 | Open |
| Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations | cs.AI | 0 | Open |
| SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks | cs.AI | 0 | Open |
| AI-Induced Human Responsibility (AIHR) in AI-Human teams | cs.AI | 0 | Open |
| AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models | cs.AI | 0 | Open |
| MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification | cs.AI | 0 | Open |
| Semantic Channel Theory: Deductive Compression and Structural Fidelity for Multi-Agent Communication | cs.AI | 0 | Open |
Lab Blog Posts
- OpenAI: ChatGPT for customer success teams
- OpenAI: Using skills
- OpenAI: Healthcare
- OpenAI: Our response to the Axios developer tool compromise
- OpenAI: Using projects in ChatGPT
- OpenAI: ChatGPT for research
- OpenAI: Personalizing ChatGPT
- OpenAI: Brainstorming with ChatGPT
- OpenAI: Writing with ChatGPT
- OpenAI: Responsible and safe use of AI
- OpenAI: Research with ChatGPT
- OpenAI: Getting started with ChatGPT
- OpenAI: Creating images with ChatGPT
- OpenAI: ChatGPT for operations teams
- OpenAI: Analyzing data with ChatGPT
- OpenAI: ChatGPT for managers
- OpenAI: Financial services
- OpenAI: AI fundamentals
- OpenAI: Applications of AI at OpenAI
- OpenAI: Prompting fundamentals
- OpenAI: ChatGPT for sales teams
- OpenAI: ChatGPT for finance teams
- OpenAI: Working with files in ChatGPT
- OpenAI: Using custom GPTs