🔴 High Significance
Model Releases
🔴 Helios: Real Real-Time Long Video Generation Model - score 85
Sources: huggingface
We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly u
🔴 T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning - score 75
Sources: huggingface
Think about how humans handle complex reading tasks: marking key points, inferring their relationships, and structuring information to guide understanding and responses. Likewise, can a large language model benefit from text structure to enhance text-processing performance? To explore it, in this wo
Developer Tools
🔴 Heterogeneous Agent Collaborative Reinforcement Learning - score 95
Sources: huggingface
We introduce Heterogeneous Agent Collaborative Reinforcement Learning (HACRL), a new learning paradigm that addresses the inefficiencies of isolated on-policy optimization. HACRL enables collaborative optimization with independent execution: heterogeneous agents share verified rollouts during traini
🟡 Notable
Model Releases
🟡 Introducing GPT-5.4 - score 50
Sources: lab_blog/OpenAI
Introducing GPT-5.4, OpenAI's most capable and efficient frontier model for professional work, with state-of-the-art coding, computer use, tool search, and 1M-token context.
🟡 GPT-5.4 Thinking System Card - score 50
Sources: lab_blog/OpenAI
🟡 Introducing the Adoption news channel - score 50
Sources: lab_blog/OpenAI
Practical insights and frameworks to turn AI progress into business advantage
🟡 Introducing ChatGPT for Excel and new financial data integrations - score 50
Sources: lab_blog/OpenAI
OpenAI introduces ChatGPT for Excel and new financial app integrations, powered by GPT-5.4 to accelerate modeling, research, and analysis in regulated environments.
🟡 VfL Wolfsburg turns ChatGPT into a club-wide capability - score 50
Sources: lab_blog/OpenAI
By focusing on people, not pilots, the Bundesliga club is scaling efficiency, creativity, and knowledge, without losing its football identity.
Developer Tools
🟡 Proact-VL: A Proactive VideoLLM for Real-Time AI Companions - score 65
Sources: huggingface
Proactive and real-time interactive experiences are essential for human-like AI companions, yet face three key challenges: (1) achieving low-latency inference under continuous streaming inputs, (2) autonomously deciding when to respond, and (3) controlling both quality and quantity of generated cont
🟡 MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning - score 55
Sources: huggingface
As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between cost and accuracy. Simple storage methods often fail to retrieve relevant information, while complex i
🟡 ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors - score 45
Sources: huggingface
Synthesizing physically plausible articulated human-object interactions (HOI) without 3D/4D supervision remains a fundamental challenge. While recent zero-shot approaches leverage video diffusion models to synthesize human-object interactions, they are largely confined to rigid-object manipulation a
Research Papers
🟡 Reasoning Models Struggle to Control their Chains of Thought - score 60
Sources: arxiv/cs.AI · lab_blog/OpenAI
Chain-of-thought (CoT) monitoring is a promising tool for detecting misbehaviors and understanding the motivations of modern reasoning models. However, if models can control what they verbalize in their CoT, it could undermine CoT monitorability. To measure this undesirable capability -- CoT control
Other Signals
🟡 Ensuring AI use in education leads to opportunity - score 50
Sources: lab_blog/OpenAI
OpenAI shares new tools, certifications, and measurement resources to help schools and universities close AI capability gaps and expand opportunity.
🟡 The five AI value models driving business reinvention - score 50
Sources: lab_blog/OpenAI
Five AI value models show how leaders can sequence AI from workforce fluency to process reinvention and build durable business advantage.
🟢 Incremental
Model Releases
🟢 Phi-4-reasoning-vision-15B Technical Report - score 30
Sources: huggingface
We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed its development. Our goal is to contribute practical insight to the research community on building smaller, efficient multimoda
Developer Tools
🟢 Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory - score 30
Sources: huggingface
Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the cont
🟢 Specificity-aware reinforcement learning for fine-grained open-world classification - score 15
Sources: huggingface
Classifying fine-grained visual concepts under open-world settings, i.e., without a predefined label set, demands models to be both accurate and specific. Recent reasoning Large Multimodal Models (LMMs) exhibit strong visual understanding capability but tend to produce overly generic predictions whe
🟢 V_1: Unifying Generation and Self-Verification for Parallel Reasoners - score 5
Sources: huggingface
Test-time scaling for complex reasoning tasks shows that leveraging inference-time compute, by methods such as independently sampling and aggregating multiple solutions, results in significantly better task outcomes. However, a critical bottleneck is verification: sampling is only effective if corre
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Heterogeneous Agent Collaborative Reinforcement Learning | developer_tool | 202 | Open |
| Helios: Real Real-Time Long Video Generation Model | model_release | 192 | Open |
| T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning | model_release | 125 | Open |
| Proact-VL: A Proactive VideoLLM for Real-Time AI Companions | developer_tool | 45 | Open |
| Reasoning Models Struggle to Control their Chains of Thought | cs.AI | 100 | Open |
| MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning | developer_tool | 35 | Open |
| Why the Brain Consolidates: Predictive Forgetting for Optimal Generalisation | cs.AI | 0 | Open |
| Spatial Competence Benchmark | cs.AI | 0 | Open |
| Hate Speech Detection using Large Language Models with Data Augmentation and Feature Enhancement | cs.AI | 0 | Open |
| Detection of Illicit Content on Online Marketplaces using Large Language Models | cs.AI | 0 | Open |
| When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper | cs.AI | 0 | Open |
| Probabilistic Dreaming for World Models | cs.AI | 0 | Open |
| AI-Assisted Moot Courts: Simulating Justice-Specific Questioning in Oral Arguments | cs.AI | 0 | Open |
| Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models | cs.AI | 0 | Open |
| From Offline to Periodic Adaptation for Pose-Based Shoplifting Detection in Real-world Retail Security | cs.AI | 0 | Open |
📢 Lab Blog Posts
- OpenAI: Introducing GPT-5.4
- OpenAI: GPT-5.4 Thinking System Card
- OpenAI: Ensuring AI use in education leads to opportunity
- OpenAI: Introducing the Adoption news channel
- OpenAI: Introducing ChatGPT for Excel and new financial data integrations
- OpenAI: The five AI value models driving business reinvention
- OpenAI: VfL Wolfsburg turns ChatGPT into a club-wide capability