🔴 High Significance

Developer Tools

🔴 Lost in Stories: Consistency Bugs in Long Story Generation by LLMs - score 95 - Sources: huggingface

What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generate narratives spanning tens of thousands of words, but they often fail to maintain consistency throughout. When generating long-form narratives, these models can contradict their own established facts,

🔴 Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence - score 85 - Sources: huggingface

The pursuit of spatial intelligence fundamentally relies on access to large-scale, fine-grained 3D data. However, existing approaches predominantly construct spatial understanding benchmarks by generating question-answer (QA) pairs from a limited number of manually annotated datasets, rather than sy

🔴 LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory - score 75 - Sources: huggingface

Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked by quadratic attention complexity or limited effective memory in recurrent designs. We present LoGeR (Long-context Geometric Reconstruction), a novel architectu

🟡 Notable

Model Releases

🟡 How Far Can Unsupervised RLVR Scale LLM Training? - score 65 - Sources: huggingface

Unsupervised reinforcement learning with verifiable rewards (URLVR) offers a pathway to scale LLM training beyond the supervision bottleneck by deriving rewards without ground truth labels. Recent works leverage model intrinsic signals, showing promising early gains, yet their potential and limitati

🟡 New ways to learn math and science in ChatGPT - score 50 - Sources: lab_blog/OpenAI

ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.

Developer Tools

🟡 Believe Your Model: Distribution-Guided Confidence Calibration - score 55 - Sources: huggingface

Large Reasoning Models have demonstrated remarkable performance with the advancement of test-time scaling techniques, which enhance prediction accuracy by generating multiple candidate responses and selecting the most reliable answer. While prior work has analyzed that internal model signals like c

🟡 CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation - score 45 - Sources: huggingface

Recent advancements in Unified Multimodal Models (UMMs) have significantly advanced text-to-image (T2I) generation, particularly through the integration of Chain-of-Thought (CoT) reasoning. However, existing CoT-based T2I methods largely rely on abstract natural-language planning, which lacks the pr

Other Signals

🟡 Improving instruction hierarchy in frontier LLMs - score 50 - Sources: lab_blog/OpenAI

IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.

🟢 Incremental

Developer Tools

🟢 CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing - score 35 - Sources: huggingface

Unified diffusion editors often rely on a fixed, shared backbone for diverse tasks, suffering from task interference and poor adaptation to heterogeneous demands (e.g., local vs global, semantic vs photometric). In particular, prevalent ControlNet and OmniControl variants combine multiple conditioni

🟢 HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising - score 25 - Sources: huggingface

Autoregressive (AR) diffusion offers a promising framework for generating videos of theoretically infinite length. However, a major challenge is maintaining temporal continuity while preventing the progressive quality degradation caused by error accumulation. To ensure continuity, existing methods t

🟢 $OneMillion-Bench: How Far are Language Agents from Human Experts? - score 15 - Sources: huggingface

As language models (LMs) evolve from chat assistants to long-horizon agents capable of multi-step reasoning and tool use, existing benchmarks remain largely confined to structured or exam-style tasks that fall short of real-world professional demands. To this end, we introduce \OneMillion-Bench OneM

🟢 NLE: Non-autoregressive LLM-based ASR by Transcript Editing - score 5 - Sources: huggingface

While autoregressive (AR) LLM-based ASR systems achieve strong accuracy, their sequential decoding limits parallelism and incurs high latency. We propose NLE, a non-autoregressive (NAR) approach that formulates speech recognition as conditional transcript editing, enabling fully parallel prediction.

📄 New Papers

Title | Category | Score | Link
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs | developer_tool | 98 | Open
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence | developer_tool | 91 | Open
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory | developer_tool | 69 | Open
How Far Can Unsupervised RLVR Scale LLM Training? | model_release | 63 | Open
Believe Your Model: Distribution-Guided Confidence Calibration | developer_tool | 44 | Open
WS-Net: Weak-Signal Representation Learning and Gated Abundance Reconstruction for Hyperspectral Unmixing via State-Space and Weak Signal Attention Fusion | cs.AI | 0 | Open
The Epistemic Support-Point Filter: Jaynesian Maximum Entropy Meets Popperian Falsification | cs.AI | 0 | Open
Time, Identity and Consciousness in Language Model Agents | cs.AI | 0 | Open
Quantifying Gender Bias in Large Language Models: When ChatGPT Becomes a Hiring Manager | cs.AI | 0 | Open
EPOCH: An Agentic Protocol for Multi-Round System Optimization | cs.AI | 0 | Open
From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring | cs.AI | 0 | Open
Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation | cs.AI | 0 | Open
From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express | cs.AI | 0 | Open
A Text-Native Interface for Generative Video Authoring | cs.AI | 0 | Open
GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models | cs.AI | 0 | Open

🏢 Lab Blog Posts