🔴 High Significance
Developer Tools
🔴 Lost in Stories: Consistency Bugs in Long Story Generation by LLMs (score 95)
Sources: huggingface
What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generate narratives spanning tens of thousands of words, but they often fail to maintain consistency throughout. When generating long-form narratives, these models can contradict their own established facts, …
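To make this failure mode concrete, here is a toy checker that records facts as (entity, attribute) pairs and flags later statements that disagree with them. It is an illustrative sketch, not the paper's method: `FactLedger` and its fact format are hypothetical, and it assumes facts have already been extracted from the prose.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FactLedger:
    """Hypothetical store of facts established so far in a story."""

    # (entity, attribute) -> (value, chapter where it was first stated)
    facts: Dict[Tuple[str, str], Tuple[str, int]] = field(default_factory=dict)

    def check(self, entity: str, attribute: str, value: str, chapter: int) -> Optional[str]:
        """Record a new fact, or return a message if it contradicts an earlier one."""
        key = (entity, attribute)
        if key not in self.facts:
            self.facts[key] = (value, chapter)
            return None
        old_value, old_chapter = self.facts[key]
        if old_value == value:
            return None
        return (f"{entity}.{attribute}: '{old_value}' (ch. {old_chapter}) "
                f"vs '{value}' (ch. {chapter})")


ledger = FactLedger()
ledger.check("Mara", "eye_color", "green", chapter=1)
print(ledger.check("Mara", "eye_color", "brown", chapter=7))
# Mara.eye_color: 'green' (ch. 1) vs 'brown' (ch. 7)
```

A real checker would also need to tell contradictions apart from legitimate changes of state (a character moving to a new city, say), which this exact-match comparison cannot do.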
🔴 Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence (score 85)
Sources: huggingface
The pursuit of spatial intelligence fundamentally relies on access to large-scale, fine-grained 3D data. However, existing approaches predominantly construct spatial understanding benchmarks by generating question-answer (QA) pairs from a limited number of manually annotated datasets, rather than sy…
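For context, the template-based QA construction the abstract describes can be pictured as follows. Each manually annotated object centroid feeds a fixed question template, so the number and variety of questions are capped by the annotations. The template, units, and scene are hypothetical, and this is not Holi-Spatial's pipeline.

```python
import itertools
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def distance_qa_pairs(objects: Dict[str, Point]) -> List[Tuple[str, str]]:
    """Generate one templated distance question per unordered pair of annotated objects."""
    qa = []
    # sorted() makes the output order deterministic
    for (a, pa), (b, pb) in itertools.combinations(sorted(objects.items()), 2):
        d = math.dist(pa, pb)  # Euclidean distance between centroids (Python 3.8+)
        qa.append((f"How far is the {a} from the {b}?", f"{d:.1f} m"))
    return qa


scene = {"chair": (0.0, 0.0, 0.0), "table": (3.0, 4.0, 0.0), "lamp": (0.0, 0.0, 2.0)}
for question, answer in distance_qa_pairs(scene):
    print(question, answer)
```

Three annotated objects yield only three questions. This is the scaling limit that motivates deriving 3D data directly from video streams instead.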
🔴 LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory (score 75)
Sources: huggingface
Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked by quadratic attention complexity or limited effective memory in recurrent designs. We present LoGeR (Long-context Geometric Reconstruction), a novel architectu…
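A back-of-the-envelope count of query-key pairs makes the quadratic bottleneck visible. The frame counts, tokens per frame, and window size below are illustrative assumptions, and the windowed variant is a generic baseline, not LoGeR's hybrid memory.

```python
def full_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Query-key pairs when every token attends to every token in the video."""
    n = num_frames * tokens_per_frame
    return n * n


def windowed_attention_pairs(num_frames: int, tokens_per_frame: int, window_frames: int) -> int:
    """Upper bound on pairs when each token attends only to a fixed window of frames."""
    n = num_frames * tokens_per_frame
    return n * min(window_frames, num_frames) * tokens_per_frame


# Illustrative numbers only: 256 tokens per frame, a 32-frame window.
for frames in (100, 1000):
    print(frames,
          full_attention_pairs(frames, 256),
          windowed_attention_pairs(frames, 256, 32))
```

Going from 100 to 1,000 frames multiplies the full-attention count by 100 but the windowed count by only 10. The trade-off is that anything outside the window is no longer attended to directly.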
🟡 Notable
Model Releases
🟡 How Far Can Unsupervised RLVR Scale LLM Training? (score 65)
Sources: huggingface
Unsupervised reinforcement learning with verifiable rewards (URLVR) offers a pathway to scale LLM training beyond the supervision bottleneck by deriving rewards without ground-truth labels. Recent works leverage model-intrinsic signals, showing promising early gains, yet their potential and limitati…
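Here is one concrete example of a reward derived without ground-truth labels. Each sampled answer is scored by whether it agrees with the majority answer across samples. Majority-vote agreement is one widely used intrinsic signal and is assumed here for illustration. The abstract snippet does not say which signals the paper studies.

```python
from collections import Counter
from typing import List


def majority_vote_rewards(answers: List[str]) -> List[float]:
    """Reward 1.0 for samples that match the most common answer, else 0.0 (no labels needed)."""
    if not answers:
        return []
    # Counter.most_common breaks ties by first occurrence
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


print(majority_vote_rewards(["42", "42", "41", "42"]))  # [1.0, 1.0, 0.0, 1.0]
```

Because this reward measures only self-agreement, it can just as easily reinforce an answer the model is consistently wrong about. That is one reason the scaling limits of such signals are worth studying.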
🟡 New ways to learn math and science in ChatGPT (score 50)
Sources: lab_blog/OpenAI
ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.
Developer Tools
🟡 Believe Your Model: Distribution-Guided Confidence Calibration (score 55)
Sources: huggingface
Large Reasoning Models have demonstrated remarkable performance with the advancement of test-time scaling techniques, which enhance prediction accuracy by generating multiple candidate responses and selecting the most reliable answer. While prior work has analyzed that internal model signals like c…
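The answer-selection step in test-time scaling can be illustrated with confidence-weighted voting over candidate responses. This is a generic sketch. Where the per-candidate confidences come from, which is the focus of the paper, is left abstract, and the numbers are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_vote(candidates: List[Tuple[str, float]]) -> str:
    """Return the answer whose candidates have the largest summed confidence."""
    if not candidates:
        raise ValueError("need at least one candidate")
    totals: Dict[str, float] = defaultdict(float)
    for answer, confidence in candidates:
        totals[answer] += confidence
    return max(totals, key=totals.get)


samples = [("A", 0.9), ("B", 0.6), ("B", 0.5), ("C", 0.2)]
print(weighted_vote(samples))  # B (about 1.1 in total beats A's 0.9)
```

With equal weights this reduces to plain majority voting. Poorly calibrated confidences can tip the vote toward a wrong answer, which is why calibration matters here.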
🟡 CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation (score 45)
Sources: huggingface
Recent progress in Unified Multimodal Models (UMMs) has significantly advanced text-to-image (T2I) generation, particularly through the integration of Chain-of-Thought (CoT) reasoning. However, existing CoT-based T2I methods largely rely on abstract natural-language planning, which lacks the pr…
Other Signals
🟡 Improving instruction hierarchy in frontier LLMs (score 50)
Sources: lab_blog/OpenAI
IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.
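A rule-based resolver can illustrate the idea of an instruction hierarchy. When two instructions conflict, the one from the more trusted source wins. The role ordering below (system over developer over user over tool output) is an assumption for illustration. The blurb describes training models to prioritize trusted instructions, not applying a hard-coded rule like this one.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Role(IntEnum):
    # Higher value = more trusted (assumed ordering)
    TOOL = 0
    USER = 1
    DEVELOPER = 2
    SYSTEM = 3


def resolve(instructions: List[Tuple[Role, str, str]]) -> Dict[str, str]:
    """For each setting, keep the value from the most trusted role; earlier wins on ties."""
    chosen: Dict[str, Tuple[Role, str]] = {}
    for role, key, value in instructions:
        if key not in chosen or role > chosen[key][0]:
            chosen[key] = (role, value)
    return {key: value for key, (_, value) in chosen.items()}


msgs = [
    (Role.SYSTEM, "reply_language", "English"),
    (Role.USER, "tone", "casual"),
    # A web page returned by a tool tries to override the system prompt (prompt injection)
    (Role.TOOL, "reply_language", "French"),
]
print(resolve(msgs))  # {'reply_language': 'English', 'tone': 'casual'}
```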
🟢 Incremental
Developer Tools
🟢 CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing (score 35)
Sources: huggingface
Unified diffusion editors often rely on a fixed, shared backbone for diverse tasks, suffering from task interference and poor adaptation to heterogeneous demands (e.g., local vs. global, semantic vs. photometric). In particular, prevalent ControlNet and OmniControl variants combine multiple conditioni…
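Condition-aware routing of experts is a form of mixture-of-experts gating. The sketch below shows generic top-k routing: score every expert, keep the k best, and renormalize their softmax weights. The snippet does not describe how CARE-Edit computes the scores from the editing condition, so the logits here are hypothetical.

```python
import math
from typing import Dict, List


def route_top_k(logits: List[float], k: int) -> Dict[int, float]:
    """Map the k highest-scoring expert indices to softmax weights that sum to 1."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Hypothetical router scores for 4 experts given one editing condition
weights = route_top_k([2.0, 0.5, 1.0, -1.0], k=2)
print(weights)  # experts 0 and 2 only, weights about 0.73 and 0.27
```

Only the selected experts run for a given input. As a result, different edit types (local vs. global, say) can be sent to different parameters instead of sharing one fixed backbone.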
🟢 HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising (score 25)
Sources: huggingface
Autoregressive (AR) diffusion offers a promising framework for generating videos of theoretically infinite length. However, a major challenge is maintaining temporal continuity while preventing the progressive quality degradation caused by error accumulation. To ensure continuity, existing methods t…
🟢 $OneMillion-Bench: How Far are Language Agents from Human Experts? (score 15)
Sources: huggingface
As language models (LMs) evolve from chat assistants to long-horizon agents capable of multi-step reasoning and tool use, existing benchmarks remain largely confined to structured or exam-style tasks that fall short of real-world professional demands. To this end, we introduce $OneMillion-Bench OneM…
🟢 NLE: Non-autoregressive LLM-based ASR by Transcript Editing (score 5)
Sources: huggingface
While autoregressive (AR) LLM-based ASR systems achieve strong accuracy, their sequential decoding limits parallelism and incurs high latency. We propose NLE, a non-autoregressive (NAR) approach that formulates speech recognition as conditional transcript editing, enabling fully parallel prediction.
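Framing recognition as transcript editing means predicting one edit label per position of a draft transcript. Because each label is applied independently, all positions can be handled in a single parallel step. The sketch below only applies such labels. The label set (`KEEP`, `DELETE`, `("REPLACE", token)`) is a hypothetical simplification, not NLE's actual edit vocabulary, and it cannot insert new words.

```python
from typing import List, Tuple, Union

Edit = Union[str, Tuple[str, str]]  # "KEEP", "DELETE", or ("REPLACE", new_token)


def apply_edits(draft: List[str], edits: List[Edit]) -> List[str]:
    """Apply one edit label per draft token; each position is independent of the others."""
    if len(draft) != len(edits):
        raise ValueError("expected exactly one edit label per draft token")
    out: List[str] = []
    for token, op in zip(draft, edits):
        if op == "KEEP":
            out.append(token)
        elif op == "DELETE":
            continue
        elif isinstance(op, tuple) and op[0] == "REPLACE":
            out.append(op[1])
        else:
            raise ValueError(f"unknown edit label: {op!r}")
    return out


draft = ["the", "cat", "sad", "sad", "on", "the", "mat"]
labels = ["KEEP", "KEEP", ("REPLACE", "sat"), "DELETE", "KEEP", "KEEP", "KEEP"]
print(" ".join(apply_edits(draft, labels)))  # the cat sat on the mat
```

An autoregressive decoder would instead emit the corrected transcript one token at a time, with each step waiting on the previous one.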
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Lost in Stories: Consistency Bugs in Long Story Generation by LLMs | developer_tool | 98 | Open |
| Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence | developer_tool | 91 | Open |
| LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory | developer_tool | 69 | Open |
| How Far Can Unsupervised RLVR Scale LLM Training? | model_release | 63 | Open |
| Believe Your Model: Distribution-Guided Confidence Calibration | developer_tool | 44 | Open |
| WS-Net: Weak-Signal Representation Learning and Gated Abundance Reconstruction for Hyperspectral Unmixing via State-Space and Weak Signal Attention Fusion | cs.AI | 0 | Open |
| The Epistemic Support-Point Filter: Jaynesian Maximum Entropy Meets Popperian Falsification | cs.AI | 0 | Open |
| Time, Identity and Consciousness in Language Model Agents | cs.AI | 0 | Open |
| Quantifying Gender Bias in Large Language Models: When ChatGPT Becomes a Hiring Manager | cs.AI | 0 | Open |
| EPOCH: An Agentic Protocol for Multi-Round System Optimization | cs.AI | 0 | Open |
| From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring | cs.AI | 0 | Open |
| Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation | cs.AI | 0 | Open |
| From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express | cs.AI | 0 | Open |
| A Text-Native Interface for Generative Video Authoring | cs.AI | 0 | Open |
| GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models | cs.AI | 0 | Open |