🔴 High Significance

Model Releases

🔴 WorldMark: A Unified Benchmark Suite for Interactive Video World Models - score 85. Sources: huggingface

Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as tra

🔴 UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling - score 75. Sources: huggingface

Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, bridging the cross-embodiment chasm remains a fundamental challenge due to kinematic mismatches. We introduce UniT (Unified Latent Action Tokenizer v

Developer Tools

🔴 LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics - score 95. Sources: huggingface

Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Model

🟡 Notable

Developer Tools

🟡 StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition - score 65. Sources: huggingface

Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under styliza

🟡 Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks - score 55. Sources: huggingface

Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision-making under delayed rewards and partial observability. Games are a good testbed f

🟡 Seeing Fast and Slow: Learning the Flow of Time in Videos - score 45. Sources: huggingface

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a l

🟢 Incremental

Model Releases

🟢 VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation - score 35. Sources: huggingface

Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around t

Developer Tools

🟢 TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale - score 25. Sources: huggingface

Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring,

🟢 EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model - score 15. Sources: huggingface

We propose EditCrafter, a high-resolution image editing method that operates without tuning, leveraging pretrained text-to-image (T2I) diffusion models to process images at resolutions significantly exceeding those used during training. Leveraging the generative priors of large-scale T2I diffusion m

🟢 Context Unrolling in Omni Models - score 5. Sources: huggingface

We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing

📄 New Papers

| Title | Category | Score | Link |
| --- | --- | --- | --- |
| LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics | developer_tool | 87 | Open |
| WorldMark: A Unified Benchmark Suite for Interactive Video World Models | model_release | 39 | Open |
| UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling | model_release | 32 | Open |
| StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition | developer_tool | 28 | Open |
| Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks | developer_tool | 23 | Open |
| When AI Speaks, Whose Values Does It Express? A Cross-Cultural Audit of Individualism-Collectivism Bias in Large Language Models | cs.AI | 0 | Open |
| Reliable Self-Harm Risk Screening via Adaptive Multi-Agent LLM Systems | cs.AI | 0 | Open |
| PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store | cs.AI | 0 | Open |
| GenMatter: Perceiving Physical Objects with Generative Matter Models | cs.AI | 0 | Open |
| Estimating Tail Risks in Language Model Output Distributions | cs.AI | 0 | Open |
| ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation | cs.AI | 0 | Open |
| Beyond Single-Agent Alignment: Preventing Context-Fragmented Violations in Multi-Agent Systems | cs.AI | 0 | Open |
| ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression | cs.AI | 0 | Open |
| From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification | cs.AI | 0 | Open |
| MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches | cs.AI | 0 | Open |