🔴 High Significance
Model Releases
🔴 WorldMark: A Unified Benchmark Suite for Interactive Video World Models (score 85)
Sources: huggingface
Interactive video generation models such as Genie, YUME, HY-World, and Matrix-Game are advancing rapidly, yet every model is evaluated on its own benchmark with private scenes and trajectories, making fair cross-model comparison impossible. Existing public benchmarks offer useful metrics such as tra…
🔴 UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling (score 75)
Sources: huggingface
Scaling humanoid foundation models is bottlenecked by the scarcity of robotic data. While massive egocentric human data offers a scalable alternative, bridging the cross-embodiment chasm remains a fundamental challenge due to kinematic mismatches. We introduce UniT (Unified Latent Action Tokenizer v…
Developer Tools
🔴 LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics (score 95)
Sources: huggingface
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Model…
🟡 Notable
Developer Tools
🟡 StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition (score 65)
Sources: huggingface
Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under styliza…
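The StyleID excerpt does not describe its metric, so the following is only a generic sketch of how identity encoders are conventionally scored: embed two faces, take the cosine similarity of the embeddings, and accept a match above a threshold. The 4-dimensional embeddings and the 0.5 threshold are made-up assumptions, and the cartoon vector is deliberately chosen to drift away from the photo to mimic the brittleness the abstract describes.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def same_identity(a: list[float], b: list[float], threshold: float = 0.5) -> bool:
    # Assumed threshold; in practice it would be calibrated on a validation set.
    return cosine_similarity(a, b) >= threshold


# Toy embeddings standing in for a hypothetical encoder's output.
photo = [0.9, 0.1, 0.3, 0.2]
photo_2 = [0.8, 0.15, 0.35, 0.25]  # second photo of the same person
cartoon = [0.2, 0.9, 0.1, 0.4]     # same person, stylized: the embedding drifts

print(same_identity(photo, photo_2))  # True  (similarity ~0.99)
print(same_identity(photo, cartoon))  # False (similarity ~0.39)
```

In this toy setup, a stylization-agnostic encoder would be one that keeps the cartoon's similarity above the threshold as well.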
🟡 Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks (score 55)
Sources: huggingface
Long-horizon interactive environments are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision-making under delayed rewards and partial observability. Games are a good testbed f…
🟡 Seeing Fast and Slow: Learning the Flow of Time in Videos (score 45)
Sources: huggingface
How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a l…
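The abstract is cut off before the method, so the sketch below is not the paper's approach. It shows one widely used way to obtain speed labels for free: keep every k-th frame of a normal-speed video, which then plays back k times faster, and use k as the training label. The helper names and the factor set (1, 2, 4) are assumptions; frames can be any Python objects.

```python
import random


def speed_up(frames: list, factor: int, clip_len: int, start: int = 0) -> list:
    """Return clip_len frames taken every `factor` frames, beginning at `start`."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    last = start + (clip_len - 1) * factor
    if last >= len(frames):
        raise ValueError("video too short for this factor and clip length")
    return frames[start : last + 1 : factor]


def make_training_pair(frames: list, factors=(1, 2, 4), clip_len: int = 8, rng=random):
    """Hypothetical helper: sample a speed factor and return (clip, label)."""
    factor = rng.choice(factors)
    max_start = len(frames) - 1 - (clip_len - 1) * factor
    start = rng.randint(0, max_start)  # raises ValueError if the video is too short
    return speed_up(frames, factor, clip_len, start), factor


video = list(range(100))  # stand-in for 100 decoded frames
print(speed_up(video, factor=4, clip_len=8))  # [0, 4, 8, 12, 16, 20, 24, 28]
clip, label = make_training_pair(video, rng=random.Random(0))
```

A classifier trained on such (clip, label) pairs learns to recognize playback speed without any manual annotation.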
🟢 Incremental
Model Releases
🟢 VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation (score 35)
Sources: huggingface
Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same failing actions without recovery. We present VLAA-GUI, a modular GUI agentic framework built around t…
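VLAA-GUI's own recovery logic is not described in this excerpt. As a hypothetical illustration of the repetitive-loop failure mode, the sketch below flags an agent as stuck when the same action has failed a set number of times within a sliding window of recent steps. The class name, window size, and repeat limit are all assumptions.

```python
from collections import deque


class LoopDetector:
    """Flags an agent as looping when one action keeps failing in recent steps.

    Hypothetical helper: `window` and `max_repeats` are illustrative defaults.
    """

    def __init__(self, window: int = 6, max_repeats: int = 3):
        self.history = deque(maxlen=window)  # most recent (action, succeeded) pairs
        self.max_repeats = max_repeats

    def record(self, action: str, succeeded: bool) -> bool:
        """Log one step; return True if this action has now failed too often."""
        self.history.append((action, succeeded))
        failures = sum(1 for a, ok in self.history if a == action and not ok)
        return failures >= self.max_repeats


detector = LoopDetector()
print(detector.record("click #submit", False))  # False
print(detector.record("scroll down", True))     # False
print(detector.record("click #submit", False))  # False
print(detector.record("click #submit", False))  # True: third failure in the window
```

When `record` returns True, an agent could switch to a recovery or search strategy instead of retrying the same action.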
Developer Tools
🟢 TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale (score 25)
Sources: huggingface
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, …
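The excerpt stops before describing TingIS's pipeline. A minimal, hypothetical sketch of the general idea of discovering risks from incident volume: count customer incidents per time window and flag a window whose count exceeds a multiple of the recent average. The ratio, history length, and minimum count are assumed values, and real, noisy incident streams would likely need deduplication and clustering before any counting.

```python
from collections import deque


class BurstDetector:
    """Flags a window whose incident count exceeds `ratio` x the recent average.

    Hypothetical helper; all thresholds are illustrative assumptions.
    """

    def __init__(self, history_windows: int = 12, ratio: float = 3.0, min_count: int = 5):
        self.past = deque(maxlen=history_windows)  # counts from previous windows
        self.ratio = ratio
        self.min_count = min_count  # ignore tiny absolute counts

    def close_window(self, count: int) -> bool:
        """Call once per time window with that window's incident count."""
        if not self.past:  # no baseline yet, just learn
            self.past.append(count)
            return False
        baseline = sum(self.past) / len(self.past)
        self.past.append(count)
        return count >= self.min_count and count > self.ratio * max(baseline, 1.0)


detector = BurstDetector()
for c in [2, 3, 2, 1, 2, 3]:
    detector.close_window(c)       # quiet period builds the baseline
print(detector.close_window(20))  # True: 20 > 3.0 * average of the previous 6 windows
```

Comparing against a rolling average rather than a fixed threshold lets the same detector adapt to services with very different normal incident volumes.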
🟢 EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model (score 15)
Sources: huggingface
We propose EditCrafter, a high-resolution image editing method that operates without tuning, leveraging pretrained text-to-image (T2I) diffusion models to process images at resolutions significantly exceeding those used during training. Leveraging the generative priors of large-scale T2I diffusion m…
🟢 Context Unrolling in Omni Models (score 5)
Sources: huggingface
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing…
New Papers
| Title | Category | Score |
|---|---|---|
| LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics | developer_tool | 87 |
| WorldMark: A Unified Benchmark Suite for Interactive Video World Models | model_release | 39 |
| UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling | model_release | 32 |
| StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition | developer_tool | 28 |
| Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks | developer_tool | 23 |
| When AI Speaks, Whose Values Does It Express? A Cross-Cultural Audit of Individualism-Collectivism Bias in Large Language Models | cs.AI | 0 |
| Reliable Self-Harm Risk Screening via Adaptive Multi-Agent LLM Systems | cs.AI | 0 |
| PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store | cs.AI | 0 |
| GenMatter: Perceiving Physical Objects with Generative Matter Models | cs.AI | 0 |
| Estimating Tail Risks in Language Model Output Distributions | cs.AI | 0 |
| ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation | cs.AI | 0 |
| Beyond Single-Agent Alignment: Preventing Context-Fragmented Violations in Multi-Agent Systems | cs.AI | 0 |
| ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression | cs.AI | 0 |
| From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification | cs.AI | 0 |
| MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches | cs.AI | 0 |