🔴 High Significance

Model Releases

🔴 SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing - score 85 | Sources: huggingface

Current instruction-guided video editing models struggle to simultaneously balance precise semantic modifications with faithful motion preservation. While existing approaches rely on injecting explicit external priors (e.g., VLM features or structural conditions) to mitigate these issues, this relia

🔴 Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation - score 75 | Sources: huggingface

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight

Developer Tools

🔴 Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding - score 95 | Sources: huggingface

While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which a

🟡 Notable

Model Releases

🟡 FASTER: Rethinking Real-Time Flow VLAs - score 55 | Sources: huggingface

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in act

Developer Tools

🟡 Memento-Skills: Let Agents Design Agents - score 55 | Sources: huggingface

We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with stateful

🟡 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model - score 55 | Sources: huggingface

Creating dynamic, view-consistent videos of customized subjects is highly sought after for a wide range of emerging applications, including immersive VR/AR, virtual production, and next-generation e-commerce. However, despite rapid progress in subject-driven video generation, existing methods predom

🟢 Incremental

Model Releases

🟢 F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World - score 5 | Sources: huggingface

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on p

Developer Tools

🟢 Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer - score 35 | Sources: huggingface

Prior motion generation largely follows two paradigms: continuous diffusion models that excel at kinematic control, and discrete token-based generators that are effective for semantic conditioning. To combine their strengths, we propose a three-stage framework comprising condition feature extraction

🟢 MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction - score 25 | Sources: huggingface

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression uns

🟢 Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens - score 15 | Sources: huggingface

Visual generation with discrete tokens has gained significant attention as it enables a unified token prediction paradigm shared with language models, promising seamless multimodal architectures. However, current discrete generation methods remain limited to low-dimensional latent tokens (typically

📄 New Papers

Title | Category | Score | Link
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding | developer_tool | 100 | Open
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing | model_release | 72 | Open
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation | model_release | 68 | Open
FASTER: Rethinking Real-Time Flow VLAs | model_release | 61 | Open
Memento-Skills: Let Agents Design Agents | developer_tool | 61 | Open
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model | developer_tool | 61 | Open
Leveraging Machine Learning Techniques to Investigate Media and Information Literacy Competence in Tackling Disinformation | cs.AI | 0 | Open
FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Generic Drug Assessment | cs.AI | 0 | Open
When Agents Disagree: The Selection Bottleneck in Multi-Agent LLM Pipelines | cs.AI | 0 | Open
Subspace Kernel Learning on Tensor Sequences | cs.AI | 0 | Open
AI in Work-Based Learning: Understanding the Purposes and Effects of Intelligent Tools Among Student Interns | cs.AI | 0 | Open
Plagiarism or Productivity? Students Moral Disengagement and Behavioral Intentions to Use ChatGPT in Academic Writing | cs.AI | 0 | Open
Probing the Latent World: Emergent Discrete Symbols and Physical Structure in Latent Representations | cs.AI | 0 | Open
Optimal Scalar Quantization for Matrix Multiplication: Closed-Form Density and Phase Transition | cs.AI | 0 | Open
Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search | cs.AI | 0 | Open