🔴 High Significance
Model Releases
🔴 SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing - score 85
Sources: huggingface
Current instruction-guided video editing models struggle to simultaneously balance precise semantic modifications with faithful motion preservation. While existing approaches rely on injecting explicit external priors (e.g., VLM features or structural conditions) to mitigate these issues, this relia
🔴 Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation - score 75
Sources: huggingface
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight
Developer Tools
🔴 Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding - score 95
Sources: huggingface
While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which a
🟡 Notable
Model Releases
🟡 FASTER: Rethinking Real-Time Flow VLAs - score 55
Sources: huggingface
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in act
Developer Tools
🟡 Memento-Skills: Let Agents Design Agents - score 55
Sources: huggingface
We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with stateful
🟡 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model - score 55
Sources: huggingface
Creating dynamic, view-consistent videos of customized subjects is highly sought after for a wide range of emerging applications, including immersive VR/AR, virtual production, and next-generation e-commerce. However, despite rapid progress in subject-driven video generation, existing methods predom
🟢 Incremental
Model Releases
🟢 F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World - score 5
Sources: huggingface
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on p
Developer Tools
🟢 Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer - score 35
Sources: huggingface
Prior motion generation largely follows two paradigms: continuous diffusion models that excel at kinematic control, and discrete token-based generators that are effective for semantic conditioning. To combine their strengths, we propose a three-stage framework comprising condition feature extraction
🟢 MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction - score 25
Sources: huggingface
Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression uns
🟢 Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens - score 15
Sources: huggingface
Visual generation with discrete tokens has gained significant attention as it enables a unified token prediction paradigm shared with language models, promising seamless multimodal architectures. However, current discrete generation methods remain limited to low-dimensional latent tokens (typically
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding | developer_tool | 100 | Open |
| SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing | model_release | 72 | Open |
| Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation | model_release | 68 | Open |
| FASTER: Rethinking Real-Time Flow VLAs | model_release | 61 | Open |
| Memento-Skills: Let Agents Design Agents | developer_tool | 61 | Open |
| 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model | developer_tool | 61 | Open |
| Leveraging Machine Learning Techniques to Investigate Media and Information Literacy Competence in Tackling Disinformation | cs.AI | 0 | Open |
| FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Generic Drug Assessment | cs.AI | 0 | Open |
| When Agents Disagree: The Selection Bottleneck in Multi-Agent LLM Pipelines | cs.AI | 0 | Open |
| Subspace Kernel Learning on Tensor Sequences | cs.AI | 0 | Open |
| AI in Work-Based Learning: Understanding the Purposes and Effects of Intelligent Tools Among Student Interns | cs.AI | 0 | Open |
| Plagiarism or Productivity? Students Moral Disengagement and Behavioral Intentions to Use ChatGPT in Academic Writing | cs.AI | 0 | Open |
| Probing the Latent World: Emergent Discrete Symbols and Physical Structure in Latent Representations | cs.AI | 0 | Open |
| Optimal Scalar Quantization for Matrix Multiplication: Closed-Form Density and Phase Transition | cs.AI | 0 | Open |
| Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search | cs.AI | 0 | Open |