🔴 High Significance
Model Releases
🔴 SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale (score 75)
Sources: huggingface
Software engineering agents (SWE) are improving rapidly, with recent gains largely driven by reinforcement learning (RL). However, RL training is constrained by the scarcity of large-scale task collections with reproducible execution environments and reliable test suites. Although a growing number o
Developer Tools
🔴 OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens (score 95)
Sources: huggingface
OmniLottie is a versatile framework that generates high-quality vector animations from multi-modal instructions. For flexible motion and visual content control, we focus on Lottie, a lightweight JSON format for representing both shapes and animation behaviors. However, the raw Lottie JSON fil
🔴 From Scale to Speed: Adaptive Test-Time Scaling for Image Editing (score 85)
Sources: huggingface
Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by extending inference time. Most Image-CoT methods focus on text-to-image (T2I) generation. Unlike T2I generation, image editing is goal-directed: the solution space is constrained by the source image
🟡 Notable
Model Releases
🟡 RubricBench: Aligning Model-Generated Rubrics with Human Standards (score 65)
Sources: huggingface
As Large Language Model (LLM) alignment evolves from simple completions to complex, highly sophisticated generation, Reward Models are increasingly shifting toward rubric-guided evaluation to mitigate surface-level biases. However, the community lacks a unified benchmark to assess this evaluation pa
🟡 CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning (score 55)
Sources: huggingface
Large Language Models (LLMs) have recently exhibited remarkable reasoning capabilities, largely enabled by supervised fine-tuning (SFT)- and reinforcement learning (RL)-based post-training on high-quality reasoning data. However, reproducing and extending these capabilities in open and scalable sett
🟡 GPT-5.3 Instant System Card (score 50)
Sources: lab_blog/OpenAI
🟡 GPT-5.3 Instant: Smoother, more useful everyday conversations (score 50)
Sources: lab_blog/OpenAI
🟡 Gemini 3.1 Flash-Lite: Built for intelligence at scale (score 50)
Sources: lab_blog/DeepMind
Gemini 3.1 Flash-Lite is our fastest and most cost-efficient Gemini 3 series model yet.
Developer Tools
🟡 OpenAutoNLU: Open Source AutoML Library for NLU (score 45)
Sources: huggingface
OpenAutoNLU is an open-source automated machine learning library for natural language understanding (NLU) tasks, covering both text classification and named entity recognition (NER). Unlike existing solutions, we introduce data-aware training regime selection that requires no manual configuration fr
🟢 Incremental
Model Releases
🟢 MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning (score 35)
Sources: huggingface
Recent progress in the reasoning capabilities of multimodal large language models (MLLMs) has empowered them to address more complex tasks such as scientific analysis and mathematical reasoning. Despite their promise, MLLMs' reasoning abilities across different scenarios in real life remain largely
Developer Tools
🟢 VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection (score 25)
Sources: huggingface
Current multi-view indoor 3D object detectors rely on sensor geometry that is costly to obtain (i.e., precisely calibrated multi-view camera poses) to fuse multi-view information into a global scene representation, limiting deployment in real-world scenes. We target a more practical setting: Sensor-
🟢 CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification (score 5)
Sources: huggingface
Developing multi-turn interactive tool-use agents is challenging because real-world user needs are often complex and ambiguous, yet agents must execute deterministic actions to satisfy them. To address this gap, we introduce CoVe (Constraint-Verification), a post-training data synthesis framework de
Infrastructure & Compute
🟢 CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction (score 15)
Sources: huggingface
While music generation models have evolved to handle complex multimodal inputs mixing text, lyrics, and reference audio, evaluation mechanisms have lagged behind. In this paper, we bridge this critical gap by establishing a comprehensive ecosystem for music reward modeling under Compositional Multim
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens | developer_tool | 156 | Open |
| From Scale to Speed: Adaptive Test-Time Scaling for Image Editing | developer_tool | 143 | Open |
| SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale | model_release | 91 | Open |
| RubricBench: Aligning Model-Generated Rubrics with Human Standards | model_release | 67 | Open |
| CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning | model_release | 59 | Open |
| PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference | cs.AI | 0 | Open |
| What Capable Agents Must Know: Selection Theorems for Robust Decision-Making under Uncertainty | cs.AI | 0 | Open |
| Form Follows Function: Recursive Stem Model | cs.AI | 0 | Open |
| Revealing Positive and Negative Role Models to Help People Make Good Decisions | cs.AI | 0 | Open |
| NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect | cs.AI | 0 | Open |
| Learning Object-Centric Spatial Reasoning for Sequential Manipulation in Cluttered Environments | cs.AI | 0 | Open |
| Human-Certified Module Repositories for the AI Age | cs.AI | 0 | Open |
| LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model | cs.AI | 0 | Open |
| Bridging Diffusion Guidance and Anderson Acceleration via Hopfield Dynamics | cs.AI | 0 | Open |
| A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities | cs.AI | 0 | Open |