🔴 High Significance
Model Releases
🔴 Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters - score 95
Sources: huggingface
We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with
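As a rough illustration of what "sparse" means here, the sketch below computes the share of parameters touched per token. It assumes the 11B figure from the title is the per-token active count and the 196B figure is the total parameter count; the truncated summary does not confirm either reading, and `active_fraction` is a hypothetical helper, not part of any released code.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Return the fraction of parameters active per token in a sparse MoE model.

    Both arguments are in billions of parameters. The values used below are
    ASSUMED from the digest entry (11B active, 196B total), not verified.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active_params_b must lie in [0, total_params_b]")
    return active_params_b / total_params_b


# Assumed figures: 11B active (from the title), 196B total (from the summary).
frac = active_fraction(11, 196)
print(f"Active share per token: {frac:.1%}")  # prints "Active share per token: 5.6%"
```

Under these assumptions, roughly one parameter in eighteen participates in each forward pass, which is where the efficiency claim would come from.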
🔴 GENIUS: Generative Fluid Intelligence Evaluation Suite - score 75
Sources: huggingface
Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess Crystallized Intelligence, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks Generative Fluid Intelligence (GFI): the capacity t
Developer Tools
🔴 VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval - score 85
Sources: huggingface
Recent studies have adapted generative Multimodal Large Language Models (MLLMs) into embedding extractors for vision tasks, typically through fine-tuning to produce universal representations. However, their performance on video remains inferior to Video Foundation Models (VFMs). In this paper, we fo
🟡 Notable
Model Releases
🟡 ASA: Training-Free Representation Engineering for Tool-Calling Agents - score 55
Sources: huggingface
Adapting LLM agents to domain-specific tool calling remains notably brittle under evolving interfaces. Prompt and schema engineering is easy to deploy but often fragile under distribution shift and strict parsers, while continual parameter-efficient fine-tuning improves reliability at the cost of tr
🟡 Introducing GPT-5.3-Codex-Spark - score 50
Sources: lab_blog/OpenAI
Introducing GPT-5.3-Codex-Spark, our first real-time coding model. 15x faster generation, 128k context, now in research preview for ChatGPT Pro users.
🟡 Gemini 3 Deep Think: Advancing science, research and engineering - score 50
Sources: lab_blog/DeepMind
Our most specialized reasoning mode is now updated to solve modern science, research and engineering challenges.
🟡 Towards Autonomous Mathematics Research - score 45
Sources: huggingface
Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requires navigating vast literature and constructing l
Developer Tools
🟡 PhyCritic: Multimodal Critic Models for Physical AI - score 65
Sources: huggingface
With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferences, numerical scores, and explanatory justifications for assessing model-generated responses. However, existin
🟢 Incremental
Model Releases
🟢 TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions - score 10
Sources: huggingface
This paper proposes Omni Dense Captioning, a novel task designed to generate continuous, fine-grained, and structured audio-visual narratives with explicit timestamps. To ensure dense semantic coverage, we introduce a six-dimensional structural schema to create "script-like" captions, enabling reade
Developer Tools
🟢 When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning - score 35
Sources: huggingface
While reasoning over long context is crucial for various real-world applications, it remains challenging for large language models (LLMs) as they suffer from performance degradation as the context length grows. Recent work MemAgent has tried to tackle this by processing context chunk-by-chunk in an
🟢 How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning - score 25
Sources: huggingface
Decoder-only large language models are increasingly used as behavioral encoders for user representation learning, yet the impact of attention masking on the quality of user embeddings remains underexplored. In this work, we conduct a systematic study of causal, hybrid, and bidirectional attention ma
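For readers unfamiliar with the three masking regimes named above, here is a minimal sketch of what such masks can look like. The causal and bidirectional masks are standard; the "hybrid" variant shown is a prefix-LM style mask chosen purely as an assumption, since the truncated summary does not say how the paper defines it. All function names are hypothetical.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[bool]]:
    """Every token may attend to every other token."""
    return [[True] * n for _ in range(n)]


def hybrid_mask(n: int, prefix_len: int) -> list[list[bool]]:
    """ASSUMED hybrid: the first prefix_len tokens attend to each other
    bidirectionally; all remaining attention is causal (prefix-LM style).
    This is an illustrative choice, not the paper's definition."""
    return [
        [j <= i or (i < prefix_len and j < prefix_len) for j in range(n)]
        for i in range(n)
    ]


# Show the assumed hybrid mask for 4 tokens with a 2-token bidirectional prefix.
for row in hybrid_mask(4, 2):
    print("".join("1" if allowed else "0" for allowed in row))
```

With `prefix_len=0` this hybrid reduces to the causal mask, and with `prefix_len=n` it becomes fully bidirectional, which makes it a convenient way to interpolate between the two extremes.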
🟢 G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design - score 10
Sources: huggingface
While Large Language Models (LLMs) have recently shown promise in Automated Heuristic Design (AHD), existing approaches typically formulate AHD around constructive priority rules or parameterized local search guidance, thereby restricting the search space to fixed heuristic forms. Such designs offer
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters | model_release | 202 | Open |
| VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval | developer_tool | 126 | Open |
| GENIUS: Generative Fluid Intelligence Evaluation Suite | model_release | 58 | Open |
| PhyCritic: Multimodal Critic Models for Physical AI | developer_tool | 57 | Open |
| ASA: Training-Free Representation Engineering for Tool-Calling Agents | model_release | 44 | Open |
| From Noise to Order: Learning to Rank via Denoising Diffusion | cs.AI | 0 | Open |
| Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning | cs.AI | 0 | Open |
| EM-Aware Physical Synthesis: Neural Inductor Modeling and Intelligent Placement & Routing for RF Circuits | cs.AI | 0 | Open |
| Compiler-Guided Inference-Time Adaptation: Improving GPT-5 Programming Performance in Idris | cs.AI | 0 | Open |
| Understanding Persuasive Interactions between Generative Social Agents and Humans: The Knowledge-based Persuasion Model (KPM) | cs.AI | 0 | Open |
| IMAGAgent: Orchestrating Multi-Turn Image Editing via Constraint-Aware Planning and Reflection | cs.AI | 0 | Open |
| RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis | cs.AI | 0 | Open |
| Multimodal Fact-Level Attribution for Verifiable Reasoning | cs.AI | 0 | Open |
| AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems | cs.AI | 0 | Open |
| Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt | cs.AI | 0 | Open |