🔴 High Significance
Model Releases
🔴 MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier - score 85
Sources: huggingface
While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, P(hypothesis|background) (P(h|b)), unexplored. We demonstrate that directly training P(h|b) is
Developer Tools
🔴 SkillNet: Create, Evaluate, and Connect AI Skills - score 95
Sources: huggingface
Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic accumulation and transfer of skills. Without a unified mechanism for skill consolidation, agents frequently "reinvent the wheel", rediscovering solutions in
🔴 DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval - score 75
Sources: huggingface
Large Language Model (LLM) agents can automate data-science workflows, but many rigorous statistical methods implemented in R remain underused because LLMs struggle with statistical knowledge and tool retrieval. Existing retrieval-augmented approaches focus on function-level semantics and ignore dat
🟡 Notable
Model Releases
🟡 AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios - score 65
Sources: huggingface
Real-world multimodal agents solve multi-step workflows grounded in visual evidence. For example, an agent can troubleshoot a device by linking a wiring photo to a schematic and validating the fix with online documentation, or plan a trip by interpreting a transit map and checking schedules under ro
Developer Tools
🟡 RoboPocket: Improve Robot Policies Instantly with Your Phone - score 55
Sources: huggingface
Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing th
🟡 Codex Security: now in research preview - score 50
Sources: lab_blog/OpenAI
Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.
🟡 How Descript engineers multilingual video dubbing at scale - score 50
Sources: lab_blog/OpenAI
Using OpenAI reasoning models, Descript unlocked automatic localization of large content libraries without losing timing or meaning.
🟡 How Balyasny Asset Management built an AI research engine - score 50
Sources: lab_blog/OpenAI
By combining rigorous model evaluation, full-platform use of OpenAI, and agent workflows, Balyasny is reinventing investment research.
🟡 MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models - score 45
Sources: huggingface
Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and ide
🟢 Incremental
Model Releases
🟢 Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling - score 5
Sources: huggingface
We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial
Developer Tools
🟢 HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images - score 35
Sources: huggingface
Human-product images, which showcase the integration of humans and products, play a vital role in advertising, e-commerce, and digital marketing. The essential challenge of generating such images lies in ensuring the high-fidelity preservation of product details. Among existing paradigms, reference-
🟢 Interactive Benchmarks - score 25
Sources: huggingface
Standard benchmarks have become increasingly unreliable due to saturation, subjectivity, and poor generalization. We argue that evaluating a model's ability to actively acquire information is important for assessing its intelligence. We propose Interactive Benchmarks, a unified evaluation paradigm tha
🟢 Large Multimodal Models as General In-Context Classifiers - score 15
Sources: huggingface
Which multimodal model should we use for classification? Previous studies suggest that the answer lies in CLIP-like contrastive Vision-Language Models (VLMs), due to their remarkable performance in zero-shot classification. In contrast, Large Multimodal Models (LMMs) are more suitable for complex tas
New Papers
| Title | Category | Score | Link |
|---|---|---|---|
| SkillNet: Create, Evaluate, and Connect AI Skills | developer_tool | 99 | Open |
| MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier | model_release | 95 | Open |
| DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval | developer_tool | 56 | Open |
| AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios | model_release | 47 | Open |
| RoboPocket: Improve Robot Policies Instantly with Your Phone | developer_tool | 39 | Open |
| Bridging Domains through Subspace-Aware Model Merging | cs.AI | 0 | Open |
| Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads | cs.AI | 0 | Open |
| Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models | cs.AI | 0 | Open |
| PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models | cs.AI | 0 | Open |
| Balancing Domestic and Global Perspectives: Evaluating Dual-Calibration and LLM-Generated Nudges for Diverse News Recommendation | cs.AI | 0 | Open |
| Visual Words Meet BM25: Sparse Auto-Encoder Visual Word Scoring for Image Retrieval | cs.AI | 0 | Open |
| Proof-of-Guardrail in AI Agents and What (Not) to Trust from It | cs.AI | 0 | Open |
| ProtAlign: Contrastive learning paradigm for Sequence and structure alignment | cs.AI | 0 | Open |
| AWPD: Frequency Shield Network for Agnostic Watermark Presence Detection | cs.AI | 0 | Open |
| Bi Directional Feedback Fusion for Activity Aware Forecasting of Indoor CO2 and PM2.5 | cs.AI | 0 | Open |