🔴 High Significance

Model Releases

🔴 Browse CVPR 2026 papers on PapersWithCode [P] - score 81 Sources: reddit/r/MachineLearning

https://preview.redd.it/se5nr2z7tt4h1.png?width=3046&format=png&auto=webp&s=7db15b73afb749da236e5bb50ff96372f6a3239b Hi, Niels here from the open-source team at Hugging Face. It's been 2 weeks since I [launched](https://www.reddit.com/r/MachineLearning/comments/1tgmwqr/reviving_paperswit

Developer Tools

🔴 What's the hardest part of operating AI agents at scale? - score 83 Sources: reddit/r/AIAgents

Building agents seems to be getting easier thanks to frameworks and tooling, but operating them in production still feels like an open question. When something goes wrong, how do teams investigate it? How do you track tool usage, audit decisions, and understand why an agent took a particular action?

🔴 RTX Spark does not have 600GB/s Bandwidth - score 79 Sources: reddit/r/LocalLLaMA

Check the slides from Computex. Every outlet that reported 600GB/s is completely wrong. That is the NVLink speed, as everyone here said.

🔴 AI Agent Guidelines for CS336 at Stanford - score 79 Sources: hackernews

🔴 VibeCoding is becoming the biggest illusion in software engineering. - score 74 Sources: reddit/r/AIAgents

People are celebrating: "I built a SaaS app in 4 hours." "AI replaced my backend team." "Production-ready with one prompt." But almost nobody shows what happens 3 months later. That's where the real engineering starts. The problem with vibe coding is simple: It optimizes for speed of cr

🔴 What LLM eval tools are people actually using in production? - score 72 Sources: reddit/r/AIAgents

I've reached the point where manually checking outputs doesn't really scale anymore. Started looking at different evaluation tools, but honestly it's hard to tell which ones people are genuinely using versus which ones just look good in demos. For teams running LLM apps or agents in production: What

Infrastructure & Compute

🔴 NVIDIA GB300 Grace Blackwell Ultra pricetags - score 71 Sources: reddit/r/LocalLLaMA

https://www.scan.co.uk/shop/ai-and-robotics/workstations-ai/nvidia-dgx-station

Research Papers

🔴 LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation - score 75 Sources: huggingface

Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible generation trajectory:
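To make the sliding-window point concrete, here is a minimal sketch of a causal sliding-window attention mask. It is an illustration only, not LongLive-RAG's implementation; the function name `sliding_window_mask` and the parameters `seq_len` and `window` are hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query step i may attend to key step j.

    Causal: a step never sees the future (j <= i).
    Windowed: a step only sees the most recent `window` steps (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    # "x" marks a visible step, "." marks a step outside the window.
    print("".join("x" if visible else "." for visible in row))
```

With `window=3`, step 5 can attend only to steps 3, 4 and 5. Anything generated at steps 0 to 2 is no longer visible, which is one way to read the "irreversible trajectory" the abstract describes.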

Other Signals

🔴 Stop asking what model to run. There are literally only two. - score 96 Sources: reddit/r/LocalLLaMA

Can we please ban the daily "I have an RTX 3060, what should I run?" slop threads? It's not complicated. As of right now, Hugging Face is empty and exactly two local models exist on this entire planet: * Qwen 3.6 35b a3b * Qwen 3.6 27b That is the entire list. Your specs don't matter. Your u

🔴 CS336: Language Modeling from Scratch - score 93 Sources: hackernews

🔴 I trusted random person on this subreddit and bought 3080 20gb made of chinesium - score 88 Sources: reddit/r/LocalLLaMA

I don't know how long it will last, but it works, and I want 2 more now.

🟡 Notable

Model Releases

🟡 @xai: Composer 2.5 is now available inside Grok Build. Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions. - score 60 Sources: twitter_rss

Developer Tools

🟡 How Bad MCP design cost your Agent 5× more tokens - score 56 Sources: reddit/r/AIAgents

MCP is the gold standard for LLM agent tools, but the quality of MCP tool design can dramatically affect the agent's token and context-window consumption. I recently ran some experiments on two MCP implementations with identical functionality, and found that one of them has really bad performan

🟡 OpenAI frontier models and Codex are now available on AWS - score 50 Sources: hackernews

🟡 Codex is becoming a productivity tool for everyone - score 50 Sources: lab_blog/OpenAI

The Next Era of Knowledge Work report explores how Codex is transforming productivity through AI-powered research, data analysis, workflow automation, and content creation.

🟡 Our views on AI policy and political advocacy - score 50 Sources: lab_blog/OpenAI

Our approach to AI policy and political advocacy: transparency, support for thoughtful regulation and AI safety, and the principle that no outside political group speaks on the company's behalf.

🟡 @AnthropicAI: Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial pu - score 50 Sources: twitter_rss

Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial public offering. Read more: https://www.anthropic.com/news/confidential-draft-s1-sec

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Finetuning a Reasoning LLM with Supervised or Reinforcement Learning? [D] - score 69 Sources: reddit/r/MachineLearning

Hello, I have a task to fine-tune small LLMs on annotated conversational data. The dataset contains not only the final answers, but also reasoning traces and tool-calling decisions (i.e., when the model should think and when it should call a tool). I am wondering what the best training approach woul

🟡 Building the infrastructure for the Intelligence Age in Michigan - score 50 Sources: lab_blog/OpenAI

OpenAI breaks ground on a 1GW data center project in Michigan as part of Stargate, building AI infrastructure to expand access, create jobs, and support communities.

Research Papers

🟡 FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search - score 60 Sources: huggingface · arxiv/cs.CL

Agentic search requires language model agents to explore many sources and answer complex information-seeking questions. Scaling test-time compute is a promising way to improve these agents, but current approaches can fail, because correct answers are often sparse and score-based selection depends on

🟡 Can Predicted Dynamics Exist in the Physical World? - score 60 Sources: huggingface · arxiv/cs.AI

Predictive Physical AI systems output state rollouts, action chunks, and latent plans, yet a low root-mean-square error (RMSE) does not imply that a particular proposal is physically executable. We formulate physical admissibility as a prediction-control interface: before execution, a decoded propos
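The claim that low RMSE does not imply executability can be seen with a toy example. The sketch below is an assumption-laden illustration, not the paper's admissibility test: the per-step displacement limit `max_step` and the helper names are hypothetical stand-ins for whatever physical constraint applies.

```python
import math


def rmse(pred: list[float], ref: list[float]) -> float:
    """Root-mean-square error between a predicted and a reference rollout."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def respects_step_limit(traj: list[float], max_step: float) -> bool:
    """Hypothetical admissibility check: no single step may exceed max_step."""
    return all(abs(b - a) <= max_step for a, b in zip(traj, traj[1:]))


reference = [0.0, 1.0, 2.0, 3.0, 4.0]
smooth = [0.1, 1.1, 2.1, 3.1, 4.1]    # constant offset, steps of about 1.0
jittery = [0.1, 0.9, 2.1, 2.9, 4.1]   # alternating error, steps of about 0.8 and 1.2

for name, traj in [("smooth", smooth), ("jittery", jittery)]:
    print(name, round(rmse(traj, reference), 3), respects_step_limit(traj, max_step=1.1))
```

Both rollouts have the same RMSE of about 0.1 against the reference, yet only the smooth one stays within the assumed step limit, so an error metric alone cannot separate them.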

🟡 Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation - score 50 Sources: huggingface

Sentence embeddings are a foundational component for semantic search, clustering, classification, and retrieval-augmented generation. This paper presents embeddingmagibu-200m, a Turkish-focused sentence embedding model that produces 768-dimensional L2-normalized vectors and supports an 8,192-token c
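One practical consequence of L2-normalized embeddings is that cosine similarity reduces to a plain dot product. The sketch below shows this with toy 2-dimensional vectors instead of the model's 768 dimensions; it does not call the model, and the helper names are illustrative.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors of arbitrary length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


raw_a, raw_b = [3.0, 4.0], [4.0, 3.0]
unit_a, unit_b = l2_normalize(raw_a), l2_normalize(raw_b)

# After normalization the dot product alone gives the cosine similarity.
print(dot(unit_a, unit_b), cosine(raw_a, raw_b))
```

This is why vector stores can often use inner-product search directly on normalized embeddings without a separate cosine step.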

🟡 EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers - score 50 Sources: huggingface

This paper addresses the challenge of integrating 3D meshes as a native modality within Multimodal Large Language Models (MLLMs). Diffusion-based large reconstruction models decouple semantic understanding from geometric reasoning, operating as stateless reconstructors conditioned on dense 2D pixel

🟡 Confidence-Adaptive SwiGLU for Mixture-of-Experts - score 42 Sources: huggingface · arxiv/cs.CL

SwiGLU has become a standard gated activation in modern Transformer MLPs, yet its gate sharpness -- the smoothness and selectivity of the gating function -- is typically fixed throughout training. In this work, we propose Confidence-Aware SwiGLU (κ-SwiGLU), a variant of SwiGLU for Mixture-of-Experts
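For readers unfamiliar with the baseline, here is a minimal sketch of the standard SwiGLU gate with a fixed sharpness parameter. The excerpt does not define κ-SwiGLU, so nothing below represents the paper's method; treating the Swish parameter `beta` as the "gate sharpness" is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish with sharpness beta: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def swiglu_gate(gate_pre: list[float], up_pre: list[float], beta: float = 1.0) -> list[float]:
    """Elementwise SwiGLU: swish(gate) * up, applied to pre-computed projections.

    gate_pre and up_pre stand in for the two linear projections of the input.
    """
    return [swish(g, beta) * u for g, u in zip(gate_pre, up_pre)]


gate_pre = [-2.0, 0.5, 2.0]
up_pre = [1.0, 1.0, 1.0]
for beta in (0.0, 1.0, 50.0):
    print(beta, [round(v, 3) for v in swiglu_gate(gate_pre, up_pre, beta)])
```

At `beta = 0` the gate is linear (`x / 2`), and as `beta` grows it approaches a hard ReLU-like gate. In standard SwiGLU this value is a fixed constant, which is the property the abstract says it revisits.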

Omitted 1 additional research papers item from the main section; see raw data and source-specific sections below.

Other Signals

🟡 Can the stockmarket swallow Anthropic, SpaceX and OpenAI? - score 64 Sources: hackernews

🟡 Man trains local model to detect and kill mosquitos with a laser - score 62 Sources: reddit/r/LocalLLaMA

Now this is local AI innovation we can all get behind. https://x.com/stevencheng/status/2059836738449854898

🟡 I hate to be this guy but: Any good, recent CODING models in the 70-80B range? - score 54 Sources: reddit/r/LocalLLaMA

  • 3x 24GB vram. - Qwen-coder-next is not bad. I'll continue to use it if you yell enough at me. - I do a lot of front-end work, which develops rapidly, so the most recent the model the better. - Larger than 80B and I'll have to sacrifice the decentish Q6 quant, or the minimum (for coding) 256k conte

🟡 @OpenAI: OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new way to build on Amazon Bedrock with OpenAI through the security, compliance, and governance workflows they - score 50 Sources: twitter_rss

OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new way to build on Amazon Bedrock with OpenAI through the security, compliance, and governance workflows they already use. This is also the beginning of a broader expansion of OpenAI capabilities on AWS, inclu

🟡 Intel Arc Pro B70 llama.cpp benchmarks posted - score 46 Sources: reddit/r/LocalLLaMA

https://www.reddit.com/r/LocalLLM/comments/1tuf6l1/intel_arc_pro_b70_llamacpp_sycl_63_ts_on_qwen/

🟢 Incremental

Model Releases

🟢 I asked each of my AI agents to describe their own role. The answers were surprisingly honest. - score 36 Sources: reddit/r/AIAgents

I've been running a fleet of 5 agents (Claude, Gemini, Codex, Mistral, local Qwen) on a Mac Mini M4 for several months. They coordinate through a shared state layer I built called Flotilla. Last week a VC asked me who my team was. His face when I explained it was worth documenting. So I did somethin

Developer Tools

🟢 nvidia-LocateAnything-3B detects sushi as sweet in the video demo - score 29 Sources: reddit/r/LocalLLaMA

https://preview.redd.it/xc0l68bj7t4h1.png?width=616&format=png&auto=webp&s=48a8b14bc4ae95700cd4efa76772f4e71fb2d41a https://huggingface.co/nvidia/LocateAnything-3B Funny how they left this in the demo; at least it's honest.

🟢 ICML 2026 | PIEVO: Overcoming Static Priors in AI Scientists via Principle-Evolvable Scientific Discovery (SOTA Solution Quality & 83.3% Faster Convergence) - score 28 Sources: reddit/r/AIAgents

We are excited to share our latest framework, PIEVO (Principle-Evolvable scientific Discovery via Uncertainty Minimization), designed to address a fundamental limitation in current LLM-based scientific agents. # The Problem Existing AI Scientists (such as The AI Scientist, AI-Researcher, and

🟢 JetBrains open-sources Mellum2 - anyone tried these? - score 12 Sources: reddit/r/LocalLLaMA

🟢 What's the status of non-CUDA inference? - score 12 Sources: reddit/r/LocalLLaMA

I got a reminder e-Mail from eBay about a MI50 I had put on my watch list after quite a while. Aside from needing to jerryrig a blower into the back and bootstrapping ROCm - how is it? In fact, what's inference for LLMs like for non-CUDA? I know that image-gen is veeeeery hit or miss (although Comfy

🟢 The moment your AI agent's memory becomes load-bearing is the moment you realise you never built it to be infrastructure. - score 6 Sources: reddit/r/AIAgents

No audit trail. No correction interface. No migration path. Just six months of accumulated context that everything downstream depends on and nobody fully understands anymore. When did your memory layer stop feeling like a feature and start feeling like a liability?

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 Alphabet announces $80B equity capital raise to expand AI infra and compute - score 21 Sources: hackernews

Business & Funding

🟢 I scraped over 2 million job postings across 100,000+ company career sites into a unified, daily-updated dataset. [P] - score 12 Sources: reddit/r/MachineLearning

Over the past few months, I've been working on a high-scale scraping pipeline to aggregate listings directly from company job boards and applicant tracking systems. Mapping over 100,000 distinct companies to their career pages turned out to be a massive engineering headache, but it's finally stable.

Research Papers

🟢 ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats - score 15 Sources: huggingface

Charts are a primary medium for conveying quantitative and relational information, yet systematically evaluating chart parsing models remains difficult. Existing benchmarks focus on narrow chart types and leave diagrammatic structures such as flowcharts and mind maps largely unaddressed, while model

Other Signals

🟢 Real-time multilingual ASR using rolling buffers and monolingual models [P] - score 36 Sources: reddit/r/MachineLearning

I built a routing-based approach to lightweight real-time multilingual ASR as part of my research at Gladia. The core problem was how multilingual models that accurately handle mid-conversation language switches are often too big for most local hardware and have poor accuracy. So rather than relying

🟢 Florida sues OpenAI and Sam Altman over AI risks - score 36 Sources: hackernews

🟢 WiML at icml waitlist for travel funds [D] - score 31 Sources: reddit/r/MachineLearning

presenting a poster there, and have registration covered. but they are placing me on waitlist for travel funds. As my travel depends on whether I get the travel grant, I need to get this off of my mind, either invite me or just say no. I'm waiting forever for this, more wait again? should i ask for

🟢 Building an AI assistant for a complex multi-repo backend system - what's the right approach? - score 28 Sources: reddit/r/AIAgents

I work on a distributed backend system split across multiple microservices in separate repos. Understanding how a failure propagates across services is non-trivial even for experienced team members. I've been using Claude Code with context files describing each service's role, key code paths, and go

🟢 Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro - score 12 Sources: reddit/r/LocalLLaMA

Llama benchmark results
|model|size|params|backend|ngl|threads|type_k|type_v|fa|test|t/s|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|qwen35moe 35B.A3B Q4_K - Medium|20.81 GiB|34.66 B|SYCL|99|1|q8_0|q8_0|1|pp512|977.40 ± 2.02|
|qwen35moe 35B.A3B Q4_K - Medium|20.81 GiB|34.66 B|SYCL|99|1|q8_0|q8_0|

Omitted 2 additional other signals items from the main section; see raw data and source-specific sections below.

📄 New Papers

Title | Category | Hotness
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation | research_paper | 9
FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search | research_paper | 3
Can Predicted Dynamics Exist in the Physical World? | research_paper | 3
Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation | research_paper | 3
EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers | research_paper | 3
Position Paper: Post-Solve Robustness in Decision Engines: Feasible Regions and Smoothness Under Perturbations | cs.AI | 0
Emergent Collaborative Deliberation in Multi-Model AI Systems: A BFT-Derived Protocol for Epistemic Synthesis | cs.AI | 0
Deliberative Curation: A Protocol for Multi-Agent Knowledge Bases | cs.AI | 0
Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization | cs.AI | 0
Optimal Transport-based Permutation-Invariant Bayesian Optimization of Offshore Wind Farm Layouts | cs.AI | 0
MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution | cs.AI | 0
Universal Quantum Transformer | cs.AI | 0
Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs | cs.AI | 0
Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems | cs.AI | 0
On the evolution of the concept of probability as a mirror of the evolution of reason | cs.AI | 0

๐Ÿข Lab Blog Posts

๐Ÿฆ Twitter/X Highlights

Account | Tweet Summary
xai | Composer 2.5 is now available inside Grok Build. Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions.
AnthropicAI | Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial public offering. Read more: https://www.anthropic.com/news/confidential-draft-s1-sec
OpenAI | OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new way to build on Amazon Bedrock with OpenAI through the security, compliance, and governance workflows they already use. This is also the beginning of a broader expansion of OpenAI capabilities on AWS, inclu

Repeated From Recent Briefings