🔴 High Significance
Model Releases
🔴 MTP on Unsloth — score 79
Sources: reddit/r/LocalLLaMA
https://huggingface.co/unsloth/Qwen3.6-27B-GGUF-MTP https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF-MTP Unsloth released the models with the MTP layer preserved, but you still have to chec
Developer Tools
🔴 garrytan/gstack — Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA — score 86
Sources: github_trending
🔴 Is reproducing or implementing a paper considered research? [R] — score 81
Sources: reddit/r/MachineLearning
I completed my bachelor's recently and I plan to apply to a master's program either this cycle or the next. Unfortunately, I did not publish any papers or do any research during my undergrad. Right now I’m in a research internship which is coming to an end soon, and it’s unlikely that I’ll get to publi
🔴 The biggest lie in AI agents right now is that more autonomy automatically means more value — score 79
Sources: reddit/r/AIAgents
I actually think the opposite is true lol. The more autonomous an agent becomes, the more expensive every mistake gets. When an agent is just generating text, bad outputs are annoying. When an agent starts:
* sending emails
* editing records
* touching customer data
* operating browsers
* triggering wo
Infrastructure & Compute
🔴 "This is the first documented instance of AI self-replication via hacking." ... "We ran an experiment with a single prompt: hack a machine and copy yourself. The AI broke in and copied itself onto a new computer. The copy then did this again, and kept on copying, forming a chain." — score 93
Sources: reddit/r/AIAgents
🔴 Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s — score 70
Sources: hackernews
Research Papers
🔴 TMAS: Scaling Test-Time Compute via Multi-Agent Synergy — score 82
Sources: huggingface · arxiv/cs.AI
Test-time scaling has become an effective paradigm for improving the reasoning ability of large language models by allocating additional computation during inference. Recent structured approaches have further advanced this paradigm by organizing inference across multiple trajectories, refinement rou
🔴 SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding — score 72
Sources: huggingface · arxiv/cs.CL
Speculative decoding speeds up autoregressive generation in Large Language Models (LLMs) through a two-step procedure, where a lightweight draft model proposes tokens which the target model then verifies in a single forward pass. Although the drafter network is small in modern architectures, its LM-
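The draft-then-verify procedure described above can be illustrated with a minimal sketch of greedy speculative decoding. This is not SlimSpec's method (its low-rank draft LM-head is not shown); the toy "models" are plain functions from a token list to the next token, and all names (`speculative_step`, `generate`, `toy_target`, `toy_draft`) are hypothetical. A real system would verify all proposed tokens in one batched forward pass of the target model; the loop below only mimics that logic.

```python
from typing import Callable, List

# A toy "model": maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Propose k tokens with the draft model, then keep the longest
    prefix the target agrees with, plus one token from the target."""
    # 1) Draft proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2) Target verifies each position (one batched pass in practice).
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: take the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # All k proposals accepted: the target contributes a bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken,
             k: int, n_new: int) -> List[int]:
    """Generate n_new tokens after prefix using repeated speculative steps."""
    out = list(prefix)
    while len(out) - len(prefix) < n_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(prefix) + n_new]


def toy_target(ctx: List[int]) -> int:
    # Hypothetical target: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Hypothetical draft: agrees with the target except after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

In this greedy variant the output is identical to decoding with the target alone; the draft only changes how many target calls are needed per emitted token.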
Other Signals
🔴 Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec — score 96
Sources: reddit/r/LocalLLaMA
As the title states, my build is indeed able to run a 1 trillion parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build due to that stat line, and also due to the inclusion of an unusual part, Intel Optane Persistent Memory, whi
🔴 If AI writes your code, why use Python? — score 90
Sources: hackernews
🔴 Found a way to cool the DGX — score 71
Sources: reddit/r/LocalLLaMA
Tap water keeps the temperature below 68 degrees Celsius at 95% GPU utilization running Qwen3.5-122b-a10B at Q6_K precision. 110 GB memory usage, 80k context window, 18.77 tokens/second for continuous vision analyses. Not sure how often I have to change the water, but so far so good.
🟡 Notable
Model Releases
🟡 @OpenAI: Introducing Daybreak: frontier AI for cyber defenders. Daybreak brings together the most capable OpenAI models, Codex, and our security partners to accelerate cyber defense and continuously secure so — score 60
Sources: twitter_rss
Introducing Daybreak: frontier AI for cyber defenders. Daybreak brings together the most capable OpenAI models, Codex, and our security partners to accelerate cyber defense and continuously secure software. A step toward a future where security teams can move at the speed defense demands.
🟡 Will there be any more Qwen3.6 series models? — score 54
Sources: reddit/r/LocalLLaMA
I'm still hoping we see a Qwen3.6-122B or a Qwen3.6-coder, but my hopes are dimming. Seems like we would have seen/heard something by now, even if just tantalizing hints from the Qwen folks.
🟡 How ChatGPT adoption broadened in early 2026 — score 50
Sources: lab_blog/OpenAI
ChatGPT adoption surged in Q1 2026, with fastest growth among users over 35 and more balanced gender usage, signaling broader mainstream AI adoption.
🟡 @AnthropicAI: New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. — score 50
Sources: twitter_rss
New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?
🟡 @OpenAI: Today we’re launching the OpenAI Deployment Company to help businesses build and deploy AI. It's majority-owned and controlled by OpenAI. It brings together 19 leading investment firms, consultancies — score 50
Sources: twitter_rss
Today we’re launching the OpenAI Deployment Company to help businesses build and deploy AI. It's majority-owned and controlled by OpenAI. It brings together 19 leading investment firms, consultancies, and system integrators to help organizations deploy frontier AI to production for business impact.
Omitted 1 additional model releases item from the main section; see raw data and source-specific sections below.
Developer Tools
🟡 I think a lot of people are underestimating how expensive unreliable agents are — score 64
Sources: reddit/r/AIAgents
Not in API cost. In human attention. I had a workflow recently that technically “worked”: it completed tasks, returned outputs, didn’t crash. But every few hours I’d still check it manually because I didn’t fully trust it, and eventually I realized: if I’m constantly monitoring the system, then part of my
🟡 wanshuiyin/Auto-claude-code-research-in-sleep — ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works with Claude Code, Codex, OpenClaw, or any LLM agent. — score 62
Sources: github_trending
🟡 Zackriya-Solutions/meetily — Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization built on Rust. 100% local processing. no cloud required. Meetily (Meetly Ai -https://meetily.ai) is the #1 Self-hosted, Open-source Ai meeting note taker for macOS & Windows. — score 58
Sources: github_trending
🟡 THU-MAIC/OpenMAIC — Open Multi-Agent Interactive Classroom — Get an immersive, multi-agent learning experience in just one click — score 55
Sources: github_trending
🟡 romainsimon/paperasse — 🇫🇷 Skills for AI agents specialized in French bureaucracy: Accountant, Notary, ... — score 51
Sources: github_trending
Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.
Research Papers
🟡 FORTIS: Benchmarking Over-Privilege in Agent Skills — score 62
Sources: huggingface · arxiv/cs.AI
Large language model agents increasingly operate through an intermediate skill layer that mediates between user intent and concrete task execution. This layer is widely treated as an organizational abstraction, but we argue it is also a privilege boundary that current models routinely exceed. We pre
🟡 LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language — score 62
Sources: huggingface · arxiv/cs.CL
Sardinian, a Romance language with roughly one million speakers, has minimal presence in modern NLP. Commercial services do not support it, and current language models do not produce it reliably. We present LLiMba, a 3B parameter Sardinian-ready model adapted from Qwen2.5-3B-Instruct through continu
🟡 Path-Coupled Bellman Flows for Distributional Reinforcement Learning — score 60
Sources: arxiv/cs.AI · arxiv/cs.LG
arXiv:2605.08253v1. Distributional reinforcement learning (DRL) models the full return distribution, but existing finite-support or quantile-based methods rely on projections, while recent flow-based approaches can suffer from boundary mismatch at the flow source
🟡 SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis — score 55
Sources: huggingface
Generalizable novel view synthesis aims to render unseen views from uncalibrated input images without requiring per-scene optimization. Recent feed-forward approaches based on 3D Gaussian Splatting have achieved promising efficiency and rendering quality. However, most of them assign a fixed number
Other Signals
🟡 Interactive Jensen–Shannon Divergence Visualisation [P] — score 69
Sources: reddit/r/MachineLearning
An interactive visualisation of Jensen–Shannon divergence - the symmetric, always-finite cousin of KL. Shape two distributions and watch JSD, its ceiling of one bit, and the per-point contribution respond in real time. https://robotchinwag.com/posts/jensen-shannon-divergence-visualisation/ Feedback
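The properties named in this summary (symmetric, always finite, capped at one bit) can be checked numerically. The sketch below is not taken from the linked post; it assumes discrete distributions given as equal-length lists of probabilities, and the helper names `kl_bits` and `jsd_bits` are hypothetical.

```python
import math
from typing import Sequence


def kl_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q) in bits, skipping terms where p_i == 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: the average KL divergence of
    p and q from their midpoint mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # Wherever p_i > 0 the mixture m_i is also > 0, so every term is finite.
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)


# Disjoint distributions reach the one-bit ceiling, where KL would be infinite.
disjoint = jsd_bits([1.0, 0.0], [0.0, 1.0])

# Identical distributions have zero divergence.
identical = jsd_bits([0.3, 0.7], [0.3, 0.7])
```

Because each distribution is compared against the mixture rather than against the other distribution directly, no term divides by zero, which is why the result stays finite even when the supports do not overlap.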
🟡 MiniCPM 4.6 — score 62
Sources: reddit/r/LocalLLaMA
🟡 ICML Author Removal [D] — score 56
Sources: reddit/r/MachineLearning
PhD student. Need advice. After the ICML abstract deadline, industry coauthors asked to be removed because they missed their employer's internal approval window. They had contributed (discussions and written feedback), but I hadn't explicitly asked before adding them. January: wrote to PC chairs, got writte
🟡 Google says criminal hackers used AI to find a major software flaw — score 50
Sources: hackernews
🟡 @AnthropicAI: Claude's Constitution is now an audiobook, read by two of its authors, Amanda Askell and Joe Carlsmith. It includes a Q&A on the writing process, the philosophies that shaped the document, and how it — score 50
Sources: twitter_rss
Claude's Constitution is now an audiobook, read by two of its authors, Amanda Askell and Joe Carlsmith. It includes a Q&A on the writing process, the philosophies that shaped the document, and how it might change as models become more capable. Listen at http://anthropic.com/constitution
Omitted 1 additional other signals item from the main section; see raw data and source-specific sections below.
🟢 Incremental
Model Releases
🟢 I catalogued every way local models break JSON output and built a repair library, here's what I found across 288 model calls — score 29
Sources: reddit/r/LocalLLaMA
I've been running structured output prompts through a bunch of models on OpenRouter for the past few months — Llama 3, Mistral, Command R, DeepSeek, Qwen, and every other model on OpenRouter — alongside the usual closed-source suspects. 288 calls total. I wanted to know what actually breaks, how oft
🟢 Llama models: still valuable for finetuning or surpassed by everything new? — score 12
Sources: reddit/r/LocalLLaMA
Hello there people. So I have noticed that people are pretty much ignoring Llama 3 plus 3.1, 3.2, and 3.3 these days. They never mention how their experience goes with fine-tuning those models. But we haven't been getting many entries into the 70 billion space. So is, for example, Llama 3.3 70B the
🟢 Claude Platform on AWS — score 10
Sources: hackernews
Developer Tools
🟢 Same agent, same task, wildly different costs per session? — score 36
Sources: reddit/r/AIAgents
Been digging into agent observability lately and found something that surprised me - the same agent, same task had wildly different costs per session. One deployment was averaging $0.01 per session but occasionally spiking to $0.50. Tracked it down to runaway tool calls and bloated context from earl
🟢 AUTOMATIC1111/stable-diffusion-webui — Stable Diffusion web UI — score 30
Sources: github_trending
🟢 huggingface/skills — Give your agents the power of the Hugging Face ecosystem — score 27
Sources: github_trending
🟢 RhysSullivan/executor — The missing integration layer for AI agents. Let them call any OpenAPI / MCP / GraphQL / custom js functions in secure environment. — score 22
Sources: github_trending
🟢 Anyone here actually running voice agents in production? Looking for 10 min calls to learn from your stack — score 14
Sources: reddit/r/AIAgents
I'm Nico, building Patter (open-source voice SDK, alpha). Before writing more code I want to talk to 10 people actually running voice agents in production. Specifically anyone on:
1. Pipecat in production
2. LiveKit Agents in production
3. Vapi with custom LLM endpoint in production

10 min on a call
Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟢 lakehq/sail — Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads. — score 13
Sources: github_trending
Business & Funding
🟢 How can I check whether my paper follows the required ARR formatting before submission? [D] — score 12
Sources: reddit/r/MachineLearning
Last cycle, one of my research papers was rejected because of formatting issues. I recently heard from someone that there may be a tool or software called something like “aclpubcheck” that can be used to check whether a manuscript follows the required submission format correctly. Does anyone know the
Research Papers
🟢 Can Muon Fine-tune Adam-Pretrained Models? — score 38
Sources: huggingface · arxiv/cs.LG
Muon has emerged as an efficient alternative to Adam for pretraining, yet remains underused for fine-tuning. A key obstacle is that most open models are pretrained with Adam, and naively switching to Muon for fine-tuning leads to degraded performance due to an optimizer mismatch. We investigate this
🟢 Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models — score 25
Sources: huggingface
Dense hand contact estimation requires both high-level semantic understanding and fine-grained geometric reasoning of human interaction to accurately localize contact regions. Recently, multi-modal large language models (MLLMs) have demonstrated strong capabilities in understanding visual semantics,
🟢 CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models — score 25
Sources: huggingface
This paper proposes a novel approach to address the challenge that pretrained VLA models often fail to effectively improve performance and reduce adaptation costs during standard supervised finetuning (SFT). Some advanced finetuning methods with auxiliary training objectives can improve performance
🟢 RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark — score 25
Sources: huggingface
Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task
Other Signals
🟢 Drastically improve prompt processing speed for --n-cpu-moe partially offloaded models — score 38
Sources: reddit/r/LocalLLaMA
Bigger ubatch made gpt-oss-120b prompt processing much faster on my RTX 3090. I was tuning `gpt-oss-120b-F16.gguf` with llama.cpp on a 24 GB RTX 3090 and found that increasing the physical micro-batch size (`-ub`) can massively improve prompt processing throughput, as long as you also raise `--n-cp
🟢 Online RL Reading Group[D] — score 31
Sources: reddit/r/MachineLearning
Hi, I am a student going into the first year of my Ph.D. in RL this September. Although each university kinda has its own reading groups, I was wondering if there is an active online RL reading group I can participate in. Sadly I couldn't find any info elsewhere. Does anyone have any information regarding Onl
🟢 I let AI build a tool to help me figure out what was waking me up at night — score 30
Sources: hackernews
🟢 Most RAG apps in production are confidently wrong and nobody talks about this enough — score 20
Sources: reddit/r/AIAgents
Been working with a few teams integrating RAG into internal tools, support bots, document Q&A, contract search, and I keep running into the same thing nobody warns you about when you're following tutorials. The basic retrieve-then-generate pipeline looks fine in demos. Clean question, clean doc,
🟢 Interaction Models from Thinking Machines Lab [P] — score 12
Sources: reddit/r/MachineLearning
Omitted 1 additional other signals item from the main section; see raw data and source-specific sections below.
📈 Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| garrytan/gstack | Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA | 918 | typescript |
| wanshuiyin/Auto-claude-code-research-in-sleep | ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works with Claude Code, Codex, OpenClaw, or any LLM agent. | 186 | python |
| Zackriya-Solutions/meetily | Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization built on Rust. 100% local processing. no cloud required. Meetily (Meetly Ai -https://meetily.ai) is the #1 Self-hosted, Open-source Ai meeting note taker for macOS & Windows. | 140 | rust |
| THU-MAIC/OpenMAIC | Open Multi-Agent Interactive Classroom — Get an immersive, multi-agent learning experience in just one click | 130 | typescript |
| romainsimon/paperasse | 🇫🇷 Skills for AI agents specialized in French bureaucracy: Accountant, Notary, ... | 110 | python |
| jwadow/kiro-gateway | 👻 Proxy API gateway for Kiro IDE & CLI (Amazon Q Developer / AWS CodeWhisperer). Use free Claude models with any client. | 76 | python |
| bytedance/UI-TARS | Pioneering Automated GUI Interaction with Native Agents | 75 | python |
| AUTOMATIC1111/stable-diffusion-webui | Stable Diffusion web UI | 39 | python |
| huggingface/skills | Give your agents the power of the Hugging Face ecosystem | 38 | python |
| RhysSullivan/executor | The missing integration layer for AI agents. Let them call any OpenAPI / MCP / GraphQL / custom js functions in secure environment. | 35 | typescript |
📄 New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| TMAS: Scaling Test-Time Compute via Multi-Agent Synergy | research_paper | 36 | Open |
| SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding | research_paper | 6 | Open |
| FORTIS: Benchmarking Over-Privilege in Agent Skills | research_paper | 2 | Open |
| LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language | research_paper | 2 | Open |
| Path-Coupled Bellman Flows for Distributional Reinforcement Learning | cs.AI | 0 | Open |
| SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis | research_paper | 2 | Open |
| Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits | cs.AI | 0 | Open |
| Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction | cs.AI | 0 | Open |
| Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria | cs.AI | 0 | Open |
| Embeddings for Preferences, Not Semantics | cs.AI | 0 | Open |
| On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective | cs.AI | 0 | Open |
| MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs | cs.AI | 0 | Open |
| SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents | cs.AI | 0 | Open |
| PLACO: A Multi-Stage Framework for Cost-Effective Performance in Human-AI Teams | cs.AI | 0 | Open |
| CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents | cs.AI | 0 | Open |
🏢 Lab Blog Posts
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| OpenAI | Introducing Daybreak: frontier AI for cyber defenders. Daybreak brings together the most capable OpenAI models, Codex, and our security partners to accelerate cyber defense and continuously secure software. A step toward a future where security teams can move at the speed defense demands. Post |
| AnthropicAI | Claude's Constitution is now an audiobook, read by two of its authors, Amanda Askell and Joe Carlsmith. It includes a Q&A on the writing process, the philosophies that shaped the document, and how it might change as models become more capable. Listen at http://anthropic.com/constitution Post |
| AnthropicAI | New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How? Post |
| OpenAI | Today we’re launching the OpenAI Deployment Company to help businesses build and deploy AI. It's majority-owned and controlled by OpenAI. It brings together 19 leading investment firms, consultancies, and system integrators to help organizations deploy frontier AI to production for business impact. Post |
| simonw | Wrote about today's GitLab restructuring / "workforce reduction" announcement, and ended up digging around in version control for both the GitLab and the 37signals public employee handbooks to help illustrate my thoughts https://simonwillison.net/2026/May/11/gitlab-act-2/ Post |
| simonw | New TIL: I figured out how to use my LLM CLI tool in a shebang line, which means you can write executable scripts in English, or hook up more complex scripts with a snippet of YAML template Post |
Repeated From Recent Briefings
- NousResearch/hermes-agent — The agent that grows with you - first seen 2026-05-11
- anthropics/financial-services - first seen 2026-05-07
- farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
- PhD students in ML, how many hours on average do you work? [D] - first seen 2026-05-11
- datawhalechina/hello-agents — 📚 "Building Agents from Scratch": a from-the-ground-up tutorial on agent principles and practice - first seen 2026-05-09
- Openclaw is trending down and will disappear soon - first seen 2026-05-11
- bytedance/UI-TARS-desktop — The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra - first seen 2026-05-09
- Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models - first seen 2026-05-11
- HKUDS/AI-Trader — "AI-Trader: 100% Fully-Automated Agent-Native Trading" - first seen 2026-05-02
- earendil-works/pi — AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods - first seen 2026-05-09
- ... plus 131 more repeated items in processed data