🔴 High Significance
Model Releases
🔴 Gemma 4 MTP released - score 96
Sources: reddit/r/LocalLLaMA
Blog post:
https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/
MTP draft models:
[https://huggingface.co/google/gemma-4-31B-it-assistant](https://hugg
🔴 Sr Software Engineer - Haven't written a line of code in months - score 83
Sources: reddit/r/AIAgents
AI has reached the point that I no longer write code.
I used to work in shops where I was deep in the debugger without internet access; now I just drive intent and long-term engineering decisions with Claude/Codex/Perplexity. I work at a mid-sized startup with a bit over one hundred people.
I just
Developer Tools
🔴 How are you pricing without lighting money on fire? - score 72
Sources: reddit/r/AIAgents
Curious how everyone is pricing their AI agents right now. Are you going outcome-based (only paying when the agent actually delivers), flat monthly fees, or straight usage-based pricing tied to tokens or actions?
And if you're doing usage-based or outcome-based, how are you handling the inference c
Infrastructure & Compute
🔴 Struggling to reproduce paper results before improving them: stuck below reported accuracy [R] - score 94
Sources: reddit/r/MachineLearning
I'm a PhD student working in AI/computer vision, and I've hit a frustrating wall with a project.
My supervisor asked me to improve the accuracy of a published paper. My first step has been to faithfully reproduce their results before trying any modifications. The issue is I can't even match their r
🔴 Accelerating Gemma 4: faster inference with multi-token prediction drafters - score 81
Sources: hackernews
🔴 cheahjs/free-llm-api-resources: A list of free LLM inference resources accessible via API. - score 71
Sources: github_trending
Research Papers
🔴 PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination - score 85
Sources: huggingface
Patent examination is a complex, multi-stage process requiring both technical expertise and legal reasoning, increasingly challenged by rising application volumes. Prior benchmarks predominantly view patent examination as discriminative classification or static extraction, failing to capture its inh
🔴 X2SAM: Any Segmentation in Images and Videos - score 82
Sources: huggingface · arxiv/cs.AI
Multimodal Large Language Models (MLLMs) have demonstrated strong image-level visual understanding and reasoning, yet their pixel-level perception across both images and videos remains limited. Foundation segmentation models such as the SAM series produce high-quality masks, but they rely on low-lev
Other Signals
🔴 DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid. - score 88
Sources: reddit/r/LocalLLaMA
That foodtruck bench post showing DeepSeek V4 matching GPT-5.2 at 17x cheaper got me thinking. If frontier cloud models are that overpriced for equivalent quality, how much of my daily work even needs cloud at all?
Ran my normal coding workflow for 10 days. Every task got logged: what it was, token
🔴 Heretic 1.3 released: Reproducible models, integrated benchmarking system, reduced peak VRAM usage, broader model support, and more - score 81
Sources: reddit/r/LocalLLaMA
Dear fellow Llamas, it is my distinct pleasure to announce the immediate availability of version 1.3 of Heretic (https://github.com/p-e-w/heretic), the leading software for removing censorship from language models.
This was a long and eventful release cycle, during which Heretic became a high-p
🔴 NeurIPS Submission Number [D] - score 81
Sources: reddit/r/MachineLearning
Hey guys,
Just saw that NeurIPS submissions this year might exceed 40k. What submission number did you get? The highest I know of was 29k, and that was 24 hours ago.
🟡 Notable
Model Releases
🟡 **[@xai: Grok 4.3 is now live on the xAI API](https://x.com/xai/status/2051703217697010103)** - score 60
Sources: twitter_rss
Grok 4.3 is now live on the xAI API. It's our fastest, most intelligent model to date.
It tops the @ArtificialAnlys leaderboards in agentic tool calling and instruction following, and ranks #1 in @ValsAI enterprise domains like case law and corporate finance.
Grok 4.3 supports a 1 million token co
🟡 **[@OpenAI: GPT-5.5 Instant is starting to roll out in ChatGPT](https://x.com/OpenAI/status/2051709028250915275)** - score 50
Sources: twitter_rss
GPT-5.5 Instant is starting to roll out in ChatGPT.
It's a big upgrade, giving you smarter, clearer, and more personalized answers in a warmer, more natural tone.
And it's also more concise, which we heard you wanted. We think you'll love chatting with it.
🟡 MTP on Strix Halo with llama.cpp (PR #22673) - score 42
Sources: reddit/r/LocalLLaMA
I saw a post about incoming MTP support in llama.cpp, so I tried it out on an AI Max 395 with 128GB DDR5 8000:
I rebuilt the radv container from https://github.com/kyuz0/amd-strix-halo-toolboxes with that PR : [https://github.com/ggml-org/llama.cp
Developer Tools
🟡 Agents can now create Cloudflare accounts, buy domains, and deploy - score 69
Sources: hackernews
🟡 bytedance/deer-flow: An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of tasks that could take minutes to hours. - score 68
Sources: github_trending
🟡 Arindam200/awesome-ai-apps: A collection of projects showcasing RAG, agents, workflows, and other AI use cases - score 62
Sources: github_trending
🟡 ProgramBench: Can we really rebuild huge binaries from scratch? (doesn't look like it) - score 58
Sources: reddit/r/LocalLLaMA
There have been quite a few case studies recently on agents building whole programs from scratch, but most of them test a single project or just a few, with hand-tuned setups.
We've spent the last couple of months formalizing this setting and building a benchmark of 200 tasks while doubling down on te
🟡 Prompt evals are not enough once an agent starts taking actions - score 56
Sources: reddit/r/AIAgents
One thing I keep running into with AI agents is that testing the prompt is only a small part of the problem.
An agent can give a decent response in a simple test and still break once it has to move through a real workflow.
The weird failures usually show up when it has to:
- remember context acro
Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.
Research Papers
🟡 StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing - score 58
Sources: huggingface · arxiv/cs.LG
We present StateSMix, a fully self-contained lossless compressor that couples an online-trained Mamba-style State Space Model (SSM) with sparse n-gram context mixing and arithmetic coding. The model is initialised from scratch and trained token-by-token on the file being compressed, requiring no pre
🟡 The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail - score 45
Sources: huggingface
Niche-domain Indic ASR -- digit strings, currency amounts, addresses, brand names, English/Indic codemix -- is under-served by both open-source SOTA and commercial systems. On a synthesised entity-dense Telugu test set (held-out by synthesis system), vasista22/whisper-telugu-large-v2 (open SOTA) ach
🟡 ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue - score 45
Sources: huggingface
The rapid advancement of Multimodal Large Language Models (MLLMs) has empowered Unmanned Aerial Vehicles (UAVs) with exceptional capabilities in spatial reasoning, semantic understanding, and complex decision-making, making them inherently suited for UAV Search and Rescue (SAR). However, existing UAV
Other Signals
🟡 Transformers with Selective Access to Early Representations [R] - score 69
Sources: reddit/r/MachineLearning · arxiv/cs.LG
Hello everyone. I'm excited to share our new paper!
Figure 1: Comparison Across Architectures
A lot of recent Transformer variants try to improve information flow acr
🟡 Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...) - score 65
Sources: reddit/r/LocalLLaMA
The following is a non-comprehensive test I came up with to measure the quality difference (a.k.a. degradation) between different quantizations of Qwen 3.6 27B. I want to figure out which quant is best to run on my 16 GB VRAM setup.
WHAT WE ARE TESTING
First, the prompt:
Given this PGN stri
🟡 Production AI very different from the demos [D] - score 56
Sources: reddit/r/MachineLearning
Moved an AI feature into production a few months ago, and the cost profile has been a constant surprise since. The demos and the early prototypes ran cheap because the volume was tiny and the prompts were short, but when it hit traffic the token usage scaled a lot. I think it was partly because custom
🟡 Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster. - score 50
Sources: reddit/r/LocalLLaMA
Not affiliated with Kaitchup, but a fan of their testing. I was looking forward to this article... and it did not disappoint. Lots of free info in the link. The juicy part is behind a paywall. I'll respect that, but the short of it is:
It's showing that the Qwens are more benchmaxxed, and Ge
🟢 Incremental
Model Releases
🟢 Tired of copy-pasting prompts between Claude and Codex tabs: built a small file-backed queue that automates the handoff - score 33
Sources: reddit/r/AIAgents
I've been working on agent-lanes
https://github.com/leo-diehl/agent-lanes
A small Python tool that lets one AI coding agent hand work to another over a shared folder. The queue is just JSON files on disk: no daemon, no server, no network.
Think o
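
The "JSON files on disk" idea described in this item can be illustrated with a short sketch. This is not agent-lanes' actual code or API: the function names (`enqueue`, `claim_next`), the folder layout, and the file-naming scheme are all hypothetical, chosen only to show how a shared folder can act as a queue without a daemon or server.

```python
import itertools
import json
import os
import time
import uuid
from pathlib import Path
from typing import Optional

# Hypothetical tie-breaker for tasks created within the same clock tick.
_sequence = itertools.count()


def enqueue(inbox: Path, task: dict) -> Path:
    """Write one task as a JSON file in `inbox` and return its path."""
    inbox.mkdir(parents=True, exist_ok=True)
    name = f"{time.time_ns():020d}-{next(_sequence):06d}-{uuid.uuid4().hex}.json"
    tmp = inbox / (name + ".tmp")
    tmp.write_text(json.dumps(task), encoding="utf-8")
    final = inbox / name
    # Rename into place so readers never see a half-written file.
    os.replace(tmp, final)
    return final


def claim_next(inbox: Path, claimed: Path) -> Optional[dict]:
    """Move the first task (by file name) into `claimed` and return it."""
    claimed.mkdir(parents=True, exist_ok=True)
    for path in sorted(inbox.glob("*.json")):
        target = claimed / path.name
        try:
            # If another agent already moved this file, the rename fails.
            os.rename(path, target)
        except FileNotFoundError:
            continue
        return json.loads(target.read_text(encoding="utf-8"))
    return None
```

One agent would call `enqueue(Path("lanes/inbox"), {"prompt": "..."})` and the other would poll `claim_next(Path("lanes/inbox"), Path("lanes/claimed"))`. Claiming by rename is one simple way to keep two consumers from taking the same task on a local filesystem; it makes no guarantees on network shares.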
🟢 Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama - score 27
Sources: reddit/r/LocalLLaMA
🟢 Qwen 3.6 27B MTP on V100 32GB: 54 t/s - score 19
Sources: reddit/r/LocalLLaMA
Just a quick note that I got a nice result using am17an's MTP branch of llama.cpp on a V100 32GB SXM module using one of those PCIe card adapters. Pulled and built in one shot, and llama-server ran without a hitch.
Tested using am17an's MTP GGUF, q8_0 KV cache and 200k cache limit acting as vscode
🟢 Solidity LM surpasses Opus - score 4
Sources: reddit/r/LocalLLaMA
My weekend project overran a little but happy with the end result.
soleval pass@1 beat Opus 4.7 on the same set of tasks. Some more work to be done here but any feedback is welcome, I spent quite a lot of time (and money) on this one!
https://huggingface.co/samscrack/Qwen3.6-Solidity-27B
Developer Tools
🟢 Question about PLS-DA hyperparameter tuning [R] - score 38
Sources: reddit/r/MachineLearning
Hi all! I am a bioinformatician and I am working on learning some ML tools for some disease/biomarker stuff. I am working with sparse PLS-DA at the moment. Before actually tuning the model, I run an overall global model (without sparsity) to get an idea of what my data looks like and to get to a sta
🟢 PriorLabs/TabPFN: ⚡ TabPFN: Foundation Model for Tabular Data ⚡ - score 38
Sources: github_trending
🟢 Early attempt at tracking agent work across the economy - score 33
Sources: reddit/r/AIAgents
I made an Agent Economy tracker and would love feedback!
It's an early attempt to track how agent work could show up across the economy: agent GDP, deployed agent employment, revenue, stack costs, and productivity.
Curious what people here think, especially if you're already using agents seriously
🟢 GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents - score 31
Sources: hackernews
🟢 Show HN: Airbyte Agents - context for agents across multiple data sources - score 19
Sources: hackernews
Omitted 2 additional developer tools items from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟢 TritonSigmoid: A fast, padding-aware sigmoid attention kernel for GPUs [R] - score 18
Sources: reddit/r/MachineLearning
We are open-sourcing TritonSigmoid, a fast, padding-aware sigmoid attention kernel for GPUs.
We built this for single-cell foundation models, where every cell is represented as a sequence of genes. A single gene can be regulated by multiple transcription factors at once. Softmax forces them to com
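
The contrast this item draws between softmax and sigmoid attention can be seen on a toy example. The sketch below is plain Python, not the TritonSigmoid kernel: the function names, the scores, and the padding mask are made up for illustration, and it omits details a real sigmoid attention layer may use (such as a bias term). It relies only on the standard fact that softmax weights sum to one, while sigmoid scores each key independently.

```python
import math


def softmax_weights(scores, mask):
    """Softmax over unmasked positions; padded positions get weight 0."""
    peak = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_weights(scores, mask):
    """Element-wise sigmoid; each key is scored independently of the others."""
    return [1.0 / (1.0 + math.exp(-s)) if keep else 0.0 for s, keep in zip(scores, mask)]


# Hypothetical attention scores: two strong keys, one weak key, one padded slot.
scores = [4.0, 4.0, -4.0, 0.0]
mask = [True, True, True, False]

soft = softmax_weights(scores, mask)
sig = sigmoid_weights(scores, mask)
print([round(w, 3) for w in soft])
print([round(w, 3) for w in sig])
```

With softmax the two strong keys have to share the total weight of one, so each ends up just under 0.5. With sigmoid both can be close to 1 at the same time, and the padded slot stays at 0 in both cases because of the mask.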
🟢 Competition - League of Robot Runners 2026: Multi-robot coordination under uncertainty [N] - score 6
Sources: reddit/r/MachineLearning
Hello ML and RL community
We are inviting participants to the League of Robot Runners (LoRR) 2026: https://www.leagueofrobotrunners.org
Co-located with AAMAS 2026, LoRR is a research competition on large-scale multi-robot coordination. These are important pr
Research Papers
🟢 Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO - score 10
Sources: huggingface
We introduce Skills-Coach, a novel automated framework designed to significantly enhance the self-evolution of skills within Large Language Model (LLM)-based agents. Addressing the current fragmentation of the skill ecosystem, Skills-Coach explores the boundaries of skill capabilities, thereby facil
Other Signals
🟢 What do you use Gemma 4 for? - score 35
Sources: reddit/r/LocalLLaMA
Gemma 4 and Qwen 3.6 both seem to be the hottest local models right now. Looking at the benchmarks and reviews, Qwen seems better in every way: coding, benchmarks, agentic tasks. So is Qwen outright better? In what case would you pick Gemma over Qwen?
🟢 How to get from vibe-coding to compounding revenue growth using AI agents for GTM: free session with ThriveStack + Brevo - score 26
Sources: reddit/r/AIAgents
Register to join: [Registration Link](https://app.livestorm.co/brevo/thrivestack-x-brevo-vibe-ai-driven-playbook?utm_source=reddit&utm_medium=p
🟢 Radar Engineer to Autonomy/AI [D] - score 19
Sources: reddit/r/MachineLearning
Hi all, I've spent the last 3 years working on Radar Perception for a legacy automotive project in Germany. My background is an MSc in Robotics & AI. Currently, I spend my time analyzing point clouds and SNR distributions to debug failures. It's mathematically complex, but I'm not implementing a
🟢 Wiki Builder: Skill to Build LLM Knowledge Bases - score 6
Sources: hackernews
🟢 What if AI agents can now talk? - score 0
Sources: reddit/r/AIAgents
Quick context: I use Claude Code and Codex daily and noticed I was spending half my "agent is working" time just sitting there watching the screen. I was like, what if Claude or Codex can just narrate its process back to me, so I know what it's doing?
So I built Heard. Open-source.
What it does:
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| cheahjs/free-llm-api-resources | A list of free LLM inference resources accessible via API. | 344 | python |
| bytedance/deer-flow | An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of tasks that could take minutes to hours. | 328 | python |
| Arindam200/awesome-ai-apps | A collection of projects showcasing RAG, agents, workflows, and other AI use cases | 211 | python |
| vercel-labs/agent-browser | Browser automation CLI for AI agents | 117 | rust |
| vercel-labs/ai-cli | Generate anything from your terminal | 80 | typescript |
| PriorLabs/TabPFN | ⚡ TabPFN: Foundation Model for Tabular Data ⚡ | 57 | python |
| tensorzero/tensorzero | TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation. | 8 | rust |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination | research_paper | 5 | Open |
| X2SAM: Any Segmentation in Images and Videos | research_paper | 16 | Open |
| StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing | research_paper | 2 | Open |
| AI Agents for Sustainable SMEs: A Green ESG Assessment Framework | cs.AI | 0 | Open |
| ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations | cs.AI | 0 | Open |
| Effect-Transparent Governance for AI Workflow Architectures: Semantic Preservation, Expressive Minimality, and Decidability Boundaries | cs.AI | 0 | Open |
| Algebraic Semantics of Governed Execution: Monoidal Categories, Effect Algebras, and Coterminous Boundaries | cs.AI | 0 | Open |
| A Knowledge-Driven LLM-Based Decision-Support System for Explainable Defect Analysis and Mitigation Guidance in Laser Powder Bed Fusion | cs.AI | 0 | Open |
| Towards Multi-Agent Autonomous Reasoning in Hydrodynamics | cs.AI | 0 | Open |
| New Bounds for Zarankiewicz Numbers via Reinforced LLM Evolutionary Search | cs.AI | 0 | Open |
| PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs | cs.AI | 0 | Open |
| Iterative Finetuning is Mostly Idempotent | cs.AI | 0 | Open |
| To Use AI as Dice of Possibilities with Timing Computation | cs.AI | 0 | Open |
| A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents | cs.AI | 0 | Open |
| Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment | cs.AI | 0 | Open |
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| xai | Grok 4.3 is now live on the xAI API. It's our fastest, most intelligent model to date. It tops the @ArtificialAnlys leaderboards in agentic tool calling and instruction following, and ranks #1 in @ValsAI enterprise domains like case law and corporate finance. Grok 4.3 supports a 1 million token context window and is priced at $1.25/m input and $2.50/m output. Create an API key and start building: http://console.x.ai/team/default/api-keys |
| OpenAI | GPT-5.5 Instant is starting to roll out in ChatGPT. It's a big upgrade, giving you smarter, clearer, and more personalized answers in a warmer, more natural tone. And it's also more concise, which we heard you wanted. We think you'll love chatting with it. |
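
The Grok 4.3 pricing quoted in the table above ($1.25/m input, $2.50/m output) can be turned into a per-request estimate. This is a minimal sketch that assumes "/m" means per million tokens; the function name and the example token counts are hypothetical and do not come from the announcement.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_million: float = 1.25,
                     output_per_million: float = 2.50) -> float:
    """Estimate the cost of one request from token counts and per-million prices."""
    total = input_tokens * input_per_million + output_tokens * output_per_million
    return total / 1_000_000


# Hypothetical request: 200k tokens of context in, 10k tokens out.
cost = request_cost_usd(200_000, 10_000)
print(f"{cost:.4f}")  # prints 0.2750
```

Under that reading, input dominates the bill for long-context requests: here the 200k input tokens cost $0.25 and the 10k output tokens cost $0.025.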
Repeated From Recent Briefings
- Hmbown/DeepSeek-TUI: Coding agent for DeepSeek models that runs in your terminal - first seen 2026-05-02; reason: canonical_url
- ruvnet/ruflo: The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integration, and native Claude Code / Codex Integration - first seen 2026-05-02; reason: canonical_url
- TauricResearch/TradingAgents: Multi-Agents LLM Financial Trading Framework - first seen 2026-05-02; reason: canonical_url
- Most people don't need agents. They need cleaner workflows. - first seen 2026-05-05; reason: canonical_url
- Google Chrome silently installs a 4 GB AI model on your device without consent - first seen 2026-05-05; reason: canonical_url
- rtk-ai/rtk: CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies - first seen 2026-05-05; reason: canonical_url
- AIDC-AI/Pixelle-Video: AI Fully Automated Short Video Engine - first seen 2026-05-03; reason: canonical_url
- virattt/dexter: An autonomous agent for deep financial research - first seen 2026-05-03; reason: canonical_url
- raullenchai/Rapid-MLX: The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider. - first seen 2026-05-05; reason: canonical_url
- cocoindex-io/cocoindex: Incremental engine for long horizon agents. Star if you like it! - first seen 2026-05-03; reason: canonical_url
- ... plus 458 more repeated items in processed data