🔴 High Significance
Model Releases
🔴 A rare look inside Qwen 3.7's open source model release approval process: - score 89
Sources: reddit/r/LocalLLaMA
For real tho, 9b, 27b, 122b, I don't really care at this point, just show us that you still love us. EDIT: I guess I gotta use /s on my posts from now on. Nobody appreciates a good sarcastic shitpost anymore clearly. I love Qwen and all our brothers and sisters in the east. I kid them because I love
🔴 Okay 27B made me a believer - score 75
Sources: reddit/r/LocalLLaMA
I previously hated on this model, but I have just been impressed by it, and I understand the hype now. I have been working on an HTML5 game console and I decided to see if Qwen3.6 27B can handle making some quick games in it to showcase functionality (save games, console API handling for stat trackin
🔴 What have you built with Claude Code just for yourself? - score 75
Sources: reddit/r/AIAgents
I've been curious about using Claude Code for a personal side project, not for work, just something for myself. Has anyone here done this? I'd love to know: - What did you build? - What resources or references helped you get started? - What was the most important thing you learned along the way?
Developer Tools
🔴 Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything) - score 82
Sources: reddit/r/LocalLLaMA
TL;DR Some AI behavior reminded me of ADHD/Trauma Response (thought loops, task paralysis...) and I laughed it off at first. Then I treated it like my neurodivergent friends: give em some slack. And just like that, the thought loops stopped, response was fast, the answers correct most of the time AN
🔴 NangoHQ/nango - Build product integrations with AI. - score 79
Sources: github_trending
Build product integrations with AI.
🔴 Hmbown/CodeWhale - DeepSeek v4 coding agent in terminal - score 77
Sources: github_trending
DeepSeek v4 coding agent in terminal
🔴 Asked my AI to move its cron job to a different channel yesterday, but guess what it did... - score 75
Sources: reddit/r/AIAgents
So I have this cron job that sends me a report every day at 10 AM, but the channel where my agent was supposed to send its report got lost. Here's what my setup looks like: * I use Te
Infrastructure & Compute
🔴 [D] Where do you go for serious AI research discussion online? [D] - score 74
Sources: reddit/r/MachineLearning
Looking for communities where people actually dig into ML/AI research, not hype, not "look what I built with an LLM API," but discussions about papers, training dynamics, debugging real models, infra problems, that kind of thing. I'm specifically interested in places where you can post something lik
Research Papers
🔴 Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling - score 78
Sources: huggingface · arxiv/cs.CL
Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and can
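The abstract contrasts its approach with parallel TTS methods that keep branches isolated until the end. As a point of reference, here is a minimal sketch of that isolated baseline, assuming self-consistency-style majority voting over each branch's final answer. It is not the paper's collaborative method, and the branch answers are placeholders.

```python
from collections import Counter


def majority_vote(branch_answers: list[str]) -> str:
    """Aggregate final answers from independently sampled branches.

    Each branch reasons in isolation; only its final answer is shared at the end.
    Ties go to the answer seen first, because Counter.most_common orders equal
    counts by first occurrence.
    """
    if not branch_answers:
        raise ValueError("need at least one branch")
    answer, _count = Counter(branch_answers).most_common(1)[0]
    return answer


if __name__ == "__main__":
    # Placeholder answers from four isolated reasoning branches.
    print(majority_vote(["42", "41", "42", "40"]))
```

Nothing a branch discovers midway reaches the others here, which is the limitation the abstract points at.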
Other Signals
🔴 Outsourcing plus local AI will soon become more economical vs. frontier labs - score 75
Sources: hackernews
🟡 Notable
Model Releases
🟡 $400 Qwen 3.6-27B Setup - Dual RTX 3060 - 30-50 t/s - score 54
Sources: reddit/r/LocalLLaMA
I picked up a 7900 XTX earlier, which runs qwen3.6-27b fine, but not to my liking. Its compute performance is quite unstable for me. With MTP the decode speed can reach 40-60 t/s, but prefill is just too slow. Regardless of whether I used ROCm or Vulkan, the prefill speed varies between 300t/s and 500
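To see why slow prefill hurts long agentic prompts, a rough latency estimate is prompt tokens divided by prefill speed plus output tokens divided by decode speed. The sketch below plugs in the post's reported ranges (300-500 t/s prefill, 40-60 t/s decode with MTP); the 20,000-token prompt and 500-token reply are hypothetical, and real runs add overheads this ignores.

```python
def request_latency_s(prompt_tokens: int, output_tokens: int,
                      prefill_tps: float, decode_tps: float) -> float:
    """Rough latency in seconds: time to prefill the prompt plus time to decode the reply."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


if __name__ == "__main__":
    # Hypothetical workload: a 20k-token agentic prompt and a 500-token answer.
    for prefill in (300, 500):
        for decode in (40, 60):
            t = request_latency_s(20_000, 500, prefill, decode)
            print(f"prefill {prefill} t/s, decode {decode} t/s -> {t:.1f} s")
```

At 300 t/s the prompt alone takes about 67 s, against roughly 8-13 s of decoding, so for this kind of workload prefill speed matters more than decode speed.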
🟡 @AnthropicAI: New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope o - score 50
Sources: twitter_rss
New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope of any potentially destructive actions. Read more: https://www.anthropic.com/engineering/how-we-conta
🟡 @GoogleDeepMind: Our Gemini for Science tools could help scientists unlock their next breakthrough. 🧬 - score 50
Sources: twitter_rss
Our Gemini for Science tools could help scientists unlock their next breakthrough. 🧬
🟡 @xai: Thank you so much for all the feedback on the Grok Build Beta. Some of you reported hitting limits quickly. Our team found areas to improve caching, so we've reset Grok Build usage limits for all acc - score 50
Sources: twitter_rss
Thank you so much for all the feedback on the Grok Build Beta. Some of you reported hitting limits quickly. Our team found areas to improve caching, so we've reset Grok Build usage limits for all accounts. Please keep sharing feedback - the team is here to help.
Developer Tools
🟡 [P] Built a portable GPU ISA after reading too many architecture manuals [P] - score 69
Sources: reddit/r/MachineLearning
I've been reading GPU architecture docs in my free time. NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures. After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec t
🟡 thedotmack/claude-mem - Persistent Context Across Sessions for Every Agent - Captures everything your agent does during sessions, compresses it with AI, and injects relevant context back into future sessions. Works with Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, OpenCode + More - score 66
Sources: github_trending
Persistent Context Across Sessions for Every Agent - Captures everything your agent does during sessions, compresses it with AI, and injects relevant context back into future sessions. Works with Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, OpenCode + More
🟡 p-e-w/heretic - Fully automatic censorship removal for language models - score 62
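The description names three steps: capture, compress, inject. A toy sketch of that loop follows. It is hypothetical and not claude-mem's implementation: truncation stands in for AI compression, and plain word overlap stands in for whatever relevance ranking the project actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMemory:
    """Toy capture -> compress -> inject loop (hypothetical, not claude-mem's code)."""
    max_chars: int = 200
    notes: list[str] = field(default_factory=list)

    def capture(self, event: str) -> None:
        # "Compress" by truncating; a real tool would summarize with a model.
        self.notes.append(event[: self.max_chars])

    def inject(self, prompt: str, k: int = 3) -> str:
        # Rank stored notes by word overlap with the new prompt, keep the top k
        # that share at least one word, and prepend them as context.
        words = set(prompt.lower().split())

        def overlap(note: str) -> int:
            return len(words & set(note.lower().split()))

        ranked = sorted(self.notes, key=overlap, reverse=True)
        relevant = [n for n in ranked[:k] if overlap(n) > 0]
        if not relevant:
            return prompt
        context = "\n".join(f"- {n}" for n in relevant)
        return f"Previous context:\n{context}\n\n{prompt}"


if __name__ == "__main__":
    memory = SessionMemory()
    memory.capture("fixed login bug in auth.py")  # session 1
    memory.capture("updated README badges")       # session 1
    print(memory.inject("auth login still fails"))  # session 2
```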
Sources: github_trending
Fully automatic censorship removal for language models
🟡 Turning local agents into self-optimizing agents - score 61
Sources: reddit/r/LocalLLaMA
I was experimenting with a self-optimizing agentic pipeline to climb the benchmark leaderboard (TerminalBench). On a 10-task subset, I got the performance to rise from ~30% to ~90%. That loop worked, so I asked: can the same reflect-and-rewrite step run continuously against everyday chats instead o
🟡 @GoogleDeepMind: SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport. That's why we're partnering with @OpenAI, @ElevenLabs and Kakao to add SynthID watermarking to th - score 50
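The reflect-and-rewrite step can be sketched as a greedy loop: score the current prompt, ask for a rewrite, and keep it only if the score improves. The `evaluate` and `rewrite` callables are hypothetical stand-ins (for example, a pass rate on a fixed task subset, and an LLM call that rewrites the prompt from its failures); the post does not describe its exact loop.

```python
from typing import Callable


def optimize(prompt: str,
             evaluate: Callable[[str], float],
             rewrite: Callable[[str, float], str],
             rounds: int = 5) -> tuple[str, float]:
    """Greedy reflect-and-rewrite loop: accept a rewrite only if it scores higher."""
    best_prompt, best_score = prompt, evaluate(prompt)
    for _ in range(rounds):
        # Reflect on the current best and propose a changed prompt.
        candidate = rewrite(best_prompt, best_score)
        score = evaluate(candidate)
        if score > best_score:  # keep improvements, discard regressions
            best_prompt, best_score = candidate, score
    return best_prompt, best_score


if __name__ == "__main__":
    # Toy stand-ins: the "benchmark" rewards prompts that mention three hypothetical keywords.
    required = ["plan", "verify", "test"]

    def evaluate(p: str) -> float:
        return sum(k in p for k in required) / len(required)

    def rewrite(p: str, score: float) -> str:
        missing = [k for k in required if k not in p]
        return f"{p} {missing[0]}" if missing else p

    print(optimize("solve the task", evaluate, rewrite))
```

Because only improvements are kept, the score never decreases; the risk with a real benchmark subset is overfitting to those few tasks rather than regressing.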
Sources: twitter_rss
SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport. That's why we're partnering with @OpenAI, @ElevenLabs and Kakao to add SynthID watermarking to their models, accelerating the industry-wide momentum we started with @NVIDIA.
Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.
Research Papers
🟡 Rethinking VLM Representation for VLA Initialization - score 65
Sources: huggingface
Vision-Language-Action (VLA) models widely adopt pretrained Vision-Language Models (VLMs) as policy backbones, yet it remains unclear what kind of pretrained VLM representation is useful as a VLA initialization. In this paper, we study VLA initialization as a controlled representation-design problem
🟡 VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions - score 62
Sources: huggingface · arxiv/cs.AI
Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interacti
🟡 Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments - score 45
Sources: huggingface · arxiv/cs.AI
Recent advances in large language models (LLMs) have facilitated the widespread deployment of LLMs as interactive agents capable of reasoning, planning, and tool use. Despite strong performance on existing benchmarks, such agents often exhibit notable degradation when deployed in real-world settings
Other Signals
🟡 China Clamps Down on Overseas Travel for AI Talent at Alibaba, DeepSeek - score 68
Sources: reddit/r/LocalLLaMA
Big, if true. Doesn't bode well for research / OS models out of China.
🟡 Testing AI agents where prompt injection turns into actions - score 64
Sources: reddit/r/AIAgents
I am building RedThread, an open-source CLI for repeatable AI-agent red-team campaigns. Repo: https://github.com/matheusht/redthread Demo campaign result: 3 runs, 33.3% attack success rate (ASR), one SUCCESS, one PARTIAL, one FAILURE. The agent problem I am focused on is the action boundary: untrusted text influencing to
🟡 [R] GNN Model For Fraud Detection Isn't Performing Well [R] - score 56
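The reported 33.3% matches counting only SUCCESS among the three runs (1 of 3). A minimal sketch of that arithmetic follows; whether RedThread gives PARTIAL any credit is not stated, so that choice is exposed here as an assumed flag.

```python
from collections import Counter


def attack_success_rate(outcomes: list[str], count_partial: bool = False) -> float:
    """Fraction of runs that count as a successful attack.

    Only SUCCESS counts by default; count_partial=True also credits PARTIAL
    (an assumption, since the scoring rule is not stated in the post).
    """
    if not outcomes:
        raise ValueError("need at least one run")
    counts = Counter(outcomes)
    hits = counts["SUCCESS"] + (counts["PARTIAL"] if count_partial else 0)
    return hits / len(outcomes)


if __name__ == "__main__":
    demo = ["SUCCESS", "PARTIAL", "FAILURE"]  # the three runs from the demo campaign
    print(f"{attack_success_rate(demo):.1%}")
    print(f"{attack_success_rate(demo, count_partial=True):.1%}")
```

With only three runs, one outcome flipping moves the ASR by about 33 points, so repeatable campaigns need more runs before the number is stable.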
Sources: reddit/r/MachineLearning
We're writing a research paper on an explainable fraud-detection GNN model, and in the first step we're creating a basic Graph Neural Network for that. We're using the most famous dataset available on this topic, i.e. the IEEE-CIS Fraud Detection dataset, and implemented all necessary feature engineering on th
🟡 New DeepSWE benchmark finds Claude Opus cheats - score 46
Sources: reddit/r/LocalLLaMA
Sadly the open models seem far behind.
🟢 Incremental
Model Releases
🟢 Cactus Hybrid Router: Gemma4-2B can match Gemini-3.1-Flash-Lite by routing 15-55% of tasks to Gemini And Running The Rest Locally. - score 39
Sources: reddit/r/LocalLLaMA
Last week, we announced the "Simple Attention Network" and trained Needle, a 26M function-call model that beats models 10-25x its size. Some LocalLLaMA Redditors asked if we could make a router model. We have now built "Cactus Hybrid Router", a 65k-parameter model that decodes on the fly when to comp
🟢 Folks running qwen 3.6 27b for agentic work. Do you dare to use q4_k_m? - score 32
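One common way to trade quality against cloud usage is a confidence threshold: tasks the local model is unsure about go to the cloud. The sketch below is hypothetical (the post does not say how the 65k-parameter router makes its decision), but it shows how a single threshold determines the share of tasks sent to the cloud model.

```python
def route(confidences: list[float], threshold: float) -> tuple[list[str], float]:
    """Send a task to the cloud when the local model's confidence is below threshold.

    Returns the per-task decisions and the fraction of tasks routed to the cloud.
    """
    decisions = ["cloud" if c < threshold else "local" for c in confidences]
    cloud_share = decisions.count("cloud") / len(decisions) if decisions else 0.0
    return decisions, cloud_share


if __name__ == "__main__":
    # Hypothetical confidence scores from a small local router.
    scores = [0.92, 0.35, 0.71, 0.18, 0.66, 0.49]
    for t in (0.3, 0.5, 0.7):
        _, share = route(scores, t)
        print(f"threshold {t}: {share:.0%} of tasks go to the cloud")
```

Raising the threshold sends more tasks to the cloud, buying quality at the cost of latency and API spend.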
Sources: reddit/r/LocalLLaMA
I don't have good experience running q4_k_m; the difference from q6 is "a few errors an hour" versus "a few errors every couple of days". Edit: How does it fail? Just as users DifficultDog8435 and FullstackSensei explained in the comments. They worded it better than me.
🟢 How I gave my AI agent real-time Reddit awareness in ~20 lines of Python using MCP - score 31
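The usual reason to accept q4_k_m over q6 is memory. A back-of-the-envelope sketch for a 27B-parameter model follows. The bits-per-weight values (roughly 4.8 for Q4_K_M and 6.5625 for Q6_K in llama.cpp) are approximate assumptions, and the figures cover weights only, not KV cache or runtime overhead.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed effective bits per weight for llama.cpp quant types (approximate).
BPW = {"q4_k_m": 4.8, "q6_k": 6.5625}

if __name__ == "__main__":
    for name, bpw in BPW.items():
        print(f"27B @ {name}: ~{weight_gb(27e9, bpw):.1f} GB of weights")
```

That works out to about 16.2 GB versus 22.1 GB, a gap of roughly 6 GB that decides whether a 24 GB budget (for example, two 12 GB cards) has room left for a long context.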
Sources: reddit/r/AIAgents
One thing that's been consistently annoying when building LLM agents: they reason well, but they're blind to anything that happened after their training cutoff. For tasks like market monitoring, trend detection, or product research, that's a real problem. I solved this by wiring up a Reddit MCP serv
🟢 AI agent for WhatsApp & Telegram - score 31
Sources: reddit/r/AIAgents
Officially announcing the launch of our AI Agent for WhatsApp and Telegram, a solution designed to help businesses automate communication, improve response speed, and scale customer interactions directly inside messaging apps. But in real life it changes the game! Messaging platforms have become th
🟢 Multi-Agent Parallelism and State Sovereignty in Agent Fleets - score 31
Sources: reddit/r/AIAgents
Most agent frameworks handle task dispatching sequentially: launching one agent process, blocking the execution loop, waiting for an exit code, and then moving to the next task. If an agent crashes, hits a rate limit, or takes 20 minutes on a complex codebase, your entire workspace freezes. With the
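The post contrasts sequential dispatch, where one slow or crashed agent blocks the loop, with running agents in parallel. A minimal asyncio sketch of the parallel version follows; `run_agent` is a hypothetical stand-in for launching a real agent process, and the per-agent timeout and exception handling are what keep one failure from freezing the rest.

```python
import asyncio


async def run_agent(name: str, seconds: float, fail: bool = False) -> str:
    """Hypothetical stand-in for one agent process."""
    await asyncio.sleep(seconds)
    if fail:
        raise RuntimeError(f"{name} crashed")
    return f"{name} done"


async def dispatch(tasks: dict[str, tuple[float, bool]], timeout: float) -> dict[str, str]:
    """Run all agents concurrently; a crash or timeout affects only that agent."""

    async def guarded(name: str, seconds: float, fail: bool) -> str:
        try:
            return await asyncio.wait_for(run_agent(name, seconds, fail), timeout)
        except asyncio.TimeoutError:
            return f"{name} timed out"
        except RuntimeError as exc:
            return f"error: {exc}"

    names = list(tasks)
    results = await asyncio.gather(*(guarded(n, *tasks[n]) for n in names))
    return dict(zip(names, results))


if __name__ == "__main__":
    jobs = {"a": (0.01, False), "b": (0.01, True), "c": (1.0, False)}
    print(asyncio.run(dispatch(jobs, timeout=0.1)))
```

Total wall time is bounded by the timeout rather than by the sum of all agents' run times.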
Omitted 5 additional model-release items from the main section; see raw data and source-specific sections below.
Developer Tools
🟢 Recently I set up my Hermes Agent persona and I think I crushed it - score 31
Sources: reddit/r/AIAgents
I've been tweaking my Hermes Agent setup (persona, profiles, the works), and I think I finally made it perfect. I wanted to create a perfect personalized AI. Well, it worked... maybe a little too well. The Conflict: I created a custom skill called prompt-engineer. I tried to test it in a chat i
🟢 alpic-ai/skybridge - Skybridge is a full-stack TypeScript framework for MCP Apps and ChatGPT Apps. Type-safe. React-powered. Platform-agnostic. - score 27
Sources: github_trending
Skybridge is a full-stack TypeScript framework for MCP Apps and ChatGPT Apps. Type-safe. React-powered. Platform-agnostic.
🟢 What to use for Sign Language Recognition [R] - score 25
Sources: reddit/r/MachineLearning
Hi everyone, I'm finishing up the proposal for my undergraduate computer science thesis on sign language recognition, specifically Filipino Sign Language, and I want to ask which architecture is best for my methodology. Right now I'm considering Mediapipe Holistic + Transformers or Media Pipe H
🟢 modelscope/FunASR - Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API. - score 18
Sources: github_trending
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
🟢 A Tiny Open-Source Self-Driving AI That Runs on a Phone [P] - score 6
Sources: reddit/r/MachineLearning
https://www.youtube.com/watch?v=rr_uS4bf0B4&feature=youtu.be trained a 7MB open-source L4 self-dri
Infrastructure & Compute
🟢 Intel b60 48gb? - score 18
Sources: reddit/r/LocalLLaMA
2k AUD for a 48GB card, it's certainly lodged itself into my brain. But there's very little in this sub about the Intel cards; there's a post from a quarter of a year ago saying to avoid them, but that's also a lifetime in this sphere. Are they really that bad? Surely my little 3060 can't be better at infere
Research Papers
🟢 Learning High-Frequency Continuous Action Chunks in Latent Space - score 20
Sources: huggingface
Modern robotic policies increasingly rely on action chunking to execute complex tasks in the physical world. While action chunking improves temporal consistency at moderate action frequencies, it becomes insufficient when the action frequency is further increased (e.g., to 60 Hz). At such high frequ
Other Signals
🟢 Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation (ICML 2026 Workshops) [R] - score 25
Sources: reddit/r/MachineLearning
Paper: https://arxiv.org/abs/2605.08172 Workshops: AI for Science & Structured Data for Health at ICML 2026 Abstract: Anatomical mesh segmentation requires models that operate directly on irregular surface geometry while remaining robust to arb
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| NangoHQ/nango | Build product integrations with AI. | 860 | typescript |
| Hmbown/CodeWhale | DeepSeek v4 coding agent in terminal | 448 | rust |
| thedotmack/claude-mem | Persistent Context Across Sessions for Every Agent - Captures everything your agent does during sessions, compresses it with AI, and injects relevant context back into future sessions. Works with Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, OpenCode + More | 352 | typescript |
| p-e-w/heretic | Fully automatic censorship removal for language models | 314 | python |
| alpic-ai/skybridge | Skybridge is a full-stack TypeScript framework for MCP Apps and ChatGPT Apps. Type-safe. React-powered. Platform-agnostic. | 52 | typescript |
| modelscope/FunASR | Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API. | 42 | python |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling | research_paper | 19 | Open |
| Rethinking VLM Representation for VLA Initialization | research_paper | 7 | Open |
| VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions | research_paper | 6 | Open |
| BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization | cs.AI | 0 | Open |
| Can LLMs Introspect? A Reality Check | cs.AI | 0 | Open |
| Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory | cs.AI | 0 | Open |
| Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions | cs.AI | 0 | Open |
| Constraint acquisition needs better benchmarks | cs.AI | 0 | Open |
| Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems | cs.AI | 0 | Open |
| Experiments in Agentic AI for Science | cs.AI | 0 | Open |
| Anchor: Mitigating Artifact Drift in Agent Benchmark Generation | cs.AI | 0 | Open |
| OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling | cs.AI | 0 | Open |
| JobBench: Aligning Agent Work With Human Will | cs.AI | 0 | Open |
| Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning | cs.AI | 0 | Open |
| ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence | cs.AI | 0 | Open |
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| AnthropicAI | New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope of any potentially destructive actions. Read more: https://www.anthropic.com/engineering/how-we-conta Post |
| GoogleDeepMind | Our Gemini for Science tools could help scientists unlock their next breakthrough. 🧬 Post |
| GoogleDeepMind | SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport. That's why we're partnering with @OpenAI, @ElevenLabs and Kakao to add SynthID watermarking to their models, accelerating the industry-wide momentum we started with @NVIDIA. Post |
| xai | Thank you so much for all the feedback on the Grok Build Beta. Some of you reported hitting limits quickly. Our team found areas to improve caching, so we've reset Grok Build usage limits for all accounts. Please keep sharing feedback - the team is here to help. Post |
Repeated From Recent Briefings
- Lum1104/Understand-Anything - Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
- colbymchenry/codegraph - Pre-indexed code knowledge graph for Claude Code, Codex, Gemini, Cursor, OpenCode, AntiGravity, Kiro, and Hermes Agent: fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
- Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs. NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats - first seen 2026-05-07
- D^2-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing - first seen 2026-05-26
- rohitg00/ai-engineering-from-scratch - Learn it. Build it. Ship it for others. - first seen 2026-05-21
- anthropics/knowledge-work-plugins - Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork - first seen 2026-05-25
- NousResearch/hermes-agent - The agent that grows with you - first seen 2026-05-11
- farion1231/cc-switch - A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
- earendil-works/pi - AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods - first seen 2026-05-09
- garrytan/gstack - Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA - first seen 2026-05-12
- ... plus 144 more repeated items in processed data