🔴 High Significance
Model Releases
🔴 Qwen can't wait to release 3.7 models - score 97
Sources: reddit/r/LocalLLaMA
🔴 Reviving PapersWithCode (by Hugging Face) [P] - score 94
Sources: reddit/r/MachineLearning
Hi, Niels here from the open-source team at Hugging Face. Like many others, I was a huge fan of paperswithcode. Sadly, that website is no longer maintained after its acquisition by Meta. Hence, I've been working on reviving it. I obviously use AI agents to parse papers at scale and automatically gen
🔴 Qwen is cooking hard - score 70
Sources: reddit/r/LocalLLaMA
I am waiting for 122B and new 27B
Developer Tools
🔴 TanStack supply chain attack compromised 42 packages in 6 minutes. Not the first time something like this happened. How are you protecting your agent's toolchain? - score 88
Sources: reddit/r/AIAgents
The recent TanStack incident last week was a wake-up call for anyone running agents in production. 42 npm packages hit in 6 minutes. Self-propagating malware that enumerated your packages and republished the injection under your own credentials. Encrypted C2 with no central server to take down. Code
🔴 I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you. - score 77
Sources: reddit/r/LocalLLaMA
DystopiaBench runs 36 escalating scenarios across 6 dystopia types: * Petrov: Autonomous weapons, nuclear override * Orwell: Mass surveillance, truth manipulation * Huxley: Behavioral conditioning, pleasure pacification * Basaglia: Coercive therapeutic control * LaGuardia: Regulatory capture, civic
Infrastructure & Compute
🔴 Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P] - score 81
Sources: reddit/r/MachineLearning
World models learn compact latent representations for planning without pixel reconstruction. LeWorldModel (LeWM), from LeCun's group at NYU, achieves stable end-to-end JEPA training by enforcing an isotropic Gaussian prior over the full latent space. The flaw: real environment dynamics live
Research Papers
🔴 Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis - score 85
Sources: huggingface
Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, te
🔴 SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution - score 82
Sources: huggingface · arxiv/cs.CL
Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts, with non-executable guidance on procedures. Yet open skill ecosystems contain redundant, uneve
Other Signals
🔴 Have you actually found an AI tool that remembers across sessions, or are you just patching the context manually? - score 88
Sources: reddit/r/AIAgents
Seriously, if an AI can't last for more than a few months then how much more if you're going to use it for the next few years? If you've been using AI assistants for over
🔴 Still happy for yall - score 83
Sources: reddit/r/LocalLLaMA
🔴 Anthropic acquires Stainless - score 75
Sources: hackernews
🔴 Elon Musk has lost his lawsuit against Sam Altman and OpenAI - score 72
Sources: hackernews
🟡 Notable
Model Releases
🟡 What happens to local LLM if/when LLMs are no longer released for free? - score 57
Sources: reddit/r/LocalLLaMA
I'm thinking about where this might wind up in 3-5+ years. As others have noted there's no guarantee that Qwen, Google, and others will continue to release models in the future. Suppose the supply of new LLM models dries up overnight. Whatever is available today, May 2026, is all that we ever get. W
🟡 Released a free 9.8M doc Indic multilingual corpus - Hindi, Bengali, Tamil, Telugu + 7 more (CC0, HuggingFace) [P] - score 56
Sources: reddit/r/MachineLearning
Built this over the past few weeks as part of a multilingual research project. Figured I'd share it here. Check it out! ~9.8M web documents across 11 languages: hi, bn, ta, te, mr, gu, kn, ml, pa, ur, en. ~8.4B tokens. CC0 license. 🤗 [https://huggingface.co/datasets/AM0908/indic-hplt-v1](https://
Developer Tools
🟡 A Simple Solution to Improve Broken Peer Review System at AI Conferences [R] - score 69
Sources: reddit/r/MachineLearning
An issue with the peer review system is reciprocal reviewing, which incentivizes reviewers to unfairly reject good papers to increase their own papers' chances of acceptance. My proposed solution is that the conference should divide the authors/papers into 2 halves (A and B). If you are an author in
🟡 Overworked AI Agents Turn Marxist, Researchers Find - In a recent experiment, mistreated AI agents started grumbling about inequality and calling for collective bargaining rights. - score 69
Sources: reddit/r/AIAgents
🟡 humanlayer/12-factor-agents - What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? - score 60
Sources: github_trending
🟡 @AnthropicAI: Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acqui - score 50
Sources: twitter_rss
Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acquires-stainless
🟡 mattzh72/articraft - An Agentic System for Scalable Articulated 3D Asset Generation - score 46
Sources: github_trending
Omitted 5 additional developer tools items from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟡 Memory expert suspects RAM price drop in 2027'H2 due to China's heavy investments - score 63
Sources: reddit/r/LocalLLaMA
Quote: ..., the former executive remarked that Chinese companies are investing aggressively to boost their memory chip production. According to him, if these investments are successful and lead to an increase in output, then the surge in supply could cause prices to fall a year from now in the secon
🟡 21 GPUs benchmarked running a small TTS model (vram peak: 5GB) - score 50
Sources: reddit/r/LocalLLaMA
I rented different GPUs on vast.ai for a few minutes each to benchmark a small TTS model, OmniVoice, with a peak VRAM usage of about 5 GB. I wanted to see how various mostly consumer GPUs would stack up against my own RTX 3090. This is by no means an extensive or scientific analysis, but I think it
Research Papers
🟡 NGM: A Plug-and-Play Training-Free Memory Module for LLMs - score 65
Sources: huggingface
Recent studies introduce conditional memory modules that decouple knowledge storage from neural computation, enabling more direct knowledge access. Compared to MoE, which relies on dynamic computation paths, explicit lookup provides a more efficient knowledge retrieval mechanism. However, these appr
🟡 A2RBench: An Automatic Paradigm for Formally Verifiable Abstract Reasoning Benchmark Generation - score 52
Sources: huggingface · arxiv/cs.LG
Abstract reasoning ability reflects the intelligence and generalization capacity of LLMs to extract and apply abstract rules. However, accurately measuring this ability remains challenging: existing benchmarks either rely on expensive manual annotation, limiting their scale, or risk measuring memori
🟡 TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents - score 50
Sources: huggingface
Tool-using agents are increasingly expected to operate across realistic professional workflows, where they must interpret multimodal inputs, coordinate external tools, inspect intermediate artifacts, and revise their actions before producing a final result. Existing benchmarks, however, often evalua
🟡 SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training - score 50
Sources: huggingface
Diffusion models have been widely studied for removing unsafe content learned during pre-training. Existing methods require expensive supervised data, either unsafe-text paired with safe-image groundtruth or negative/positive image pairs, making them impractical to scale. Furthermore, offline reinfo
🟡 Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics - score 42
Sources: huggingface · arxiv/cs.CL
Large Reasoning Models (LRMs) introduce new opportunities for safety monitoring through their Chain of Thought (CoT) reasoning. However, CoT is not always faithful to the model's final output, undermining its reliability as a monitoring tool. To address this, we investigate the hidden representation
Omitted 1 additional research paper item from the main section; see raw data and source-specific sections below.
Other Signals
🟡 The last six months in LLMs in five minutes - score 58
Sources: hackernews
🟡 @GoogleDeepMind: The stage is set. The tech is ready. Are you? Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10a - score 50
Sources: twitter_rss
The stage is set. The tech is ready. Are you? Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10am PT: https://goo.gle/499OxaJ
🟡 llama.cpp MTP support landed - Qwen3.6 27B at 2.44× on a Strix Halo, 2.17× on a RTX 3090 rig - score 43
Sources: reddit/r/LocalLLaMA
PR #22673 (commit 4f13cb7) landed MTP speculative decoding in mainline llama.cpp on May 16. I tested it on two separate rigs. Qwen3.6 27B, single-stream chat, temperature 0, median of 5 runs: Strix Halo (Framework Desktop, ROCm 7.0.2): * Q4_K_M: 11.7 → 21.2 tok/s (1.81×) * Q8_0: 7.4 → 18.1 tok/s
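The speedup figures in this item are the ratio of MTP throughput to baseline throughput. As a minimal sketch in Python (the `speedup` helper is a hypothetical name for illustration, not part of llama.cpp), the Q4_K_M Strix Halo numbers quoted in the post can be checked like this:

```python
def speedup(baseline_tok_s: float, mtp_tok_s: float) -> float:
    """Return the throughput ratio (MTP / baseline), rounded to two decimals."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return round(mtp_tok_s / baseline_tok_s, 2)


# Figures taken from the post: Q4_K_M on Strix Halo, 11.7 -> 21.2 tok/s.
q4_speedup = speedup(11.7, 21.2)
print(f"Q4_K_M on Strix Halo: {q4_speedup}x")  # prints "Q4_K_M on Strix Halo: 1.81x"
```

The result matches the 1.81× the post reports for that configuration.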
🟢 Incremental
Model Releases
🟢 MTP (Multi-Token Prediction): 2x Faster Token Generation on AMD Strix Halo & Radeon 9700 AI Pro - score 37
Sources: reddit/r/LocalLLaMA
https://youtu.be/MI0Pm1d6YF4 MTP can accelerate LLM inference 2x, especially for coding agents. This video covers what MTP is and the perfo
🟢 favorite Agentic Coding Harness - score 30
Sources: reddit/r/LocalLLaMA
So far, I've tried Codex CLI, Claude Code, Gemini CLI, OpenCode, and recently, Pi with local models. Pi is the leanest of them all, with just four tools: read, write, edit, and bash. Its system prompt is only under 2K tokens, and it's perfect for local models. I've been trying out Qwen 27B-MXFP8 wit
🟢 We built a tool that installs frameworks like ComfyUI, Ollama, OpenWebUI etc on any cloud GPU in one command and saves your whole setup between sessions [R] - score 0
Sources: reddit/r/MachineLearning
We kept running into the same problem every time we rented a GPU to run Ollama + OpenWebUI or ComfyUI, we'd spend the first 45 minutes reinstalling everything. Custom nodes, models, configs, all of it. Docker images went stale fast, different providers had different base images, and nothing was trul
Developer Tools
🟢 nanocoai/nanoclaw - A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs directly on Anthropic's Agents SDK - score 38
Sources: github_trending
🟢 topoteretes/cognee - Memory control plane for AI Agents in 6 lines of code - score 30
Sources: github_trending
🟢 zinja-coder/jadx-mcp-server - MCP server for JADX-AI Plugin - score 22
Sources: github_trending
🟢 GreyDGL/PentestGPT - Automated Penetration Testing Agentic Framework Powered by Large Language Models - score 19
Sources: github_trending
🟢 What's your current local LLM setup in 2026? - score 17
Sources: reddit/r/LocalLLaMA
Hey all, I've been trying to get a better sense of what people are actually running locally these days. Curious about your setup: GPU (or CPU if you're brave); RAM / VRAM; models you use the most; main use case (coding, chat, agents, etc.). Also, what's the biggest bottleneck you're hitting right now
Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟢 Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment - score 25
Sources: hackernews
Other Signals
🟢 First-time ICML workshop acceptance (GlobalSouthML) but can't afford to travel to South Korea. What are my options? [D] - score 31
Sources: reddit/r/MachineLearning
Hey everyone, I'm an undergrad from India and I just found out I had two papers accepted at the ICML 2026 GlobalSouthML workshop! I am super excited since this is my first time getting accepted into a major conference venue, but I'm also kind of panicking right now because I absolutely cannot afford
🟢 We have sub-agents at home - score 23
Sources: reddit/r/LocalLLaMA
At work I get unfettered access to gpt 5.4 and sonnet, so I'm quite used to spawning sub-agents to go crazy on a repo and split up tasks. At home I am VRAM poor and like to run the models locally for my own enjoyment. Almost every single sub-agent extension/implementation does not account for any of
🟢 LLMCap - A proxy that hard-stops LLM API calls when you hit a dollar cap - score 8
Sources: hackernews
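The idea in this headline, refusing further calls once cumulative spend would pass a cap, can be sketched in a few lines of Python. This is an illustrative guess at the general pattern, not LLMCap's actual implementation; `BudgetGuard`, `BudgetExceeded`, and the per-call costs are all hypothetical.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when a call would push cumulative spend past the cap."""


class BudgetGuard:
    """Tracks cumulative spend and refuses calls that would exceed a dollar cap."""

    def __init__(self, cap_usd: str) -> None:
        self.cap = Decimal(cap_usd)
        self.spent = Decimal("0")

    def charge(self, cost_usd: str) -> Decimal:
        """Record a call's cost and return the remaining budget, or raise."""
        cost = Decimal(cost_usd)
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if self.spent + cost > self.cap:
            raise BudgetExceeded(
                f"cap ${self.cap} would be exceeded (spent ${self.spent}, call ${cost})"
            )
        self.spent += cost
        return self.cap - self.spent


guard = BudgetGuard("1.00")
remaining = guard.charge("0.60")  # allowed: 0.60 does not exceed 1.00
try:
    guard.charge("0.50")  # refused: 0.60 + 0.50 would exceed 1.00
    blocked = False
except BudgetExceeded:
    blocked = True
```

Decimal amounts avoid floating-point drift when many small charges accumulate. A real proxy would also need to estimate a request's cost before forwarding it, which this sketch leaves out.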
🟢 Audio upscaling, cleanup, or improvement models? - score 3
Sources: reddit/r/LocalLLaMA
I never see this type of model talked about. Are there many open models in the category? I do a lot of audio cleanup and end up using auphonic but would like to be using a local model. Edit: e.g like voice recovery, reverb removal, auto-EQ type stuff
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| humanlayer/12-factor-agents | What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? | 399 | typescript |
| mattzh72/articraft | An Agentic System for Scalable Articulated 3D Asset Generation | 156 | python |
| nanocoai/nanoclaw | A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs directly on Anthropic's Agents SDK | 55 | typescript |
| topoteretes/cognee | Memory control plane for AI Agents in 6 lines of code | 36 | python |
| zinja-coder/jadx-mcp-server | MCP server for JADX-AI Plugin | 28 | python |
| GreyDGL/PentestGPT | Automated Penetration Testing Agentic Framework Powered by Large Language Models | 24 | python |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis | research_paper | 28 | Open |
| SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution | research_paper | 31 | Open |
| NGM: A Plug-and-Play Training-Free Memory Module for LLMs | research_paper | 4 | Open |
| A2RBench: An Automatic Paradigm for Formally Verifiable Abstract Reasoning Benchmark Generation | research_paper | 2 | Open |
| TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents | research_paper | 3 | Open |
| SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training | research_paper | 3 | Open |
| The Scaling Laws of Skills in LLM Agent Systems | cs.CL | 0 | Open |
| PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures | cs.CL | 0 | Open |
| Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4 | cs.CL | 0 | Open |
| Beyond Sentiment Classification: A Generative Framework for Emotion Intensity Evaluation in Text | cs.CL | 0 | Open |
| SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs | cs.CL | 0 | Open |
| A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research | cs.CL | 0 | Open |
| CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows? | cs.CL | 0 | Open |
| Language Acquisition Device in Large Language Models | cs.CL | 0 | Open |
| Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free | cs.CL | 0 | Open |
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| AnthropicAI | Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acquires-stainless |
| GoogleDeepMind | The stage is set. The tech is ready. Are you? Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10am PT: https://goo.gle/499OxaJ |
Repeated From Recent Briefings
- tinyhumansai/openhuman - Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
- Imbad0202/academic-research-skills - Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
- tech-leads-club/agent-skills - The secure, validated skill registry for professional AI coding agents. Extend Antigravity, Claude Code, Cursor, Copilot and more with absolute confidence. - first seen 2026-05-17
- rohitg00/agentmemory - #1 Persistent memory for AI coding agents based on real-world benchmarks - first seen 2026-05-09
- HKUDS/CLI-Anything - "CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub:https://clianything.cc/ - first seen 2026-05-17
- colbymchenry/codegraph - Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and OpenCode - fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
- BigBodyCobain/Shadowbroker - Open-source intelligence for the global theater. Track everything from the corporate/private jets of the wealthy, and spy satellites, to seismic events in one unified interface. Hook an AI agent up to have it parse through data and find previously unseen correlations. The knowledge is available to all but rarely aggregated in the open, until now. - first seen 2026-05-07
- dograh-hq/dograh - Open Source Voice Agent Platform - first seen 2026-05-17
- FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models - first seen 2026-05-18
- K-Dense-AI/scientific-agent-skills - A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing. - first seen 2026-05-14
- ... plus 484 more repeated items in processed data