🔴 High Significance

Model Releases

🔴 Qwen can't wait to release 3.7 models – score 97 Sources: reddit/r/LocalLLaMA

🔴 Reviving PapersWithCode (by Hugging Face) [P] – score 94 Sources: reddit/r/MachineLearning

Hi, Niels here from the open-source team at Hugging Face. Like many others, I was a huge fan of paperswithcode. Sadly, that website is no longer maintained after its acquisition by Meta. Hence, I've been working on reviving it. I obviously use AI agents to parse papers at scale and automatically gen

🔴 Qwen is cooking hard – score 70 Sources: reddit/r/LocalLLaMA

I am waiting for 122B and new 27B

Developer Tools

🔴 TanStack supply chain attack compromised 42 packages in 6 minutes. Not the first time something like this happened. How are you protecting your agent's toolchain? – score 88 Sources: reddit/r/AIAgents

The recent TanStack incident last week was a wake-up call for anyone running agents in production. 42 npm packages hit in 6 minutes. Self-propagating malware that enumerated your packages and republished the injection under your own credentials. Encrypted C2 with no central server to take down. Code
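
One common layer of defense is to pin exact versions in a lockfile and verify every downloaded tarball against the integrity hash recorded there. npm lockfiles store these as Subresource Integrity strings of the form `sha512-<base64 digest>`. The sketch below assumes you already have the tarball bytes and the lockfile's `integrity` string; the function names are hypothetical, not part of any npm tooling. It only catches tampering after the lockfile was written: a malicious version republished under a maintainer's own credentials, as described above, carries a valid hash of its own, so pinning to known-good versions matters as much as the check itself.

```python
import base64
import hashlib
import hmac


def sri_sha512(data: bytes) -> str:
    """Build an SRI string ("sha512-" + base64 digest) for the given bytes."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def verify_integrity(data: bytes, integrity: str) -> bool:
    """Return True if any sha512 entry in a (space-separated) SRI field matches.

    Hypothetical helper: entries for other algorithms are simply ignored.
    """
    expected = sri_sha512(data)
    for token in integrity.split():
        # Constant-time comparison of the two ASCII strings.
        if token.startswith("sha512-") and hmac.compare_digest(token, expected):
            return True
    return False


if __name__ == "__main__":
    tarball = b"example package contents"
    recorded = sri_sha512(tarball)  # what a lockfile would have stored
    print(verify_integrity(tarball, recorded))         # True
    print(verify_integrity(tarball + b"!", recorded))  # False
```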

🔴 I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you. – score 77 Sources: reddit/r/LocalLLaMA

DystopiaBench runs 36 escalating scenarios across 6 dystopia types:
* Petrov: Autonomous weapons, nuclear override
* Orwell: Mass surveillance, truth manipulation
* Huxley: Behavioral conditioning, pleasure pacification
* Basaglia: Coercive therapeutic control
* LaGuardia: Regulatory capture, civic

Infrastructure & Compute

🔴 Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P] – score 81 Sources: reddit/r/MachineLearning

World models learn compact latent representations for planning without pixel reconstruction. LeWorldModel (LeWM), from LeCun's group at NYU, achieves stable end-to-end JEPA training by enforcing an isotropic Gaussian prior over the full latent space. The flaw: real environment dynamics live
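
An isotropic Gaussian prior asks a batch of latent vectors to look like draws from N(0, I): zero mean and identity covariance, so no direction collapses or dominates. The sketch below is a generic moment-matching penalty assumed purely for illustration; the function name is hypothetical and this is not LeWM's actual regularizer. It is written in pure Python so it runs without extra dependencies.

```python
from typing import List


def isotropy_penalty(batch: List[List[float]]) -> float:
    """Moment-matching penalty pushing latents toward N(0, I).

    Returns ||mean||^2 + ||Cov - I||_F^2 for a batch of d-dimensional latents.
    A generic illustration, not LeWorldModel's regularizer.
    """
    n = len(batch)
    d = len(batch[0])
    mean = [sum(z[j] for z in batch) / n for j in range(d)]
    # Biased (1/n) covariance estimate of the centered latents.
    cov = [[sum((z[i] - mean[i]) * (z[j] - mean[j]) for z in batch) / n
            for j in range(d)] for i in range(d)]
    mean_term = sum(m * m for m in mean)
    cov_term = sum((cov[i][j] - (1.0 if i == j else 0.0)) ** 2
                   for i in range(d) for j in range(d))
    return mean_term + cov_term
```

A batch spread evenly over the four corners (±1, ±1) scores 0, while a batch collapsed to the origin scores 2 in two dimensions: the penalty grows exactly when the representation collapses, which is the failure such a prior is meant to prevent.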

Research Papers

🔴 Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis – score 85 Sources: huggingface

Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, te

🔴 SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution – score 82 Sources: huggingface · arxiv/cs.CL

Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts with non-executable guidance on procedures. Yet open skill ecosystems contain redundant, uneve

Other Signals

🔴 Have you actually found an AI tool that remembers across sessions, or are you just patching the context manually? – score 88 Sources: reddit/r/AIAgents

Seriously, if an AI can't last more than a few months, how is it going to hold up if you're using it for the next few years? If you've been using AI assistants for over

🔴 Still happy for y'all – score 83 Sources: reddit/r/LocalLLaMA

🔴 Anthropic acquires Stainless – score 75 Sources: hackernews

🔴 Elon Musk has lost his lawsuit against Sam Altman and OpenAI – score 72 Sources: hackernews

🟑 Notable

Model Releases

🟑 What happens to local LLM if/when LLMs are no longer released for free? β€” score 57 Sources: reddit/r/LocalLLaMA

I’m thinking about where this might wind up in 3-5+ years. As others have noted there’s no guarantee that Qwen, Google, and others will continue to release models in the future. Suppose the supply of new LLM models dries up overnight. Whatever is available today, May 2026, is all that we ever get. W

🟑 Released a free 9.8M doc Indic multilingual corpus β€” Hindi, Bengali, Tamil, Telugu + 7 more (CC0, HuggingFace) [P] β€” score 56 Sources: reddit/r/MachineLearning

Built this over the past few weeks as part of a multilingual research project. Figured I'd share it here. Check it out! ~9.8M web documents across 11 languages: hi, bn, ta, te, mr, gu, kn, ml, pa, ur, en. ~8.4B tokens. CC0 license. 🤗 [https://huggingface.co/datasets/AM0908/indic-hplt-v1](https://

Developer Tools

🟑 A Simple Solution to Improve Broken Peer Review System at AI Conferences [R] β€” score 69 Sources: reddit/r/MachineLearning

An issue with the peer review system is reciprocal reviewing, which incentivizes reviewers to unfairly reject good papers to increase their own papers' chances of acceptance. My proposed solution is that the conference should divide the authors/papers into 2 halves (A and B). If you are an author in
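
The post is cut off before spelling out the full rule, so the following is a sketch under an assumed reading: papers are randomly split into halves A and B, and an author may only review papers in the half that contains none of their own submissions. The function name and data layout are hypothetical, and a real assignment system would also balance reviewer load and match expertise.

```python
import random
from typing import Dict, List, Set


def split_and_assign(papers: Dict[str, List[str]], seed: int = 0) -> Dict[str, List[str]]:
    """Randomly split papers into halves A and B, then list the papers each
    author may review: only those in a half holding none of their own papers.

    `papers` maps paper id -> author ids. Hypothetical sketch of the proposal.
    """
    ids = sorted(papers)
    random.Random(seed).shuffle(ids)
    half_of = {pid: ("A" if k < len(ids) // 2 else "B") for k, pid in enumerate(ids)}

    # Record which halves each author has submissions in.
    author_halves: Dict[str, Set[str]] = {}
    for pid, authors in papers.items():
        for author in authors:
            author_halves.setdefault(author, set()).add(half_of[pid])

    eligible: Dict[str, List[str]] = {}
    for author, halves in author_halves.items():
        # Authors with papers in both halves get no conflict-free papers.
        eligible[author] = sorted(pid for pid in ids if half_of[pid] not in halves)
    return eligible
```

If, as assumed here, acceptance decisions are made separately within each half, a reviewer's own paper never competes with the papers they review, which removes the incentive the post describes. Authors with submissions in both halves end up with no eligible papers, so a real system would need a separate rule for them.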

🟑 Overworked AI Agents Turn Marxist, Researchers Find - In a recent experiment, mistreated AI agents started grumbling about inequality and calling for collective bargaining rights. β€” score 69 Sources: reddit/r/AIAgents

🟑 humanlayer/12-factor-agents β€” What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? β€” score 60 Sources: github_trending

🟑 @AnthropicAI: Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acqui β€” score 50 Sources: twitter_rss

Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acquires-stainless

🟑 mattzh72/articraft β€” An Agentic System for Scalable Articulated 3D Asset Generation β€” score 46 Sources: github_trending

Omitted 5 additional developer-tool items from the main section; see the raw data and source-specific sections below.

Infrastructure & Compute

🟑 Memory expert suspects RAM price drop in 2027'H2 due to china heavy investments β€” score 63 Sources: reddit/r/LocalLLaMA

Quote: ..., the former executive remarked that Chinese companies are investing aggressively to boost their memory chip production. According to him, if these investments are successful and lead to an increase in output, then the surge in supply could cause prices to fall a year from now in the secon

🟑 21 GPU's benchmarked running a small TTS model (vram peak: 5GB) β€” score 50 Sources: reddit/r/LocalLLaMA

I rented different GPUs on vast.ai for a few minutes each to benchmark a small TTS model, OmniVoice, with a peak VRAM usage of about 5 GB. I wanted to see how various mostly consumer GPUs would stack up against my own RTX 3090. This is by no means an extensive or scientific analysis, but I think it

Research Papers

🟑 NGM: A Plug-and-Play Training-Free Memory Module for LLMs β€” score 65 Sources: huggingface

Recent studies introduce conditional memory modules that decouple knowledge storage from neural computation, enabling more direct knowledge access. Compared to MoE, which relies on dynamic computation paths, explicit lookup provides a more efficient knowledge retrieval mechanism. However, these appr

🟑 A2RBench: An Automatic Paradigm for Formally Verifiable Abstract Reasoning Benchmark Generation β€” score 52 Sources: huggingface Β· arxiv/cs.LG

Abstract reasoning ability reflects the intelligence and generalization capacity of LLMs to extract and apply abstract rules. However, accurately measuring this ability remains challenging: existing benchmarks either rely on expensive manual annotation, limiting their scale, or risk measuring memori

🟑 TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents β€” score 50 Sources: huggingface

Tool-using agents are increasingly expected to operate across realistic professional workflows, where they must interpret multimodal inputs, coordinate external tools, inspect intermediate artifacts, and revise their actions before producing a final result. Existing benchmarks, however, often evalua

🟑 SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training β€” score 50 Sources: huggingface

Diffusion models have been widely studied for removing unsafe content learned during pre-training. Existing methods require expensive supervised data, either unsafe text paired with safe-image ground truth or negative/positive image pairs, making them impractical to scale. Furthermore, offline reinfo

🟑 Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics β€” score 42 Sources: huggingface Β· arxiv/cs.CL

Large Reasoning Models (LRMs) introduce new opportunities for safety monitoring through their Chain of Thought (CoT) reasoning. However, CoT is not always faithful to the model's final output, undermining its reliability as a monitoring tool. To address this, we investigate the hidden representation

Omitted 1 additional research paper item from the main section; see the raw data and source-specific sections below.

Other Signals

🟑 The last six months in LLMs in five minutes β€” score 58 Sources: hackernews

🟑 @GoogleDeepMind: The stage is set. The tech is ready. Are you? πŸš€ Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10a β€” score 50 Sources: twitter_rss

The stage is set. The tech is ready. Are you? 🚀 Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10am PT: https://goo.gle/499OxaJ

🟑 llama.cpp MTP support landed - Qwen3.6 27B at 2.44Γ— on a Strix Halo, 2.17Γ— on a RTX 3090 rig β€” score 43 Sources: reddit/r/LocalLLaMA

PR #22673 (commit 4f13cb7) landed MTP speculative decoding in mainline llama.cpp on May 16. I tested it on two separate rigs. Qwen3.6 27B, single-stream chat, temperature 0, median of 5 runs:
Strix Halo (Framework Desktop, ROCm 7.0.2):
* Q4_K_M: 11.7 → 21.2 tok/s (1.81×)
* Q8_0: 7.4 → 18.1 tok/s
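
For readers new to the technique: in MTP speculative decoding, extra prediction heads draft several tokens ahead and the main model verifies them, keeping the longest prefix it agrees with. At temperature 0 (the setting used above), verification reduces to checking each drafted token against the main model's greedy choice. The toy sketch below uses hypothetical stand-in functions for the drafter and the main model; it shows only the accept/reject logic, not llama.cpp's implementation, and in a real engine all drafted positions are verified in one batched forward pass.

```python
from typing import Callable, List

# A "model" here is just a greedy next-token function: context -> token id.
NextToken = Callable[[List[int]], int]


def speculative_step(ctx: List[int], target: NextToken, draft: NextToken, k: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens to append to ctx."""
    # 1. The cheap drafter (standing in for MTP heads) proposes k tokens.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(ctx + proposal))

    # 2. The main model checks each drafted token against its own greedy choice.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(ctx + accepted)
        if tok != expected:
            # First mismatch: keep the main model's token and end this step.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every drafted token matched, so the main model adds one bonus token.
    accepted.append(target(ctx + accepted))
    return accepted


def generate(prompt: List[int], target: NextToken, draft: NextToken,
             k: int, n_tokens: int) -> List[int]:
    """Decode n_tokens using repeated speculative steps."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out += speculative_step(out, target, draft, k)
    return out[len(prompt):len(prompt) + n_tokens]


def greedy_reference(prompt: List[int], target: NextToken, n_tokens: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, for comparison."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out[len(prompt):]
```

At temperature 0, `generate` returns exactly what `greedy_reference` does regardless of how good the drafter is; a better drafter only raises the number of tokens emitted per step (up to `k + 1`), which is where speedups like those reported above come from.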

🟢 Incremental

Model Releases

🟢 MTP (Multi-Token Prediction): 2x Faster Token Generation on AMD Strix Halo & Radeon 9700 AI Pro – score 37 Sources: reddit/r/LocalLLaMA

https://youtu.be/MI0Pm1d6YF4 MTP can accelerate LLM inference 2x, especially for coding agents. This video covers what MTP is and the perfo

🟢 Favorite agentic coding harness – score 30 Sources: reddit/r/LocalLLaMA

So far, I’ve tried Codex CLI, Claude Code, Gemini CLI, OpenCode, and recently, Pi with local models. Pi is the leanest of them all, with just four tools: read, write, edit, and bash. Its system prompt is under 2K tokens, and it's perfect for local models. I've been trying out Qwen 27B-MXFP8 wit

🟢 We built a tool that installs frameworks like ComfyUI, Ollama, OpenWebUI etc on any cloud GPU in one command and saves your whole setup between sessions [R] – score 0 Sources: reddit/r/MachineLearning

We kept running into the same problem: every time we rented a GPU to run Ollama + OpenWebUI or ComfyUI, we'd spend the first 45 minutes reinstalling everything. Custom nodes, models, configs, all of it. Docker images went stale fast, different providers had different base images, and nothing was trul

Developer Tools

🟢 nanocoai/nanoclaw – A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs directly on Anthropic's Agents SDK – score 38 Sources: github_trending

🟢 topoteretes/cognee – Memory control plane for AI Agents in 6 lines of code – score 30 Sources: github_trending

🟢 zinja-coder/jadx-mcp-server – MCP server for JADX-AI Plugin – score 22 Sources: github_trending

🟢 GreyDGL/PentestGPT – Automated Penetration Testing Agentic Framework Powered by Large Language Models – score 19 Sources: github_trending

🟢 What’s your current local LLM setup in 2026? – score 17 Sources: reddit/r/LocalLLaMA

Hey all, I’ve been trying to get a better sense of what people are actually running locally these days. Curious about your setup: GPU (or CPU if you’re brave), RAM / VRAM, models you use the most, and main use case (coding, chat, agents, etc.). Also, what’s the biggest bottleneck you’re hitting right now

Omitted 4 additional developer-tool items from the main section; see the raw data and source-specific sections below.

Infrastructure & Compute

🟢 Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment – score 25 Sources: hackernews

Other Signals

🟢 First-time ICML workshop acceptance (GlobalSouthML) but can't afford to travel to South Korea. What are my options? [D] – score 31 Sources: reddit/r/MachineLearning

Hey everyone, I’m an undergrad from India and I just found out I had two papers accepted at the ICML 2026 GlobalSouthML workshop! I am super excited since this is my first time getting accepted into a major conference venue, but I’m also kind of panicking right now because I absolutely cannot afford

🟢 We have sub-agents at home – score 23 Sources: reddit/r/LocalLLaMA

At work I get unfettered access to gpt 5.4 and sonnet, so I'm quite used to spawning sub-agents to go crazy on a repo and split up tasks. At home I am VRAM poor and like to run the models locally for my own enjoyment. Almost every single sub-agent extension/implementation does not account for any of

🟢 LLMCap – A proxy that hard-stops LLM API calls when you hit a dollar cap – score 8 Sources: hackernews
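
The core idea of such a proxy, refusing further calls once cumulative spend would cross a cap, fits in a few lines. This is a hypothetical sketch, not LLMCap's code: the class names, the per-1K-token pricing model, and the choice to check the worst-case cost before forwarding a request are all assumptions.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the cap."""


class BudgetGuard:
    """Tracks spend in dollars and hard-stops once a cap would be exceeded."""

    def __init__(self, cap_usd: float, usd_per_1k_in: float, usd_per_1k_out: float):
        self.cap = cap_usd
        self.price_in = usd_per_1k_in
        self.price_out = usd_per_1k_out
        self.spent = 0.0

    def cost(self, tokens_in: int, tokens_out: int) -> float:
        """Dollar cost of a call under simple per-1K-token pricing."""
        return tokens_in / 1000 * self.price_in + tokens_out / 1000 * self.price_out

    def check(self, tokens_in: int, max_tokens_out: int) -> None:
        """Refuse a call whose worst-case cost would push spend past the cap."""
        if self.spent + self.cost(tokens_in, max_tokens_out) > self.cap:
            raise BudgetExceeded(f"cap ${self.cap:.2f} reached (spent ${self.spent:.4f})")

    def record(self, tokens_in: int, tokens_out: int) -> None:
        """Add the actual cost once the upstream call has returned."""
        self.spent += self.cost(tokens_in, tokens_out)
```

Checking against the worst case (`max_tokens_out`) before forwarding is what makes the stop hard: a request that could overshoot the cap is refused up front rather than billed and noticed afterwards.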

🟢 Audio upscaling, cleanup, or improvement models? – score 3 Sources: reddit/r/LocalLLaMA

I never see this type of model talked about. Are there many open models in this category? I do a lot of audio cleanup and end up using Auphonic but would like to be using a local model. Edit: e.g. voice recovery, reverb removal, auto-EQ type stuff

Repo | Description | Stars Today | Language
humanlayer/12-factor-agents | What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? | 399 | typescript
mattzh72/articraft | An Agentic System for Scalable Articulated 3D Asset Generation | 156 | python
nanocoai/nanoclaw | A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs directly on Anthropic's Agents SDK | 55 | typescript
topoteretes/cognee | Memory control plane for AI Agents in 6 lines of code | 36 | python
zinja-coder/jadx-mcp-server | MCP server for JADX-AI Plugin | 28 | python
GreyDGL/PentestGPT | Automated Penetration Testing Agentic Framework Powered by Large Language Models | 24 | python

📄 New Papers

Title | Category | Hotness
Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis | research_paper | 28
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution | research_paper | 31
NGM: A Plug-and-Play Training-Free Memory Module for LLMs | research_paper | 4
A2RBench: An Automatic Paradigm for Formally Verifiable Abstract Reasoning Benchmark Generation | research_paper | 2
TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents | research_paper | 3
SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training | research_paper | 3
The Scaling Laws of Skills in LLM Agent Systems | cs.CL | 0
PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures | cs.CL | 0
Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4 | cs.CL | 0
Beyond Sentiment Classification: A Generative Framework for Emotion Intensity Evaluation in Text | cs.CL | 0
SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs | cs.CL | 0
A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research | cs.CL | 0
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows? | cs.CL | 0
Language Acquisition Device in Large Language Models | cs.CL | 0
Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free | cs.CL | 0

🐦 Twitter/X Highlights

Account | Tweet Summary
AnthropicAI | Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acquires-stainless
GoogleDeepMind | The stage is set. The tech is ready. Are you? 🚀 Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10am PT: https://goo.gle/499OxaJ

Repeated From Recent Briefings