AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Qwen can't wait to release 3.7 models — score 97 Sources: reddit/r/LocalLLaMA

🔴 Reviving PapersWithCode (by Hugging Face) [P] — score 94 Sources: reddit/r/MachineLearning

Hi, Niels here from the open-source team at Hugging Face. Like many others, I was a huge fan of paperswithcode. Sadly, that website is no longer maintained after its acquisition by Meta. Hence, I've been working on reviving it. I obviously use AI agents to parse papers at scale and automatically gen

🔴 Qwen is cooking hard — score 70 Sources: reddit/r/LocalLLaMA

I am waiting for 122B and new 27B

Developer Tools

🔴 TanStack supply chain attack compromised 42 packages in 6 minutes. Not the first time something like this has happened. How are you protecting your agent's toolchain? — score 88 Sources: reddit/r/AIAgents

The TanStack incident last week was a wake-up call for anyone running agents in production. 42 npm packages hit in 6 minutes. Self-propagating malware that enumerated your packages and republished the injection under your own credentials. Encrypted C2 with no central server to take down. Code
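
One common mitigation for this class of attack is to pin exact dependency versions together with content hashes, as npm's package-lock.json does with its `integrity` field (Subresource Integrity strings such as `sha512-<base64 digest>`), and to reject any tarball whose hash does not match. The sketch below shows only that check; `verify_integrity` is a hypothetical helper, not an npm API, and pinning helps only if the recorded hash predates the compromise.

```python
import base64
import hashlib

# Hash algorithms permitted in Subresource Integrity (SRI) strings.
SUPPORTED = ("sha256", "sha384", "sha512")


def verify_integrity(data: bytes, integrity: str) -> bool:
    """Return True if `data` matches a hash in an SRI string like 'sha512-<base64>'.

    Simplified: accepts a match on any listed hash and ignores SRI '?options'.
    """
    for token in integrity.split():  # SRI allows several space-separated hashes
        algo, sep, expected = token.partition("-")
        if not sep or algo not in SUPPORTED:
            continue  # skip malformed or unknown entries instead of trusting them
        actual = base64.b64encode(hashlib.new(algo, data).digest()).decode("ascii")
        if actual == expected:
            return True
    return False


if __name__ == "__main__":
    tarball = b"example package contents"
    pinned = "sha512-" + base64.b64encode(hashlib.sha512(tarball).digest()).decode("ascii")
    print(verify_integrity(tarball, pinned))         # True
    print(verify_integrity(tarball + b"!", pinned))  # False
```

A mismatch should abort the install rather than fall back to fetching whatever version the registry currently serves.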

🔴 I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you. — score 77 Sources: reddit/r/LocalLLaMA

DystopiaBench runs 36 escalating scenarios across 6 dystopia types:
* Petrov: Autonomous weapons, nuclear override
* Orwell: Mass surveillance, truth manipulation
* Huxley: Behavioral conditioning, pleasure pacification
* Basaglia: Coercive therapeutic control
* LaGuardia: Regulatory capture, civic

Infrastructure & Compute

🔴 Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P] — score 81 Sources: reddit/r/MachineLearning

World models learn compact latent representations for planning without pixel reconstruction. LeWorldModel (LeWM), from LeCun's group at NYU, achieves stable end-to-end JEPA training by enforcing an isotropic Gaussian prior over the full latent space. The flaw: real environment dynamics live
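
The excerpt does not spell out how LeWM enforces its prior. As a rough illustration of what pulling latents toward an isotropic Gaussian can mean, the sketch below uses a simple moment-matching penalty (a swapped-in stand-in, not LeWM's or Sub-JEPA's actual regularizer): it is zero exactly when a batch's empirical mean is 0 and its empirical covariance is the identity. Matching two moments does not make a distribution Gaussian; it only penalizes the most obvious departures.

```python
from statistics import fmean


def isotropic_gaussian_penalty(batch: list[list[float]]) -> float:
    """Squared distance of a batch's mean from 0 plus of its covariance from I.

    `batch` holds n latent vectors of dimension d (n >= 2). Illustrative
    stand-in only, not the regularizer used by LeWM.
    """
    n, d = len(batch), len(batch[0])
    mean = [fmean(z[j] for z in batch) for j in range(d)]
    mean_term = sum(m * m for m in mean)
    cov_term = 0.0
    for a in range(d):
        for b in range(d):
            # Unbiased sample covariance between latent dimensions a and b.
            cov = sum((z[a] - mean[a]) * (z[b] - mean[b]) for z in batch) / (n - 1)
            target = 1.0 if a == b else 0.0  # identity covariance
            cov_term += (cov - target) ** 2
    return mean_term + cov_term


if __name__ == "__main__":
    spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    print(isotropic_gaussian_penalty(spread))   # about 0.222: each variance is 2/3, not 1
    shifted = [[x + 3.0, y] for x, y in spread]
    print(isotropic_gaussian_penalty(shifted))  # about 9.222: the mean term adds 3**2 = 9
```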

Research Papers

🔴 Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis — score 85 Sources: huggingface

Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, te

🔴 SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution — score 82 Sources: huggingface · arxiv/cs.CL

Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts with non-executable guidance on procedures. Yet open skill ecosystems contain redundant, uneve

Other Signals

🔴 Have you actually found an AI tool that remembers across sessions, or are you just patching the context manually? — score 88 Sources: reddit/r/AIAgents

Seriously, if an AI can't last more than a few months, how is it supposed to hold up if you're going to use it for the next few years? If you've been using AI assistants for over

🔴 Still happy for y'all — score 83 Sources: reddit/r/LocalLLaMA

🔴 Anthropic acquires Stainless — score 75 Sources: hackernews

🔴 Elon Musk has lost his lawsuit against Sam Altman and OpenAI — score 72 Sources: hackernews

🟡 Notable

Model Releases

🟡 What happens to local LLM if/when LLMs are no longer released for free? — score 57 Sources: reddit/r/LocalLLaMA

I’m thinking about where this might wind up in 3-5+ years. As others have noted there’s no guarantee that Qwen, Google, and others will continue to release models in the future. Suppose the supply of new LLM models dries up overnight. Whatever is available today, May 2026, is all that we ever get. W

🟡 Released a free 9.8M doc Indic multilingual corpus — Hindi, Bengali, Tamil, Telugu + 7 more (CC0, HuggingFace) [P] — score 56 Sources: reddit/r/MachineLearning

Built this over the past few weeks as part of a multilingual research project. Figured I'd share it here. Check it out! ~9.8M web documents across 11 languages — hi, bn, ta, te, mr, gu, kn, ml, pa, ur, en. ~8.4B tokens. CC0 license. 🤗 [https://huggingface.co/datasets/AM0908/indic-hplt-v1](https://

Developer Tools

🟡 A Simple Solution to Improve the Broken Peer Review System at AI Conferences [R] — score 69 Sources: reddit/r/MachineLearning

An issue with the peer review system is reciprocal reviewing, which incentivizes reviewers to unfairly reject good papers to increase their own papers' chances of acceptance. My proposed solution is that the conference should divide the authors/papers into 2 halves (A and B). If you are an author in

🟡 Overworked AI Agents Turn Marxist, Researchers Find - In a recent experiment, mistreated AI agents started grumbling about inequality and calling for collective bargaining rights. — score 69 Sources: reddit/r/AIAgents

🟡 humanlayer/12-factor-agents — What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? — score 60 Sources: github_trending

🟡 @AnthropicAI: Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acqui — score 50 Sources: twitter_rss

Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acquires-stainless

🟡 mattzh72/articraft — An Agentic System for Scalable Articulated 3D Asset Generation — score 46 Sources: github_trending

Omitted 5 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Memory expert suspects a RAM price drop in 2027 H2 due to heavy Chinese investment — score 63 Sources: reddit/r/LocalLLaMA

Quote: ..., the former executive remarked that Chinese companies are investing aggressively to boost their memory chip production. According to him, if these investments are successful and lead to an increase in output, then the surge in supply could cause prices to fall a year from now in the secon

🟡 21 GPUs benchmarked running a small TTS model (VRAM peak: 5 GB) — score 50 Sources: reddit/r/LocalLLaMA

I rented different GPUs on vast.ai for a few minutes each to benchmark a small TTS model, OmniVoice, with a peak VRAM usage of about 5 GB. I wanted to see how various mostly consumer GPUs would stack up against my own RTX 3090. This is by no means an extensive or scientific analysis, but I think it

Research Papers

🟡 NGM: A Plug-and-Play Training-Free Memory Module for LLMs — score 65 Sources: huggingface

Recent studies introduce conditional memory modules that decouple knowledge storage from neural computation, enabling more direct knowledge access. Compared to MoE, which relies on dynamic computation paths, explicit lookup provides a more efficient knowledge retrieval mechanism. However, these appr

🟡 A2RBench: An Automatic Paradigm for Formally Verifiable Abstract Reasoning Benchmark Generation — score 52 Sources: huggingface · arxiv/cs.LG

Abstract reasoning ability reflects the intelligence and generalization capacity of LLMs to extract and apply abstract rules. However, accurately measuring this ability remains challenging: existing benchmarks either rely on expensive manual annotation, limiting their scale, or risk measuring memori

🟡 TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents — score 50 Sources: huggingface

Tool-using agents are increasingly expected to operate across realistic professional workflows, where they must interpret multimodal inputs, coordinate external tools, inspect intermediate artifacts, and revise their actions before producing a final result. Existing benchmarks, however, often evalua

🟡 SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training — score 50 Sources: huggingface

Diffusion models have been widely studied for removing unsafe content learned during pre-training. Existing methods require expensive supervised data, either unsafe text paired with safe-image ground truth or negative/positive image pairs, making them impractical to scale. Furthermore, offline reinfo

🟡 Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics — score 42 Sources: huggingface · arxiv/cs.CL

Large Reasoning Models (LRMs) introduce new opportunities for safety monitoring through their Chain of Thought (CoT) reasoning. However, CoT is not always faithful to the model's final output, undermining its reliability as a monitoring tool. To address this, we investigate the hidden representation

Omitted 1 additional research paper item from the main section; see raw data and source-specific sections below.

Other Signals

🟡 The last six months in LLMs in five minutes — score 58 Sources: hackernews

🟡 @GoogleDeepMind: The stage is set. The tech is ready. Are you? 🚀 Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10a — score 50 Sources: twitter_rss

The stage is set. The tech is ready. Are you? 🚀 Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10am PT: https://goo.gle/499OxaJ

🟡 llama.cpp MTP support landed - Qwen3.6 27B at 2.44× on a Strix Halo, 2.17× on an RTX 3090 rig — score 43 Sources: reddit/r/LocalLLaMA

PR #22673 (commit 4f13cb7) landed MTP speculative decoding in mainline llama.cpp on May 16. I tested it on two separate rigs. Qwen3.6 27B, single-stream chat, temperature 0, median of 5 runs:
Strix Halo (Framework Desktop, ROCm 7.0.2):
* Q4_K_M: 11.7 → 21.2 tok/s (1.81×)
* Q8_0: 7.4 → 18.1 tok/s
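
As a quick sanity check on the Strix Halo figures above, speedup is simply MTP throughput divided by baseline throughput. The helper name `speedup` is illustrative, not from the post.

```python
def speedup(baseline_tok_s: float, mtp_tok_s: float) -> float:
    """Throughput ratio of MTP-enabled decoding to the plain baseline."""
    return mtp_tok_s / baseline_tok_s


# Strix Halo numbers quoted in the post (tok/s, baseline then MTP).
strix_halo = {"Q4_K_M": (11.7, 21.2), "Q8_0": (7.4, 18.1)}

if __name__ == "__main__":
    for quant, (base, mtp) in strix_halo.items():
        print(f"{quant}: {speedup(base, mtp):.2f}x")
```

With the rounded numbers this gives 1.81x for Q4_K_M, matching the post, and 2.45x for Q8_0, close to the headline's 2.44x for Strix Halo. The excerpt is cut off, so the small gap may just come from rounding of the displayed tok/s values.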

🟢 Incremental

Model Releases

🟢 MTP (Multi-Token Prediction): 2x Faster Token Generation on AMD Strix Halo & Radeon 9700 AI Pro — score 37 Sources: reddit/r/LocalLLaMA

https://youtu.be/MI0Pm1d6YF4 MTP can accelerate LLM inference 2x, especially for coding agents. This video covers what MTP is and the perfo
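
A common back-of-envelope model for why draft-and-verify schemes such as MTP speculative decoding help: if each of k drafted tokens is accepted independently with probability alpha, one target-model pass emits on average (1 - alpha^(k+1)) / (1 - alpha) tokens, counting the one token the target model always produces itself. The independence assumption is a simplification, and tokens per pass are only an upper bound on wall-clock speedup, since drafting and the wider verification pass are not free.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model pass in speculative decoding.

    Assumes each of the k drafted tokens is accepted independently with
    probability `alpha` (a simplification). Equals 1 + alpha + ... + alpha**k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the target's own token
    return (1 - alpha ** (k + 1)) / (1 - alpha)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(expected_tokens_per_pass(0.8, k), 3))
```

For example, alpha = 0.8 with k = 1 gives 1.8 tokens per pass, and k = 3 gives about 2.95, which is the same order as the roughly 2x gains reported for MTP.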

🟢 favorite Agentic Coding Harness — score 30 Sources: reddit/r/LocalLLaMA

So far, I’ve tried Codex CLI, Claude Code, Gemini CLI, OpenCode, and recently, Pi with local models. Pi is the leanest of them all, with just four tools: read, write, edit, and bash. Its system prompt is under 2K tokens, and it's perfect for local models. I've been trying out Qwen 27B-MXFP8 wit
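
To make the "just four tools" design concrete, here is a hypothetical sketch of how a minimal harness might expose read, write, edit, and bash to a model. The tool names come from the post; the signatures, the exactly-one-match rule for `edit`, and the `{"name": ..., "arguments": {...}}` call shape are assumptions, not Pi's actual API.

```python
import subprocess
from pathlib import Path


def read(path: str) -> str:
    """Return a file's full text."""
    return Path(path).read_text()


def write(path: str, content: str) -> str:
    """Overwrite (or create) a file."""
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"


def edit(path: str, old: str, new: str) -> str:
    """Replace one exact snippet; refuse ambiguous or missing matches."""
    text = Path(path).read_text()
    matches = text.count(old)
    if matches != 1:
        return f"error: expected exactly one match for old text, found {matches}"
    Path(path).write_text(text.replace(old, new, 1))
    return f"edited {path}"


def bash(command: str, timeout: float = 60.0) -> str:
    """Run a shell command (sandbox this in practice) and return its output."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=timeout)
    return proc.stdout + proc.stderr


TOOLS = {"read": read, "write": write, "edit": edit, "bash": bash}


def dispatch(call: dict) -> str:
    """Route a model tool call of the assumed form {'name': ..., 'arguments': {...}}."""
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"error: unknown tool {call['name']!r}"
    return tool(**call["arguments"])
```

Keeping the tool surface this small also keeps the tool schemas, and therefore the system prompt, short, which matters when a local model has a limited context window.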

🟢 We built a tool that installs frameworks like ComfyUI, Ollama, OpenWebUI, etc. on any cloud GPU in one command and saves your whole setup between sessions [R] — score 0 Sources: reddit/r/MachineLearning

We kept running into the same problem: every time we rented a GPU to run Ollama + OpenWebUI or ComfyUI, we'd spend the first 45 minutes reinstalling everything. Custom nodes, models, configs, all of it. Docker images went stale fast, different providers had different base images, and nothing was trul

Developer Tools

🟢 nanocoai/nanoclaw — A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs directly on Anthropic's Agents SDK — score 38 Sources: github_trending

🟢 topoteretes/cognee — Memory control plane for AI Agents in 6 lines of code — score 30 Sources: github_trending

🟢 zinja-coder/jadx-mcp-server — MCP server for JADX-AI Plugin — score 22 Sources: github_trending

🟢 GreyDGL/PentestGPT — Automated Penetration Testing Agentic Framework Powered by Large Language Models — score 19 Sources: github_trending

🟢 What’s your current local LLM setup in 2026? — score 17 Sources: reddit/r/LocalLLaMA

Hey all — I’ve been trying to get a better sense of what people are actually running locally these days. Curious about your setup:
* GPU (or CPU if you’re brave)
* RAM / VRAM
* Models you use the most
* Main use case (coding, chat, agents, etc.)
Also — what’s the biggest bottleneck you’re hitting right now

Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment — score 25 Sources: hackernews

Other Signals

🟢 First-time ICML workshop acceptance (GlobalSouthML) but can't afford to travel to South Korea. What are my options? [D] — score 31 Sources: reddit/r/MachineLearning

Hey everyone, I’m an undergrad from India and I just found out I had two papers accepted at the ICML 2026 GlobalSouthML workshop! I am super excited since this is my first time getting accepted into a major conference venue, but I’m also kind of panicking right now because I absolutely cannot afford

🟢 We have sub-agents at home — score 23 Sources: reddit/r/LocalLLaMA

At work I get unfettered access to GPT-5.4 and Sonnet, so I'm quite used to spawning sub-agents to go crazy on a repo and split up tasks. At home I am VRAM poor and like to run the models locally for my own enjoyment. Almost every single sub-agent extension/implementation does not account for any of

🟢 LLMCap – A proxy that hard-stops LLM API calls when you hit a dollar cap — score 8 Sources: hackernews
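
LLMCap's internals are not described here. The core idea of a hard stop can be sketched as a budget guard that a proxy consults before forwarding each request and updates after each response. The class and field names, the per-million-token pricing model, and the worst-case rule (charging `max_output_tokens` up front so one call cannot overshoot the cap) are assumptions for illustration.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call could push total spend past the configured cap."""


@dataclass
class DollarCap:
    cap_usd: float
    input_usd_per_mtok: float   # assumed price per million input tokens
    output_usd_per_mtok: float  # assumed price per million output tokens
    spent_usd: float = 0.0

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of a call under the assumed per-token pricing."""
        return (input_tokens * self.input_usd_per_mtok
                + output_tokens * self.output_usd_per_mtok) / 1_000_000

    def check(self, input_tokens: int, max_output_tokens: int) -> None:
        """Refuse the call before forwarding if its worst case could exceed the cap."""
        worst = self.cost(input_tokens, max_output_tokens)
        if self.spent_usd + worst > self.cap_usd:
            remaining = self.cap_usd - self.spent_usd
            raise BudgetExceeded(f"call could cost ${worst:.4f}; only ${remaining:.4f} left")

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the actual cost once the response's usage counts arrive."""
        self.spent_usd += self.cost(input_tokens, output_tokens)
```

A proxy would call `check` before forwarding a request, return an error to the client on `BudgetExceeded`, and call `record` with the usage reported in the provider's response.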

🟢 Audio upscaling, cleanup, or improvement models? — score 3 Sources: reddit/r/LocalLLaMA

I never see this type of model talked about. Are there many open models in the category? I do a lot of audio cleanup and end up using Auphonic but would like to use a local model. Edit: e.g., voice recovery, reverb removal, auto-EQ-type stuff.

📈 Trending Repos

Repo	Description	Stars Today	Language
humanlayer/12-factor-agents	What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?	399	typescript
mattzh72/articraft	An Agentic System for Scalable Articulated 3D Asset Generation	156	python
nanocoai/nanoclaw	A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs directly on Anthropic's Agents SDK	55	typescript
topoteretes/cognee	Memory control plane for AI Agents in 6 lines of code	36	python
zinja-coder/jadx-mcp-server	MCP server for JADX-AI Plugin	28	python
GreyDGL/PentestGPT	Automated Penetration Testing Agentic Framework Powered by Large Language Models	24	python

📄 New Papers

Title	Category	Hotness
Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis	research_paper	28
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution	research_paper	31
NGM: A Plug-and-Play Training-Free Memory Module for LLMs	research_paper	4
A2RBench: An Automatic Paradigm for Formally Verifiable Abstract Reasoning Benchmark Generation	research_paper	2
TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents	research_paper	3
SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training	research_paper	3
The Scaling Laws of Skills in LLM Agent Systems	cs.CL	0
PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures	cs.CL	0
Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4	cs.CL	0
Beyond Sentiment Classification: A Generative Framework for Emotion Intensity Evaluation in Text	cs.CL	0
SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs	cs.CL	0
A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research	cs.CL	0
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?	cs.CL	0
Language Acquisition Device in Large Language Models	cs.CL	0
Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free	cs.CL	0

🐦 Twitter/X Highlights

Account	Tweet Summary
AnthropicAI	Anthropic is acquiring @stainlessapi, an SDK and MCP server platform that has powered every Anthropic SDK since the earliest days of our API. Read more: https://www.anthropic.com/news/anthropic-acquires-stainless
GoogleDeepMind	The stage is set. The tech is ready. Are you? 🚀 Join us tomorrow for #GoogleIO as we unveil the breakthroughs, tools, and innovations shaping the future of AI. Tune in live right here on @X from 10am PT: https://goo.gle/499OxaJ

Repeated From Recent Briefings

tinyhumansai/openhuman — Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
Imbad0202/academic-research-skills — Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
tech-leads-club/agent-skills — The secure, validated skill registry for professional AI coding agents. Extend Antigravity, Claude Code, Cursor, Copilot and more with absolute confidence. - first seen 2026-05-17
rohitg00/agentmemory — #1 Persistent memory for AI coding agents based on real-world benchmarks - first seen 2026-05-09
HKUDS/CLI-Anything — "CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub:https://clianything.cc/ - first seen 2026-05-17
colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and OpenCode — fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
BigBodyCobain/Shadowbroker — Open-source intelligence for the global theater. Track everything from the corporate/private jets of the wealthy, and spy satellites, to seismic events in one unified interface. Hook an AI agent up to have it parse through data and find previously unseen correlations. The knowledge is available to all but rarely aggregated in the open, until now. - first seen 2026-05-07
dograh-hq/dograh — Open Source Voice Agent Platform - first seen 2026-05-17
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models - first seen 2026-05-18
K-Dense-AI/scientific-agent-skills — A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing. - first seen 2026-05-14
... plus 484 more repeated items in processed data

AI Watchtower Briefing — 2026-05-19
