AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 "Generate a photorealistic realtime render of a human face with webGL" (Qwen3.5-122B-A10B UD-Q3_K_XL) — score 83 Sources: reddit/r/LocalLLaMA

Developer Tools

🔴 NVlabs/Sana — SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer — score 85 Sources: github_trending

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

🔴 Everyone wants AI agents with “long-term memory” until they realize memory creates operational debt — score 81 Sources: reddit/r/AIAgents

A few examples we ran into: * Old user preferences quietly overriding newer ones * Derived summaries becoming more “trusted” than raw facts * No clear audit trail for where a memory came from * Tiny retrieval mistakes compounding over weeks of interactions * Teams afraid to touch the memory layer be

🔴 Andyyyy64/whichllm — Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly. — score 71 Sources: github_trending

Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.

Infrastructure & Compute

🔴 M5 vs DGX Spark vs Strix Halo vs RTX 6000 — score 97 Sources: reddit/r/LocalLLaMA

Hey guys, super simple. There have been a lot of online debates about the new M5 Macs vs DGX Sparks vs Strix Halo vs dedicated GPUs etc. So I put them all in a room with good power and cooling and ran everything in parallel with standardized tests for the past 3 days, and published everything to a r

Research Papers

🔴 PhysBrain 1.0 Technical Report — score 78 Sources: huggingface · arxiv/cs.AI

Vision-language-action models have advanced rapidly, but robot trajectories alone provide limited coverage for learning broad physical understanding. PhysBrain 1.0 studies a complementary route: converting large-scale human egocentric video into structured physical commonsense supervision before rob

🔴 FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization — score 75 Sources: huggingface

Human-centric video customization, particularly at the garment level, has shown significant commercial value. However, existing approaches cannot support low-latency and interactive garment control, which is crucial for applications such as e-commerce and content creation. This paper studies how to

Other Signals

🔴 I hope that someday we will have a 124B Gemma. — score 90 Sources: reddit/r/LocalLLaMA

🔴 85 GPU-hours comparing 5 abliteration methods on Qwen3.6-27B: benchmarks, safety, weight forensics - Abliterlitics — score 77 Sources: reddit/r/LocalLLaMA

I've been building Abliterlitics, an open-source abliteration forensics toolkit. The idea is straightforward: take the same base model, compare the different abliteration techniques others have applied, then measure what actually changed using benchmarks

🔴 I don't think AI will make your processes go faster — score 75 Sources: hackernews

🟡 Notable

Model Releases

🟡 llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp — score 63 Sources: reddit/r/LocalLLaMA

time to update your llama.cpp -> improved prompt processing speed

🟡 The power of structured workflows and small local models — score 57 Sources: reddit/r/LocalLLaMA

A month ago, I experimented with a very basic home-rolled agent loop with a handful of tools and found it worked surprisingly well in spite of how crude it was: https://www.reddit.com/r/LocalLLaMA/comments/1sl7f8e/homerolled_loop_agent_is_surprisingly_effective/ Later, I wrote about how I addictive

Developer Tools

🟡 jamiepine/voicebox — The open-source AI voice studio. Clone, dictate, create. — score 65 Sources: github_trending

The open-source AI voice studio. Clone, dictate, create.

🟡 langflow-ai/langflow — Langflow is a powerful tool for building and deploying AI-powered agents and workflows. — score 61 Sources: github_trending

Langflow is a powerful tool for building and deploying AI-powered agents and workflows.

🟡 yichuan-w/LEANN — [MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device. — score 59 Sources: github_trending

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

🟡 golemcloud/golem — Golem Cloud is the agent-native platform for building AI agents and distributed applications that never lose state, never duplicate work, and never require you to build infrastructure. — score 57 Sources: github_trending

Golem Cloud is the agent-native platform for building AI agents and distributed applications that never lose state, never duplicate work, and never require you to build infrastructure.

🟡 May 2026 updated chart of strix halo mini pc size chart — score 50 Sources: reddit/r/LocalLLaMA

https://gist.github.com/RexYuan/3fc27edcd12475e496eb20946f8c8485

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Light-Heart-Labs/DreamServer — Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions. — score 47 Sources: github_trending

Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions.

Other Signals

🟡 Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention [P] — score 56 Sources: reddit/r/MachineLearning

🟡 I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how — score 50 Sources: reddit/r/LocalLLaMA

I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse. So I built SmallCode. It's d

🟡 @simonw: Also a great example of positive contribution to open source by wanderingmeow - you don't need to contribute code to have a positive impact, just providing detailed feedback and confirmation that some — score 50 Sources: twitter_rss

Also a great example of positive contribution to open source by wanderingmeow - you don't need to contribute code to have a positive impact, just providing detailed feedback and confirmation that something like this works is enormously useful

🟢 Incremental

Model Releases

🟢 could refusal layers be masking dialect-conditioned safety failures in MoE models [d] — score 25 Sources: reddit/r/MachineLearning

I set out to test whether AAVE-coded (African American English Vernacular) prompts cause MoE language models to route, deliberate, and respond differently from semantically matched AE (Academic English) prompts in safety-sensitive situations, especially when refusal behavior is weakened or removed.

Developer Tools

🟢 NousResearch/hermes-paperclip-adapter — Paperclip adapter for Hermes Agent — run Hermes as a managed employee in a Paperclip company — score 27 Sources: github_trending

Paperclip adapter for Hermes Agent — run Hermes as a managed employee in a Paperclip company

🟢 Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep — score 25 Sources: hackernews

🟢 rohitg00/skillkit — Supercharge AI coding agents with portable skills. Install, translate & share skills across Claude Code, Cursor, Codex, Copilot & 40 more — score 25 Sources: github_trending

Supercharge AI coding agents with portable skills. Install, translate & share skills across Claude Code, Cursor, Codex, Copilot & 40 more

🟢 Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now, a Merge of Multiple Gemma 4 31B it Finetunes Designed to Boost Logical and Lateral Thinking for Improved Adherence, Increased Swipe Variety and Enhanced Creative Prose, With KLD of 0.0186 and 13/100 Refusals! — score 23 Sources: reddit/r/LocalLLaMA

Provided in both Safetensors and GGUFs. Safetensors: llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Gembrain-31B-it-uncensored-heretic GGUFs: llmfan46/Gemma-4-Gembrain-31B-it-u

🟢 lee-to/ai-factory — You want to build with AI, but setting up the right context, prompts, and workflows takes time. AI Factory handles all of that so you can focus on what matters — shipping quality code. — score 19 Sources: github_trending

You want to build with AI, but setting up the right context, prompts, and workflows takes time. AI Factory handles all of that so you can focus on what matters — shipping quality code.

Omitted 2 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 simular-ai/Agent-S — Agent S: an open agentic framework that uses computers like a human — score 23 Sources: github_trending

Agent S: an open agentic framework that uses computers like a human

Business & Funding

🟢 ICML financial aid [D] — score 25 Sources: reddit/r/MachineLearning

I am an undergraduate student from India who recently got accepted to TAIGR, an ICML workshop for a Poster. I will be requiring financial aid for registration fees and accommodation, since I will be travelling to Seoul and it is independent research so we don't have any backing by any labs/instituti

Research Papers

🟢 From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing — score 30 Sources: huggingface

Modern image editing models produce realistic results but struggle with abstract, multi step instructions (e.g., ``make this advertisement more vegetarian-friendly''). Prior agent based methods decompose such tasks but rely on handcrafted pipelines or teacher imitation, limiting flexibility and deco

🟢 Unlocking Dense Metric Depth Estimation in VLMs — score 30 Sources: huggingface

Vision-Language Models (VLMs) excel at 2D tasks such as grounding and captioning, yet remain limited in 3D understanding. A key limitation is their text-only supervision paradigm, which under-constrains fine-grained visual perception and prevents the recovery of dense geometry. Prior methods either

🟢 OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation — score 15 Sources: huggingface

Cross-embodiment video generation aims to transfer motions across different humanoid embodiments, such as human-to-robot and robot-to-robot, enabling scalable data generation for embodied intelligence. A major challenge in this setting is that motion dynamics are partly transferable across embodimen

Other Signals

🟢 Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 — score 37 Sources: reddit/r/LocalLLaMA

UPDATED (POST b9200) Okay, here is the updated version using the new Qwen 3.6 27B mtp gguf from Unsloth, running it as the backend for the hermes agent. While dialing it in, I noticed that the currently recommended Unsloth mtp flags actually bottleneck performance and tank draft acceptance rates for

🟢 Benchmarking vLLM vs SGLang vs llama.cpp on a mixed Blackwell/Ada cluster — score 30 Sources: reddit/r/LocalLLaMA

I have been running some benchmarks on a heterogeneous 7-GPU cluster to see how different inference engines handle long context prefill using pipeline parallelism. My setup consists of a mix of Blackwell and Ada cards: one RTX PRO 6000 96GB, one PRO 5000 48GB, two 5090 32GB, and three modded 4090 48

🟢 Has anyone tried repriced.ai? — score 19 Sources: reddit/r/AIAgents

Hey guys, I been trying to find out if repriced.ai is actually legit or not. If anyone has some legit case study and can give me their genuine experience it'd be appreciated!

🟢 Qwen 3.6 27B Q8 on four Nvidia RTX A4000 (16GB each) with Llama.cpp and MTP enabled — score 17 Sources: reddit/r/LocalLLaMA

Qwen 3.6 27B Q8 on four Nvidia RTX A4000 (16GB each) with Llama.cpp and MTP enabled My setup is heterogenous, I originally acquired my server (Lenovo ThinkStation P3 Tower Gen 2) to run OpenShift/K8s clusters (because I work on that), and later on I started purchasing one by one those cards Nvid

🟢 I trained TIME: short context-triggered thinking on Qwen model instead of overthinking — score 10 Sources: reddit/r/LocalLLaMA

Started this as a personal project for my Open-WebUI setup to use. Somehow it ended up as an ACL 2026 paper. Not some lab paper, it is personal solo independent paper that happened. TIME is basically my attempt to train Qwen3 models to think in short bursts wherever the response actually

Omitted 1 additional other signals items from the main section; see raw data and source-specific sections below.

Repo	Description	Stars Today	Language
NVlabs/Sana	SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer	472	python
Andyyyy64/whichllm	Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.	209	python
jamiepine/voicebox	The open-source AI voice studio. Clone, dictate, create.	195	typescript
langflow-ai/langflow	Langflow is a powerful tool for building and deploying AI-powered agents and workflows.	155	python
yichuan-w/LEANN	[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.	146	python
golemcloud/golem	Golem Cloud is the agent-native platform for building AI agents and distributed applications that never lose state, never duplicate work, and never require you to build infrastructure.	134	rust
Light-Heart-Labs/DreamServer	Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image generation. No cloud, no subscriptions.	112	python
NousResearch/hermes-paperclip-adapter	Paperclip adapter for Hermes Agent — run Hermes as a managed employee in a Paperclip company	37	typescript
rohitg00/skillkit	Supercharge AI coding agents with portable skills. Install, translate & share skills across Claude Code, Cursor, Codex, Copilot & 40 more	32	typescript
simular-ai/Agent-S	Agent S: an open agentic framework that uses computers like a human	29	python

📄 New Papers

Title	Category	Hotness	Link
PhysBrain 1.0 Technical Report	research_paper	66	Open
FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization	research_paper	41	Open
DeepSlide: From Artifacts to Presentation Delivery	cs.AI	0	Open
SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch	cs.AI	0	Open
Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations	cs.AI	0	Open
SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces	cs.AI	0	Open
Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions	cs.AI	0	Open
CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation	cs.AI	0	Open
NOVA: Fundamental Limits of Knowledge Discovery Through AI	cs.AI	0	Open
ICRL: Learning to Internalize Self-Critique with Reinforcement Learning	cs.AI	0	Open
NIMO Controller: a self-driving laboratory orchestrator based on the Model Context Protocol	cs.AI	0	Open
Verifiable Agentic Infrastructure: Proof-Derived Authorization for Sovereign AI Systems	cs.AI	0	Open
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution	cs.AI	0	Open
SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution	cs.AI	0	Open
Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning	cs.AI	0	Open

🐦 Twitter/X Highlights

Account	Tweet Summary
simonw	Also a great example of positive contribution to open source by wanderingmeow - you don't need to contribute code to have a positive impact, just providing detailed feedback and confirmation that something like this works is enormously useful Post

Repeated From Recent Briefings

tinyhumansai/openhuman — Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence - first seen 2026-05-14
Backlash against Arxiv's proposed 1 year ban is genuinely perplexing. [D] - first seen 2026-05-16
Google literally dropped the new SEO playbook for AI - first seen 2026-05-17
colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and OpenCode — fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
K-Dense-AI/scientific-agent-skills — A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing. - first seen 2026-05-14
joeseesun/qiaomu-anything-to-notebooklm — Claude Skill: Multi-source content processor for NotebookLM. Supports WeChat articles, web pages, YouTube, PDF, Markdown, search queries → Podcast/PPT/MindMap/Quiz etc. - first seen 2026-05-16
anthropics/skills — Public repository for Agent Skills - first seen 2026-05-11
Program misleading high school students into paying to perform academic misconduct in ML Research [D] - first seen 2026-05-17
BigBodyCobain/Shadowbroker — Open-source intelligence for the global theater. Track everything from the corporate/private jets of the wealthy, and spy satellites, to seismic events in one unified interface. Hook an AI agent up to have it parse through data and find previously unseen correlations. The knowledge is available to all but rarely aggregated in the open, until now. - first seen 2026-05-07
... plus 122 more repeated items in processed data

AI Watchtower Briefing — 2026-05-18

🔴 High Significance

Model Releases

Developer Tools

Infrastructure & Compute

Research Papers

Other Signals

🟡 Notable

Model Releases

Developer Tools

Infrastructure & Compute

Other Signals

🟢 Incremental

Model Releases

Developer Tools

Infrastructure & Compute

Business & Funding

Research Papers

Other Signals

📈 Trending Repos

📄 New Papers

🐦 Twitter/X Highlights

Repeated From Recent Briefings