AI Watchtower

🔴 High Significance

Developer Tools

🔴 millionco/react-doctor — Your agent writes bad React. This catches it — score 87 Sources: github_trending

🔴 One thing I didn’t expect after building AI agents for businesses. — score 83 Sources: reddit/r/AIAgents

Most companies don’t actually know where automation should begin. They usually come in asking for an AI employee, a smart assistant, or a fully autonomous agent. But after mapping their operations, the real bottleneck is often something much smaller. Things like: - leads sitting unanswered for hours -

🔴 lsdefine/GenericAgent — Self-evolving agent: grows skill tree from 3.3K-line seed, achieving full system control with 6x less token consumption — score 79 Sources: github_trending

🔴 I built a framework where multi-agent swarms are YAML files, not code. — score 74 Sources: reddit/r/AIAgents

I work on enterprise projects where you have thousands of documents, dozens of APIs, configuration dumps, and project code scattered across different systems. Last year I needed multi-agent setups to make sense of all this and kept running into the same problem: every time I wanted to change who doe

Other Signals

🔴 80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP — score 88 Sources: reddit/r/LocalLLaMA

Just wanted to share my config in hopes of helping other 12GB GPU owners achieve what I see as very respectable token generation speeds with modest VRAM. Using the latest llama.cpp build + MTP PR, I got over 80 tok/sec with 80%+ draft acceptance rate on the benchmark found here: [https://gist.github

🔴 Apple Removes 256GB M3 Ultra Mac Studio Model From Online Store — score 81 Sources: reddit/r/LocalLLaMA

Getting really worried about the M5 Ultra, given the removals so far: 512GB -> 256GB -> 96GB.

🔴 BeeLlama.cpp: advanced DFlash & TurboQuant with support of reasoning and vision. Qwen 3.6 27B Q5 with 200k context on 3090, 2-3x faster than baseline (peak 135 tps!) — score 73 Sources: reddit/r/LocalLLaMA

TL;DR New llama.cpp fork! I wanted a Windows-friendly inference to run Qwen 3.6 27B Q5 on a single RTX 3090 with speculative decoding, high context without excess quantization, and vision enabled. No option did this out of the box for me without VRAM and/or tooling issues (this was before MTP PR

🟡 Notable

Model Releases

🟡 NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing — score 65 Sources: reddit/r/LocalLLaMA

I saw this on another sub and didn't see it posted here. It looks awesome and can definitely be run locally. I guess it was released 11 days ago, but it never hit the top of my feed (which I look at way too often), so I'm posting it again. This is my take on it: Think of this as like scalable video cod

🟡 Pi and Qwen3.6 27B make setting up Archlinux really easy. — score 58 Sources: reddit/r/LocalLLaMA

Just thought I'd share this use case. I was setting up a miniPC as a home theatre with Arch Linux (it's the OS I'm most familiar with). I needed to twiddle some things and am not yet familiar with Wayland (I'm trying out Hyprland, but normally rock i3). So, I installed pi coding agent, pointed it at

🟡 More Qwen3.6-27B MTP success but on dual Mi50s — score 42 Sources: reddit/r/LocalLLaMA

TLDR: The hype is real! 1.5x speedup. Up to 2x speedup with tensor parallelism! After reading the PR I immediately hunted for MTP-compatible Q4_1 quants (they offer a small speedup on these compute-lacking older cards) but couldn't find any. Luckily I came across [this](https://www.reddit.com/r

Developer Tools

🟡 What is an average publication outcome for an ML PhD? [D] — score 69 Sources: reddit/r/MachineLearning

I know publication count is not everything, and quality, contribution, advisor/lab culture, subfield, and luck all matter a lot. But to make the comparison easier, I’m curious about the publication-count side specifically. For an ML PhD, what would you consider an average publication outcome by grad

🟡 openai/codex — Lightweight coding agent that runs in your terminal — score 69 Sources: github_trending

🟡 heygen-com/hyperframes — Write HTML. Render video. Built for agents. — score 67 Sources: github_trending

🟡 hesreallyhim/awesome-claude-code — A curated list of awesome skills, hooks, slash-commands, agent orchestrators, applications, and plugins for Claude Code by Anthropic — score 58 Sources: github_trending

🟡 rowboatlabs/rowboat — Open-source AI coworker, with memory — score 55 Sources: github_trending

Omitted 2 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 sgl-project/sglang — SGLang is a high-performance serving framework for large language models and multimodal models. — score 58 Sources: github_trending

Other Signals

🟡 Running Minimax 2.7 at 100k context on strix halo — score 50 Sources: reddit/r/LocalLLaMA

Just wanted to share because it took me a lot of tweaking to get here: llama-server -hf unsloth/MiniMax-M2.7-GGUF:UD-IQ3_XXS --temp 1.0 --top-k 40 --top-p 0.95 --host 0.0.0.0 --port 8080 -c 100000 -fa on -ngl 999 --no-context-shift -fit off --no-mmap -np 2 --kv-unified --cache-ram 0 -b 1024 -ub 1024

🟡 LLMs corrupt your documents when you delegate — score 50 Sources: hackernews

🟡 EEML 2026 summer school [D] — score 44 Sources: reddit/r/MachineLearning

Has anyone been accepted to the EEML 2026 summer school?

🟢 Incremental

Model Releases

🟢 Building an AI teacher-assistance software, assistance needed. — score 39 Sources: reddit/r/AIAgents

Ok, so I have multiple school teachers in my family, so I have some exposure to the problems they face (in teaching, obviously, idc about admin stuff). So I thought of building an AI worksheet generator (idea under development). Claude helped me build a beautiful backend, and through it, I discovered

🟢 Would a community-driven AI agent lab help people actually ship agents? — score 39 Sources: reddit/r/AIAgents

I’m validating an idea and would like honest criticism from people building or studying AI agents. A pattern I keep seeing: people test LangChain, CrewAI, AutoGen, OpenAI tools, Claude, local LLMs, n8n, MCP servers, etc., but very few actually ship a working agent into a real workflow. My hypothesis

🟢 Exactly a year ago, I started working on an MCP server I launched on reddit that became by far my most active open source project! — score 35 Sources: reddit/r/LocalLLaMA

This isn't an advertisement, and it's very much local and open - I already don't have enough time to keep up with the existing pull requests and issues... just a fond look back on how much this space has grown and matured in the past year. Shit was the wild west back then. Nowadays I can run gemma4

🟢 Gemini API File Search is now multimodal — score 30 Sources: hackernews

🟢 I am overwhelmed by Harnesses — score 27 Sources: reddit/r/LocalLLaMA

What do I choose? They all have their good points, but then some features don't work and I end up breaking more with Claude Code. Is there one harness out there that rules them all for llama.cpp?

Omitted 2 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟢 Now that many companies have started integrating agents into their operations, is reliability still in question? — score 39 Sources: reddit/r/AIAgents

Most companies are still in beta and, for many reasons, are rolling out AI-integrated features to only a limited set of customers. I'm trying to figure out how companies are going to keep track of whether the system has been reliable or not. Any teams or folks out there

🟢 LangChain vs custom wrappers, when did you realize you needed to drop the framework? — score 39 Sources: reddit/r/AIAgents

When I first started messing around with LLM agents, LangChain seemed like absolute magic. It felt like I could hook up memory, tools, and chains in five lines of code. But over the last few weeks of building something slightly more complex, it’s been driving me crazy. The abstractions are so deep t

🟢 Devs building agents... what's actually breaking for you in production? — score 39 Sources: reddit/r/AIAgents

I've been going deep on prompt engineering as a control mechanism for agents and I'm working on something that makes certain behaviors more explicit and deterministic rather than relying on instruction following. Before I narrow down where to focus, I want to hear from people actually in the trenche

🟢 Most AI workflows drift because state slowly becomes implicit. — score 39 Sources: reddit/r/AIAgents

Most AI workflow systems drift over time because state slowly becomes implicit. Not because the models fail, but because summaries mutate, assumptions harden, artifacts lose provenance, and inference becomes impossible to inspect afterward. We’ve been experimenting with: * explicit continui

🟢 vellum-ai/vellum-assistant — A personal AI assistant that evolves with you. Memory, personality, proactive reach-outs — across macOS, Telegram, and Slack. — score 37 Sources: github_trending

Omitted 7 additional developer tools items from the main section; see raw data and source-specific sections below.

Enterprise Adoption

🟢 Gen Z Resentment Toward AI Grows as Adoption Stagnates and Workplace Fears Mount — score 10 Sources: hackernews

Other Signals

🟢 The gap between knowing something and actually understanding it — AI accelerated my learning curve — score 19 Sources: reddit/r/LocalLLaMA

I've been experimenting with setting up local LLMs lately, and here's what hit me hard: Just because it's cheap to build something doesn't mean you should. If a compatible tool already exists for your use case, use it first. Only roll your own once you've confirmed the existing option falls short. I

🟢 Anyone trying to submit to the ICML FM4LS workshop but noticed the link closed early? [D] — score 6 Sources: reddit/r/MachineLearning

I was trying to submit to the ICML FM4LS workshop but noticed that OpenReview is no longer accepting submissions, although the deadline listed on the website is end of day May 9th AoE. Was there any communication that

🟢 Homelab setup — score 4 Sources: reddit/r/LocalLLaMA

Hi everyone, I've been running local models on a MacBook Pro M3 Max with 128GB RAM for a while, and I've recently been thinking about improving my setup. What would make more sense with a ~7-8K budget? 1- Another MBP (M5 Max) with 128GB, then set up an Exo cluster with my M3 for a total of 256GB

📈 Trending Repos

Repo	Description	Stars Today	Language
millionco/react-doctor	Your agent writes bad React. This catches it	806	typescript
lsdefine/GenericAgent	Self-evolving agent: grows skill tree from 3.3K-line seed, achieving full system control with 6x less token consumption	538	python
openai/codex	Lightweight coding agent that runs in your terminal	383	rust
heygen-com/hyperframes	Write HTML. Render video. Built for agents.	345	typescript
sgl-project/sglang	SGLang is a high-performance serving framework for large language models and multimodal models.	153	python
hesreallyhim/awesome-claude-code	A curated list of awesome skills, hooks, slash-commands, agent orchestrators, applications, and plugins for Claude Code by Anthropic	153	python
rowboatlabs/rowboat	Open-source AI coworker, with memory	144	typescript
jingyaogong/minimind	🧠「大模型」2小时完全从0训练64M的小参数LLM！Train a 64M-parameter LLM from scratch in just 2h!	112	python
HKUDS/ViMax	"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"	108	python
vellum-ai/vellum-assistant	A personal AI assistant that evolves with you. Memory, personality, proactive reach-outs — across macOS, Telegram, and Slack.	54	typescript

Repeated From Recent Briefings

anthropics/financial-services - first seen 2026-05-07
Hmbown/DeepSeek-TUI — Coding agent for DeepSeek models that runs in your terminal - first seen 2026-05-02
Shel Silverstein predicts LLMs (and their hallucinations), circa 1981 - first seen 2026-05-09
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction - first seen 2026-05-09
farion1231/cc-switch — A cross-platform desktop All-in-One assistant tool for Claude Code, Codex, OpenCode, openclaw & Gemini CLI. - first seen 2026-05-08
My experience interviewing with Huawei Vancouver for an ML research role: strong mismatch between how it was pitched and how it was evaluated [D] - first seen 2026-05-09
datawhalechina/hello-agents — 📚 《从零开始构建智能体》——从零开始的智能体原理与实践教程 - first seen 2026-05-09
A recent experience with ChatGPT 5.5 Pro - first seen 2026-05-09
Audio-Visual Intelligence in Large Foundation Models - first seen 2026-05-09
HKUDS/AI-Trader — "AI-Trader: 100% Fully-Automated Agent-Native Trading" - first seen 2026-05-02
... plus 36 more repeated items in processed data

AI Watchtower Briefing — 2026-05-10
