AI Watchtower

🔴 High Significance

Model Releases

🔴 vLLM ROCm has been added to Lemonade as an experimental backend — score 77 Sources: reddit/r/LocalLLaMA

vLLM can run .safetensors LLMs before they are converted to GGUF and represents a new engine to explore. I personally had never tried it out until u/krishna2910-amd, u/mikkoph, and u/sa1sr1 made it as easy as running llama.cpp in Lemonade: ` lemonade backends install vllm:rocm lemonade

🔴 A recent experience with ChatGPT 5.5 Pro — score 75 Sources: hackernews

🔴 I'm kinda good at getting users for ai tools through reddit - could I make money? — score 72 Sources: reddit/r/AIAgents

So I've made and launched my own AI tools and agents before, and I've helped some of my friends too. I learned multiple Reddit post strategies a while ago that, with the right tweaking, usually get me 100+ organic users within a week or two for every project. My last project went crazy I made 2 un

Developer Tools

🔴 bytedance/UI-TARS-desktop — The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra — score 91 Sources: github_trending

🔴 datawhalechina/hello-agents — 📚 "Building Agents from Scratch": a from-scratch tutorial on agent principles and practice — score 89 Sources: github_trending

🔴 Shel Silverstein predicts LLMs (and their hallucinations), circa 1981 — score 86 Sources: reddit/r/LocalLLaMA

Ran across this cartoon / poem by accident as I was reminiscing about my favorite childhood poet, Shel Silverstein, and couldn't help thinking of LLMs, of course!

🔴 earendil-works/pi — AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods — score 86 Sources: github_trending

🔴 anomalyco/opencode — The open source coding agent. — score 84 Sources: github_trending

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

There is a lot of disdain for DGX Sparks here on the sub. And I get it. A lot of people say “It could have been great if it had been better memory bandwidth”, “SM-121 is a fake/second-class Blackwell chip”, yadda yadda. These criticisms are valid. I bought one anyway because I’m pursuing a Masters

Research Papers

🔴 Audio-Visual Intelligence in Large Foundation Models — score 85 Sources: huggingface

Audio-Visual Intelligence (AVI) has emerged as a central frontier in artificial intelligence, bridging auditory and visual modalities to enable machines that can perceive, generate, and interact in the multimodal real world. In the era of large foundation models, joint modeling of audio and vision h

🔴 Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction — score 82 Sources: huggingface · arxiv/cs.AI

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse cl

Other Signals

🔴 OpenAI’s WebRTC problem — score 92 Sources: hackernews

🟡 Notable

Model Releases

🟡 Teaching Claude Why — score 58 Sources: hackernews

🟡 Reports suggest DeepSeek is seeking $7.35 billion in funding and plans to release its V4.1 update next month. — score 50 Sources: reddit/r/LocalLLaMA

DeepSeek Reportedly Seeking to Raise Over RMB 50 Billion ($7.35 Billion), Accelerating Its Commercialization and Monetization Strategy. According to two people familiar with the matter, DeepSeek founder and CEO Liang Wenfeng plans to contribute the maximum allowable amount in the company’s first fund

🟡 @OpenAI: Chain of thought monitors are a key layer of defense against AI agent misalignment. To preserve monitorability, we avoid penalizing misaligned reasoning during RL. We found a limited amount of accide — score 50 Sources: twitter_rss

Chain of thought monitors are a key layer of defense against AI agent misalignment. To preserve monitorability, we avoid penalizing misaligned reasoning during RL. We found a limited amount of accidental CoT grading which affected released models, and are sharing our analysis. https://alignment.open

🟡 Using Claude Code: The unreasonable effectiveness of HTML — score 42 Sources: hackernews

🟡 new MoE from ai2, EMO — score 41 Sources: reddit/r/LocalLLaMA

New MoE release from Ai2: EMO, 1B active / 14B total parameters, trained on 1T tokens. The interesting thing is document-level routing: experts cluster around domains like health, news, etc. instead of surface patterns. Models: [https://huggingface.co/collections/allenai/emo](https://huggingface.co/collections/allenai

Developer Tools

🟡 CopilotKit/CopilotKit — The Frontend Stack for Agents & Generative UI. React + Angular. Makers of the AG-UI Protocol — score 64 Sources: github_trending

🟡 vercel-labs/skills — The open agent skills tool - npx skills — score 61 Sources: github_trending

🟡 colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code — fewer tokens, fewer tool calls, 100% local — score 55 Sources: github_trending

🟡 ChromeDevTools/chrome-devtools-mcp — Chrome DevTools for coding agents — score 53 Sources: github_trending

🟡 PaddlePaddle/PaddleOCR — Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages. — score 51 Sources: github_trending

Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D] — score 56 Sources: reddit/r/MachineLearning

DeepSeek dropped the full V4 paper this week. The preview from April was 58 pages; this version adds a lot of technical depth. What stood out for me: FP4 quantization-aware training. They're running FP4 QAT directly in late-stage training. MoE expert weights quantized to FP4 (the main GPU memory consumer
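
To make the FP4 idea concrete, here is a minimal sketch of "fake quantization" as it is commonly used in quantization-aware training: values are snapped to a 4-bit grid in the forward pass while a higher-precision copy is kept for updates. It assumes the E2M1 value grid and a simple per-block absmax scale. The function name `fake_quant_fp4` and the scaling scheme are illustrative assumptions, not details taken from the DeepSeek paper.

```python
import math

# Magnitudes representable in the E2M1 4-bit float format (sign is handled separately).
# Assumption: the post does not say which FP4 variant or block size is used.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quant_fp4(block: list[float]) -> list[float]:
    """Round each value in a block to the nearest scaled FP4 grid point."""
    absmax = max((abs(x) for x in block), default=0.0)
    if absmax == 0.0:
        return [0.0] * len(block)
    # Map the largest magnitude in the block onto the largest grid value (6.0).
    scale = absmax / FP4_E2M1_GRID[-1]
    out = []
    for x in block:
        mag = abs(x) / scale
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(nearest * scale, x))
    return out


if __name__ == "__main__":
    print(fake_quant_fp4([6.0, 1.2, -2.4, 0.0]))  # [6.0, 1.0, -2.0, 0.0]
```

In a real training loop the rounding step would typically be paired with a straight-through gradient estimator; that part is omitted here.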

Other Signals

🟡 Qwen 35B-A3B is very usable with 12GB of VRAM — score 68 Sources: reddit/r/LocalLLaMA

Hardware: RTX 3060 12GB, 32GB DDR4-3200, Windows, CUDA 13.x. Model: Qwen3.6-35B-A3B-MTP-IQ4_XS.gguf. The model is a 35B MoE, so -ncmoe matters a lot. Lower -ncmoe means more MoE blocks stay on GPU. Main takeaway: 12GB VRAM feels like a very practical size for this model. It lets you keep enough
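
The -ncmoe trade-off can be sketched with some back-of-the-envelope arithmetic. The snippet assumes the flag keeps the expert weights of the first N layers in system RAM and the rest on the GPU. The layer count and per-layer expert size are hypothetical placeholders, not measurements of this model.

```python
def moe_layers_on_gpu(total_layers: int, n_cpu_moe: int) -> int:
    """Number of layers whose expert weights stay on the GPU."""
    n_cpu = max(0, min(n_cpu_moe, total_layers))  # clamp to a valid range
    return total_layers - n_cpu


def gpu_expert_gib(total_layers: int, n_cpu_moe: int, expert_gib_per_layer: float) -> float:
    """Rough VRAM used by expert weights alone (ignores attention, KV cache, buffers)."""
    return moe_layers_on_gpu(total_layers, n_cpu_moe) * expert_gib_per_layer


if __name__ == "__main__":
    TOTAL_LAYERS = 40           # hypothetical layer count
    EXPERT_GIB_PER_LAYER = 0.4  # hypothetical size of one layer's experts, in GiB
    for n_cpu_moe in (40, 30, 20):
        on_gpu = moe_layers_on_gpu(TOTAL_LAYERS, n_cpu_moe)
        gib = gpu_expert_gib(TOTAL_LAYERS, n_cpu_moe, EXPERT_GIB_PER_LAYER)
        print(f"-ncmoe {n_cpu_moe}: {on_gpu} expert layers on GPU, about {gib:.1f} GiB")
```

Lowering the value until the remaining VRAM just covers the KV cache and compute buffers is the tuning loop the post describes.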

🟡 I built a platform to run AI employees and companies autonomously. — score 52 Sources: reddit/r/AIAgents

🟡 NeurIPS reviewers, any word after the invite email? [D] — score 44 Sources: reddit/r/MachineLearning

I got a NeurIPS reviewer invite last week and accepted it. It said that bidding for papers would start May 8th (today), but I haven’t heard anything yet. Has anyone else heard anything? Did I mess up while accepting the reviewer invite or is this normal? P.S., thoughts on the AI-assisted reviewing exp

🟢 Incremental

Model Releases

🟢 Formalizing statistical learning theory in Lean 4 [R] — score 31 Sources: reddit/r/MachineLearning

I’ve been working on a Lean 4 project focused on formalizing parts of statistical learning theory: FormalSLT repository

Current results include:
* finite-class ERM bounds
* Rademacher symmetrization
* high-probability Rademacher bounds

🟢 How long for llama.cpp official support of MTP? — score 14 Sources: reddit/r/LocalLLaMA

Hello there (beginner here). I've been unable to build llama.cpp myself for my Strix Halo (Windows 11): cmake errors, I haven't dug too much into it, already burned hours... So I was wondering when an official release for Vulkan/HIP with MTP support would be available? Thanks!

🟢 FEEDBACK FOR MY APP — score 0 Sources: reddit/r/AIAgents

I built this app using Lovable as my first AI-powered project. It’s a fully functional messaging application with chat, voice calling, and video calling features, and everything is working smoothly. I also converted it into an APK using Android Studio f

Developer Tools

🟢 Code Reviewer can see everything and yet production keeps breaking — score 39 Sources: reddit/r/AIAgents

What’s interesting to me about AI code reviews isn’t really the code generation part anymore. It’s the fact that review tools can now see almost everything inside a codebase, and production incidents are still going up anyway. I came across a stat saying teams using AI coding tools saw PR volume inc

🟢 AI adaptive capability Synthesis?? Thoughts? JL_Engine — score 39 Sources: reddit/r/AIAgents

Hey y'all. I’ve been thinking about how autonomous AI agents already operate differently than traditional software systems. Normal software usually depends on fixed tools, predefined permissions, and predictable workflows. Meanwhile, there are already agent systems capable of dynamically creating work

🟢 Would love feedback for this tool that catches failures before deploying — score 39 Sources: reddit/r/AIAgents

Hey everyone, I'm looking for AI agent builders to give feedback on Stratix SDK, an open-source Python SDK for proper pre-depl

🟢 All my clients wanted a carousel, now it's an AI chatbot — score 25 Sources: hackernews

🟢 My experience interviewing with Huawei Vancouver for an ML research role: strong mismatch between how it was pitched and how it was evaluated [D] — score 19 Sources: reddit/r/MachineLearning

I want to share an interview experience anonymously in case it helps others on the job market. I was approached about a Vancouver ML role that was presented to me as research-oriented. The recruiter told me the team had looked at my research and that I should be ready to discuss my projects, so I ex

Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 MTP is all about acceptance rate — score 23 Sources: reddit/r/LocalLLaMA

So I was very excited about the MTP stuff, especially since Gemma4 has become my "daily driver" for some stuff. I grabbed the latest mlx-vlm and did some tests and found it disappointing.

| Workload | MTP off | MTP on | Result | Draft accept rate |
|---|---|---|---|---|
| Code generation | 75 tok/s |
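
Why acceptance rate dominates can be seen from the usual speculative-decoding estimate. The sketch assumes each drafted token is accepted independently with probability `accept_rate` and that drafting one token costs `draft_cost_ratio` of a full forward pass. Both parameters and the function names are illustrative assumptions, and none of the numbers come from the post.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass with draft_len drafted tokens."""
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int, draft_cost_ratio: float) -> float:
    """Tokens per unit of compute relative to plain decoding (1 token per pass)."""
    step_cost = 1.0 + draft_len * draft_cost_ratio
    return expected_tokens_per_step(accept_rate, draft_len) / step_cost


if __name__ == "__main__":
    for rate in (0.3, 0.6, 0.9):
        print(f"accept={rate:.1f} speedup={estimated_speedup(rate, 3, 0.1):.2f}x")
```

Under these assumptions a low acceptance rate can push the estimate below 1.0, meaning drafting overhead outweighs the tokens gained.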

Research Papers

🟢 GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs — score 20 Sources: huggingface

We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain exper

Other Signals

🟢 Got MTP + TurboQuant running — Qwen3.6-27B -- 80+ t/s at 262K context on a single RTX 4090 — score 32 Sources: reddit/r/LocalLLaMA

So I've been messing around trying to get MTP working alongside TBQ4_0 (TurboQuant's lossless 4.25 bpv KV cache) on Qwen3.6-27B for my own use. So after a day of vibecoding I think I may have gotten something viable. Went from about 43 t/s when I first got it compiling to 80-87 t/s after optimizing

🟢 Can LLMs model real-world systems in TLA+? — score 8 Sources: hackernews

🟢 NeurIPS: Pushing anonymous repo after rebuttal [D] — score 6 Sources: reddit/r/MachineLearning

Hi everyone, I have a question about NeurIPS submission/review rules and anonymous code repositories. Suppose a paper was submitted before the deadline, and the anonymous code repo is linked as supplementary/reproducibility material. After the deadline, we notice that one label/name in the paper is

📈 Trending Repos

Repo	Description	Stars Today	Language
bytedance/UI-TARS-desktop	The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra	850	typescript
datawhalechina/hello-agents	📚 "Building Agents from Scratch": a from-scratch tutorial on agent principles and practice	667	python
earendil-works/pi	AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods	638	typescript
anomalyco/opencode	The open source coding agent.	628	typescript
rohitg00/agentmemory	#1 Persistent memory for AI coding agents based on real-world benchmarks	400	typescript
Fission-AI/OpenSpec	Spec-driven development (SDD) for AI coding assistants.	316	typescript
CopilotKit/CopilotKit	The Frontend Stack for Agents & Generative UI. React + Angular. Makers of the AG-UI Protocol	215	typescript
vercel-labs/skills	The open agent skills tool - npx skills	208	typescript
colbymchenry/codegraph	Pre-indexed code knowledge graph for Claude Code — fewer tokens, fewer tool calls, 100% local	161	typescript
ChromeDevTools/chrome-devtools-mcp	Chrome DevTools for coding agents	145	typescript

📄 New Papers

Title	Category	Hotness
Audio-Visual Intelligence in Large Foundation Models	research_paper	25
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction	research_paper	61
Partial Evidence Bench: Benchmarking Authorization-Limited Evidence in Agentic Systems	cs.AI	0
Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections	cs.AI	0
When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models	cs.AI	0
PRISM: Perception Reasoning Interleaved for Sequential Decision Making	cs.AI	0
LaTA: A Drop-in, FERPA-Compliant Local-LLM Autograder for Upper-Division STEM Coursework	cs.AI	0
From History to State: Constant-Context Skill Learning for LLM Agents	cs.AI	0
The Geopolitics of AI Safety: A Causal Analysis of Regional LLM Bias	cs.AI	0
Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure	cs.AI	0
Agentic Discovery of Exchange-Correlation Density Functionals	cs.AI	0
Intentionality is a Design Decision: Measuring Functional Intentionality for Accountable AI Systems	cs.AI	0
LANTERN: LLM-Augmented Neurosymbolic Transfer with Experience-Gated Reasoning Networks	cs.AI	0
FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis	cs.AI	0
Housing Potential Common Data Model and City Digital Twin	cs.AI	0

🏢 Lab Blog Posts

OpenAI: Running Codex safely at OpenAI

🐦 Twitter/X Highlights

Account	Tweet Summary
OpenAI	Chain of thought monitors are a key layer of defense against AI agent misalignment. To preserve monitorability, we avoid penalizing misaligned reasoning during RL. We found a limited amount of accidental CoT grading which affected released models, and are sharing our analysis. https://alignment.open

Repeated From Recent Briefings

Hmbown/DeepSeek-TUI — Coding agent for DeepSeek models that runs in your terminal - first seen 2026-05-02
anthropics/financial-services - first seen 2026-05-07
farion1231/cc-switch — A cross-platform desktop All-in-One assistant tool for Claude Code, Codex, OpenCode, openclaw & Gemini CLI. - first seen 2026-05-08
Getting harassed by an aggressive “independent researcher” demanding very specific citations and phrasing in my paper [D] - first seen 2026-05-08
rtk-ai/rtk — CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies - first seen 2026-05-05
LearningCircuit/local-deep-research — ~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted. - first seen 2026-05-03
People Interested in Continual Learning Research[R] - first seen 2026-05-08
aaif-goose/goose — an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM - first seen 2026-05-07
I built an A2A Context Bus, which helps you to make sure every agent uses the same optimized context. - first seen 2026-05-08
z-lab/dflash — DFlash: Block Diffusion for Flash Speculative Decoding - first seen 2026-05-08
... plus 309 more repeated items in processed data

AI Watchtower Briefing — 2026-05-09