AW · AI Watchtower

🔴 High Significance

Developer Tools

🔴 Anthropic's open-source framework for AI-powered vulnerability discovery — score 90 Sources: hackernews

🔴 KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) — score 82 Sources: reddit/r/LocalLLaMA

The KV-cache quant race just got more interesting. Huawei just open-sourced KVarN, a KV-cache quantization method released under Apache 2.0 that drops into vLLM with one flag. Posting because the tradeoff it's claiming is genuinely different from what's already in the stack, and I'd like to see it stress-tes

🔴 What AI app builder are you using these days? Strong use cases + real experiences — score 81 Sources: reddit/r/AIAgents

I'm starting to reach a saturation point with the AI app builders now. Feels like every other day on X someone’s claiming they built and shipped a full app over the weekend with some new tool. Lovable, Bolt.new, Emergent, Replit Agent… it’s nonstop and hard to tell what’s good atp. I’m trying to pic

🔴 fathah/hermes-desktop — Desktop Companion for Hermes Agent — score 78 Sources: github_trending

Infrastructure & Compute

🔴 Nvidia's been paying shills on LinkedIn — score 96 Sources: reddit/r/LocalLLaMA

Three different accounts, some even with LinkedIn Gold, made the above posts all on the same day. All of them clearly followed the marketing team's pointers without understanding how locally hosted AI works: there is no way a $249 8GB machine can replace frontier models.

Research Papers

🔴 AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints — score 78 Sources: huggingface · arxiv/cs.CL

Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual c

🔴 Complexity-Balanced Diffusion Splitting — score 75 Sources: huggingface

Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative

Other Signals

🔴 finally — score 89 Sources: reddit/r/LocalLLaMA

🟡 Notable

Model Releases

🟡 Computer-use agents now beat humans on AndroidWorld. Where are the production QA deployments? — score 56 Sources: reddit/r/AIAgents

Was looking at the AndroidWorld leaderboard this week. The top entry hits 92% on mobile UI tasks, beating an 88% human baseline. On paper that's already past the line where you'd expect production QA agents to be everywhere. But every time I talk to QA leads at meetups they're still on Selenium + Cy

Developer Tools

🟡 KVarN: Variance-Normalized KV-Cache Quantization [R] — score 69 Sources: reddit/r/MachineLearning

Excited to share some of my own work here :) KVarN is our new KV-cache quantization method. In brief, we combine Hadamard rotations with variance normalization on both axes of the K and V matrices, then round to nearest. Simple, but it works very well, especially for decode-heavy test-time-s
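
A rough sketch of the recipe described above (Hadamard rotation, variance normalization along both axes, round to nearest), written in plain Python for readability. This is not the authors' implementation: the order of the two normalizations, the use of RMS as the scale, the symmetric 4-bit grid, and every function name below are assumptions made for illustration.

```python
import math
import random


def hadamard(n):
    """Sylvester construction of an orthonormal n x n Hadamard matrix (n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[v * s for v in row] for row in h]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(c) for c in zip(*m)]


def rms(values):
    # Assumed scale: root mean square, with a fallback of 1.0 for all-zero vectors.
    return math.sqrt(sum(v * v for v in values) / len(values)) or 1.0


def quantize(x, bits=4):
    """Rotate, normalize rows then columns, round to nearest; returns codes and scales."""
    r = matmul(x, hadamard(len(x[0])))            # rotation spreads outliers across channels
    row_s = [rms(row) for row in r]               # per-token scale
    r = [[v / s for v in row] for row, s in zip(r, row_s)]
    col_s = [rms(col) for col in zip(*r)]         # per-channel scale
    r = [[v / s for v, s in zip(row, col_s)] for row in r]
    qmax = 2 ** (bits - 1) - 1                    # symmetric integer grid (assumption)
    step = max(abs(v) for row in r for v in row) / qmax or 1.0
    codes = [[max(-qmax, min(qmax, round(v / step))) for v in row] for row in r]
    return codes, row_s, col_s, step


def dequantize(codes, row_s, col_s, step):
    """Undo the grid, both scalings, and the rotation."""
    r = [[c * step * cs * rs for c, cs in zip(row, col_s)] for row, rs in zip(codes, row_s)]
    h = hadamard(len(codes[0]))
    return matmul(r, transpose(h))                # H is orthonormal, so H^T inverts it


random.seed(0)
k = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]  # 4 tokens, head_dim 8
k_hat = dequantize(*quantize(k, bits=4))
err = max(abs(a - b) for ra, rb in zip(k, k_hat) for a, b in zip(ra, rb))
print(f"max abs reconstruction error at 4 bits: {err:.4f}")
```

The sketch ignores everything that makes a real kernel fast (fused rotations, packed integer storage, per-group scales); it only shows how the three steps compose and invert.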

🟡 mvanhorn/last30days-skill — AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary — score 57 Sources: github_trending

🟡 What should an agent handoff include besides the transcript? — score 56 Sources: reddit/r/AIAgents

Full transcripts feel like the wrong default once an agent run gets long. I’m getting more value from a compact handoff: what we’re trying to do, what’s already decided, what failed, current state, and the next action. What else has actually reduced rework for you?
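
The compact handoff described above could be captured as a small structured record instead of a transcript. A minimal sketch, assuming a plain-text brief is what the next agent consumes; the `Handoff` class, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Compact state passed to the next agent in place of the full transcript."""
    goal: str                                                  # what we're trying to do
    decisions: list[str] = field(default_factory=list)        # what's already decided
    failed_attempts: list[str] = field(default_factory=list)  # what failed
    current_state: str = ""
    next_action: str = ""

    def render(self) -> str:
        """Render the handoff as a short plain-text brief."""
        def bullets(items):
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        return "\n".join([
            f"Goal: {self.goal}",
            "Decided:", bullets(self.decisions),
            "Failed:", bullets(self.failed_attempts),
            f"Current state: {self.current_state}",
            f"Next action: {self.next_action}",
        ])


# Illustrative values only.
handoff = Handoff(
    goal="Migrate the nightly export job to the new queue",
    decisions=["Keep the existing retry policy"],
    failed_attempts=["Running the export in a single batch timed out"],
    current_state="Export split into chunks; two of five chunks verified",
    next_action="Verify the remaining three chunks",
)
print(handoff.render())
```

Keeping the failed attempts as their own field is the design choice that matters here: it is the part a transcript summary tends to drop and the part that prevents repeated work.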

🟡 AIDC-AI/Pixelle-Video — 🚀 AI 全自动短视频引擎 | AI Fully Automated Short Video Engine — score 49 Sources: github_trending

Infrastructure & Compute

🟡 cyberpapiii/chipotlai-max — The AI coding agent that runs on stolen Chipotle compute 🌯 Fork of OpenCode with Pepper AI as default model. Community project to add providers from Home Depot, Lowes, Target, Starbucks & more. — score 59 Sources: github_trending

Business & Funding

🟡 Would you say capture-time semantic annotation for robot trajectories is a solved problem? [R] — score 44 Sources: reddit/r/MachineLearning

It seems raw teleoperation data (RGB + joint states) structurally lacks affordance, contact intent, and embodiment-specific kinematic context: information that can't be reliably recovered post hoc once the demonstration is recorded. Most current approaches either filter/clean after collection, or r

Research Papers

🟡 LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs — score 68 Sources: huggingface · arxiv/cs.AI

Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based

🟡 Towards One-to-Many Temporal Grounding — score 62 Sources: huggingface · arxiv/cs.AI

Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term One-to-Many Temporal

🟡 Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs — score 58 Sources: huggingface · arxiv/cs.CL

Automatic Speech Recognition (ASR) has become a key technology for human--AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR p

Other Signals

🟡 Today made me realize just how bad things have gotten without Meta — score 68 Sources: reddit/r/LocalLLaMA

🟡 Showcase: a much easier way to give your agent a free phone number — score 64 Sources: reddit/r/AIAgents

Yo I just wanted to share a project me and my friend are working on called OP. It gives agents (especially hermes/openclaw) a real phone number they can use to send texts, do 2FA, calls, etc. Twilio works, but the free tier sucks with the sandbox since all their numbers are VoIP and have to follow 1

🟡 You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter. — score 61 Sources: reddit/r/LocalLLaMA

WARNING: I'm speed typing this, no time to organize/format, so if short paragraph chunks bother you, just keep it moving. CONTEXT UPDATE: (for those interested, otherwise skip) >For those interested in the data points, the task was building an agentic workflow inside of rivet that incl

🟡 How do ML researchers actually use AI tools to improve their writing? [D] — score 56 Sources: reddit/r/MachineLearning

As an ML researcher, how do you use AI tools in your daily work? Do you mostly use them to clean up grammar and wording, or also to rewrite, structure, or draft technical text?

🟡 Finally finished my LLM server: EPYC 9575F, 4× RTX 3090 (96GB VRAM), 768GB ECC RAM — score 54 Sources: reddit/r/LocalLLaMA

Took a while, but Nalthis is finally up and assembled. Specs: * Supermicro H13SSL-N * AMD EPYC 9575F (64C/128T Zen 5) * 768GB DDR5-5600 ECC RDIMM * 4× RTX 3090 (96GB VRAM total) * 1× 2TB NVMe OS * 2× 3.94TB NVMe data * 2050W ATX 3.1 PSU * Corsair 9000D Planned use: * vLLM - high throughput small mod

Omitted 4 additional other signals items from the main section; see raw data and source-specific sections below.

🟢 Incremental

Model Releases

🟢 Are We Underestimating Small Edge AI Models? [D] — score 19 Sources: reddit/r/MachineLearning

A lot of recent discussion around Edge AI focuses on running increasingly larger local LLMs. Meanwhile modern smartphones already have enough compute for many practical computer vision tasks that don't require massive models at all. I recently built and released an Android feature that performs offl

🟢 An agent runtime with persistent memory that fans work out across multiple models. — score 19 Sources: reddit/r/AIAgents

Hey! I'm finally releasing code I've put the past 4-5 months of my life into. I had an idea and wanted to fix some things that really irritated me about LLMs. Aimee runs agents that actually remember. Self-hosted, your keys. No subscriptions, no costs, purely open source. First public beta release, but t

🟢 Here is my llama.cpp NVFP4/MXFP6 GGUF quantizer tool — score 18 Sources: reddit/r/LocalLLaMA

Hello everyone I wanted to share what I've been working on. I started writing NVFP4 kernels for llama.cpp last year and needed the ability to quantize NVFP4 GGUFs, so this project started as an NVFP4 quantizer. It's since become much larger. I would love to get more help to improve it. This is what

Developer Tools

🟢 RTX Spark Ads: DJT Edition — score 39 Sources: reddit/r/LocalLLaMA

"We’re going to have the most beautiful laptops, they’ll be the slimmest laptops ever. A total masterpiece, look at that green chip. Unbelievably powerful. They’ll be so slim you won’t even see them from the side…believe me…it’s true. A lot of people are saying it. It’s not like those big, clumsy, f

🟢 I built a local AI agent runtime focused on security and UX after being unsatisfied with existing options — here is what I learned — score 36 Sources: reddit/r/AIAgents

I have been using open source AI agent runtimes for a while and kept running into the same two problems. Either the tool was powerful but the security model made me uncomfortable giving it access to my email and projects, or it was safe but too stripped down to do anything genuinely useful. So I bui

🟢 Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? [D] — score 19 Sources: reddit/r/MachineLearning

Hello everyone, Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? I am working on a project idea related to library-specific code generation. The concrete case is a specific Python library used in a technical/scientific domain. The go

🟢 Which AI lip-sync tool are people actually using in 2026? — score 19 Sources: reddit/r/AIAgents

I have been experimenting with faceless videos lately and realized good lip sync is way harder than I expected. Over the last few weeks I tested a bunch of options: HeyGen, InfiniteTalk, and a few smaller tools that kept popping up in recommendations. A lot of tools in the market nail the lips but le

🟢 Why haven't MCP Apps gone viral the way MCP and Skills did? — score 19 Sources: reddit/r/AIAgents

When MCP and Agent Skills came out, they went viral really fast. So why haven't MCP Apps gained the same traction, or anything close to it? For those who don't know, MCP Apps is a standard that introduces interactive UI for MCPs. Check out this link for more info. [https://modelcontextprotoc

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 NVIDIA/NemoClaw — Run agents like Hermes and OpenClaw more securely inside NVIDIA OpenShell with managed inference — score 18 Sources: github_trending

Other Signals

🟢 Gemma 4 12B is my new main squeeze — score 32 Sources: reddit/r/LocalLLaMA

The Unsloth Q5_K_XL is officially my main squeeze for local coding. I started out with the Q4_K_XL, but found myself fixing syntax errors a little too often. It wasn't terrible, but I had one file where I had to make 23 edits just for syntax. With the Q4 I was pulling around 61 t/s, and moving t

🟢 Fine-tuning an LLM to write docs like it's 1995 — score 30 Sources: hackernews

🟢 How LLM-driven NPCs work in Ultima Online (ServUO) — score 25 Sources: reddit/r/LocalLLaMA

🟢 PSA: You may not need to quantize spec draft when using MTP — score 11 Sources: reddit/r/LocalLLaMA

Using `--spec-draft-type-k q4_0 --spec-draft-type-v q4_0` might actually decrease your context size! With quantized spec draft, my context size is 83200. Without it (i.e. using the default fp16 spec draft), context size increased to 91648. I reported this in a llama.cpp discussion and am17an (th
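
For scale, the arithmetic behind the two context sizes quoted above can be checked directly. The snippet uses only the two figures from the post; the variable names are illustrative.

```python
# Context sizes reported in the post.
ctx_quantized_draft = 83200   # with q4_0 K/V for the spec draft
ctx_fp16_draft = 91648        # with the default fp16 spec draft

gain = ctx_fp16_draft - ctx_quantized_draft
gain_pct = 100 * gain / ctx_quantized_draft

print(f"extra context: {gain} tokens ({gain_pct:.1f}%)")
# prints "extra context: 8448 tokens (10.2%)"
```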

📈 Trending Repos

Repo	Description	Stars Today	Language
fathah/hermes-desktop	Desktop Companion for Hermes Agent	387	typescript
cyberpapiii/chipotlai-max	The AI coding agent that runs on stolen Chipotle compute 🌯 Fork of OpenCode with Pepper AI as default model. Community project to add providers from Home Depot, Lowes, Target, Starbucks & more.	201	typescript
mvanhorn/last30days-skill	AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary	199	python
AIDC-AI/Pixelle-Video	🚀 AI 全自动短视频引擎 | AI Fully Automated Short Video Engine	125	python
NVIDIA/NemoClaw	Run agents like Hermes and OpenClaw more securely inside NVIDIA OpenShell with managed inference	53	typescript

📄 New Papers

Title	Category	Hotness
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints	research_paper	25
Complexity-Balanced Diffusion Splitting	research_paper	13
LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs	research_paper	7
Towards One-to-Many Temporal Grounding	research_paper	4
Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs	research_paper	3
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment	cs.AI	0
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems	cs.AI	0
I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition	cs.AI	0
GITCO: Gated Inference-Time Context Optimization in TSFMs	cs.AI	0
Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory	cs.AI	0
SentinelBench: A Benchmark for Long-Running Monitoring Agents	cs.AI	0
An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)	cs.AI	0
Synthetic Contrastive Reasoning for Multi-Table Q&A	cs.AI	0
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges	cs.AI	0
Residual Modeling for High-Fidelity Learned Compression of Scientific Data	cs.AI	0

🐦 Twitter/X Highlights

Account	Tweet Summary
OpenAI	What happened when one of our models found a counterexample to an 80-year-old Erdős conjecture? Researchers @alexwei_, @HongxunWu, and @wjmzbmr1 shared the story on the OpenAI Podcast with @AndrewMayne and explained how mathematicians and models can work together to make new discoveries.

Repeated From Recent Briefings

chopratejas/headroom — Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server. - first seen 2026-06-03
NousResearch/hermes-agent — The agent that grows with you - first seen 2026-05-11
farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
HKUDS/Vibe-Trading — "Vibe-Trading: Your Personal Trading Agent" - first seen 2026-06-03
Open-LLM-VTuber/Open-LLM-VTuber — Talk to any LLM with hands-free voice interaction, voice interruption, and Live2D talking face running locally across platforms - first seen 2026-05-08
anomalyco/opencode — The open source coding agent. - first seen 2026-05-09
supermemoryai/supermemory — Memory engine and app that is extremely fast, scalable. The Memory API for the AI era. - first seen 2026-06-01
TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration - first seen 2026-06-04
datawhalechina/hello-agents — 📚 "Building Agents from Scratch": a from-scratch tutorial on agent principles and practice - first seen 2026-05-09
nesquena/hermes-webui — Hermes WebUI: The best way to use Hermes Agent from the web or from your phone! - first seen 2026-06-01
... plus 485 more repeated items in processed data

AI Watchtower Briefing — 2026-06-05
