AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Cohere's unreleased coding model (early access for localllama) — score 96 Sources: reddit/r/LocalLLaMA

Hey, Nick here from Cohere. Thanks for all the feedback on Command A+ the other week everyone. I read these threads all the time about other releases so it was fun to read one about our own :) w

Developer Tools

🔴 The biggest multi-agent lesson I learned, one agent doing everything usually gets worse not better — score 83 Sources: reddit/r/AIAgents

A thing that finally clicked for me: multi-agent systems only help when each agent has a very clear job. If you split work into 5 agents just because it feels more advanced, you usually get more latency, more weird handoffs, and harder debugging. The failure mode I kept seeing was basically this

🔴 microsoft/VibeVoice — Open-Source Frontier Voice AI — score 75 Sources: github_trending

🔴 The AI Agent Learning Resource I Wish Existed Earlier — score 72 Sources: reddit/r/AIAgents

The best way to learn about different agent architectures is by implementing agents in a diverse set of use cases. I've been contributing agent examples to an open-source repository that's grown into a practical collection of 80+ runnable AI applications: [https://github.com/Arindam200/awesome-ai-apps

Infrastructure & Compute

🔴 Google to pay SpaceX $920M a month for compute capacity at xAI data centers — score 79 Sources: hackernews

Other Signals

🔴 Meta confirms 1000s of Instagram accounts were hacked by abusing its AI chatbot — score 93 Sources: hackernews

🔴 Open models to win ✌ — score 88 Sources: reddit/r/LocalLLaMA

🔴 120 tok/s on 12GB VRAM with Gemma 4 12B QAT MTP — score 81 Sources: reddit/r/LocalLLaMA

Google just released the QAT (Quantization-Aware Training) variant of their Gemma 4 models, including 12B, so it was only natural for me to benchmark it on my 12GB GPU since it fits entirely in VRAM. I was pleasantly surprised with the result! By using llama.cpp patched with the Gemma 4 MTP PR, and
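The post says the 12B QAT model fits entirely in 12GB of VRAM. A rough back-of-the-envelope check is that weight memory is about parameter count × bits per weight ÷ 8. The sketch below is a minimal estimate under stated assumptions: the bits-per-weight figures (for example ~4.5 for a 4-bit quantized file, to account for scales) are illustrative guesses, not values from the post, and it ignores the KV cache, activations, and runtime buffers, which also need VRAM. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB (excludes KV cache and buffers)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed effective bits per weight; real quantized files vary by format.
for label, bits in [("BF16", 16.0), ("~8-bit", 8.5), ("~4-bit", 4.5)]:
    print(f"12B weights at {label}: {weight_memory_gib(12, bits):.1f} GiB")
```

Under these assumptions, ~4-bit weights for a 12B model take roughly 6.3 GiB, leaving several GiB on a 12 GB card for the KV cache, while BF16 weights alone (about 22.4 GiB) would not fit.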

🟡 Notable

Model Releases

🟡 Building a Claude-certified developer network: looking for builders to join (free certification path) — score 50 Sources: reddit/r/AIAgents

[Update] Wow, 32 sign-ups already, thank you all! Still plenty of room (we're aiming for 100), so keep them coming. 🙏 My EU-based agency (rolloutit.net) is currently recognized as a "Selected partner" in Anthropic's Claude Services Track, pushing toward Preferred, which takes 100 Claude-certif

🟡 I design with Claude more than Figma now — score 50 Sources: hackernews

🟡 Z.ai, we need Air! GLM GGUF wen? — score 42 Sources: reddit/r/LocalLLaMA

First we never saw an upgraded Air model after 4.5. Then GLM 4.7 Turbo was great, but quickly surpassed for coding. Now GLM 5.1 is a coding beast, but too huge for most to run locally, and even slow on API. Will we ever get another Air model with frontier reasoning and knowledge? Or a turbo model th

Developer Tools

🟡 Harness engineering: Leveraging Codex in an agent-first world — score 64 Sources: hackernews

🟡 supabase/supabase — The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications. — score 53 Sources: github_trending

🟡 khoj-ai/khoj — Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free. — score 50 Sources: github_trending

🟡 Sources for ML news? [D] — score 44 Sources: reddit/r/MachineLearning

I need a break from social media and all the bots.. Aside from Arxiv are there any sources that do a good job of aggregating the good stuff and filtering out all the junk?

Infrastructure & Compute

🟡 You don't need a GPU to run gemma-4-26B-A4B — score 50 Sources: reddit/r/LocalLLaMA

I've been running LLMs on my old potato i5-8500 with 32GB of RAM and *no GPU* for a while now, running up to 12B dense models, which run slowly but are perfectly usable. But this Gemma-4-26B-A4B simply flies on this CPU-only machine using Koboldcpp on Linux. That's right, an old used $150 desktop compu
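One plausible reason the 26B-A4B model flies on a CPU where 12B dense models crawl: token-by-token decoding is usually limited by memory bandwidth, because the weights used for each token must be streamed from RAM. Assuming, as the A4B suffix conventionally signals, that the model is a mixture-of-experts with roughly 4B parameters active per token, each token reads far fewer bytes than a 12B dense model does. The sketch below computes a theoretical ceiling under assumed numbers (about 40 GB/s for dual-channel DDR4 and ~4.5 bits per weight); it ignores compute limits, cache effects, and KV-cache reads, so real throughput will be lower. `max_decode_tokens_per_sec` is a hypothetical helper.

```python
def max_decode_tokens_per_sec(bandwidth_gb_s: float,
                              active_params_billions: float,
                              bits_per_weight: float) -> float:
    """Bandwidth-bound upper limit on decode speed: every active weight is read once per token."""
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 40.0  # assumed dual-channel DDR4 peak, not a measurement
BITS = 4.5             # assumed effective bits per weight for a ~4-bit quant

dense_12b = max_decode_tokens_per_sec(BANDWIDTH_GB_S, 12, BITS)
moe_4b_active = max_decode_tokens_per_sec(BANDWIDTH_GB_S, 4, BITS)

print(f"12B dense ceiling:      {dense_12b:.1f} tok/s")
print(f"~4B active MoE ceiling: {moe_4b_active:.1f} tok/s")
print(f"ratio: {moe_4b_active / dense_12b:.1f}x")
```

The key assumption is counting only active parameters: the full ~26B weights must still sit in RAM (roughly 14.6 GB at 4.5 bits per weight, within the post's 32 GB), but only the routed experts are read for each token, giving about a 3x higher ceiling than a 12B dense model.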

Other Signals

🟡 KV cache quant benchmarks: KVarN 6-bit matches q8_0, 4-bit matches q5_0. Massive! — score 65 Sources: reddit/r/LocalLLaMA

TL;DR Based on long context KLD benchmarks, KVarN appears to be just better than usual llama.cpp KV cache quants. At every size, KVarN matches precision of usual quants of one bit higher. A number of people in the comments under my [previous post](https://www.reddit.com/r/LocalLLaMA/co
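The benchmark metric here is KL divergence (KLD) between a reference run's next-token distribution and a quantized run's, averaged over token positions; lower means the quantized KV cache perturbs the model's predictions less. The post does not show its tooling, so the sketch below only illustrates the metric, assuming you already have per-position logits from a reference run (for example with an unquantized KV cache) and from a quantized run. It does not implement KVarN or llama.cpp's own KLD mode; `mean_kld` and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) in nats, where P is the reference distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(reference_logits, test_logits):
    """Average per-position KL divergence between reference and test runs."""
    if len(reference_logits) != len(test_logits):
        raise ValueError("runs must cover the same token positions")
    total = sum(
        kl_divergence(softmax(ref), softmax(test))
        for ref, test in zip(reference_logits, test_logits)
    )
    return total / len(reference_logits)


# Toy logits over a 3-token vocabulary at two positions (illustrative only).
reference = [[2.0, 1.0, 0.0], [0.5, 0.5, 3.0]]
quantized = [[1.9, 1.1, 0.0], [0.6, 0.4, 2.8]]
print(f"mean KLD: {mean_kld(reference, quantized):.5f} nats")
```

KL divergence is asymmetric, so the number is only meaningful relative to the chosen reference run.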

🟡 ML reading group to read recent interesting and trending papers from ICML/ICLR/NeurIPS [D] — score 62 Sources: reddit/r/MachineLearning

Hi, I am a PhD student trying to run an ML reading group focused on interpretability and robustness every weekend. It's always nice to hear different takes and opinions on a paper, and this discussion group could serve that purpose. If you are a fellow PhD student or an ML researcher interested in re

🟡 Anyone here with experience submitting to Nature Machine Intelligence? [R] — score 62 Sources: reddit/r/MachineLearning

I'm planning to submit a paper to either NMI, but this will be my first paper to a nature-like venue. Would love a quick chat with anyone that has experience. My paper's specifically more geared towards signal processing with ML for a specific subfield of engineering. But can be interdisciplinary.

🟡 Gemma 4 QAT Unquantized Heretic is here — score 58 Sources: reddit/r/LocalLLaMA

Now someone needs to quantize them to 4-bit. Also, I have intentionally kept the divergence and refusal different from the original Gemma 4 Heretic collection, so you can even try these as an alternative to the original model.

🟢 Incremental

Model Releases

🟢 5 Months Later: open-deepthink Now Has Full Knowledge Distillation Mode — score 27 Sources: reddit/r/LocalLLaMA

Hey r/LocalLLaMA, Some of you might remember when I posted about this project back around September last year (it was called local-deepthink then). The core idea was to move past the usual flat multi-agent setups and instead build something that creates depth. It already ran great locally with lla

🟢 I can't wait for all the x250 sample distills of Mythos and GPT-5.6 — score 19 Sources: reddit/r/LocalLLaMA

Just kidding. Are there any distills that actually improve a model's quality? I remember the Qwen R1 8B distill improved the model, but since then, I don't remember ever using a distilled model that was better than the base model. Unless Mythos (or GPT-5.6) is some magical model where only a couple

🟢 Research collection of Arxiv whitepapers [R] — score 6 Sources: reddit/r/MachineLearning

I read and collected Arxiv whitepapers starting after the launch of ChatGPT. I copied and pasted excerpts into Word to track them. Then migrated to Obsidian. That vault of some 1700 papers is now online. I figured it was time to see if others would find the collection useful. My whitepapers were org

🟢 Clustering 3x Jetson Nano Orin Supers — score 4 Sources: reddit/r/LocalLLaMA

Hey everyone! Recently, I released a blog post on how to set up a cluster out of your Raspberry Pi 4Bs and Mac minis for distributed training and inference. Now it's time to do the same with the Jetson Nano Orin Super! Why? - 1024 CUDA Cores (Ampere) - 8GB unified memory LPDDR5 - 6x ARM Cortex-A78 @ 1728 MH

Developer Tools

🟢 Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering — score 36 Sources: hackernews

🟢 AI agents + Swagger/OpenAPI = no more copying API docs into chats — score 30 Sources: reddit/r/AIAgents

I got tired of re-explaining my API to AI coding agents, so I built a Swagger MCP server. While working with AI agents, I kept running into the same issue. Whenever I started a backend-related feature, I had to explain the API again:
* Which endpoints exist
* Request/response structures
* DTOs a
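The post's server code is not shown. As a hedged illustration of the core idea, the sketch below turns an OpenAPI document into a compact endpoint list that could be handed to an agent instead of pasting API docs into a chat. It relies only on the OpenAPI specification's structure (a top-level `paths` object mapping each path to operations keyed by HTTP method, each of which may carry a `summary`); the helper name `list_endpoints` and the sample spec are hypothetical, and the MCP server wiring is deliberately left out.

```python
# HTTP methods that may appear as operation keys in an OpenAPI Path Item Object.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def list_endpoints(spec: dict) -> list[str]:
    """Return one 'METHOD /path - summary' line per operation in an OpenAPI spec."""
    lines = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            summary = operation.get("summary", "")
            line = f"{method.upper()} {path}"
            lines.append(f"{line} - {summary}" if summary else line)
    return lines


# Hypothetical sample spec, trimmed to the fields used above.
sample_spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example API", "version": "1.0.0"},
    "paths": {
        "/users": {
            "get": {"summary": "List users"},
            "post": {"summary": "Create a user"},
        },
        "/users/{id}": {
            "parameters": [{"name": "id", "in": "path", "required": True}],
            "get": {"summary": "Get a user by id"},
            "delete": {},
        },
    },
}

for endpoint in list_endpoints(sample_spec):
    print(endpoint)
```

Keeping the output to one line per operation is a deliberate choice: it gives an agent the endpoint map cheaply, and request/response schemas could be fetched separately only when a task needs them.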

🟢 Training-free graph SSL matches GCN with 5× fewer labels — live demo [P] — score 19 Sources: reddit/r/MachineLearning

Hi all, I have been working on this method, based on a hunch, together with many LLMs for quite some time. At first I engineered it myself while learning in the supervised ML area, but this hunch took me into semi-supervised ML, and quite deep. I then became an LLM orchestrator of sorts while 4 llm'

🟢 anthropics/claude-code-action — score 19 Sources: github_trending

🟢 Cool stuff to do with NVIDIA RTX 6000 PRO 96GB VRAM — score 12 Sources: reddit/r/LocalLLaMA

I have been a C++ dev for 3 years and have also done PyTorch in my free time (though I'm not as good at the latter). Now, I was lucky enough to get a brand new GPU from a colleague. What are some cool side projects I can build to learn tons about ML and inference/infra? Please don't respond saying "anythin

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Other Signals

🟢 Gemma 4 31B QAT Q4 vs standard Q4 — Top1 KLD benchmark results have me confused. Someone please explain or poke holes in this. — score 35 Sources: reddit/r/LocalLLaMA

Edit: After digging into this some more and reviewing the Unsloth post, the divergence APPEARS to stem from my not using the BF16 QAT model as the "reference" model. The QAT vs standard Q4 comparison in our benchmark is not apples-to-apples. The QAT models were evalua

🟢 Does it make sense to use alternative quantizations of QAT models? [D] — score 31 Sources: reddit/r/MachineLearning

From TF's website: "Quantization aware training emulates inference-time quantization, creating a model that downstream tools will use to produce actually quantized models." So is it designed to work with a very specific quantization method (for Gemma-4, presumably, Google's own)? Or would it make
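A common way to emulate inference-time quantization during training is "fake quantization": weights are rounded onto the target integer grid and immediately converted back to floats in the forward pass (with gradients usually passed straight through the rounding step). The sketch below assumes symmetric 4-bit rounding purely for illustration; the post does not say which scheme was used for Gemma 4. It shows why the question matters: the same weights land on different values under a per-tensor scale versus a per-group scale, so a model adapted to one grid is not automatically adapted to another. `fake_quantize` and `fake_quantize_grouped` are hypothetical helpers.

```python
def fake_quantize(weights, bits=4):
    """Symmetric quantize-then-dequantize with one scale for the whole list."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return list(weights)  # all-zero input: nothing to round
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]


def fake_quantize_grouped(weights, group_size, bits=4):
    """Same rounding, but with a separate scale for each group of weights."""
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(fake_quantize(weights[start:start + group_size], bits))
    return out


weights = [0.9, -0.3, 0.04, 0.7]
per_tensor = fake_quantize(weights)
per_group = fake_quantize_grouped(weights, group_size=2)
print("per-tensor:", [round(w, 4) for w in per_tensor])
print("per-group: ", [round(w, 4) for w in per_group])
```

Here the last weight becomes about 0.643 with a single per-tensor scale but 0.7 with a per-group scale, which is the kind of mismatch that can arise when a QAT model is re-quantized with a different method.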

🟢 Human-Like Neural Nets by Catapulting — score 21 Sources: hackernews

🟢 Arithmetic Without Numbers – How LLMs Do Math — score 7 Sources: hackernews

📈 Trending Repos

Repo	Description	Stars Today	Language
microsoft/VibeVoice	Open-Source Frontier Voice AI	216	Python
supabase/supabase	The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.	56	TypeScript
khoj-ai/khoj	Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.	46	Python
anthropics/claude-code-action		12	TypeScript
IBM/mcp-context-forge	An AI Gateway, registry, and proxy that sits in front of any MCP, A2A, or REST/gRPC APIs, exposing a unified endpoint with centralized discovery, guardrails and management. Optimizes Agent & Tool calling, and supports plugins.	3	Python

Repeated From Recent Briefings

Panniantong/Agent-Reach — Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees. - first seen 2026-06-06
Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution - first seen 2026-06-05
CopilotKit/CopilotKit — The Frontend Stack for Agents & Generative UI. React, Angular, Mobile, Slack, and more. Makers of the AG-UI Protocol - first seen 2026-05-09
How do you identify researchers who are good? [D] - first seen 2026-06-06
What are the best Web Search MCPs? I am using Firecrawl but looking for alternatives - first seen 2026-06-06
MemPalace/mempalace — The best-benchmarked open-source AI memory system. And it's free. - first seen 2026-06-06
mvanhorn/last30days-skill — AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary - first seen 2026-06-05
PaddlePaddle/PaddleOCR — Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages. - first seen 2026-05-09
The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset - first seen 2026-06-03
heygen-com/hyperframes — Write HTML. Render video. Built for agents. - first seen 2026-05-10
... plus 33 more repeated items in processed data

AI Watchtower Briefing — 2026-06-07
