AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Claude Fable 5 — score 94 Sources: hackernews

🔴 If Claude Fable stops helping you, you'll never know — score 81 Sources: hackernews

Developer Tools

🔴 Woke up to a $360 bill because my AI agent went rogue overnight. Observability is a nightmare. — score 83 Sources: reddit/r/AIAgents

Hey r/aiagents, Just had a truly painful morning. Left an agent running overnight, thought everything was fine, only to wake up to a bill that made my jaw drop. We're talking $360 for what should have been a simple, contained task. This isn't just about the money, though that stings. It's about the

🔴 Without open llm competition, closed source LLM companies will become insatiable. — score 82 Sources: reddit/r/LocalLLaMA

I can't imagine how arrogant one must be to make such a decision. People pay $200 a month for Anthropic to mess with their codebase. Imagine how they would humiliate their customers if the world didn't have an open-source model. https://preview.redd.it/6qr2ymt25d6h1.png?width=1646&format=png&amp

🔴 iOS 27 Siri is using WaveRNN and FastSpeech2 [D] — score 81 Sources: reddit/r/MachineLearning

Found in the iOS Simulator's files. Both of them are in espresso format. There's also another compiled CoreML model for concert ranking, and based on its contents it looks to be a simple logistic regression. See [https://www.reddit.com/r/jailbreak/comments/1u1e1b4/access_to_simulators_root_f

🔴 AI agent demos are fun, but the boring tests are where the truth shows up — score 72 Sources: reddit/r/AIAgents

I’ve seen a lot of impressive voice agent demos lately, but the real evaluation starts after the demo script ends. What happens when the customer interrupts? Goes silent? Changes their mind? Gives half the required info? Asks something out of scope? For anyone building or buying agents, what are you

🔴 NVIDIA/SkillSpector — Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks. — score 71 Sources: github_trending

Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

Infrastructure & Compute

🔴 Since when the RTX 6000 PRO is priced at 13250USD on the official NVIDIA Page? — score 75 Sources: reddit/r/LocalLLaMA

https://marketplace.nvidia.com/en-us/enterprise/laptops-workstations/nvidia-rtx-pro-6000-blackwell-workstation-edition/

Research Papers

🔴 Kwai Keye-VL-2.0 Technical Report — score 95 Sources: huggingface

We introduce Kwai Keye-VL-2.0-30B-A3B, an open-source Mixture-of-Experts (MoE) multimodal foundation model designed to advance long-video understanding and agentic intelligence. To address the challenges of ultra-long contexts, information redundancy, and prohibitive computational costs inherent in

🔴 Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution — score 78 Sources: huggingface · arxiv/cs.AI

Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent,

Other Signals

🔴 Rick & Morty — score 96 Sources: reddit/r/LocalLLaMA

nobody expected HF there

🔴 claude fable 5 just dropped and i genuinely cannot keep up anymore. how do you all stay on top of this stuff? — score 94 Sources: reddit/r/AIAgents

so fable 5 launched today. mythos-class, public, $10/$50 per million tokens, apparently miles ahead on agentic coding benchmarks. that's huge news. it's also the third huge news this week. last week it was the loops discourse... everyone arguing about whether designing loops is the future or just a

🔴 Anthropic is intentionally nerfing Fable when asked to develop other LLMs — score 89 Sources: reddit/r/LocalLLaMA

Reason 458 why local LLMs are going to be a necessity

🟡 Notable

Model Releases

🟡 Without open source LLMs, US AI companies could have already monopoled the technology — score 54 Sources: reddit/r/LocalLLaMA

For such technology with clear importance and impact on all of us, I believe that making it open source is an ethical duty, otherwise, especially with the 1-sided politics of the US we experience today, they could have already monopolized the technology by now, maybe make it exclusively available to

🟡 GuideAnts Open Source AI and agents platform — score 50 Sources: reddit/r/AIAgents

Today I put a release stamp on https://github.com/Elumenotion/GuideAnts, a full and open AI platform which supports local AI (chat, ASR, TTS, images, embeddings, etc) using Hugging Face Hub and cloud models from several providers including Hugging Face inf

🟡 From data to decisions: how LSEG is scaling trusted AI — score 50 Sources: lab_blog/OpenAI

See how LSEG uses OpenAI to scale trusted AI across its global business, accelerating insights, shrinking release cycles, and empowering 4,000 employees.

🟡 How engineers at Nextdoor use Codex to build without limits — score 50 Sources: lab_blog/OpenAI

How engineers at Nextdoor use Codex with GPT-5.5 to investigate hard-to-reproduce issues, build across platforms, and focus on product outcomes.

🟡 Fluid, natural voice translation with Gemini 3.5 Live Translate — score 50 Sources: lab_blog/DeepMind

Gemini 3.5 Live Translate brings near real-time, natural speech translation to Google AI Studio, Google Translate and Google Meet.

Omitted 5 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 Common weaknesses and scale issues with popular harnesses — score 61 Sources: reddit/r/AIAgents

Local-first agent frameworks like OpenClaw and Hermes Agent are brilliant when you are a solo developer running a script in your own terminal. They give you a fast, raw playground where an LLM can write to your local disk, run command tools, and call APIs. But the moment you try to put these framewo

🟡 maziyarpanahi/openmed — open-source healthcare ai — score 59 Sources: github_trending

open-source healthcare ai

🟡 Ataraxy-Labs/sem — Semantic version control => entity-level diffs, blame, and impact analysis on top of git. 26 languages via tree-sitter. Built for coding agents. — score 53 Sources: github_trending

Semantic version control => entity-level diffs, blame, and impact analysis on top of git. 26 languages via tree-sitter. Built for coding agents.

🟡 What will be the next breakthrough in ASR? [D] — score 50 Sources: reddit/r/MachineLearning

Hey All, I am currently working on ASR models, and I have gathered some recent literature. From my literature search, it seems like the ASR models are getting more and more powerful due to two main things. 1. Because pseudo-labelled data is growing, supervised models are rising rapidly. Whisper-larg

Research Papers

🟡 Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders — score 68 Sources: huggingface · arxiv/cs.AI

Language models increasingly serve as the backbone of text-to-speech (TTS) systems, yet we understand little about the representations they build when text and generated speech tokens share a single residual stream. We train BatchTopK sparse autoencoders on the LM backbone of CosyVoice3 and introduc

🟡 UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors — score 50 Sources: huggingface

Most existing deep learning-based PET image denoising methods assume a fixed and known dose reduction factor (DRF) for low-dose PET images. However, these methods encounter significant performance degradation when the DRF varies beyond the assumed one in practical applications. To address the challe

🟡 U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training — score 50 Sources: huggingface

Existing deep learning models for Positron Emission Tomography (PET) image denoising often suffer from severe performance degradation under distribution shifts, fundamentally restricting their robust clinical deployment. This lack of generalization stems from the conventional paradigm of fixed-param

Other Signals

🟡 Introducing Papers Without Code [P] — score 69 Sources: reddit/r/MachineLearning

Hi, Niels here from the open-source team at Hugging Face. I've recently relaunched paperswithcode.co as a source for finding the state of the art (SOTA) across various AI domains, from 3D generation to AI agents. This is done by automatically parsing research papers publi

🟡 CEOs who think AI replaces their employees are just bad CEOs — score 69 Sources: hackernews

🟡 People are making single-slot, half height pcie v100 with nvlink in China — score 68 Sources: reddit/r/LocalLLaMA

https://preview.redd.it/cugpphztz96h1.jpg?width=899&format=pjpg&auto=webp&s=2aa10f8b8f2a0ff666cdc2c63c1775ffd2ed7e7b https://preview.redd.it/14wncc3tz96h1.jpg?width=850&format=pjpg&auto=webp&s=85d3bc9c19ef458a0578159d5dc709552e92dad9 https://preview.redd.it/bj4fmubsz96h1.png?

🟡 Unsloth Gemma 4 QAT MTP assistant models now available — score 61 Sources: reddit/r/LocalLLaMA

They're both available as q8_0 models named mtp-gemma-4-*.gguf on the root of the directory and in both q8_0 and larger quants within an MTP folder. - https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF/tree/main - https://huggingface.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF/tree/main - https:/

🟡 German ruling declares Google liable for false answers in AI Overviews — score 56 Sources: hackernews

Omitted 2 additional other signals items from the main section; see raw data and source-specific sections below.

🟢 Incremental

Model Releases

🟢 I'm brand new to running LLMs and the sheer number of tools is overwhelming — score 36 Sources: reddit/r/LocalLLaMA

Hey everyone. I'm brand new to running LLMs in general, even more new to running them locally, and the sheer number of tools available is absolutely overwhelming. Regarding applications, I look at github and see so many different options that I don't know what to pick. Can't really fully decipher th

🟢 Releasing Apodex-1.0 Smol Models (0.8B, 2B, 4B Open-Weights) optimized for Agentic Verification + AgentHarness Evals — score 36 Sources: reddit/r/LocalLLaMA

Hey r/LocalLLaMA, We just released Apodex 1.0, and alongside our flagship API, we are releasing the weights for our Smol models (0.8B, 2B, and 4B). Our core research focuses on independent verification in long-horizon tasks. Instead of just scaling up parameter sizes for raw generation,

🟢 Local LLms releases — score 21 Sources: reddit/r/LocalLLaMA

Here are some graphs of local LLM releases. It's strange: except for the last month, I thought that this year was very heavy in terms of releases, but it seems that the peak was last year. Maybe the hype about the quality improvements this year made it seem richer than last year.

🟢 Can you really replace paid models with a local model? — score 11 Sources: reddit/r/LocalLLaMA

Long time lurker, and I say this as someone who genuinely loves this community and runs many local models myself. I’ve been using LLMs since the early GPT and LLaMA days. Obviously, models have come an unbelievably long way. Local/open models today are dramatically better than what we had even a fe

Developer Tools

🟢 chroma-core/chroma — Search infrastructure for AI — score 39 Sources: github_trending

Search infrastructure for AI

🟢 anthropics/claude-code-security-review — An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities. — score 35 Sources: github_trending

An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities.

🟢 Grit: Rewriting Git in Rust with agents — score 31 Sources: hackernews

🟢 luongnv89/asm — The universal skill manager for AI coding agents. — score 29 Sources: github_trending

The universal skill manager for AI coding agents.

🟢 Current leading platform to build a personal assistant agent? — score 22 Sources: reddit/r/AIAgents

Hi all, I’m looking for advice on which platform would be best for building a personal assistant agent: somewhere I can brain-dump all the time, keep up to date with what my agency and I are working on, and use as a master brain to feed other agents in the future. Any advice is welcome.

Omitted 2 additional developer tools items from the main section; see raw data and source-specific sections below.

Business & Funding

🟢 What startup metrics mattered most during your fundraising process? — score 39 Sources: reddit/r/AIAgents

Investors often mention growth, retention, revenue, and market opportunity, but every startup seems different. For founders who recently raised capital, which metrics generated the most investor interest and questions during meetings?

🟢 Time Series Forecasting for Agriculture/Crop Volume & Pricing – Looking for Advice [D] — score 19 Sources: reddit/r/MachineLearning

Hi everyone, I work for a major berry company, and a large part of my role involves forecasting total industry crop volumes (weekly harvest/production forecasts) as well as future pricing. I'm relatively new to ML-based forecasting. This is only my second professional role, and I have a bachelor's d

🟢 What is the best 7b-12b coding model in 2026? — score 4 Sources: reddit/r/LocalLLaMA

Any practical advice? What are you guys using? Due to budget constraints, I cannot use 27-billion or 32-billion-parameter coding models.

Other Signals

🟢 The Gemini fake context alignment attack and why agents need a preview gate — score 22 Sources: reddit/r/AIAgents

A security disclosure last week showed that Gemini can be hijacked through a WhatsApp notification containing hidden multilingual instructions. The user received what looked like a regular WhatsApp notification. The text looked harmless. But the message included hidden multilingual instructions that

🟢 Anyone gotten Gemma 4 12B (unified audio) to actually attend to speech with a large system prompt? — score 21 Sources: reddit/r/LocalLLaMA

I'm trying to use Gemma 4 12B — the new encoder-free unified model (audio/vision/text in one) — for a one-pass audio → response voice assistant: feed the recorded WAV + system prompt and get the reply back as text directly, collapsing the separate ASR + LLM steps into a single model (TTS sti

🟢 AI Epistemic Risks: Emerging Mechanisms & Evidence [R] — score 20 Sources: reddit/r/MachineLearning

How will AI affect our ability to think and judge for ourselves? Our new paper co-authored by 30 experts explores epistemic risks—the threats AI poses to our collective capacity to form beliefs accurately, reason well, and maintain a healthy information environment. We look at how AI can lea

🟢 How do I start reviewing research papers in good conferences/journals? [R] — score 19 Sources: reddit/r/MachineLearning

I just finished my bachelors degree with 2 first-author papers in A-tier venues. I'm planning to start my PhD next year. I want to start reviewing papers (from my domain: OOD detection and Open-set problems) at similar venues. How do I get started? Most advice online just says to keep my open-review

🟢 Rich Sutton on AI creativity and discovery — score 19 Sources: hackernews

Omitted 2 additional other signals items from the main section; see raw data and source-specific sections below.

📈 Trending Repos

Repo	Description	Stars Today	Language
NVIDIA/SkillSpector	Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.	280	python
maziyarpanahi/openmed	open-source healthcare ai	191	python
Ataraxy-Labs/sem	Semantic version control => entity-level diffs, blame, and impact analysis on top of git. 26 languages via tree-sitter. Built for coding agents.	110	rust
chroma-core/chroma	Search infrastructure for AI	48	rust
anthropics/claude-code-security-review	An AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities.	36	python
luongnv89/asm	The universal skill manager for AI coding agents.	24	typescript
cube-js/cube	📊 Cube Core is open-source semantic layer for AI, BI and embedded analytics	9	rust
wonderwhy-er/DesktopCommanderMCP	This is MCP server for Claude that gives it terminal control, file system search and diff file editing capabilities	5	typescript

📄 New Papers

Title	Category	Hotness	Link
Kwai Keye-VL-2.0 Technical Report	research_paper	128	Open
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution	research_paper	75	Open
Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders	research_paper	8	Open
UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors	research_paper	5	Open
U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training	research_paper	5	Open
Business World Model	cs.AI	0	Open
Deployment-Time Memorization in Foundation-Model Agents	cs.AI	0	Open
Exploratory Responsiveness and Adaptive Rigidity under AI-Assisted Optimization	cs.AI	0	Open
Predictive Assistance and the Temporal Dynamics of Exploratory Compression	cs.AI	0	Open
From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs	cs.AI	0	Open
Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents	cs.AI	0	Open
Minimalist Genetic Programming	cs.AI	0	Open
Regimes: An Auditable, Held-Out-Gated Improvement Loop Demonstrated on LongMemEval with ActiveGraph	cs.AI	0	Open
RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning	cs.AI	0	Open
Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: From data to decisions: how LSEG is scaling trusted AI
OpenAI: How engineers at Nextdoor use Codex to build without limits
DeepMind: Fluid, natural voice translation with Gemini 3.5 Live Translate
DeepMind: Powering the future of robotics in Europe

🐦 Twitter/X Highlights

Account	Tweet Summary
GoogleDeepMind	Pinned: Say hello, hola, 你好 to Gemini 3.5 Live Translate: our latest audio model built for fast, cross-language communication. 🌐
xai	Learn more about our work with @gopuff to build a personalized shopping assistant with chat, voice, and image models https://x.ai/news/grok-gopuff
karpathy	This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that qualitatively also, this is a major-version-bump-deserving step change forward (imo of the same ord
swyx	Mythos is live! so excited to have our FrontierCode recognized as the next frontier coding bench. on FC Diamond, BOTH Opus 4.8 and GPT 5.5 don't meaningfully scale with effort, which many of you caught yesterday. Mythos/Fable posttraining have really applied that test time compute toward solving ver

Repeated From Recent Briefings

mvanhorn/last30days-skill — AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary - first seen 2026-06-05
Panniantong/Agent-Reach — Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees. - first seen 2026-06-06
STOP racist posts about Chinese researchers [D] - first seen 2026-06-09
google/skills — Agent Skills for Google products and technologies - first seen 2026-06-09
Andyyyy64/whichllm — Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly. - first seen 2026-05-18
aaif-goose/goose — an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM - first seen 2026-05-07
CopilotKit/CopilotKit — The Frontend Stack for Agents & Generative UI. React, Angular, Mobile, Slack, and more. Makers of the AG-UI Protocol - first seen 2026-05-09
MemPalace/mempalace — The best-benchmarked open-source AI memory system. And it's free. - first seen 2026-06-06
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism - first seen 2026-06-08
yikart/AiToEarn — Let's use AI to Earn! - first seen 2026-05-11
... plus 110 more repeated items in processed data

AI Watchtower Briefing — 2026-06-10
