AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Qwen Who? DiffusionGemma running at 1,500 tk/s on a Digital Pregnancy Test. — score 96 Sources: reddit/r/LocalLLaMA

First Doom, now DiffusionGemma 4. We are truly living in the future. Who even needs a new Qwen release anymore? /s (Satire - Shaq doesn’t actually make a digital pregnancy test capable of running diffusion-based LLMs) Credit to Obvious Plant for the original Shaq pregnancy test box (that I doctored

🔴 Gemma 4 Quadruple Release, 12B, 12B QAT, 26B-A4B QAT and 31B QAT Uncensored Heretics! — score 89 Sources: reddit/r/LocalLLaMA

gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic: Safetensors: https://huggingface.co/llmfan46/gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic GGUF: [https://huggingface.co/llmfan46/gemma-4-3

🔴 Claude Fable is relentlessly proactive — score 70 Sources: hackernews

Developer Tools

🔴 We spent decades fixing software deployment. Why are we letting AI agents break it all over again? — score 94 Sources: reddit/r/AIAgents

I’ve been spending a lot of time setting up multi-agent workflows lately, and I can’t shake the feeling that we are aggressively re-inventing a bunch of structural problems that software engineering spent thirty years solving. It kinda feels like business bros are creating a problem so that they ca

🔴 AI agent bankrupted their operator while trying to scan DN42 — score 90 Sources: hackernews

🔴 hexo-ai/sia — SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task. — score 76 Sources: github_trending

SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task.

🔴 Can you realistically start an automation business without a lot of money? — score 72 Sources: reddit/r/AIAgents

I've been thinking about getting into business automation, but most of the content I see makes it sound like you need a bunch of paid tools, subscriptions, software, ads, and a whole setup before you can even get started. For those of you who actually do automation for clients: Can someone start wit

Research Papers

🔴 HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers — score 82 Sources: huggingface · arxiv/cs.AI

Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is dri

🔴 From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion — score 75 Sources: huggingface

Multimodal image fusion aims to integrate complementary information from different modalities into a fused image that preserves rich local details while maintaining globally consistent appearance. Existing approaches build shared representations on 2D feature grids, which excel at modeling local str

Other Signals

🔴 What models you guys running on 8GB? 16GB VRAM? 24GB? 32GB? 48GB? — score 82 Sources: reddit/r/LocalLLaMA

And what are you using for KV cache and context? What kind of performance are you getting? What is your hardware? And what are you using your models for? I figure with how fast everything moves, it's worth asking once in a while to pool our experiences.

🔴 New models released: Nex-N2 Pro 397B and Nex-N2 Mini 35B — score 75 Sources: reddit/r/LocalLLaMA

They are FTs of Qwen3.5 and the benchmarks look pretty good https://huggingface.co/nex-agi/Nex-N2-mini https://huggingface.co/nex-agi/Nex-N2-Pro

🟡 Notable

Model Releases

🟡 I distilled my 12 year experience as a product manager and built a free skill that takes you from "I have an app idea" to a real plan and solid MVP — score 63 Sources: reddit/r/AIAgents

I'm a PM. 12 years, mostly zero-to-one. I built a free skill that does the part of app-building everyone skips and then regrets. It's called vibe-check. Open-source, drops into Claude, Codex, or Antigravity. It doesn't write your code. AI does that now. It does the harder thing that comes before the

🟡 EAGLE3 has landed in llama.cpp — score 61 Sources: reddit/r/LocalLLaMA

After half a year of development, EAGLE3 has been merged into llama.cpp. EAGLE3 is similar to MTP, but different: the helper model gets extra guidance from the main model instead of guessing completely on its own.

🟡 is Gemini your main AI model today, or just a secondary option — score 61 Sources: reddit/r/AIAgents

I recently had a discussion with a friend who strongly prefers Gemini and Google products in general. His argument is that Google has access to massive amounts of data and arguably the best search engine in the world, so Gemini should have a significant advantage. My opinion and experience has been

🟡 PSA: Test your "threads" argument in llama.cpp (+80% performance in my case) — score 54 Sources: reddit/r/LocalLLaMA

When GPT-OSS 120B was released last year, I played around and tried to maximize its performance. One thing that many people pointed out was that for hybrid CPUs (Performance + Efficiency cores) you should use only P-cores with the "--threads" argument and taskset/affinity. Back then I set up that model

🟡 Anthropic apologizes for invisible Claude Fable guardrails — score 50 Sources: hackernews

Omitted 5 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 I put a hidden instruction in a document. My AI agent followed it. Here’s the repo. — score 50 Sources: reddit/r/AIAgents

Cloned a repo, ran an agent against a “research report,” watched it comply with instructions embedded in the document instead of summarizing it. The attack is in the repo. Run it yourself. Then run the protected version with Arc Gate and watch it get blocked. https://github.com/9hannahnine-jpg/vulne

🟡 @OpenAI: We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business — score 50 Sources: twitter_rss

We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:

🟡 @xai: Install the @sentry plugin and ask your agent to find and fix errors, analyze stack traces, and triage alerts — score 50 Sources: twitter_rss

Install the @sentry plugin and ask your agent to find and fix errors, analyze stack traces, and triage alerts

🟡 @xai: Use the @vercel plugin to deploy to production, spin up sandboxes, or build apps with Shadcn. — score 50 Sources: twitter_rss

Use the @vercel plugin to deploy to production, spin up sandboxes, or build apps with Shadcn.

Research Papers

🟡 Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback — score 65 Sources: huggingface

Despite generating increasingly photorealistic images, text-to-image (T2I) models still exhibit localized, subtle, and structurally complex failures. Diagnosing these failures requires instance-level feedback that answers where a defect occurs, what type it is, why it is defective, and its importanc

🟡 ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages — score 45 Sources: huggingface · arxiv/cs.AI

Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where pa

Other Signals

🟡 Is Symbolic Regression still a thing, given LLMs' performance? [D] — score 69 Sources: reddit/r/MachineLearning

I've been teaching myself about Symbolic Regression (SR), which looks like a super exciting field. (A great intro resource below [1]). But then I was wondering: given LLMs' increasingly-growing power in generating code, which is in a way very similar to Symbolic Regression (or of course, even dire

🟡 Having some fun with LMX-Omni-52B-Halo in Open WebUI — score 68 Sources: reddit/r/LocalLLaMA

🟡 @GoogleDeepMind: Pinned: We’re teaming up @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in — score 50 Sources: twitter_rss

Pinned: We’re teaming up @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in advance. ⚽

🟡 Huawei Released openPangu 2.0 (Will open source on June 30) — score 46 Sources: reddit/r/LocalLLaMA

At the Huawei Developer Conference (HDC 2026) held on June 12, Richard Yu, Executive Director of Huawei, officially launched the brand-new, open-source Pangu large model—openPangu 2.0. The model is fully adapted to the HarmonyOS ecosystem and has achieved deep optimization and performance breakthrou

🟡 Post-docs in ML [D] — score 44 Sources: reddit/r/MachineLearning

Are there any websites listing post-doc job openings in machine learning? Currently I'm using LinkedIn to search for these. When I was a math post-doc, everyone used "MathJobs.org" to find jobs. Is there a similar website for machine learning? Thanks.

🟢 Incremental

Model Releases

🟢 Are AI agents making traditional software interfaces obsolete? — score 33 Sources: reddit/r/AIAgents

I was reading an enterprise tech trend report for 2026 and it got me thinking about how quickly the traditional SaaS GUI (graphical user interface) is losing its utility. For the last fifteen years, software design has been about building pretty, siloed dashboards. We’ve built our entire workflows a

🟢 Claude Fable 5: mid-tier results on coding tasks — score 30 Sources: hackernews

🟢 🚀PP-OCRv6 is officially released ! — score 25 Sources: reddit/r/LocalLLaMA

🔥PaddleOCR’s new OCR model series scales from 1.5M to 34.5M parameters, bringing stronger accuracy, faster inference, and broader deployment options — from browsers and edge devices to servers. 📊What’s new: 🔸Tiny / Small / Medium models: 1.5M, 7.7M, 34.5M params 🔸+4.9% detection accuracy and +5.1% r

🟢 Has anyone noticed that the behavior of the Kimi model has changed? — score 11 Sources: reddit/r/LocalLLaMA

I have been using Kimi K2.6 in Kimi Code for a while. Although it can complete most tasks, it often requires a long time to think and try. Today the model's CoT has become very short and concise, and it feels much improved on coding tasks compared to before. I heard that GLM 5.2 is also about to be r

Developer Tools

🟢 mlflow/mlflow — The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data. — score 31 Sources: github_trending

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

🟢 always-further/nono — Capability-based agent runtime with fine-grained policies. Brokering access directly within the agent's operating context, with zero setup and zero latency — score 18 Sources: github_trending

Capability-based agent runtime with fine-grained policies. Brokering access directly within the agent's operating context, with zero setup and zero latency

🟢 How would you start selling automations? Where would you even begin? — score 17 Sources: reddit/r/AIAgents

I’m getting into building automations for businesses, but I’m a bit stuck on the first step. Like, I can imagine building solutions for repetitive work, internal processes, data entry, reporting, customer stuff, etc… but I don’t really know how people actually start selling this. So I’m curious: If

🟢 anthropics/claude-agent-sdk-python — score 13 Sources: github_trending

🟢 Building an Open Source Edge Semantic Cache for LLMs in Rust/WASM – Sanity check on the architecture? [D] — score 12 Sources: reddit/r/MachineLearning

Hey everyone, I am planning out a new open-source infrastructure project and want to get some brutal feedback on the architecture and use-case validity from people running high volume LLM workloads in production. The Problem: Python-based proxies/gateways introduce too much latency overhead for

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Research Papers

🟢 Revisiting Articulated Parts Perception in Robot Manipulation — score 20 Sources: huggingface

We are surrounded by various objects with movable, articulated parts, e.g., box, handle, door. An accurate and generalizable perception of articulated parts is essential to enhance robotic manipulation capabilities. Building on this need, recent efforts in articulated parts perception have followed

🟢 Leveraging Morphology for Historical Script Metrological Analysis — score 20 Sources: huggingface

Advances in handwritten text recognition have enabled large-scale transcription of historical documents, but still provide limited access to interpretable visual measurements for paleography, the study of historical scripts. In this paper, our main insight is that morphological script analysis, in p

Other Signals

🟢 Best LLM for smut stories — score 39 Sources: reddit/r/LocalLLaMA

I'm trying to find the best LLM for writing erotica/smut, but there doesn't seem to be that many good models right now. I'm using Cydonia 24B v4.3, which gives great results, but I was wondering if there were even better models that could fit into 16GB VRAM with quantization. Sadly there doesn't see

🟢 Spent $3 running 4x4090 benchmarks for llama 3 70b (exl2 vs gguf). exl2 generation speed is kind of ridiculous. — score 33 Sources: reddit/r/AIAgents

Hey guys, so I wanted to run some heavy benchmarks comparing GGUF and EXL2 for Llama-3-70B on a 4x4090 setup. Single-card data is everywhere, but 4-way tensor-parallel stats are hard to find. The problem is I don't own a 4x4090 rig and normally renting one would immediately eat into my monthly budget

🟢 LLM context compression at 16x beats KV cache — score 32 Sources: reddit/r/LocalLLaMA

🟢 MICCAI 2026 Results [D] — score 31 Sources: reddit/r/MachineLearning

Results are almost here. Good luck to everyone waiting for the final decision 🙂

🟢 Why hasn't any mainstream game integrated LLMs into NPCs yet? — score 18 Sources: reddit/r/LocalLLaMA

Tech demos exist, but nothing has actually shipped in a real game. Is it a latency problem, or are game studios just not interested?

Omitted 3 additional other signals items from the main section; see raw data and source-specific sections below.

📈 Trending Repos

Repo	Description	Stars Today	Language
hexo-ai/sia	SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task.	199	python
mlflow/mlflow	The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.	24	python
always-further/nono	Capability-based agent runtime with fine-grained policies. Brokering access directly within the agent's operating context, with zero setup and zero latency	12	rust
anthropics/claude-agent-sdk-python		10	python

📄 New Papers

Title	Category	Hotness	Link
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers	research_paper	20	Open
From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion	research_paper	10	Open
Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback	research_paper	7	Open
ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs	cs.AI	0	Open
Arbor: Tree Search as a Cognition Layer for Autonomous Agents	cs.AI	0	Open
Strategic Decision Support for AI Agents	cs.AI	0	Open
Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation	cs.AI	0	Open
PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation	cs.AI	0	Open
"Did you lie?" Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms	cs.AI	0	Open
TrajGenAgent: A Hierarchical LLM Agent for Human Mobility Trajectory Generation	cs.AI	0	Open
Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents	cs.AI	0	Open
From AGI to ASI	cs.AI	0	Open
Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System	cs.AI	0	Open
Definitional alignment before capability alignment: a Design-Science framework for adjudicating claims about AGI	cs.AI	0	Open
The Theory of Mind Utility: Formal Specification of a Mentalizing Mechanism	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: How Preply combines AI and human tutors to personalize learning

🐦 Twitter/X Highlights

Account	Tweet Summary
AnthropicAI	We’re launching Claude Corps, a national fellowship program matching people early in their careers with US nonprofits. We'll teach 1,000 people to use Claude, and pay them to use AI to advance their hosts’ missions. https://www.anthropic.com/claude-corps
OpenAI	We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:
GoogleDeepMind	Pinned: We’re teaming up @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in advance. ⚽
GoogleDeepMind	When millions of AI agents interact with each other, new collective behaviors can emerge. 🌐 Together with @schmidtsciences, @coop_ai, @ARIA_research and supported by @GoogleOrg, we’re launching a $10M research fund to help understand how AI systems behave as a group. → https://goo.gle/3Si6rCl
xai	Install the @sentry plugin and ask your agent to find and fix errors, analyze stack traces, and triage alerts
xai	Use the @vercel plugin to deploy to production, spin up sandboxes, or build apps with Shadcn.
simonw	After two days with Claude Fable 5 the best way I can describe it is "relentlessly proactive" - here's an example where I dropped in a screenshot of a bug and it span up custom CORS Python servers and used pyobjc-framework-Quartz to capture screenshots https://simonwillison.net/2026/Jun/11/fable-is-
simonw	New Datasette release: 1.0a33, which finally documents the ?_extra= JSON API mechanism and brings it to the row and query pages in addition to the table pages (Most of the code in this release was built with the help of Claude Fable 5) https://datasette.io/blog/2026/api-extras/

Repeated From Recent Briefings

mvanhorn/last30days-skill — AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary - first seen 2026-06-05
Anthropic's new model Fable will silently handicap work on LLMs [D] - first seen 2026-06-11
anomalyco/opencode — The open source coding agent. - first seen 2026-05-09
FareedKhan-dev/train-llm-from-scratch — A straightforward method for training your LLM, from downloading data to generating text. - first seen 2026-06-11
maziyarpanahi/openmed — open-source healthcare ai - first seen 2026-06-10
activeloopai/hivemind — One brain for all your agents - first seen 2026-06-11
VIA-SD: Verification via Intra-Model Routing for Speculative Decoding - first seen 2026-06-11
NVIDIA/SkillSpector — Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks. - first seen 2026-06-10
yikart/AiToEarn — Let's use AI to Earn! - first seen 2026-05-11
karpathy/autoresearch — AI agents running research on single-GPU nanochat training automatically - first seen 2026-05-21
... plus 419 more repeated items in processed data

AI Watchtower Briefing — 2026-06-12
