AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Most of the software you rely on was hacked together fast — score 94 Sources: reddit/r/AIAgents

Shipped ugly, and only rebuilt properly once it actually mattered. Twitter launched on Ruby on Rails because a tiny team could move fast. Then its audience grew ~1,450% in a year (Nielsen clocked it at 1.2M to 18.2M visitors) and Rails buckled. That's where the "fail whale" came from. Once demand was

Developer Tools

🔴 chopratejas/headroom — Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server. — score 89 Sources: github_trending

🔴 AI agents are genuinely weird to debug compared to everything else in ML — score 83 Sources: reddit/r/AIAgents

been poking at AI agents for a bit and the thing that caught me off guard wasn't building them, it was figuring out why they break. with a regular model something goes wrong, you have a place to look. wrong output, check your prompt, check your data, trace it back. with agents the failure shows up t

🔴 I Put a Datacenter GPU in My Gaming PC for £200 — score 77 Sources: reddit/r/LocalLLaMA

Hey there! I wrote a blogpost about my experience running local models on a V100 from a newbie perspective and got loads of views outside of reddit, so I thought I'd share it here too!

Research Papers

🔴 World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning — score 78 Sources: huggingface · arxiv/cs.CL

World models and multimodal large language models (MLLMs) provide complementary capabilities for predicting future outcomes from static visual observations. World models can generate concrete visual rollouts of possible futures, while MLLMs can reason abstractly over questions, goals, and rules. How

Other Signals

🔴 Minimax M3 appears to have no political censorship — score 90 Sources: reddit/r/LocalLLaMA

I'm currently working on a Chinese/CCP AI bias benchmark, and this has stood out as an outlier. All the other Minimax models are censored, as is typical for Chinese LLMs.

🔴 I have become George Jetson: my job is now Yes/No supervision for a machine I don’t fully understand. — score 83 Sources: reddit/r/LocalLLaMA

🔴 Trump signs downsized AI order after weeks of reversals — score 83 Sources: hackernews

🔴 MiniMax dropped a new attention architecture. [N] — score 81 Sources: reddit/r/MachineLearning

It contains something interesting about context windows. They’re natively scaling to 1M tokens with MiniMax Sparse Attention (MSA), bypassing standard quadratic complexity by completely restructuring the memory access patterns at the operator level. Instead of relying on typical sparse approxima

🔴 Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks — score 70 Sources: reddit/r/LocalLLaMA

For two weeks I ran my multi-agent orchestrator OpenYabby entirely on Qwen3.6-27B via Ollama, on a single 3090. The goal: see if a local model could replace Claude as the reasoning layer for the lead/manager/sub-agent loop. Here's where it worked and where i

🟡 Notable

Model Releases

🟡 Jun 3, 2026 Policy What we learned mapping a year’s worth of AI-enabled cyber threats — score 50 Sources: lab_blog/Anthropic

Jun 2, 2026 Announcements Expanding Project Glasswing Jun 1, 2026 Announcements Anthropic confidentially submits draft S-1 to the SEC May 28, 2026 Announcements Anthropic raises $65B in Series H funding at $965B post-money valuation May 28, 2026 Product Introducing Claude Opus 4.8 May 27, 2026 Annou

🟡 Jun 2, 2026 Announcements Expanding Project Glasswing — score 50 Sources: lab_blog/Anthropic

🟡 Jun 1, 2026 Announcements Anthropic confidentially submits draft S-1 to the SEC — score 50 Sources: lab_blog/Anthropic

🟡 May 28, 2026 Announcements Anthropic raises $65B in Series H funding at $965B post-money valuation — score 50 Sources: lab_blog/Anthropic

🟡 May 27, 2026 Announcements Anthropic opens Milan office to support Italian enterprise, research, and developers — score 50 Sources: lab_blog/Anthropic

Omitted 5 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 HKUDS/Vibe-Trading — "Vibe-Trading: Your Personal Trading Agent" — score 68 Sources: github_trending

🟡 How AI Voice Automation Is Being Used Across Healthcare, Real Estate & Local Services — score 67 Sources: reddit/r/AIAgents

Different industries in the US face the same core problem: missed calls = lost revenue. We tested AI voice systems across multiple verticals to evaluate adaptability. LuMay Voice Agent was deployed in scenarios including: Healthcare: appointment booking, patient FAQ handling, clinic sche

🟡 Travelers deploys AI-powered claims countrywide with OpenAI — score 50 Sources: lab_blog/OpenAI

Travelers built an AI-powered Claim Assistant with OpenAI to guide customers through filing claims, provide 24/7 support, and scale operations during peak demand.

🟡 @AnthropicAI: This Executive Order is an important step in strengthening America’s leadership in AI. We look forward to collaborating with the White House to support its implementation. https://www.whitehouse.go — score 50 Sources: twitter_rss

This Executive Order is an important step in strengthening America’s leadership in AI. We look forward to collaborating with the White House to support its implementation. https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/

🟡 I built an AI agent because I was tired of missing important information at work — score 47 Sources: reddit/r/AIAgents

One thing nobody told me about being a psych RA is that a surprising amount of the job has nothing to do with actual research. When I first joined my lab, I imagined I'd spend most of my time helping with data collection, reading papers, and maybe learning some statistics. In reality, a huge part of

Omitted 2 additional developer tools items from the main section; see raw data and source-specific sections below.

Research Papers

🟡 Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation — score 60 Sources: huggingface

We propose Decoupled Residual Denoising Diffusion models (DRDD) for unified and data-efficient image-to-image (I2I) translation. While diffusion models have advanced I2I translation in terms of quality and diversity, we uncover a previously under-explored property in diffusion models. Crucially, bey

🟡 Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling — score 58 Sources: huggingface · arxiv/cs.CL

Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency. Existing adaptive sampling methods partially mitigate this issue by dynamically deciding when to stop sampling, yet they typically rely on heuristic rules

Other Signals

🟡 Nous Research — Hermes Desktop — score 57 Sources: reddit/r/LocalLLaMA

🟡 Microsoft Aion 1.0 Instruct and Aion 1.0 Plan models! — score 50 Sources: reddit/r/LocalLLaMA

Microsoft announced 2 new on-device models at Microsoft Build 2026. > Aion 1.0 Instruct: efficiency at scale. Aion 1.0 Instruct is our next-generation small language mod

🟡 How we index images for RAG — score 50 Sources: hackernews

🟢 Incremental

Model Releases

🟢 Another shout out to llama.cpp build b9455 2x3090 — score 37 Sources: reddit/r/LocalLLaMA

https://preview.redd.it/xyvtkzwr005h1.png?width=645&format=png&auto=webp&s=aebd5b5ef79255247c9bc91fb69d8423a0c61f86 As you guys know, the next highest quant is Unsloth's /Qwen3.6-27B-UD-Q8_K_XL.gguf. With llama.cpp before, I was getting 30-50 tk/s. vLLM was kicking llama's ass with its

🟢 I open-sourced a multi-tenant agent memory framework — zero tokens, shared namespaces, self-improving loops — score 33 Sources: reddit/r/AIAgents

The problem it solves: LangChain, CrewAI, and AutoGen all have memory, but it dies when the process ends. Agents can't share state without message passing. Nothing persists across sessions or LLMs. What becomer-agents adds: Each agent gets a namespace: {task_id}.{role} `python # Researc
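To make the {task_id}.{role} namespace idea concrete, here is a minimal sketch of a memory store that outlives the process and can be shared between agents. It is an illustration under assumptions, not the becomer-agents API: the `NamespacedMemory` class, its method names, and the SQLite backing are all hypothetical.

```python
import sqlite3


class NamespacedMemory:
    """Hypothetical key-value memory addressed by a '{task_id}.{role}' namespace."""

    def __init__(self, path: str = ":memory:"):
        # Pass a file path instead of ":memory:" to persist across processes.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "task_id TEXT, role TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (task_id, role, key))"
        )

    def write(self, task_id: str, role: str, key: str, value: str) -> None:
        # The connection context manager commits on success.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO memory VALUES (?, ?, ?, ?)",
                (task_id, role, key, value),
            )

    def read(self, task_id: str, role: str, key: str):
        row = self.db.execute(
            "SELECT value FROM memory WHERE task_id = ? AND role = ? AND key = ?",
            (task_id, role, key),
        ).fetchone()
        return row[0] if row else None

    def read_task(self, task_id: str) -> dict:
        """Return every entry for a task, keyed by '{task_id}.{role}' namespace."""
        rows = self.db.execute(
            "SELECT role, key, value FROM memory WHERE task_id = ? ORDER BY role, key",
            (task_id,),
        ).fetchall()
        shared: dict = {}
        for role, key, value in rows:
            shared.setdefault(f"{task_id}.{role}", {})[key] = value
        return shared


mem = NamespacedMemory()
mem.write("task42", "researcher", "finding", "source A looks reliable")
mem.write("task42", "writer", "draft", "v1")
print(mem.read_task("task42"))
```

Storing task and role as separate columns keeps the shared-namespace lookup an exact match, so a second agent can read what the first wrote without any message passing.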

🟢 What memory system are you using for your agents? — score 30 Sources: reddit/r/LocalLLaMA

Are you using a specific third-party memory system for your agents, like Claude Code but also Hermes and OpenClaw? Or are you using the memory system that ships with it? Curious to see if people here have had good experiences with third-party memory systems such as Mem0 or Supermemory or any other

🟢 Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes) — score 23 Sources: reddit/r/LocalLLaMA

From Hcompany (which seems to be a French company): "Holo3.1: Fast & Local Computer Use Agents". Model description: Holo3.1 is our latest family of Vision-Language Models (VLMs) for computer use agents. Building on Holo3, it expands support beyond browser and desktop automation to mobile env

🟢 Mellum & Granite Embedding models are ready on llama.cpp — score 10 Sources: reddit/r/LocalLLaMA

https://github.com/ggml-org/llama.cpp/pull/23966 https://github.com/ggml-org/llama.cpp/pull/22716 Use llama.cpp version.

Developer Tools

🟢 Agent on your old Android phone — score 33 Sources: reddit/r/AIAgents

A fork of RikkaHub that turns the native Android LLM chat client into a real on-device agent: 80+ device tools, AI-authored workflows, scheduled jobs, an in-app browser the AI drives, SSH, screen automation, file manager, music player, voice transcription, dow

🟢 We Tested Multiple Healthcare AI Voice Agents — LuMay Was an Interesting Surprise — score 33 Sources: reddit/r/AIAgents

Over the past few months, we evaluated several AI voice agent platforms for healthcare use cases. Our evaluation criteria included natural conversation quality, appointment booking accuracy, patient experience, call reliability, workflow automation, and integration flexibility. One platform that

🟢 TPAMI Survey Paper Length for submission time? [D] — score 19 Sources: reddit/r/MachineLearning

My paper is around 33 pages, but the TPAMI guideline says it should be 20 pages. Does anyone know which is correct? (The title had a typo: it's TPAMI.)

🟢 my agent saved the day! (maybe) — score 6 Sources: reddit/r/AIAgents

cool experience: during our morning briefing, my agent told me that i had a demo session booked with a potential client but the prospect had made a typo in their email address. my agent pointed out that maybe the

Infrastructure & Compute

🟢 Why do we benchmark quants on perplexity and prose but never on tool call validity? — score 17 Sources: reddit/r/LocalLLaMA

The mixed-precision quant discussion here lately (MoE-aware stuff that keeps shared experts and the edge layers at higher precision) is great, but it's almost all measured against perplexity and general output quality. What I never see is structured output. Tool call JSON, function schemas, constrain
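One way to picture the missing measurement is a validity rate: the fraction of model outputs that parse as JSON and match the expected tool schema. The sketch below uses only the Python standard library. The schema shape, the `is_valid_tool_call` and `validity_rate` helpers, and the sample outputs are hypothetical illustrations, not something taken from the post or from any benchmark.

```python
import json

# Hypothetical tool schema: required argument names mapped to expected Python types.
WEATHER_TOOL = {"name": "get_weather", "args": {"city": str, "days": int}}


def is_valid_tool_call(raw: str, tool: dict) -> bool:
    """Return True if raw parses as JSON and matches the tool's name and argument types."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict) or call.get("name") != tool["name"]:
        return False
    args = call.get("arguments")
    # Require exactly the declared arguments: no missing keys, no extras.
    if not isinstance(args, dict) or set(args) != set(tool["args"]):
        return False
    # bool is a subclass of int in Python, so reject it explicitly for int fields.
    return all(
        isinstance(args[k], t) and not (t is int and isinstance(args[k], bool))
        for k, t in tool["args"].items()
    )


def validity_rate(outputs: list, tool: dict) -> float:
    """Fraction of outputs that are valid calls to the given tool."""
    if not outputs:
        return 0.0
    return sum(is_valid_tool_call(o, tool) for o in outputs) / len(outputs)


# Made-up outputs standing in for generations from a quantized model.
samples = [
    '{"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}',
    '{"name": "get_weather", "arguments": {"city": "Paris", "days": "3"}}',  # wrong type
    '{"name": "get_weather", "arguments": {"city": "Paris"',  # truncated JSON
    '{"name": "get_weather", "arguments": {"city": "Paris", "days": 3, "unit": "C"}}',  # extra arg
]
print(validity_rate(samples, WEATHER_TOOL))  # prints 0.25
```

Running the same prompts through each quant and comparing this rate would sit alongside perplexity rather than replace it.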

🟢 EricLBuehler/mistral.rs — Fast, flexible LLM inference — score 11 Sources: github_trending

🟢 Backpropagation destroys V1 brain alignment in one epoch, tracking RSA alignment to fMRI across training for BP, FA, predictive coding, and STDP [R] — score 6 Sources: reddit/r/MachineLearning

Third in a series of papers tracking learning rules vs. human fMRI (THINGS dataset, V1–IT, N=3 subjects). Previous finding: untrained CNNs match backprop at V1. This paper asks: when does training break that, and does the learning rule matter? Setup: RSA alignment measured at 8 checkpoints (epoc

Business & Funding

🟢 ICML Conference Ticket (looking to purchase) [D] — score 31 Sources: reddit/r/MachineLearning

Hi everyone, I missed the ICML conference tickets because I was waiting for some travel funding confirmation and now they are sold out. Do you know any other ways I could still purchase one? There seems to be no waiting list… or if you know anyone who needs to cancel theirs, please let me know 🙏🏻

Research Papers

🟢 PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training — score 25 Sources: huggingface

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision i

🟢 αDepth: Learning Single-Pass Soft Boundary Decomposition for Stereo Conversion — score 5 Sources: huggingface

Accurately modeling soft boundaries, e.g., hair and defocus blur, is a fundamental challenge in stereo conversion due to the ambiguous blending of foreground and background. Existing depth models primarily predict single-layer depth, leading to ambiguity in depth correspondence at soft boundaries. W

Other Signals

🟢 Calling it now Microsoft is buying Unsloth. — score 33 Sources: reddit/r/LocalLLaMA

I am going to be honest, I am leery of this new partnership with Unsloth. Microsoft historically hated open source, and this will not benefit the community in the end. It will look great at first. They will drop updates, play nice, and everyone will celebrate. But if you have been around the block,

🟢 U of T researchers demonstrate AI worm could target any online device — score 17 Sources: hackernews

🟢 Major labs timeshift between the research they publish on Arxiv and implementation in models — score 3 Sources: reddit/r/LocalLLaMA

Hi guys, just wanted to ask if you know whether, when Google DeepMind publishes an interesting paper on Arxiv on RL, it means it already is implemented in 3.5 flash and is gonna be implemented in 3.5 pro, or not? Basically, do these huge players publish before they test it at large scale or only af

🟢 [Project update] Dunetrace: live monitoring of production AI Agents — score 3 Sources: reddit/r/AIAgents

I have been working on Dunetrace, an open-source tool for live monitoring of AI agents. Here are the latest updates since the last post:
* MCP server: Claude Code / Cursor / Codex can now query your agent directly inside the IDE.
* Runtime Policy Engine: You can now set guardrails that fire m

📈 Trending Repos

Repo	Description	Stars Today	Language
chopratejas/headroom	Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.	1265	python
HKUDS/Vibe-Trading	"Vibe-Trading: Your Personal Trading Agent"	221	python
JCodesMore/ai-website-cloner-template	Clone any website with one command using AI coding agents	118	typescript
EricLBuehler/mistral.rs	Fast, flexible LLM inference	23	rust

📄 New Papers

Title	Category	Hotness
World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning	research_paper	17
Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation	research_paper	10
Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling	research_paper	7
Visual Graph Scaffolds for Structural Reasoning in Large Language Models	cs.AI	0
AURA: Action-Gated Memory for Robot Policies at Constant VRAM	cs.AI	0
Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins	cs.AI	0
BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces	cs.AI	0
ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning	cs.AI	0
Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection	cs.AI	0
An Exploration of Collision-based Enemy Morphology Generation	cs.AI	0
Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models	cs.AI	0
Toward a Modular Architecture for Embedded AI Agent Systems at the Edge	cs.AI	0
Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems	cs.AI	0
When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning	cs.AI	0
Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks	cs.AI	0

🏢 Lab Blog Posts

Anthropic: Jun 3, 2026 Policy What we learned mapping a year’s worth of AI-enabled cyber threats
Anthropic: Jun 2, 2026 Announcements Expanding Project Glasswing
Anthropic: Jun 1, 2026 Announcements Anthropic confidentially submits draft S-1 to the SEC
Anthropic: May 28, 2026 Announcements Anthropic raises $65B in Series H funding at $965B post-money valuation
Anthropic: May 27, 2026 Announcements Anthropic opens Milan office to support Italian enterprise, research, and developers
Anthropic: May 26, 2026 Announcements Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening
Anthropic: May 25, 2026 Announcements Anthropic co-founder Chris Olah's remarks on Pope Leo XIV's encyclical "Magnifica humanitas"
Anthropic: May 19, 2026 Announcements Widening the conversation on frontier AI
OpenAI: Travelers deploys AI-powered claims countrywide with OpenAI

🐦 Twitter/X Highlights

Account	Tweet Summary
AnthropicAI	This Executive Order is an important step in strengthening America’s leadership in AI. We look forward to collaborating with the White House to support its implementation. https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/
AnthropicAI	We’re expanding Project Glasswing. We’ve extended access to Claude Mythos Preview to approximately 150 additional organizations, based in more than fifteen countries. Read more about this expansion and our future plans for Project Glasswing: https://www.anthropic.com/news/expanding-project-glasswing
GoogleDeepMind	Pinned: We believe AI can be a dedicated research partner to help discover the next breakthrough. Enter Co-Scientist: our latest Gemini-based multi-agent system that can generate, debate and evolve novel hypotheses for complex scientific problems 🧵

Repeated From Recent Briefings

Stop asking what model to run. There are literally only two. - first seen 2026-06-02
nesquena/hermes-webui — Hermes WebUI: The best way to use Hermes Agent from the web or from your phone! - first seen 2026-06-01
A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL - first seen 2026-06-02
Browse CVPR 2026 papers on PapersWithCode [P] - first seen 2026-06-02
farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
harry0703/MoneyPrinterTurbo — Generate high-definition short videos with one click using AI large language models. - first seen 2026-05-28
TauricResearch/TradingAgents — TradingAgents: Multi-Agents LLM Financial Trading Framework - first seen 2026-05-02
supermemoryai/supermemory — Memory engine and app that is extremely fast, scalable. The Memory API for the AI era. - first seen 2026-06-01
dmtrKovalenko/fff — The fastest and the most accurate file search toolkit for AI agents, Neovim, Rust, C, and NodeJS - first seen 2026-05-20
AutoMedBench: Towards Medical AutoResearch with Agentic AI Models - first seen 2026-06-02
... plus 139 more repeated items in processed data

AI Watchtower Briefing — 2026-06-03
