AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 A rare look inside Qwen 3.7’s open source model release approval process: — score 89 Sources: reddit/r/LocalLLaMA

For real tho, 9b, 27b, 122b, I don’t really care at this point, just show us that you still love us. EDIT: I guess I gotta use /s on my posts from now on. Nobody appreciates a good sarcastic shitpost anymore clearly. I love Qwen and all our brothers and sisters in the east. I kid them because I love

🔴 Okay 27B made me a believer — score 75 Sources: reddit/r/LocalLLaMA

I previously hated on this model, but I have just been impressed by it, and I understand the hype now. I have been working on an HTML5 game console and I decided to see if Qwen3.6 27B can handle making some quick games in it to showcase functionality (save games, console API handling for stat trackin

🔴 What have you built with Claude Code just for yourself? — score 75 Sources: reddit/r/AIAgents

I've been curious about using Claude Code for a personal side project — not for work, just something for myself. Has anyone here done this? I'd love to know: - What did you build? - What resources or references helped you get started? - What was the most important thing you learned along the way?

Developer Tools

🔴 Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything) — score 82 Sources: reddit/r/LocalLLaMA

TL;DR Some AI behavior reminded me of ADHD/Trauma Response (thought loops, task paralysis...) and I laughed it off at first. Then I treated it like my neurodivergent friends: give em some slack. And just like that, the thought loops stopped, response was fast, the answers correct most of the time AN

🔴 NangoHQ/nango — Build product integrations with AI. — score 79 Sources: github_trending

🔴 Hmbown/CodeWhale — DeepSeek v4 coding agent in terminal — score 77 Sources: github_trending

🔴 Asked my AI to move its cron job to a different channel yesterday but guess what it did... — score 75 Sources: reddit/r/AIAgents

https://preview.redd.it/j8fyo4renm3h1.png?width=1674&format=png&auto=webp&s=3214a968bb5bfe203a273ade017d08b77c8ced51 So I have this cron job that gives me a report every day at 10 AM, but the channel where my agent was supposed to send its report got lost. Here's what my setup looks like: * I use Te

Infrastructure & Compute

🔴 [D] Where do you go for serious AI research discussion online? — score 74 Sources: reddit/r/MachineLearning

Looking for communities where people actually dig into ML/AI research, not hype, not "look what I built with an LLM API," but discussions about papers, training dynamics, debugging real models, infra problems, that kind of thing. I'm specifically interested in places where you can post something lik

Research Papers

🔴 Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling — score 78 Sources: huggingface · arxiv/cs.CL

Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and can

Other Signals

🔴 Outsourcing plus local AI will soon become more economical vs. frontier labs — score 75 Sources: hackernews

🟡 Notable

Model Releases

🟡 $400 Qwen 3.6-27B Setup - Dual RTX 3060 - 30-50 t/s — score 54 Sources: reddit/r/LocalLLaMA

I picked up a 7900 XTX earlier which runs qwen3.6-27b fine, but not to my liking. Its compute performance is quite unstable for me. With MTP the decode speed can reach 40-60 t/s, but prefill is just too slow. Regardless of whether I used ROCm or Vulkan, the prefill speed varies between 300t/s and 500

🟡 @AnthropicAI: New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope o — score 50 Sources: twitter_rss

New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope of any potentially destructive actions. Read more: https://www.anthropic.com/engineering/how-we-conta

🟡 @GoogleDeepMind: Our Gemini for Science tools could help scientists unlock their next breakthrough. 🧬 — score 50 Sources: twitter_rss

🟡 @xai: Thank you so much for all the feedback on the Grok Build Beta. Some of you reported hitting limits quickly. Our team found areas to improve caching, so we've reset Grok Build usage limits for all acc — score 50 Sources: twitter_rss

Thank you so much for all the feedback on the Grok Build Beta. Some of you reported hitting limits quickly. Our team found areas to improve caching, so we've reset Grok Build usage limits for all accounts. Please keep sharing feedback - the team is here to help.

Developer Tools

🟡 [P] Built a portable GPU ISA after reading too many architecture manuals — score 69 Sources: reddit/r/MachineLearning

I’ve been reading GPU architecture docs in my free time. NVIDIA PTX, AMD ISA reference guides, Intel Xe, reverse-engineered Apple GPU stuff. Over 5,000 pages across 16 microarchitectures. After a while you notice all four vendors are doing the same 11 things with different names. So I wrote a spec t

🟡 thedotmack/claude-mem — Persistent Context Across Sessions for Every Agent – Captures everything your agent does during sessions, compresses it with AI, and injects relevant context back into future sessions. Works with Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, OpenCode + More — score 66 Sources: github_trending

🟡 p-e-w/heretic — Fully automatic censorship removal for language models — score 62 Sources: github_trending

🟡 Turning local agents into self-optimizing agents — score 61 Sources: reddit/r/LocalLLaMA

I was experimenting with a self-optimizing agentic pipeline to climb the benchmark leaderboard (TerminalBench). On a 10-task subset, I got the performance to rise from ~30% → ~90%. That loop worked, so I asked: can the same reflect-and-rewrite step run continuously against everyday chats instead o

🟡 @GoogleDeepMind: SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport. That’s why we’re partnering with @OpenAI, @ElevenLabs and Kakao to add SynthID watermarking to th — score 50 Sources: twitter_rss

SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport. That’s why we’re partnering with @OpenAI, @ElevenLabs and Kakao to add SynthID watermarking to their models – accelerating the industry-wide momentum we started with @NVIDIA.

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Research Papers

🟡 Rethinking VLM Representation for VLA Initialization — score 65 Sources: huggingface

Vision-Language-Action (VLA) models widely adopt pretrained Vision-Language Models (VLMs) as policy backbones, yet it remains unclear what kind of pretrained VLM representation is useful as a VLA initialization. In this paper, we study VLA initialization as a controlled representation-design problem

🟡 VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions — score 62 Sources: huggingface · arxiv/cs.AI

Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interacti

🟡 Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments — score 45 Sources: huggingface · arxiv/cs.AI

Recent advances in large language models (LLMs) have facilitated the widespread deployment of LLMs as interactive agents capable of reasoning, planning, and tool use. Despite strong performance on existing benchmarks, such agents often exhibit notable degradation when deployed in real-world settings

Other Signals

🟡 China Clamps Down on Overseas Travel for AI Talent at Alibaba, DeepSeek — score 68 Sources: reddit/r/LocalLLaMA

Big, if true. Doesn't bode well for research / OS models out of China.

🟡 Testing AI agents where prompt injection turns into actions — score 64 Sources: reddit/r/AIAgents

I am building RedThread, an open-source CLI for repeatable AI-agent red-team campaigns. Repo: https://github.com/matheusht/redthread Demo campaign result: 3 runs, 33.3% ASR, one SUCCESS, one PARTIAL, one FAILURE. The agent problem I am focused on is the action boundary: untrusted text influencing to

🟡 [R] GNN Model For Fraud Detection Isn't Performing Well — score 56 Sources: reddit/r/MachineLearning

We're writing a research paper on an explainable fraud detection GNN model, and as a first step we're creating a basic Graph Neural Network for it. We're using the most famous dataset available on this topic, i.e., the IEEE CIS Fraud Detection Dataset, and implemented all necessary feature engineering on th

🟡 New DeepSWE benchmark finds Claude Opus cheats — score 46 Sources: reddit/r/LocalLLaMA

Sadly the open models seem far behind.

🟢 Incremental

Model Releases

🟢 Cactus Hybrid Router: Gemma4-2B can match Gemini-3.1-Flash-Lite by routing 15-55% of tasks to Gemini And Running The Rest Locally. — score 39 Sources: reddit/r/LocalLLaMA

Last week, we announced the “Simple Attention Network” and trained Needle, a 26M function-call model that beats models 10-25x its size. Some LocalLlama Redditors asked if we could make a router model. We have now built “Cactus Hybrid Router”, a 65k-parameter model that decodes on the fly when to comp

🟢 Folks running qwen 3.6 27b for agentic work. Do you dare to use q4_k_m? — score 32 Sources: reddit/r/LocalLLaMA

I don't have good experience running q4_k_m; the difference from q6 is "a few errors an hour" versus "a few errors every couple of days". Edit: How does it fail? Just as users DifficultDog8435 and FullstackSensei explained in the comments. They worded it better than I did.

🟢 How I gave my AI agent real-time Reddit awareness in ~20 lines of Python using MCP — score 31 Sources: reddit/r/AIAgents

One thing that's been consistently annoying when building LLM agents: they reason well, but they're blind to anything that happened after their training cutoff. For tasks like market monitoring, trend detection, or product research, that's a real problem. I solved this by wiring up a Reddit MCP serv

🟢 AI agent for WhatsApp & Telegram — score 31 Sources: reddit/r/AIAgents

Officially announcing the launch of our AI Agent for WhatsApp and Telegram – a solution designed to help businesses automate communication, improve response speed, and scale customer interactions directly inside messaging apps. But in real life it changes the game! Messaging platforms have become th

🟢 Multi-Agent Parallelism and State Sovereignty in Agent Fleets — score 31 Sources: reddit/r/AIAgents

Most agent frameworks handle task dispatching sequentially: launching one agent process, blocking the execution loop, waiting for an exit code, and then moving to the next task. If an agent crashes, hits a rate limit, or takes 20 minutes on a complex codebase, your entire workspace freezes. With the

Omitted 5 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟢 Recently i setupped my Hermes Agent persona and i think i crushed it — score 31 Sources: reddit/r/AIAgents

I’ve been tweaking my Hermes Agent setup (persona, profiles, the works), and I think I finally made it perfect. I wanted to create a perfect personalized AI. Well, it worked... maybe a little too well. The Conflict: I created a custom skill called prompt-engineer. I tried to test it in a chat i

🟢 alpic-ai/skybridge — Skybridge is a full-stack TypeScript framework for MCP Apps and ChatGPT Apps. Type-safe. React-powered. Platform-agnostic. — score 27 Sources: github_trending

🟢 What to use for Sign Language Recognition [R] — score 25 Sources: reddit/r/MachineLearning

Hi everyone, I'm finishing up the proposal for my undergraduate computer science thesis on sign language recognition, specifically Filipino Sign Language, and I want to ask which architecture would be best for my methodology. Right now I'm considering MediaPipe Holistic + Transformers or Media Pipe H

🟢 modelscope/FunASR — Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API. — score 18 Sources: github_trending

🟢 A Tiny Open-Source Self-Driving AI That Runs on a Phone [P] — score 6 Sources: reddit/r/MachineLearning

https://preview.redd.it/ww14mzr2fm3h1.png?width=1890&format=png&auto=webp&s=79873d47ae79c7815ca3e7e91fd43141632174f5 https://www.youtube.com/watch?v=rr_uS4bf0B4&feature=youtu.be trained a 7MB open-source L4 self-dri

Infrastructure & Compute

🟢 Intel b60 48gb? — score 18 Sources: reddit/r/LocalLLaMA

2k AUD for a 48 GB card; it’s certainly lodged itself in my brain. But there’s very little in this sub about the Intel cards: one post from a quarter of a year ago says to avoid them, but that’s also a lifetime in this sphere. Are they really that bad? Surely my little 3060 can’t be better at infere

Research Papers

🟢 Learning High-Frequency Continuous Action Chunks in Latent Space — score 20 Sources: huggingface

Modern robotic policies increasingly rely on action chunking to execute complex tasks in the physical world. While action chunking improves temporal consistency at moderate action frequencies, it becomes insufficient when the action frequency is further increased (e.g., to 60~Hz). At such high frequ

Other Signals

🟢 Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation (ICML 2026 Workshops) [R] — score 25 Sources: reddit/r/MachineLearning

Paper: https://arxiv.org/abs/2605.08172 Workshops: AI for Science & Structured Data for Health at ICML 2026 Abstract: >Anatomical mesh segmentation requires models that operate directly on irregular surface geometry while remaining robust to arb

📈 Trending Repos

Repo	Description	Stars Today	Language
NangoHQ/nango	Build product integrations with AI.	860	typescript
Hmbown/CodeWhale	DeepSeek v4 coding agent in terminal	448	rust
thedotmack/claude-mem	Persistent Context Across Sessions for Every Agent – Captures everything your agent does during sessions, compresses it with AI, and injects relevant context back into future sessions. Works with Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, OpenCode + More	352	typescript
p-e-w/heretic	Fully automatic censorship removal for language models	314	python
alpic-ai/skybridge	Skybridge is a full-stack TypeScript framework for MCP Apps and ChatGPT Apps. Type-safe. React-powered. Platform-agnostic.	52	typescript
modelscope/FunASR	Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.	42	python

📄 New Papers

Title	Category	Hotness	Link
Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling	research_paper	19	Open
Rethinking VLM Representation for VLA Initialization	research_paper	7	Open
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions	research_paper	6	Open
BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization	cs.AI	0	Open
Can LLMs Introspect? A Reality Check	cs.AI	0	Open
Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory	cs.AI	0	Open
Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions	cs.AI	0	Open
Constraint acquisition needs better benchmarks	cs.AI	0	Open
Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems	cs.AI	0	Open
Experiments in Agentic AI for Science	cs.AI	0	Open
Anchor: Mitigating Artifact Drift in Agent Benchmark Generation	cs.AI	0	Open
OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling	cs.AI	0	Open
JobBench: Aligning Agent Work With Human Will	cs.AI	0	Open
Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning	cs.AI	0	Open
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence	cs.AI	0	Open

🐦 Twitter/X Highlights

Account	Tweet Summary
AnthropicAI	New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope of any potentially destructive actions. Read more: https://www.anthropic.com/engineering/how-we-conta
GoogleDeepMind	Our Gemini for Science tools could help scientists unlock their next breakthrough. 🧬
GoogleDeepMind	SynthID has already watermarked over 100 billion pieces of content, but transparency is a team sport. That’s why we’re partnering with @OpenAI, @ElevenLabs and Kakao to add SynthID watermarking to their models – accelerating the industry-wide momentum we started with @NVIDIA.
xai	Thank you so much for all the feedback on the Grok Build Beta. Some of you reported hitting limits quickly. Our team found areas to improve caching, so we've reset Grok Build usage limits for all accounts. Please keep sharing feedback - the team is here to help.

Repeated From Recent Briefings

Lum1104/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code, Codex, Gemini, Cursor, OpenCode, AntiGravity, Kiro, and Hermes Agent — fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs. NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats - first seen 2026-05-07
D^2-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing - first seen 2026-05-26
rohitg00/ai-engineering-from-scratch — Learn it. Build it. Ship it for others. - first seen 2026-05-21
anthropics/knowledge-work-plugins — Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork - first seen 2026-05-25
NousResearch/hermes-agent — The agent that grows with you - first seen 2026-05-11
farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
earendil-works/pi — AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods - first seen 2026-05-09
garrytan/gstack — Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA - first seen 2026-05-12
... plus 144 more repeated items in processed data

AI Watchtower Briefing — 2026-05-27
