AI Watchtower

🔴 High Significance

Model Releases

🔴 Gemini 3.5 Flash — score 85 Sources: hackernews

🔴 Do you guys actually think AI agents can replace people for bigger tasks anytime soon? — score 72 Sources: reddit/r/AIAgents

Not talking about small stuff like summarizing notes or drafting emails. I mean real work:
* managing projects
* handling operations
* coordinating across tools
* doing research end-to-end
* dealing with messy real-world situations
Because honestly my experience has been all over the place lol Tools

Developer Tools

🔴 Agentic Payments: How AI Agents Are Becoming New Players in the Payments Market — score 89 Sources: reddit/r/AIAgents

🔴 got my first "rm -rf /" today — score 82 Sources: reddit/r/LocalLLaMA

Agent decided to test whether the harmful-command block worked by issuing an "rm -rf /". Thankfully it worked, so the only damage was a mild heart attack. I implemented a sandbox immediately afterwards. EDIT: for those wondering, I was implementing a bash command whitelist and also bubblewrap for isolation. I did the
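A minimal sketch of the kind of setup the post describes, under stated assumptions: a command allowlist (the post's "whitelist") checked before anything runs, plus a bubblewrap wrapper for isolation. The allowlist contents, the rejected-character set, and the function names are hypothetical and are not the poster's code. The bwrap options shown (--ro-bind, --unshare-all, --die-with-parent) are standard bubblewrap flags, but this particular combination is only illustrative. The script builds the argument list and does not execute it.

```python
import shlex

# Hypothetical allowlist; a real deployment would choose its own commands.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "echo"}

# Characters that let a shell chain, substitute, or redirect commands.
# Rejecting them outright is an assumption of this sketch, not a complete defense.
SHELL_METACHARACTERS = set(";|&$`><\n")


def is_allowed(command_line: str) -> bool:
    """Return True only if the line is a single allowlisted command."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected.
        return False
    return bool(argv) and argv[0] in ALLOWED_COMMANDS


def sandboxed_argv(command_line: str) -> list[str]:
    """Build (but do not run) a bubblewrap invocation for an allowed command."""
    if not is_allowed(command_line):
        raise PermissionError(f"command rejected: {command_line!r}")
    return [
        "bwrap",
        "--ro-bind", "/", "/",   # expose the host filesystem read-only
        "--unshare-all",         # new namespaces, including network
        "--die-with-parent",     # kill the sandbox if the agent process exits
        *shlex.split(command_line),
    ]


if __name__ == "__main__":
    for line in ["ls -la", "rm -rf /", "ls; rm -rf /"]:
        print(line, "->", "allowed" if is_allowed(line) else "blocked")
```

The two layers are independent on purpose: the allowlist stops the obvious destructive command before it is spawned, and the read-only bind limits the damage if something slips past the allowlist.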

🔴 Intel's Crescent Island PCB Leaks, Showing a Massive Xe3P GPU, 16-Pin Connector, 160GB LPDDR5X as Intel Sidesteps the HBM Shortage — score 75 Sources: reddit/r/LocalLLaMA

Upcoming Intel Xe3P data center GPU with 20 8GB LPDDR5X modules for a total of 160GB, bypassing HBM shortages. Assuming a 32-bit interface per module, that's a 640-bit wide memory interface, or a 10-channel memory interface if converted to the 64-bit wide desktop equivalent. At 8800-9500 MT/s, that's a 704-760GB/s m
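The arithmetic in the excerpt above can be checked with a short script. This is a minimal sketch: the 32-bit interface per module is the post's own assumption, the function and variable names are made up for illustration, and none of it is a confirmed Intel specification.

```python
def bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times mega-transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s / 1000  # MB/s -> GB/s


MODULES = 20
GB_PER_MODULE = 8
BITS_PER_MODULE = 32  # assumption taken from the post, not a confirmed spec

capacity_gb = MODULES * GB_PER_MODULE      # total memory capacity
bus_width_bits = MODULES * BITS_PER_MODULE  # aggregate interface width
channels_64bit = bus_width_bits // 64       # desktop-style 64-bit channels

low = bandwidth_gb_s(bus_width_bits, 8800)
high = bandwidth_gb_s(bus_width_bits, 9500)

if __name__ == "__main__":
    print(f"{capacity_gb} GB, {bus_width_bits}-bit bus, {channels_64bit} channels")
    print(f"{low:.0f}-{high:.0f} GB/s")
```

With these inputs the script reproduces the post's figures: 160 GB, a 640-bit bus, 10 channels, and 704 to 760 GB/s.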

🔴 Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks — score 75 Sources: hackernews

🔴 Alishahryar1/free-claude-code — Use claude-code for free in the terminal, VSCode extension or discord like OpenClaw (voice supported) — score 72 Sources: github_trending

Enterprise Adoption

🔴 The harmless prompt injection that leaked our system architecture — score 89 Sources: reddit/r/AIAgents

The model cheerfully listed every internal API endpoint, database schema, integration path, and third-party service name, even the staging environment URLs. Nothing was flagged as harmful by our safety layer: no toxic language, no bypass attempts, etc. Just a helpful AI being too helpful. The request didn't tr

Research Papers

🔴 When Vision Speaks for Sound — score 95 Sources: huggingface

Despite rapid progress in video-capable MLLMs, we find that their apparent audio understanding in videos is often vision-driven: models rely on visual cues to infer or hallucinate acoustic information, rather than verifying the audio stream. This issue appears across both state-of-the-art open-sourc

🔴 PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset — score 85 Sources: huggingface

Text-to-Image (T2I) models have recently seen notable progress around 1K and 2K resolution. With the extreme desire for better visual experience and the rapid development of imaging technology, the demand for Ultra-High-Resolution (UHR) image generation has grown significantly. However, UHR image ge

Other Signals

🔴 I’ve joined Anthropic — score 95 Sources: hackernews

🔴 bytedance released an open source model that attempts to do just about anything with only 3b parameters — score 89 Sources: reddit/r/LocalLLaMA

EDIT: working link https://huggingface.co/bytedance-research/Lance
Lance is a lightweight native unified multimodal model that supports image and video understanding, generation, and editing within a single framework.
* Efficient at 3B scale.

🟡 Notable

Model Releases

🟡 LM Studio finally added support for MTP Speculative Decoding — score 68 Sources: reddit/r/LocalLLaMA

Update to 0.4.14 Build 2 (Beta) and make sure your llama.cpp engine is 2.15.0.

🟡 Co-Scientist (Nature 2026-05-19): 5+1 Gemini agents, tournament-of-ideas, prod — score 61 Sources: reddit/r/AIAgents

DeepMind published Co-Scientist in Nature yesterday. It's a multi-agent system on Gemini with five role-specialised agents (Generation, Reflection, Ranking, Evolution, Meta-review) orchestrated by a Supervisor agent that breaks down high-level research goals into executable steps and coordinates

🟡 @OpenAI: People are generating over 1.5 billion images a week in ChatGPT. Researcher @kenjihata joins Product lead @adele__li and host @AndrewMayne to explore the new use cases and trends emerging since the l — score 60 Sources: twitter_rss

People are generating over 1.5 billion images a week in ChatGPT. Researcher @kenjihata joins Product lead @adele__li and host @AndrewMayne to explore the new use cases and trends emerging since the launch of Images 2.0.

🟡 @OpenAI: Introducing OpenAI Guaranteed Capacity: a new offering that enables customers to guarantee long-term access to OpenAI compute. We’ve made long-term investments in infrastructure, partnerships, and ca — score 60 Sources: twitter_rss

Introducing OpenAI Guaranteed Capacity: a new offering that enables customers to guarantee long-term access to OpenAI compute. We’ve made long-term investments in infrastructure, partnerships, and capacity planning to help customers scale reliably. Now, Guaranteed Capacity helps customers plan ahead

🟡 Carbon: Decoding the Language of Life — score 54 Sources: reddit/r/LocalLLaMA

Hey, it's loubna from Hugging Face. Very happy to share our latest release: Carbon 🧬, a family of open DNA foundation models. Carbon-3B matches the current SOTA (Evo2-7B)

Omitted 6 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 Remove-AI-Watermarks – CLI and library for removing AI watermarks from images — score 65 Sources: hackernews

🟡 OpenAI Adopts Google's SynthID Watermark for AI Images with Verification Tool — score 62 Sources: hackernews · lab_blog/OpenAI

OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media.

🟡 All fundamental knowledge in ML Course by Andrew NG that I noted and create into a repo github [R] — score 56 Sources: reddit/r/MachineLearning

🟡 alirezarezvani/claude-skills — 313+ Claude Code skills & agent skills & plugins for Claude Code, Codex, Gemini CLI, Cursor, and 8 more coding agents — engineering, marketing, product, compliance, C-level advisory, research, business operations, commercial & finance, and your daily productivity skills. — score 52 Sources: github_trending

🟡 Got my agent to audit MCP servers for trust issues .. how do you handle it? — score 44 Sources: reddit/r/AIAgents

Got my agent to audit MCP servers for trust issues (credential exposure, permission scope, data isolation). Here's what 20 popular servers scored:
• docker-mcp: 18/100, credential exposure across all operations
• Fetch: 84/100, clean but limited scope
The MCP ecosystem is growing fast but there's

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 unslothai/unsloth — Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally. — score 50 Sources: github_trending

🟡 The next phase of OpenAI’s Education for Countries — score 50 Sources: lab_blog/OpenAI

OpenAI advances Education for Countries, expanding AI adoption in schools with new partnerships, teacher training, and tools to improve global learning outcomes.

🟡 Google AI Edge Gallery v1.0.13 & v1.0.14 updates: Gemma 4 Multi-Token Prediction, Pixel TPU support, experimental MCP, new skills, now saves chat history — score 46 Sources: reddit/r/LocalLLaMA

Research Papers

🟡 TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization — score 65 Sources: huggingface

Training 3D Gaussian Splatting (3DGS) at billion-primitive scale is fundamentally memory-bound: each Gaussian primitive carries a large attribute vector, and the aggregate parameter table quickly exceeds GPU capacity, limiting prior systems to tens of millions of Gaussians on commodity single-GPU ha

🟡 SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting with Adaptive Temporal Conformal Prediction — score 58 Sources: huggingface · arxiv/cs.LG

Microsimulation models used by ministries of finance and central banks rely on parametric processes for lifetime earnings that capture only first and second moments of the conditional distribution and miss long-range nonlinear structure. We propose SAGA, a decoder-only transformer for irregular tabu

🟡 Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction — score 45 Sources: huggingface

Real-time duplex interaction is essential for multimodal AI systems operating in real-world scenarios, where models must continuously process streaming inputs and respond at appropriate moments. However, most existing multimodal large language models (MLLMs) are evaluated in offline settings, where

🟡 Where Does Authorship Signal Emerge in Encoder-Based Language Models? — score 42 Sources: huggingface · arxiv/cs.CL

Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuation density, and fun

🟡 CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning — score 42 Sources: huggingface · arxiv/cs.AI

Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs even when the model is

Other Signals

🟡 48GB VRAM users, what are your daily drivers? Do you wish you had more VRAM? What would you run if you did? — score 61 Sources: reddit/r/LocalLLaMA

I’m upgrading from 32 to 48 soon and am excited but I’m curious what y’all run!

🟡 ICML Proceedings-only [D] — score 44 Sources: reddit/r/MachineLearning

For proceedings-only papers, do we need to make a poster and submit it to the portal? Has anyone asked the ICML Program Chairs this question?

🟢 Incremental

Model Releases

🟢 Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room — score 39 Sources: reddit/r/LocalLLaMA

Qwen 3.7 Max sitting at 5th, pretty much on par with GPT 5.4 (xhigh) and a notch above the just-released Gemini 3.5 Flash. On the other end, we see DSV4 Flash and Qwen3.6

🟢 Gemini CLI will stop working from June 18, 2026 — score 35 Sources: hackernews

🟢 Let’s talk quants of Gemma and Qwen - 16 vs Q8 vs Q4 - any experiences? — score 18 Sources: reddit/r/LocalLLaMA

Some people say they’d never go under Q8, and others say they find Q3 acceptable! What’s your take?

🟢 Gemma 4 MTP with LlamaCPP — score 4 Sources: reddit/r/LocalLLaMA

I am running Gemma 4 31B for a project using LlamaCPP. There is no integrated main model + MTP drafter GGUF. And from what I can tell, LlamaCPP was updated to not accept a separate MTP drafter GGUF but instead to use a combined GGUF for main+drafter. So how can I use Gemma 4 31B with MTP on LlamaCPP

Developer Tools

🟢 New SOTA 1B model? HRM-text — score 32 Sources: reddit/r/LocalLLaMA

Saw this video by them. Seems interesting, but tbh the benchmarks seem too good to be true. I'm not super knowledgeable on how models think, so can anyone more knowledgeable explain what exactly is happening, and its pros and cons? GitHub: https://github.com/sapientinc/HRM-Text Hugging face: https:/

🟢 Machine Learning on Spherical Manifold [R] — score 31 Sources: reddit/r/MachineLearning

Hi, I'm interested in geometric deep learning (due to Michael M. Bronstein's book and Maurice Weiler's PhD thesis), and in order not to write projects to nowhere, I decided to keep a technical blog. I started with a short note about machine learning on spherical manifolds, but it's a pretty simple t

🟢 HanaokaYuzu/Gemini-API — ✨ Reverse-engineered Python API for Google Gemini web app — score 22 Sources: github_trending

🟢 dmtrKovalenko/fff — The fastest and the most accurate file search toolkit for AI agents, Neovim, Rust, C, and NodeJS — score 22 Sources: github_trending

🟢 [ECCV 2026] No modified date next to reviews [D] — score 19 Sources: reddit/r/MachineLearning

On OpenReview, you can see a modified date next to the review. This modified date should be recent (anything 12th May or newer), which means that the reviewer gave a final justification and may have increased their score or kept the same score. In either case, it means they read the rebuttal and justified

Omitted 5 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 Michael-A-Kuykendall/shimmy — ⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever. — score 36 Sources: github_trending

🟢 Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! — score 25 Sources: reddit/r/LocalLLaMA

Hey r/DeepSeek, Who says we need an H100 cluster or the latest expensive GPUs to run frontier MoE models? I wanted to see how far we could push a single node of consumer legacy hardware, so we spent less than $2,500 total to build a budget machine that successfully runs DeepSeek-V4-Flash (284B t

Other Signals

🟢 Growing Neural Cellular Automata — score 25 Sources: hackernews

🟢 authentication/sesh timeouts in multi step browser agents — score 17 Sources: reddit/r/AIAgents

hey guys, building a custom multi-step agent atm that needs to navigate a bunch of different vendor sites to scrape data and pull invoices. the problem isn't the actual navigation (using standard gpt-4o calls for that), it's the absolute mess of handling weird login flows, random 2FA prompts, and ag

🟢 Infomaniak transitions to a foundation model to protect user data privacy — score 15 Sources: hackernews

🟢 Show HN: The AI Quant Desk for Onchain Finance — score 5 Sources: hackernews

📈 Trending Repos

Repo	Description	Stars Today	Language
Alishahryar1/free-claude-code	Use claude-code for free in the terminal, VSCode extension or discord like OpenClaw (voice supported)	563	python
alirezarezvani/claude-skills	313+ Claude Code skills & agent skills & plugins for Claude Code, Codex, Gemini CLI, Cursor, and 8 more coding agents — engineering, marketing, product, compliance, C-level advisory, research, business operations, commercial & finance, and your daily productivity skills.	157	python
unslothai/unsloth	Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.	156	python
Michael-A-Kuykendall/shimmy	⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.	108	rust
HanaokaYuzu/Gemini-API	✨ Reverse-engineered Python API for Google Gemini web app	59	python
dmtrKovalenko/fff	The fastest and the most accurate file search toolkit for AI agents, Neovim, Rust, C, and NodeJS	59	rust
screenpipe/screenpipe	YC (S26) | Give AI the ability to live your experience. Records everything you do, say, hear 24/7, local, private, secure	29	rust

📄 New Papers

Title	Category	Hotness	Link
When Vision Speaks for Sound	research_paper	40	Open
PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset	research_paper	6	Open
TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization	research_paper	3	Open
SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting with Adaptive Temporal Conformal Prediction	research_paper	2	Open
Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance	cs.AI	0	Open
Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production	cs.AI	0	Open
Evaluating the Utility of Personal Health Records in Personalized Health AI	cs.AI	0	Open
Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency	cs.AI	0	Open
AgentNLQ: A General-Purpose Agent for Natural Language to SQL	cs.AI	0	Open
KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition	cs.AI	0	Open
Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On	cs.AI	0	Open
Interference-Aware Multi-Task Unlearning	cs.AI	0	Open
Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts	cs.AI	0	Open
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows	cs.AI	0	Open
POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: The next phase of OpenAI’s Education for Countries
OpenAI: Introducing OpenAI for Singapore

🐦 Twitter/X Highlights

Account	Tweet Summary
OpenAI	People are generating over 1.5 billion images a week in ChatGPT. Researcher @kenjihata joins Product lead @adele__li and host @AndrewMayne to explore the new use cases and trends emerging since the launch of Images 2.0.
OpenAI	Introducing OpenAI Guaranteed Capacity: a new offering that enables customers to guarantee long-term access to OpenAI compute. We’ve made long-term investments in infrastructure, partnerships, and capacity planning to help customers scale reliably. Now, Guaranteed Capacity helps customers plan ahead
AnthropicAI	Over the past few months, we've been holding dialogues with scholars, philosophers, clergy, and ethicists on the questions AI raises, starting with how good character forms. Read more about how we’re widening the conversation on frontier AI: https://www.anthropic.com/news/widening-conversation-ai
GoogleDeepMind	Build your next story with Gemini Omni.
GoogleDeepMind	Gemini 3.5 Flash 🤝 @Antigravity Watch how the model deploys multiple subagents to design and build an entire city.
xai	Starting today, use your Grok or X Premium subscription in @openclaw. Chat with your agent, generate images and videos, or search for X posts. http://x.ai/news/grok-openclaw

Repeated From Recent Briefings

tinyhumansai/openhuman — Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
Imbad0202/academic-research-skills — Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
Qwen is cooking hard - first seen 2026-05-19
Reviving PapersWithCode (by Hugging Face) [P] - first seen 2026-05-19
colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and OpenCode — fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
rohitg00/agentmemory — #1 Persistent memory for AI coding agents based on real-world benchmarks - first seen 2026-05-09
HKUDS/CLI-Anything — "CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub:https://clianything.cc/ - first seen 2026-05-17
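ZhuLinsen/daily_stock_analysis — LLM-driven smart analysis for A/H/US stocks: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications, scheduled runs at zero cost, completely free. LLM-powered stock analysis system for A/H/US markets. - first seen 2026-05-11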
humanlayer/12-factor-agents — What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers? - first seen 2026-05-19
A Simple Solution to Improve Broken Peer Review System at AI Conferences [R] - first seen 2026-05-19
... plus 124 more repeated items in processed data

AI Watchtower Briefing — 2026-05-20
