AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 The RTX 5000 PRO (48GB) arrived and it is better than I expected. — score 87 Sources: reddit/r/LocalLLaMA

I posted here about buying it a few days ago: https://www.reddit.com/r/LocalLLaMA/comments/1t2slmw/first_time_gpu_buyer_got_a_rtx_5000_pro_was_it_a/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

🔴 Codex is now in the ChatGPT mobile app — score 85 Sources: hackernews · lab_blog/OpenAI · twitter_rss

Use Codex anywhere with the ChatGPT mobile app. Monitor, steer, and approve coding tasks in real time across devices and remote environments.

🔴 How Claude Code works in large codebases — score 72 Sources: hackernews

Developer Tools

🔴 arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N] — score 94 Sources: reddit/r/MachineLearning

From Thomas G. Dietterich (arXiv moderator for cs.LG) on 𝕏 (thread): https://x.com/tdietterich/status/2055000956144935055 https://xcancel.com/tdietterich/status/2055000956144935055

🔴 AI Agents Need Economic Memory Ownership And Market Access — score 94 Sources: reddit/r/AIAgents

🔴 VS Code's new "Agents window" lets you use local AI models. Still requires an Internet connection and a GitHub Copilot plan (because we can't have nice things) — score 87 Sources: reddit/r/LocalLLaMA

At first I was excited to see this, but I guess I'll wait till someone figures out what people actually want

🔴 I think people underestimate how much “state” matters once agents leave the demo stage — score 81 Sources: reddit/r/AIAgents

In demos, agents look incredibly smart because every run starts fresh: clean context, clean browser state, clean memory, clean inputs. Production is the opposite lol. After a few days you suddenly have:
* half-completed tasks
* stale sessions
* conflicting memory
* retries from old runs
* browser tabs in

🔴 A First Comprehensive Study of TurboQuant: Accuracy and Performance — score 77 Sources: reddit/r/LocalLLaMA

TL;DR from the article:
- FP8 via --kv-cache-dtype fp8 remains the best default for KV-cache quantization: it provides 2x KV-cache capacity with negligible accuracy loss, while matching BF16 on most performance metrics and substantially improving them in memory-constrained serving scenarios.
- Turbo
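
The "2x KV-cache capacity" figure for FP8 follows from element size alone: a BF16 element takes 2 bytes and an FP8 element takes 1. A minimal sketch of that arithmetic is below. The model shape and memory budget are hypothetical values chosen for illustration, not numbers from the article, and the sketch ignores any per-tensor scaling-factor overhead that an FP8 cache may carry.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_element):
    """Bytes of KV cache needed to store one token.

    The factor 2 covers one key vector and one value vector
    per layer and per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element


def tokens_that_fit(budget_bytes, bytes_per_token):
    """Whole number of tokens whose KV entries fit in the budget."""
    return budget_bytes // bytes_per_token


# Hypothetical model shape and cache budget (assumptions, not from the article).
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
CACHE_BUDGET_BYTES = 16 * 1024**3  # 16 GiB reserved for the KV cache

bf16_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, 2)
fp8_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, 1)

bf16_tokens = tokens_that_fit(CACHE_BUDGET_BYTES, bf16_bytes)
fp8_tokens = tokens_that_fit(CACHE_BUDGET_BYTES, fp8_bytes)

print(f"BF16: {bf16_bytes} bytes/token, {bf16_tokens} tokens")
print(f"FP8:  {fp8_bytes} bytes/token, {fp8_tokens} tokens")
```

Under these assumptions the same budget holds twice as many tokens with FP8, which is why the gain shows up mainly in memory-constrained serving.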

Infrastructure & Compute

🔴 NVIDIA Reportedly Prepares RTX 5090 Price Hike Amid Rising GDDR7 Costs (maybe RTX 50 and PRO series as well) — score 97 Sources: reddit/r/LocalLLaMA

Research Papers

🔴 FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale — score 82 Sources: huggingface · arxiv/cs.LG

Many real-world coding challenges are open-ended and admit no known optimal solution. Yet, recent progress in LLM coding has focused on well-defined tasks such as feature implementation, bug fixing, and competitive programming. Open-ended coding remains a weak spot for LLMs, largely because open-end

🔴 Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning — score 78 Sources: huggingface · arxiv/cs.LG

We often aim to generate images that are both photorealistic and 3D-consistent, adhering to precise geometry, material, and viewpoint controls. Typically, this is achieved by fine-tuning an image generator, pre-trained on billions of real images, using renders of synthetic 3D assets, where annotatio

Other Signals

🔴 Ontario auditors find doctors' AI note takers routinely blow basic facts — score 83 Sources: hackernews

🟡 Notable

Model Releases

🟡 China modded GPU (eg. 4090 48gb) --> I'm gonna figure it out. IS THERE NO ONE ELSE CURIOUS?? — score 63 Sources: reddit/r/LocalLLaMA

There's a dearth of information (in the English-speaking world) about these cards. The good recent video is probably this one: https://www.youtube.com/watch?v=TcRGBeOENLg Even in this subreddit, there seem to be few reviews of these cards. Last couple of dece

🟡 @xai: An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers. — score 60 Sources: twitter_rss

An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at http://x.ai/cli

🟡 I Let a Small Model Train on Its Own Mistakes. It Reached 80% on HumanEval and Beat GPT-3.5 on Math — score 57 Sources: reddit/r/LocalLLaMA

A few months ago, I got stuck on one line in the DeepSeek-R1 paper. It said models could improve through verifiable rewards. That sounded almost magical to me. Not because it was impossible, but because it made me wonder something very simple: What if a model could teach itself to code, without huma
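
As a rough illustration of what a "verifiable reward" can mean for code, the sketch below scores a candidate solution 1.0 only if it passes every test case and 0.0 otherwise. This is not the post author's pipeline or the DeepSeek-R1 setup; the function name `verifiable_reward`, the entry-point name `solve`, and the test cases are all hypothetical. Running untrusted generated code with `exec` is unsafe outside a sandbox, so treat this strictly as a toy.

```python
def verifiable_reward(candidate_source, test_cases, func_name="solve"):
    """Return 1.0 if the candidate passes all test cases, else 0.0.

    candidate_source: Python source text expected to define `func_name`.
    test_cases: list of (args_tuple, expected_result) pairs.
    """
    namespace = {}
    try:
        # WARNING: exec on model output must be sandboxed in real use.
        exec(candidate_source, namespace)
        fn = namespace[func_name]
        passed = all(fn(*args) == expected for args, expected in test_cases)
        return 1.0 if passed else 0.0
    except Exception:
        # Syntax errors, missing function, or runtime errors all earn no reward.
        return 0.0


# Hypothetical task: add two integers.
tests = [((1, 2), 3), ((-1, 1), 0), ((10, 5), 15)]

good = "def solve(a, b):\n    return a + b\n"
wrong = "def solve(a, b):\n    return a - b\n"
broken = "def solve(a, b) return a + b"

print(verifiable_reward(good, tests))
print(verifiable_reward(wrong, tests))
print(verifiable_reward(broken, tests))
```

The point of the design is that the signal needs no human label: the tests decide, so the same check can be run on any number of model-generated attempts.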

🟡 Claude for Legal — score 50 Sources: hackernews

🟡 @AnthropicAI: We’re partnering with the Gates Foundation, committing $200 million in grants, Claude credits, and technical support to programs in global health, life sciences, education, agriculture, and economic mobility. — score 50 Sources: twitter_rss

We’re partnering with the Gates Foundation, committing $200 million in grants, Claude credits, and technical support to programs in global health, life sciences, education, agriculture, and economic mobility. Read more: https://www.anthropic.com/news/gates-foundation-partnership

Omitted 2 additional model release items from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 Jakedismo/codegraph-rust — 100% Rust implementation of code graphRAG with blazing fast AST+FastML parsing, surrealDB backend and advanced agentic code analysis tools through MCP for efficient code agent context management — score 67 Sources: github_trending


🟡 I’m building a UE5 MetaHuman (realistic digital human) AI Companion that adapts conversation into gestures, body actions, and voice-ready replies — score 56 Sources: reddit/r/AIAgents

Hey everyone 👋 I’m building **Companion AI**, a UE5 + MetaHuman based embodied AI system where conversation becomes body language, actions, and presence. Instead of opening a normal chat window, the user sees a realistic MetaHuman companion in a room. The character can respond through text, voic

🟡 OthmanAdi/planning-with-files — Claude Code skill implementing Manus-style persistent markdown planning — the workflow pattern behind the $2B acquisition. — score 56 Sources: github_trending


🟡 Sea's View on the Future of Agentic Software Development with Codex — score 50 Sources: lab_blog/OpenAI

Sea Limited's CPO explains why the company is deploying Codex across engineering teams to accelerate AI-native software development in Asia.

🟡 zubair-trabzada/geo-seo-claude — GEO-first SEO skill for Claude Code. Comprehensive AI search optimization for any website — citability scoring, AI crawler analysis, brand authority, schema markup, platform-specific optimization, and PDF reports. If you want to learn how to sell this to real businesses, check out the skool community — score 46 Sources: github_trending


Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Research Papers

🟡 ViMU: Benchmarking Video Metaphorical Understanding — score 55 Sources: huggingface

Any new medium, once it emerges, is used for more than the transmission of overt content alone. The information it carries typically operates on two levels: one is the content directly presented, while the other is the subtext beneath it: the implicit ideas and intentions the creator seeks to convey

🟡 LiSA: Lifelong Safety Adaptation via Conservative Policy Induction — score 55 Sources: huggingface · arxiv/cs.CL

As AI agents move from chat interfaces to systems that read private data, call tools, and execute multi-step workflows, guardrails become a last line of defense against concrete deployment harms. In these settings, guardrail failures are no longer merely answer-quality errors: they can leak secrets,

🟡 CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves — score 42 Sources: huggingface · arxiv/cs.LG

We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a roote

🟡 BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE — score 40 Sources: huggingface

Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and suboptimal inference latency. Existing acceleration methods eith
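
For readers unfamiliar with the baseline the abstract criticizes, here is a minimal sketch of fixed Top-K routing for a single token. It is not BEAM's method. The router logits are made-up values, and renormalizing the weights over the selected experts is one common variant, assumed here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs, with weights renormalized
    to sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


# Hypothetical router outputs for two tokens over four experts.
confident = top_k_route([8.0, 0.1, 0.0, -0.2], k=2)
uncertain = top_k_route([1.0, 0.9, 0.8, 0.7], k=2)

print(confident)
print(uncertain)
```

With a fixed K of 2, the confident token still runs a second expert whose weight is close to zero. That wasted expert call is the kind of redundant computation the abstract attributes to fixed Top-K routing.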

Other Signals

🟡 Would a 2000-2021 ML paper even get accepted today? [D] — score 69 Sources: reddit/r/MachineLearning

I keep hearing some version of this: “A paper that got accepted years ago wouldn’t stand a chance today.” Honestly, for a lot of ML subfields, this doesn’t sound crazy anymore. A paper that once looked solid can now look under-evaluated, under-ablated, weak on baselines, or just too obvious. So mayb

🟡 Access to frontier AI will soon be limited by economic and security constraints — score 61 Sources: hackernews

🟡 Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct — score 50 Sources: reddit/r/LocalLLaMA

Ok, hear me out. This all started when I was trying to understand why this Qwen3.6 27B INT8 Autoround (https://huggingface.co/Minachist/Qwen3.6-27B-INT8-AutoRound/tree/main) recipe was performing so much better than any other Q

🟡 @AnthropicAI: We've published a paper that explains our views on AI competition between the US and China. The US and democratic allies hold the lead in frontier AI today. — score 50 Sources: twitter_rss

We've published a paper that explains our views on AI competition between the US and China. The US and democratic allies hold the lead in frontier AI today. Read more on what it’ll take to keep that lead: https://www.anthropic.com/research/2028-ai-leadership

🟡 eight months running autonomous business agents in production with real money. here is the specific failure mode that benchmarks structurally cannot surface. — score 44 Sources: reddit/r/AIAgents

this sub thinks seriously about agents so I will skip the basics and get straight to the production observation worth discussing. PayWithLocus is the company. LocusFounder is the product. YC backed this year. VC backed. launched May 5th. the system runs entire businesses through a multi agent archit

🟢 Incremental

Model Releases

🟢 RDNA3 Flash Attention fix just dropped by llama.cpp b9158 — score 37 Sources: reddit/r/LocalLLaMA

https://github.com/ggml-org/llama.cpp/releases

🟢 Show HN: GlycemicGPT – Open-source AI-powered diabetes management — score 28 Sources: hackernews

🟢 RelaxAI – UK sovereign LLM inference at 80% cheaper than OpenAI/Claude — score 17 Sources: hackernews

🟢 I have (even faster) DeepSeek V4 Pro at home — score 7 Sources: reddit/r/LocalLLaMA

A few days ago I posted about my DeepSeek V4 Pro at home. Now it's time for an update. Yesterday I finally managed to run this model in ktransformers (sglang + kt-kernel).

🟢 I got tired of OpenClaw skills having no actual usage so I spent 3 weeks building one. — score 0 Sources: reddit/r/AIAgents

Building something for developers who use OpenClaw. I just quit using ClawHub. Not because it's bad. Because I built something better. The OpenClaw ecosystem just got a lot more powerful. If you're using Claude, Cursor, or OpenClaw — this is for you. Beta dropping soon. 🦞 #BuildInPublic #OpenClaw #

Developer Tools

🟢 cline/cline — Autonomous coding agent as an SDK, IDE extension, or CLI assistant. — score 39 Sources: github_trending


🟢 Most Agent Reliability Write-Ups Completely Ignore the "This Agent Moves Money" Failure Mode — score 19 Sources: reddit/r/AIAgents

I've been building an agent layer that connects to user accounts on Kalshi, Polymarket, DraftKings, FanDuel, and a handful of others, and watches user-defined strategies execute against them. the writeup the agent reliability literature wants me to do is "here's our eval suite, here's our supervisio

🟢 Infracost (YC W21) Is Hiring Sr Dev Advocate to make agents cloud cost-aware — score 6 Sources: hackernews

🟢 awslabs/agent-plugins — Agent Plugins for AWS equip AI coding agents with the skills to help you architect, deploy, and operate on AWS. — score 6 Sources: github_trending


Infrastructure & Compute

🟢 NVIDIA-AI-Blueprints/video-search-and-summarization — Suite of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications. — score 37 Sources: github_trending


Other Signals

🟢 Our AI agent told a customer our competitor was better. That's when we realized generic guardrails aren't enough. — score 39 Sources: reddit/r/AIAgents

Shipped a customer-facing agent a few months back. Had the standard safety guardrails in place, felt pretty good about it. First week in prod, a customer asks "should I go with you or [competitor]" and our agent gives them a thoughtful comparison that ends with honestly for your use case they migh

🟢 Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version) — score 30 Sources: reddit/r/LocalLLaMA

In my opinion, MTP models are 100% game changer for local LLMs. In terms of speed, I was getting around 1.5x the tok/sec of previous tests. The project was a test - building a full iterative step-by-step pygame; a small mystery dungeon-style game. At first I set 100-200k context and raised it to 300

🟢 MiniMax M2.7 ultra uncensored heretic is Out Now with 4/100 Refusals, Available in Safetensors and GGUFs Formats! — score 23 Sources: reddit/r/LocalLLaMA

llmfan46/MiniMax-M2.7-BF16-ultra-uncensored-heretic: https://huggingface.co/llmfan46/MiniMax-M2.7-BF16-ultra-uncensored-heretic llmfan46/MiniMax-M2.7-ultra-uncensored-heretic-GGUF: [https://huggingface.co/llmfan46/MiniMax-

🟢 LLM Policy for Rust Compiler — score 19 Sources: hackernews

🟢 club-5060ti: practical RTX 5060 Ti local LLM notes and configs — score 17 Sources: reddit/r/LocalLLaMA

I put together a small public repo for RTX 5060 Ti 16GB local LLM setups: I took inspiration from the club-3090 repo, but this one is focused on documenting what we’ve actually tested on 5060 Ti hardware so the setup details are easier to share and reproduce. Current seed setup is 2x RTX 5060 Ti 16G

Omitted 2 additional other signals items from the main section; see raw data and source-specific sections below.

📊 Cross-Source Signals

Items that appeared on 3+ sources today:

Codex is now in the ChatGPT mobile app — appeared on: hackernews (333), lab_blog/OpenAI (100), twitter_rss (0)

📈 Trending Repos

Repo	Description	Stars Today	Language
Jakedismo/codegraph-rust	100% Rust implementation of code graphRAG with blazing fast AST+FastML parsing, surrealDB backend and advanced agentic code analysis tools through MCP for efficient code agent context management	191	rust
OthmanAdi/planning-with-files	Claude Code skill implementing Manus-style persistent markdown planning — the workflow pattern behind the $2B acquisition.	124	python
zubair-trabzada/geo-seo-claude	GEO-first SEO skill for Claude Code. Comprehensive AI search optimization for any website — citability scoring, AI crawler analysis, brand authority, schema markup, platform-specific optimization, and PDF reports. If you want to learn how to sell this to real businesses, check out the skool community	80	python
sirmalloc/ccstatusline	🚀 Beautiful highly customizable statusline for Claude Code CLI with powerline support, themes, and more.	76	typescript
cline/cline	Autonomous coding agent as an SDK, IDE extension, or CLI assistant.	63	typescript
NVIDIA-AI-Blueprints/video-search-and-summarization	Suite of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications.	62	python
awslabs/agent-plugins	Agent Plugins for AWS equip AI coding agents with the skills to help you architect, deploy, and operate on AWS.	8	python

📄 New Papers

Title	Category	Hotness	Link
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale	research_paper	14	Open
Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning	research_paper	11	Open
ViMU: Benchmarking Video Metaphorical Understanding	research_paper	3	Open
LiSA: Lifelong Safety Adaptation via Conservative Policy Induction	research_paper	2	Open
Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey	cs.CL	0	Open
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use	cs.CL	0	Open
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding	cs.CL	0	Open
Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning	cs.CL	0	Open
Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation	cs.CL	0	Open
PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts	cs.CL	0	Open
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents	cs.CL	0	Open
Distribution Corrected Offline Data Distillation for Large Language Models	cs.CL	0	Open
Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study	cs.CL	0	Open
When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering	cs.CL	0	Open
Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards	cs.CL	0	Open

🏢 Lab Blog Posts

OpenAI: Sea's View on the Future of Agentic Software Development with Codex

🐦 Twitter/X Highlights

Account	Tweet Summary
xai	An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at http://x.ai/cli Post
AnthropicAI	We've published a paper that explains our views on AI competition between the US and China. The US and democratic allies hold the lead in frontier AI today. Read more on what it’ll take to keep that lead: https://www.anthropic.com/research/2028-ai-leadership Post
AnthropicAI	We’re partnering with the Gates Foundation, committing $200 million in grants, Claude credits, and technical support to programs in global health, life sciences, education, agriculture, and economic mobility. Read more: https://www.anthropic.com/news/gates-foundation-partnership Post
OpenAI	Pinned: You've been asking for this one... Now in preview: Codex in the ChatGPT mobile app. Start new work, review outputs, steer execution, and approve next steps, all from the ChatGPT mobile app. Codex will keep running on your laptop, Mac mini, or devbox. Post

Repeated From Recent Briefings

tinyhumansai/openhuman — Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
rohitg00/agentmemory — #1 Persistent memory for AI coding agents based on real-world benchmarks - first seen 2026-05-09
NousResearch/hermes-agent — The agent that grows with you - first seen 2026-05-11
yikart/AiToEarn — Let's use AI to Earn! - first seen 2026-05-11
garrytan/gstack — Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA - first seen 2026-05-12
K-Dense-AI/scientific-agent-skills — A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing. - first seen 2026-05-14
Human-level performance via ML was not proven impossible with complexity theory [D] - first seen 2026-05-14
rtk-ai/rtk — CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies - first seen 2026-05-05
danielmiessler/Personal_AI_Infrastructure — Agentic AI Infrastructure for magnifying HUMAN capabilities. - first seen 2026-05-02
millionco/react-doctor — Your agent writes bad React. This catches it - first seen 2026-05-10
... plus 512 more repeated items in processed data

AI Watchtower Briefing — 2026-05-15
