AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Qwen will release another 27B with high probability — score 96 Sources: reddit/r/LocalLLaMA

They are waiting for the exact roadmap

Developer Tools

🔴 rohitg00/ai-engineering-from-scratch — Learn it. Build it. Ship it for others. — score 83 Sources: github_trending

🔴 How competitive are PhD admissions currently [D] — score 81 Sources: reddit/r/MachineLearning

Hi, how hard is it currently to get a PhD position in machine learning? What are the requirements to get into a decent mid-tier program (i.e., one that publishes regularly in respected journals and whose work gets read by some people)? How is it in different regions, e.g. the US, Europe, etc.? I am about to fini

🔴 hugohe3/ppt-master — AI generates natively editable PPTX from any document — real PowerPoint shapes with native animations, not images · by Hugo He — score 75 Sources: github_trending

🔴 Got my boring admin work semi-automated and it actually kinda works — score 72 Sources: reddit/r/AIAgents

I run a small business and there's a lot of dumb repetitive stuff I do every day. Sorting emails, writing client reports, moving support tickets to GitHub so my dev stops copy-pasting from Slack. I can't code, so the normal OpenClaw setup with terminal commands and API configs was a dead end for me.

Infrastructure & Compute

🔴 AMD Ryzen AI Halo PC will cost $3,999 with 128 GB of memory on board — score 82 Sources: reddit/r/LocalLLaMA

🔴 karpathy/autoresearch — AI agents running research on single-GPU nanochat training automatically — score 71 Sources: github_trending

Research Papers

🔴 You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories — score 70 Sources: huggingface · arxiv/cs.CL

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extr

🔴 IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools — score 70 Sources: huggingface

Multimodal large language models (MLLMs) have shown remarkable capability in bridging visual perception and textual reasoning, enabling zero-shot understanding across diverse industrial scenarios. However, their performance in open-vocabulary industrial anomaly detection (IAD) is often limited by do

Other Signals

🔴 An OpenAI model has disproved a central conjecture in discrete geometry — score 90 Sources: hackernews

🔴 HuggingFace benchmark datasets now let you filter by model size — score 89 Sources: reddit/r/LocalLLaMA

Quite useful for seeing which model under 32B performs best on SWE-bench Verified, for example. https://huggingface.co/datasets?benchmark=benchmark:official&sort=trending

🔴 Google’s AI is being manipulated. The search giant is quietly fighting back — score 70 Sources: hackernews

🟡 Notable

Model Releases

🟡 Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs — score 68 Sources: reddit/r/LocalLLaMA

Hey r/LocalLLaMA, We’ve released our ByteShape Qwen 3.6 35B GGUF quantizations in two families: standard NTP (Next Token Prediction or non-MTP) and MTP. Blog / Download NTP Models / [Download MTP

🟡 @GoogleDeepMind: How can you accelerate your day to day research workflow? By giving AI the right scientific toolkit. We launched Science Skills for Google @Antigravity, integrating insights from over 30 major life — score 50 Sources: twitter_rss

How can you accelerate your day to day research workflow? By giving AI the right scientific toolkit. We launched Science Skills for Google @Antigravity, integrating insights from over 30 major life science sources, including UniProt and the AlphaFold Database.

🟡 @GoogleDeepMind: Gemini 3.5 Flash has landed. — score 50 Sources: twitter_rss

🟡 Same task in github-copilot, pi, claude-code, and opencode with Qwen3.6 27B — score 46 Sources: reddit/r/LocalLLaMA

I wanted to know how much of a coding agent's performance came from the model and how much came from the harness, so I vibed a setup that lets me test multiple agentic harness/model combinations on the same task. All the images above come from the same model, but with a different harness. St

Developer Tools

🟡 I built a pip-installable Python coding agent from first principles — hummcode — score 63 Sources: reddit/r/AIAgents

I wanted to understand how coding agents actually work at the lowest level, so I studied a few open-source implementations and built one in Python. It's called hummcode (the Hummingbird Coding Agent). pip install hummcode GitHub: https://github.com/0xchamin/hummcode

🟡 Best way to build a visual AI storyboard workflow (n8n|zapier? Agent? Custom webapp? Already available solution?) — score 61 Sources: reddit/r/AIAgents

I need to build an AI-powered storyboard workflow, app, or other system which MY BOSS WILL USE, and I’d like advice on the best tools. I have not worked with automation tools before, nor with agents or Python. What I need to accomplish (an automated visual system for boss): My non-technical

🟡 can1357/oh-my-pi — ⌥ AI Coding agent for the terminal — hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more — score 61 Sources: github_trending

🟡 Lum1104/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. — score 55 Sources: github_trending

🟡 Back again, many changes have taken place. — score 54 Sources: reddit/r/LocalLLaMA

After fixing more than 90 bugs, I can now safely claim that my project, whether downloaded from npm or built from source, is stable. As a newer dev, there were a LOT of issues I had to work through: hours of troubleshooting and TUI/command-line conflicts. It was a nightmare but it's finally over. I would re

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Anthropic is expanding to Colossus2. Will use GB200 — score 50 Sources: hackernews

🟡 vllm-project/vllm — A high-throughput and memory-efficient inference and serving engine for LLMs — score 43 Sources: github_trending

Research Papers

🟡 Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs — score 62 Sources: huggingface · arxiv/cs.CL

LLM agents have recently emerged as a powerful paradigm for solving complex tasks through planning, tool use, memory retrieval, and multi-step interaction. However, these agentic workflows often introduce substantial input-side overhead, making the compute-intensive prefilling stage a key bottleneck

🟡 OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization — score 42 Sources: huggingface · arxiv/cs.LG

The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable margina

Other Signals

🟡 OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D] — score 69 Sources: reddit/r/MachineLearning

OpenAI posted a math result today claiming that one of its general-purpose reasoning models found a construction disproving the conjectured n^{1+O(1/log log n)} upper bound in Erdős’s planar unit-distance problem. Announcement: https://openai.com/index/model-disproves-discrete-geometry-conjecture/

🟡 CohereLabs/command-a-plus-05-2026-bf16 · Hugging Face — score 61 Sources: reddit/r/LocalLLaMA

🟡 Any tool to get accepted conference papers sorted by citation count? [D] — score 56 Sources: reddit/r/MachineLearning

I.e., given a conference (say, one with OpenReview data) such as “NeurIPS 2025”, return the accepted papers sorted by number of citations according to a standard paper search engine (e.g. Google Scholar). This seems to be a surprisingly difficult thing to find online.
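
Once citation counts have been collected from whichever source is available, the ranking step the post asks for is simple. Below is a minimal Python sketch, assuming the data is already in memory. The `Paper` record, the `rank_by_citations` helper, and the sample entries are all hypothetical placeholders; fetching real counts is the hard part and is not addressed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record shape; real sources will expose different fields.
    title: str
    venue: str
    year: int
    citations: int


def rank_by_citations(papers, venue, year):
    """Return the papers accepted at (venue, year), most cited first."""
    accepted = [p for p in papers if p.venue == venue and p.year == year]
    # Sort by citation count descending; break ties alphabetically by title.
    return sorted(accepted, key=lambda p: (-p.citations, p.title))


# Placeholder entries only, not real papers or real citation counts.
papers = [
    Paper("Paper A", "NeurIPS", 2025, 12),
    Paper("Paper B", "NeurIPS", 2025, 40),
    Paper("Paper C", "ICML", 2025, 99),
    Paper("Paper D", "NeurIPS", 2025, 12),
]

ranked = rank_by_citations(papers, "NeurIPS", 2025)
for p in ranked:
    print(f"{p.citations:>4}  {p.title}")
```

The tie-break on title only keeps the ordering deterministic when two papers have the same count.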

🟡 @OpenAI: Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions — score 50 Sources: twitter_rss

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entir

🟢 Incremental

Model Releases

🟢 How can you stop your model from looping — score 11 Sources: reddit/r/LocalLLaMA

So I thought this was a small-model issue, but after I added a new GPU and could run low-to-mid models like Qwen 3.6 35B at Q4 or Q5, the issue still exists. It happens less often than with small models, but it does break when linking the model to Copilot Chat or Hermes: mid-task, the model will start loop thinking

Developer Tools

🟢 DayuanJiang/next-ai-draw-io — A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visualization. — score 37 Sources: github_trending

🟢 VAPI Experts: How are you handling real client phone numbers and no-answer routing? — score 28 Sources: reddit/r/AIAgents

Hey guys, I’m building an AI receptionist with Vapi + Make for service businesses (HVAC, plumbing, etc.) and I’m trying to understand how production setups actually work. I have a few questions: 1. How do you connect a client’s existing business phone number to a Vapi agent? 2. If the client already

🟢 Are AI agents creating a new workflow management problem with managed OpenClaw? — score 28 Sources: reddit/r/AIAgents

The more I work with AI agents, the more I notice that the models are getting smarter faster than the tools we have to manage them. Getting agents running is becoming easier. Managing them long term feels much messier, even when using a managed OpenClaw setup. Once multiple workflows stay active acr

🟢 Helix-agi project — score 28 Sources: reddit/r/AIAgents

I've been working on an agentic wrapper system, kind of like OpenClaw or Hermes, but with an 8D spatial-mapping system for memory retrieval instead of a conventional RAG-based system. I'm just looking to get some more people involved in testing and giving feedback for additional troubleshooting. [https:

🟢 ZeroID Agent Identity now has CIBA — score 28 Sources: reddit/r/AIAgents

ZeroID recently added support for client-initiated backchannel authentication (CIBA). ZeroID allows you to create agent identities with scoped permissions and delegated access. The only problem was that you needed to predefine all permissions up front and they were stat

Omitted 5 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 Looking for real-world comparisons between WALL OSS, pi0.6, and OpenVLA [D] — score 38 Sources: reddit/r/MachineLearning

I am choosing a baseline for a real manipulation stack and trying not to lose a month on setup that someone here has already done. Shortlist is OpenVLA, pi0.6, and WALL OSS from X Square Robot. OpenVLA is still the easiest reference point with lots of reproductions. pi0.6 looks strong from recent pu

🟢 Training a vision model from scratch on iPod touch 4 images — score 32 Sources: reddit/r/LocalLLaMA

I trained a DCGAN model from scratch on iPod touch 4 pics. I understand the scale needed to train a vision model from scratch so I’m starting with just 1 case/object to take pics of. I took around 350 pics of a red solo cup in different backgrounds, lighting conditions, etc. The pictures that the mo

🟢 AMD BC-250 and the search for Cheap Compute — score 18 Sources: reddit/r/LocalLLaMA

I've been searching for disused/underappreciated compute vectors for a few months since the MI50 shot up in price. In comes the salvaged PS5 APU on a standalone board: Zen 2, 16 GB unified GDDR6, RDNA 2 (gfx1013). They're $50-150 on eBay and ship with 24 of 40 CUs enabled. Got curious and started r

🟢 ai-dynamo/dynamo — A Datacenter Scale Distributed Inference Serving Framework — score 7 Sources: github_trending

🟢 High E2E latency on fine-tuned Gemma 4 26B despite low TTFT [R] — score 6 Sources: reddit/r/MachineLearning

Recently fine-tuned a Gemma 4 26B model, and I’m seeing surprisingly high end-to-end latency despite the effective inference footprint being much smaller (~4B-ish behavior during serving). Current setup: * Model: Gemma 4 26B (fine-tuned) * Engine: vLLM * Quantization: FP8 * Hardware: H100 Observed
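
One common way to reason about a gap like this is to split end-to-end latency into time to first token (TTFT) plus per-token decode time for the remaining output. Below is a minimal Python sketch of that arithmetic for a single request. The `e2e_latency_s` helper and every number in it are hypothetical illustrations, not measurements from the post and not a diagnosis of its setup.

```python
def e2e_latency_s(ttft_s, output_tokens, tpot_s):
    """End-to-end latency = TTFT + decode time for the remaining tokens.

    ttft_s: time to first token, in seconds
    output_tokens: total number of generated tokens (at least 1)
    tpot_s: average time per output token after the first, in seconds
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return ttft_s + (output_tokens - 1) * tpot_s


# Hypothetical numbers chosen only to illustrate the arithmetic:
# a fast first token can still be followed by a long decode phase.
total = e2e_latency_s(ttft_s=0.1, output_tokens=501, tpot_s=0.04)
decode_share = (total - 0.1) / total
print(f"total: {total:.1f} s, decode share: {decode_share:.1%}")
```

Under these assumed values almost all of the latency comes from decoding, which is why a low TTFT alone says little about end-to-end time for long outputs.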

Business & Funding

🟢 OpenAI Is Preparing to File for an IPO Soon — score 30 Sources: hackernews

Other Signals

🟢 Qwen3.6 27B and llama.cpp appreciation post — score 39 Sources: reddit/r/LocalLLaMA

To preface, here's my config:

    llama-server \
      --host 0.0.0.0 \
      --port 1235 \
      --models-preset %h/Software/models.ini \
      --models-max 1 \
      --sleep-idle-seconds 3600 \
      --timeout 3600 \
      --parallel 1 \
      --device ROCm0,ROCm1

    [*]
    flash-attn = on
    jinja = true
    fit = true
    ctxcp = 5
    offline = true
    mmproj-offload =

🟢 Columbia Machine Learning Summer School (MLSS) 2026 [D] — score 38 Sources: reddit/r/MachineLearning

I got into this CFE MLSS 2026 and would like to connect with people who also got into it or have been in previous cohorts! I am organizing a group chat for people who got into the program :DD https://cfe.columbia.edu/content/mlss

🟢 Typewise (YC S22) Is Hiring an AI Growth Engineer (Zurich or Remote) — score 10 Sources: hackernews

🟢 I built AgentLighthouse, a local “Lighthouse for AI agents” that scans repos/docs/APIs for agent readiness — score 8 Sources: reddit/r/AIAgents

hello The basic idea comes from the fact that more people (including me) use Codex, Claude Code, Cursor, Copilot, MCP tools, etc., but they are still written only for humans. Agents might fail and struggle to use what you build because setup commands are unclear, docs are stale, OpenAPI operations a

🟢 HalBench: I built a custom sycophancy and hallucination benchmark and tested 4 frontier models (Sonnet 4.6, Grok 4.3, GPT 5.4 and Gemini 3.1 Pro), looking for input on what OSS models to run next! — score 5 Sources: reddit/r/LocalLLaMA

HalBench Results: TL;DR: I built HalBench, an open benchmark for LLM sycophancy and hallucination. 3,200 false-premise prompts × 4 models = 12,800 graded responses. Validated against a human reader on 100 random items. Sonnet 4.6 > Grok 4.3 > GPT-5.4 > Gemini 3.1 Pro,

📈 Trending Repos

Repo	Description	Stars Today	Language
rohitg00/ai-engineering-from-scratch	Learn it. Build it. Ship it for others.	765	python
hugohe3/ppt-master	AI generates natively editable PPTX from any document — real PowerPoint shapes with native animations, not images · by Hugo He	421	python
karpathy/autoresearch	AI agents running research on single-GPU nanochat training automatically	367	python
can1357/oh-my-pi	⌥ AI Coding agent for the terminal — hash-anchored edits, optimized tool harness, LSP, Python, browser, subagents, and more	270	typescript
Lum1104/Understand-Anything	Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more.	188	typescript
volcengine/OpenViking	OpenViking is an open-source context database designed specifically for AI Agents(such as openclaw). OpenViking unifies the management of context (memory, resources, and skills) that Agents need through a file system paradigm, enabling hierarchical context delivery and self-evolving.	111	python
vllm-project/vllm	A high-throughput and memory-efficient inference and serving engine for LLMs	99	python
DayuanJiang/next-ai-draw-io	A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visualization.	68	typescript
e2b-dev/E2B	Open-source, secure environment with real-world tools for enterprise-grade agents.	34	python
agentgateway/agentgateway	Next Generation Agentic Proxy for AI Agents and MCP servers	20	rust

📄 New Papers

Title	Category	Hotness	Link
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories	research_paper	26	Open
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools	research_paper	26	Open
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs	research_paper	17	Open
Shiny Stories, Hidden Struggles: Investigating the Representation of Disability Through the Lens of LLMs	cs.CL	0	Open
Leveraging Large Language Models for Sentiment Analysis: Multi-Modal Analysis of Decentraland's MANA Token	cs.CL	0	Open
Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification	cs.CL	0	Open
Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction	cs.CL	0	Open
Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues	cs.CL	0	Open
Data Scaling as Progressive Coverage of a Predictive Contribution Spectrum	cs.CL	0	Open
MedicalBench: Evaluating Large Language Models Toward Improved Medical Concept Extraction	cs.CL	0	Open
FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation	cs.CL	0	Open
Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning	cs.CL	0	Open
Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models	cs.CL	0	Open
Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models	cs.CL	0	Open
When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation	cs.CL	0	Open

🐦 Twitter/X Highlights

Account	Tweet Summary
OpenAI	Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entir
GoogleDeepMind	How can you accelerate your day to day research workflow? By giving AI the right scientific toolkit. We launched Science Skills for Google @Antigravity, integrating insights from over 30 major life science sources, including UniProt and the AlphaFold Database.
GoogleDeepMind	Gemini 3.5 Flash has landed.

Repeated From Recent Briefings

tinyhumansai/openhuman — Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and OpenCode — fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining - first seen 2026-05-15
Imbad0202/academic-research-skills — Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
Machine Learning on Spherical Manifold [R] - first seen 2026-05-20
The harmless prompt injection that leaked our system architecture - first seen 2026-05-20
rtk-ai/rtk — CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies - first seen 2026-05-05
rohitg00/agentmemory — #1 Persistent memory for AI coding agents based on real-world benchmarks - first seen 2026-05-09
HKUDS/CLI-Anything — "CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub:https://clianything.cc/ - first seen 2026-05-17
Alishahryar1/free-claude-code — Use claude-code for free in the terminal, VSCode extension or discord like OpenClaw (voice supported) - first seen 2026-05-20
... plus 545 more repeated items in processed data

AI Watchtower Briefing — 2026-05-21
