AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Waiting for Qwen 3.7 open weight... The new King has arrived... — score 90 Sources: reddit/r/LocalLLaMA

The hype is real! https://qwen.ai/blog?id=qwen3.7

🔴 Qwen3.6 35Ba3 has changed my workflows and even how I use my computer — score 77 Sources: reddit/r/LocalLLaMA

My workflow has changed basically to ask Codex to do certain tasks and then document how to do them (including errors it found on its way) into a skill. I feed that skill to pi, and suddenly my qwen3.6 gets that hard stuff done: - devops on a VPS - using docling to create epubs from old PDFs - us

Developer Tools

🔴 Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D] — score 81 Sources: reddit/r/MachineLearning

The research community has provided (already for some time) seemingly more efficient and effective tokenizations for vision. Do we have any hint on whether non-fixed-patches tokenization is being applied on the big player models? I imagine not, and I'm trying to think why: - marginal gains? - pipe

🔴 A.I. Agents: They’re Fun They’re Useful But Don’t Give Them the Credit Card — score 81 Sources: reddit/r/AIAgents

Infrastructure & Compute

🔴 When your LLM treats data center GPUs like an optional DLC — score 70 Sources: reddit/r/LocalLLaMA

Research Papers

🔴 TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation — score 82 Sources: huggingface · arxiv/cs.CL

Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chines

🔴 LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning — score 72 Sources: huggingface · arxiv/cs.CL

Joint audio-visual reasoning is essential for omnimodal understanding, yet current multimodal large language models (MLLMs) still struggle when reasoning requires fine-grained evidence from both modalities. A central limitation is that explicit text-based chain-of-thought (CoT) compresses continuous

Other Signals

🔴 [AMA] Got laid off 3 weeks ago. Instead of updating my resume I went down a rabbit hole. Here's what I found — score 94 Sources: reddit/r/AIAgents

I've been a software engineer for a few years. Worked in bigtech, good salary, stable job, the whole thing. Then my position got cut and I had one of those forced moments of clarity that I think a lot of people in tech are having right now. I could update my LinkedIn, apply to 200 jobs, and land som

🔴 Multi-Stream LLMs: new paper on parallelizing/separating prompts, thinking, I/O — score 88 Sources: hackernews

🔴 110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp — score 83 Sources: reddit/r/LocalLLaMA

Had been getting great MTP performance with llama.cpp on my RTX 4070 Super 12GB, until they actually merged the MTP PR. Then, performance tanked and was bare

🟡 Notable

Model Releases

🟡 LatitudeGames/Equinox-31B · Hugging Face — score 63 Sources: reddit/r/LocalLLaMA

new model from LatitudeGames - Gemma 31B finetune https://huggingface.co/LatitudeGames/Equinox-31B-GGUF Equinox draws its name from the balance between extremes. Trained on a bal

🟡 Launch HN: Runtime (YC P26) – Sandboxed coding agents for everyone on a team — score 62 Sources: hackernews

🟡 For everyone that uses OpenCode / Pi - Heres your promptprocessing fix! — score 50 Sources: reddit/r/LocalLLaMA

This PR deserves much more attention, as it fixes the constant prompt reprocessing that happens when using llama.cpp with OpenCode or pi. https://github.com/ggml-org/llama.cpp/pull/22929

🟡 AdventHealth advances whole-person care with OpenAI — score 50 Sources: lab_blog/OpenAI

AdventHealth is using ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care.

🟡 We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks — score 50 Sources: lab_blog/DeepMind

Omitted 3 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 Novel Problems in VLA [R] — score 56 Sources: reddit/r/MachineLearning

I'm currently doing a research internship, and my supervisor is constantly pushing me to come up with a novel idea. I've read about 15-20 papers on VLA and I think most directions are saturated. I thought about an equivariant VLA based on equivariant CNN which was published in 2016 and successfull

🟡 teng-lin/notebooklm-py — Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features—including capabilities the web UI doesn't expose—via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw. — score 50 Sources: github_trending

🟡 In theory, if I have $20k-ish to spend on hardware what would actually get me closest to local coding agent that would allow me to go totally off the social grid? — score 43 Sources: reddit/r/LocalLLaMA

Let's say I'm in the market to buy a studio or RTX 6000's. At what point am I off the grid with a local coding agent? Probably a model question too.

Research Papers

🟡 Q-ARVD: Quantizing Autoregressive Video Diffusion Models — score 65 Sources: huggingface

Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical

🟡 GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation — score 55 Sources: huggingface

Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model's internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that

🟡 More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts — score 42 Sources: huggingface · arxiv/cs.CL

Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval

Other Signals

🟡 Heretic has been served a legal notice by Meta, Inc. — score 67 Sources: reddit/r/LocalLLaMA

To Whomsoever it May Concern, The individual behind the Heretic Free Software Project (henceforth called "Heretic", notwithstanding unrelated entities of the same name) has been served a notice by a legal services provider representing Meta Platforms, Inc. (henceforth called "Meta"), via the digital

🟡 We're Thursday and no one claimed AGI yet this week! — score 57 Sources: reddit/r/LocalLLaMA

U guys okay?

🟢 Incremental

Model Releases

🟢 Which agent should I use for coding and notetaking? — score 38 Sources: reddit/r/AIAgents

Hey everyone, I'm a software developer with a strong interest in psychology and photography. I’m currently subscribed to ChatGPT mainly because I use it for coding with agents, but I keep running into the limits of the $20 plan. Because of that, I’m considering other options, including GitHub Copilo

🟢 New Release of ROCm based MLX LLM Engine - lemon-mlx-engine — score 23 Sources: reddit/r/LocalLLaMA

Hey everyone, lemon-mlx-engine just finished integrating TheRock / ROCm 7.13, which means you get to try the latest ROCm on your local hardware with the MLX engine! This also includes various bug fixes and kernel fixes for issues we have been seeing in Qwen3, 3.5 and 3.6 MoE and dense.

🟢 Low-level coding dataset — score 10 Sources: reddit/r/LocalLLaMA

Hi all, I've recently been thinking about putting together a community sourced coding dataset for finetuning models, with a heavy focus on cpp and systems programming. My goal is to eventually have a model (say a finetune of Qwen3.6-27b) that is good at stuff like memory ownership, thread safety, op

🟢 ztok — a fast multithreaded tokenizer in Zig that loads tiktoken / HF / SentencePiece and is 2–5× faster — score 0 Sources: reddit/r/LocalLLaMA

I built ztok, a tokenizer library focused on being fast and format-agnostic for local pipelines. - Loads what you already have — .tiktoken, HF tokenizer.json, SentencePiece .model, TokenMonster, Mistral Tekken. Auto-detected. - Bit-identical to tiktoken / HF / SentencePiece on the equivalence gate

Developer Tools

🟢 If you've built an AI agent or chatbot - how do you know what users actually want from it? — score 38 Sources: reddit/r/AIAgents

Real question for anyone running an agent or chat product. When users just talk to your agent in natural language, you lose visibility into what they actually asked for, whether they got it, and what they kept wanting that your agent couldn't do. And when it quietly fails someone, there's no error

🟢 A customer asked why our agent believed something. We had no answer. — score 38 Sources: reddit/r/AIAgents

Prompt was clean. Model was fine. We went digging. No timestamp on the write. No source attached. No way to tell if it came from a user input, a tool call, a summarization pass, or something that got corrupted three updates ago. The memory layer had absorbed it at some point and that was the end of
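The gap described above (no timestamp on the write, no source attached) can be illustrated with a minimal sketch of a memory store that records provenance on every write. This is an assumption-laden illustration, not the poster's system: `MemoryRecord`, `MemoryStore`, `Source`, and `source_ref` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    """Hypothetical origins for a memory write, taken from the post's list."""
    USER_INPUT = "user_input"
    TOOL_CALL = "tool_call"
    SUMMARIZATION = "summarization"


@dataclass(frozen=True)
class MemoryRecord:
    content: str
    source: Source
    source_ref: str  # hypothetical pointer, e.g. a message or tool-call id
    written_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class MemoryStore:
    """Append-only store: every write keeps its timestamp and origin."""

    def __init__(self) -> None:
        self._records: dict[str, list[MemoryRecord]] = {}

    def write(self, key: str, content: str, source: Source,
              source_ref: str) -> MemoryRecord:
        record = MemoryRecord(content, source, source_ref)
        # Append instead of overwriting, so earlier beliefs stay auditable.
        self._records.setdefault(key, []).append(record)
        return record

    def explain(self, key: str) -> list[MemoryRecord]:
        """Return the full write history behind a remembered key."""
        return list(self._records.get(key, []))


store = MemoryStore()
store.write("customer_plan", "Pro", Source.USER_INPUT, "msg-1")
store.write("customer_plan", "Enterprise", Source.SUMMARIZATION, "summary-7")
history = store.explain("customer_plan")
```

With a history like this, the question "why did the agent believe X" can at least be narrowed to a specific write and its origin, which is the visibility the post says was missing.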

🟢 Inline prompt-injection guards need to be fast enough for the agent hot path — score 38 Sources: reddit/r/AIAgents

I wrote up a benchmark note for Armorer Guard here: https://armorerlabs.com/blog/armorer-guard-inline-prompt-injection-defense The practical question I am trying to answer is not just "can we detect prompt injection?" It is: can a guard sit directly before an agent turns context into action without

🟢 Agent reliability is killing the one genius narrative — score 38 Sources: reddit/r/AIAgents

The line from Yao Shunyu's interview that stuck with me was not the spicy "AI does not need brains" bit. It was the claim that the individual hero era in AI is basically over. That sounds harsh, but it matches what the industry looks like now. Frontier model progress is less about one genius idea an

🟢 google-labs-code/stitch-skills — A library of Agent Skills designed to work with the Stitch MCP server. Each skill follows the Agent Skills open standard, for compatibility with coding agents such as Antigravity, Gemini CLI, Claude Code, Cursor. — score 31 Sources: github_trending

Omitted 5 additional developer tools items from the main section; see raw data and source-specific sections below.

Business & Funding

🟢 One thing that's been bothering me lately: benchmark performance often tells me almost nothing about whether a workflow will survive production usage.[D] — score 31 Sources: reddit/r/MachineLearning

I've seen systems score well internally and then immediately fail under: * ambiguous user intent * messy real-world context * contradictory instructions * long-running sessions Feels like evaluation still heavily rewards clean-task optimization instead of behavioral robustness. What are people using

Enterprise Adoption

🟢 Nobody talks about what AI memory looks like after six months in production. — score 38 Sources: reddit/r/AIAgents

Old preferences keep winning retrieval, sarcastic comments get stored as literal truth, and summaries outlive the facts that made them true. You're not running a memory system at that point, you're babysitting one. Your AI context should not be a black box. It should be configurable, correctable, an
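One way to picture the "old preferences keep winning retrieval" problem is to compare plain similarity ranking with a recency-weighted score. The sketch below is only an illustration under stated assumptions: the half-life decay, the 30-day default, and the names `decayed_score` and `rank` are hypothetical choices, not something the post describes.

```python
def decayed_score(similarity: float, age_days: float,
                  half_life_days: float = 30.0) -> float:
    """Halve a memory's retrieval score for every half-life of age."""
    return similarity * 0.5 ** (age_days / half_life_days)


def rank(memories: list[dict]) -> list[dict]:
    """Sort memories by decayed score, highest first."""
    return sorted(
        memories,
        key=lambda m: decayed_score(m["similarity"], m["age_days"]),
        reverse=True,
    )


memories = [
    # An old preference with a slightly better raw similarity match.
    {"text": "prefers email", "similarity": 0.9, "age_days": 180},
    # A recent correction with a slightly weaker raw match.
    {"text": "prefers chat", "similarity": 0.7, "age_days": 0},
]
ranked = rank(memories)
```

Ranking on raw similarity alone would return the six-month-old preference first; with the decay applied, the recent correction ranks higher. Decay is a blunt instrument and would not address the other failure modes listed, such as sarcasm stored as literal truth.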

Research Papers

🟢 OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding — score 15 Sources: huggingface

Omni-proactive streaming video understanding, i.e., autonomously deciding when to speak and what to say from continuous audio-visual streams, is an emerging capability of omni-modal large language models. Existing benchmarks fall short in three key aspects: they rely primarily on visual signals, ado

Other Signals

🟢 Can liveness detection models generalise to synthetic media generation techniques they were never trained on? [D] — score 39 Sources: reddit/r/MachineLearning

Most liveness detection systems in production today were built around a threat model where the attacker is submitting a static image or a basic replay video. The generation quality of current synthetic media is categorically different from what those training datasets captured. The question I keep c

🟢 Latest b9274 Addresses MTP VRAM leak — score 37 Sources: reddit/r/LocalLLaMA

B9274: I have been having an issue with MTP models unloading after a couple of minutes of use, and I can't figure out why. Anyway, I don't think this is relevant to that, but I did observe the VRAM creep, so hopefully this helps. > server : free d

🟢 Anyone evaluated the difference between Qwen Code for the local qwen models vs another harness? CC, OC, LC, Aider etc.. — score 17 Sources: reddit/r/LocalLLaMA

For me, opencode is doing fantastically, but I was wondering whether Qwen Code would be more native and have better functionality, since I don't know which agentic harness they used to get their benchmark results.

🟢 Lisbon Machine Learning School (LxMLS 2026) [D] — score 6 Sources: reddit/r/MachineLearning

Hi, did anyone apply to it or attend it previously? How was the experience? I got accepted but without a scholarship. Is it worth attending self-sponsored?

📈 Trending Repos

Repo	Description	Stars Today	Language
teng-lin/notebooklm-py	Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features—including capabilities the web UI doesn't expose—via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw.	186	python
google-labs-code/stitch-skills	A library of Agent Skills designed to work with the Stitch MCP server. Each skill follows the Agent Skills open standard, for compatibility with coding agents such as Antigravity, Gemini CLI, Claude Code, Cursor.	69	typescript
software-mansion/argent	An agentic toolkit to control, debug, and profile iOS and Android apps. Made by Software Mansion.	67	typescript
ryoppippi/ccusage	Analyze coding (agent) CLI token usage and costs from local data.	58	rust
google/adk-samples	A collection of sample agents built with Agent Development Kit (ADK)	33	python

📄 New Papers

Title	Category	Hotness	Link
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation	research_paper	106	Open
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning	research_paper	30	Open
Q-ARVD: Quantizing Autoregressive Video Diffusion Models	research_paper	14	Open
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation	research_paper	10	Open
Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration	cs.AI	0	Open
OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind	cs.AI	0	Open
AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows	cs.AI	0	Open
High Quality Embeddings for Horn Logic Reasoning	cs.AI	0	Open
Open-World Evaluations for Measuring Frontier AI Capabilities	cs.AI	0	Open
Personality Engineering with AI Agents: A New Methodology for Negotiation Research	cs.AI	0	Open
From Automated to Autonomous: Hierarchical Agent-native Network Architecture (HANA)	cs.AI	0	Open
COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space	cs.AI	0	Open
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines	cs.AI	0	Open
Declarative Data Services: Structured Agentic Discovery for Composing Data Systems	cs.AI	0	Open
VBFDD-Agent for Electric Vehicle Battery Fault Detection and Diagnosis: Descriptive Text Modeling of Battery Digital Signals	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: AdventHealth advances whole-person care with OpenAI
DeepMind: We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

🐦 Twitter/X Highlights

Account	Tweet Summary
OpenAI	Highlights from today’s Codex Thursday launches: 1️⃣ Codex can now securely use apps on your Mac from your phone, even when your Mac is locked and the screen is off. http://developers.openai.com/codex/app/computer-use#locked-use
xai	You can now use your @grok or X Premium subscription in @opencode. Use the model powering Grok Build for high speed and codebase intelligence. https://x.ai/news/grok-opencode
simonw	I released the first alpha of Datasette Agent - a conversational AI assistant for Datasette that can answer questions about data in SQLite databases, and can be extended with plugins to add extra tools and features. Here's a demo

Repeated From Recent Briefings

colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and OpenCode — fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
Imbad0202/academic-research-skills — Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D] - first seen 2026-05-21
NousResearch/hermes-agent — The agent that grows with you - first seen 2026-05-11
tinyhumansai/openhuman — Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
rohitg00/ai-engineering-from-scratch — Learn it. Build it. Ship it for others. - first seen 2026-05-21
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps - first seen 2026-05-19
anthropics/claude-plugins-official — Official, Anthropic-managed directory of high quality Claude Code Plugins. - first seen 2026-05-09
Lum1104/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
HKUDS/CLI-Anything — "CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub:https://clianything.cc/ - first seen 2026-05-17
... plus 580 more repeated items in processed data

AI Watchtower Briefing — 2026-05-22
