AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 White House Considers Vetting A.I. Models Before They Are Released — score 81 Sources: reddit/r/LocalLLaMA

🔴 Peanut - Text to Image Model (Open Weights coming soon) — score 73 Sources: reddit/r/LocalLLaMA

A new anonymous model debuts at #8 in the Artificial Analysis Text to Image Arena! Peanut’s weights are expected to be released soon, which would make it the leading Text to Image Open Weights Model.

Peanut is positioned to be the new leading open weights Text to Image model, surpassing Z-I

Developer Tools

🔴 ruvnet/ruflo — 🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integration, and native Claude Code / Codex Integration — score 99 Sources: github_trending

🔴 TauricResearch/TradingAgents — TradingAgents: Multi-Agents LLM Financial Trading Framework — score 97 Sources: github_trending

🔴 Hmbown/DeepSeek-TUI — Coding agent for DeepSeek models that runs in your terminal — score 95 Sources: github_trending

🔴 Are modern ML PhDs becoming too incremental, or is this just what research looks like now? [D] — score 94 Sources: reddit/r/MachineLearning

I’ve been thinking about the current state of machine learning PhDs, including my own work, and I’d like to hear how others see it.
My impression is that a large fraction of modern ML PhD work follows a fairly predictable pattern: take an existing idea, connect it to another existing idea, apply i

🔴 My nerd lil brother surpassed me in his monthly income last month! — score 94 Sources: reddit/r/AIAgents

He's kinda a nerd yeah. I was teaching him some basics in AI since he turned 16 but fast forward he found out he can make money off this. I replied "I'm sure you do", jokingly because I didn't think he actually is able to find some clients. He's now 17 and thanks to an automation he built on n8n, he g

Omitted 12 additional developer tools items from the main section; see raw data and source-specific sections below.

Research Papers

🔴 MolmoAct2: Action Reasoning Models for Real-world Deployment — score 95 Sources: huggingface

Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay pr

🔴 Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs — score 78 Sources: huggingface · arxiv/cs.AI

While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with gene

🔴 Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling — score 72 Sources: huggingface · arxiv/cs.AI

Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for high-resource non-English languages like German, French, or Japanese, aggressive filtering creates a strategic dilemma: should practitioners priorit

Other Signals

🔴 it's time to update your Gemma 4 GGUFs — score 88 Sources: reddit/r/LocalLLaMA

Chat Template was fixed a few days ago

choose your fav dealer:

https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF

https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF

🔴 How OpenAI delivers low-latency voice AI at scale — score 88 Sources: hackernews

🟡 Notable

Model Releases

🟡 FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8 — score 65 Sources: reddit/r/LocalLLaMA

Last year researchers affiliated with NVIDIA, University of Warsaw, and University of Edinburgh published Dynamic Memory Sparsification (DMS), a KV-cache sparsification technique using learned per-head token eviction, reporting up to 8x KV-cache compression.

I fo
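
To put the headline ratio in memory terms, here is a rough back-of-the-envelope sketch. It assumes a plain dense KV cache whose size scales linearly with retained tokens; the model shape (32 layers, 8 KV heads, head dimension 128, 32,768-token context) is hypothetical and not taken from the post, and `kv_cache_bytes` is an illustrative helper, not part of FastDMS or vLLM.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Bytes needed to hold keys and values for every layer of a dense KV cache."""
    # Factor of 2: one tensor for keys, one for values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


GIB = 2**30

# Hypothetical model shape, chosen only for illustration (not from the post).
dense = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768)

# 6.4 is the compression ratio quoted in the headline; treating it as a flat
# divisor on cache size is a simplifying assumption.
compressed = dense / 6.4

print(f"dense BF16 cache:    {dense / GIB:.3f} GiB")
print(f"at 6.4x compression: {compressed / GIB:.3f} GiB")
```

With these assumed dimensions the dense BF16 cache comes to 4 GiB per sequence and the compressed one to 0.625 GiB, which is where the extra batch or context headroom would come from.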

🟡 Claude Code structure that didn’t break after 2–3 real projects — score 61 Sources: reddit/r/AIAgents

Been iterating on my Claude Code setup for a while. Most examples online worked… until things got slightly complex. This is the first structure that held up once I added multiple skills, MCP servers, and agents.

What actually made a difference:

If you’re skipping CLAUDE.md, that’s probably the

🟡 [@xai](https://x.com/xai/status/2051438210065322244) — score 60 Sources: twitter_rss

Two voices. One human. One AI. Can you guess the AI clone? 👇

Voice cloning, rich with natural emotion, is now live on the Grok Voice API.

http://x.ai/news/grok-custom-voices

🟡 [@xai](https://x.com/xai/status/2050355373052223585) — score 60 Sources: twitter_rss

Voice Cloning is now live via the xAI API!

Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more.

http://x.ai/news/grok-custom-voices

🟡 [@MistralAI](https://x.com/MistralAI/status/2049128071874179091) — score 60 Sources: twitter_rss

🆕 Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI.

🌎 Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes from prototype to pro

Omitted 2 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 LearningCircuit/local-deep-research — ~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted. — score 68 Sources: github_trending

🟡 cocoindex-io/cocoindex — Incremental engine for long horizon agents 🌟 Star if you like it! — score 66 Sources: github_trending

🟡 Agent Skills — score 62 Sources: hackernews

🟡 mnfst/manifest — Smart Model Routing for Agents. Cut Costs up to 70% 🦚 — score 60 Sources: github_trending

🟡 How do you experiment with a (very) large model architecture? [D] — score 56 Sources: reddit/r/MachineLearning

I'm trying to reproduce a paper (a very particular kind of diffusion model), and their training regime is incredibly compute heavy.

In general, how are quick experiments performed to validate hypotheses when the models are large and compute is expensive?

Some cursory browsing yields the following:

Omitted 6 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Lightricks/LTX-2 — Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. — score 53 Sources: github_trending

Research Papers

🟡 OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models — score 68 Sources: huggingface · arxiv/cs.CL

The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has so far delivered limited impact in this domain due to a fundamental data bottleneck. Specifically, ocean data are highly fragmented across disparate

🟡 Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey — score 60 Sources: arxiv/cs.AI · arxiv/cs.LG

arXiv:2411.17429v2. Graph Neural Networks are powerful models for learning from graph-structured data, yet their effectiveness is often limited by two critical challenges: over-squashing, where information from distant nodes is excessively compressed, and over-

🟡 Representation in large language models — score 60 Sources: arxiv/cs.AI · arxiv/cs.LG

arXiv:2501.00885v2. The extraordinary success of recent Large Language Models (LLMs) on a diverse array of tasks has led to an explosion of scientific and philosophical theorizing aimed at explaining how they do what they do. Unfortunately, disagreement over fu

🟡 AcademiClaw: When Students Set Challenges for AI Agents — score 55 Sources: huggingface

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' rea

🟡 Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation — score 48 Sources: huggingface · arxiv/cs.AI

Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challeng

Other Signals

🟡 [@sama](https://x.com/sama/status/2051464865634742334) — score 50 Sources: twitter_rss

pretty excited for voice models to get great

its interesting to watch how people are already starting to change the way they interface with AI

🟡 DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper — score 42 Sources: reddit/r/LocalLLaMA

Tested DeepSeek V4 Pro on FoodTruck Bench — our 30-day agentic benchmark where models run a food truck via 34 tools (locations, pricing, inventory, staff, weather, events) with persistent memory and daily reflection.

First Chinese model to land in the frontier tier on our benchmark. Tied with Grok

🟢 Incremental

Model Releases

🟢 As MTP prepares to land in llama.cpp, Models that support MTP — score 19 Sources: reddit/r/LocalLLaMA

DeepSeekv3 OG

DeepSeekv3.2/4

Qwen3.5

GLM4.5+

MiniMax2.5+

Step3.5Flash

Mimo v2+

Until we get mtp weights, you need to download HF weights and convert to gguf. I think I'm going to try either qwen3.5-122b or glm4.5-air first.

Developer Tools

🟢 openclaw/acpx — Headless CLI client for stateful Agent Client Protocol (ACP) sessions — score 36 Sources: github_trending

🟢 danielmiessler/Personal_AI_Infrastructure — Agentic AI Infrastructure for magnifying HUMAN capabilities. — score 29 Sources: github_trending

🟢 vibevoice.cpp: Microsoft VibeVoice (TTS + long-form ASR with diarization) ported to ggml/C++, runs on CPU/CUDA/Metal/Vulkan, no Python at inference — score 27 Sources: reddit/r/LocalLLaMA

A few weeks ago I shipped vibevoice.cpp, a pure-C++ ggml port of Microsoft VibeVoice (the speech-to-speech model with voice cloning, https://github.com/microsoft/VibeVoice). Wanted to post a follow-up here because we're at a point where the engine has gro

🟢 xingkongliang/skills-manager — A lightweight desktop app to manage, sync, and organize AI agent skills across 15+ coding tools — Cursor, Claude Code, Codex, Copilot, and more. — score 27 Sources: github_trending

🟢 My electric bill doubled running local models — score 17 Sources: reddit/r/AIAgents

Been running my side project's social presence using Mangos.ai - it's a no code agent builder for product distribution.

I run these agents 24/7 and they do a phenomenal job. They have access to my browser with all my socials logged in. Every few hours they go into X, Threads,

Omitted 6 additional developer tools items from the main section; see raw data and source-specific sections below.

Research Papers

🟢 Code World Model Preparedness Report — score 25 Sources: huggingface

This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the mo

🟢 Motion-Aware Caching for Efficient Autoregressive Video Generation — score 25 Sources: huggingface

Autoregressive video generation paradigms offer theoretical promise for long video synthesis, yet their practical deployment is hindered by the computational burden of sequential iterative denoising. While cache reuse strategies can accelerate generation by skipping redundant denoising steps, existi

🟢 Generative Modeling with Orbit-Space Particle Flow Matching — score 25 Sources: huggingface

We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP is motivated by two insights: (i) particles are defined up to permutation symmetries, so anonymous indexing inflates per-index target variance and yi

🟢 Perceptual Flow Network for Visually Grounded Reasoning — score 25 Sources: huggingface

Despite the success of Large-Vision Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as additional supervis

Other Signals

🟢 Is there a notable increase in demand for privacy-preserving AI/ML with the advent of LLMs? [D] — score 39 Sources: reddit/r/MachineLearning

While browsing through this subreddit, I encountered this old discussion post about demand for AI with the rise of privacy regulation. It got me thinking that, 6 years on, the demand for AI

🟢 Train Your Own LLM from Scratch — score 38 Sources: hackernews

🟢 MTPLX | 2.24x faster TPS | The native MTP inference engine for Apple Silicon — score 35 Sources: reddit/r/LocalLLaMA

TLDR: 28 tok/s → 63 tok/s on Qwen3.6-27B on a MacBook Pro M5 Max. 2.24× faster at real temperature 0.6.

Works for coding, creative writing, and chat

https://i.redd.it/i9x794c0q7zg1.gif

Works on ANY MTP model: No external drafter. No extra memory usage. Uses the model's own built-in MTP he
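
As a quick sanity check on the TLDR figures, the sketch below recomputes the speedup from the two rounded throughput numbers and shows what it would mean for a fixed-length generation. The 1,000-token workload is an arbitrary illustration, and steady throughput over the whole generation is an assumption; neither comes from the post.

```python
def speedup(baseline_tps, accelerated_tps):
    """Ratio of accelerated to baseline decode throughput (tokens per second)."""
    return accelerated_tps / baseline_tps


def seconds_for(tokens, tps):
    """Wall-clock seconds to generate `tokens` at a steady `tps`."""
    return tokens / tps


ratio = speedup(28, 63)            # rounded tok/s figures from the TLDR
before = seconds_for(1_000, 28)    # 1,000 tokens is an arbitrary example size
after = seconds_for(1_000, 63)

print(f"speedup from rounded figures: {ratio:.2f}x")
print(f"1,000 tokens: {before:.1f} s -> {after:.1f} s")
```

The rounded figures give 63 / 28 = 2.25, marginally above the quoted 2.24×; the difference is most likely rounding in the reported tok/s values, though the post does not say so.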

🟢 [D] What Happened to Neurips Creative AI Track? [R] — score 31 Sources: reddit/r/MachineLearning

At Neurips 2025, the Creative AI Track was announced as part of the official proceedings:

https://neurips.cc/Conferences/2025/CallForCreativeAI

"Please note that this year the Creative AI track will be part of the NeurIPS conference

🟢 Building a 9-ball AI player: Candidate generation for direct cut shots [P] — score 24 Sources: reddit/r/MachineLearning

I'm building a 9-ball-player to help with pattern play. There are many ways to make the next ball, and sometimes in more than one obvious pocket. Which you should choose depends on the probability of making that shot AND ending up in a favorable spot for the next shot, that is also amenable to

Omitted 4 additional other signals items from the main section; see raw data and source-specific sections below.

📈 Trending Repos

Repo	Description	Stars Today	Language
ruvnet/ruflo	🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integration, and native Claude Code / Codex Integration	2598	typescript
TauricResearch/TradingAgents	TradingAgents: Multi-Agents LLM Financial Trading Framework	2182	python
Hmbown/DeepSeek-TUI	Coding agent for DeepSeek models that runs in your terminal	1274	rust
AIDC-AI/Pixelle-Video	🚀 AI Fully Automated Short Video Engine	1153	python
rtk-ai/rtk	CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies	732	rust
1jehuang/jcode	Coding Agent Harness	548	rust
czlonkowski/n8n-mcp	A MCP for Claude Desktop / Claude Code / Windsurf / Cursor to build n8n workflows for you	496	typescript
virattt/dexter	An autonomous agent for deep financial research	409	typescript
mksglu/context-mode	Context window optimization for AI coding agents. Sandboxes tool output, 98% reduction. 14 platforms	306	typescript
withastro/flue	The sandbox agent framework.	290	typescript

📄 New Papers

Title	Category	Hotness
MolmoAct2: Action Reasoning Models for Real-world Deployment	research_paper	65
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs	research_paper	12
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling	research_paper	11
OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models	research_paper	9
Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey	cs.AI	0
Representation in large language models	cs.AI	0
AcademiClaw: When Students Set Challenges for AI Agents	research_paper	3
TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data	cs.AI	0
AgentReputation: A Decentralized Agentic AI Reputation Framework	cs.AI	0
Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models	cs.AI	0
Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents	cs.AI	0
TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization	cs.AI	0
ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts	cs.AI	0
Causal Foundations of Collective Agency	cs.AI	0
Agentic AI for Trip Planning Optimization Application	cs.AI	0

🏢 Lab Blog Posts

OpenAI: OpenAI and PwC collaborate to reimagine the office of the CFO

🐦 Twitter/X Highlights

Account	Tweet Summary
xai	Two voices. One human. One AI. Can you guess the AI clone? 👇 Voice cloning, rich with natural emotion, is now live on the Grok Voice API. http://x.ai/news/grok-custom-voices
xai	Voice Cloning is now live via the xAI API! Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more. http://x.ai/news/grok-custom-voices
MistralAI	🆕 Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI. 🌎 Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes from prototype to production, with the durability, observability, and fault tolerance that production actually requires. Leading organisations like ASML, ABANCA, CMA-CGM, France Travail, La Banque Postale, Moeve, and many others are already using Workflows to automate critical processes.
MistralAI	Mistral AI made the TIME100 Most Influential Companies list for 2026 — and the top 10 for AI. Why we're proud: customers run frontier models in production on their own terms, on their own infrastructure. Thank you to our customers for their trust and for joining us on the journey. Grateful to our incredible team members around the world and congrats to all the businesses recognized this year. Learn more at: https://time.com/collection/time100-most-influential-companies/2026/mistral/ #TIME100Companies #TIME100CompaniesIndustryLeader
karpathy	Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights: The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons: 1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image and an LLM can natively do the thing. 2. install .md skills instead of install .sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM". The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc. 3. LLM knowledge bases as an example of something that was impossible with classical code because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc. I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1,2), or was fundamentally not possible before (3). The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base and 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain, here I expand on this as having to also do with economics because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying or you're off-roading in the jungle with a machete, in relative terms. Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to... Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors.
sama	pretty excited for voice models to get great its interesting to watch how people are already starting to change the way they interface with AI

AI Watchtower Briefing — 2026-05-05
