AW · AI Watchtower

🔴 High Significance

Developer Tools

🔴 I think a lot of people are accidentally building systems they can never debug — score 94 Sources: reddit/r/AIAgents

Something I’ve noticed after working on more complex agent workflows:

everything feels manageable at first

one agent
a couple tools
some logging
works fine

then slowly:

retries get added
memory gets added
more tools get connected
browser automation gets involved
agents start call

🔴 Vibe coding and agentic engineering are getting closer than I'd like — score 93 Sources: hackernews

🔴 I installed nano claw and removed it after an hour after he texted my friends without my permission — score 83 Sources: reddit/r/AIAgents

I tried one of these agent-style products from a large company. I won’t mention names. It’s not exactly a direct competitor of my company, but it operates in a similar space — and the experience was honestly shocking from a security standpoint.

I added it to my WhatsApp. The interaction already fel

🔴 anthropics/financial-services — score 82 Sources: github_trending

🔴 None of this will ever get stolen — score 81 Sources: reddit/r/LocalLLaMA

It's crazy that they're thinking of doing this. There are problems with people stealing catalytic converters off people's cars and now they want to put a rack outside your house!?

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🔴 Bad news: Apple drops high-memory Mac Studio configs — score 88 Sources: reddit/r/LocalLLaMA

Looks like Apple has quietly killed off the higher-memory Mac Studio options. The M3 Ultra Mac Studio is now only available with 96GB RAM. The 512GB option was already removed back in March, and now the 256GB config is gone too.

Apple has said both the Mac Studio and Mac mini will stay supply-const

🔴 ZAYA1-8B: Frontier intelligence density, trained on AMD — score 73 Sources: reddit/r/LocalLLaMA

Research Papers

🔴 Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation — score 95 Sources: huggingface

Distillation-based acceleration has become foundational for making autoregressive streaming video diffusion models practical, with distribution matching distillation (DMD) as the de facto choice. Existing methods, however, train the student to match the teacher's output indiscriminately, treating ev

🔴 Stream-T1: Test-Time Scaling for Streaming Video Generation — score 85 Sources: huggingface

While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structura

🔴 OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents — score 75 Sources: huggingface

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the a

Other Signals

🔴 Stop letting LLMs edit your .bib [D] — score 94 Sources: reddit/r/MachineLearning

It’s shocking how frequently I notice hallucinated citations. For citations of my own papers, I’ve seen 5 in the past couple of months where the title is correct but the author list is wrong. When I email the author to let them know, they always blame an LLM for hallucinating.

Is it really th

🔴 Weights & Biases New Master Service Agreement Questions [D] — score 81 Sources: reddit/r/MachineLearning

**Update: my questions have been escalated to their teams. I'll share their answers (& hopefully reassurance) here.**

Weights & Biases sent an email yesterday, saying their new Master Service Agreement takes effect May 11th. I use & love wandb, but I'm concerned about the changes. I

🟡 Notable

Model Releases

🟡 Qwen3.6 27B uncensored heretic v2 Native MTP Preserved is Out Now With KLD 0.0021, 6/100 Refusals and the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs and NVFP4s formats. — score 65 Sources: reddit/r/LocalLLaMA

llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved: https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved

llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Pre

🟡 **[@OpenAI: Introducing the ChatGPT Futures Class of 2026](https://x.com/OpenAI/status/2052086313797705954)** — score 60 Sources: twitter_rss

Introducing the ChatGPT Futures Class of 2026—26 honorees from the first graduating class to have had ChatGPT throughout all four years of university, who used AI to:

Map 1.5M previously unknown objects in space
Detect disaster survivors through walls and debris
Make 100M+ galaxy images sea

🟡 **[@xai: Image Generation Quality Mode is now available on the xAI API](https://x.com/xai/status/2052193877675983031)** — score 60 Sources: twitter_rss

Image Generation Quality Mode is now available on the xAI API.

This model has already powered the generation of over 300 million images on Grok.

It brings higher realism, stronger text rendering, and better creative control for business professionals.

https://x.ai/news/grok-imagine-quality-mode

🟡 What Anthropic's 'Dreaming' feature release made me notice about my own ClawVault agent's memory — score 50 Sources: reddit/r/AIAgents

Last Tuesday Anthropic released Dreams for Claude Managed Agents. It's a memory cleanup pipeline: feed it a memory store and up to 100 session transcripts, get back a new store with duplicates merged and stale entries replaced. The same we
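As a rough illustration of what "duplicates merged and stale entries replaced" can mean for a memory store, here is a minimal sketch. It is not Anthropic's pipeline or API: the `MemoryEntry` type, its fields, and the `consolidate` function are hypothetical names invented for this example, and the rule used (keep the newest value per key) is only one simple assumption about how staleness might be resolved.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    key: str         # hypothetical topic identifier, e.g. "user.timezone"
    value: str
    updated_at: int  # hypothetical timestamp; larger means newer


def consolidate(entries: list[MemoryEntry]) -> list[MemoryEntry]:
    """Drop duplicates and keep only the newest value for each key."""
    newest: dict[str, MemoryEntry] = {}
    for entry in entries:
        current = newest.get(entry.key)
        # A strictly newer entry replaces a stale one; an entry with the
        # same or an older timestamp is dropped as a duplicate or stale.
        if current is None or entry.updated_at > current.updated_at:
            newest[entry.key] = entry
    return sorted(newest.values(), key=lambda e: e.key)


store = [
    MemoryEntry("user.timezone", "UTC", 1),
    MemoryEntry("user.timezone", "UTC", 1),            # exact duplicate
    MemoryEntry("user.timezone", "Europe/Berlin", 5),  # newer, replaces "UTC"
    MemoryEntry("project.language", "Rust", 3),
]
cleaned = consolidate(store)
```

A real cleanup step would likely need fuzzier matching than exact keys, which is where a model-driven pass over transcripts could help.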

🟡 @xai: SpaceXAI will provide @AnthropicAI with access to Colossus 1, one of the world’s largest and fastest-deployed AI supercomputers, to provide additional capacity for Claude → http://x.ai/news/anthropic- — score 50 Sources: twitter_rss

SpaceXAI will provide @AnthropicAI with access to Colossus 1, one of the world’s largest and fastest-deployed AI supercomputers, to provide additional capacity for Claude → http://x.ai/news/anthropic-compute-partnership

Omitted 1 additional model releases item from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 Show HN: Tilde.run – Agent sandbox with a transactional, versioned filesystem — score 64 Sources: hackernews

🟡 Shubhamsaboo/awesome-llm-apps — 100+ AI Agent & RAG apps you can actually run — clone, customize, ship. — score 62 Sources: github_trending

100+ AI Agent & RAG apps you can actually run — clone, customize, ship.

🟡 NeurIPS 2026 AC-Pilot, how much would you trust this? [D] — score 56 Sources: reddit/r/MachineLearning

I wonder how this AC-Pilot thing works for NeurIPS 2026.

The guidelines say that "What you are communicating is that the authors do not need to worry about concerns you have not listed, and that there is a real opportunity for acceptance if listed concerns are sufficiently addressed."

However

🟡 onyx-dot-app/onyx — Open Source AI Platform - AI Chat with advanced features that works with every LLM — score 56 Sources: github_trending

Open Source AI Platform - AI Chat with advanced features that works with every LLM

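🟡 hsliuping/TradingAgents-CN — Chinese-language financial trading framework based on multi-agent LLMs - enhanced Chinese edition of TradingAgents — score 56 Sources: github_trending

Chinese-language financial trading framework based on multi-agent LLMs - enhanced Chinese edition of TradingAgents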

Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 InsForge/InsForge — InsForge is a Postgres-based backend with auth, storage, compute, hosting, and AI gateway. Built for coding agents. — score 68 Sources: github_trending

InsForge is a Postgres-based backend with auth, storage, compute, hosting, and AI gateway. Built for coding agents.

🟡 **[@OpenAI: AI supercomputers need a new kind of network to stay in sync at massive scale](https://x.com/OpenAI/status/2052039800384057348)** — score 50 Sources: twitter_rss

AI supercomputers need a new kind of network to stay in sync at massive scale.

OpenAI’s @markjhandley and @poyntingatgreg join @AndrewMayne to discuss what it takes to move data across record numbers of chips reliably and efficiently, the new Multipath Reliable Connection (MRC) networking protocol,

Research Papers

🟡 Lightning Unified Video Editing via In-Context Sparse Attention — score 65 Sources: huggingface

Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing.

🟡 Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation — score 60 Sources: huggingface · arxiv/cs.AI

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception

🟡 When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning — score 40 Sources: huggingface · arxiv/cs.CL

In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments th

Other Signals

🟡 META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs(ffmpeg, SQLite, ripgrep) From Scratch Without The Internet? — score 69 Sources: reddit/r/MachineLearning

🟡 Analysis of the 100 most popular hardware setups on Hugging Face — score 58 Sources: reddit/r/LocalLLaMA

https://x.com/ClementDelangue/status/2052020105328890188

🟡 Get faster qwen 3.6 27b — score 50 Sources: reddit/r/LocalLLaMA

Using 100k context on a 3090 with the MTP GGUF and getting 50 t/s on llama.cpp. Thought I would share. Use https://huggingface.co/RDson/Qwen3.6-27B-MTP-Q4_K_M-GGUF and the am17an commit: https://github.com/ggml-org/llama.cpp/pull/22673. How to apply, steps: `bash cd path/to/llama.cpp git fetch origi

🟡 Learning the Integral of a Diffusion Model — score 50 Sources: hackernews

🟢 Incremental

Model Releases

🟢 why llama.cpp can’t combine speculative decode methods? — score 19 Sources: reddit/r/LocalLLaMA

dicking around with the new mtp speculative decode with qwen3.6 27b, and it’s great. but for agentic coding i’ve seen significant improvements from ngram, because a decent fraction of the time (e.g. calling edit tool) the model is just repeating verbatim a section of code that it has already seen be
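The n-gram idea mentioned in this post can be sketched in a few lines: when the last few tokens already appeared earlier in the context, propose the tokens that followed them as a draft. This is an illustrative sketch only, not llama.cpp's implementation; the function name `ngram_draft` and its parameters are hypothetical, and in a real speculative decoder every drafted token would still have to be verified by the target model.

```python
def ngram_draft(tokens: list[int], n: int = 3, max_draft: int = 8) -> list[int]:
    """Draft tokens by copying what followed the most recent earlier
    occurrence of the last n tokens in the context."""
    if len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    # Scan backwards so the most recent earlier match wins; the start
    # index excludes the suffix's own position at the end of the list.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + max_draft]
    return []  # no repeat found, so nothing to draft


# The context ends with [1, 2, 3], which also appeared at the start,
# so the tokens that followed it earlier are proposed as the draft.
context = [1, 2, 3, 4, 5, 6, 7, 1, 2, 3]
draft = ngram_draft(context, n=3, max_draft=4)
```

This kind of draft costs no extra model call, which is why it helps most when the model is repeating text verbatim, as in edit-tool calls.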

🟢 Qwen 3.6? — score 4 Sources: reddit/r/LocalLLaMA

Qwen/Qwen3.6-35B-A3B was released 22 days ago

Qwen/Qwen3.6-27B was released 15 days ago

Let's predict when we can expect the 9B and 122B versions

Developer Tools

🟢 Exploring Black‑Box Optimization [R] — score 38 Sources: reddit/r/MachineLearning

Hey everyone!

I’d like to share a personal project that’s still in its early stages, focused on black‑box optimization algorithms.

I’m open to feedback, suggestions, or any questions you might have.

You can check the full overview here:

https://github.com/misa-hdez/sgo-lab/blob/main/docs/project

🟢 googleworkspace/cli — Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills. — score 36 Sources: github_trending

Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills.

🟢 ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference — score 35 Sources: reddit/r/LocalLLaMA

https://z-lab.ai/projects/paroquant/

https://github.com/z-lab/paroquant

https://huggingface.co/collections/z-lab/paroquant

🟢 BoundaryML/baml — The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible) — score 34 Sources: github_trending

The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible)

🟢 BigBodyCobain/Shadowbroker — Open-source intelligence for the global theater. Track everything from the corporate/private jets of the wealthy, and spy satellites, to seismic events in one unified interface. Hook an AI agent up to have it parse through data and find previously unseen correlations. The knowledge is available to all but rarely aggregated in the open, until now. — score 32 Sources: github_trending

Open-source intelligence for the global theater. Track everything from the corporate/private jets of the wealthy, and spy satellites, to seismic events in one unified interface. Hook an AI agent up to have it parse through data and find previously unseen correlations. The knowledge is available to a

Omitted 9 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 Dataset of 150k+ stool images and not sure how to fully use it [D] — score 38 Sources: reddit/r/MachineLearning

I have a dataset of around 150k stool images; growing at 300+ images per day, and I’m trying to better understand the “right” way to use it for training a computer vision model.

Right now, our process is pretty manual. We initially trained on about 5k images that were individually verified by a hum

🟢 pytorch/pytorch — Tensors and Dynamic neural networks in Python with strong GPU acceleration — score 38 Sources: github_trending

Tensors and Dynamic neural networks in Python with strong GPU acceleration

🟢 Making LLM Training Faster with Unsloth and NVIDIA — score 21 Sources: hackernews

Research Papers

🟢 Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback — score 30 Sources: huggingface

Estimating how well a person performs an action, rather than which action is performed, is central to coaching, rehabilitation, and talent identification. This task is challenging because proficiency is encoded in subtle differences in timing, balance, body mechanics, and execution, often distribute

🟢 Diffusion Model as a Generalist Segmentation Learner — score 10 Sources: huggingface

Diffusion models are primarily trained for image synthesis, yet their denoising trajectories encode rich, spatially aligned visual priors. In this paper, we demonstrate that these priors can be utilized for text-conditioned semantic and open-vocabulary segmentation, and this approach can be generali

Other Signals

🟢 ProgramBench: Can Language Models Rebuild Programs from Scratch? — score 36 Sources: hackernews

🟢 Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development — score 27 Sources: reddit/r/LocalLLaMA

tl;dr - For software development, Qwen3.6 27B, 5090 gives you ~3x speed over M5 Max, letting you plow through code, while M5 Max gives you ~4x memory, letting you use higher quantization and bigger context. Which would you choose and why?

I've been doing a lot of research on this topic for a co

🟢 Visual Perceptual to Conceptual First-Order Rule Learning Networks [R] — score 19 Sources: reddit/r/MachineLearning

I'm genuinely curious, because I've been seeing some papers come out recently from the ILP world, like referenced above as well as others [1, 2]. It seems they're busy cooking.

In the main linked paper they're tackling pure i

🟢 Any tool that tells you the cheapest setup needed to run a model? I want to know the cheapest setup that can realistically run Qwen 3.6 27B at decent speeds. — score 12 Sources: reddit/r/LocalLLaMA

I’m looking for a tool or calculator that can estimate the minimum hardware needed to run a specific model locally.

For example, I want to know the cheapest setup that can realistically run Qwen 3.6 27B at decent speeds. Ideally something that can tell me:

- Required VRAM for different quantizati
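For a first approximation of the VRAM question, the memory needed for the weights alone is roughly the parameter count times bits per weight, divided by 8. The sketch below applies that to a 27B-parameter model. The helper name `weight_memory_gb` is hypothetical, and the estimate deliberately ignores the KV cache, activations, and runtime overhead; real quantization formats also often average somewhat more bits per weight than their nominal figure, so treat the results as lower bounds.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# Nominal bit widths only; context length adds KV-cache memory on top.
for bits in (16, 8, 4):
    gb = weight_memory_gb(27, bits)
    print(f"27B at {bits}-bit: ~{gb:.1f} GB for weights")
```

At a nominal 4 bits this gives about 13.5 GB for weights, which leaves some headroom on a 24GB card for context.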

🟢 NeurIPS submission small formatting question [D] — score 6 Sources: reddit/r/MachineLearning

NeurIPS deadline crunch stress post. The template has no new page after the references and before the appendices this year, but all camera-ready papers from last year have one. It looks hella awkward to have the appendices start on the same page as the references. Is adding a \newpage OK, required, not OK, etc.? TIA

📈 Trending Repos

Repo	Description	Stars Today	Language
anthropics/financial-services		641	python
vercel-labs/open-agents	An open source template for building cloud agents.	406	typescript
shiyu-coder/Kronos	Kronos: A Foundation Model for the Language of Financial Markets	234	python
InsForge/InsForge	InsForge is a Postgres-based backend with auth, storage, compute, hosting, and AI gateway. Built for coding agents.	230	typescript
Shubhamsaboo/awesome-llm-apps	100+ AI Agent & RAG apps you can actually run — clone, customize, ship.	200	python
onyx-dot-app/onyx	Open Source AI Platform - AI Chat with advanced features that works with every LLM	116	python
hsliuping/TradingAgents-CN	Chinese-language financial trading framework based on multi-agent LLMs - enhanced Chinese edition of TradingAgents	116	python
koala73/worldmonitor	Real-time global intelligence dashboard. AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface	111	typescript
aaif-goose/goose	an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM	111	rust
pytorch/pytorch	Tensors and Dynamic neural networks in Python with strong GPU acceleration	49	python

📄 New Papers

Title	Category	Hotness	Link
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation	research_paper	99	Open
Stream-T1: Test-Time Scaling for Streaming Video Generation	research_paper	87	Open
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents	research_paper	16	Open
Lightning Unified Video Editing via In-Context Sparse Attention	research_paper	7	Open
Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation	research_paper	4	Open
LCM: Lossless Context Management	cs.AI	0	Open
Regularized Centered Emphatic Temporal Difference Learning	cs.AI	0	Open
Actionable Real-Time Modeling of Surgical Team Dynamics via Time-Expanded Interaction Graphs	cs.AI	0	Open
ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor	cs.AI	0	Open
Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks	cs.AI	0	Open
Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA	cs.AI	0	Open
Parallel Prefix Verification for Speculative Generation	cs.AI	0	Open
Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games	cs.AI	0	Open
The Scaling Properties of Implicit Deductive Reasoning in Transformers	cs.AI	0	Open
When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration	cs.AI	0	Open

🐦 Twitter/X Highlights

Account	Tweet Summary
OpenAI	Introducing the ChatGPT Futures Class of 2026—26 honorees from the first graduating class to have had ChatGPT throughout all four years of university, who used AI to: - Map 1.5M previously unknown objects in space - Detect disaster survivors through walls and debris - Make 100M+ galaxy images searchable - Preserve endangered languages - Build infrastructure to reroute 5M+ pounds of unsold inventory from landfills Post
xai	Image Generation Quality Mode is now available on the xAI API. This model has already powered the generation of over 300 million images on Grok. It brings higher realism, stronger text rendering, and better creative control for business professionals. https://x.ai/news/grok-imagine-quality-mode Post
OpenAI	AI supercomputers need a new kind of network to stay in sync at massive scale. OpenAI’s @markjhandley and @poyntingatgreg join @AndrewMayne to discuss what it takes to move data across record numbers of chips reliably and efficiently, the new Multipath Reliable Connection (MRC) networking protocol, and why it's available for the whole industry to use. Post
GoogleDeepMind	We’re partnering with the developers of @EveOnline to explore the next frontier of AI research in games. EVE's complex, player-driven universe is the perfect safe sandbox to test agents on memory, continual learning, and long-term planning. Find out more → https://goo.gle/4epQIdy Post
xai	SpaceXAI will provide @AnthropicAI with access to Colossus 1, one of the world’s largest and fastest-deployed AI supercomputers, to provide additional capacity for Claude → http://x.ai/news/anthropic-compute-partnership Post

Repeated From Recent Briefings

Hmbown/DeepSeek-TUI — Coding agent for DeepSeek models that runs in your terminal - first seen 2026-05-02; reason: canonical_url
ruvnet/ruflo — 🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integration, and native Claude Code / Codex Integration - first seen 2026-05-02; reason: canonical_url
2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints - first seen 2026-05-06; reason: canonical_url
AIDC-AI/Pixelle-Video — 🚀 AI Fully Automated Short Video Engine - first seen 2026-05-03; reason: canonical_url
rtk-ai/rtk — CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies - first seen 2026-05-05; reason: canonical_url
Arindam200/awesome-ai-apps — A collection of projects showcasing RAG, agents, workflows, and other AI use cases - first seen 2026-05-06; reason: canonical_url
mksglu/context-mode — Context window optimization for AI coding agents. Sandboxes tool output, 98% reduction. 14 platforms - first seen 2026-05-05; reason: canonical_url
virattt/dexter — An autonomous agent for deep financial research - first seen 2026-05-03; reason: canonical_url
LearningCircuit/local-deep-research — ~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted. - first seen 2026-05-03; reason: canonical_url
cocoindex-io/cocoindex — Incremental engine for long horizon agents 🌟 Star if you like it! - first seen 2026-05-03; reason: canonical_url
... plus 56 more repeated items in processed data

AI Watchtower Briefing — 2026-05-07
