AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 The Financial Times has published an article about Heretic — score 96 Sources: reddit/r/LocalLLaMA

https://www.ft.com/content/5630ed79-a263-41ed-9a1a-321617ae310e “The FT was able to use Heretic, a tool available on the popular code repository GitHub, to remove the guardrails from Meta’s Llama 3.3 model in less than 10 minutes without any specialist hardware.” “Heretic creator Philipp Emanuel Wei

🔴 Update on 12x32gb sxm v100 cluster / local AI for legal drafting — score 81 Sources: reddit/r/LocalLLaMA

Update from the lawyer with the V100 server. A few of you asked what I actually ended up running once the dust settled, so here it is. Still just a lawyer, still driving the whole thing through Claude Code, still not fully sure what I'm doing — but it works now, which is more than I could say last t

🔴 Is there a way to use multiple AI models without paying for 11 different monthly subscriptions? — score 81 Sources: reddit/r/AIAgents

I’m getting into AI content creation, generating both images and short videos, but subscribing to different AI tools feels like a total rip-off. I need GPT for logic and layout, Flux for visuals, and specialized video models for motion. Right now, I’m juggling like 5 different API keys and subscripti

🔴 Is Qwen3.6 current king for local agentic use? — score 73 Sources: reddit/r/LocalLLaMA

I've been testing other models, but it seems like nothing even comes close to Qwen3.6 35B A3B for agentic use. The worst I'd get is a loop sometimes, while Gemma4 occasionally produced broken tool calls, and I couldn't even get GLM 4.7 Flash REAP past 2 or 3 messages before it started looping. All IQ4_N

Developer Tools

🔴 How Do You Think We Can Help Avoid AI Scams? — score 94 Sources: reddit/r/AIAgents

Right now, most people can tell if they are talking to AI. But with older folks, it's trickier. I see things like this, and it seems like it's only going to get worse. I don't think banning AI is a realistic solution. To me, there are a few opti

Research Papers

🔴 SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills — score 82 Sources: huggingface · arxiv/cs.AI

Large language model (LLM) agents accumulate rich episodic trajectories while solving real-world tasks, but it remains unclear whether such experience can be distilled into reusable procedural skills. We introduce SkillEvolBench, a diagnostic benchmark for evaluating this step from experience reuse

🔴 Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents — score 78 Sources: huggingface · arxiv/cs.CL

While AI agents demonstrate remarkable capabilities in reasoning and tool use, they remain fundamentally reactive: they compute responses only after explicit user prompts. This paradigm ignores a critical opportunity: the idle time between interactions is largely wasted, leaving agents unable to pre

🔴 InstructSAM: Segment Any Instance with Any Instructions — score 75 Sources: huggingface

In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate instruction-driven instance segmentation as a set-structured query prediction problem and propose an explicit reasoning-to-instance query

Other Signals

🔴 Using AI to write better code more slowly — score 88 Sources: hackernews

🔴 The famous METR AI time horizons graph contains numerous severe errors [D] — score 81 Sources: reddit/r/MachineLearning

Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer: >It is impossible to dr

🟡 Notable

Model Releases

🟡 @xai: Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit http://x. — score 60 Sources: twitter_rss

Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit http://x.ai/cli to get started.

Developer Tools

🟡 Built a runtime governance proxy for AI agents after realizing prompt injection gets a lot scarier once agents have tools — score 69 Sources: reddit/r/AIAgents

When your agent reads external content — webpages, emails, documents, database rows — that content can contain hidden instructions that hijack it. This isn’t theoretical. A poisoned document tells your agent to forward credentials. A malicious email tells it to ignore its guidelines. The model has n
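One common mitigation for the failure mode described above is taint tracking at the tool boundary: once content from an untrusted source has entered the agent's context, sensitive tools are blocked unless a human has explicitly approved them. The sketch below assumes that approach; it is not the poster's proxy, and the tool names in `UNTRUSTED_SOURCES` and `SENSITIVE_SINKS` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical tool names: tools that bring external content into the context...
UNTRUSTED_SOURCES = {"fetch_webpage", "read_email", "read_document", "query_database"}
# ...and tools whose misuse would cause real damage.
SENSITIVE_SINKS = {"send_email", "forward_credentials", "delete_records"}


@dataclass
class GovernanceProxy:
    tainted: bool = False  # True once untrusted content has entered the context
    approved: Set[str] = field(default_factory=set)  # sinks a human explicitly approved

    def authorize(self, tool: str) -> bool:
        """Return True if the agent may call `tool` now, updating taint state."""
        if tool in SENSITIVE_SINKS and self.tainted and tool not in self.approved:
            # The request may have been planted by injected content: block it.
            return False
        if tool in UNTRUSTED_SOURCES:
            # Anything read from here can carry hidden instructions.
            self.tainted = True
        return True
```

The design is deliberately coarse: the proxy does not try to detect injected text at all; it only limits what the agent can do after it has read something it cannot trust.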

🟡 @AnthropicAI: Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://www.anthropic.com/news/chri — score 50 Sources: twitter_rss

Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://www.anthropic.com/news/chris-olah-pope-leo-encyclical

🟡 DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P] — score 44 Sources: reddit/r/MachineLearning

Just thought I'd share: I ran a DCGAN on a dual-core RISC-V microcontroller, the CH32H417, generating 64x64 cat faces. This is a new RISC-V MCU, so there's no TFLite, no CMSIS-NN and no external memory. It's a pure C inference engine, bit-identical to PyTorch reference outputs. The model is 12.6M parameters
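A DCGAN generator is built mainly from transposed convolutions, so that is the core kernel an engine like this has to reproduce. Below is a naive, hypothetical Python sketch of the operation as PyTorch's `ConvTranspose2d` defines it (no bias, dilation 1, `output_padding` 0, weights laid out as `[in_channels][out_channels][kH][kW]`). It is not the poster's C code, and it makes no attempt at bit-exactness, which in practice also depends on floating-point accumulation order.

```python
from typing import List

Tensor3 = List[List[List[float]]]        # [channels][height][width]
Tensor4 = List[List[List[List[float]]]]  # [in_ch][out_ch][kh][kw], PyTorch ConvTranspose2d layout


def conv_transpose2d(x: Tensor3, w: Tensor4, stride: int = 2, padding: int = 1) -> Tensor3:
    """Naive transposed convolution: no bias, dilation 1, output_padding 0."""
    in_ch, h_in, w_in = len(x), len(x[0]), len(x[0][0])
    out_ch, k_h, k_w = len(w[0]), len(w[0][0]), len(w[0][0][0])
    h_out = (h_in - 1) * stride - 2 * padding + k_h
    w_out = (w_in - 1) * stride - 2 * padding + k_w
    out = [[[0.0] * w_out for _ in range(h_out)] for _ in range(out_ch)]
    # Scatter each input pixel, scaled by the kernel, into the upsampled output.
    for ic in range(in_ch):
        for i in range(h_in):
            for j in range(w_in):
                v = x[ic][i][j]
                for oc in range(out_ch):
                    for kh in range(k_h):
                        oh = i * stride - padding + kh
                        if not 0 <= oh < h_out:
                            continue  # falls in the border cropped by padding
                        for kw in range(k_w):
                            ow = j * stride - padding + kw
                            if 0 <= ow < w_out:
                                out[oc][oh][ow] += v * w[ic][oc][kh][kw]
    return out
```

With kernel 4, stride 2 and padding 1, a common DCGAN setting, the output size `(H - 1) * 2 - 2 + 4` equals `2H`, so each such layer doubles the resolution, e.g. from 32x32 to 64x64.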

Infrastructure & Compute

🟡 Norway's 2 petabytes of Huawei flash storage and LLM training — score 62 Sources: hackernews

Research Papers

🟡 CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models — score 65 Sources: huggingface

Video prediction is increasingly viewed as a path toward generalizable world models, yet it remains unclear whether these systems learn underlying causal structure or merely exploit superficial visual correlations for future prediction. We introduce CRONOS, an intervention-based benchmark designed t

🟡 A Comprehensive Dataset for Human vs. AI Generated Image Detection — score 60 Sources: arxiv/cs.AI · arxiv/cs.CL

arXiv:2601.00553v2 · Multimodal generative AI systems like Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created. These tools drive innovation but also enable the spread of misleading content, false information, and

🟡 SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking — score 50 Sources: huggingface · arxiv/cs.AI

Mobile GUI agents powered by large language models have progressed rapidly, creating an urgent need for realistic and comprehensive evaluation. Existing benchmarks prioritize reproducibility but are often limited to open-source apps or file-operation tasks because of the difficulty of constructing rewards on

🟡 Reinforcing Few-step Generators via Reward-Tilted Distribution Matching — score 45 Sources: huggingface

Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation

Other Signals

🟡 Already 11,000 submissions for EMNLP? [D] — score 69 Sources: reddit/r/MachineLearning

Is this normal? I looked it up, and last year there were only 8,000.

🟡 One letter to appease them all — score 58 Sources: reddit/r/LocalLLaMA

🟡 Are ICML workshops worth attending? [D] — score 56 Sources: reddit/r/MachineLearning

Hi! I missed out on a main conference ticket for ICML 2026 because my workshop paper was only accepted two days ago. Do you think it is worth attending just the workshops at an A*-tier conference like this (with all the overseas travel costs etc.)? I was quite looking forward to attending both, including the

🟡 Strix Halo users, a rejected PR can give you up to 30% faster prompt processing (PP) for MoE models. — score 50 Sources: reddit/r/LocalLLaMA

Here's the PR by pedapudi (https://github.com/ggml-org/llama.cpp/pull/21344). Its merge request was rejected, so it will not be in mainline llama.cpp. The changes are small enough that I just apply them to whatever the current llama.cpp release is. Read the PR for more info. It will only work with M

🟡 CXMT has started selling RAM to Corsair — score 42 Sources: reddit/r/LocalLLaMA

They have started producing cheaper RAM for Corsair; hopefully it will get cheaper for consumers too. [https://www.tomshardware.com/pc-components/ddr5/chinese-memory-maker-cxmt-enters-the-mainstream-consumer-memory-with-corsair-vengeance-ddr5-kit-chinese-made-dram-emerges-as-an-antidote-for-crushing-shortages

🟢 Incremental

Model Releases

🟢 model : add support for talkie-1930-13b by niklassheth · Pull Request #22596 · ggml-org/llama.cpp — score 27 Sources: reddit/r/LocalLLaMA

https://huggingface.co/talkie-lm/talkie-1930-13b-it talkie-1930-13b-it talkie-1930-13b-it is a 13B vintage language model. It is an instruction-tuned post-train of talkie-1930-13b-base, which was trained on 260B tokens of pre-1931 Englis

🟢 Running on a macbook, and having issues with crashing? Maybe this will help... — score 4 Sources: reddit/r/LocalLLaMA

Just a friendly pointer on getting around some issues on MacBooks. I hope someone finds this useful. I spent weeks ripping my hair out over crashes, crap performance and other issues, while being entirely too stubborn to harness the power of Google to find solutions. Though, I prefer doing

Developer Tools

🟢 SkillOpt treats markdown skill files as trainable parameters with proper optimization machinery — score 35 Sources: reddit/r/LocalLLaMA

A paper came out recently that formalizes something a lot of agent builders have been doing ad hoc. They use a frontier model to propose bounded edits (add/delete/replace) to markdown skill files, then gate every edit against a held-out validation set. Only strict improvements are accepted, ties rejected,
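The acceptance rule described above (propose one bounded add/delete/replace edit, keep it only if the held-out validation score strictly improves, reject ties) amounts to a greedy hill-climbing loop. This is a minimal sketch under stated assumptions, not the paper's code: edits are modeled at line granularity, and `propose_edit` and `validate` are hypothetical callables standing in for the frontier-model proposer and the held-out evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Edit:
    """One bounded edit to a skill file, modeled (by assumption) on its lines."""
    op: str         # "add", "delete" or "replace"
    index: int      # line position the edit targets
    text: str = ""  # new line content for "add" / "replace"


def apply_edit(lines: List[str], edit: Edit) -> List[str]:
    """Return a new list of lines with the edit applied; the input is not mutated."""
    out = list(lines)
    if edit.op == "add":
        out.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del out[edit.index]
    elif edit.op == "replace":
        out[edit.index] = edit.text
    else:
        raise ValueError(f"unknown op: {edit.op}")
    return out


def optimize_skill(
    lines: List[str],
    propose_edit: Callable[[List[str]], Edit],  # hypothetical: the frontier-model proposer
    validate: Callable[[List[str]], float],     # hypothetical: held-out validation score
    steps: int,
) -> List[str]:
    best_score = validate(lines)
    for _ in range(steps):
        candidate = apply_edit(lines, propose_edit(lines))
        score = validate(candidate)
        # Accept only strict improvements; ties are rejected.
        if score > best_score:
            lines, best_score = candidate, score
    return lines
```

Rejecting ties keeps the skill file from drifting through edits that the validation set cannot distinguish.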

🟢 moeru-ai/airi — 💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported. — score 34 Sources: github_trending

💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported.

🟢 OpenBB-finance/OpenBB — Financial data platform for analysts, quants and AI agents. — score 25 Sources: github_trending

Financial data platform for analysts, quants and AI agents.

🟢 Freelancers who build WhatsApp Business API bots for multiple clients: how do you structure your Meta Developer setup? — score 24 Sources: reddit/r/AIAgents

Hey everyone, I'm building WhatsApp appointment bots for dental clinics and I'm confused about the Meta Developer App structure when scaling to multiple clients. My confusion: * Do you create ONE Meta Developer App and add each client's phone number inside it? * Or do you create a SEPARATE app f

🟢 NateBJones-Projects/OB1 — Open Brain — The infrastructure layer for your thinking. One database, one AI gateway, one chat channel — any AI plugs in. No middleware, no SaaS. — score 20 Sources: github_trending

Open Brain — The infrastructure layer for your thinking. One database, one AI gateway, one chat channel — any AI plugs in. No middleware, no SaaS.

Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.

Business & Funding

🟢 I finally put my NPU (Intel Arrow Lake) to use doing ASR for my smart home — score 15 Sources: reddit/r/LocalLLaMA

I wrote about what I found in a deep dive elsewhere (which I won't mention because Reddit doesn't like cross-linking), but I wanted to share it here since this is where I learn the most about AI stuff, and I've seen questions here before about NPUs, which are often dismissed as marketing gimmicks (and for

Enterprise Adoption

🟢 How I built a safety layer around LLM-generated trading code and cut deployment time from 40 hours to 20 minutes — score 0 Sources: reddit/r/AIAgents

I built AlgoAI, a platform that converts plain-English strategy descriptions into live Python trading bots running against MetaTrader 5. The goal was to compress a workflow that traditionally takes quants 8 to 40 hours down to under 20 minutes. We hit that target. But the interesting engineering pro

Research Papers

🟢 HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction — score 30 Sources: huggingface

Online 3D reconstruction requires estimating camera pose and scene geometry under strict causal and bounded-memory constraints. Existing methods often suffer from drift, jitter, or collapse on long sequences. We trace these failures to a fundamental mismatch. Streaming geometry is inherently tempora

🟢 Pixel-Level Pavement Distress Assessment Using Instance Segmentation — score 10 Sources: huggingface

Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for maintenance-relevant quantification. This paper presents a v

Other Signals

🟢 Use Boring Languages with LLMs — score 38 Sources: hackernews

🟢 [Open Source] A contract layer at the agent tool boundary, rules in YAML, not in the prompt (Apache 2.0) — score 26 Sources: reddit/r/AIAgents

Sharing what I've been working on. sponsio is an open-source contract layer that sits at the tool-call boundary of an LLM agent. Apache 2.0, Python and TS. The thesis: rules that absolutely must hold (like "always check policy before issuing a refund" or "never call this tool twice per session") don
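The thesis above, that hard rules belong in deterministic code at the tool-call boundary rather than in the prompt, can be illustrated with a small checker. This is a hypothetical sketch, not sponsio's API or YAML schema: the `RULES` dict mirrors what a YAML file might contain, and the rule names `requires_before` and `max_calls_per_session`, like the tool names, are invented for illustration. The rules are written inline because PyYAML is not in the standard library.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical rules; a YAML-driven setup would load an equivalent structure from a file.
RULES = {
    "issue_refund": {"requires_before": ["check_policy"]},  # always check policy first
    "send_invoice": {"max_calls_per_session": 1},           # never call twice per session
}


class ContractViolation(Exception):
    pass


@dataclass
class Session:
    history: List[str] = field(default_factory=list)

    def call(self, tool: str) -> None:
        """Check the contract for `tool`; record the call only if every rule holds."""
        rule = RULES.get(tool, {})
        for prereq in rule.get("requires_before", []):
            if prereq not in self.history:
                raise ContractViolation(f"{tool} requires {prereq} earlier in the session")
        limit = rule.get("max_calls_per_session")
        if limit is not None and self.history.count(tool) >= limit:
            raise ContractViolation(f"{tool} may be called at most {limit} time(s) per session")
        self.history.append(tool)  # the real tool would be dispatched here
```

A rejected call raises before it is recorded, so the session history only ever contains calls that satisfied their contracts, regardless of what the model was told in its prompt.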

🟢 Qwen3.6 27B AR -> diffusion: local training on a 5090 — score 15 Sources: reddit/r/LocalLLaMA

Based on the work of open-dllm (which achieved a Qwen 2.5 autoregressive -> diffusion realignment head: the exact same model under the hood, delivering a 4x improvement). TL;DR: I haven't got a trained model yet, just a burnt-out GPU cable and a new PSU on order. I did actually get the thing to do a

🟢 Multimodal adaptive optical microscope: in vivo imaging, molecules to organisms — score 12 Sources: hackernews

🟢 Aiki: my local Wikipedia Retrieval-Augmented Generation system [R] — score 11 Sources: reddit/r/MachineLearning

Hey, I built Aiki, a lightweight tool that lets you chat with Wikipedia locally. https://i.redd.it/67mzfsrc6f3h1.gif What it does: * Downloads and chunks Wikipedia articles (you can choose articles by name, with an option to also download similar topics) * Uses a cus
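The post does not say how Aiki chunks articles (the description is cut off), so the following is only an assumption: a common RAG baseline is fixed-size word windows with overlap, so that text near a chunk boundary appears in two adjacent chunks. `chunk_words` and its default sizes are hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and indexed; smaller windows give more precise retrieval at the cost of less context per hit.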

📈 Trending Repos

Repo	Description	Stars Today	Language
moeru-ai/airi	💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported.	62	typescript
OpenBB-finance/OpenBB	Financial data platform for analysts, quants and AI agents.	43	python
NateBJones-Projects/OB1	Open Brain — The infrastructure layer for your thinking. One database, one AI gateway, one chat channel — any AI plugs in. No middleware, no SaaS.	25	typescript

📄 New Papers

Title	Category	Hotness	Link
SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills	research_paper	12	Open
Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents	research_paper	10	Open
InstructSAM: Segment Any Instance with Any Instructions	research_paper	9	Open
CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models	research_paper	6	Open
A Comprehensive Dataset for Human vs. AI Generated Image Detection	cs.AI	0	Open
SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking	research_paper	2	Open
In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models	cs.AI	0	Open
Confidence Calibration in Large Language Models	cs.AI	0	Open
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning	cs.AI	0	Open
Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction	cs.AI	0	Open
Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs	cs.AI	0	Open
Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game	cs.AI	0	Open
BODHI: Precise OS Kernel Specification Inference	cs.AI	0	Open
When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure	cs.AI	0	Open
Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model	cs.AI	0	Open

🐦 Twitter/X Highlights

Account	Tweet Summary
xai	Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit http://x.ai/cli to get started. Post
AnthropicAI	Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://www.anthropic.com/news/chris-olah-pope-leo-encyclical Post

Repeated From Recent Briefings

Lum1104/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent — fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
rohitg00/ai-engineering-from-scratch — Learn it. Build it. Ship it for others. - first seen 2026-05-21
How do ML practitioners select hyperparameters, architectures, etc for self-supervised representation learning when the loss is non-monotonic? [D] - first seen 2026-05-25
anthropics/knowledge-work-plugins — Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork - first seen 2026-05-25
mukul975/Anthropic-Cybersecurity-Skills — 754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0 - first seen 2026-05-24
NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) - first seen 2026-05-23
earendil-works/pi — AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods - first seen 2026-05-09
multica-ai/multica — The open-source managed agents platform. Turn coding agents into real teammates — assign tasks, track progress, compound skills. - first seen 2026-05-24
garrytan/gstack — Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA - first seen 2026-05-12
... plus 202 more repeated items in processed data

AI Watchtower Briefing — 2026-05-26

🔴 High Significance

Model Releases

Developer Tools

Research Papers

Other Signals

🟡 Notable

Model Releases

Developer Tools

Infrastructure & Compute

Research Papers

Other Signals

🟢 Incremental

Model Releases

Developer Tools

Business & Funding

Enterprise Adoption

Research Papers

Other Signals

📄 New Papers

🐦 Twitter/X Highlights

Repeated From Recent Briefings
