AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Qwen3.6 huge quality gain from Q4 to Q6 for coding agent — score 75 Sources: reddit/r/LocalLLaMA

So, last week I tried to update my unused local LLM setup. I had stopped using it because the quality was too low and DeepSeek was too cheap. First, I stopped using Ollama; now I only use llama.cpp's built-in server, which works really well. The quality improvement from Q4 to Q6 is outstanding and

Developer Tools

🔴 harry0703/MoneyPrinterTurbo — Generate high-definition short videos with one click using AI large language models. — score 97 Sources: github_trending

Generate high-definition short videos with one click using AI large language models.

🔴 Vulnerability found in framework used by VLLM, many MCP servers, and other LLM tools — score 89 Sources: reddit/r/LocalLLaMA

Worth taking a look to see if this affects any of you. Surprised nobody has posted it yet.

🔴 how much do you all actually trust autonomous AI agents — score 81 Sources: reddit/r/AIAgents

hey all — been thinking about multi-agent AI systems lately and was curious what people here actually think about autonomous AI agents. there are some companies out there doing real things with it, but how much do you actually trust them? what's your willingness to adopt these kinds of agents? curio

Infrastructure & Compute

🔴 Behold! Probably the most ghetto local AI server: — score 96 Sources: reddit/r/LocalLLaMA

AKA: Jank Incarnate. After months of pain, I finally got a working setup. There's a bunch of quirks about running a multi-Tesla setup. I was planning to write something about my experience after I get it running. Currently, the fans are plugged into the wall, speed is controlled with a knob. I still

Business & Funding

🔴 Unpopular opinion: most "AI memory" products are just RAG with a subscription fee — score 94 Sources: reddit/r/AIAgents

Most "AI memory" products are just RAG with a subscription fee and a black box where your data goes to die. The bar for what counts as "memory" has been set embarrassingly low, and developers keep paying for it because switching costs are brutal by design. Self-hosted, inspectable memory isn't a nic

Research Papers

🔴 DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes — score 82 Sources: huggingface · arxiv/cs.AI

Reinforcement learning has become a central paradigm for advancing reasoning in large language models, yet most existing methods still depend on stronger teacher models or heavily curated difficult datasets, limiting scalable capability improvement. In this paper, we introduce DenoiseRL, a reinforce

Other Signals

🔴 AI-generated CUDA kernels silently break training and inference [R] — score 94 Sources: reddit/r/MachineLearning

Last month NVIDIA released SOL-ExecBench, a new benchmark of 235 production CUDA kernels lifted from DeepSeek, Qwen, Gemma, and Kimi. We took several top-ranked AI-generated submissions and tried using them in production workloads. Many of them

🔴 I think Anthropic and OpenAI have found product-market fit — score 90 Sources: hackernews

🔴 DuckDuckGo search saw 28% more visits after Google said people love AI mode — score 70 Sources: hackernews

🟡 Notable

Model Releases

🟡 I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful. — score 62 Sources: reddit/r/LocalLLaMA

Posted this to r/MachineLearning a couple weeks ago (30K views, 100+ upvotes) and have been meaning to share it here where the fine-tuning angle is more directly relevant. I spent years building and processing a complete Usenet corpus from 1980 to 2013. Here’s why it might matter for local model wor

🟡 CrankGPT by Squeez Labs - hand-cranked edge AI - talk about local AI!!! — score 50 Sources: reddit/r/LocalLLaMA

I met Katrin from Squeez Labs at an event hosted by Pathway AI (the team behind Baby Dragon Hatchling) where she told me about CrankGPT, a literally hand-cranked device for running local LLMs. It's apparently real. It's apparently launched. It's apparently glorious. Check it out at [https://crankgp

🟡 @xai: Use your SuperGrok or X Premium+ subscription in @kilocode. Try grok-build-0.1 for high speed and agentic coding intelligence, available in the Kilo IDE extensions or CLI. https://x.ai/news/grok-ki — score 50 Sources: twitter_rss

Use your SuperGrok or X Premium+ subscription in @kilocode. Try grok-build-0.1 for high speed and agentic coding intelligence, available in the Kilo IDE extensions or CLI. https://x.ai/news/grok-kilocode

🟡 Nvidia LocateAnything - Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding. (10x faster than Qwen3-VL) — score 48 Sources: reddit/r/LocalLLaMA · arxiv/cs.AI

https://huggingface.co/nvidia/LocateAnything-3B https://github.com/NVlabs/Eagle demo https://huggingface.co/spaces/nvidia/LocateAnything

Developer Tools

🟡 Profiling PyTorch training without accidentally stalling the GPU [D] — score 69 Sources: reddit/r/MachineLearning

Profiling PyTorch training has an interesting measurement problem: the more you measure, the more you can change the behavior of the run itself. A simple example is torch.cuda.synchronize(). It gives cleaner timing boundaries, but it also inserts synchronization points into an otherwise asynchrono

🟡 I opened a live red team environment for my AI agent security proxy — try to get something through — score 69 Sources: reddit/r/AIAgents

Been building Arc Gate for the past few months — a proxy that sits between your agent and your LLM and enforces instruction-authority boundaries at runtime. The core problem it solves: when your agent reads external content — webpages, emails, documents, tool output — that content can contain hidden

🟡 unclecode/crawl4ai — 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here:https://discord.gg/jP8KfhDhyN — score 59 Sources: github_trending

🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here:https://discord.gg/jP8KfhDhyN

Infrastructure & Compute

🟡 Stress disrupts hippocampal integration of overlapping events, memory inference — score 50 Sources: hackernews

Research Papers

🟡 GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation — score 65 Sources: huggingface

We introduce GE-Sim 2.0 (Genie Envisioner World Simulator 2.0), a closed-loop video world simulator for robotic manipulation. Building on the action-conditioned video generation framework of Genie Envisioner, GE-Sim 2.0 is re-trained on thousands of hours of real-world robot data spanning teleoperat

🟡 SkillGrad: Optimizing Agent Skills Like Gradient Descent — score 60 Sources: huggingface · arxiv/cs.AI

Agent skills provide a lightweight way to adapt LLM agents to specialized domains by storing reusable procedural knowledge in structured files. However, whether downloaded from third parties or self-generated, these skills are often unreliable, incomplete, or outdated. Existing skill-evolution metho

🟡 ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations — score 50 Sources: huggingface · arxiv/cs.AI

Existing emotional support conversation (ESC) systems mainly rely on end-to-end response generation or coarse strategy supervision, offering limited interpretability and little support for systematic skill improvement. We propose ESC-Skills, a skill-centric framework that discovers and self-evolves

🟡 Revealing Algorithmic Deductive Circuits for Logical Reasoning — score 42 Sources: huggingface · arxiv/cs.AI

Recent studies have shown that Large Language Models (LLMs) can achieve strong reasoning performance by incorporating functional symbolic representations that abstractly describe graph traversal algorithms and step-by-step reasoning in few-shot learning settings. However, it remains unclear how LLMs

Other Signals

🟡 Qwen3.6 35B-A3B successfully completed the FoodTruck Bench! — score 68 Sources: reddit/r/LocalLLaMA

🟡 The frontier reasoning race is starting to look like a crowded subway station — score 61 Sources: reddit/r/LocalLLaMA

We went from chasing GPT4 to looking at graphs with GPT5.4 xhigh, Gemini 3.1Pro, and now Hy3 preview completely shaking up the leaderboard. Look at that CHSBO 2025 chart: Hy3 preview scoring 87.8, over Gemini and GPT. What a time to be alive, but honestly, my brain can't keep up with the version numbe

🟡 My new home office radiator 🥵 — score 50 Sources: reddit/r/LocalLLaMA

4 x RTX Pro Max-Q We will not speak about the 64GB system RAM...

🟢 Incremental

Model Releases

🟢 i built pengepul: pool multiple claude/gpt accounts behind one local api — score 30 Sources: reddit/r/AIAgents

instead of jumping straight to a higher monthly plan, pengepul lets you pool existing claude and gpt accounts behind one local api endpoint. you can also call it from your own harness or agent runtime. repo: https://github.com/gitshrl/pengepul

🟢 Training GPT-like model on non-language series [R] — score 19 Sources: reddit/r/MachineLearning

I am responsible for a research project that is supposed to train a GPT-like model (Transformer-decoder) with 100M, 250M and 500M model variants. # params ## training dataset - 750M tokens - vocabulary is ~15k to ~100k tokens (depends on tokenizer settings) - ~3% of the vocabulary is used in

Developer Tools

🟢 EMA-Gated Temporal Sequence Compression in Vision Transformers [P] — score 38 Sources: reddit/r/MachineLearning

Vision Transformers waste 90% of their compute recalculating stationary asphalt. NeuroFlow tracks semantic surprise in embedding space, physically eliminating background tokens before the encoder. Result: 55.8x wall-clock speedup for ViTs on high-res video (1792p) with 97% fidelity. No fine-tuning r

🟢 agentscope-ai/agentscope — Build and run agents you can see, understand and trust. — score 38 Sources: github_trending

Build and run agents you can see, understand and trust.

🟢 Gemma-4-Harmonia-31B-Uncensored-Heretic Is Out Now, a Merge of Multiple gemma-4-31B-it Finetunes Designed for a Targeted Approach to Deep Neural Consolidation, Minimizing Regression While Amplifying Unique Capability Boundaries. With KLD 0.0047 and 9/100 Refusals! — score 32 Sources: reddit/r/LocalLLaMA

Provided in both Safetensors and GGUFs. Safetensors, llmfan46/Gemma-4-Harmonia-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic GGUFs, llmfan46/Gemma-4-Harmonia-31B-it-uncenso

🟢 Trying to figure out priorities when setting up a persistent memory for a new agent I'm building — score 30 Sources: reddit/r/AIAgents

https://preview.redd.it/ch0h1cwkbp3h1.png?width=1946&format=png&auto=webp&s=1f62328c2cb285a6854b64b46091556463314e71 I always look at the long term especially when I frequently add new skills or update something so that I won't have any problems contradicting with future context. It's al

🟢 langfuse/langfuse — 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23 — score 30 Sources: github_trending

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

Omitted 6 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA [R] — score 6 Sources: reddit/r/MachineLearning

New preprint. A Mixture-of-Experts inference kernel (TritonMoE) written entirely in OpenAI Triton, targeting portability across NVIDIA and AMD without vendor-specific code. Highlights: * A fused gate+up GEMM computes both SwiGLU projections from shared tile loads, eliminating 35% of global memory tr

🟢 NVIDIA-NeMo/Megatron-Bridge — Training library for Megatron-based models with bidirectional Hugging Face conversion capability — score 1 Sources: github_trending

Training library for Megatron-based models with bidirectional Hugging Face conversion capability

Research Papers

🟢 Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings — score 38 Sources: huggingface · arxiv/cs.AI

Clark Hash is a small method for storing neural embeddings in less space. It normalizes each database vector, applies a deterministic sparse signed Johnson-Lindenstrauss projection, clips the result, and stores a fixed-width scalar-quantized code. Queries stay in floating point and are scored agains
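The steps named in this summary (normalize, deterministic sparse signed projection, clip, fixed-width scalar quantization) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the hash construction, `out_dim`, `nnz`, `clip`, the 8-bit width, and every function name here are hypothetical, and because the summary is cut off before it says how queries are scored, the `score` function is only one plausible reading (float query against dequantized codes).

```python
import hashlib
import math


def _hash_u64(seed, index, k):
    """Deterministic 64-bit hash of (seed, index, k); replaces a stored matrix."""
    data = f"{seed}:{index}:{k}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def project(vec, out_dim=16, nnz=2, seed=0):
    """Normalize vec, then apply a sparse signed projection (assumed layout).

    Each input coordinate is added into `nnz` hashed output slots with a
    hashed +1/-1 sign, so no projection matrix is ever stored.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    out = [0.0] * out_dim
    scale = 1.0 / math.sqrt(nnz)
    for i, x in enumerate(vec):
        for k in range(nnz):
            h = _hash_u64(seed, i, k)
            slot = h % out_dim                      # low bits pick the slot
            sign = 1.0 if (h >> 63) & 1 else -1.0   # top bit picks the sign
            out[slot] += sign * scale * (x / norm)
    return out


def encode(vec, clip=0.5, **kw):
    """Project, clip to [-clip, clip], and quantize each value to one byte."""
    codes = []
    for p in project(vec, **kw):
        c = max(-clip, min(clip, p))
        codes.append(round((c + clip) / (2 * clip) * 255))
    return bytes(codes)


def decode(code, clip=0.5):
    """Map stored bytes back to approximate clipped projection values."""
    return [b / 255 * 2 * clip - clip for b in code]


def score(query, code, clip=0.5, **kw):
    """Assumed scoring: float-projected query dotted with the decoded code."""
    q = project(query, **kw)
    return sum(a * b for a, b in zip(q, decode(code, clip)))


# Usage: store one 16-byte code per database vector, score float queries later.
db_vec = [1.0, 3.0, 9.0, 27.0]
stored = encode(db_vec)
similar = score(db_vec, stored)
opposite = score([-x for x in db_vec], stored)
```

Because the slots and signs come from a hash of the coordinate index, encoding the same vector twice yields the same bytes without any shared state, which is the sense of "stateless" assumed in this sketch.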

Other Signals

🟢 Qwen/Qwen-Image-Bench · Hugging Face — score 39 Sources: reddit/r/LocalLLaMA

Model Description Q-Judger is a vision-language model fine-tuned specifically for automated evaluation of text-to-image generated images. Given a text prompt and a generated image, the model evaluates the image on fine-grained quali

🟢 Investigating how prompt politeness affects LLM accuracy (2025) — score 30 Sources: hackernews

🟢 Question: Llama cpp, whats good right now for: MTP, KV cache quant, Long context. — score 11 Sources: reddit/r/LocalLLaMA

Used the vLLM version of https://github.com/noonghunna/club-3090 and it worked fine for maybe 20-40k context; haven't tried the new one. Has anyone used the new llama.cpp patched one for a single 3090? The project is starting to seem very bloated, at least README-wise

🟢 A Eureka machine that thinks like nature and explores what AI cannot — score 10 Sources: hackernews

🟢 Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM) — score 4 Sources: reddit/r/LocalLLaMA

Context: Krasis is an LLM runtime for running models that don't fit into VRAM. Krasis streams the model through VRAM from system RAM efficiently and handles prefill and decode as separate architectures and optimised use cases. Latest results (v1.0 release): 1x Laptop RTX 3070 Mobile 8GB, (35B par

Omitted 1 additional other signals item from the main section; see raw data and source-specific sections below.

📈 Trending Repos

Repo	Description	Stars Today	Language
harry0703/MoneyPrinterTurbo	Generate high-definition short videos with one click using AI large language models.	1742	python
unclecode/crawl4ai	🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here:https://discord.gg/jP8KfhDhyN	210	python
agentscope-ai/agentscope	Build and run agents you can see, understand and trust.	86	python
langfuse/langfuse	🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23	82	typescript
yossTheDev/removerized	AI Image Toolkit that runs fully in your browser — free, private, and offline-first.	52	typescript
meilisearch/meilisearch	A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.	22	rust
NVIDIA-NeMo/Megatron-Bridge	Training library for Megatron-based models with bidirectional Hugging Face conversion capability	7	python

📄 New Papers

Title	Category	Hotness	Link
DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes	research_paper	30	Open
GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation	research_paper	9	Open
SkillGrad: Optimizing Agent Skills Like Gradient Descent	research_paper	8	Open
ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations	research_paper	3	Open
Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture	cs.AI	0	Open
Soro: A Lightweight Foundation Model and Chatbot for Tajik	cs.AI	0	Open
On the Origin of Synthetic Information by Means of Steganographic Inheritance	cs.AI	0	Open
DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents	cs.AI	0	Open
Why LLMs Fail at Causal Discovery and How Interventional Agents Escape	cs.AI	0	Open
RULER: Representation-Level Verification of Machine Unlearning	cs.AI	0	Open
LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation	cs.AI	0	Open
Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems	cs.AI	0	Open
Agyn: An Open-Source Platform for AI Agents with Scalable On-Demand Execution, Agent Definition as a Code, and Zero-Trust Access	cs.AI	0	Open
You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention	cs.AI	0	Open
Cyberbullying Governance on Social Media: A Unified Framework from Content Identification to Intervention	cs.AI	0	Open

🐦 Twitter/X Highlights

Account	Tweet Summary
xai	Use your SuperGrok or X Premium+ subscription in @kilocode. Try grok-build-0.1 for high speed and agentic coding intelligence, available in the Kilo IDE extensions or CLI. https://x.ai/news/grok-kilocode

Repeated From Recent Briefings

Lum1104/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
rohitg00/ai-engineering-from-scratch — Learn it. Build it. Ship it for others. - first seen 2026-05-21
farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
mukul975/Anthropic-Cybersecurity-Skills — 754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0 - first seen 2026-05-24
earendil-works/pi — AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods - first seen 2026-05-09
anthropics/knowledge-work-plugins — Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork - first seen 2026-05-25
Triplet-Block Diffusion RWKV - first seen 2026-05-26
anthropics/skills — Public repository for Agent Skills - first seen 2026-05-11
[R]GNN Model For Fraud Detection Isn't Performing Well[R] - first seen 2026-05-27
thedotmack/claude-mem — Persistent Context Across Sessions for Every Agent – Captures everything your agent does during sessions, compresses it with AI, and injects relevant context back into future sessions. Works with Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, OpenCode + More - first seen 2026-05-27
... plus 134 more repeated items in processed data

AI Watchtower Briefing — 2026-05-28
