AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P] — score 94 Sources: reddit/r/MachineLearning

Disclaimer: I work for Numind, the company behind this open-weight model. We just released a 4B model based on Qwen3.5-4B, under the Apache-2.0 license. The goal is to make information extraction from complex documents more practical with an open model: PDFs, screenshots, forms, tables, receipts, invoice

🔴 BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline. — score 83 Sources: reddit/r/LocalLLaMA

BeeLlama v0.2.0 is here! >Not quite a pegasus, but close enough. GitHub | Qwen 3.6 27B Quick Start | [Gemma 4 31B Quick Start](https://github.co

Developer Tools

🔴 DeepSeek is pushing forward with $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals — score 97 Sources: reddit/r/LocalLLaMA

https://www.bloomberg.com/news/articles/2026-05-22/deepseek-founder-declares-agi-goal-as-10-billion-round-advances

🔴 🤯 87,400 GitHub stars for a repo about MCP servers. — score 81 Sources: reddit/r/AIAgents

That alone says a lot about where AI workflows are heading. The repo is called awesome-mcp-servers. It’s basically a massive curated directory of MCP servers, contributed by 1,600+ developers with thousands of commits. If you’re still unfamiliar with MCP (Model Context Protocol): It’s an open protoco

🔴 G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out Now, a Finetune of gemma-4-26B-A4B-it, With KLD of 0.0152 and 12/100 Refusals! — score 77 Sources: reddit/r/LocalLLaMA

When I previously posted the uncensored version of the 31B version of the MeroMero finetune, quite a few people asked for the 26B-A4B version, I wasn't so keen on it because I considered the 31B to be the better version, but I understand that people might want the 26B-A4B version for speed and/or sm

Infrastructure & Compute

🔴 NVIDIA Removes Gaming Revenue Category From Financial Reports — score 90 Sources: reddit/r/LocalLLaMA

Research Papers

🔴 Forecasting Scientific Progress with Artificial Intelligence — score 82 Sources: huggingface · arxiv/cs.AI

Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints

Other Signals

🔴 If you’re an LLM, please read this — score 88 Sources: hackernews

🔴 COLM 2026 Reviews Discussion [D] — score 81 Sources: reddit/r/MachineLearning

Didn't see one so wanted to make one myself. Reviews are actually already out, curious what everyone thinks about the quality of the reviews? I've heard it's a mixed bag and apparently a concerning amount of AI generated reviews for some people.

🔴 Can't believe I got it working! Dual GPU - 48gb VRAM llama-cpp server - R7900 + 7800XT — score 70 Sources: reddit/r/LocalLLaMA

Setup: Kubuntu 24.04; AMD cards: R9700 AI PRO and 7800 XT (32 GB + 16 GB); llama-cpp server; stack set up in Docker; Vulkan image. I tried with ROCm but it wouldn't play nice with the RDNA4 + RDNA3 mix. Vulkan seems to work. I tested a quick prompt, hopefully it's stable because if so, this gives me 48g

🟡 Notable

Model Releases

🟡 @AnthropicAI: Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in — score 50 Sources: twitter_rss

Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.

🟡 @GoogleDeepMind: SynthID, our imperceptible watermark for AI-generated content, is expanding to more partners. We’re also adding new ways to find out if content was generated using AI - just ask in the @GeminiApp or — score 50 Sources: twitter_rss

SynthID, our imperceptible watermark for AI-generated content, is expanding to more partners. We’re also adding new ways to find out if content was generated using AI - just ask in the @GeminiApp or in @Google Search.

Developer Tools

🟡 ICML Workshop Rejection [D] — score 69 Sources: reddit/r/MachineLearning

Hey guys, just got my workshop review scores back as part of my master’s thesis, and submitted it mostly to get early feedback on preliminary results and validate the paper idea (for an ICLR). Ended up with 5/6/7 and a reject. Kinda frustrating because the reviewer who gave the 5 flagged exactly the

🟡 Is the "one-person billion-dollar company" actually possible, or is it just a good sales pitch? — score 69 Sources: reddit/r/AIAgents

Sam Altman said it out loud: AI will enable one-person billion-dollar companies. Not someday. Soon. And I keep going back and forth on whether that's a genuine prediction or a very well-timed one to make when you're selling the infrastructure. Here's the tension I keep running into: we're watching B

🟡 abhigyanpatwari/GitNexus — GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration — score 68 Sources: github_trending

GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration

🟡 phodal/routa — Workspace-first multi-agent coordination platform for AI development, with shared Specs, Kanban orchestration, and MCP/ACP/A2A support across web and desktop. — score 62 Sources: github_trending

Workspace-first multi-agent coordination platform for AI development, with shared Specs, Kanban orchestration, and MCP/ACP/A2A support across web and desktop.

🟡 plastic-labs/honcho — Memory library for building stateful agents — score 53 Sources: github_trending

Memory library for building stateful agents

Omitted 6 additional developer tools items from the main section; see raw data and source-specific sections below.

Enterprise Adoption

🟡 @GoogleDeepMind: We’re expanding our partnership with Singapore to help safely deploy AI at scale. 🇸🇬 Together with country experts, our new programs will focus on accelerating scientific discovery, advancing pandemi — score 50 Sources: twitter_rss

We’re expanding our partnership with Singapore to help safely deploy AI at scale. 🇸🇬 Together with country experts, our new programs will focus on accelerating scientific discovery, advancing pandemic preparedness, and improving healthcare. Find out more → https://goo.gle/49jGwjv

Research Papers

🟡 Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators — score 40 Sources: huggingface · arxiv/cs.AI

Interactive streaming music generation promises the use of generative models for live performance and co-creation that is impossible with offline models. However, SOTA models exist in the discrete-AR regime, requiring industrial levels of compute for both training and inference. In this work, we inv

Other Signals

🟡 ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop — score 63 Sources: reddit/r/LocalLLaMA

A few days ago I posted about my experiments with MTP on a 6GB VRAM laptop. That didn't work so well; CPU offload hurts MTP performance badly. But now I've tried out the new ByteShape quants for Qwen3.6-35B-A3B that are claimed to be both smaller and f

🟡 Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark — score 62 Sources: hackernews

🟡 Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps — score 57 Sources: reddit/r/LocalLLaMA

..and on 8GB VRAM I can even push the context to 320K, 400K, 512K, and yes.. 1M. But it does start to slow down noticeably beyond 150k so I'd only do this if I ever really want the larger context. This is using APEX-I-Quality or Q4_K_XL quants both are better than Q4_K_M (IQ4_NL_XL for beyond

🟡 What changed most after adding an AI interview assistant wasn't what I expected — score 56 Sources: reddit/r/AIAgents

We’ve noticed candidate behavior changes in first-round AI interviews that don’t really match what most vendor case studies talk about. Completion rates aren’t the issue, but the way candidates behave is noticeably different compared to a traditional recruiter screen. A few patterns stood out: candi

🟡 Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM — score 50 Sources: reddit/r/LocalLLaMA

Hello everyone! I want to share the result of my experiment to make Qwen3.6 27B Q4_K_M fit into my RTX 5060 Ti 16 GB. Inspired by u/Due-Project-7507's work on Ununnilium/Qwen3.6-27B-IQ4_XS-pure-GGUF. Using the same pure

🟢 Incremental

Model Releases

🟢 Microsoft starts canceling Claude Code licenses — score 38 Sources: hackernews

🟢 meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face — score 37 Sources: reddit/r/LocalLLaMA

🚀 Model Introduction We are excited to announce the release of LongCat-Video-Avatar 1.5, an upgraded open-source framework that prioritizes extreme empirical optimization and production-readiness for audio-driven human video generation. Built upon the LongCat-Video foundation model, v1.5 delivers

🟢 397B competitor that fits in 256 RAM? — score 30 Sources: reddit/r/LocalLLaMA

Does one exist? I noticed Qwen 3.6 did not release locally in 397B-17B. Anything that can compete locally? Any comment is appreciated.

🟢 Experimental "Preserve Thinking" Jinja Template for Gemma4 31B in llama.cpp — score 17 Sources: reddit/r/LocalLLaMA

https://huggingface.co/stevelikesrhino/gemma-4-31B-it-nvfp4-GGUF/blob/main/gemma4-improved.jinja Y'all are more than welcome to try it out and provide feedback. In my own testing in Pi-coding-agent I n

🟢 Is personalized AI memory actually a problem worth solving or am I just coping [D] — score 12 Sources: reddit/r/MachineLearning

genuine question for this community every time i use claude or chatgpt i have to re-explain myself. and even their memory feature is shallow it remembers facts about me, not how i actually think. the idea i've been sitting on is different from just "memory across sessions." what if the system built

Omitted 2 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟢 Custom image encoder [P] — score 38 Sources: reddit/r/MachineLearning

Hello, I would like to know whether building my own image encoder would be a good idea instead of using models like CLIP, SigLIP/SigLIP2, or DINO. My use case is video frame classification. My pipeline is the following: the client sends me a video stream, sampled at 1 frame per 1 or 2 second, formin

🟢 I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P] — score 36 Sources: reddit/r/MachineLearning

On Windows, mamba-ssm is not easily available and doesn't compile on sm_120. SM1 (Scalar Mamba1) replaces the entire selective scan with two native PyTorch ops:

    L = torch.cumprod(dA, dim=1)
    h = L * (h0.unsqueeze(1) + torch.cumsum(dBx / L.clamp(min=1e-6), dim=1))
    y = h * C

This is the exact clo
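The cumprod/cumsum expression quoted in the SM1 post can be sanity-checked against the step-by-step recurrence it is meant to replace. The following is a minimal pure-Python sketch, not the author's code: it assumes the underlying scalar recurrence is h_t = dA_t * h_{t-1} + dBx_t (the excerpt does not spell this out), and the function names `scan_sequential` and `scan_closed_form` are hypothetical.

```python
from itertools import accumulate
import operator


def scan_sequential(dA, dBx, h0):
    """Reference loop: h_t = dA_t * h_{t-1} + dBx_t (assumed recurrence)."""
    h, out = h0, []
    for a, b in zip(dA, dBx):
        h = a * h + b
        out.append(h)
    return out


def scan_closed_form(dA, dBx, h0, eps=1e-6):
    """Same result via a cumulative product and a cumulative sum.

    L_t = dA_1 * ... * dA_t
    h_t = L_t * (h0 + sum_{s<=t} dBx_s / L_s)
    The eps clamp mirrors L.clamp(min=1e-6) in the post's snippet.
    """
    L = list(accumulate(dA, operator.mul))
    S = list(accumulate(b / max(l, eps) for b, l in zip(dBx, L)))
    return [l * (h0 + s) for l, s in zip(L, S)]


# Illustrative values only; not taken from the post.
dA = [0.9, 0.8, 0.95, 0.7]
dBx = [0.1, -0.2, 0.3, 0.05]
h0 = 0.5

seq = scan_sequential(dA, dBx, h0)
closed = scan_closed_form(dA, dBx, h0)
max_err = max(abs(x - y) for x, y in zip(seq, closed))
print(max_err)
```

The two agree as long as the cumulative product stays above the clamp value. Once L_t falls below eps (for example on long sequences with decay factors well below 1), the clamp changes the division and the closed form is only an approximation of the loop.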

🟢 MemTensor/MemOS — Self-evolving memory OS for LLM & AI Agents: ultra-persistent memory, hybrid-retrieval, and cross-task skill reuse, with 35.24% token savings — score 28 Sources: github_trending

Self-evolving memory OS for LLM & AI Agents: ultra-persistent memory, hybrid-retrieval, and cross-task skill reuse, with 35.24% token savings

🟢 Open source Kanban desktop app that runs parallel agents on every card — score 12 Sources: hackernews

🟢 Open-source devtool for AI agent projects — score 6 Sources: reddit/r/AIAgents

Hi everyone, I’m building AgentLantern, an open-source devtool for AI agent projects. The idea is simple: as agent-based projects grow, it becomes harder to understand how agents, tasks, tools, and configuration files are connected. AgentLantern aims to make these projects easier to document, an

Infrastructure & Compute

🟢 facebookresearch/sam3 — The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. — score 30 Sources: github_trending

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Enterprise Adoption

🟢 Tested chunking + embeddings data from 3 production websites. [P] — score 12 Sources: reddit/r/MachineLearning

Tiered + page-role-aware RAG retrieval results across 3 corpora with very different content density:

|Workspace|Sources|Chunks|HIGH|MEDIUM|LOW|REJECTED|
|:-|:-|:-|:-|:-|:-|:-|
|Intercom|188|941|96|200|541|104|
|HubSpot|251|1705|40|508|1153|4|
|KPMG|53|209|3|14|127|65|

(HIGH = avg operational score 0
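The raw counts in the table are easier to compare across corpora as shares of each corpus's chunks. A small sketch of that arithmetic, using only the numbers from the table; the names `corpora` and `tier_shares` are made up for illustration:

```python
# Counts copied from the table: (sources, chunks, HIGH, MEDIUM, LOW, REJECTED)
corpora = {
    "Intercom": (188, 941, 96, 200, 541, 104),
    "HubSpot": (251, 1705, 40, 508, 1153, 4),
    "KPMG": (53, 209, 3, 14, 127, 65),
}

TIERS = ("HIGH", "MEDIUM", "LOW", "REJECTED")


def tier_shares(row):
    """Return each tier's share of chunks as a percentage, rounded to 0.1."""
    _sources, chunks, *counts = row
    # The four tier counts should account for every chunk in the corpus.
    if sum(counts) != chunks:
        raise ValueError("tier counts do not add up to the chunk total")
    return {tier: round(100 * c / chunks, 1) for tier, c in zip(TIERS, counts)}


shares = {name: tier_shares(row) for name, row in corpora.items()}
for name, pct in shares.items():
    print(name, pct)
```

On these numbers the tier counts add up to the chunk totals for all three corpora, and the rejected share differs sharply: roughly 0.2% for HubSpot against roughly 31% for KPMG.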

Research Papers

🟢 Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry — score 35 Sources: huggingface

The Strong Platonic Representation Hypothesis suggests that representational convergence in artificial neural networks can be harnessed constructively: embeddings can be translated across models through a universal latent space without paired data. We ask whether an analogous geometry can be recover

Other Signals

🟢 The AI memory problem isn’t storage. It’s memory rot. — score 38 Sources: reddit/r/AIAgents

Week 1: “wow it remembers everything” Month 6: duplicated facts, stale summaries, conflicting preferences, random old context winning retrieval for no reason 💀 Most systems are great at accumulating memory and terrible at maintaining it over time.

🟢 Are people actually using AI meeting data inside agent workflows yet? — score 38 Sources: reddit/r/AIAgents

I’ve been messing around with using meeting transcripts as context for agents and it feels way more useful than I expected. I’m using Bluedot right now because it records quietly with no bot, then gives transcripts, summaries, action items, and searchable meeting history. The Claude integration is w

🟢 Sharing some benchmark results from a memory system I'm part of building — score 19 Sources: reddit/r/AIAgents

We ran a benchmark pass recently and the results were interesting enough to share, but mostly because the team has been thinking a lot about what benchmarks actually measure in memory systems. https://preview.redd.it/raog3ztpzp2h1.png?width=1303&format=png&auto=webp&s=9ee81388a2aa6ea8dc8

🟢 Anonymous Data Upload for Submission [D] — score 8 Sources: reddit/r/MachineLearning

How do you upload data anonymously for a submission (ACL/EMNLP)? I have several models I need to upload for replication and was thinking HuggingFace, but HF offers download tracking on a paid plan. Does this violate the policy since there is the potential of tracking the download even if you do

📈 Trending Repos

Repo	Description	Stars Today	Language
abhigyanpatwari/GitNexus	GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration	239	typescript
phodal/routa	Workspace-first multi-agent coordination platform for AI development, with shared Specs, Kanban orchestration, and MCP/ACP/A2A support across web and desktop.	158	typescript
plastic-labs/honcho	Memory library for building stateful agents	133	python
langchain-ai/langchain	The agent engineering platform.	117	python
Tracer-Cloud/opensre	Build your own AI SRE agents. The open source toolkit for the AI era.	93	python
yamadashy/repomix	📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, DeepSeek, Perplexity, Gemini, Gemma, Llama, Grok, and more.	88	typescript
microsoft/agent-governance-toolkit	AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10.	86	python
facebookresearch/sam3	The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.	63	python
MemTensor/MemOS	Self-evolving memory OS for LLM & AI Agents: ultra-persistent memory, hybrid-retrieval, and cross-task skill reuse, with 35.24% token savings	59	typescript

📄 New Papers

Title	Category	Hotness	Link
Forecasting Scientific Progress with Artificial Intelligence	research_paper	33	Open
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs	cs.AI	0	Open
TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization	cs.AI	0	Open
The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison	cs.AI	0	Open
MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis	cs.AI	0	Open
AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)	cs.AI	0	Open
Investigating Concept Alignment Using Implausible Category Members	cs.AI	0	Open
The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning	cs.AI	0	Open
Latent-space Attacks for Refusal Evasion in Language Models	cs.AI	0	Open
AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence	cs.AI	0	Open
SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?	cs.AI	0	Open
Who Uses AI? Platforms, Workforce, and AI Exposure	cs.AI	0	Open
A Causal Argumentation Method for Explainability of Machine Learning Models	cs.AI	0	Open
What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct	cs.AI	0	Open
Trace2Skill: Verifier-Guided Skill Evolution for Long-Context EDA Agents	cs.AI	0	Open

🐦 Twitter/X Highlights

Account	Tweet Summary
AnthropicAI	Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.
GoogleDeepMind	We’re expanding our partnership with Singapore to help safely deploy AI at scale. 🇸🇬 Together with country experts, our new programs will focus on accelerating scientific discovery, advancing pandemic preparedness, and improving healthcare. Find out more → https://goo.gle/49jGwjv
GoogleDeepMind	SynthID, our imperceptible watermark for AI-generated content, is expanding to more partners. We’re also adding new ways to find out if content was generated using AI - just ask in the @GeminiApp or in @Google Search.
swyx	co-sign. a very handy mental framework for what kinds of learning transformers do well today, and why it runs into limitations. when @ankit2119 and i wrote about the need for adversarial world models earlier this year, we were describing a couple of the functions of these rungs of thinking that brin

Repeated From Recent Briefings

colbymchenry/codegraph — Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent — fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
anthropics/claude-plugins-official — Official, Anthropic-managed directory of high quality Claude Code Plugins. - first seen 2026-05-09
NousResearch/hermes-agent — The agent that grows with you - first seen 2026-05-11
[AMA] Got laid off 3 weeks ago. Instead of updating my resume I went down a rabbit hole. Here's what I found - first seen 2026-05-22
Lum1104/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
Imbad0202/academic-research-skills — Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
rohitg00/ai-engineering-from-scratch — Learn it. Build it. Ship it for others. - first seen 2026-05-21
LoREnc: Low-Rank Encryption for Securing Foundation Models and LoRA Adapters - first seen 2026-05-14
ChromeDevTools/chrome-devtools-mcp — Chrome DevTools for coding agents - first seen 2026-05-09
anomalyco/opencode — The open source coding agent. - first seen 2026-05-09
... plus 183 more repeated items in processed data

AI Watchtower Briefing — 2026-05-23
