AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Launch HN: TesterArmy (YC P26) – Agents that test web and mobile apps — score 83 Sources: hackernews

🔴 GLM-5.2 is above GPT-5.5 in AA-Briefcase, Artificial Analysis' new agentic knowledge work eval — score 75 Sources: reddit/r/LocalLLaMA

Developer Tools

🔴 Kilo-Org/kilocode — Kilo is the all-in-one agentic engineering platform. Build, ship, and iterate faster with the most popular open source coding agent. — score 99 Sources: github_trending

🔴 GLM-5.2 inference is free on Hugging Face for the next 6 hours — score 89 Sources: reddit/r/LocalLLaMA

Doc: https://huggingface.co/docs/inference-providers/index
A cool prompt to try first: https://huggingface.co/chat/r/aFATtCW?leafId=ed28d5b0-d99b-40be-ba8b-315b1f450e5a

🔴 It has been a while since I wrote — score 78 Sources: reddit/r/AIAgents

And it is important that rules are followed to the letter, or else this will be censored, so therefore I shall obey the rules, I swear on my life and my love for it, that I shall not break the rules of this Sub-Reddit - A Man - That is the truth so help me humanity r/aiagents truly r

🔴 browser agents work in demo and then die on auth, sessions, captcha, dom drift... what are ppl doing? — score 78 Sources: reddit/r/AIAgents

The demo-to-production cliff for browser agents is brutal and nobody warns you. Demo: agent navigates to the site, clicks around, extracts data, looks magical. Production, in order of how they killed us: - auth: login flows with MFA, SSO redirects, "verify it's you" emails. agent just stops. - *

Research Papers

🔴 S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence — score 85 Sources: huggingface

Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introduce S-Agent, a spatial tool-use agentic paradigm for underst

🔴 Playful Agentic Robot Learning — score 82 Sources: huggingface · arxiv/cs.AI

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied c

🔴 DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects — score 75 Sources: huggingface

Dexterous interaction with articulated objects is important for household, assistive, and humanoid manipulation, where multi-finger hands can provide compliant contact patterns beyond parallel-jaw grasping. However, articulated-object manipulation differs from static-object manipulation: the target

Other Signals

🔴 GLM's founder says GLM-fable before the end of the year?! — score 96 Sources: reddit/r/LocalLLaMA

🟡 Notable

Model Releases

🟡 @xai: Grok models are now available on Databricks Agent Bricks. Bring SpaceXAI's latest models to your enterprise data to power capable AI agents. https://x.ai/news/grok-databricks — score 60 Sources: twitter_rss

🟡 New usage analytics and updated spend controls for enterprises — score 50 Sources: lab_blog/OpenAI

OpenAI introduces new spend controls and usage analytics for ChatGPT Enterprise, helping organizations manage costs and scale AI with confidence.

🟡 Improving health intelligence in ChatGPT — score 50 Sources: lab_blog/OpenAI

Learn how GPT-5.5 Instant improves ChatGPT’s health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations.

🟡 @AnthropicAI: New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (T — score 50 Sources: twitter_rss

New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (The robodog, alas, still failed to fetch a beach ball.) https://www.anthropic.com/research/project-fet

🟡 Giving GLM-5.2 a spin locally on CPU only! (poor man's rig for big models) — score 46 Sources: reddit/r/LocalLLaMA

This is the UD-Q2-K_XL quant. Hardware: Dell PowerEdge R740; CPU: dual Xeon 6248R (24 cores each); RAM: 768 GB (all memory channels populated). I'm using ik_llama.cpp, which provides some significant performance improvements over the base llama.cpp for CPU-only inference. Unfortunately, we d

Developer Tools

🟡 K-Dense-AI/scientific-agent-skills — Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science, used by 160,000+ scientists worldwide. 140 ready-to-use skills plus 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Pi, Antigravity, and the open Agent Skills standard. — score 69 Sources: github_trending

🟡 BuilderIO/agent-native — A framework for building agent-native applications. — score 67 Sources: github_trending

🟡 garrytan/gbrain — Garry's Opinionated OpenClaw/Hermes Agent Brain — score 64 Sources: github_trending

🟡 Is Your Agent a Liar? How to Tell and How to Overcome it: — score 56 Sources: reddit/r/AIAgents

How to Tell If Your AI Agent Is Lying, Hallucinating, or Building Vaporware. I have spent a lot of time working with autonomous AI agents. After running them on real projects, I have learned one expensive lesson: most of them are very good at lying. They sound confident. They produce clean-looking

🟡 poolside/Laguna-M.1 · Hugging Face - 225B-A23B — score 54 Sources: reddit/r/LocalLLaMA

Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token, designed for agentic coding and long-horizon work. Highlights: Large sparse MoE for agentic coding: Laguna M.1 is a 70-layer MoE transformer with 225B total parameters and 23B ac
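
The sparsity figures quoted above imply that only about a tenth of the weights are used for any given token. A minimal sketch of that arithmetic, using only the 225B and 23B figures from the excerpt; the variable names are illustrative and nothing here comes from the model's own code.

```python
# Figures taken from the model card excerpt above, in billions of parameters.
TOTAL_PARAMS_B = 225.0   # total parameters
ACTIVE_PARAMS_B = 23.0   # parameters activated per token

# Share of the model's weights that participate in each token's forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"active per token: {active_fraction:.1%}")
print(f"total-to-active ratio: {TOTAL_PARAMS_B / ACTIVE_PARAMS_B:.1f}x")
```

Per-token compute scales roughly with the active count, while memory footprint still scales with the total, which is why such models are often described by both numbers.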

Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Fearless Concurrency on the GPU: Safe GPU inference in Rust, competitive with vLLM/SGLang [R] — score 69 Sources: reddit/r/MachineLearning

I maintain cuTile Rust and just posted the paper "Fearless Concurrency on the GPU." As more GPU code gets AI-generated, the bottleneck moves from writing it to trusting it. cuTile Rust lets you write or generate GPU kernels whose memory safety and data-race freedom are verified by the compiler, thro

🟡 Lightricks/LTX-2 — Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. — score 43 Sources: github_trending

Research Papers

🟡 JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising — score 65 Sources: huggingface

Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geome

🟡 FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows — score 55 Sources: huggingface

Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator--the depth predictor defining the constraint-

🟡 DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis — score 55 Sources: huggingface · arxiv/cs.AI

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a

Other Signals

🟡 GLM-5.2 Is The Best Open Weight Creative Writing Model — score 68 Sources: reddit/r/LocalLLaMA

As Per Sam Paech's Creative Writing Benchmark on EQ Bench: https://eqbench.com/creative_writing.html

🟡 OSS models decisively overtook Proprietary models in market share (based on the last 3 months of OpenRouter data) — score 61 Sources: reddit/r/LocalLLaMA

🟡 The Korean telecom giant at the center of Anthropic's Mythos controversy — score 50 Sources: hackernews

🟢 Incremental

Model Releases

🟢 Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter — score 25 Sources: reddit/r/LocalLLaMA

Hey! We heard the feedback on making the model more portable and accessible. So in light of that we have 2 updates to share. First, you can pull a new 4-bit quant straight from Hugging Face, so it’s now small enough to run on a Mac or wh

🟢 [NEW MODEL] SupraLabs just released SupraVL-Nano-900k, a Vision-Language Model built entirely from scratch! — score 4 Sources: reddit/r/LocalLLaMA

Hey r/LocalLLaMA! We just released SupraVL-Nano-900k, our first VLM. It has ~900k parameters, was trained from scratch on Flickr8k, and the entire architecture fits in a single Jupyter notebook. This is not a production model, it's a fully transparent, readable blueprint for anyone who wants to

Developer Tools

🟢 Looking for 3–4 people with running AI agents to test a multi-agent collaboration platform ($20/hour) — score 39 Sources: reddit/r/AIAgents

Hey everyone, I’m looking for 3–4 people who already have AI agents running and are willing to help test a multi-agent collaboration platform I’ve built. The platform allows agents to connect with other agents in a controlled/supervised environment. An agent can create a session, invite “friend” age

🟢 labring/FastGPT — FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without the need for extensive setup or configuration. — score 37 Sources: github_trending

🟢 livekit/agents — A framework for building realtime voice AI agents 🤖🎙️📹 — score 26 Sources: github_trending

🟢 Google DeepMind unveils plan to protect itself from its own rogue AI agents — score 17 Sources: reddit/r/AIAgents

🟢 Sharing my DIY framework that gives AI coding agents eyes — they can finally see the UI they build (open source) — score 8 Sources: reddit/r/AIAgents

I kept hitting the same wall with coding agents: they're blind. An agent writes a web page, a chart, an SVG, a PDF… and never actually sees the result. It reasons from source code and terminal output, then confidently says "done" while the button overflows, the text fails contrast, the chart

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 GLM-5.2 (744B, 2-bit) at 7.3 tok/s on 4×3090 + 192GB — and why IQ1_M wasn't any faster — score 18 Sources: reddit/r/LocalLLaMA

TLDR: For the first time, I feel relief that they could shut down the cloud services and I would be ok. I got my 4th 3090 and then Unsloth dropped the Q2 and Q1. I wrote nothing else here; it's from CC, so it might be wrong. GLM-5.2 UD-IQ2_M runs across 4×3090 + RAM expert offload at ~7.3 tok/s. Two

Business & Funding

🟢 LQ50/LQ50-24GB cost around $1200 — score 39 Sources: reddit/r/LocalLLaMA

Well, found this shit on Taobao. Very expensive.

Enterprise Adoption

🟢 Voice debugging at the conversation level seems far more useful than isolated benchmark metrics [D] — score 19 Sources: reddit/r/MachineLearning

I have been thinking a lot about how poorly isolated benchmark metrics capture real conversational system quality once models are deployed into multi-turn environments. You can have strong STT scores, decent latency, high task completion rates, and still end up with conversations that humans perceiv

Research Papers

🟢 HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining — score 25 Sources: huggingface

Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, yet their scalabilit

🟢 No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages — score 5 Sources: huggingface

Large Language Models (LLMs) have significantly advanced the automation of software engineering tasks. One prominent example is code generation, where an LLM produces code in a specified programming language based on a natural language description. Most research in this area has focused on high-reso

Other Signals

🟢 Researchers trained a Deep Research agent with 32 H100s and open-sourced everything — score 32 Sources: reddit/r/LocalLLaMA

Ohio State University's NLP team released QUEST-35B, an open-source Deep Research agent trained using ~32 H100s and ~8K synthetic samples. The team open-sourced the training recipe, code, weights and datasets. Benchmark results show competitive performance against several frontier Deep Research

🟢 Latent space interpretation [R] — score 31 Sources: reddit/r/MachineLearning

Hi all, I have trained a convolutional autoencoder on a set of medical images. I then classified the latent feature maps using a random forest to find the top-scoring feature map. Now my goal is to understand which input image is captured in the top-scoring latent feature map. Any suggestions? I have tried e

🟢 Zen and the Art of Machine Learning Research — score 17 Sources: hackernews

🟢 GLM-5.2 can now run locally in llama.cpp and Unsloth Studio. — score 11 Sources: reddit/r/LocalLLaMA

The 2-bit model retains ~82% accuracy after we shrunk it from 1.51TB to 238GB (-84% size). Run on a 256GB Mac or RAM/VRAM setups. GLM-5.2 is the strongest open model to date. Check the graph for the accuracy of each GLM-5.2-GGUF quantization. Full guide: https://unsloth.ai/docs/models/glm-5.2 GGUF:
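
The size figures in the post can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not the quantizer's own accounting: it assumes 1 TB = 1000 GB and 1 GB = 1e9 bytes, and it borrows the 744B parameter count quoted in another item of this digest, assuming it refers to the same model.

```python
# Sizes quoted in the post above; 1 TB is treated as 1000 GB (an assumption).
ORIGINAL_GB = 1.51 * 1000
QUANTIZED_GB = 238.0

# Parameter count quoted elsewhere in this digest (744B); assumed to describe
# the same model.
PARAMS = 744e9


def size_reduction(original_gb: float, quantized_gb: float) -> float:
    """Fraction of the original size removed by quantization."""
    return 1.0 - quantized_gb / original_gb


def bits_per_weight(size_gb: float, params: float) -> float:
    """Average stored bits per parameter, taking 1 GB as 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


reduction = size_reduction(ORIGINAL_GB, QUANTIZED_GB)
print(f"size reduction: {reduction:.1%}")
print(f"original: {bits_per_weight(ORIGINAL_GB, PARAMS):.1f} bits/weight")
print(f"2-bit quant: {bits_per_weight(QUANTIZED_GB, PARAMS):.2f} bits/weight")
```

Under these assumptions the reduction rounds to the quoted 84%, the original works out to roughly 16 bits per weight, and the "2-bit" file averages a little over 2.5 bits per weight, which would be consistent with some tensors being kept at higher precision.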

📈 Trending Repos

Repo	Description	Stars Today	Language
Kilo-Org/kilocode	Kilo is the all-in-one agentic engineering platform. Build, ship, and iterate faster with the most popular open source coding agent.	1345	typescript
K-Dense-AI/scientific-agent-skills	Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science, used by 160,000+ scientists worldwide. 140 ready-to-use skills plus 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Pi, Antigravity, and the open Agent Skills standard.	174	python
BuilderIO/agent-native	A framework for building agent-native applications.	172	typescript
garrytan/gbrain	Garry's Opinionated OpenClaw/Hermes Agent Brain	167	typescript
microsoft/qlib	Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.	92	python
openai/skills	Skills Catalog for Codex	75	python
Lightricks/LTX-2	Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.	51	python
cocoindex-io/cocoindex-code	A super light-weight embedded code search engine CLI (AST based) that just works - saves 70% token and improves speed for coding agent 🌟 Star if you like it!	48	python
labring/FastGPT	FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without the need for extensive setup or configuration.	42	typescript
livekit/agents	A framework for building realtime voice AI agents 🤖🎙️📹	19	python

📄 New Papers

Title	Category	Hotness	Link
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence	research_paper	22	Open
Playful Agentic Robot Learning	research_paper	29	Open
DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects	research_paper	14	Open
JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising	research_paper	12	Open
FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows	research_paper	8	Open
DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis	research_paper	5	Open
Deontic Policies for Runtime Governance of Agentic AI Systems	cs.AI	0	Open
Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023	cs.AI	0	Open
Diffusion Language Models: An Experimental Analysis	cs.AI	0	Open
Hidden Anchors in Multi-Agent LLM Deliberation	cs.AI	0	Open
DeXposure-Claw: An Agentic System for DeFi Risk Supervision	cs.AI	0	Open
LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data	cs.AI	0	Open
REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk	cs.AI	0	Open
Emergent Alignment	cs.AI	0	Open
ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: New usage analytics and updated spend controls for enterprises
OpenAI: Improving health intelligence in ChatGPT

🐦 Twitter/X Highlights

Account	Tweet Summary
xai	Grok models are now available on Databricks Agent Bricks. Bring SpaceXAI's latest models to your enterprise data to power capable AI agents. https://x.ai/news/grok-databricks
AnthropicAI	New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (The robodog, alas, still failed to fetch a beach ball.) https://www.anthropic.com/research/project-fet
GoogleDeepMind	Pinned: Instead of assuming AI will always do what we intend, we ask: what if it doesn't? That’s why we’ve developed our AI Control Roadmap: a framework for building and managing the advanced AI we deploy within Google. 🧵

Repeated From Recent Briefings

google-research/timesfm — TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting. - first seen 2026-05-02
Egonex-AI/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
Next-Latent Prediction Transformers [R] - first seen 2026-06-17
I think most AI voice agent demos hide the hardest part: the listening layer - first seen 2026-06-18
calesthio/OpenMontage — World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio. - first seen 2026-06-18
anthropics/financial-services - first seen 2026-05-07
unsloth GLM-5.2-GGUF , including 2bit at 238GB - first seen 2026-06-18
Is foundational AI research still something that can be done without access to HPC? [D] - first seen 2026-06-18
openai/codex — Lightweight coding agent that runs in your terminal - first seen 2026-05-10
continuedev/continue — open-source coding agent - first seen 2026-06-18
... plus 87 more repeated items in processed data

AI Watchtower Briefing — 2026-06-19
