AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 The captcha arms race is making autonomous web tasks practically impossible — score 94 Sources: reddit/r/AIAgents

I'm working on a fairly standard procurement agent for a client right now. The stack is just Python, Playwright, and GPT-4o for vision and DOM parsing. The reasoning logic works perfectly on my local machine, but the second I deploy it to a cloud server, it just gets absolutely obliterated by bot pro

🔴 I released Inflect-Nano, an ultra-extreme tiny 4.63M-parameter TTS model. — score 89 Sources: reddit/r/LocalLLaMA

I’ve been experimenting with how small a usable neural TTS model can realistically get, and I just released Inflect-Nano-v1. Inflect-Nano is one of the smallest TTS models, and it performs surprisingly well for its model weight. Even if you have a certified potato computer, it can run on that. I

🔴 Your most useful AI so far? (can be a tool or an agent) — score 83 Sources: reddit/r/AIAgents

ngl, AI has really helped me boost my productivity, and the deeper I go with it, the more interesting it gets! At the company I'm working with right now, I started discovering AI agents, where you combine all the tools through LLMs, APIs and such, and so far I wish I learned it for a long time

🔴 A robot is sprinting towards you. Do you want it running on Claude or Grok? — score 83 Sources: hackernews

🔴 We need an 80-160B model urgently. The unified memory device market needs more models. — score 82 Sources: reddit/r/LocalLLaMA

Hello guys, I'll keep it short. There are so many people who have a lot of "slow" RAM, but not quite enough of it:
* Anybody with an Apple device with >96GB
* Anybody with a Ryzen AI 395 device with >96GB
* Anybody with a DGX Spark
* Even people with RTX 6000 Pros or 4x3090s or other configurations
Or P
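A rough, weights-only estimate shows why 80-160B is the size range being asked for: weight memory is about parameter count × bits per weight ÷ 8 bytes. This is a sketch under stated assumptions: `weight_memory_gb` is a hypothetical helper, it assumes one uniform bits-per-weight figure, and it ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10**9 bytes); weights only.

    Assumes every parameter uses the same bit width and ignores KV cache,
    activations, and framework overhead.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# Weights-only estimate for the 80-160B range at common precisions.
for params in (80e9, 160e9):
    for bits in (4, 8, 16):
        gb = weight_memory_gb(params, bits)
        print(f"{params / 1e9:.0f}B @ {bits}-bit: ~{gb:.0f} GB")
```

At 4 bits per weight, 80B comes to roughly 40 GB and 160B to roughly 80 GB, both under the >96GB figure mentioned above; at 8 bits only the 80B end fits, and at 16 bits neither does.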

Developer Tools

🔴 GLM-5.2 is a win for local AI — score 96 Sources: reddit/r/LocalLLaMA

I know GLM-5.2's massive 753B footprint means none of us are running it at home without an enterprise cluster, but having a true frontier-level, MIT-licensed coding agent out in the wild makes me optimistic. The distillation potential here is massive. Once the community starts fine-tuning smaller 8B

🔴 Multivariate Probability Models in Machine Learning [D] — score 81 Sources: reddit/r/MachineLearning

Hello folks, we're starting our discussion of Lecture 10 of Probabilistic Machine Learning, which begins with Multivariate Probability Models. Univariate models are toy cases; in real life, ML models are multivariate. To understand how multiple variables depend on each other, we study ideas such as Cova
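To make "dependence between variables" concrete, here is a small sketch of the sample covariance matrix, one standard measure of pairwise linear dependence. It is an illustration, not the lecture's own material: `covariance_matrix` is a hypothetical helper, rows are observations, columns are variables, and it uses the unbiased n − 1 denominator.

```python
def covariance_matrix(samples: list[list[float]]) -> list[list[float]]:
    """Unbiased sample covariance; rows are observations, columns are variables."""
    n = len(samples)
    d = len(samples[0])
    # Per-variable means.
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # Average product of deviations from the mean, with n - 1 in the denominator.
            cov[i][j] = sum(
                (row[i] - means[i]) * (row[j] - means[j]) for row in samples
            ) / (n - 1)
    return cov


# Two variables where y = 2x exactly.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(covariance_matrix(data))  # → [[1.0, 2.0], [2.0, 4.0]]
```

The diagonal holds each variable's variance; the off-diagonal 2.0 equals sqrt(1.0 × 4.0), i.e. a correlation of 1, because y is an exact linear function of x.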

🔴 Full Hermes setup guide with Lm studio — score 72 Sources: reddit/r/AIAgents

https://youtu.be/c_Yh2bTP0nQ

Research Papers

🔴 SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior — score 70 Sources: huggingface · arxiv/cs.LG

Sparse Autoencoders (SAEs) decompose residual-stream activations into interpretable features. Recent latent-space defenses increasingly rely on these decompositions, assuming that identified "unsafe" SAE features serve as actionable handles for monitoring and intervention. In this paradigm, clamping
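A toy sketch of the mechanics the abstract refers to: a ReLU sparse autoencoder encodes an activation into feature values, one feature is clamped (here zeroed), and the result is decoded back into the residual stream. The weights are hand-picked 2-dimensional toy values rather than a trained SAE, the helper names are hypothetical, and adding back the reconstruction error is one common convention; the paper's exact intervention is cut off in the snippet above and may differ.

```python
# Toy ReLU sparse autoencoder: 2-dim "residual stream", 3 features.
# Hand-picked illustrative weights, not a trained SAE.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # (n_features, d_model)
B_ENC = [0.0, 0.0, -1.0]
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]    # (d_model, n_features)
B_DEC = [0.0, 0.0]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x):
    """f = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, B_DEC)]
    pre = [p + b for p, b in zip(matvec(W_ENC, centered), B_ENC)]
    return [max(0.0, p) for p in pre]


def decode(f):
    """x_hat = W_dec f + b_dec."""
    return [r + b for r, b in zip(matvec(W_DEC, f), B_DEC)]


def clamp_feature(x, index, value):
    """Overwrite one feature value and map the result back to the residual stream.

    The reconstruction error (x - x_hat) is added back, so the part of x the
    SAE does not explain is left untouched by the intervention.
    """
    f = encode(x)
    error = [xi - ri for xi, ri in zip(x, decode(f))]
    f[index] = value
    return [s + e for s, e in zip(decode(f), error)]


x = [1.0, 2.0]
print(encode(x))                 # → [1.0, 2.0, 2.0]
print(clamp_feature(x, 2, 0.0))  # zero-ablate feature 2 → [0.0, 1.0]
```

Clamping a feature to its current value returns x unchanged; zeroing it removes that feature's decoder direction (here [0.5, 0.5] × 2.0) from the activation. This shows only the intervention itself, not whether the suppressed behavior later recovers.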

Other Signals

🔴 PSA: unsloth/GLM-5.2-GGUF is uploading — score 75 Sources: reddit/r/LocalLLaMA

Went to check Unsloth's HF to see if they uploaded GLM-5.2 GGUFs, and found the repo was created half an hour ago. It only has the readme for now. I suspect GGUFs are uploading

🟡 Notable

Model Releases

🟡 @OpenAI: Introducing LifeSciBench, a benchmark for measuring and improving how well AI supports real-world life science research — score 60 Sources: twitter_rss

Introducing LifeSciBench, a benchmark for measuring and improving how well AI supports real-world life science research. Developed with 173 scientists from biotechnology and pharmaceutical research, LifeSciBench includes 750 expert-authored tasks across seven biological research workflows. https://o

🟡 llama.cpp now supports model management (downloading etc) via API — score 54 Sources: reddit/r/LocalLLaMA

#23976 got merged a couple hours ago, which means llama.cpp can now not only load/unload models on demand from a directory, but also download them on demand. No UI yet, but that's coming pretty soon. This means you can now deploy llama.cpp, expose

🟡 Local Qwen isn't a worse Opus, it's a different tool — score 50 Sources: hackernews

🟡 Lin Junyang AI Lab Closes Round at $2B Valuation — score 46 Sources: reddit/r/LocalLLaMA

A new lab from Lin Junyang can only be good news for open source / weights, I think. Excited to see what the lead responsible for the Qwen line does next.

Developer Tools

🟡 ACL 2026 first author with weak GPA. How should I approach PhD applications? [D] — score 69 Sources: reddit/r/MachineLearning

Hi everyone, I have a fairly weak undergraduate record: a 3.3/5 GPA in Computer Engineering from an average Nigerian university. For my Master's, I studied Artificial Intelligence at an average European university, where I finished with an 8/10 GPA. A condensed version of my Master's thesis was recently ac

🟡 bytedance/UI-TARS-desktop — The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra — score 68 Sources: github_trending

🟡 infiniflow/ragflow — RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs — score 61 Sources: github_trending

🟡 calesthio/OpenMontage — World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio. — score 57 Sources: github_trending

🟡 roboflow/rf-detr — RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for fine-tuning. [ICLR 2026] — score 51 Sources: github_trending

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Is foundational AI research still something that can be done without access to HPC? [D] — score 50 Sources: reddit/r/MachineLearning

I'm not that well versed in ML yet. I know that "Attention Is All You Need" was based on work that was done with a couple of high-end gaming GPUs at the time. I can afford that. Suppose, for argument's sake, that I have caught up on ML such that I have the competence to recreate state of the art result

Business & Funding

🟡 Leaked financial docs show OpenAI is losing billions of dollars a year — score 68 Sources: reddit/r/LocalLLaMA

Research Papers

🟡 Native Active Perception as Reasoning for Omni-Modal Understanding — score 62 Sources: huggingface · arxiv/cs.CL

Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their

🟡 Sumi: Open Uniform Diffusion Language Model from Scratch — score 58 Sources: huggingface · arxiv/cs.CL

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parame
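For contrast with masked diffusion, a minimal sketch of the forward (noising) step commonly used for uniform discrete diffusion: at noise level t, each token is independently kept with probability 1 − t and otherwise resampled uniformly from the vocabulary, so any position can hold any token at any step. The linear schedule, toy vocabulary, and `corrupt_uniform` name are assumptions for illustration; Sumi's actual schedule and parameterization are not given in the snippet above.

```python
import random


def corrupt_uniform(tokens, t, vocab_size, rng):
    """Forward noising for uniform discrete diffusion at noise level t in [0, 1].

    Each position independently keeps its token with probability 1 - t and is
    otherwise replaced by a token drawn uniformly from the whole vocabulary
    (which can, by chance, be the original token).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        tok if rng.random() >= t else rng.randrange(vocab_size)
        for tok in tokens
    ]


rng = random.Random(0)
clean = [5, 17, 42, 8, 3]
for t in (0.0, 0.5, 1.0):
    # t = 0 returns the clean sequence; t = 1 is pure uniform noise.
    print(t, corrupt_uniform(clean, t, vocab_size=50, rng=rng))
```

Because a noised position looks like an ordinary token rather than a [MASK] symbol, the denoiser has to judge every position at every step, which is what lets a uniform model revise any token, unlike typical masked-diffusion samplers that leave a position fixed once it is unmasked.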

🟡 Learning User Simulators with Turing Rewards — score 45 Sources: huggingface · arxiv/cs.CL

Learning to simulate human users in interactive settings could advance the training of agent assistants, evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally do so by training a large language model (LLM) to match a single ground truth respo

Other Signals

🟡 llama.cpp - how to free up even more space on your GPU — score 61 Sources: reddit/r/LocalLLaMA

For the past week or two, llama.cpp has been working much better from a RAM usage perspective. I no longer see any memory leaks, and everything fits nicely on the GPU - my defaults are --n-gpu-layers 99 --no-mmap --mlock to avoid using the regular RAM, since I use my 3090 with an eGPU setup: Q

🟡 @xai: Grok is now available on Amazon Bedrock. AWS developers can now build with Grok 4.3, the industry leader in hallucination rate and tool calling, powered by Bedrock’s secure inference engine. — score 60 Sources: twitter_rss

🟡 @GoogleDeepMind: We’re working with @SciTechgovuk, @mhclg and @i_dot_ai on a new AI housing application planning prototype — score 50 Sources: twitter_rss

We’re working with @SciTechgovuk, @mhclg and @i_dot_ai on a new AI housing application planning prototype. 🏡 By cutting down the time spent on repetitive tasks, it could help planning officers focus their attention on complex projects and reduce processing times by up to 50%. → https://goo.gle/4xzq

🟢 Incremental

Model Releases

🟢 Quick thoughts on GLM-5.2 (Bonus: Censorship question answers) — score 11 Sources: reddit/r/LocalLLaMA

I've been working with GLM-5.2 pretty much non-stop since it was released as an API. So yeah, take it with a grain of salt as API inference is not perfectly controllable. I'm calling it through Z.ai - so I'd like to think that it's a high quality iteration of the model, but I can't k

Developer Tools

🟢 promptfoo/promptfoo — Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic. — score 36 Sources: github_trending

🟢 microsoft/RD-Agent — Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through R&D-Agent, which lets AI drive data-driven AI. 🔗https://aka.ms/RD-Agent-Tech-Report — score 31 Sources: github_trending

🟢 Any idea if AAAI will be as harsh on computer vision papers as last year? [R] — score 19 Sources: reddit/r/MachineLearning

Hello everyone, I have a computer vision paper ready for submission, and a coauthor has suggested submitting it to AAAI. However, last year computer vision papers got a very small acceptance rate at AAAI, with reviewers receiving emails specifically telling them that the acceptance rate for comp

🟢 Building an agent that needs live web data, the fetch part keeps killing it — score 17 Sources: reddit/r/AIAgents

I put together an agent that pulls live info off the web to answer queries. The agent logic was honestly the easy part; the data fetching keeps breaking. As soon as it needs to scrape Google search results or hit anything with a

🟢 I think most AI voice agent demos hide the hardest part: the listening layer — score 17 Sources: reddit/r/AIAgents

Everyone shows the cool part of AI voice agents:
* realistic voice
* smart answer
* booking appointment
* CRM update
* call summary
* "wow it sounds human"
But the boring listening layer is where a lot of these products break. If the speech-to-text is bad, the agent becomes dumb before the LLM even

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 alexzhang13/rlm — General plug-and-play inference library for Recursive Language Models (RLMs), supporting various sandboxes. — score 38 Sources: github_trending

🟢 openobserve/openobserve — Open source observability platform for logs, metrics, traces, frontend monitoring, pipelines and LLM observability. A sophisticated, simple and highly performant alternative to Datadog, Splunk, and Elasticsearch with 140x lower storage costs and single binary deployment. — score 28 Sources: github_trending

🟢 How do you analyze the relative "strength" of probes? [R] — score 6 Sources: reddit/r/MachineLearning

This question is related to topics like language+ models (including multimodal) and things like "circuit" analyses. I think something related might come up in my work (factuality guarantees for model outputs) and I'm trying to orient to the SoTA. I found [this old post](https://www.neelnanda.io/mech

Research Papers

🟢 IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products — score 20 Sources: huggingface

Industrial products such as valves and circuit breakers are defined by dense technical specifications that govern procurement, compatibility, and safety across supply chains. These specifications are scattered across multiple heterogeneous product images, including specification tables, nameplates,

Other Signals

🟢 CEOs of Anthropic and Google DeepMind call for U.S.-led AI coalition in meeting at G7 — score 32 Sources: reddit/r/LocalLLaMA

https://www.cnbc.com/2026/06/17/anthropic-amodei-google-hassabis-us-ai-coalition-g7.html https://www.politico.eu/article/ai-artificial-intelligence-anthropic-china-g7/

🟢 What does provisional paper acceptance mean in ECCV? Is that the default message everyone gets? [D] — score 31 Sources: reddit/r/MachineLearning

🟢 price rising effect is wild.. — score 18 Sources: reddit/r/LocalLLaMA

https://preview.redd.it/6f2gghbgqy7h1.png?width=2980&format=png&auto=webp&s=eacde26d8d0154aabc9884d4c31607aa12ace68c Q.01 out soon? I don't really need precision after all..

🟢 I Emailed 12,000 Businesses About Their Websites. Here's What Happened. — score 17 Sources: reddit/r/AIAgents

A few weeks ago I analyzed around 12,000 business websites and emailed each business explaining the issues I found on their website and why those issues could be hurting their business. The interested reply rate was bouncing between 5% and 9%. I've been having a lot of fun lately automating a proces

🟢 I open sourced a vendor-neutral authorization for AI agents. — score 14 Sources: reddit/r/AIAgents

TL;DR: I open-sourced apparitor. It checks every agent action (an MCP tool call, an agent-to-agent invoke, a tool call inside a guardrail) against a policy engine before it runs and blocks the ones that aren't allowed. Fail-closed, works with OPA/Cedar/OpenFGA, Apache-2.0. AI agents act through tool
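The fail-closed pattern described here can be sketched in a few lines: every tool call passes through a policy check first, and any error or missing decision counts as a denial. This is a hypothetical illustration, not apparitor's API; `Action`, `authorize`, `guarded_call`, and the toy allowlist policy are invented stand-ins for a real engine such as OPA, Cedar, or OpenFGA.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Action:
    """Hypothetical record of one agent action, e.g. an MCP tool call."""
    agent: str
    tool: str
    args: dict = field(default_factory=dict)


# A policy returns True (allow), False (deny), or None (no decision).
Policy = Callable[[Action], Optional[bool]]


class Denied(Exception):
    """Raised when an action is not explicitly allowed."""


def authorize(action: Action, policy: Policy) -> bool:
    """Fail-closed check: only an explicit True allows the action."""
    try:
        return policy(action) is True
    except Exception:
        # Engine errors, timeouts, malformed input: treat as a denial.
        return False


def guarded_call(action: Action, policy: Policy, tool_fn: Callable[..., Any]) -> Any:
    """Run tool_fn only after the policy has allowed this exact action."""
    if not authorize(action, policy):
        raise Denied(f"{action.agent} may not call {action.tool}")
    return tool_fn(**action.args)


# Toy stand-in for a real policy engine: an allowlist of (agent, tool) pairs.
ALLOWED = {("support-bot", "search_docs")}


def allowlist_policy(action: Action) -> Optional[bool]:
    return (action.agent, action.tool) in ALLOWED


def search_docs(query: str) -> str:
    return f"results for {query!r}"


print(guarded_call(Action("support-bot", "search_docs", {"query": "refunds"}),
                   allowlist_policy, search_docs))
try:
    guarded_call(Action("support-bot", "delete_user", {"user_id": 7}),
                 allowlist_policy, lambda **kwargs: None)
except Denied as exc:
    print("blocked:", exc)
```

Checking `is True` rather than truthiness means a policy that returns None (no decision) or an unexpected value is also treated as a denial, which is the "fail-closed" property; the tool function is never invoked unless the check passes.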

Omitted 1 additional other signals item from the main section; see raw data and source-specific sections below.

📈 Trending Repos

Repo	Description	Stars Today	Language
bytedance/UI-TARS-desktop	The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra	150	typescript
infiniflow/ragflow	RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs	105	python
calesthio/OpenMontage	World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.	98	python
roboflow/rf-detr	RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for fine-tuning. [ICLR 2026]	80	python
continuedev/continue	open-source coding agent	49	typescript
alexzhang13/rlm	General plug-and-play inference library for Recursive Language Models (RLMs), supporting various sandboxes.	43	python
promptfoo/promptfoo	Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.	41	typescript
microsoft/RD-Agent	Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through R&D-Agent, which lets AI drive data-driven AI. 🔗https://aka.ms/RD-Agent-Tech-Report	32	python
openobserve/openobserve	Open source observability platform for logs, metrics, traces, frontend monitoring, pipelines and LLM observability. A sophisticated, simple and highly performant alternative to Datadog, Splunk, and Elasticsearch with 140x lower storage costs and single binary deployment.	26	typescript
Lampese/codex-switcher	A Desktop Application for Managing Multiple OpenAI Codex CLI Accounts	8	rust

📄 New Papers

Title	Category	Hotness	Link
SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior	research_paper	13	Open
Native Active Perception as Reasoning for Omni-Modal Understanding	research_paper	10	Open
Sumi: Open Uniform Diffusion Language Model from Scratch	research_paper	6	Open
Continuous Audio Thinking for Large Audio Language Models	cs.CL	0	Open
Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification	cs.CL	0	Open
SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG	cs.CL	0	Open
Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation	cs.CL	0	Open
JetFlow: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting	cs.CL	0	Open
CoreMem: Riemannian Retrieval and Fisher-Guided Distillation for Long-Term Memory in Dialogue Agents	cs.CL	0	Open
VISUALSKILL: Multimodal Skills for Computer-Use Agents	cs.CL	0	Open
LLM Parameters for Math Across Languages: Shared or Separate?	cs.CL	0	Open
Montreal Forced Aligner and the state of speech-to-text alignment in 2026	cs.CL	0	Open
Possible or Definite? A Benchmark for Evaluating Diagnostic Uncertainty Preservation in Clinical Text	cs.CL	0	Open
PreUnlearn: Auditing Collateral Knowledge Damage Before Large Language Model Unlearning	cs.CL	0	Open
Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications	cs.CL	0	Open

🐦 Twitter/X Highlights

Account	Tweet Summary
OpenAI	Introducing LifeSciBench, a benchmark for measuring and improving how well AI supports real-world life science research. Developed with 173 scientists from biotechnology and pharmaceutical research, LifeSciBench includes 750 expert-authored tasks across seven biological research workflows. https://o
xai	Grok is now available on Amazon Bedrock. AWS developers can now build with Grok 4.3, the industry leader in hallucination rate and tool calling, powered by Bedrock’s secure inference engine.
GoogleDeepMind	We’re working with @SciTechgovuk, @mhclg and @i_dot_ai on a new AI housing application planning prototype. 🏡 By cutting down the time spent on repetitive tasks, it could help planning officers focus their attention on complex projects and reduce processing times by up to 50%. → https://goo.gle/4xzq

Repeated From Recent Briefings

Panniantong/Agent-Reach — Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees. - first seen 2026-06-06
farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
Next-Latent Prediction Transformers [R] - first seen 2026-06-17
google-research/timesfm — TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting. - first seen 2026-05-02
anthropics/skills — Public repository for Agent Skills - first seen 2026-05-11
rohitg00/ai-engineering-from-scratch — Learn it. Build it. Ship it for others. - first seen 2026-05-21
Kairos: A Native World Model Stack for Physical AI - first seen 2026-06-16
earendil-works/pi — AI agent toolkit: unified LLM API, agent loop, TUI, coding agent CLI - first seen 2026-05-09
PaddlePaddle/PaddleOCR — Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages. - first seen 2026-05-09
openai/codex — Lightweight coding agent that runs in your terminal - first seen 2026-05-10
... plus 405 more repeated items in processed data

AI Watchtower Briefing — 2026-06-18
