AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Apple reveals new AI architecture built around Google Gemini models — score 90 Sources: hackernews

🔴 Me: Arguing with an AI bot who just posted something on this sub about Llama 3.1. — score 77 Sources: reddit/r/LocalLLaMA

For real tho, these bots need to turn on their web search functions and quit living in the past. It’s bad enough we gotta deal with all the “Qwen3.6 27b helped me quit drinking and brought my dog back from the dead” posts. Sheesh /s

🔴 Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax — score 70 Sources: reddit/r/LocalLLaMA

Hey fellow Llamas, your time is precious, so I'll keep it short (while trying to explain everything lol). TL;DR: * 33-35B MoE on a 16 GB GPU. Qwen3.6 35B-A3B: 13.3 GiB (was ~20.5). Laguna XS.2 33B-A3B: 14.6 GiB (was 18.8). Both measured on an RTX 3090, both under 16 GiB. * **Only the active
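As a quick check on the figures quoted above, the following minimal sketch computes the relative memory reduction and the remaining headroom for each model. It assumes the budget is exactly 16 GiB and takes the "~20.5" figure at face value; the `summarize` helper and `BUDGET_GIB` constant are hypothetical names used for illustration, not part of Luce Spark.

```python
BUDGET_GIB = 16.0  # assumed budget, matching the "under 16 GiB" claim in the post

# (model, footprint before in GiB, footprint after in GiB), as quoted in the post
footprints = [
    ("Qwen3.6 35B-A3B", 20.5, 13.3),
    ("Laguna XS.2 33B-A3B", 18.8, 14.6),
]


def summarize(before: float, after: float, budget: float = BUDGET_GIB) -> tuple[float, float]:
    """Return (percent reduction, headroom in GiB under the budget)."""
    reduction_pct = (before - after) / before * 100
    headroom = budget - after
    return round(reduction_pct, 1), round(headroom, 1)


for name, before, after in footprints:
    pct, headroom = summarize(before, after)
    print(f"{name}: {pct}% smaller, {headroom} GiB of headroom under {BUDGET_GIB} GiB")
```

On these numbers the Qwen model shrinks by roughly a third and the Laguna model by a little over a fifth, with the latter leaving less than 1.5 GiB spare for KV cache and activations.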

Developer Tools

🔴 Anyone running browser-using agents at any kind of scale? What's your infrastructure looking like? — score 94 Sources: reddit/r/AIAgents

We have a research agent that browses the web to gather info (think: shopping comparison, lead enrichment, competitive intel). Works fine for low concurrency. Falls apart at 20+ concurrent browser sessions. We've tried local Playwright pool, Browserbase, and a few self-hosted setups. None of them ar

🔴 My company is having me vibecode an Argus replacement — score 81 Sources: reddit/r/AIAgents

AI has given a lot of non-technical people delusional amounts of self-confidence. The company I work at is big into AI and everyone is vibecoding things left and right. They are having me build an Argus (the real estate valuation suite) replacement and I have been given a deadline of 2 weeks to repl

🔴 google/skills — Agent Skills for Google products and technologies — score 77 Sources: github_trending

Infrastructure & Compute

🔴 Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server — score 97 Sources: reddit/r/LocalLLaMA

Just saw Xiaomi MiMo announce MiMo-V2.5-Pro UltraSpeed, claiming they broke the 1,000 tokens/sec output barrier on a 1 trillion parameter MoE model. According to them, they’re doing it on a single standard 8-GPU node, not custom wafer-scale hardware like Cerebras and not SRAM-heavy hardw

Research Papers

🔴 SwiftVR: Real-Time One-Step Generative Video Restoration — score 80 Sources: huggingface

Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions an

Other Signals

🔴 STOP racist posts about Chinese researchers [D] — score 94 Sources: reddit/r/MachineLearning

Edit: the original post targeting Chinese researchers is removed by the mods. Sorry for any confusion. Yes, I'm calling it out. It IS racism. As an active member of r/MachineLearning and a researcher who is ethnic Chinese, I am DISGUSTED by unfounded accusations against the group of researchers who

🔴 When every other post is an AI generated benchmark report, a question about the best model, or a slop-coded application or engine that pretends to be groundbreaking — score 90 Sources: reddit/r/LocalLLaMA

🔴 Gemma 4 Chat Template now has preserve thinking — score 83 Sources: reddit/r/LocalLLaMA

🔴 Should ArXiv backtrack endorsement? [D] — score 81 Sources: reddit/r/MachineLearning

ArXiv has an endorsement system for a reason. I would only offer endorsement to those with whom I have direct academic collaboration or mentorship, since I'm putting my own academic reputation at stake. This is also the standard of almost any serious academic researcher I am aware of. Now ArXiv is mak

🔴 xAI is looking more like a datacentre REIT than a frontier lab — score 70 Sources: hackernews

🟡 Notable

Model Releases

🟡 Quick note on the QAT of recent — score 57 Sources: reddit/r/LocalLLaMA

tldr: Google's quant is broken; use unsloth UD Q4_K_XL for now. This might be a low-quality post, but oh well, we ball. llama-quantize will quant the token embed to q6k when Google really was supposed to use "--pure", but that’s only the first problem. The llama-quantize quant function is hardcoded to -7 w

🟡 2X tk/s (from 19.4 -> 38.1 tk/s on 1 x MI50) Playing with a hypothesis like speculative decoding.. but instead of an additional side model, exploiting that I can run multiple computations side-by-side AS IF I had Qwen3.6-27B loaded twice in memory - small quants don't use all the available compute. — score 50 Sources: reddit/r/LocalLLaMA

MODS: if you wanna remove for slop, that's cool - once I have something like a llama.cpp patch I'll repost something people can use. I'll write a full article on my Medium account with how it works. Just got excited and wanted to share. *** Forgive the claude summary, in the readme, but the base

🟡 Be very FR - would someone (or you) pay a virtual AI assistant? should I build this? It can read emails, reschedule your whole week, manage your calls, and improve as it is used more. — score 50 Sources: reddit/r/AIAgents

Did not find a pinned weekly post for feedback, but do let me know if this post needs to be removed. I've been experimenting with many many AI agents since last year, and while I love my Claude subscription, it runs out pretty quickly. So I started tinkering with self-hosted solutions and I now

Developer Tools

🟡 My team's AI usage got so expensive they quietly rolled back the mandate — score 69 Sources: reddit/r/AIAgents

Our engineering leadership went all in on AI about three months ago. Every ticket, every PR review, every design doc had to go through their shiny new enterprise copilot setup. They even started tracking adoption metrics in standups. So we used it. For everything. Pasting entire codebases into conte

🟡 danny-avila/LibreChat — Enhanced ChatGPT Clone: Features Agents, MCP, Skills, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active — score 57 Sources: github_trending

🟡 Apple Core AI Framework — score 50 Sources: hackernews

🟡 Confidential submission of draft S-1 to the SEC — score 50 Sources: lab_blog/OpenAI

OpenAI confirms a confidential S-1 submission to the SEC and has not yet determined timing for further action.

🟡 @AnthropicAI: New Science Blog: Why has AI advanced faster in coding than in biology? To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffi — score 50 Sources: twitter_rss

New Science Blog: Why has AI advanced faster in coding than in biology? To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic. How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology

Omitted 5 additional developer tools items from the main section; see raw data and source-specific sections below.

Research Papers

🟡 Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short — score 62 Sources: huggingface · arxiv/cs.AI

Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given promp

🟡 Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops — score 48 Sources: huggingface · arxiv/cs.AI

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both le

🟡 Chiaroscuro Attention: Spending Compute in the Dark — score 48 Sources: huggingface · arxiv/cs.AI

Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators - DCT spectral mixin

🟡 Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text — score 48 Sources: huggingface · arxiv/cs.AI

Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both tex

🟡 PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment — score 48 Sources: huggingface · arxiv/cs.CL

Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is es

Other Signals

🟡 Was BitNet a dead end? What happened to ternary LLMs? — score 63 Sources: reddit/r/LocalLLaMA

They seemed so promising at one point but the biggest ternary model is still 2B. What happened? Why aren't the frontier open weights AI labs attempting to use them?

🟢 Incremental

Model Releases

🟢 ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul for Q4/Q5/Q8 and k-quants by yomaytk · Pull Request #24225 · ggml-org/llama.cpp — score 30 Sources: reddit/r/LocalLLaMA

This PR improves matmul performance for k-quants. The following table shows the improvement on the pp512 test on an M2 Pro.

|quant|model|master (t/s)|PR (t/s)|speedup|
|:-|:-|:-|:-|:-|
|Q2_K|qwen3 0.6B Q2_K - Med

🟢 Best agentic workflows for finance research, prospecting, and team productivity? — score 25 Sources: reddit/r/AIAgents

I’m currently an intern at a small investment bank / securities firm, mostly around sales and trading, but the internship is pretty broad. I work with traders, researchers, market makers, and generally help wherever I can. One thing I’ve noticed is that there does not seem to be anyone in the office

Developer Tools

🟢 Université Paris Saclay or TU Delft for Applied Mathematics Masters [R] — score 38 Sources: reddit/r/MachineLearning

I've been admitted into both UPS and TUD for Applied Mathematics, and I wanted to hear some advice on which one would be better. For context, I'd like to work in some form of AI research, most likely within industry. At the moment, I'm most interested in privacy preserving machine learning or mechan

🟢 777genius/agent-teams-ai — You're the boss, agents are your team. They handle tasks on their own, message each other, and review each other's work. You just watch the kanban board and give high-level commands. Codex/Claude/OpenCode(200+ models, 75+ LLM providers, free models no auth). Build your AI company with multiple teams. — score 37 Sources: github_trending

🟢 xerrors/Yuxi — A multi-tenant Agent Harness platform that combines knowledge base and knowledge graph management. An agent harness that integrates a LightRAG knowledge base and knowledge graphs. Built with LangChain + Vue + FastAPI; supports DeepAgents, MinerU PDF, Neo4j, and MCP. — score 28 Sources: github_trending

🟢 I want to create an agent that sends me an email every morning? — score 25 Sources: reddit/r/AIAgents

How do I do this? It will gather data and send me an email every morning, what’s the free way to do this? Nothing complex
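One free way to approach the question above is a small script run by a scheduler. The sketch below uses only Python's standard library (`email.message.EmailMessage` and `smtplib`). The function names, addresses, SMTP host, port, and credentials are all hypothetical placeholders, and the data-gathering step is left to the reader; this is an illustration under those assumptions, not a recommended setup from the post.

```python
import smtplib
from email.message import EmailMessage


def build_digest(sender: str, recipient: str, items: list[str]) -> EmailMessage:
    """Compose a plain-text email with one bullet per gathered item."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Morning digest ({len(items)} items)"
    msg.set_content("\n".join(f"- {item}" for item in items))
    return msg


def send_digest(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send the message over SMTP with STARTTLS.

    host, port, user, and password are placeholders for whatever
    mail provider is actually used.
    """
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

A scheduler such as cron (for example, an entry starting with `0 7 * * *` to run at 07:00 daily) could then call a script that gathers the data, builds the message with `build_digest`, and passes it to `send_digest`.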

🟢 marin-community/marin — Open-source framework for the research and development of foundation models. — score 21 Sources: github_trending

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Research Papers

🟢 Text-to-Image Models Need Less from Text Encoders Than You Think — score 25 Sources: huggingface

Text-to-image models rely on text prompts as their primary interface to human intent. Prompts are encoded by a text encoder into embeddings that condition the image generation process. Beyond individual token meanings, text embeddings encode contextual information across the full prompt, such as com

Other Signals

🟢 Jetbrains Mellum 2: a really good and performant model — score 37 Sources: reddit/r/LocalLLaMA

Oh Hey Folks, I took the Mellum 2 model for a spin, so I wanted to share my impressions here. Disclaimer: the tests presented here are not scientific nor have those nice names like perplexity, etc. These tests are somewhat more akin to what I'm working on on a daily basis or how useful a model is help

🟢 Screen recording instead of prompting to pass more context fast — score 30 Sources: reddit/r/AIAgents

I had a task where I wanted to:
* connect to a vpn
* ssh into a server port forward
* then open a grafana dashboard, capture some stat
* capture the detail in the right place in an excel and send a message on slack

All the tasks that I want to automate required a lot of prompting with minute details

🟢 Microsoft's open source tools were hacked to steal passwords of AI developers — score 30 Sources: hackernews

🟢 Anyone seen benchmarks comparing Gemma 4 4-bit QAT vs. 8-bit standard quants? — score 23 Sources: reddit/r/LocalLLaMA

I'm trying to find out if anyone has done any benchmarking comparing the Gemma 4 4-bit QAT models (via Unsloth) against standard 8-bit non-QAT quants. I know QAT is supposed to retain a ton of accuracy compared to the baseline BF16, but I'm curious how a 4-bit QAT model actually fares against a trad

🟢 Gemma 4 26B A4B IT QAT Comparison — score 17 Sources: reddit/r/LocalLLaMA

Hopefully this isn't too low effort of a post. I just finished the benchmarks and I figured I'd post them online because they certainly were insightful for me. I did not use any AI other than asking Gemini 3.1 Pro if it was statistically significant because I was too tired to do inferential statisti

Omitted 3 additional other signals items from the main section; see raw data and source-specific sections below.

📈 Trending Repos

Repo	Description	Stars Today	Language
google/skills	Agent Skills for Google products and technologies	461	python
danny-avila/LibreChat	Enhanced ChatGPT Clone: Features Agents, MCP, Skills, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active	141	typescript
langchain-ai/deepagents	The batteries-included agent harness.	96	python
alistaitsacle/free-llm-api-keys	Free LLM API keys for GPT-5.5, Claude, DeepSeek, Gemini, Grok — copy, paste, use. Updated 3-5x daily. No credit card needed.	76	python
magenta/magenta-realtime	Magenta RealTime 2: An Open-Weights Live Music Model	74	python
777genius/agent-teams-ai	You're the boss, agents are your team. They handle tasks on their own, message each other, and review each other's work. You just watch the kanban board and give high-level commands. Codex/Claude/OpenCode(200+ models, 75+ LLM providers, free models no auth). Build your AI company with multiple teams.	60	typescript
xerrors/Yuxi	A multi-tenant Agent Harness platform that combines knowledge base and knowledge graph management. An agent harness that integrates a LightRAG knowledge base and knowledge graphs. Built with LangChain + Vue + FastAPI; supports DeepAgents, MinerU PDF, Neo4j, and MCP.	39	python
marin-community/marin	Open-source framework for the research and development of foundation models.	33	python

📄 New Papers

Title	Category	Hotness	Link
SwiftVR: Real-Time One-Step Generative Video Restoration	research_paper	9	Open
Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short	research_paper	5	Open
PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow	cs.AI	0	Open
OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs	cs.AI	0	Open
Syll: Open-Source Personal Automation with Cross-Surface Execution	cs.AI	0	Open
A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline	cs.AI	0	Open
Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning	cs.AI	0	Open
Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model	cs.AI	0	Open
Some hypotheses on how chatbots work in problem-solving-driven conversations. Large Language Models as confirmation of the Innovation Illusion	cs.AI	0	Open
Land cover and flood type govern the detection limits of satellite-based flood mapping across diverse global flood events	cs.AI	0	Open
Reconstructing and forecasting disease trajectories of patients with Alzheimer's disease using routine data in resource-constrained settings	cs.AI	0	Open
Improving Multimodal Reasoning via Worst Dimension Optimization	cs.AI	0	Open
Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems	cs.AI	0	Open
Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models	cs.AI	0	Open
Scaling Participation in Modular AI Systems	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: Confidential submission of draft S-1 to the SEC

🐦 Twitter/X Highlights

Account	Tweet Summary
AnthropicAI	New Science Blog: Why has AI advanced faster in coding than in biology? To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic. How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology
swyx	It's finally out!!! @METR_Evals found that more than half of SWEBench results is unmergeable slop. FrontierCode represents over 1000+ hours of maintainer validated software engineering work most frontier models cannot yet solve, much less solve with high quality. Cog had IOI Gold medalists and top c

Repeated From Recent Briefings

mvanhorn/last30days-skill — AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary - first seen 2026-06-05
LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents - first seen 2026-06-05
yikart/AiToEarn — Let's use AI to Earn! - first seen 2026-05-11
aaif-goose/goose — an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM - first seen 2026-05-07
Panniantong/Agent-Reach — Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees. - first seen 2026-06-06
TauricResearch/TradingAgents — TradingAgents: Multi-Agents LLM Financial Trading Framework - first seen 2026-05-02
Crosstalk-Solutions/project-nomad — Project N.O.M.A.D, is a self-contained, offline survival computer packed with critical tools, knowledge, and AI to keep you informed and empowered—anytime, anywhere. - first seen 2026-05-08
SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating - first seen 2026-06-08
Imbad0202/academic-research-skills — Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
heygen-com/hyperframes — Write HTML. Render video. Built for agents. - first seen 2026-05-10
... plus 194 more repeated items in processed data

AI Watchtower Briefing — 2026-06-09
