AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Claude Opus 4.8 — score 92 Sources: hackernews

Developer Tools

🔴 I spent three months researching AI phone control, and trust seems more important than features — score 94 Sources: reddit/r/AIAgents

Not long ago, I wanted to find an AI agent that could control a phone, so I went on Reddit to see what people were actually using these tools for. What struck me most was how specific and everyday the use cases were. A lot of the things people want AI to handle are repetitive phone tasks, but they …

🔴 Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild — score 89 Sources: reddit/r/LocalLLaMA

Been following the infrastructure side of AI more lately and stumbled on this from Zai. They upgraded the network architecture on a thousand-GPU cluster running GLM-5.1 coding inference from the standard ROFT setup to something they built called ZCube, developed with Tsinghua University and HarnetsA…

🔴 run-llama/liteparse — A fast, helpful, and open-source document parser — score 87 Sources: github_trending

🔴 Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue — score 75 Sources: hackernews

🔴 How do companies protect proprietary prompts from contractors and consulting engineers? — score 72 Sources: reddit/r/AIAgents

Prompts are a core part of the IP for my client. We’re speeding up development by bringing in 2–3 external contract engineers, but we don’t want to fully expose the underlying prompts/workflows to them. Are there any tools, gateways, or architectures people are using to partially protect prompts from …
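
One pattern often suggested for this is a server-side prompt gateway: contractors call the gateway with a template ID and variables, and only the gateway holds the full prompt text. A minimal sketch under that assumption; `PromptGateway` and the stand-in `_fake_model` are hypothetical names, not an existing tool. It only keeps prompts out of the contractors' code and does not stop extraction through model outputs or logs.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Callable, Dict


def _fake_model(prompt: str) -> str:
    # Stand-in for a real LLM client call (hypothetical).
    return f"<model output for a {len(prompt)}-char prompt>"


@dataclass
class PromptGateway:
    """Hypothetical gateway: prompt templates live server-side and are referenced by ID."""
    templates: Dict[str, str] = field(default_factory=dict)
    call_model: Callable[[str], str] = _fake_model

    def register(self, template_id: str, text: str) -> None:
        # Only the prompt owner registers templates; contractors never receive `text`.
        self.templates[template_id] = text

    def run(self, template_id: str, **variables: str) -> str:
        # Contractors send an ID plus variables; the full prompt is assembled here.
        if template_id not in self.templates:
            raise KeyError(f"unknown template: {template_id}")
        prompt = Template(self.templates[template_id]).substitute(**variables)
        # Only the model's answer leaves the gateway, never the assembled prompt.
        return self.call_model(prompt)


gateway = PromptGateway()
gateway.register("summarize_v1", "You are a careful analyst. Summarize: $doc")
print(gateway.run("summarize_v1", doc="Quarterly report text"))
```

Contractors can then build and test workflows against template IDs while prompt edits stay with the owner; in a real deployment, authentication and audit logging would sit in front of `run`.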

Business & Funding

🔴 A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P] — score 81 Sources: reddit/r/MachineLearning

Hello everyone. The new dataset is named MONET; it is Apache 2.0-licensed and available on HF: https://huggingface.co/datasets/jasperai/monet

MONET is an open, Apache 2.0-licensed image–text dataset. It was built from 2.9 billion images and refined to 104.9 million …

Research Papers

🔴 UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering — score 95 Sources: huggingface

Activation-based control steers large language models (LLMs) by intervening on their internal representations during inference, and has emerged as an effective paradigm for controlling behaviors such as persona and style. However, existing methods often rely on fixed steering directions or task-specific …
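
For context, the fixed-direction baseline the abstract contrasts against adds a constant vector to a layer's hidden state at inference time, h' = h + α·v/‖v‖. A minimal sketch of that baseline in plain Python, not UniSteer's text-guided flow-matching method; the hidden state, the "persona" direction, and the strength `alpha` are made-up toy values. In a real model the same addition would typically be applied to one layer's activations during the forward pass.

```python
from typing import List


def steer(hidden: List[float], direction: List[float], alpha: float) -> List[float]:
    """Fixed-direction activation steering: h' = h + alpha * v / ||v||."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must have the same dimension")
    norm = sum(x * x for x in direction) ** 0.5
    if norm == 0:
        raise ValueError("steering direction must be non-zero")
    # Normalize the direction so alpha directly controls the size of the shift.
    return [h + alpha * v / norm for h, v in zip(hidden, direction)]


hidden_state = [0.5, -1.0, 2.0]  # toy hidden state for one token at one layer
persona_dir = [0.0, 3.0, 4.0]    # toy steering direction (norm 5), hypothetical
print(steer(hidden_state, persona_dir, alpha=2.0))
```

The limitation the abstract points at is visible here: the same `persona_dir` is added regardless of input, which is what a learned, text-guided method aims to replace.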

🔴 When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems — score 78 Sources: huggingface · arxiv/cs.AI

The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substantially higher cost, and more cost-efficient small language models (SLMs), which are amenable to on-device …
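
One common way to exploit this cost gap is a cascade: try the on-device SLM first and escalate to the cloud LLM only when the small model is not confident. A minimal sketch, assuming hypothetical `slm`/`llm` callables that return an answer plus a confidence score and an illustrative threshold of 0.7; the paper's own cloud/device coordination may work differently.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Model = Callable[[str], Tuple[str, float]]  # returns (answer, confidence in [0, 1])


@dataclass
class CascadeResult:
    answer: str
    used_cloud: bool


def cascade(task: str, small_model: Model, large_model: Model, threshold: float = 0.7) -> CascadeResult:
    # Cheap on-device attempt first.
    answer, confidence = small_model(task)
    if confidence >= threshold:
        return CascadeResult(answer, used_cloud=False)
    # Escalate to the expensive cloud model only for low-confidence cases.
    answer, _ = large_model(task)
    return CascadeResult(answer, used_cloud=True)


# Toy stand-ins for the two models (hypothetical): the SLM is confident only on short tasks.
def slm(task: str) -> Tuple[str, float]:
    return ("local answer", 0.9 if len(task) < 20 else 0.3)


def llm(task: str) -> Tuple[str, float]:
    return ("cloud answer", 0.95)


print(cascade("set a timer", slm, llm))
print(cascade("plan a multi-city trip with budget constraints", slm, llm))
```

The threshold is the main cost/quality knob: raising it sends more traffic to the cloud model.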

🔴 RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains — score 72 Sources: huggingface · arxiv/cs.LG

Pointwise reward modeling offers critical signals for LLM post-training, yet struggles with absolute scoring in subjective, non-verifiable settings. Rubric-based methods address this by decomposing evaluation into explicit criteria, but existing approaches typically depend on frontier LLMs and suffer …
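
As a reference point, a generic pointwise rubric reward turns per-criterion judgments into one scalar. A minimal sketch, assuming hypothetical criteria, weights, and 0 to 10 per-criterion scores (in practice produced by a judge model); this is a plain weighted rubric, not RUBRIC-ARROW's alternating scheme.

```python
from typing import Dict


def rubric_reward(scores: Dict[str, float], weights: Dict[str, float], max_score: float = 10.0) -> float:
    """Combine per-criterion scores into a single reward in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for criteria: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Weighted average of criterion scores, each normalized to [0, 1].
    return sum(weights[c] * scores[c] / max_score for c in weights) / total_weight


weights = {"helpfulness": 2.0, "tone": 1.0, "factuality": 1.0}  # hypothetical rubric
scores = {"helpfulness": 8.0, "tone": 6.0, "factuality": 10.0}  # e.g. from a judge model
print(rubric_reward(scores, weights))
```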

Other Signals

🔴 I've just benchmarked myself: — score 96 Sources: reddit/r/LocalLLaMA

🔴 StepFun 3.7 Flash — score 82 Sources: reddit/r/LocalLLaMA

StepFun dropped Step 3.7 Flash, a 196B total / 11B active MoE that runs locally on 128GB RAM. It's a multimodal MoE (196B total params, only 11B active) with a built-in 1.8B ViT for vision.

Benchmark highlights vs. other flash-tier models:
- SWE-Bench Pro: 56.26% (beats DeepSeek V4 Flash at 55.6%, matches …
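
The "196B total / 11B active" split explains the 128GB figure: all experts' weights typically have to be resident in memory, but only about 11B parameters are used per token. A back-of-the-envelope sketch of weight memory, assuming roughly 4.5 bits per weight for a 4-bit quantization and 8.5 for an 8-bit one (approximations; real GGUF quant types vary) and ignoring KV cache and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring KV cache and overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


TOTAL_B, ACTIVE_B = 196, 11  # parameter counts from the Step 3.7 Flash post

# Bits per weight below are assumed averages, not exact quant specifications.
for label, bits in [("bf16", 16), ("8-bit", 8.5), ("~4-bit", 4.5)]:
    print(f"{label:>7}: all experts ~ {weight_memory_gb(TOTAL_B, bits):6.1f} GB, "
          f"active per token ~ {weight_memory_gb(ACTIVE_B, bits):5.1f} GB")
```

At about 4.5 bits per weight the weights alone come to roughly 110 GB, so a 128 GB machine is tight but workable once the KV cache is added.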

🔴 HF models page now has a "Base only" toggle to filter out finetunes/quants/etc — score 75 Sources: reddit/r/LocalLLaMA

A feature that was requested a lot: https://huggingface.co/models?base_model_relation=base

🟡 Notable

Model Releases

🟡 @AnthropicAI — score 50 Sources: twitter_rss

We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and expand our capacity to meet growing demand for Claude.

🟡 @xai — score 50 Sources: twitter_rss

Grok Build 0.2.7 is now out, with /usage, /login, shared terminals across subagents, and improved image understanding See all updates at https://x.ai/build/changelog

🟡 @MistralAI — score 50 Sources: twitter_rss

We're taking on the hardest problems in the real world 🏗️🚚 🛫⚛️ Today at The AI Now Summit, held at the Louvre, we announced AI solutions for aerospace, automotive, energy, and physics. Deployed in production at @Airbus, @BMW, @EDFofficiel, and more. More below:

🟡 Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation [D] — score 44 Sources: reddit/r/MachineLearning

Wall-OSS-0.5 is a new 4B VLA release from X Square Robot, built on a 3B VLM backbone with action experts in a Mixture-of-Transformers layout. What caught my eye is that the report evaluates the pretrained checkpoint on real robots before task-specific fine-tuning, instead of only reporting downstream …

🟡 Claude Code – Everything You Can Configure That the Docs Don't Tell You — score 42 Sources: hackernews

Developer Tools

🟡 Making LLMs tell you how confident they really are through probe-targeted fine-tuning [R] — score 69 Sources: reddit/r/MachineLearning

Just wanted to share my research on probe-targeted fine-tuning (LoRA) for verbal confidence calibration. If you probe the hidden states of an instruct-tuned LLM, it can tell correct from incorrect answers at 0.76–0.88 AUROC. But when you ask it directly, it tends to respond with confidence at …
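
The AUROC figure measures how well a probe's scores separate correct from incorrect answers: it is the probability that a randomly chosen correct answer gets a higher score than a randomly chosen incorrect one. A minimal sketch of that computation in its pairwise (Mann-Whitney) form, with made-up probe scores; the quadratic loop is fine for small samples but not for large evaluation sets.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison: P(score_pos > score_neg), ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical probe outputs on hidden states; label 1 = the model's answer was correct.
probe_scores = [0.91, 0.80, 0.65, 0.40, 0.55, 0.20]
is_correct = [1, 1, 0, 1, 0, 0]
print(auroc(probe_scores, is_correct))
```

Calibration work like the post's then asks whether the model's verbalized confidence can be trained to rank answers as well as the probe does.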

🟡 anthropics/claude-code — Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands. — score 62 Sources: github_trending

🟡 The hidden tax of web search: 80% of my agent’s tokens are wasted on garbage — score 50 Sources: reddit/r/AIAgents

Spent the weekend digging into the unit economics of my research agent and honestly got a little horrified. Turns out most of the token usage isn’t going into actual reasoning, it’s getting burned on navigation menus, footers, and cookie banners. We’re feeding smart models gigabytes of boilerplate …
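
A common first step is to strip obvious page chrome before the text reaches the model. A minimal sketch using only Python's standard `html.parser`; the `SKIP_TAGS` set is an assumption, and cookie banners rendered as ordinary `<div>` elements would need extra, site-specific rules.

```python
from html.parser import HTMLParser
from typing import List

# Tags whose contents are treated as boilerplate (assumed list; tune per site).
SKIP_TAGS = {"nav", "footer", "header", "aside", "script", "style", "noscript", "form"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, dropping anything nested inside a boilerplate tag."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ("<html><body><nav>Home | About</nav><article><h1>Title</h1>"
        "<p>Real content.</p></article><footer>(c) 2026</footer>"
        "<script>var x = 1;</script></body></html>")
print(extract_main_text(page))
```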

🟡 The most common AI memory failure isn't a hallucination. It's a stale fact that never got corrected. — score 50 Sources: reddit/r/AIAgents

A user changes a preference, the old fact keeps winning retrieval, and your AI confidently acts on something that stopped being true months ago. No error is thrown, no alert fires, the system just quietly gets it wrong forever. That is not a model problem; it is a memory architecture problem.
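
One way to make the newer preference win is to store facts keyed by subject and attribute, so an update supersedes the old value instead of competing with it at retrieval time. A minimal sketch with a hypothetical in-memory `FactMemory`; a real system would also need to detect facts that are only implicitly contradicted, which a key-based store cannot do on its own.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    value: str
    observed_at: datetime


class FactMemory:
    """Keeps one current value per (subject, attribute); newer observations supersede older ones."""

    def __init__(self) -> None:
        self._facts: Dict[Tuple[str, str], Fact] = {}
        self.history: List[Tuple[Tuple[str, str], Fact]] = []  # superseded facts, kept for auditing

    def upsert(self, subject: str, attribute: str, value: str, observed_at: datetime) -> None:
        key = (subject, attribute)
        current = self._facts.get(key)
        if current is not None and current.observed_at > observed_at:
            # An older observation arriving late must not overwrite a newer fact.
            self.history.append((key, Fact(value, observed_at)))
            return
        if current is not None:
            self.history.append((key, current))
        self._facts[key] = Fact(value, observed_at)

    def get(self, subject: str, attribute: str) -> Optional[str]:
        fact = self._facts.get((subject, attribute))
        return fact.value if fact is not None else None


mem = FactMemory()
mem.upsert("user_42", "diet", "vegetarian", datetime(2025, 1, 10))
mem.upsert("user_42", "diet", "pescatarian", datetime(2026, 3, 2))
print(mem.get("user_42", "diet"))
```

Because the stale value is moved into `history` rather than deleted, the change stays auditable without ever competing at retrieval time.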

🟡 How Endava builds an agentic organization with Codex — score 50 Sources: lab_blog/OpenAI

Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 LiquidAI/LFM2.5-8B-A1B · Hugging Face — score 61 Sources: reddit/r/LocalLLaMA

Looks like you can run it on any potato (A1B)! https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF

From LiquidAI: LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training …

Research Papers

🟡 PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions — score 65 Sources: huggingface · arxiv/cs.AI

We address the task of generating physically accurate and visually faithful 4D Human-Object Interaction (HOI). Given a static 3D human and target object represented as 3D Gaussian Splats (3DGS), our goal is to synthesize dynamic scenes where the human actively engages with the object through actions …

🟡 Thinking Before Constraining: A Unified Decoding Framework for Large Language Models — score 52 Sources: huggingface · arxiv/cs.AI

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing …

Other Signals

🟡 Beware!! Users trying to fork and steal your projects — score 68 Sources: reddit/r/LocalLLaMA

Context! User u/Worried_Goat_8604 claimed to have made a similar but unrelated project to my SmallCode. He framed it as "I made this before you, but we can collab if you make me co-founder". In reality, he made a low-effort fork of MY project 2 days …

🟡 Various LLM Smells — score 58 Sources: hackernews

🟡 Liquid AI releases LFM2.5-8B-A1B — score 54 Sources: reddit/r/LocalLLaMA

Liquid AI released LFM2.5-8B-A1B, an edge model designed to power real-life applications. It builds on LFM2-8B-A1B with three major upgrades: an expanded 128K context window, 38T tokens of pre-training (up from 12T), and large-scale reinforcement learning. It also comes with a doubled vocabulary to …

🟡 How are people reducing inference costs in multi-step AI agents? — score 53 Sources: reddit/r/AIAgents

I’m on the Tensormesh team, and I’m trying to better understand how people building AI agents are handling inference costs when agents make many calls per task. One pattern we see is that the same context often gets processed repeatedly:
- system prompts
- tool definitions
- retrieved docs
- pol…
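
The repeated-context pattern described here is what prefix (KV) caching targets: when the shared prefix is identical across calls, its processing can be reused. A toy sketch that only counts how many prompt tokens a per-prefix cache would avoid recomputing; whitespace splitting stands in for a real tokenizer, the prompt text is made up, and real inference engines cache KV tensors in blocks rather than whole prefixes.

```python
import hashlib
from typing import List, Set


def tokens(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): whitespace split; real tokenizers differ.
    return text.split()


class PrefixCacheCounter:
    """Counts prompt tokens processed vs. reused when a shared prefix is cached."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.processed = 0
        self.reused = 0

    def call(self, shared_prefix: str, suffix: str) -> None:
        key = hashlib.sha256(shared_prefix.encode()).hexdigest()
        n_prefix, n_suffix = len(tokens(shared_prefix)), len(tokens(suffix))
        if key in self.seen:
            self.reused += n_prefix      # prefix already computed: reuse it
        else:
            self.seen.add(key)
            self.processed += n_prefix   # first call: compute and cache the prefix
        self.processed += n_suffix       # per-step content is always processed


system_and_tools = "You are a research agent. " + "tool_schema " * 200  # hypothetical shared context
counter = PrefixCacheCounter()
for step in range(10):
    counter.call(system_and_tools, f"step {step}: observation text here")
print(counter.processed, counter.reused)
```

Any change early in the prefix (for example, a timestamp in the system prompt) produces a new key and defeats the cache, which is why stable content is usually placed first.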

🟡 llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp — score 46 Sources: reddit/r/LocalLLaMA

Now you can download more VRAM ;) (by downloading the new llama.cpp version)
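
To see why the mask's data type matters, consider a dense attention mask with one element per (KV cell, batch token) pair: switching from f32 (4 bytes) to f16 (2 bytes) halves it. An illustrative sketch only; the dense `[n_kv x n_tokens]` shape, the batch size of 512, and the f32 starting point are assumptions, and actual savings depend on how llama.cpp sizes and pads the mask.

```python
def mask_mib(n_kv: int, n_tokens: int, bytes_per_elem: int) -> float:
    """Size of a dense [n_kv x n_tokens] attention mask in MiB (assumed layout)."""
    return n_kv * n_tokens * bytes_per_elem / 2**20


N_TOKENS = 512  # assumed batch size
for n_kv in (8_192, 32_768, 131_072):
    f32 = mask_mib(n_kv, N_TOKENS, 4)
    f16 = mask_mib(n_kv, N_TOKENS, 2)
    print(f"n_kv={n_kv:>7}: f32 {f32:6.1f} MiB -> f16 {f16:6.1f} MiB (saves {f32 - f16:.1f} MiB)")
```

Under these assumptions the saving grows linearly with context length, which is why it matters most for long-context runs.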

🟢 Incremental

Model Releases

🟢 llama.cpp B9387 Significant AMD/ROCm PP Update — score 39 Sources: reddit/r/LocalLLaMA

https://github.com/ggml-org/llama.cpp/releases/tag/b9387

MFMA is restricted to the AMD CDNA architecture, that is, the MI100, MI200, and MI300 series datacenter cards. Post your initial results if you try it!

🟢 Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days — score 17 Sources: reddit/r/AIAgents

🟢 Oculus Founders' AI Startup Sesame Launches Human-Like Voice AI App on iOS — score 17 Sources: reddit/r/AIAgents

🟢 Qwen 3.6 27B overdoing it — score 4 Sources: reddit/r/LocalLLaMA

Although I'm very impressed with Qwen3.6 and it's my most-used model, I feel that it is sometimes too proactive and starts doing things I didn't ask for, from creating tests for the last modification to reverting changes I made, e.g. restoring a hardcoded value I removed because it thinks the value is useful to keep…

Developer Tools

🟢 ariadng/metatrader-mcp-server — Model Context Protocol (MCP) to enable AI LLMs to trade using MetaTrader platform — score 36 Sources: github_trending

🟢 The AI agent gold rush is skipping the consumer, and I think that's the actual opportunity — score 30 Sources: reddit/r/AIAgents

Quick disclosure: I'm building a vertical agent in sports, so I'm biased. But there's a gap here I can't stop thinking about and I want to know if this sub sees it too. We spent thirty years building software for a human with a cursor. That's quietly ending. The fastest-growing user of your product …

🟢 OpenMOSS/MOSS-TTS — MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS. — score 29 Sources: github_trending

🟢 mastra-ai/mastra — From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack. — score 27 Sources: github_trending

🟢 Use HTML as the primary chat language for your agents so they can draw diagrams — score 25 Sources: reddit/r/LocalLLaMA

A week or two ago, Thariq published an article on how good AIs were at working with HTML, and how there was no real reason to use Markdown anymore. And yet all of our coding agents work with Markdown, output Markdown, and have been trained …

Omitted 8 additional developer tools items from the main section; see raw data and source-specific sections below.

Research Papers

🟢 Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence — score 35 Sources: huggingface

Foundation features from self-supervised vision models and text-to-image diffusion models have proven effective for semantic correspondence estimation. However, because these features are learned primarily from 2D image objectives, they lack explicit 3D awareness and often confuse symmetric object …

🟢 Towards Consistent Video Geometry Estimation — score 10 Sources: huggingface

This work presents ViGeo, a feed-forward foundation model for recovering spatially dense and temporally consistent geometry from video sequences. Built upon a plain transformer architecture without task-specific architectural modifications, ViGeo supports streaming, full-sequence, and long-video inference …

Other Signals

🟢 StepFun 3.7 Flash - Speed Benchmark in M5 Max — score 32 Sources: reddit/r/LocalLLaMA

Just ran a benchmark with the llama.cpp branch shipped on day 0. M5 Max, 128 GB, Q4_K_S: memory peaks at around 120+ GB, making things sluggish but still usable once cmd+tab landed. Short context (< 16k) feels fast and very responsive. Speed at 32k–64k is not bad, still usable.

|PP|TG|B|N_KV|T_PP s|S_PP t/…

🟢 The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin — score 25 Sources: hackernews

🟢 Orchestrating AI code review at scale — score 8 Sources: hackernews

📈 Trending Repos

Repo	Description	Stars Today	Language
run-llama/liteparse	A fast, helpful, and open-source document parser	932	rust
anthropics/claude-code	Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.	319	python
apurvsinghgautam/robin	AI-Powered Dark Web OSINT Tool	91	python
ariadng/metatrader-mcp-server	Model Context Protocol (MCP) to enable AI LLMs to trade using MetaTrader platform	83	python
OpenMOSS/MOSS-TTS	MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.	71	python
mastra-ai/mastra	From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.	65	typescript
microsoft/RAMPART	A pytest-native safety and security testing framework for agentic AI applications	62	python
CodeWithCJ/SparkyFitness	SparkyFitness: Built for Families. Powered by AI. Track food, fitness, water, and health — together.	45	typescript

📄 New Papers

Title	Category	Hotness	Link
UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering	research_paper	18	Open
When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems	research_paper	6	Open
RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains	research_paper	4	Open
PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions	research_paper	3	Open
Thinking Before Constraining: A Unified Decoding Framework for Large Language Models	research_paper	2	Open
Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction	cs.AI	0	Open
Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction	cs.AI	0	Open
The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling	cs.AI	0	Open
Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems	cs.AI	0	Open
Review Arcade: On the Human Alignment and Gameability of LLM Reviews	cs.AI	0	Open
Orthogonal Concept Erasure for Diffusion Models	cs.AI	0	Open
Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes	cs.AI	0	Open
VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis	cs.AI	0	Open
BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation	cs.AI	0	Open
Adopt ≠ Adapt: Longitudinal Analyses of LLM Conversations in the Wild	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: How Endava builds an agentic organization with Codex

🐦 Twitter/X Highlights

Account	Tweet Summary
AnthropicAI	We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and expand our capacity to meet growing demand for Claude.
xai	Grok Build 0.2.7 is now out, with /usage, /login, shared terminals across subagents, and improved image understanding See all updates at https://x.ai/build/changelog
MistralAI	We're taking on the hardest problems in the real world 🏗️🚚 🛫⚛️ Today at The AI Now Summit, held at the Louvre, we announced AI solutions for aerospace, automotive, energy, and physics. Deployed in production at @Airbus, @BMW, @EDFofficiel, and more. More below:

Repeated From Recent Briefings

harry0703/MoneyPrinterTurbo — Generate HD short videos with one click using AI LLMs. - first seen 2026-05-28
Lum1104/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
AI-generated CUDA kernels silently break training and inference [R] - first seen 2026-05-28
NousResearch/hermes-agent — The agent that grows with you - first seen 2026-05-11
farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
mukul975/Anthropic-Cybersecurity-Skills — 754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0 - first seen 2026-05-24
anthropics/skills — Public repository for Agent Skills - first seen 2026-05-11
earendil-works/pi — AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods - first seen 2026-05-09
twentyhq/twenty — The open alternative to Salesforce, designed for AI. - first seen 2026-05-25
anthropics/financial-services - first seen 2026-05-07
... plus 450 more repeated items in processed data

AI Watchtower Briefing — 2026-05-29
