AW · AI Watchtower

🔴 High Significance

Model Releases

🔴 Browse CVPR 2026 papers on PapersWithCode [P] — score 81 Sources: reddit/r/MachineLearning

Hi, Niels here from the open-source team at Hugging Face. It's been 2 weeks since I [launched](https://www.reddit.com/r/MachineLearning/comments/1tgmwqr/reviving_paperswit

Developer Tools

🔴 What's the hardest part of operating AI agents at scale? — score 83 Sources: reddit/r/AIAgents

Building agents seems to be getting easier thanks to frameworks and tooling, but operating them in production still feels like an open question. When something goes wrong, how do teams investigate it? How do you track tool usage, audit decisions, and understand why an agent took a particular action?

🔴 RTX Spark does not have 600GB/s Bandwidth — score 79 Sources: reddit/r/LocalLLaMA

Check the slides from Computex. Every outlet that reported 600GB/s is completely wrong. That figure is the NVLink speed, as everyone here said.

🔴 AI Agent Guidelines for CS336 at Stanford — score 79 Sources: hackernews

🔴 VibeCoding is becoming the biggest illusion in software engineering. — score 74 Sources: reddit/r/AIAgents

People are celebrating: "I built a SaaS app in 4 hours." "AI replaced my backend team." "Production-ready with one prompt." But almost nobody shows what happens 3 months later. That's where the real engineering starts. The problem with vibe coding is simple: It optimizes for speed of cr

🔴 What LLM eval tools are people actually using in production? — score 72 Sources: reddit/r/AIAgents

I've reached the point where manually checking outputs doesn't really scale anymore. Started looking at different evaluation tools, but honestly it's hard to tell which ones people are genuinely using versus which ones just look good in demos. For teams running LLM apps or agents in production: What

Infrastructure & Compute

🔴 NVIDIA GB300 Grace Blackwell Ultra pricetags — score 71 Sources: reddit/r/LocalLLaMA

https://www.scan.co.uk/shop/ai-and-robotics/workstations-ai/nvidia-dgx-station

Research Papers

🔴 LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation — score 75 Sources: huggingface

Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible generation trajectory:
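The sliding-window attention mentioned in this abstract can be pictured with a small mask. This is a minimal sketch assuming a causal window over frame indices. The function name and the window size are hypothetical, and this is not LongLive-RAG's implementation.

```python
def sliding_window_mask(num_frames: int, window: int) -> list[list[bool]]:
    """Return mask[i][j], True when generation step i may attend to frame j.

    Assumption for illustration: step i sees itself and the (window - 1)
    frames immediately before it, and nothing older or newer.
    """
    return [
        [0 <= i - j < window for j in range(num_frames)]
        for i in range(num_frames)
    ]


mask = sliding_window_mask(num_frames=6, window=3)
for row in mask:
    # "x" marks a frame that is visible at this step, "." one that is not.
    print("".join("x" if allowed else "." for allowed in row))
```

With a window of 3, frame 0 is no longer visible from step 3 onward, so anything generated later cannot attend back to it.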

Other Signals

🔴 Stop asking what model to run. There are literally only two. — score 96 Sources: reddit/r/LocalLLaMA

Can we please ban the daily "I have an RTX 3060, what should I run?" slop threads? It’s not complicated. As of right now, Hugging Face is empty and exactly two local models exist on this entire planet:
* Qwen 3.6 35b a3b
* Qwen 3.6 27b
That is the entire list. Your specs don’t matter. Your u

🔴 CS336: Language Modeling from Scratch — score 93 Sources: hackernews

🔴 I trusted random person on this subreddit and bought 3080 20gb made of chinesium — score 88 Sources: reddit/r/LocalLLaMA

I don't know how long it will last, but it works, and I want 2 more now.

🟡 Notable

Model Releases

🟡 @xai: Composer 2.5 is now available inside Grok Build. Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions. — score 60 Sources: twitter_rss

Developer Tools

🟡 How bad MCP design costs your agent 5× more tokens — score 56 Sources: reddit/r/AIAgents

MCP is the gold standard for LLM agent tools, but the quality of MCP tool design can dramatically affect an agent's token and context-window consumption. I recently ran some experiments on two MCP implementations with identical functionality, and found that one of them has really bad performan

🟡 OpenAI frontier models and Codex are now available on AWS — score 50 Sources: hackernews

🟡 Codex is becoming a productivity tool for everyone — score 50 Sources: lab_blog/OpenAI

The Next Era of Knowledge Work report explores how Codex is transforming productivity through AI-powered research, data analysis, workflow automation, and content creation.

🟡 Our views on AI policy and political advocacy — score 50 Sources: lab_blog/OpenAI

Our approach to AI policy and political advocacy: transparency, support for thoughtful regulation and AI safety, and the position that no outside political group speaks on the company’s behalf.

🟡 @AnthropicAI: Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial pu — score 50 Sources: twitter_rss

Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial public offering. Read more: https://www.anthropic.com/news/confidential-draft-s1-sec

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Finetuning a Reasoning LLM with Supervised or Reinforcement Learning? [D] — score 69 Sources: reddit/r/MachineLearning

Hello, I have a task to fine-tune small LLMs on annotated conversational data. The dataset contains not only the final answers, but also reasoning traces and tool-calling decisions (i.e., when the model should think and when it should call a tool). I am wondering what the best training approach woul

🟡 Building the infrastructure for the Intelligence Age in Michigan — score 50 Sources: lab_blog/OpenAI

OpenAI breaks ground on a 1GW data center project in Michigan as part of Stargate, building AI infrastructure to expand access, create jobs, and support communities.

Research Papers

🟡 FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search — score 60 Sources: huggingface · arxiv/cs.CL

Agentic search requires language model agents to explore many sources and answer complex information-seeking questions. Scaling test-time compute is a promising way to improve these agents, but current approaches can fail, because correct answers are often sparse and score-based selection depends on

🟡 Can Predicted Dynamics Exist in the Physical World? — score 60 Sources: huggingface · arxiv/cs.AI

Predictive Physical AI systems output state rollouts, action chunks, and latent plans, yet a low root-mean-square error (RMSE) does not imply that a particular proposal is physically executable. We formulate physical admissibility as a prediction-control interface: before execution, a decoded propos

🟡 Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation — score 50 Sources: huggingface

Sentence embeddings are a foundational component for semantic search, clustering, classification, and retrieval-augmented generation. This paper presents embeddingmagibu-200m, a Turkish-focused sentence embedding model that produces 768-dimensional L2-normalized vectors and supports an 8,192-token c
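One practical consequence of L2-normalized output vectors is that cosine similarity reduces to a plain dot product. The sketch below illustrates this with toy 3-dimensional vectors standing in for the 768-dimensional embeddings. The helper names are hypothetical and no model is loaded.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors of arbitrary (non-zero) length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy stand-ins for raw embedding outputs (assumed values, not model output).
raw_a = [3.0, 4.0, 0.0]
raw_b = [1.0, 2.0, 2.0]
unit_a = l2_normalize(raw_a)
unit_b = l2_normalize(raw_b)

# For unit vectors the dot product already equals the cosine similarity.
print(dot(unit_a, unit_b), cosine_similarity(raw_a, raw_b))
```

This is why vector indexes configured for inner-product search can serve cosine-similarity queries when the stored embeddings are already unit length.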

🟡 EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers — score 50 Sources: huggingface

This paper addresses the challenge of integrating 3D meshes as a native modality within Multimodal Large Language Models (MLLMs). Diffusion-based large reconstruction models decouple semantic understanding from geometric reasoning, operating as stateless reconstructors conditioned on dense 2D pixel

🟡 Confidence-Adaptive SwiGLU for Mixture-of-Experts — score 42 Sources: huggingface · arxiv/cs.CL

SwiGLU has become a standard gated activation in modern Transformer MLPs, yet its gate sharpness -- the smoothness and selectivity of the gating function -- is typically fixed throughout training. In this work, we propose Confidence-Aware SwiGLU (κ-SwiGLU), a variant of SwiGLU for Mixture-of-Experts
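For orientation, the standard SwiGLU gate can be written with an explicit Swish temperature, here called beta, which is one common way to think about gate sharpness. This is a sketch of the ordinary fixed-beta form only. How the paper defines or adapts sharpness is not visible in the truncated abstract, so nothing below should be read as κ-SwiGLU.

```python
import math


def swish(z: float, beta: float = 1.0) -> float:
    """Swish_beta(z) = z * sigmoid(beta * z).

    Larger beta makes the gate sharper; as beta grows it approaches ReLU.
    """
    return z / (1.0 + math.exp(-beta * z))


def swiglu_gate(gate_pre: list[float], value: list[float], beta: float = 1.0) -> list[float]:
    """Elementwise Swish_beta(gate_pre) * value.

    gate_pre and value stand for the two linear projections of the input
    (x @ W and x @ V in the usual notation); the projections themselves
    are omitted to keep the sketch dependency-free.
    """
    return [swish(g, beta) * v for g, v in zip(gate_pre, value)]


gate_pre = [-2.0, -0.5, 0.5, 2.0]
value = [1.0, 1.0, 1.0, 1.0]
for beta in (1.0, 10.0, 50.0):
    print(beta, [round(y, 4) for y in swiglu_gate(gate_pre, value, beta)])
```

Running it shows negative pre-activations being suppressed more strongly as beta increases, while positive ones pass through almost unchanged.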

Omitted 1 additional research papers item from the main section; see raw data and source-specific sections below.

Other Signals

🟡 Can the stockmarket swallow Anthropic, SpaceX and OpenAI? — score 64 Sources: hackernews

🟡 Man trains local model to detect and kill mosquitos with a laser — score 62 Sources: reddit/r/LocalLLaMA

Now this is local AI innovation we can all get behind. https://x.com/stevencheng/status/2059836738449854898

🟡 I hate to be this guy but: Any good, recent CODING models in the 70-80B range? — score 54 Sources: reddit/r/LocalLLaMA

3x 24GB VRAM.
- Qwen-coder-next is not bad. I'll continue to use it if you yell enough at me.
- I do a lot of front-end work, which develops rapidly, so the more recent the model, the better.
- Larger than 80B and I'll have to sacrifice the decentish Q6 quant, or the minimum (for coding) 256k conte

🟡 @OpenAI: OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new way to build on Amazon Bedrock with OpenAI through the security, compliance, and governance workflows they — score 50 Sources: twitter_rss

OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new way to build on Amazon Bedrock with OpenAI through the security, compliance, and governance workflows they already use. This is also the beginning of a broader expansion of OpenAI capabilities on AWS, inclu

🟡 Intel Arc Pro B70 llama.cpp benchmarks posted — score 46 Sources: reddit/r/LocalLLaMA

https://www.reddit.com/r/LocalLLM/comments/1tuf6l1/intel_arc_pro_b70_llamacpp_sycl_63_ts_on_qwen/

🟢 Incremental

Model Releases

🟢 I asked each of my AI agents to describe their own role. The answers were surprisingly honest. — score 36 Sources: reddit/r/AIAgents

I've been running a fleet of 5 agents (Claude, Gemini, Codex, Mistral, local Qwen) on a Mac Mini M4 for several months. They coordinate through a shared state layer I built called Flotilla. Last week a VC asked me who my team was. His face when I explained it was worth documenting. So I did somethin

Developer Tools

🟢 nvidia-LocateAnything-3B detects sushi as sweet in the video demo — score 29 Sources: reddit/r/LocalLLaMA

https://huggingface.co/nvidia/LocateAnything-3B Funny how they left this in the demo; at least it's honest.

🟢 ICML 2026 | PIEVO: Overcoming Static Priors in AI Scientists via Principle-Evolvable Scientific Discovery (SOTA Solution Quality & 83.3% Faster Convergence) — score 28 Sources: reddit/r/AIAgents

We are excited to share our latest framework, PIEVO (Principle-Evolvable scientific Discovery via Uncertainty Minimization), designed to address a fundamental limitation in current LLM-based scientific agents. The Problem: Existing AI Scientists (such as The AI Scientist, AI-Researcher, and

🟢 JetBrains open-sources Mellum2 - anyone tried these? — score 12 Sources: reddit/r/LocalLLaMA

🟢 What's the status of non-CUDA inference? — score 12 Sources: reddit/r/LocalLLaMA

I got a reminder email from eBay, after quite a while, about an MI50 I had put on my watch list. Aside from needing to jury-rig a blower into the back and bootstrapping ROCm - how is it? In fact, what's inference for LLMs like for non-CUDA? I know that image-gen is veeeeery hit or miss (although Comfy

🟢 The moment your AI agent's memory becomes load-bearing is the moment you realise you never built it to be infrastructure. — score 6 Sources: reddit/r/AIAgents

No audit trail. No correction interface. No migration path. Just six months of accumulated context that everything downstream depends on and nobody fully understands anymore. When did your memory layer stop feeling like a feature and start feeling like a liability?

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 Alphabet announces $80B equity capital raise to expand AI infra and compute — score 21 Sources: hackernews

Business & Funding

🟢 I scraped over 2 million job postings across 100,000+ company career sites into a unified, daily-updated dataset. [P] — score 12 Sources: reddit/r/MachineLearning

Over the past few months, I've been working on a high-scale scraping pipeline to aggregate listings directly from company job boards and applicant tracking systems. Mapping over 100,000 distinct companies to their career pages turned out to be a massive engineering headache, but it's finally stable.

Research Papers

🟢 ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats — score 15 Sources: huggingface

Charts are a primary medium for conveying quantitative and relational information, yet systematically evaluating chart parsing models remains difficult. Existing benchmarks focus on narrow chart types and leave diagrammatic structures such as flowcharts and mind maps largely unaddressed, while model

Other Signals

🟢 Real-time multilingual ASR using rolling buffers and monolingual models [P] — score 36 Sources: reddit/r/MachineLearning

I built a routing-based approach to lightweight real-time multilingual ASR as part of my research at Gladia. The core problem was how multilingual models that accurately handle mid-conversation language switches are often too big for most local hardware and have poor accuracy. So rather than relying

🟢 Florida sues OpenAI and Sam Altman over AI risks — score 36 Sources: hackernews

🟢 WiML at ICML: waitlisted for travel funds [D] — score 31 Sources: reddit/r/MachineLearning

I'm presenting a poster there and have registration covered, but they are placing me on the waitlist for travel funds. As my travel depends on whether I get the travel grant, I need to get this off my mind: either invite me or just say no. I've been waiting forever for this, and now more waiting? Should I ask for

🟢 Building an AI assistant for a complex multi-repo backend system — what's the right approach? — score 28 Sources: reddit/r/AIAgents

I work on a distributed backend system split across multiple microservices in separate repos. Understanding how a failure propagates across services is non-trivial even for experienced team members. I've been using Claude Code with context files describing each service's role, key code paths, and go

🟢 Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro — score 12 Sources: reddit/r/LocalLLaMA

Llama benchmark results

|model|size|params|backend|ngl|threads|type_k|type_v|fa|test|t/s|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|qwen35moe 35B.A3B Q4_K - Medium|20.81 GiB|34.66 B|SYCL|99|1|q8_0|q8_0|1|pp512|977.40 ± 2.02|
|qwen35moe 35B.A3B Q4_K - Medium|20.81 GiB|34.66 B|SYCL|99|1|q8_0|q8_0|
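To put the pp512 figure in context, prompt-processing time is simply token count divided by throughput. The sketch below applies the 977.40 t/s result from the benchmark row above to a few prompt lengths. It assumes throughput stays constant and that "262k" means 262,144 tokens; both are simplifications, since pp512 measures a 512-token prompt and throughput usually drops as context grows.

```python
# pp512 throughput from the benchmark row above, in tokens per second.
PP_TOKENS_PER_SECOND = 977.40


def prefill_seconds(num_tokens: int, tokens_per_second: float = PP_TOKENS_PER_SECOND) -> float:
    """Estimated prompt-processing time under a constant-throughput assumption."""
    return num_tokens / tokens_per_second


# 262_144 is an assumed reading of the "262k" context window in the title.
for n in (512, 8_192, 262_144):
    print(f"{n:>7} tokens -> {prefill_seconds(n):8.1f} s")
```

Under these assumptions a 512-token prompt takes about half a second, and a full 262,144-token prompt takes roughly 268 seconds, or about four and a half minutes. Real long-context runs would likely be slower.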

Omitted 2 additional other signals items from the main section; see raw data and source-specific sections below.

📄 New Papers

Title	Category	Hotness	Link
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation	research_paper	9	Open
FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search	research_paper	3	Open
Can Predicted Dynamics Exist in the Physical World?	research_paper	3	Open
Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation	research_paper	3	Open
EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers	research_paper	3	Open
Position Paper: Post-Solve Robustness in Decision Engines: Feasible Regions and Smoothness Under Perturbations	cs.AI	0	Open
Emergent Collaborative Deliberation in Multi-Model AI Systems: A BFT-Derived Protocol for Epistemic Synthesis	cs.AI	0	Open
Deliberative Curation: A Protocol for Multi-Agent Knowledge Bases	cs.AI	0	Open
Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization	cs.AI	0	Open
Optimal Transport-based Permutation-Invariant Bayesian Optimization of Offshore Wind Farm Layouts	cs.AI	0	Open
MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution	cs.AI	0	Open
Universal Quantum Transformer	cs.AI	0	Open
Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs	cs.AI	0	Open
Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems	cs.AI	0	Open
On the evolution of the concept of probability as a mirror of the evolution of reason	cs.AI	0	Open

🏢 Lab Blog Posts

OpenAI: Codex is becoming a productivity tool for everyone
OpenAI: Our views on AI policy and political advocacy
OpenAI: Building the infrastructure for the Intelligence Age in Michigan

🐦 Twitter/X Highlights

Account	Tweet Summary
xai	Composer 2.5 is now available inside Grok Build. Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions. Post
AnthropicAI	Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial public offering. Read more: https://www.anthropic.com/news/confidential-draft-s1-sec Post
OpenAI	OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new way to build on Amazon Bedrock with OpenAI through the security, compliance, and governance workflows they already use. This is also the beginning of a broader expansion of OpenAI capabilities on AWS, inclu Post

Repeated From Recent Briefings

harry0703/MoneyPrinterTurbo — Generate high-definition short videos with one click using AI large language models. - first seen 2026-05-28
A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks - first seen 2026-05-28
What’s the actual focus in World Models right now? [R] - first seen 2026-06-01
farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
nesquena/hermes-webui — Hermes WebUI: The best way to use Hermes Agent from the web or from your phone! - first seen 2026-06-01
LVSA: Training-Free Sparse Attention for Long Video Diffusion - first seen 2026-06-01
supermemoryai/supermemory — Memory engine and app that is extremely fast, scalable. The Memory API for the AI era. - first seen 2026-06-01
anthropics/claude-code — Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands. - first seen 2026-05-29
EveryInc/compound-engineering-plugin — Official Compound Engineering plugin for Claude Code, Codex, Cursor, and more - first seen 2026-05-13
run-llama/liteparse — A fast, helpful, and open-source document parser - first seen 2026-05-29
... plus 226 more repeated items in processed data

AI Watchtower Briefing — 2026-06-02
