🔴 High Significance

Model Releases

🔴 Qwen Who? DiffusionGemma running at 1,500 tk/s on a Digital Pregnancy Test. - score 96 Sources: reddit/r/LocalLLaMA

First Doom, now DiffusionGemma 4. We are truly living in the future. Who even needs a new Qwen release anymore? /s (Satire - Shaq doesn’t actually make a digital pregnancy test capable of running diffusion-based LLMs) Credit to Obvious Plant for the original Shaq pregnancy test box (that I doctored

🔴 Gemma 4 Quadruple Release, 12B, 12B QAT, 26B-A4B QAT and 31B QAT Uncensored Heretics! - score 89 Sources: reddit/r/LocalLLaMA

gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic: Safetensors: https://huggingface.co/llmfan46/gemma-4-31B-it-qat-q4_0-unquantized-uncensored-heretic GGUF: [https://huggingface.co/llmfan46/gemma-4-3

🔴 Claude Fable is relentlessly proactive - score 70 Sources: hackernews

Developer Tools

🔴 We spent decades fixing software deployment. Why are we letting AI agents break it all over again? - score 94 Sources: reddit/r/AIAgents

I’ve been spending a lot of time setting up multi-agent workflows lately, and I can’t shake the feeling that we are aggressively re-inventing a bunch of structural problems that software engineering spent thirty years solving. It kinda feels like business bros are creating a problem so that they ca

🔴 AI agent bankrupted their operator while trying to scan DN42 - score 90 Sources: hackernews

🔴 hexo-ai/sia - SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task. - score 76 Sources: github_trending

🔴 Can you realistically start an automation business without a lot of money? - score 72 Sources: reddit/r/AIAgents

I've been thinking about getting into business automation, but most of the content I see makes it sound like you need a bunch of paid tools, subscriptions, software, ads, and a whole setup before you can even get started. For those of you who actually do automation for clients: Can someone start wit

Research Papers

🔴 HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers - score 82 Sources: huggingface · arxiv/cs.AI

Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is dri

🔴 From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion - score 75 Sources: huggingface

Multimodal image fusion aims to integrate complementary information from different modalities into a fused image that preserves rich local details while maintaining globally consistent appearance. Existing approaches build shared representations on 2D feature grids, which excel at modeling local str

Other Signals

🔴 What models you guys running on 8GB? 16GB VRAM? 24GB? 32GB? 48GB? - score 82 Sources: reddit/r/LocalLLaMA

And what are you using for kv cache and context? What kind of performance are you getting? What is your hardware? And what are you using your models for? I figure with how fast everything moves, it's worth asking once in a while to congeal our experiences.

🔴 New models released: Nex-N2 Pro 397B and Nex-N2 Mini 35B - score 75 Sources: reddit/r/LocalLLaMA

They are fine-tunes (FTs) of Qwen3.5, and the benchmarks look pretty good: https://huggingface.co/nex-agi/Nex-N2-mini https://huggingface.co/nex-agi/Nex-N2-Pro

🟡 Notable

Model Releases

🟡 I distilled my 12 year experience as a product manager and built a free skill that takes you from "I have an app idea" to a real plan and solid MVP - score 63 Sources: reddit/r/AIAgents

I'm a PM. 12 years, mostly zero-to-one. I built a free skill that does the part of app-building everyone skips and then regrets. It's called vibe-check. Open-source, drops into Claude, Codex, or Antigravity. It doesn't write your code. AI does that now. It does the harder thing that comes before the

🟡 EAGLE3 has landed in llama.cpp - score 61 Sources: reddit/r/LocalLLaMA

After half a year of development, EAGLE3 has been merged into llama.cpp. EAGLE3 is similar to MTP, but different: the helper model gets extra guidance from the main model instead of guessing completely on its own.
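
The summary above places EAGLE3 in the draft-and-verify (speculative decoding) family, where a cheap helper proposes tokens and the main model checks them. A minimal sketch of that general idea follows; it is not EAGLE3 itself and not llama.cpp's implementation. It assumes greedy decoding, and `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names made up for illustration. The extra guidance that EAGLE3's helper receives from the main model is not modeled here.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(target_next: NextToken, draft_next: NextToken,
                     context: List[Token], k: int = 4) -> List[Token]:
    """Run one greedy draft-and-verify step and return the tokens to append."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft_next(context + drafted))

    # 2. The target model checks each proposal in order. (A real engine would
    #    score all k positions in one batched pass; this loop is for clarity.)
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the target contributes one bonus token.
    accepted.append(target_next(context + accepted))
    return accepted


# Toy stand-ins (hypothetical): the target counts upward; the draft agrees
# except when the next value is a multiple of 5.
def toy_target(ctx: List[Token]) -> Token:
    return ctx[-1] + 1


def toy_draft(ctx: List[Token]) -> Token:
    nxt = ctx[-1] + 1
    return nxt if nxt % 5 else 0


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [1], k=4))  # [2, 3, 4, 5]
```

In this greedy variant the output always matches what the target model would have produced on its own; any speedup comes from how many drafted tokens are accepted per verification pass.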

🟡 is Gemini your main AI model today, or just a secondary option - score 61 Sources: reddit/r/AIAgents

I recently had a discussion with a friend who strongly prefers Gemini and Google products in general. His argument is that Google has access to massive amounts of data and arguably the best search engine in the world, so Gemini should have a significant advantage. My opinion and experience has been

🟡 PSA: Test your "threads" argument in llama.cpp (+80% performance in my case) - score 54 Sources: reddit/r/LocalLLaMA

When GPT-OSS 120B was released last year, I played around and tried to maximize its performance. One thing that many people pointed out was that for hybrid CPUs (Performance + Efficiency cores) you should use only the P-cores, via the "--threads" argument and taskset/affinity. Back then I set up that model

🟡 Anthropic apologizes for invisible Claude Fable guardrails - score 50 Sources: hackernews

Omitted 5 additional model releases items from the main section; see raw data and source-specific sections below.

Developer Tools

🟡 I put a hidden instruction in a document. My AI agent followed it. Here’s the repo. - score 50 Sources: reddit/r/AIAgents

Cloned a repo, ran an agent against a “research report,” watched it comply with instructions embedded in the document instead of summarizing it. The attack is in the repo. Run it yourself. Then run the protected version with Arc Gate and watch it get blocked. https://github.com/9hannahnine-jpg/vulne

🟡 @OpenAI: We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business - score 50 Sources: twitter_rss

We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:

🟡 @xai: Install the @sentry plugin and ask your agent to find and fix errors, analyze stack traces, and triage alerts - score 50 Sources: twitter_rss

🟡 @xai: Use the @vercel plugin to deploy to production, spin up sandboxes, or build apps with Shadcn. - score 50 Sources: twitter_rss

Research Papers

🟡 Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback - score 65 Sources: huggingface

Despite generating increasingly photorealistic images, text-to-image (T2I) models still exhibit localized, subtle, and structurally complex failures. Diagnosing these failures requires instance-level feedback that answers where a defect occurs, what type it is, why it is defective, and its importanc

🟡 ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages - score 45 Sources: huggingface · arxiv/cs.AI

Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where pa

Other Signals

🟡 Is Symbolic Regression still a thing, given LLMs' performance? [D] - score 69 Sources: reddit/r/MachineLearning

I've been teaching myself about Symbolic Regression (SR), which looks like a super exciting field. (A great intro resource below [1]). But then I was wondering: given LLMs' increasingly-growing power in generating code, which is in a way very similar to Symbolic Regression (or of course, even dire

🟡 Having some fun with LMX-Omni-52B-Halo in Open WebUI - score 68 Sources: reddit/r/LocalLLaMA

🟡 @GoogleDeepMind: Pinned: We’re teaming up @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in - score 50 Sources: twitter_rss

Pinned: We’re teaming up @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in advance. ⚽

🟡 Huawei Released openPangu 2.0 (Will open source on June 30) - score 46 Sources: reddit/r/LocalLLaMA

At the Huawei Developer Conference (HDC 2026) held on June 12, Richard Yu, Executive Director of Huawei, officially launched the brand-new, open-source Pangu large model, openPangu 2.0. The model is fully adapted to the HarmonyOS ecosystem and has achieved deep optimization and performance breakthrou

🟡 Post-docs in ML [D] - score 44 Sources: reddit/r/MachineLearning

Are there any websites listing post-doc job openings in machine learning? Currently I'm using LinkedIn to search for these. When I was a math post-doc, everyone used "MathJobs.org" to find jobs. Is there a similar website for machine learning? Thanks.

🟢 Incremental

Model Releases

🟢 Are AI agents making traditional software interfaces obsolete? - score 33 Sources: reddit/r/AIAgents

i was reading an enterprise tech trend report for 2026 and it got me thinking about how quickly the traditional SaaS GUI (graphical user interface) is losing its utility. for the last fifteen years, software design has been about building pretty, siloed dashboards. we’ve built our entire workflows a

🟢 Claude Fable 5: mid-tier results on coding tasks - score 30 Sources: hackernews

🟢 🚀 PP-OCRv6 is officially released! - score 25 Sources: reddit/r/LocalLLaMA

🔥 PaddleOCR’s new OCR model series scales from 1.5M to 34.5M parameters, bringing stronger accuracy, faster inference, and broader deployment options, from browsers and edge devices to servers. 📊 What’s new: 🔸 Tiny / Small / Medium models: 1.5M, 7.7M, 34.5M params 🔸 +4.9% detection accuracy and +5.1% r

🟢 Has anyone noticed that the behavior of the Kimi model has changed? - score 11 Sources: reddit/r/LocalLLaMA

I have been using Kimi K2.6 in Kimi Code for a while. Although it can complete most tasks, it often requires a long time to think and try. Today the model's CoT has become very short and concise, and it feels much improved on coding tasks compared to before. I heard that GLM 5.2 is also about to be r

Developer Tools

🟢 mlflow/mlflow - The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data. - score 31 Sources: github_trending

🟢 always-further/nono - Capability-based agent runtime with fine-grained policies. Brokering access directly within the agent's operating context, with zero setup and zero latency - score 18 Sources: github_trending

🟢 How would you start selling automations? Where would you even begin? - score 17 Sources: reddit/r/AIAgents

I’m getting into building automations for businesses, but I’m a bit stuck on the first step. Like, I can imagine building solutions for repetitive work, internal processes, data entry, reporting, customer stuff, etc… but I don’t really know how people actually start selling this. So I’m curious: If

🟢 anthropics/claude-agent-sdk-python - score 13 Sources: github_trending

🟢 Building an Open Source Edge Semantic Cache for LLMs in Rust/WASM – Sanity check on the architecture? [D] - score 12 Sources: reddit/r/MachineLearning

Hey everyone, I am planning out a new open-source infrastructure project and want to get some brutal feedback on the architecture and use-case validity from people running high volume LLM workloads in production. The Problem: Python-based proxies/gateways introduce too much latency overhead for

Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.

Research Papers

🟢 Revisiting Articulated Parts Perception in Robot Manipulation - score 20 Sources: huggingface

We are surrounded by various objects with movable, articulated parts, e.g., box, handle, door. An accurate and generalizable perception of articulated parts is essential to enhance robotic manipulation capabilities. Building on this need, recent efforts in articulated parts perception have followed

🟢 Leveraging Morphology for Historical Script Metrological Analysis - score 20 Sources: huggingface

Advances in handwritten text recognition have enabled large-scale transcription of historical documents, but still provide limited access to interpretable visual measurements for paleography, the study of historical scripts. In this paper, our main insight is that morphological script analysis, in p

Other Signals

🟢 Best LLM for smut stories - score 39 Sources: reddit/r/LocalLLaMA

I'm trying to find the best LLM for writing erotica/smut, but there doesn't seem to be that many good models right now. I'm using Cydonia 24B v4.3, which gives great results, but I was wondering if there were even better models that could fit into 16GB VRAM with quantization. Sadly there doesn't see

🟢 Spent $3 running 4x4090 benchmarks for llama 3 70b (exl2 vs gguf). exl2 generation speed is kind of ridiculous. - score 33 Sources: reddit/r/AIAgents

Hey guys, so I wanted to run some heavy benchmarks comparing GGUF and EXL2 for Llama-3-70B on a 4x4090 setup. Single-card data is everywhere, but 4-way tensor-parallel stats are hard to find. The problem is I don't own a 4x4090 rig and normally renting one would immediately eat into my monthly budget

🟢 LLM context compression at 16x beats KV cache - score 32 Sources: reddit/r/LocalLLaMA

🟢 MICCAI 2026 Results [D] - score 31 Sources: reddit/r/MachineLearning

Results are almost here. Good luck to everyone waiting for the final decision 🙂

🟢 Why hasn't any mainstream game integrated LLMs into NPCs yet? - score 18 Sources: reddit/r/LocalLLaMA

Tech demos exist, but nothing's actually shipped in a real game. Is it a latency problem, or are game studios just not interested?

Omitted 3 additional other signals items from the main section; see raw data and source-specific sections below.

| Repo | Description | Stars Today | Language |
| --- | --- | --- | --- |
| hexo-ai/sia | SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task. | 199 | python |
| mlflow/mlflow | The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data. | 24 | python |
| always-further/nono | Capability-based agent runtime with fine-grained policies. Brokering access directly within the agent's operating context, with zero setup and zero latency | 12 | rust |
| anthropics/claude-agent-sdk-python | | 10 | python |

📄 New Papers

| Title | Category | Hotness | Link |
| --- | --- | --- | --- |
| HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers | research_paper | 20 | Open |
| From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion | research_paper | 10 | Open |
| Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback | research_paper | 7 | Open |
| ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs | cs.AI | 0 | Open |
| Arbor: Tree Search as a Cognition Layer for Autonomous Agents | cs.AI | 0 | Open |
| Strategic Decision Support for AI Agents | cs.AI | 0 | Open |
| Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation | cs.AI | 0 | Open |
| PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation | cs.AI | 0 | Open |
| "Did you lie?" Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms | cs.AI | 0 | Open |
| TrajGenAgent: A Hierarchical LLM Agent for Human Mobility Trajectory Generation | cs.AI | 0 | Open |
| Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents | cs.AI | 0 | Open |
| From AGI to ASI | cs.AI | 0 | Open |
| Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System | cs.AI | 0 | Open |
| Definitional alignment before capability alignment: a Design-Science framework for adjudicating claims about AGI | cs.AI | 0 | Open |
| The Theory of Mind Utility: Formal Specification of a Mentalizing Mechanism | cs.AI | 0 | Open |

🏢 Lab Blog Posts

🐦 Twitter/X Highlights

| Account | Tweet Summary |
| --- | --- |
| AnthropicAI | We’re launching Claude Corps, a national fellowship program matching people early in their careers with US nonprofits. We'll teach 1,000 people to use Claude, and pay them to use AI to advance their hosts’ missions. https://www.anthropic.com/claude-corps |
| OpenAI | We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset: |
| GoogleDeepMind | Pinned: We’re teaming up @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in advance. ⚽ |
| GoogleDeepMind | When millions of AI agents interact with each other, new collective behaviors can emerge. 🌐 Together with @schmidtsciences, @coop_ai, @ARIA_research and supported by @GoogleOrg, we’re launching a $10M research fund to help understand how AI systems behave as a group. → https://goo.gle/3Si6rCl |
| xai | Install the @sentry plugin and ask your agent to find and fix errors, analyze stack traces, and triage alerts |
| xai | Use the @vercel plugin to deploy to production, spin up sandboxes, or build apps with Shadcn. |
| simonw | After two days with Claude Fable 5 the best way I can describe it is "relentlessly proactive" - here's an example where I dropped in a screenshot of a bug and it span up custom CORS Python servers and used pyobjc-framework-Quartz to capture screenshots https://simonwillison.net/2026/Jun/11/fable-is- |
| simonw | New Datasette release: 1.0a33, which finally documents the ?_extra= JSON API mechanism and brings it to the row and query pages in addition to the table pages (Most of the code in this release was built with the help of Claude Fable 5) https://datasette.io/blog/2026/api-extras/ |

Repeated From Recent Briefings