🔴 High Significance
Model Releases
🔴 Opencode you naughty minx - score 96
Sources: reddit/r/LocalLLaMA
Man, AI agents getting pretty crazy these days. :) (local, I just decided to try to get an orchestrator in there, when Qwen and Gemma aren't up to it.)
🔴 I've shipped 3 products this year. None of them have users. Here's my problem. - score 79
Sources: reddit/r/AIAgents
I keep shipping products then stalling right before marketing. Anyone else break this pattern? I've noticed a recurring issue in my own work: I can build, design, and ship a product all the way to launch-ready, but when it's time to actually activate (cold outreach, Reddit posts, cold email sequenc
🔴 Dynamically allocating compute budget to a hard set of problems and evolving the sections with Qwen-35B-A3B gets you near GPT-5.4-xHigh on HLE - score 71
Sources: reddit/r/LocalLLaMA
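The post's actual method isn't in this excerpt. One common way to "allocate compute to hard problems" is to sample a few answers per problem, treat disagreement between them as a difficulty signal, and spend the rest of the budget on the least-settled problems before taking a majority vote. A minimal sketch under that assumption; `sample` stands in for a model call, and all names here are hypothetical:

```python
from collections import Counter
from typing import Callable, Dict, List


def agreement(answers: List[str]) -> float:
    """Share of samples matching the most common answer (1.0 = unanimous)."""
    return Counter(answers).most_common(1)[0][1] / len(answers)


def allocate_samples(problems: List[str], sample: Callable[[str], str],
                     initial: int, total_budget: int) -> Dict[str, List[str]]:
    """Spend a fixed sampling budget, routing extra samples to the least-agreed problems."""
    if initial < 1 or total_budget < initial * len(problems):
        raise ValueError("budget too small for the initial round")
    answers = {p: [sample(p) for _ in range(initial)] for p in problems}
    for _ in range(total_budget - initial * len(problems)):
        # Low agreement is used here as a cheap proxy for "hard".
        hardest = min(problems, key=lambda p: agreement(answers[p]))
        answers[hardest].append(sample(hardest))
    return answers


def majority_answer(answers: List[str]) -> str:
    """Final answer by majority vote over whatever samples a problem received."""
    return Counter(answers).most_common(1)[0][0]
```

Easy problems stop receiving samples as soon as their answers agree, so the budget concentrates where the model is uncertain.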
Developer Tools
🔴 I'm done paying for LLMs until they learn token efficiency - score 93
Sources: reddit/r/AIAgents
Watching my agent spin in circles, re-explaining the same steps over and over before maybe doing something useful. I keep telling it to use the "/caveman full" skill (short, direct, no fluff) and it just ignores me. More verbose walls of text. More wasted tokens. Why aren't these models trained for age
🔴 joeseesun/qiaomu-anything-to-notebooklm - Claude Skill: Multi-source content processor for NotebookLM. Supports WeChat articles, web pages, YouTube, PDF, Markdown, search queries → Podcast/PPT/MindMap/Quiz etc. - score 79
Sources: github_trending
Infrastructure & Compute
🔴 internlm/Intern-S2-Preview · Hugging Face - score 79
Sources: reddit/r/LocalLLaMA
We introduce Intern-S2-Preview, an efficient 35B scientific multimodal foundation model. Beyond conventional parameter and data scaling, Intern-S2-Preview explores task scaling: increasing the difficulty, diversity, and coverage of scientific tasks to further unlock model
Other Signals
🔴 I believe there are entire companies right now under AI psychosis - score 90
Sources: hackernews
🔴 Backlash against Arxiv's proposed 1 year ban is genuinely perplexing. [D] - score 81
Sources: reddit/r/MachineLearning
Anyone else surprised at the enormous backlash against arXiv's proposed 1-year ban for authors and coauthors publishing papers with hallucinated references and other obvious LLM/GenAI artifacts? [https://x.com/tdietterich/status/2055000956144935055](https://x.com/tdietterich/status/2055000
🔴 Show HN: Watch a neural net learn to play Snake - score 70
Sources: hackernews
🟡 Notable
Model Releases
🟡 Orthrus-Qwen3-8B: up to 7.8× tokens/forward on Qwen3-8B, frozen backbone, provably identical output distribution - score 69
Sources: reddit/r/LocalLLaMA · hackernews
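How Orthrus achieves its guarantee isn't described here. For context, the standard way a drafting scheme keeps a frozen model's output distribution unchanged is the speculative-sampling accept/reject rule: a drafted token x is kept with probability min(1, p(x)/q(x)), where p is the target model's distribution and q the drafter's; on rejection, a replacement is drawn from the normalized residual max(0, p − q). The emitted token is then distributed exactly as p. A single-position sketch over toy distributions (the generic technique, not necessarily Orthrus's; names are illustrative):

```python
import random
from typing import Dict

Dist = Dict[str, float]  # token -> probability


def verify_token(p: Dist, q: Dist, drafted: str, rng: random.Random) -> str:
    """Return the token to emit for one drafted position.

    p: target-model probabilities, q: draft probabilities, drafted: a token sampled from q.
    """
    accept_prob = min(1.0, p.get(drafted, 0.0) / q[drafted])
    if rng.random() < accept_prob:
        return drafted
    # Rejected (only possible when p(drafted) < q(drafted)): resample from max(0, p - q).
    residual = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q)}
    r = rng.random() * sum(residual.values())
    for token, weight in sorted(residual.items()):
        r -= weight
        if r < 0:
            return token
    return max(residual, key=residual.get)  # guards against float round-off
```

In a multi-token setting the check runs position by position over the drafted block and stops at the first rejection, which is where the "tokens per forward" speedup comes from.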
🟡 KDD 2026 Cycle 2 Results [D] - score 69
Sources: reddit/r/MachineLearning
Results for the research track have been released.
🟡 [FOUNDING] SupraLabs - real open-source AI models for you! - score 54
Sources: reddit/r/LocalLLaMA
Hey r/LocalLLaMA! We founded SupraLabs, and it's huge! What we do: We train, finetune and explore small models with good results to revolutionize small AI models by
🟡 @xai: You can now use your @grok subscription inside @NousResearch Hermes Agent. http://x.ai/news/grok-hermes - score 50
Sources: twitter_rss
Developer Tools
🟡 Your AI agent says "transferring you to a human" and then... nothing happens. Here's the pattern that actually fixes this. - score 64
Sources: reddit/r/AIAgents
I made a YouTube video about the most common failure point I see in WhatsApp AI deployments, and it's almost never discussed. Would love to share the topic and read your thoughts on the subject. The bot tells the customer "I'll connect you with a human agent." The customer waits. No one comes. They
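The post's actual fix is cut off in this excerpt. A hypothetical sketch of one common guard: the bot only tells the customer they are being transferred after a human queue has actually accepted the conversation, and otherwise says plainly that nobody is available. `request_human` stands in for whatever helpdesk call a deployment uses; all names are illustrative:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HandoffResult:
    ticket_id: Optional[str]    # None when no human queue accepted the conversation
    eta_minutes: Optional[int]


def handoff_message(request_human: Callable[[str], HandoffResult],
                    conversation_id: str) -> str:
    """Build the customer-facing message only after the handoff really happened."""
    try:
        result = request_human(conversation_id)
    except Exception:
        # A failed helpdesk call is treated the same as "nobody accepted".
        result = HandoffResult(ticket_id=None, eta_minutes=None)
    if result.ticket_id is None:
        # Don't promise a human who isn't coming; a real system would also queue a follow-up.
        return ("No agent is available right now. Your conversation has been saved "
                "and our team will reply here as soon as someone is free.")
    eta = f" (about {result.eta_minutes} min)" if result.eta_minutes is not None else ""
    return f"You're now in our team's queue, ticket {result.ticket_id}{eta}."
```

The design choice is ordering: confirm the side effect first, then announce it, so the message can never get ahead of the system.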
🟡 github/awesome-copilot - Community-contributed instructions, agents, skills, and configurations to help you make the most of GitHub Copilot. - score 61
Sources: github_trending
Infrastructure & Compute
🟡 ROCm with PyTorch and PyTorch Lightning seems to still suck for research [D] - score 56
Sources: reddit/r/MachineLearning
So I asked about people's experiences with ROCm in a post a few weeks ago https://www.reddit.com/r/MachineLearning/comments/1t6cng3/rocm_status_in_mid_2026_d/ I actually went and procured an RX 7900 XTX
🟡 Are the rich-RAM/poor-GPU people wrong here? - score 46
Sources: reddit/r/LocalLLaMA
Hello guys, I know everyone has their own definition of local models, but I see two "reasonable" types of frontier local models: a dense one that barely fits in 32GB or 24GB of GPU memory for the most "reasonable" GPU-wealthy guys, and a MoE in the 100B-parameter range; the ~100B-parameter ones can be run on hybri
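The sizing behind this split is mostly arithmetic: weight memory ≈ parameter count × bits per weight ÷ 8. A rough sketch covering the weights only; KV cache, activations and runtime overhead come on top, and real quantization formats usually spend somewhat more than their nominal bit width:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


for name, params, bits in [("dense 32B @ 4-bit", 32, 4),
                           ("dense 32B @ 8-bit", 32, 8),
                           ("MoE 100B @ 4-bit", 100, 4)]:
    print(f"{name}: ~{weight_gb(params, bits):.0f} GB")
```

A 4-bit 32B dense model (~16 GB) leaves room for context on a 24 GB card, while a 4-bit 100B MoE (~50 GB) does not fit on one consumer GPU. Because an MoE only activates a subset of its experts per token, keeping most expert weights in system RAM is still usable, which is the hybrid setup the post describes.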
Research Papers
🟡 WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild - score 65
Sources: huggingface
Using multimodal foundation models to analyze table images is a high-value yet challenging application in consumer and enterprise scenarios. Despite its importance, current evaluations rely largely on structured-text tables or clean rendered images, leaving the visual complexity of in-the-wild table
🟡 Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image - score 60
Sources: huggingface · arxiv/cs.AI
Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off: geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. In contrast, proxy-based models
🟡 Aligning Latent Geometry for Spherical Flow Matching in Image Generation - score 50
Sources: huggingface
Latent flow matching for image generation usually transports Gaussian noise to variational autoencoder latents along linear paths. Both endpoints, however, concentrate in thin spherical shells, and a Euclidean chord leaves those shells even when preprocessing aligns their radii. By decomposing each
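The abstract's geometric point is easy to check numerically. In d dimensions a standard Gaussian sample has norm close to √d, so samples sit near a thin shell; the midpoint of a straight line between two such samples has norm closer to √(d/2), well inside the shell, even though both endpoints share the same radius. Spherical interpolation (slerp) of the directions, with the radius interpolated separately, stays on the shell. This only illustrates the problem using two Gaussian samples as stand-ins for the endpoints; it is not the paper's method, which the excerpt cuts off. Pure-Python sketch:

```python
import math
import random
from typing import List

Vec = List[float]


def norm(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))


def lerp(a: Vec, b: Vec, t: float) -> Vec:
    """Straight (Euclidean chord) interpolation, as in standard linear flow matching paths."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a: Vec, b: Vec, t: float) -> Vec:
    """Interpolate directions on the unit sphere and radii linearly, so the path
    stays on (or between) the two endpoints' shells."""
    na, nb = norm(a), norm(b)
    cos = sum(x * y for x, y in zip(a, b)) / (na * nb)
    omega = math.acos(max(-1.0, min(1.0, cos)))
    if omega < 1e-8:  # nearly parallel: fall back to the chord
        return lerp(a, b, t)
    s = math.sin(omega)
    wa, wb = math.sin((1 - t) * omega) / s, math.sin(t * omega) / s
    radius = (1 - t) * na + t * nb
    # wa*u + wb*v has unit norm for unit vectors u, v separated by angle omega.
    return [radius * (wa * x / na + wb * y / nb) for x, y in zip(a, b)]
```

With d = 4096, the chord's midpoint lands at roughly 1/√2 of the shell radius, while the slerp midpoint keeps the average of the two endpoint radii.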
Other Signals
🟡 Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard! - score 62
Sources: reddit/r/LocalLLaMA
Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard! little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2), and now land above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%). I didn't expect the scaffold-model gap from Polyglot to hold on
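The ±3.2 matters when reading "land above". Treating it as a symmetric interval (the post doesn't say whether it is a standard error or a confidence interval), 24.6 ± 3.2 spans 21.4% to 27.8%: Gemini 2.5 Pro's 19.6% falls below it, but Qwen3-Coder-480B's 23.9% sits inside it, so that second gap is within the reported uncertainty. The other two scores carry their own error bars, which the excerpt doesn't give. A quick check:

```python
def within_interval(score: float, center: float, half_width: float) -> bool:
    """True if score lies inside center ± half_width (assumes a symmetric interval)."""
    return center - half_width <= score <= center + half_width


little_coder = (24.6, 3.2)  # little-coder × Qwen3.6-35B-A3B, as reported in the post
print(within_interval(23.9, *little_coder))  # Qwen3-Coder-480B on Terminus 2 -> True
print(within_interval(19.6, *little_coder))  # Gemini 2.5 Pro on Gemini CLI -> False
```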
🟡 Frontier AI has broken the open CTF format - score 50
Sources: hackernews
🟢 Incremental
Model Releases
🟢 Luce Megakernel: Why is nobody talking about this? - score 29
Sources: reddit/r/LocalLLaMA
Everyone has been talking about Luce DFlash and PFlash. I just came across their megakernel, and it seems it was released along with DFlash and PFlash. It seems to give them 1.8x greater speed with much better power efficiency on NVIDIA GPUs, comparable to the efficiency achieved on Apple silicon! How's
🟢 [R] Which LLMs are actually best for bleeding-edge Linux/ML debugging workflows in 2026? [R] - score 6
Sources: reddit/r/MachineLearning
I'm trying to optimize an AI workflow for bleeding-edge Linux/ML debugging (Arch/CachyOS, CUDA, Python, unsloth, etc.). Current stack: - Claude = deep reasoning/mastermind - Gemini 3.1 Pro = execution/logistics - Perplexity = retrieval Main problem: Gemini often gives high-friction or impractical
Developer Tools
🟢 One thing that feels underdiscussed in AI right now: - score 29
Sources: reddit/r/AIAgents
We keep treating memory like a UX feature when it's really becoming an operational problem. The hard part is not "can the model remember stuff?" The hard part is: * what happens when the memory is wrong * how you trace where a belief came from * how stale context gets replaced * how you migrate syst
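Treating memory as an operational problem mostly means keeping provenance and history instead of overwriting values in place. A hypothetical sketch of that idea (not any particular product's design): every entry records where the belief came from and when, and a newer write for the same key supersedes rather than deletes the old one, so a wrong memory can be traced back to its source:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class MemoryEntry:
    key: str                  # what the belief is about, e.g. "user.timezone"
    value: str
    source: str               # where it came from: message id, tool call, document
    recorded_at: datetime
    superseded_by: Optional[int] = None  # index of the entry that replaced this one


@dataclass
class MemoryStore:
    entries: List[MemoryEntry] = field(default_factory=list)
    current: Dict[str, int] = field(default_factory=dict)  # key -> index of live entry

    def write(self, key: str, value: str, source: str,
              when: Optional[datetime] = None) -> int:
        when = when or datetime.now(timezone.utc)
        self.entries.append(MemoryEntry(key, value, source, when))
        new_idx = len(self.entries) - 1
        old_idx = self.current.get(key)
        if old_idx is not None:
            self.entries[old_idx].superseded_by = new_idx  # keep history, don't delete
        self.current[key] = new_idx
        return new_idx

    def read(self, key: str) -> Optional[MemoryEntry]:
        idx = self.current.get(key)
        return None if idx is None else self.entries[idx]

    def history(self, key: str) -> List[Tuple[str, str]]:
        """Every value ever held for key, oldest first, with its source."""
        return [(e.value, e.source) for e in self.entries if e.key == key]
```

Because old entries stay in place, a schema migration or a bad-memory investigation can replay `history()` rather than guessing what the agent believed at the time.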
🟢 i asked 23 companies how they actually test their AI agents before shipping. the answers genuinely scared me. - score 29
Sources: reddit/r/AIAgents
spent the last 3 weeks DMing CS leads, ops managers, and PMs at companies running AI agents in production. just one question: "how do you know your agent works before it goes live?" here's what i found: 17 out of 23 said some version of "we just ship it and watch slack for complaints" 4 used a sprea
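The alternative to "ship it and watch Slack" can start very small: a fixed list of representative prompts, a pass/fail check per case, and a pass-rate gate before release. A hypothetical sketch, where `agent` is any callable from prompt to reply and the checks are simple predicates; real suites would add multi-turn cases and graded judgments:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the agent's reply is acceptable


def run_suite(agent: Callable[[str], str], cases: List[Case],
              min_pass_rate: float = 0.95) -> Tuple[bool, float, List[Tuple[str, str]]]:
    """Run every case; return (release_ok, pass_rate, [(case name, bad reply), ...])."""
    if not cases:
        raise ValueError("an empty suite proves nothing")
    failures = []
    for case in cases:
        try:
            reply = agent(case.prompt)
            ok = case.check(reply)
        except Exception as exc:  # a crash counts as a failure, not a skipped case
            reply, ok = f"<error: {exc}>", False
        if not ok:
            failures.append((case.name, reply))
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate >= min_pass_rate, pass_rate, failures
```

Running this in CI on every prompt or model change turns the "spreadsheet" approach a few respondents described into a repeatable gate.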
🟢 How did you handle the team conversation when you rolled out AI customer support? - score 29
Sources: reddit/r/AIAgents
We're planning to roll out AI-assisted support in Q3. Technically the plan is solid. The part I'm not sure I'm handling well is the team communication. I have 11 support agents. The honest projection is that AI will handle 60–70% of ticket volume over time. I don't plan to do layoffs; the growth pl
🟢 PostHog/posthog - 🦔 PostHog is an all-in-one developer platform for building successful products. We offer product analytics, web analytics, session replay, error tracking, feature flags, experimentation, surveys, data warehouse, a CDP, and an AI product assistant to help debug your code, ship features faster, and keep all your usage and customer data in one stack. - score 21
Sources: github_trending
🟢 What's in a GGUF, besides the weights - and what's still missing? - score 4
Sources: reddit/r/LocalLLaMA
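Besides the tensor data, a GGUF file carries a typed key/value metadata section (architecture, context length, tokenizer vocabulary and so on) followed by a table of tensor names, shapes, types and offsets. A minimal reader for the header and metadata, following the layout in the GGUF spec from the ggml project as we understand it: "GGUF" magic, uint32 version, uint64 tensor count, uint64 key/value count, then each key as a length-prefixed UTF-8 string, a uint32 type id and the value, all little-endian. It assumes version 2 or later (v1 used 32-bit counts) and stops before the tensor-info table; the `gguf` Python package shipped with llama.cpp is the more complete option:

```python
import struct
from typing import Any, BinaryIO, Dict

# Scalar value type ids from the GGUF spec -> little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # uint64 byte length, then UTF-8 bytes (no terminator)
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")  # uint32 element type, then uint64 element count
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Parse the GGUF header and metadata key/values (not the tensor-info table)."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled here")
    tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    meta: Dict[str, Any] = {"__version__": version, "__tensor_count__": tensor_count}
    for _ in range(kv_count):
        key = _read_string(f)
        meta[key] = _read_value(f, _read(f, "<I"))
    return meta
```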
Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟢 Finding the 4x 3090 Sweet Spot - score 21
Sources: reddit/r/LocalLLaMA
In another post someone asked me about the power draw of the 4x 3090 setup, so I'm sharing a full test I conducted to understand the efficiency curve. Used this [blog
🟢 Struggling with Overfitting on Medical Imaging Task [D] - score 19
Sources: reddit/r/MachineLearning
Hi everyone, I'm working on a 2-class classification problem (LCA vs. RCA coronary arteries) using 2D X-ray angiograms. I'm currently stuck in a cycle of extreme overfitting and could use some advice on my training strategy. The Setup: * Dataset: Small (~900 training frames from ~300 unique DICOMs
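With ~900 frames drawn from ~300 DICOMs, frames from the same sequence are near-duplicates. If they land on both sides of a train/validation split, validation scores overstate generalization and make overfitting hard to diagnose. The excerpt doesn't say how the split was made, so this is a check worth running rather than a diagnosis. A minimal group-aware split (scikit-learn's `GroupShuffleSplit` and `GroupKFold` do the same job); the frame and DICOM ids are hypothetical, and grouping by patient is stricter still if patients have several DICOMs:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_split(frame_ids: Sequence[str], group_of: Dict[str, str],
                val_fraction: float = 0.2, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split frames so every frame from a given DICOM (group) lands on the same side."""
    by_group = defaultdict(list)
    for fid in frame_ids:
        by_group[group_of[fid]].append(fid)
    groups = sorted(by_group)            # deterministic order before shuffling
    random.Random(seed).shuffle(groups)
    n_val = max(1, round(len(groups) * val_fraction))
    val_groups = set(groups[:n_val])
    train = [f for f in frame_ids if group_of[f] not in val_groups]
    val = [f for f in frame_ids if group_of[f] in val_groups]
    return train, val
```

This ignores class balance; with only two classes it is worth confirming that both LCA and RCA appear in the validation groups.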
Other Signals
🟢 Can a 5090 with qwen3.6 achieve > 3,000 tok/s ? bring your pitchforks (open-dllm) - score 38
Sources: reddit/r/LocalLLaMA
So, background: these people (Fred Zhangzhi Peng, Shuibai Zhang, Alex Tong) worked on converting AR -> diffusion (it's already working from older models). [https://oval-shell-31c.notion.site/Open-dLLM-Open-Diffusion-Large-Language-Model-25e03bf6136480b7a4ebe3d53be9f68a](https://oval-shell-31c.noti
🟢 PINN is predicting trivial solution for stiff ODE [D] - score 31
Sources: reddit/r/MachineLearning
I am learning physics-informed neural networks. Currently, I am solving a simple second-order ODE (damped harmonic oscillator). The equation is m*d²y/dt² + mu*dy/dt + k*y = 0 (initial conditions: y(0) = 1, y'(0) = 0). I managed to draft some code. The code works for k values up to 50. However, when I increased the valu
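For this setup the exact solution is known, which gives the PINN a reference to be checked against at any k. Writing δ = mu/(2m) and ω0 = √(k/m), the underdamped case (δ < ω0) has y(t) = e^(−δt)·(cos ωt + (δ/ω)·sin ωt) with ω = √(ω0² − δ²), which satisfies y(0) = 1 and y'(0) = 0. Two hedged observations, not a diagnosis of the poster's code: y ≡ 0 satisfies the ODE exactly, so if the initial-condition loss is weighted too lightly against the residual loss the optimizer can settle there; and larger k means more oscillations in the same time window plus a residual dominated by the k·y term, which is why rescaling time by ω0 or reweighting the loss terms is a common first thing to try. A reference-solution sketch:

```python
import math


def damped_oscillator(t: float, m: float, mu: float, k: float) -> float:
    """Exact y(t) for m*y'' + mu*y' + k*y = 0 with y(0)=1, y'(0)=0 (underdamped case only)."""
    delta = mu / (2 * m)          # decay rate
    w0 = math.sqrt(k / m)         # undamped natural frequency
    if delta >= w0:
        raise ValueError("not underdamped; use the critically/overdamped solution instead")
    w = math.sqrt(w0 ** 2 - delta ** 2)  # damped frequency
    return math.exp(-delta * t) * (math.cos(w * t) + (delta / w) * math.sin(w * t))
```

Evaluating this on the PINN's collocation grid and plotting both curves shows immediately whether the network has collapsed to zero or is only missing the high-frequency part.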
🟢 AllenAI has been iterating on their MolmoAct2 models for robotics - score 12
Sources: reddit/r/LocalLLaMA
r/AllenAI is cooking with MolmoAct2, a 5B vision-language-action model for robot control. They keep releasing new fine-tunes on different kinds of robotics datasets, including (but not limited to, and they keep releasing new ones): * https://huggingface.co/allenai/MolmoAct2-LIBERO - general robotics
🟢 Someone Shared a Real Monet Painting as AI and Asked for Critiques - score 10
Sources: hackernews
🟢 GetMCP: Zero Trust for AI Agents - score 0
Sources: reddit/r/AIAgents
Just shipped v0.1.0 of something I've been building. Sharing because I haven't seen anyone solve this end-to-end as a self-hostable thing. The problem. AI agents (Claude, ChatGPT, Cursor, in-house bots) are starting to make real calls into production APIs. Most companies are handing them a single lo
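This is not GetMCP's API; the excerpt stops before describing its design. As a generic illustration of the alternative to handing every agent one long-lived key: issue each agent its own short-lived token that lists the operations it may perform, and check signature, expiry and scope on every call. A stdlib-only sketch with hypothetical names; a production system would use a vetted token format and library rather than this hand-rolled one:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Iterable, Optional


def issue_token(secret: bytes, agent_id: str, scopes: Iterable[str],
                ttl_s: float, now: Optional[float] = None) -> str:
    """Mint a short-lived, per-agent token listing exactly which operations it may call."""
    now = time.time() if now is None else now
    claims = {"agent": agent_id, "scopes": sorted(scopes), "exp": now + ttl_s}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig


def authorize(secret: bytes, token: str, operation: str,
              now: Optional[float] = None) -> bool:
    """Check signature, then expiry and scope; deny anything malformed."""
    now = time.time() if now is None else now
    try:
        encoded, sig = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:  # missing separator or bad base64 (binascii.Error is a ValueError)
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):  # constant-time comparison
        return False
    claims = json.loads(payload)
    return claims["exp"] > now and operation in claims["scopes"]
```

Because each token names one agent and a narrow scope, a leaked or misbehaving agent credential expires quickly and cannot reach operations it was never granted.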
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| joeseesun/qiaomu-anything-to-notebooklm | Claude Skill: Multi-source content processor for NotebookLM. Supports WeChat articles, web pages, YouTube, PDF, Markdown, search queries → Podcast/PPT/MindMap/Quiz etc. | 438 | python |
| github/awesome-copilot | Community-contributed instructions, agents, skills, and configurations to help you make the most of GitHub Copilot. | 105 | python |
| PostHog/posthog | 🦔 PostHog is an all-in-one developer platform for building successful products. We offer product analytics, web analytics, session replay, error tracking, feature flags, experimentation, surveys, data warehouse, a CDP, and an AI product assistant to help debug your code, ship features faster, and keep all your usage and customer data in one stack. | 24 | python |
| NVIDIA-NeMo/NeMo | A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech) | 5 | python |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild | research_paper | 8 | Open |
| Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image | research_paper | 4 | Open |
| Aligning Latent Geometry for Spherical Flow Matching in Image Generation | research_paper | 4 | Open |
| Mixed Integer Goal Programming for Personalized Meal Optimization with User-Defined Serving Granularity | cs.AI | 0 | Open |
| A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology | cs.AI | 0 | Open |
| Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems | cs.AI | 0 | Open |
| PolitNuggets: Benchmarking Agentic Discovery of Long-Tail Political Facts | cs.AI | 0 | Open |
| Conditional Attribute Estimation with Autoregressive Sequence Models | cs.AI | 0 | Open |
| Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use | cs.AI | 0 | Open |
| SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks | cs.AI | 0 | Open |
| Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning | cs.AI | 0 | Open |
| SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration | cs.AI | 0 | Open |
| ChromaFlow: A Negative Ablation Study of Orchestration Overhead in Tool-Augmented Agent Evaluation | cs.AI | 0 | Open |
| Modeling Bounded Rationality in Drug Shortage Pharmacists Using Attention-Guided Dynamic Decomposition | cs.AI | 0 | Open |
| ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents | cs.AI | 0 | Open |
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| xai | You can now use your @grok subscription inside @NousResearch Hermes Agent. http://x.ai/news/grok-hermes |
Repeated From Recent Briefings
- tinyhumansai/openhuman - Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
- Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding - first seen 2026-05-11
- arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N] - first seen 2026-05-15
- garrytan/gstack - Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA - first seen 2026-05-12
- rohitg00/agentmemory - #1 Persistent memory for AI coding agents based on real-world benchmarks - first seen 2026-05-09
- anthropics/skills - Public repository for Agent Skills - first seen 2026-05-11
- Long Context Pre-Training with Lighthouse Attention - first seen 2026-05-08
- K-Dense-AI/scientific-agent-skills - A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing. - first seen 2026-05-14
- shiyu-coder/Kronos - Kronos: A Foundation Model for the Language of Financial Markets - first seen 2026-05-07
- ViMU: Benchmarking Video Metaphorical Understanding - first seen 2026-05-15
- ... plus 289 more repeated items in processed data