🔴 High Significance
Model Releases
🔴 TextGen is now a native desktop app. Open-source alternative to LM Studio (formerly text-generation-webui). - score 94
Sources: reddit/r/LocalLLaMA
Hi all, I have been making a lot of updates to my project, and I wanted to share them here. TextGen (previously text-generation-webui, also known as my username oobabooga or ooba) has been in development since December 2022, before LLaMa and llama.cpp existed. In the last two months, the project has
🔴 DELIGHT: self-hosted AI engineering autopilot: local LLM + browser farm + repo graph + P2P compute - score 94
Sources: reddit/r/AIAgents
TL;DR: Built a local "OS for AI agents" that scans your entire repo into a live graph (Worm), routes tasks between local Qwen, headless ChatGPT browser sessions via Tor/antidetect, and OpenRou
🔴 Claude for Small Business - score 70
Sources: hackernews
Developer Tools
🔴 Human-level performance via ML was not proven impossible with complexity theory [D] - score 94
Sources: reddit/r/MachineLearning
Van Rooij, Guest, de Haan, Adolfi, Kolokolova, and Rich claimed to have proven that AGI via ML is impossible in Computational Brain & Behavior in 2024. The basic idea was to try to reduce a known NP-hard problem to the problem of
🔴 Feels like building AI apps is becoming infrastructure engineering - score 81
Sources: reddit/r/AIAgents
I started experimenting with AI apps because it felt fast and exciting. Now every workflow somehow involves frameworks, vector DBs, orchestration, observability, memory systems, evals, and constant debugging. Wondering if others feel the same lately.
🔴 we really all are going to make it, aren't we? 2x3090 setup. - score 72
Sources: reddit/r/LocalLLaMA
i'm blown away. i saw someone made a post the other day about "club-3090" and after having sonnet patch some fixes into it, specifically a sse-session drop bug and a bug with tool-calling, it's fair to say that even "budget" setups like myself will have a path forward soon for only-local-ai. referen
Enterprise Adoption
🔴 Web-Search is coming to a screeching performance halt as Google shuts down their free search index, and traffic defenders like Cloudflare challenge AI at every gateway. What are our options? - score 83
Sources: reddit/r/LocalLLaMA
Google is closing its free tier to just 50 domains for site-specific search, and an inheritance date of January 1st, 2027, with no public pricing being listed for advanced searches. Cloudflare's new site-default is to challenge all AI bots attempting to scrape web-information for all their customers
Research Papers
🔴 FrameSkip: Learning from Fewer but More Informative Frames in VLA Training - score 75
Sources: huggingface
Vision-Language-Action (VLA) policies are commonly trained from dense robot demonstration trajectories, often collected through teleoperation, by sampling every recorded frame as if it provided equally useful supervision. We argue that this convention creates a temporal supervision imbalance: long l
Other Signals
🔴 Built Support Vector Machine (SVM) from scratch in Rust [P] - score 81
Sources: reddit/r/MachineLearning
Built my own SVM classifier from scratch in Rust. It uses SMO optimization, supports linear and RBF kernels, and uses grid search to tune the hyperparameters. I tested it on two datasets, one linear and one RBF; these were the results: |Dataset|Kernel|Accuracy|Recall|F1| |:-|:-|:-|:-|:-|
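For readers unfamiliar with the two kernels named in the post, here is a minimal Python sketch of their standard textbook definitions. The function names and the default `gamma` value are illustrative assumptions; none of this is taken from the author's Rust code, which is not shown in the excerpt.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2).

    gamma is a hyperparameter (assumed default here); it is the kind of
    value a grid search would tune.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    print(linear_kernel([1.0, 2.0], [3.0, 4.0]))
    print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

The RBF kernel equals 1 for identical points and decays toward 0 as the points move apart, which is why it can separate data that a linear kernel cannot.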
🟡 Notable
Developer Tools
🟡 Most AI-generated apps are complete slop. Controversial take: it's not AI's fault - score 69
Sources: reddit/r/AIAgents
AI gets blamed for making boring products, but I think that's backwards. The problem isn't that AI can't build. The problem is that we continually hand it dead ideas. "Build me a productivity app." "Build me a habit tracker." "Build me a dashboard for small businesses." Of course the output feels ge
🟡 opendatalab/MinerU: Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows. - score 65
Sources: github_trending
🟡 K-Dense-AI/scientific-agent-skills: A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing. - score 63
Sources: github_trending
🟡 openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision - score 56
Sources: github_trending
🟡 Building a safe, effective sandbox to enable Codex on Windows - score 50
Sources: lab_blog/OpenAI
Learn how OpenAI built a secure sandbox for Codex on Windows, enabling safe, efficient coding agents with controlled file access and network restrictions.
Infrastructure & Compute
🟡 Trained transformer-based chess models to play like humans (including thinking time) [P] - score 44
Sources: reddit/r/MachineLearning
I trained a set of deep learning (transformer-based) chess models to play like humans (inspired by MAIA and Grandmaster Chess Without Search). There's a separate model for each 100-point rating bucket from ~800 to 2500+. I started with training a mid-strength model from scratch on an 8xH100 cluster,
Research Papers
🟡 RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation - score 58
Sources: huggingface · arxiv/cs.AI
Intensive care units (ICU) generate long, dense and evolving streams of clinical information, where physicians must repeatedly reassess patient states under time pressure, underscoring a clear need for reliable AI decision support. Existing ICU benchmarks typically treat historical clinician actions
🟡 Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs - score 55
Sources: huggingface
Existing preference datasets for text-to-image models typically store only the final winner/loser images. This representation is insufficient for rectified flow (RF) models, whose generation is naturally indexed by a specific prior noise sample and follows a nearly straight denoising trajectory. In
🟡 Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition - score 52
Sources: huggingface · arxiv/cs.AI
Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance, a phenomenon we term studio-bias. To diagnose this mismatch, we introduce Vividh-ASR, a complexity-stratified benchmark for Hindi and Malayalam across fo
🟡 PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents - score 42
Sources: huggingface · arxiv/cs.CL
We introduce PersonalAI 2.0 (PAI-2), a novel framework, designed to enhance large language model (LLM) based systems through integration of external knowledge graphs (KG). The proposed approach addresses key limitations of existing Graph Retrieval-Augmented Generation (GraphRAG) methods by incorpora
🟡 F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking - score 42
Sources: huggingface · arxiv/cs.LG
Traditional retrieval pipelines optimize utility through stages of candidate retrieval and reranking, where ranking operates over a predefined candidate set. Large Language Models (LLMs) broaden this into a generative process: given a candidate pool, an LLM can generate a subset and order it within
Other Signals
🟡 MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant) - score 61
Sources: reddit/r/LocalLLaMA
TL;DR Results from the title are for single inference with 2 prompts of 1k and 15k tokens. So no MTP (as it's slower for big prompts), no DFlash (working too but slower for big prompts), no quant used (full precision wanted) and the results are pretty good for a 2018 card. (Bench has been done with
🟡 Have the "on-hold" durations been getting longer for arXiv submissions? [D] - score 56
Sources: reddit/r/MachineLearning
I have a paper that has been "on-hold" for about 2 weeks now. I understand that it might take a little longer now because of inundation of AI generated low-effort papers but my papers have gone from "on-hold" to "submitted" within a couple of days in the past. Wondering if anyone else is facing the
🟡 24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context) - score 50
Sources: reddit/r/LocalLLaMA
I got Qwen 3.6 35B-A3B and Gemma 4 26B-A4B running on a $200 secondhand machine (i7-6700 / GTX 1080 / 32 GB RAM) using llama.cpp (the TurboQuant/RotorQuant KV cache quantisation allows 128k context within the 8 GB VRAM). Results (Q4_K_M models, 128k context): |Model|tok/s|Key flags| |:
🟡 The US is winning the AI race where it matters most: commercialization - score 50
Sources: hackernews
🟡 @swyx: if your reaction to this is "haha openclaw bad, see prompt injection is the #1 danger" you: 1) havent sufficiently appreciated the layers to this tweet 2) havent seen enough ai api keys - score 50
Sources: twitter_rss
🟢 Incremental
Model Releases
🟢 What revenue model would you guys suggest for our automation orchestration platform open to public agents as a marketplace? - score 38
Sources: reddit/r/AIAgents
This is not a promotion. We are looking for suggestions. So we are very close to launching an automation orchestration platform where any developer can list their agent based on platform specifications to perform any specific task that can be used as a building block of a larger flow by anyone. The
🟢 A Claude Code and Codex Skill for Deliberate Skill Development - score 10
Sources: hackernews
🟢 Simpler self hosted alt to Open WebUI - score 8
Sources: reddit/r/LocalLLaMA
Got Qwen3.6 27B running on my newly assembled 4x 3090 rig (s/o 3090-club) and I'm trying to get the people in my house to adopt the local workflow. Open WebUI has improved a lot in the recent updates, but I still found it pretty rough for non-technical people. It often feels more like a dev tool tha
🟢 The "the future is fictional" problem of many local LLMs - score 6
Sources: reddit/r/LocalLLaMA
Many local models have a problem (that arose due to excessive RLHF training): They mostly think that everything that is beyond their knowledge cutoff date would be "fictional" or "satirical". To be fair: Even the Gemini API without web access can have this sometimes. But it stops when you give it t
Developer Tools
🟢 Side Projects. - score 39
Sources: reddit/r/LocalLLaMA
Little something I put together to play with for larger contexts than my 9070xt. 8700k, dual P100's, 16gb DDR4, 32gb Optane, Samsung sata SSD. Nothing too fancy. Anyone else do a recent build? How's it working out?
🟢 Spent weeks debugging my agent in Langchain before realizing the framework was the problem. - score 38
Sources: reddit/r/AIAgents
Spent way too long thinking complexity in my agent was a me problem. Bad prompts, bad memory setup, bad tool definitions. Kept tweaking Langchain configs trying to fix behavior I couldn't even properly observe. Turns out half the problem was I had no idea what was actually happening under the hood.
🟢 TraceMind: open source LLM quality monitoring with a ReAct agent that investigates why your AI started giving wrong answers - score 38
Sources: reddit/r/AIAgents
Background: I was building a multi-agent system. Changed one line in a system prompt. Quality dropped from 84% to 52% pass rate. HTTP 200 the whole time. Found out 11 days later from a user. That incident made me realize LLM apps have a monitoring gap that doesn't exist in traditional software. When
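The monitoring gap described above (quality falling while every request still returns HTTP 200) can be illustrated with a small pass-rate check. This is a minimal sketch under stated assumptions, not TraceMind's implementation: `pass_rate`, `quality_regressed`, and the `max_drop` threshold are hypothetical names and values chosen for illustration. Only the 84% and 52% figures come from the post.

```python
def pass_rate(results):
    """Fraction of eval cases that passed; `results` is a list of booleans."""
    return sum(results) / len(results) if results else 0.0


def quality_regressed(baseline, current, max_drop=0.05):
    """True when the current pass rate fell more than `max_drop` below baseline.

    `max_drop` is an assumed tolerance; a real system would tune it.
    """
    return (baseline - current) > max_drop


if __name__ == "__main__":
    baseline = 0.84  # pass rate before the prompt change (from the post)
    # 52 passing cases out of 100, matching the post's 52% figure
    current = pass_rate([True] * 52 + [False] * 48)
    if quality_regressed(baseline, current):
        print(f"ALERT: pass rate dropped from {baseline:.0%} to {current:.0%}")
```

Running a check like this on every prompt or model change would surface the drop immediately instead of 11 days later, regardless of HTTP status codes.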
🟢 Local services data is the biggest gap for AI agents. Am I wrong? - score 38
Sources: reddit/r/AIAgents
I've been building agents that need to interact with the real, physical world; things like "find me a plumber available tomorrow under $80/hr" or "compare 3 electricians near me." And I keep hitting the same wall: this data simply doesn't exist in structured form. * Pricing? Buried in a 2015 PDF on
🟢 NVIDIA/OpenShell: OpenShell is the safe, private runtime for autonomous AI agents. - score 37
Sources: github_trending
Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟢 ansible/ansible: Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com - score 13
Sources: github_trending
🟢 huggingface/pytorch-image-models: The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more - score 3
Sources: github_trending
Business & Funding
🟢 Rare event prediction on time series that change structure mid-stream? [D] - score 0
Sources: reddit/r/MachineLearning
Hi reddit! I made this post on r/MLQuestions, but I am posting it here too for spread:) This is a case I have been assigned at work and I'd love input from anyone who's tackled something similar. I'm building a failure prediction model for ~33k chargers. The devices emit data at two very different
Research Papers
🟢 From Pixels to Concepts: Do Segmentation Models Understand What They Segment? - score 15
Sources: huggingface
Segmentation is a fundamental vision task underlying numerous downstream applications. Recent promptable segmentation models, such as Segment Anything Model 3 (SAM3), extend segmentation from category-agnostic mask prediction to concept-guided localization conditioned on high-level textual prompts.
Other Signals
🟢 Arena AI Model ELO History - score 30
Sources: hackernews
🟢 Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup? - score 17
Sources: reddit/r/LocalLLaMA
So I've been going down a rabbit hole lately and I can't find many people actually talking about this specific use case. everyone here runs local LLMs for coding, chat, maybe some creative writing. cool. But what about using it as a proper personal knowledge base? like, dump your own notes, PDFs, ra
🟢 GPT 5.5 v/s GPT 5.4. Paying 63% more just for 0.1 point difference! - score 6
Sources: reddit/r/AIAgents
was running cost comparisons on codex models this week and kept assuming gpt-5.5 would justify the premium because it benchmarks highest. the thing i keep noticing is that raw benchmark scores and cost-adjusted scores are almost completely disconnected, and people treat them like they're the same nu
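The gap between raw and cost-adjusted scores can be made concrete with a small calculation. This is a sketch with placeholder numbers: only the 63% price premium and the 0.1-point gap come from the post title, while the absolute benchmark scores (80.0 and 80.1) and the `score_per_cost` helper are hypothetical.

```python
def score_per_cost(score, relative_cost):
    """Benchmark points per unit of relative cost (higher is better value)."""
    return score / relative_cost


if __name__ == "__main__":
    # Hypothetical absolute scores; only the 0.1 gap and the 1.63x cost
    # ratio are taken from the post title.
    cheaper = score_per_cost(score=80.0, relative_cost=1.00)
    pricier = score_per_cost(score=80.1, relative_cost=1.63)
    print(f"cheaper model: {cheaper:.2f} points per cost unit")
    print(f"pricier model: {pricier:.2f} points per cost unit")
```

Under these assumptions the pricier model wins on raw score but delivers far fewer points per unit of spend, which is the disconnect the post describes.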
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| opendatalab/MinerU | Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows. | 129 | python |
| K-Dense-AI/scientific-agent-skills | A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing. | 99 | python |
| openai/whisper | Robust Speech Recognition via Large-Scale Weak Supervision | 68 | python |
| NVIDIA/OpenShell | OpenShell is the safe, private runtime for autonomous AI agents. | 38 | rust |
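| ErlichLiu/Proma | Brings the smoothest general-purpose Agent experience into your workflow; a future product for 100x professional users, now moving into the proactive Agent stage. A complete open-source implementation built on the Claude Agent SDK, with native support for invocation from Feishu group chats and flexible integration with any LLM provider, so that top-tier Agent capabilities run where you work every day. | 35 | typescript |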
| EleutherAI/lm-evaluation-harness | A framework for few-shot evaluation of language models. | 22 | python |
| ansible/ansible | Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com | 18 | python |
| huggingface/pytorch-image-models | The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more | 8 | python |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| FrameSkip: Learning from Fewer but More Informative Frames in VLA Training | research_paper | 19 | Open |
| RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation | research_paper | 3 | Open |
| Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs | research_paper | 7 | Open |
| Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition | research_paper | 2 | Open |
| Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents | cs.AI | 0 | Open |
| Macro-Action Based Multi-Agent Instruction Following through Value Cancellation | cs.AI | 0 | Open |
| Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack | cs.AI | 0 | Open |
| Revealing Interpretable Failure Modes of VLMs | cs.AI | 0 | Open |
| Learning Transferable Latent User Preferences for Human-Aligned Decision Making | cs.AI | 0 | Open |
| On the Size Complexity and Decidability of First-Order Progression | cs.AI | 0 | Open |
| DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models | cs.AI | 0 | Open |
| CHAL: Council of Hierarchical Agentic Language | cs.AI | 0 | Open |
| BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics | cs.AI | 0 | Open |
| State-Centric Decision Process | cs.AI | 0 | Open |
| PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models | cs.AI | 0 | Open |
Lab Blog Posts
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| swyx | if your reaction to this is "haha openclaw bad, see prompt injection is the #1 danger" you: 1) havent sufficiently appreciated the layers to this tweet 2) havent seen enough ai api keys |
Repeated From Recent Briefings
- NousResearch/hermes-agent: The agent that grows with you - first seen 2026-05-11
- MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image - first seen 2026-05-12
- tinyhumansai/openhuman: Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
- rohitg00/agentmemory: #1 Persistent memory for AI coding agents based on real-world benchmarks - first seen 2026-05-09
- farion1231/cc-switch: A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
- Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model - first seen 2026-05-13
- garrytan/gstack: Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA - first seen 2026-05-12
- yikart/AiToEarn: Let's use AI to Earn! - first seen 2026-05-11
- Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling - first seen 2026-05-13
- anthropics/skills: Public repository for Agent Skills - first seen 2026-05-11
- ... plus 117 more repeated items in processed data