🔴 High Significance

Model Releases

🔴 TextGen is now a native desktop app. Open-source alternative to LM Studio (formerly text-generation-webui). - score 94 Sources: reddit/r/LocalLLaMA

Hi all, I have been making a lot of updates to my project, and I wanted to share them here. TextGen (previously text-generation-webui, also known as my username oobabooga or ooba) has been in development since December 2022, before LLaMa and llama.cpp existed. In the last two months, the project has

🔴 DELIGHT – self-hosted AI engineering autopilot: local LLM + browser farm + repo graph + P2P compute - score 94 Sources: reddit/r/AIAgents

TL;DR: Built a local "OS for AI agents" that scans your entire repo into a live graph (Worm), routes tasks between local Qwen, headless ChatGPT browser sessions via Tor/antidetect, and OpenRou

🔴 Claude for Small Business - score 70 Sources: hackernews

Developer Tools

🔴 Human-level performance via ML was not proven impossible with complexity theory [D] - score 94 Sources: reddit/r/MachineLearning

Van Rooij, Guest, de Haan, Adolfi, Kolokolova, and Rich claimed to have proven that AGI via ML is impossible in Computational Brain & Behavior in 2024. The basic idea was to try to reduce a known NP-hard problem to the problem of

🔴 Feels like building AI apps is becoming infrastructure engineering - score 81 Sources: reddit/r/AIAgents

I started experimenting with AI apps because it felt fast and exciting. Now every workflow somehow involves frameworks, vector DBs, orchestration, observability, memory systems, evals, and constant debugging. Wondering if others feel the same lately.

🔴 we really all are going to make it, aren't we? 2x3090 setup. - score 72 Sources: reddit/r/LocalLLaMA

i'm blown away. i saw someone made a post the other day about "club-3090" and after having sonnet patch some fixes into it, specifically a sse-session drop bug and a bug with tool-calling, it's fair to say that even "budget" setups like myself will have a path forward soon for only-local-ai. referen

Enterprise Adoption

🔴 Web-Search is coming to a screeching performance halt as Google shuts down their free search index, and traffic defenders like Cloudflare challenge AI at every gateway. What are our options? - score 83 Sources: reddit/r/LocalLLaMA

Google is closing its free tier to just 50 domains for site-specific search, and an inheritance date of January 1st, 2027, with no public pricing being listed for advanced searches. Cloudflare's new site-default is to challenge all AI bots attempting to scrape web-information for all their customers

Research Papers

🔴 FrameSkip: Learning from Fewer but More Informative Frames in VLA Training - score 75 Sources: huggingface

Vision-Language-Action (VLA) policies are commonly trained from dense robot demonstration trajectories, often collected through teleoperation, by sampling every recorded frame as if it provided equally useful supervision. We argue that this convention creates a temporal supervision imbalance: long l

Other Signals

🔴 Built Support Vector Machine (SVM) from scratch in Rust [P] - score 81 Sources: reddit/r/MachineLearning

Built my own SVM classifier from scratch in Rust. It uses SMO optimization, has linear and RBF kernels, and uses grid search to tune the hyperparameters. I tested it on two datasets, one using a linear dataset and the other using RBF; these were the results: |Dataset|Kernel|Accuracy|Recall|F1| |:-|:-|:-|:-|:-|
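For readers unfamiliar with the two kernels the post names, a minimal sketch may help. This is not the author's code (the project is written in Rust and its source is not shown in this digest); the function names and the default `gamma` value below are assumptions chosen for illustration.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: the dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2).

    `gamma` is a hypothetical default; in practice it is one of the
    hyperparameters a grid search would tune.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

An SMO-based trainer would call one of these on pairs of training points; identical points give an RBF value of exactly 1, and the value decays toward 0 as the points move apart.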

🟡 Notable

Developer Tools

🟡 Most AI-generated apps are complete slop. Controversial take: it's not AI's fault - score 69 Sources: reddit/r/AIAgents

AI gets blamed for making boring products, but I think that's backwards. The problem isn't that AI can't build. The problem is that we continually hand it dead ideas. "Build me a productivity app." "Build me a habit tracker." "Build me a dashboard for small businesses." Of course the output feels ge

🟡 opendatalab/MinerU - Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows. - score 65 Sources: github_trending

🟡 K-Dense-AI/scientific-agent-skills - A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing. - score 63 Sources: github_trending

🟡 openai/whisper - Robust Speech Recognition via Large-Scale Weak Supervision - score 56 Sources: github_trending

🟡 Building a safe, effective sandbox to enable Codex on Windows - score 50 Sources: lab_blog/OpenAI

Learn how OpenAI built a secure sandbox for Codex on Windows, enabling safe, efficient coding agents with controlled file access and network restrictions.

Infrastructure & Compute

🟡 Trained transformer-based chess models to play like humans (including thinking time) [P] - score 44 Sources: reddit/r/MachineLearning

I trained a set of deep learning (transformer-based) chess models to play like humans (inspired by MAIA and Grandmaster Chess Without Search). There's a separate model for each 100-point rating bucket from ~800 to 2500+. I started with training a mid-strength model from scratch on a 8xH100 cluster,

Research Papers

🟡 RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation - score 58 Sources: huggingface · arxiv/cs.AI

Intensive care units (ICU) generate long, dense and evolving streams of clinical information, where physicians must repeatedly reassess patient states under time pressure, underscoring a clear need for reliable AI decision support. Existing ICU benchmarks typically treat historical clinician actions

🟡 Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs - score 55 Sources: huggingface

Existing preference datasets for text-to-image models typically store only the final winner/loser images. This representation is insufficient for rectified flow (RF) models, whose generation is naturally indexed by a specific prior noise sample and follows a nearly straight denoising trajectory. In

🟡 Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition - score 52 Sources: huggingface · arxiv/cs.AI

Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance, a phenomenon we term studio-bias. To diagnose this mismatch, we introduce Vividh-ASR, a complexity-stratified benchmark for Hindi and Malayalam across fo

🟡 PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents - score 42 Sources: huggingface · arxiv/cs.CL

We introduce PersonalAI 2.0 (PAI-2), a novel framework, designed to enhance large language model (LLM) based systems through integration of external knowledge graphs (KG). The proposed approach addresses key limitations of existing Graph Retrieval-Augmented Generation (GraphRAG) methods by incorpora

🟡 F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking - score 42 Sources: huggingface · arxiv/cs.LG

Traditional retrieval pipelines optimize utility through stages of candidate retrieval and reranking, where ranking operates over a predefined candidate set. Large Language Models (LLMs) broaden this into a generative process: given a candidate pool, an LLM can generate a subset and order it within

Other Signals

🟡 MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant) - score 61 Sources: reddit/r/LocalLLaMA

TL;DR Results from the title are for single inference with 2 prompts of 1k and 15k tokens. So no MTP (as it's slower for big prompts), no DFlash (working too but slower for big prompts), no quant used (full precision wanted) and the results are pretty good for a 2018 card. (Bench has been done with

🟡 Have the "on-hold" durations been getting longer for arXiv submissions? [D] - score 56 Sources: reddit/r/MachineLearning

I have a paper that has been "on-hold" for about 2 weeks now. I understand that it might take a little longer now because of inundation of AI generated low-effort papers but my papers have gone from "on-hold" to "submitted" within a couple of days in the past. Wondering if anyone else is facing the

🟡 24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context) - score 50 Sources: reddit/r/LocalLLaMA

I got Qwen 3.6 35B-A3B and Gemma 4 26B-A4B running on a $200 secondhand machine (i7-6700 / GTX 1080 / 32 GB RAM) using llama.cpp (the TurboQuant/RotorQuant KV cache quantisation allows 128k context within the 8 GB VRAM). Results (Q4_K_M models, 128k context): |Model|tok/s|Key flags| |:

🟡 The US is winning the AI race where it matters most: commercialization - score 50 Sources: hackernews

🟡 @swyx: if your reaction to this is "haha openclaw bad, see prompt injection is the #1 danger" you: 1) havent sufficiently appreciated the layers to this tweet 2) havent seen enough ai api keys - score 50 Sources: twitter_rss

🟢 Incremental

Model Releases

🟢 What revenue model would you guys suggest for our automation orchestration platform open to public agents as a marketplace? - score 38 Sources: reddit/r/AIAgents

This is not a promotion. We are looking for suggestions. So we are very close to launching an automation orchestration platform where any developer can list their agent based on platform specifications to perform any specific task that can be used as a building block of a larger flow by anyone. The

🟢 A Claude Code and Codex Skill for Deliberate Skill Development - score 10 Sources: hackernews

🟢 Simpler self hosted alt to Open WebUI - score 8 Sources: reddit/r/LocalLLaMA

Got Qwen3.6 27B running on my newly assembled 4x 3090 rig (s/o 3090-club) and I'm trying to get the people in my house to adopt the local workflow. Open WebUI has improved a lot in the recent updates, but I still found it pretty rough for non-technical people. It often feels more like a dev tool tha

🟢 The "the future is fictional" problem of many local LLMs - score 6 Sources: reddit/r/LocalLLaMA

Many local models have a problem (that arose due to excessive RLHF training): They mostly think that everything that is beyond their knowledge cutoff date would be "fictional" or "satirical". To be fair: Even the Gemini API without web access can have this sometimes. But it stops when you give it t

Developer Tools

🟢 Side Projects. - score 39 Sources: reddit/r/LocalLLaMA

Little something I put together to play with for larger contexts than my 9070xt. 8700k, dual P100's, 16gb DDR4, 32gb Optane, Samsung sata SSD. Nothing too fancy. Anyone else do a recent build? How's it working out?

🟢 Spent weeks debugging my agent in Langchain before realizing the framework was the problem. - score 38 Sources: reddit/r/AIAgents

Spent way too long thinking complexity in my agent was a me problem. Bad prompts, bad memory setup, bad tool definitions. Kept tweaking Langchain configs trying to fix behavior I couldn't even properly observe. Turns out half the problem was I had no idea what was actually happening under the hood.

🟢 TraceMind – open source LLM quality monitoring with a ReAct agent that investigates why your AI started giving wrong answers - score 38 Sources: reddit/r/AIAgents

Background: I was building a multi-agent system. Changed one line in a system prompt. Quality dropped from 84% to 52% pass rate. HTTP 200 the whole time. Found out 11 days later from a user. That incident made me realize LLM apps have a monitoring gap that doesn't exist in traditional software. When
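The monitoring gap described above (quality falling while every request still returns HTTP 200) can be illustrated with a small pass-rate check. This is only a sketch under stated assumptions, not TraceMind's implementation: the function names and the `max_drop` threshold are hypothetical, and only the 84% and 52% figures come from the post.

```python
def pass_rate(results):
    """Fraction of eval cases that passed; `results` is a list of booleans."""
    if not results:
        raise ValueError("no eval results to score")
    return sum(results) / len(results)

def quality_regressed(baseline, current, max_drop=0.05):
    """Return True when the pass rate fell by more than `max_drop` (absolute).

    `max_drop` is an assumed tolerance, not a value from the post.
    """
    return (baseline - current) > max_drop

# Figures from the post: 84% pass rate before the prompt change, 52% after.
alert = quality_regressed(baseline=0.84, current=0.52)
```

Running a fixed eval set after each prompt change and comparing against a stored baseline is one way such a drop could surface on the day of the change rather than from a user report.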

🟢 Local services data is the biggest gap for AI agents. Am I wrong? - score 38 Sources: reddit/r/AIAgents

I've been building agents that need to interact with the real, physical world; things like "find me a plumber available tomorrow under $80/hr" or "compare 3 electricians near me." And I keep hitting the same wall: this data simply doesn't exist in structured form. * Pricing? Buried in a 2015 PDF on

🟢 NVIDIA/OpenShell - OpenShell is the safe, private runtime for autonomous AI agents. - score 37 Sources: github_trending

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 ansible/ansible - Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com. - score 13 Sources: github_trending

🟢 huggingface/pytorch-image-models - The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more - score 3 Sources: github_trending

Business & Funding

🟢 Rare event prediction on time series that change structure mid-stream? [D] - score 0 Sources: reddit/r/MachineLearning

Hi reddit! I made this post on r/MLQuestions, but I am posting it here too for spread:) This is a case I have been assigned at work and I'd love input from anyone who's tackled something similar. I'm building a failure prediction model for ~33k chargers. The devices emit data at two very different

Research Papers

🟢 From Pixels to Concepts: Do Segmentation Models Understand What They Segment? - score 15 Sources: huggingface

Segmentation is a fundamental vision task underlying numerous downstream applications. Recent promptable segmentation models, such as Segment Anything Model 3 (SAM3), extend segmentation from category-agnostic mask prediction to concept-guided localization conditioned on high-level textual prompts.

Other Signals

🟢 Arena AI Model ELO History - score 30 Sources: hackernews

🟢 Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup? - score 17 Sources: reddit/r/LocalLLaMA

So I've been going down a rabbit hole lately and I can't find many people actually talking about this specific use case. everyone here runs local LLMs for coding, chat, maybe some creative writing. cool. But what about using it as a proper personal knowledge base? like, dump your own notes, PDFs, ra

🟢 GPT 5.5 v/s GPT 5.4. Paying 63% more just for 0.1 point difference! - score 6 Sources: reddit/r/AIAgents

was running cost comparisons on codex models this week and kept assuming gpt-5.5 would justify the premium because it benchmarks highest. the thing i keep noticing is that raw benchmark scores and cost-adjusted scores are almost completely disconnected, and people treat them like they're the same nu
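The gap between raw and cost-adjusted scores can be made concrete with the two figures in the headline (63% higher price, 0.1 point gain). The sketch below is illustrative only: the function name is hypothetical, the cheaper model's cost is normalized to 1.0 as an assumption, and the benchmark scale behind the "0.1 point" is not stated in the post.

```python
def marginal_cost_per_point(price_ratio, score_gain, base_cost=1.0):
    """Extra cost paid per extra benchmark point when upgrading models.

    price_ratio: cost of the pricier model divided by the cheaper one.
    score_gain:  benchmark points gained by switching.
    base_cost:   cost of the cheaper model (1.0 means normalized units,
                 an assumption made for this sketch).
    """
    if score_gain <= 0:
        raise ValueError("score_gain must be positive")
    return base_cost * (price_ratio - 1.0) / score_gain

# Headline figures: 63% more expensive for a 0.1 point improvement.
extra = marginal_cost_per_point(price_ratio=1.63, score_gain=0.1)
```

With these inputs, each additional benchmark point costs roughly 6.3 times the cheaper model's entire price, which is the kind of disconnect between raw and cost-adjusted rankings the post points to.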

|Repo|Description|Stars Today|Language|
|:-|:-|:-|:-|
|opendatalab/MinerU|Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.|129|python|
|K-Dense-AI/scientific-agent-skills|A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing.|99|python|
|openai/whisper|Robust Speech Recognition via Large-Scale Weak Supervision|68|python|
|NVIDIA/OpenShell|OpenShell is the safe, private runtime for autonomous AI agents.|38|rust|
|ErlichLiu/Proma|Brings the smoothest general-purpose Agent experience into your workflow; a future product built for 100x professional users, now working toward the proactive Agent stage. A complete open-source practice based on the Claude Agent SDK, with native support for invocation from Feishu group chats and flexible integration with any large-model provider, so that top-tier Agent capabilities truly run where you work every day.|35|typescript|
|EleutherAI/lm-evaluation-harness|A framework for few-shot evaluation of language models.|22|python|
|ansible/ansible|Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com.|18|python|
|huggingface/pytorch-image-models|The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more|8|python|

📄 New Papers

|Title|Category|Hotness|Link|
|:-|:-|:-|:-|
|FrameSkip: Learning from Fewer but More Informative Frames in VLA Training|research_paper|19|Open|
|RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation|research_paper|3|Open|
|Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs|research_paper|7|Open|
|Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition|research_paper|2|Open|
|Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents|cs.AI|0|Open|
|Macro-Action Based Multi-Agent Instruction Following through Value Cancellation|cs.AI|0|Open|
|Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack|cs.AI|0|Open|
|Revealing Interpretable Failure Modes of VLMs|cs.AI|0|Open|
|Learning Transferable Latent User Preferences for Human-Aligned Decision Making|cs.AI|0|Open|
|On the Size Complexity and Decidability of First-Order Progression|cs.AI|0|Open|
|DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models|cs.AI|0|Open|
|CHAL: Council of Hierarchical Agentic Language|cs.AI|0|Open|
|BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics|cs.AI|0|Open|
|State-Centric Decision Process|cs.AI|0|Open|
|PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models|cs.AI|0|Open|

🏢 Lab Blog Posts

🐦 Twitter/X Highlights

|Account|Tweet Summary|
|:-|:-|
|swyx|if your reaction to this is "haha openclaw bad, see prompt injection is the #1 danger" you: 1) havent sufficiently appreciated the layers to this tweet 2) havent seen enough ai api keys|

Repeated From Recent Briefings