🔴 High Significance
Model Releases
🔴 Anyone else tired of the “Claude just dropped OPUS 4.8 we’re all cooked” posts on LinkedIn? - score 94
Sources: reddit/r/AIAgents
Every time a new Claude releases, the same cycle plays out. LinkedIn and X are flooded with "I just fired my appointment setter, my graphic designer, my copywriter..." followed by "Comment AI and I'll send you the full prompt", and next thing you know you're getting pitched an AI course. The problem i
🔴 nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face - score 90
Sources: reddit/r/LocalLLaMA
The NVIDIA Qwen3.6-35B-A3B-NVFP4 model is the quantized version of Alibaba's Qwen3.6-35B-A3B model, which is an auto-regressive language model that uses an optimized transformer architecture. The NVIDIA Qwen3.6-3
Infrastructure & Compute
🔴 My home data center - score 70
Sources: reddit/r/LocalLLaMA
System 1: Threadripper 3960x 24c, 4x 3090 ti, 128gb ddr4. System 2: Xeon 8352 36c, 4x 5070 ti, 128gb ddr4. System 3: Intel 14700k 24c, 64gb ddr5, 5090. System 4: Ryzen 5950x 16c, 64gb ddr4, 2x 5070 ti. The first system uses two PSUs to handle the almost 2000w full load of the 3090s. Was nervous about this but i
Other Signals
🔴 Someone out there likely needs this - score 97
Sources: reddit/r/LocalLLaMA
🔴 125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar - score 83
Sources: reddit/r/LocalLLaMA
Under $1000 for 32gb vram from 2023, and ~300 watts draw... and this thing is outperforming the latest pick-your-vendor $5k mini pcs from 2026. So.. next question is can I make it squeeze 150 t/s with the same q4xl on cuda 13.3 this weekend. Anyone try it yet? **Edit** llamacpp ini/flags: podma
🔴 Email is still the highest ROI channel for SaaS retention and the tooling hasn't caught up with that reality - score 83
Sources: reddit/r/AIAgents
I went down a rabbit hole on this recently so sharing what I found. Email consistently shows up at the top of ROI benchmarks for direct user communication. Studies across SaaS companies repeatedly show that behavior-triggered emails, specifically ones tied to actual in-app actions, outperform broadc
🔴 It's funny how everything changes, yet somehow stays the same. - score 77
Sources: reddit/r/LocalLLaMA
🟡 Notable
Model Releases
🟡 mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released! - score 43
Sources: reddit/r/LocalLLaMA
Description of the module: I host 30+ free APEX MoE quantizations as independent research. My only local hardware is an NVIDIA DGX Spark (122 GB unified memory), enough for ~30-50B-class MoEs, but bigger ones (200B+) require rented compute on H100/H200/Blackwell, typically $20-100 per
Developer Tools
🟡 Been pen-testing AI agents for a while. Same flaws everywhere. Offering a few free audits - score 61
Sources: reddit/r/AIAgents
Spent the last several months breaking AI agents, the kind that actually do stuff, not chatbots that just talk. Customer support bots with refund powers. Internal agents hooked into Slack, Jira, prod databases. Coding assistants with write access. It's wild how similar the failures are. Different s
🟡 how do you catch agent regressions after a change? built a small tool for this - score 61
Sources: reddit/r/AIAgents
fix a failure, ship a new prompt or model, same failure quietly returns. been hitting this a lot. replayd is a small SDK that captures failed runs as tests and replays them before deploy. v0.1.0, early, but works end to end. not a big breakthrough, just a focused tool for one problem. putting it out
🟡 All DGX Station GB300 OEM systems side-by-side in one image (roughly actual size) - score 50
Sources: reddit/r/LocalLLaMA
(Except for HP, which I had to guesstimate from some Chinese guy's pic at a showcase because the ZGX Fury AI Station G1N's official page is locked down. Same reason why I didn't include Nvidia's.)
Most underrated LLM system of 2026 and nothing comes close. Assuming budget is infinitely d
🟡 Bayesian Opt. GPs vs Linear models and Neural Networks for parameter optimizations [R] - score 44
Sources: reddit/r/MachineLearning
Hi, Relatively new to deep learning. I wanted some opinions on which of these approaches might be best for time series data and spectral analysis. I currently use a GP and it works pretty well, but I’m wondering what the computational tradeoffs and so forth might be. Any ideas?
Infrastructure & Compute
🟡 Dell confirms XPS laptop with NVIDIA N1X at Computex (basically a DGX Spark GB10 for consumers with Windows) - score 57
Sources: reddit/r/LocalLLaMA
🟡 Why do the output layer weights become word vectors in Word2Vec? [D] - score 56
Sources: reddit/r/MachineLearning
I'm trying to understand the intuition behind Word2Vec training using a neural network. In Word2Vec (CBOW or Skip-gram), we often hear that the weight matrices learned during training contain the vector representations (embeddings) of words. However, I don't understand why the weights of the hidden-
Business & Funding
🟡 Cost Analysis of my $6.4k Local LLM Server - score 63
Sources: reddit/r/LocalLLaMA
I haven't seen any of these done, so I just wanted to share my experience in case it is useful for anyone. The purpose of this post is to show total cost of ownership of my local llm server versus API equivalent. Before you look at the final numbers, note that most people do not do proper financial
Other Signals
🟡 switched email tools three times in one year and i have thoughts - score 61
Sources: reddit/r/AIAgents
first one was too basic, couldn't segment customers properly. second one was powerful but every time i wanted to do something slightly custom i needed to call support or hire someone. third one looked great in the demo and then the onboarding was a nightmare. i think the problem is most of these tools
🟢 Incremental
Model Releases
🟢 Built a minimalist coding agent optimized for memory footprint - score 28
Sources: reddit/r/AIAgents
Hi everybody, I spent the last two weeks building [zerostack](https://gi-dellav.github.io/zerostack/), a coding agent in Rust, focused on memory footprint. I managed to get it to run at ~16MB (with peaks of 24MB) of RAM usage, and no CPU usage when idle.
🟢 Best small model right now (~4B params) that is good with agentic tasks for personal assistant? - score 23
Sources: reddit/r/LocalLLaMA
Looking for suggestions. I have been experimenting with gemma-4-E2B and gemma-4-E4B but the tool calling has been not the best? My tasks are just things like: update calendar, get my schedule, send a WA message at 4PM, etc. Any suggestions? If it helps, here are my server params: ` ./llama-server
🟢 Made a program using LocalLLM based on llama.cpp for fellow Book Lovers! - score 10
Sources: reddit/r/LocalLLaMA
TL;DR: I built an Ebook reader embedded with a compact translation model. Hi! I know this post has a promotional nature, but it contains a concept that I believe readers who love books will appreciate, so please take a look. While talking to an AI developer from an English-speaking country livin
🟢 Speed difference between Windows 11 and Linux with llama.cpp: a myth when using medium and large MoE models - score 7
Sources: reddit/r/LocalLLaMA
As the title says, there is no speed difference between Linux and Windows when using llama.cpp. I myself kept two operating systems on my computer for a long time because of this misconception. But when I got tired of constantly switching, I decided to check how much performance I’d lose if I moved
Developer Tools
🟢 An agentic bank - built, run, and overseen by agents. Banking other agents - score 28
Sources: reddit/r/AIAgents
A thought experiment. What if you don't transform anything, and start a bank with only agents. Not agentic transformation. A full universal bank... built, run, and overseen by agents. Banking other agents. Play around, see what the agents do: bank.unsol.dev Next up, probably the OCC charter and FINR
🟢 Frona v2026.5.5: self-hosted personal AI assistant - score 28
Sources: reddit/r/AIAgents
Hey, Since LLM tool calling became a thing, the dominant pattern has been: ship an AI assistant that can execute code, browse the web, and hit your APIs, and figure out the security story later. Frona started as a pushback against that pattern. Frona is a personal AI assistant you self-host. You cre
🟢 Use any model and any provider with the official OpenAI Codex Desktop App, without modifying its code, and continue to use the official models in parallel? - score 17
Sources: reddit/r/LocalLLaMA
All in the title. The official OpenAI Codex Desktop App only accepts models that are from OpenAI and from a curated list. But there is a trick: you can make it think you are using the official OpenAI servers and models by just impersonating the model name and using another one. There are three things
🟢 Don’t bite me for that question please… - score 7
Sources: reddit/r/LocalLLaMA
And question is… How you earning money on your local llm setups? (Except coding ofc) I see people spending SO MUCH MONEY on the compute power to run llms locally and many of them saying that their setups already payed themselves or they earning much more (I guess they not mean that they saves the sa
Business & Funding
🟢 [PSA] 5060ti 16GB for $300.99. 5070ti 16GB for $699.99. Best Buy in store clearance. - score 37
Sources: reddit/r/LocalLLaMA
The 5060ti 16GB (SKU 6630626) has been on clearance for a couple of weeks in Best Buy stores for $419.99. A couple of days ago, it dropped to $300.99. The 5070ti 16GB (SKU 6620367) has been on clearance for $699.99. Not all stores will have these prices. Some still have the 5060ti for $419.99. T
🟢 Has Anyone Built an Automated Income Stream That Generates Revenue With Little to No Ongoing Work? - score 6
Sources: reddit/r/AIAgents
I’m curious to hear real-world examples from people who have built something that generates revenue mostly on autopilot. I’m not talking about investing in stocks or real estate. I mean businesses, websites, tools, digital products, AI workflows, bots, newsletters, content systems, affiliate sites,
Other Signals
🟢 Workshop submission for main conference paper under review [D] - score 19
Sources: reddit/r/MachineLearning
I have an ECCV paper main conf. Can I submit the same to a workshop at some other place happening before ECCV? The other workshop (non archival) will be after the final decisions of ECCV come. Under any result in ECCV, acceptance or rejection, how will this affect it? I'm not the main author at ECCV.
🟢 Query about non-archival workshop at CVPR-2026 [R] - score 19
Sources: reddit/r/MachineLearning
My paper was recently accepted to a workshop at CVPR-2026 as non-archival acceptance. Is it mandatory for me to register to the conference as I won't be able to attend (visa issues), but my friend will be there in the conference and can present on my behalf. I have a few questions regarding my situatio
🟢 I built mlx-Chronos, a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) [P] - score 0
Sources: reddit/r/MachineLearning
Hey! I'm a CS student and I got tired of not being able to compare MLX inference engines properly: every benchmark out there is either made by the engine's own developers, runs on an M3 Ultra nobody has, or just shows tok/s with zero context. So I built mlx-Chronos, a small open source CLI tool th
Repeated From Recent Briefings
- harry0703/MoneyPrinterTurbo: Generate short videos with one click using AI LLM. - first seen 2026-05-28
- Why Far Looks Up: Probing Spatial Representation in Vision-Language Models - first seen 2026-05-30
- How long does it realistically take for you to produce an ICML/NeurIPS/ICLR-level paper? [D] - first seen 2026-05-30
- run-llama/liteparse: A fast, helpful, and open-source document parser - first seen 2026-05-29
- farion1231/cc-switch: A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
- DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation - first seen 2026-05-29
- anthropics/claude-code: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands. - first seen 2026-05-29
- Graduating Without a PhD Internship [D] - first seen 2026-05-30
- twentyhq/twenty: The open alternative to Salesforce, designed for AI. - first seen 2026-05-25
- Crosstalk-Solutions/project-nomad: Project N.O.M.A.D, is a self-contained, offline survival computer packed with critical tools, knowledge, and AI to keep you informed and empowered, anytime, anywhere. - first seen 2026-05-08
- ... plus 29 more repeated items in processed data