🔴 High Significance

Developer Tools

🔴 Anthropic's open-source framework for AI-powered vulnerability discovery - score 90 | Sources: hackernews

🔴 KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) - score 82 | Sources: reddit/r/LocalLLaMA

The KV-cache quant race just got more interesting. Huawei has open-sourced KVarN, a KV-cache quantization method released under Apache 2.0 that drops into vLLM with one flag. Posting because the tradeoff it's claiming is genuinely different from what's already in the stack, and I'd like to see it stress-tes
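For a sense of scale, the standard back-of-envelope KV-cache size estimate is sketched below. The model shape is hypothetical (not any specific checkpoint), and the function ignores quantizer metadata such as per-token or per-channel scales, so real compression ratios come out somewhat below the raw bit-width ratio.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits, batch=1):
    """Approximate KV-cache size in bytes.

    The factor of 2 covers the separate K and V tensors. Quantizer metadata
    (scales, rotation state) is ignored, so compressed sizes are optimistic.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bits / 8


# Hypothetical grouped-query-attention model shape, for illustration only.
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)
fp16 = kv_cache_bytes(**shape, bits=16)
int4 = kv_cache_bytes(**shape, bits=4)
print(f"fp16: {fp16 / 2**30:.1f} GiB, 4-bit: {int4 / 2**30:.1f} GiB, ratio {fp16 / int4:.1f}x")
```

With this shape the print reads `fp16: 4.0 GiB, 4-bit: 1.0 GiB, ratio 4.0x`; at long contexts that gap is often what decides whether the cache fits in VRAM.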

🔴 What AI app builder are you using these days? Strong use cases + real experiences - score 81 | Sources: reddit/r/AIAgents

I'm starting to reach a saturation point with AI app builders. It feels like every other day someone on X claims they built and shipped a full app over the weekend with some new tool. Lovable, Bolt.new, Emergent, Replit Agent… it's nonstop, and at this point it's hard to tell what's good. I'm trying to pic

🔴 fathah/hermes-desktop - Desktop Companion for Hermes Agent - score 78 | Sources: github_trending

Desktop Companion for Hermes Agent

Infrastructure & Compute

🔴 Nvidia's been paying shills on LinkedIn - score 96 | Sources: reddit/r/LocalLLaMA

Three different accounts, some even with LinkedIn Gold, made the posts in question all on the same day. All of them clearly followed the marketing team's pointers without understanding how locally hosted AI works; there is no way a $249 8GB machine can replace frontier models.

Research Papers

🔴 AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints - score 78 | Sources: huggingface · arxiv/cs.CL

Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual c

🔴 Complexity-Balanced Diffusion Splitting - score 75 | Sources: huggingface

Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative

Other Signals

🔴 finally - score 89 | Sources: reddit/r/LocalLLaMA

🟡 Notable

Model Releases

🟡 Computer-use agents now beat humans on AndroidWorld. Where are the production QA deployments? - score 56 | Sources: reddit/r/AIAgents

Was looking at the AndroidWorld leaderboard this week. The top entry hits 92% on mobile UI tasks, beating an 88% human baseline. On paper that's already past the line where you'd expect production QA agents to be everywhere. But every time I talk to QA leads at meetups they're still on Selenium + Cy

Developer Tools

🟡 KVarN: Variance-Normalized KV-Cache Quantization [R] - score 69 | Sources: reddit/r/MachineLearning

Excited to share some of my own work here :) KVarN is our new KV-Cache quantization method. In very brief, we combine Hadamard rotations with variance-normalization on both axes of the K and V matrices, then round to nearest. Simple, but works very well, especially for decode-heavy test-time-s
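As a reading aid, here is a minimal plain-Python sketch of the three steps the post names: a Hadamard rotation, variance normalization along both axes, then round-to-nearest. It is not KVarN's implementation. The order of the two normalizations, the per-token absmax grid, the bit-widths, the toy data with one outlier channel, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def hadamard(n):
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1.0]]
    while len(h) < n:
        # Sylvester step: H_2k = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[x * s for x in row] for row in h]


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def std(values, eps=1e-8):
    """Population standard deviation plus a small epsilon to avoid dividing by zero."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values)) + eps


def quantize(x, bits=4):
    """Rotate, normalize columns then rows, round to nearest. x is tokens x head_dim."""
    h = hadamard(len(x[0]))
    r = matmul(x, h)                                        # rotate along head_dim
    col_s = [std(c) for c in zip(*r)]                       # per-channel scale over tokens
    r = [[v / s for v, s in zip(row, col_s)] for row in r]
    row_s = [std(row) for row in r]                         # per-token scale over channels
    r = [[v / s for v in row] for row, s in zip(r, row_s)]
    qmax = 2 ** (bits - 1) - 1
    steps = [(max(abs(v) for v in row) / qmax) or 1.0 for row in r]  # absmax grid step
    q = [[max(-qmax, min(qmax, round(v / st))) for v in row] for row, st in zip(r, steps)]
    return q, (steps, row_s, col_s, h)


def dequantize(q, meta):
    """Undo the scalings, then rotate back; the orthonormal Hadamard is its own inverse."""
    steps, row_s, col_s, h = meta
    r = [[v * st * rs for v in row] for row, st, rs in zip(q, steps, row_s)]
    r = [[v * cs for v, cs in zip(row, col_s)] for row in r]
    return matmul(r, h)


def rel_error(a, b):
    """Relative Frobenius-norm error between two matrices."""
    num = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    den = sum(x ** 2 for ra in a for x in ra)
    return math.sqrt(num / den)


rng = random.Random(0)
# Toy "keys": 16 tokens x 8 dims, with channel 3 scaled up as an outlier.
keys = [[rng.gauss(0, 1) * (5 if j == 3 else 1) for j in range(8)] for _ in range(16)]
for b in (8, 4):
    q, meta = quantize(keys, bits=b)
    print(b, "bits -> relative error", round(rel_error(keys, dequantize(q, meta)), 4))
```

The rotation spreads the outlier channel's energy across all dimensions before any scaling, which is the usual motivation for Hadamard rotations in KV-cache and activation quantization.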

🟡 mvanhorn/last30days-skill - AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary - score 57 | Sources: github_trending

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

🟡 What should an agent handoff include besides the transcript? - score 56 | Sources: reddit/r/AIAgents

Full transcripts feel like the wrong default once an agent run gets long. I'm getting more value from a compact handoff: what we're trying to do, what's already decided, what failed, the current state, and the next action. What else has actually reduced rework for you?
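The five items in that compact handoff map naturally onto a small record type. A minimal sketch follows; the `Handoff` class, its field names, and the text layout it renders are hypothetical illustrations of the post's list, not an established format.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Compact agent handoff (hypothetical structure, field names are illustrative)."""
    goal: str                                            # what we're trying to do
    decisions: list[str] = field(default_factory=list)   # what's already decided
    failures: list[str] = field(default_factory=list)    # what failed, so it isn't retried
    state: str = ""                                      # current state
    next_action: str = ""                                # the single next step

    def render(self) -> str:
        """Render a short text block to prepend to the next agent's context."""
        def bullets(items):
            return "\n".join(f"- {i}" for i in items) or "- (none)"

        return (
            f"GOAL: {self.goal}\n"
            f"DECIDED:\n{bullets(self.decisions)}\n"
            f"FAILED:\n{bullets(self.failures)}\n"
            f"STATE: {self.state}\n"
            f"NEXT: {self.next_action}"
        )


h = Handoff(
    goal="Migrate CI to the new runner",
    decisions=["Keep Node 20"],
    failures=["Caching node_modules broke the lockfile check"],
    state="Unit tests pass on the new runner",
    next_action="Enable integration tests",
)
print(h.render())
```

Keeping failures as an explicit list is the part that most directly targets rework: the next agent sees what not to retry without re-reading the transcript.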

🟡 AIDC-AI/Pixelle-Video - 🚀 AI Fully Automated Short Video Engine - score 49 | Sources: github_trending

🚀 AI Fully Automated Short Video Engine

Infrastructure & Compute

🟡 cyberpapiii/chipotlai-max - The AI coding agent that runs on stolen Chipotle compute 🌯 Fork of OpenCode with Pepper AI as default model. Community project to add providers from Home Depot, Lowes, Target, Starbucks & more. - score 59 | Sources: github_trending

The AI coding agent that runs on stolen Chipotle compute 🌯 Fork of OpenCode with Pepper AI as default model. Community project to add providers from Home Depot, Lowes, Target, Starbucks & more.

Business & Funding

🟡 Would you say capture-time semantic annotation for robot trajectories is a solved problem? [R] - score 44 | Sources: reddit/r/MachineLearning

It seems raw teleoperation data (RGB + joint states) structurally lacks affordance, contact intent, and embodiment-specific kinematic context, information that can't be reliably recovered post hoc once the demonstration is recorded. Most current approaches either filter/clean after collection, or r

Research Papers

🟡 LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs - score 68 | Sources: huggingface · arxiv/cs.AI

Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based

🟡 Towards One-to-Many Temporal Grounding - score 62 | Sources: huggingface · arxiv/cs.AI

Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term One-to-Many Temporal

🟡 Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs - score 58 | Sources: huggingface · arxiv/cs.CL

Automatic Speech Recognition (ASR) has become a key technology for human–AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR p

Other Signals

🟡 Today made me realize just how bad things have gotten without Meta - score 68 | Sources: reddit/r/LocalLLaMA

🟡 Showcase: a much easier way to give your agent a free phone number - score 64 | Sources: reddit/r/AIAgents

Yo I just wanted to share a project me and my friend are working on called OP. It gives agents (especially hermes/openclaw) a real phone number they can use to send texts, do 2FA, calls, etc. Twilio works, but the free tier sucks with the sandbox since all their numbers are VoIP and have to follow 1

🟡 You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter. - score 61 | Sources: reddit/r/LocalLLaMA

WARNING: I'm speed typing this, no time to organize/format, so if short paragraph chunks bother you, just keep it moving. CONTEXT UPDATE (for those interested, otherwise skip): For those interested in the data points, the task was building an agentic workflow inside of rivet that incl

🟡 How do ML researchers actually use AI tools to improve their writing? [D] - score 56 | Sources: reddit/r/MachineLearning

As an ML researcher, how do you use AI tools in your daily work? Do you mostly use them to clean up grammar and wording, or also to rewrite, structure, or draft technical text?

🟡 Finally finished my LLM server: EPYC 9575F, 4× RTX 3090 (96GB VRAM), 768GB ECC RAM - score 54 | Sources: reddit/r/LocalLLaMA

Took a while, but Nalthis is finally up and assembled. Specs:
* Supermicro H13SSL-N
* AMD EPYC 9575F (64C/128T Zen 5)
* 768GB DDR5-5600 ECC RDIMM
* 4× RTX 3090 (96GB VRAM total)
* 1× 2TB NVMe OS
* 2× 3.94TB NVMe data
* 2050W ATX 3.1 PSU
* Corsair 9000D
Planned use:
* vLLM - high throughput small mod

Omitted 4 additional other signals items from the main section; see raw data and source-specific sections below.

🟢 Incremental

Model Releases

🟢 Are We Underestimating Small Edge AI Models? [D] - score 19 | Sources: reddit/r/MachineLearning

A lot of recent discussion around Edge AI focuses on running increasingly larger local LLMs. Meanwhile modern smartphones already have enough compute for many practical computer vision tasks that don't require massive models at all. I recently built and released an Android feature that performs offl

🟢 An agent runtime with persistent memory that fans work out across multiple models. - score 19 | Sources: reddit/r/AIAgents

Hey! Finally releasing code I've put the past 4-5 months of my life into. I had an idea and wanted to fix some things that really irritated me about LLMs. Aimee runs agents that actually remember. Self-hosted, your keys. No subscriptions, no costs, purely open source. First public beta release, but t

🟢 Here is my llama.cpp NVFP4/MXFP6 GGUF quantizer tool - score 18 | Sources: reddit/r/LocalLLaMA

Hello everyone I wanted to share what I've been working on. I started writing NVFP4 kernels for llama.cpp last year and needed the ability to quantize NVFP4 GGUFs, so this project started as an NVFP4 quantizer. It's since become much larger. I would love to get more help to improve it. This is what

Developer Tools

🟢 RTX Spark Ads: DJT Edition - score 39 | Sources: reddit/r/LocalLLaMA

"We're going to have the most beautiful laptops, they'll be the slimmest laptops ever. A total masterpiece, look at that green chip. Unbelievably powerful. They'll be so slim you won't even see them from the side… believe me… it's true. A lot of people are saying it. It's not like those big, clumsy, f

🟢 I built a local AI agent runtime focused on security and UX after being unsatisfied with existing options: here is what I learned - score 36 | Sources: reddit/r/AIAgents

I have been using open source AI agent runtimes for a while and kept running into the same two problems. Either the tool was powerful but the security model made me uncomfortable giving it access to my email and projects, or it was safe but too stripped down to do anything genuinely useful. So I bui

🟢 Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? [d] - score 19 | Sources: reddit/r/MachineLearning

Hello everyone, Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? I am working on a project idea related to library-specific code generation. The concrete case is a specific Python library used in a technical/scientific domain. The go

🟢 Which AI lip-sync tool are people actually using in 2026? - score 19 | Sources: reddit/r/AIAgents

I have been experimenting with faceless videos lately and realized good lip sync is way harder than I expected. Over the last few weeks I tested a bunch of options: HeyGen, InfiniteTalk, and a few smaller tools that kept popping up in recommendations. A lot of tools in the market nail the lips but le

🟢 Why haven't MCP Apps gone viral the way MCP and Skills did? - score 19 | Sources: reddit/r/AIAgents

When MCP and Agent Skills came out, they went viral really fast. But why didn't the MCP App gain that same traction? Or at least not anywhere close? For those who don't know, MCP app is a standard that introduces interactive UI for MCPs. Check out this link for more info. [https://modelcontextprotoc

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 NVIDIA/NemoClaw - Run agents like Hermes and OpenClaw more securely inside NVIDIA OpenShell with managed inference - score 18 | Sources: github_trending

Run agents like Hermes and OpenClaw more securely inside NVIDIA OpenShell with managed inference

Other Signals

🟢 Gemma 4 12B is my new main squeeze - score 32 | Sources: reddit/r/LocalLLaMA

The Unsloth Q5_K_XL is officially my main squeeze for local coding. I started out with the Q4_K_XL, but found myself fixing syntax errors a little too often. It wasn't terrible, but I had one file where I had to make 23 edits just for syntax. With the Q4 I was pulling around 61 t/s, and moving t

🟢 Fine-tuning an LLM to write docs like it's 1995 - score 30 | Sources: hackernews

🟢 How LLM-driven NPCs work in Ultima Online (ServUO) - score 25 | Sources: reddit/r/LocalLLaMA

🟢 PSA: You may not need to quantize spec draft when using MTP - score 11 | Sources: reddit/r/LocalLLaMA

Using `--spec-draft-type-k q4_0 --spec-draft-type-v q4_0` might actually decrease your context size! With quantized spec draft, my context size is 83200. Without it (i.e. using the default fp16 spec draft), context size increased to 91648. I reported this in a llama.cpp discussion and am17an (th

Repo | Description | Stars Today | Language
fathah/hermes-desktop | Desktop Companion for Hermes Agent | 387 | typescript
cyberpapiii/chipotlai-max | The AI coding agent that runs on stolen Chipotle compute 🌯 Fork of OpenCode with Pepper AI as default model. Community project to add providers from Home Depot, Lowes, Target, Starbucks & more. | 201 | typescript
mvanhorn/last30days-skill | AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary | 199 | python
AIDC-AI/Pixelle-Video | 🚀 AI Fully Automated Short Video Engine | 125 | python
NVIDIA/NemoClaw | Run agents like Hermes and OpenClaw more securely inside NVIDIA OpenShell with managed inference | 53 | typescript

📄 New Papers

Title | Category | Hotness
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints | research_paper | 25
Complexity-Balanced Diffusion Splitting | research_paper | 13
LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs | research_paper | 7
Towards One-to-Many Temporal Grounding | research_paper | 4
Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs | research_paper | 3
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment | cs.AI | 0
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems | cs.AI | 0
I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition | cs.AI | 0
GITCO: Gated Inference-Time Context Optimization in TSFMs | cs.AI | 0
Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory | cs.AI | 0
SentinelBench: A Benchmark for Long-Running Monitoring Agents | cs.AI | 0
An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI) | cs.AI | 0
Synthetic Contrastive Reasoning for Multi-Table Q&A | cs.AI | 0
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges | cs.AI | 0
Residual Modeling for High-Fidelity Learned Compression of Scientific Data | cs.AI | 0

🐦 Twitter/X Highlights

Account | Tweet Summary
OpenAI | What happened when one of our models found a counterexample to an 80-year-old Erdős conjecture? Researchers @alexwei_, @HongxunWu, and @wjmzbmr1 shared the story on the OpenAI Podcast with @AndrewMayne and explained how mathematicians and models can work together to make new discoveries.

Repeated From Recent Briefings