🔴 High Significance
Model Releases
🔴 The Financial Times has published an article about Heretic - score 96
Sources: reddit/r/LocalLLaMA
https://www.ft.com/content/5630ed79-a263-41ed-9a1a-321617ae310e "The FT was able to use Heretic, a tool available on the popular code repository GitHub, to remove the guardrails from Meta's Llama 3.3 model in less than 10 minutes without any specialist hardware." "Heretic creator Philipp Emanuel Wei
🔴 Update on 12x32gb sxm v100 cluster / local AI for legal drafting - score 81
Sources: reddit/r/LocalLLaMA
Update from the lawyer with the V100 server. A few of you asked what I actually ended up running once the dust settled, so here it is. Still just a lawyer, still driving the whole thing through Claude Code, still not fully sure what I'm doing - but it works now, which is more than I could say last t
🔴 Is there a way to use multiple AI models without paying for 11 different monthly subscriptions? - score 81
Sources: reddit/r/AIAgents
I'm getting into AI content creation, generating both images and short videos, but subscribing to different AI tools feels like a total rip-off. I need GPT for logic and layout, Flux for visuals, and specialized video models for motion. Right now, I'm juggling like 5 different API keys and subscripti
🔴 Is Qwen3.6 current king for local agentic use? - score 73
Sources: reddit/r/LocalLLaMA
I've been testing other models but it seems like nothing even comes close to Qwen3.6 35B A3B for agentic use. The worst I'd get is a loop sometimes, while Gemma4 produced broken tool calls occasionally and I couldn't even get GLM 4.7 Flash REAP past 2 or 3 messages before it starts looping. All IQ4_N
Developer Tools
🔴 How Do You Think We Can Help Avoid AI Scams? - score 94
Sources: reddit/r/AIAgents
Right now most people can tell if they were talking to AI. But with older folks, it's trickier. I see things like this, and it seems like it's only going to get worse. I don't think a realistic solution would be to ban AI. To me, there are a few opti
Research Papers
🔴 SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills - score 82
Sources: huggingface · arxiv/cs.AI
Large language model (LLM) agents accumulate rich episodic trajectories while solving real-world tasks, but it remains unclear whether such experience can be distilled into reusable procedural skills. We introduce SkillEvolBench, a diagnostic benchmark for evaluating this step from experience reuse
🔴 Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents - score 78
Sources: huggingface · arxiv/cs.CL
While AI agents demonstrate remarkable capabilities in reasoning and tool use, they remain fundamentally reactive: they compute responses only after explicit user prompts. This paradigm ignores a critical opportunity: the idle time between interactions is largely wasted, leaving agents unable to pre
🔴 InstructSAM: Segment Any Instance with Any Instructions - score 75
Sources: huggingface
In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate instruction-driven instance segmentation as a set-structured query prediction problem and propose an explicit reasoning-to-instance query
Other Signals
🔴 Using AI to write better code more slowly - score 88
Sources: hackernews
🔴 The famous METR AI time horizons graph contains numerous severe errors [D] - score 81
Sources: reddit/r/MachineLearning
Nathan Witkin, a research writer at NYU Stern's Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer: >It is impossible to dr
🟡 Notable
Model Releases
🟡 @xai: Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit http://x. - score 60
Sources: twitter_rss
Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit http://x.ai/cli to get started.
Developer Tools
🟡 Built a runtime governance proxy for AI agents after realizing prompt injection gets a lot scarier once agents have tools - score 69
Sources: reddit/r/AIAgents
When your agent reads external content (webpages, emails, documents, database rows), that content can contain hidden instructions that hijack it. This isn't theoretical. A poisoned document tells your agent to forward credentials. A malicious email tells it to ignore its guidelines. The model has n
🟡 @AnthropicAI: Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://www.anthropic.com/news/chri - score 50
Sources: twitter_rss
Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://www.anthropic.com/news/chris-olah-pope-leo-encyclical
🟡 DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P] - score 44
Sources: reddit/r/MachineLearning
Just thought I'd share, I ran a DCGAN on a dual core RISC-V microcontroller, the CH32H417 generating 64x64 cat faces. This is a new RISC-V MCU, so no TFLite, no CMSIS NN and no external memory. It's a pure C inference engine, bit-identical to PyTorch reference outputs. The model is 12.6M parameters
Infrastructure & Compute
🟡 Norway's 2 petabytes of Huawei flash storage and LLM training - score 62
Sources: hackernews
Research Papers
🟡 CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models - score 65
Sources: huggingface
Video prediction is increasingly viewed as a path toward generalizable world models, yet it remains unclear whether these systems learn underlying causal structure or merely exploit superficial visual correlations for future prediction. We introduce CRONOS, an intervention-based benchmark designed t
🟡 A Comprehensive Dataset for Human vs. AI Generated Image Detection - score 60
Sources: arxiv/cs.AI · arxiv/cs.CL
arXiv:2601.00553v2. Multimodal generative AI systems like Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created. These tools drive innovation but also enable the spread of misleading content, false information, and
🟡 SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking - score 50
Sources: huggingface · arxiv/cs.AI
Mobile GUI agents powered by large language models have progressed rapidly, creating urgent needs for realistic and comprehensive evaluation. Existing benchmarks prioritize reproducibility but are often limited to open-source apps or file-operation tasks for the difficulty of constructing rewards on
🟡 Reinforcing Few-step Generators via Reward-Tilted Distribution Matching - score 45
Sources: huggingface
Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation
Other Signals
🟡 Already 11 000 submissions for EMNLP? [D] - score 69
Sources: reddit/r/MachineLearning
Is this normal? I searched it up and last year it was only 8000.
🟡 One letter to appease them all - score 58
Sources: reddit/r/LocalLLaMA
🟡 Are ICML workshops worth attending? [D] - score 56
Sources: reddit/r/MachineLearning
Hi! I missed securing a main conference ticket for ICML 2026, as my workshop paper got accepted two days ago. Do you believe that it is worth attending just workshops at such A*-tier conferences (with all the overseas travel costs etc.)? I was quite looking forward to attending both, including the
🟡 Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. - score 50
Sources: reddit/r/LocalLLaMA
Here's the PR by pedapudi. https://github.com/ggml-org/llama.cpp/pull/21344 Its merge request has been denied, so it will not be in mainline llama.cpp. The changes are so small that I just put them into whatever the current release of llama.cpp is. Read the PR for more info. It will only work with M
🟡 CXMT started selling ram to corsair - score 42
Sources: reddit/r/LocalLLaMA
They started producing cheaper RAM for Corsair; hopefully it will get cheaper for consumers [https://www.tomshardware.com/pc-components/ddr5/chinese-memory-maker-cxmt-enters-the-mainstream-consumer-memory-with-corsair-vengeance-ddr5-kit-chinese-made-dram-emerges-as-an-antidote-for-crushing-shortages
🟢 Incremental
Model Releases
🟢 model : add support for talkie-1930-13b by niklassheth · Pull Request #22596 · ggml-org/llama.cpp - score 27
Sources: reddit/r/LocalLLaMA
https://huggingface.co/talkie-lm/talkie-1930-13b-it talkie-1930-13b-it is a 13B vintage language model. It is an instruction-tuned post-train of talkie-1930-13b-base, which was trained on 260B tokens of pre-1931 Englis
🟢 Running on a macbook, and having issues with crashing? Maybe this will help... - score 4
Sources: reddit/r/LocalLLaMA
Just a friendly pointer on getting around some issues on macbooks. I hope someone finds this useful. I spent weeks of ripping my hair out with crashes, crap performance and issues - and being entirely too stubborn to harness the power of Google to find solutions to my issues. Though, I prefer doing
Developer Tools
🟢 SkillOpt treats markdown skill files as trainable parameters with proper optimization machinery - score 35
Sources: reddit/r/LocalLLaMA
Paper came out recently that formalizes something a lot of agent builders have been doing ad hoc. They use a frontier model to propose bounded edits (add/delete/replace) to markdown skill files, then gate every edit against a held out validation set. Only strict improvements accepted, ties rejected,
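The gating idea described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the edit tuple format, `apply_edit`, `gated_optimize`, and the toy `score` function are all hypothetical names, and the frontier-model proposal step is replaced by a fixed list of candidate edits.

```python
from typing import Callable, List, Tuple

# Hypothetical edit format: (op, line_index, text), op in {"add", "delete", "replace"}.
Edit = Tuple[str, int, str]


def apply_edit(lines: List[str], edit: Edit) -> List[str]:
    """Return a copy of the skill file with one bounded edit applied."""
    op, idx, text = edit
    out = list(lines)
    if op == "add":
        out.insert(idx, text)
    elif op == "delete":
        del out[idx]
    elif op == "replace":
        out[idx] = text
    else:
        raise ValueError(f"unknown op: {op}")
    return out


def gated_optimize(
    skill: List[str],
    proposals: List[Edit],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Accept an edit only if it strictly improves the held-out score."""
    best, best_score = list(skill), score(skill)
    for edit in proposals:
        candidate = apply_edit(best, edit)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict: ties are rejected
            best, best_score = candidate, candidate_score
    return best, best_score


def score(lines: List[str]) -> float:
    """Toy stand-in for a validation run: counts lines mentioning 'verify'."""
    return float(sum("verify" in line.lower() for line in lines))


skill = ["# Refund skill", "Issue refund."]
proposals = [
    ("add", 1, "Verify policy first."),     # improves the score: accepted
    ("replace", 0, "# Refund skill v2"),    # tie: rejected
    ("delete", 1, ""),                      # lowers the score: rejected
]
final_skill, final_score = gated_optimize(skill, proposals, score)
```

In a real setup the `score` function would run the agent with the candidate skill file on a held-out validation set; the strict `>` comparison is what makes ties count as rejections.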
🟢 moeru-ai/airi - 🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported. - score 34
Sources: github_trending
🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported.
🟢 OpenBB-finance/OpenBB - Financial data platform for analysts, quants and AI agents. - score 25
Sources: github_trending
Financial data platform for analysts, quants and AI agents.
🟢 Freelancers who build WhatsApp Business API bots for multiple clients: how do you structure your Meta Developer setup? - score 24
Sources: reddit/r/AIAgents
Hey everyone, I'm building WhatsApp appointment bots for dental clinics and I'm confused about the Meta Developer App structure when scaling to multiple clients. My confusion: * Do you create ONE Meta Developer App and add each client's phone number inside it? * Or do you create a SEPARATE app f
🟢 NateBJones-Projects/OB1 - Open Brain - The infrastructure layer for your thinking. One database, one AI gateway, one chat channel - any AI plugs in. No middleware, no SaaS. - score 20
Sources: github_trending
Open Brain - The infrastructure layer for your thinking. One database, one AI gateway, one chat channel - any AI plugs in. No middleware, no SaaS.
Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.
Business & Funding
🟢 I finally put my NPU (Intel Arrow Lake) to use doing ASR for my smart home - score 15
Sources: reddit/r/LocalLLaMA
I wrote about what I found in a deep dive elsewhere (which I will not mention because Reddit doesn't like cross linking) but I wanted to share it here since this is where I learn the most about AI stuff and I've seen before questions about NPUs, that are often dismissed as marketing gimmicks (and for
Enterprise Adoption
🟢 How I built a safety layer around LLM-generated trading code and cut deployment time from 40 hours to 20 minutes - score 0
Sources: reddit/r/AIAgents
I built AlgoAI, a platform that converts plain-English strategy descriptions into live Python trading bots running against MetaTrader 5. The goal was to compress a workflow that traditionally takes quants 8 to 40 hours down to under 20 minutes. We hit that target. But the interesting engineering pro
Research Papers
🟢 HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction - score 30
Sources: huggingface
Online 3D reconstruction requires estimating camera pose and scene geometry under strict causal and bounded-memory constraints. Existing methods often suffer from drift, jitter, or collapse on long sequences. We trace these failures to a fundamental mismatch. Streaming geometry is inherently tempora
🟢 Pixel-Level Pavement Distress Assessment Using Instance Segmentation - score 10
Sources: huggingface
Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for maintenance-relevant quantification. This paper presents a v
Other Signals
🟢 Use Boring Languages with LLMs - score 38
Sources: hackernews
🟢 [Open Source] a contract layer at the agent tool boundary, rules in yaml not in the prompt (apache 2.0) - score 26
Sources: reddit/r/AIAgents
sharing what i've been working on. sponsio is an open-source contract layer that sits at the tool-call boundary of an llm agent. apache 2.0, python and ts. the thesis: rules that absolutely must hold (like "always check policy before issuing a refund" or "never call this tool twice per session") don
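To make the idea of enforcing rules at the tool-call boundary concrete, here is a minimal Python sketch built around the two example rules quoted in the post. It is not sponsio's API: `RULES`, `Session`, `ContractViolation`, and the rule keys `requires_before` and `max_calls` are hypothetical, and a plain dict stands in for the YAML file the post mentions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class ContractViolation(Exception):
    """Raised when a tool call would break a declared rule."""


# Hypothetical rule set; a real system might load this from YAML.
RULES: Dict[str, dict] = {
    "issue_refund": {
        "requires_before": ["check_policy"],  # always check policy first
        "max_calls": 1,                       # never call twice per session
    },
}


@dataclass
class Session:
    rules: Dict[str, dict]
    history: List[str] = field(default_factory=list)

    def call(self, tool: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Check the rules for `tool`, then run it and record the call."""
        rule = self.rules.get(tool, {})
        for needed in rule.get("requires_before", []):
            if needed not in self.history:
                raise ContractViolation(f"{tool} requires {needed} first")
        limit = rule.get("max_calls")
        if limit is not None and self.history.count(tool) >= limit:
            raise ContractViolation(f"{tool} exceeded {limit} call(s)")
        result = fn(*args, **kwargs)
        self.history.append(tool)
        return result


session = Session(RULES)
session.call("check_policy", lambda order_id: "eligible", "order-1")
refund = session.call("issue_refund", lambda order_id: "refunded", "order-1")
```

Because the check runs in ordinary code before the tool function is invoked, a rule holds regardless of what the model was prompted to do; a second `issue_refund` call in the same session raises `ContractViolation`.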
🟢 qwen 3.6 27B AR-> Diffusion - local training on 5090 - score 15
Sources: reddit/r/LocalLLaMA
based on the work of open-dllm - (which achieved qwen 2.5 autoregressive -> diffusion realignment head - same exact model under the hood delivering a 4x in improvement.) TLDR I haven't got a trained model yet. just a burnt out gpu cable and a new psu on order. I did actually get the thing to do a
🟢 Multimodal adaptive optical microscope: in vivo imaging, molecules to organisms - score 12
Sources: hackernews
🟢 Aiki my local Wikipedia Retrieval-Augmented Generation system [R] - score 11
Sources: reddit/r/MachineLearning
Hey, I built Aiki, a lightweight tool that lets you chat with Wikipedia locally. https://i.redd.it/67mzfsrc6f3h1.gif what it does: * Downloads and chunks wikipedia articles (u can choose those articles by their name or articles and also the option of downloading the similar topics) * Uses a cus
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| moeru-ai/airi | 🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported. | 62 | typescript |
| OpenBB-finance/OpenBB | Financial data platform for analysts, quants and AI agents. | 43 | python |
| NateBJones-Projects/OB1 | Open Brain - The infrastructure layer for your thinking. One database, one AI gateway, one chat channel - any AI plugs in. No middleware, no SaaS. | 25 | typescript |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills | research_paper | 12 | Open |
| Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents | research_paper | 10 | Open |
| InstructSAM: Segment Any Instance with Any Instructions | research_paper | 9 | Open |
| CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models | research_paper | 6 | Open |
| A Comprehensive Dataset for Human vs. AI Generated Image Detection | cs.AI | 0 | Open |
| SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking | research_paper | 2 | Open |
| In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models | cs.AI | 0 | Open |
| Confidence Calibration in Large Language Models | cs.AI | 0 | Open |
| How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning | cs.AI | 0 | Open |
| Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction | cs.AI | 0 | Open |
| Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs | cs.AI | 0 | Open |
| Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game | cs.AI | 0 | Open |
| BODHI: Precise OS Kernel Specification Inference | cs.AI | 0 | Open |
| When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure | cs.AI | 0 | Open |
| Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model | cs.AI | 0 | Open |
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| xai | Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit http://x.ai/cli to get started. |
| AnthropicAI | Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://www.anthropic.com/news/chris-olah-pope-leo-encyclical |
Repeated From Recent Briefings
- Lum1104/Understand-Anything - Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
- colbymchenry/codegraph - Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent: fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
- rohitg00/ai-engineering-from-scratch - Learn it. Build it. Ship it for others. - first seen 2026-05-21
- How do ML practitioners select hyperparameters, architectures, etc for self-supervised representation learning when the loss is non-monotonic? [D] - first seen 2026-05-25
- anthropics/knowledge-work-plugins - Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork - first seen 2026-05-25
- mukul975/Anthropic-Cybersecurity-Skills - 754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0 - first seen 2026-05-24
- NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) - first seen 2026-05-23
- earendil-works/pi - AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods - first seen 2026-05-09
- multica-ai/multica - The open-source managed agents platform. Turn coding agents into real teammates: assign tasks, track progress, compound skills. - first seen 2026-05-24
- garrytan/gstack - Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA - first seen 2026-05-12
- ... plus 202 more repeated items in processed data