🔴 High Significance

Model Releases

🔴 The Financial Times has published an article about Heretic - score 96 Sources: reddit/r/LocalLLaMA

https://www.ft.com/content/5630ed79-a263-41ed-9a1a-321617ae310e “The FT was able to use Heretic, a tool available on the popular code repository GitHub, to remove the guardrails from Meta’s Llama 3.3 model in less than 10 minutes without any specialist hardware.” “Heretic creator Philipp Emanuel Wei

🔴 Update on 12x32gb sxm v100 cluster / local AI for legal drafting - score 81 Sources: reddit/r/LocalLLaMA

Update from the lawyer with the V100 server. A few of you asked what I actually ended up running once the dust settled, so here it is. Still just a lawyer, still driving the whole thing through Claude Code, still not fully sure what I'm doing, but it works now, which is more than I could say last t

🔴 Is there a way to use multiple AI models without paying for 11 different monthly subscriptions? - score 81 Sources: reddit/r/AIAgents

I’m getting into AI content creation, generating both images and short videos, but subscribing to different AI tools feels like a total rip-off. I need GPT for logic and layout, Flux for visuals, and specialized video models for motion. Right now, I’m juggling like 5 different API keys and subscripti

🔴 Is Qwen3.6 current king for local agentic use? - score 73 Sources: reddit/r/LocalLLaMA

I've been testing other models but it seems like nothing even comes close to Qwen3.6 35B A3B for agentic use. The worst I'd get is a loop sometimes, while Gemma4 produced broken tool calls occasionally and I couldn't even get GLM 4.7 Flash REAP past 2 or 3 messages before it starts looping. All IQ4_N

Developer Tools

🔴 How Do You Think We Can Help Avoid AI Scams? - score 94 Sources: reddit/r/AIAgents

Right now most people can tell if they were talking to AI. But with older folks, it's trickier. I see things like this, and it seems like it's only going to get worse. I don't think a realistic solution would be to ban AI. To me, there are a few opti

Research Papers

🔴 SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills - score 82 Sources: huggingface · arxiv/cs.AI

Large language model (LLM) agents accumulate rich episodic trajectories while solving real-world tasks, but it remains unclear whether such experience can be distilled into reusable procedural skills. We introduce SkillEvolBench, a diagnostic benchmark for evaluating this step from experience reuse

🔴 Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents - score 78 Sources: huggingface · arxiv/cs.CL

While AI agents demonstrate remarkable capabilities in reasoning and tool use, they remain fundamentally reactive: they compute responses only after explicit user prompts. This paradigm ignores a critical opportunity: the idle time between interactions is largely wasted, leaving agents unable to pre

🔴 InstructSAM: Segment Any Instance with Any Instructions - score 75 Sources: huggingface

In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate instruction-driven instance segmentation as a set-structured query prediction problem and propose an explicit reasoning-to-instance query

Other Signals

🔴 Using AI to write better code more slowly - score 88 Sources: hackernews

🔴 The famous METR AI time horizons graph contains numerous severe errors [D] - score 81 Sources: reddit/r/MachineLearning

Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, writes damningly about the famous METR AI time horizons graph in the Substack publication Transformer: >It is impossible to dr

🟡 Notable

Model Releases

🟡 @xai: Grok Build is now available in Beta for all SuperGrok and X Premium+ users. - score 60 Sources: twitter_rss

Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit http://x.ai/cli to get started.

Developer Tools

🟡 Built a runtime governance proxy for AI agents after realizing prompt injection gets a lot scarier once agents have tools - score 69 Sources: reddit/r/AIAgents

When your agent reads external content (webpages, emails, documents, database rows), that content can contain hidden instructions that hijack it. This isn’t theoretical. A poisoned document tells your agent to forward credentials. A malicious email tells it to ignore its guidelines. The model has n

🟡 @AnthropicAI: Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." - score 50 Sources: twitter_rss

Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://www.anthropic.com/news/chris-olah-pope-leo-encyclical

🟡 DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P] - score 44 Sources: reddit/r/MachineLearning

Just thought I'd share, I ran a DCGAN on a dual core RISC-V microcontroller, the CH32H417 generating 64x64 cat faces. This is a new RISC-V MCU, so no TFLite, no CMSIS NN and no external memory. It's a pure C inference engine, bit-identical to PyTorch reference outputs. The model is 12.6M parameters

Infrastructure & Compute

🟡 Norway's 2 petabytes of Huawei flash storage and LLM training - score 62 Sources: hackernews

Research Papers

🟡 CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models - score 65 Sources: huggingface

Video prediction is increasingly viewed as a path toward generalizable world models, yet it remains unclear whether these systems learn underlying causal structure or merely exploit superficial visual correlations for future prediction. We introduce CRONOS, an intervention-based benchmark designed t

🟡 A Comprehensive Dataset for Human vs. AI Generated Image Detection - score 60 Sources: arxiv/cs.AI · arxiv/cs.CL

arXiv:2601.00553v2. Multimodal generative AI systems like Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created. These tools drive innovation but also enable the spread of misleading content, false information, and

🟡 SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking - score 50 Sources: huggingface · arxiv/cs.AI

Mobile GUI agents powered by large language models have progressed rapidly, creating urgent needs for realistic and comprehensive evaluation. Existing benchmarks prioritize reproducibility but are often limited to open-source apps or file-operation tasks for the difficulty of constructing rewards on

🟡 Reinforcing Few-step Generators via Reward-Tilted Distribution Matching - score 45 Sources: huggingface

Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation

Other Signals

🟡 Already 11 000 submissions for EMNLP? [D] - score 69 Sources: reddit/r/MachineLearning

Is this normal? I searched it up and last year it was only 8000.

🟡 One letter to appease them all - score 58 Sources: reddit/r/LocalLLaMA

🟡 Are ICML workshops worth attending? [D] - score 56 Sources: reddit/r/MachineLearning

Hi! I missed securing a main conference ticket for ICML 2026, as my workshop paper got accepted two days ago. Do you believe that it is worth attending just workshops at such A*-tier conferences (with all the overseas travel costs etc.)? I was quite looking forward to attending both, including the

🟡 Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. - score 50 Sources: reddit/r/LocalLLaMA

Here's the PR by pedapudi. https://github.com/ggml-org/llama.cpp/pull/21344 Its merge request has been denied, so it will not be in mainline llama.cpp. The changes are so small that I just put them into whatever the current release of llama.cpp is. Read the PR for more info. It will only work with M

🟡 CXMT started selling ram to corsair - score 42 Sources: reddit/r/LocalLLaMA

They started producing cheaper RAM for Corsair; hopefully it will get cheaper for consumers [https://www.tomshardware.com/pc-components/ddr5/chinese-memory-maker-cxmt-enters-the-mainstream-consumer-memory-with-corsair-vengeance-ddr5-kit-chinese-made-dram-emerges-as-an-antidote-for-crushing-shortages

🟢 Incremental

Model Releases

🟢 model : add support for talkie-1930-13b by niklassheth · Pull Request #22596 · ggml-org/llama.cpp - score 27 Sources: reddit/r/LocalLLaMA

https://huggingface.co/talkie-lm/talkie-1930-13b-it talkie-1930-13b-it is a 13B vintage language model. It is an instruction-tuned post-train of talkie-1930-13b-base, which was trained on 260B tokens of pre-1931 Englis

🟢 Running on a macbook, and having issues with crashing? Maybe this will help... - score 4 Sources: reddit/r/LocalLLaMA

Just a friendly pointer on getting around some issues on macbooks. I hope someone finds this useful. I spent weeks of ripping my hair out with crashes, crap performance and issues - and being entirely too stubborn to harness the power of Google to find solutions to my issues. Though, I prefer doing

Developer Tools

🟢 SkillOpt treats markdown skill files as trainable parameters with proper optimization machinery - score 35 Sources: reddit/r/LocalLLaMA

Paper came out recently that formalizes something a lot of agent builders have been doing ad hoc. They use a frontier model to propose bounded edits (add/delete/replace) to markdown skill files, then gate every edit against a held out validation set. Only strict improvements accepted, ties rejected,
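The accept/reject gate described in this post can be illustrated with a minimal sketch. It is not the paper's implementation: the edit tuple format, the `optimize_skill` function, and the keyword-counting `toy_score` are all hypothetical stand-ins (a real setup would have a frontier model propose edits and would score each candidate on a held-out validation set).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical edit format: (operation, line_index, new_text).
Edit = Tuple[str, int, str]


def apply_edit(lines: List[str], edit: Edit) -> List[str]:
    """Return a copy of the skill file with one bounded edit applied."""
    op, idx, text = edit
    out = list(lines)
    if op == "add":
        out.insert(idx, text)
    elif op == "delete":
        del out[idx]
    elif op == "replace":
        out[idx] = text
    else:
        raise ValueError(f"unknown operation: {op}")
    return out


def optimize_skill(
    lines: List[str],
    proposals: Sequence[Edit],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Keep an edit only if it strictly improves the validation score."""
    best = list(lines)
    best_score = score(best)
    for edit in proposals:
        candidate = apply_edit(best, edit)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict inequality: ties are rejected
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(lines: List[str]) -> int:
    """Stand-in for a held-out validation run: counts covered behaviours."""
    text = "\n".join(lines).lower()
    return sum(keyword in text for keyword in ("validate", "retry"))


skill = ["# Skill: call the API", "Send the request."]
proposals = [
    ("add", 2, "Validate the response schema."),  # improves the score
    ("replace", 0, "# Skill: call an API"),       # tie, so rejected
    ("add", 3, "Retry once on timeout."),         # improves the score
    ("delete", 2, ""),                            # lowers the score, so rejected
]
final_skill, final_score = optimize_skill(skill, proposals, toy_score)
```

Rejecting ties keeps the file from drifting through edits that change wording without measurable benefit, which is the property the post highlights.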

🟢 moeru-ai/airi - 💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported. - score 34 Sources: github_trending

🟢 OpenBB-finance/OpenBB - Financial data platform for analysts, quants and AI agents. - score 25 Sources: github_trending

🟢 Freelancers who build WhatsApp Business API bots for multiple clients: how do you structure your Meta Developer setup? - score 24 Sources: reddit/r/AIAgents

Hey everyone, I'm building WhatsApp appointment bots for dental clinics and I'm confused about the Meta Developer App structure when scaling to multiple clients. My confusion: * Do you create ONE Meta Developer App and add each client's phone number inside it? * Or do you create a SEPARATE app f

🟢 NateBJones-Projects/OB1 - Open Brain - The infrastructure layer for your thinking. One database, one AI gateway, one chat channel - any AI plugs in. No middleware, no SaaS. - score 20 Sources: github_trending

Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.

Business & Funding

🟢 I finally put my NPU (Intel Arrow Lake) to use doing ASR for my smart home - score 15 Sources: reddit/r/LocalLLaMA

I wrote about what I found in a deep dive elsewhere (which I will not mention because Reddit doesn't like cross linking) but I wanted to share it here since this is where I learn the most about AI stuff and I've seen questions here before about NPUs, which are often dismissed as marketing gimmicks (and for

Enterprise Adoption

🟢 How I built a safety layer around LLM-generated trading code and cut deployment time from 40 hours to 20 minutes - score 0 Sources: reddit/r/AIAgents

I built AlgoAI, a platform that converts plain-English strategy descriptions into live Python trading bots running against MetaTrader 5. The goal was to compress a workflow that traditionally takes quants 8 to 40 hours down to under 20 minutes. We hit that target. But the interesting engineering pro

Research Papers

🟢 HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction - score 30 Sources: huggingface

Online 3D reconstruction requires estimating camera pose and scene geometry under strict causal and bounded-memory constraints. Existing methods often suffer from drift, jitter, or collapse on long sequences. We trace these failures to a fundamental mismatch. Streaming geometry is inherently tempora

🟢 Pixel-Level Pavement Distress Assessment Using Instance Segmentation - score 10 Sources: huggingface

Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for maintenance-relevant quantification. This paper presents a v

Other Signals

🟢 Use Boring Languages with LLMs - score 38 Sources: hackernews

🟢 [Open Source] a contract layer at the agent tool boundary, rules in yaml not in the prompt (apache 2.0) - score 26 Sources: reddit/r/AIAgents

sharing what i've been working on. sponsio is an open-source contract layer that sits at the tool-call boundary of an llm agent. apache 2.0, python and ts. the thesis: rules that absolutely must hold (like "always check policy before issuing a refund" or "never call this tool twice per session") don
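To make the idea of enforcing rules at the tool-call boundary concrete, here is a minimal sketch using the two example rules quoted in the post. It does not use sponsio's actual API or YAML schema, which are not shown in this excerpt: the `ToolGuard` class, the `requires_before` and `max_calls_per_session` keys, and the stub tools are all hypothetical, and a plain dict stands in for a YAML rules file.

```python
from collections import Counter
from typing import Any, Callable, Dict


class ContractViolation(Exception):
    """Raised when a tool call would break a declared rule."""


class ToolGuard:
    """Hypothetical guard that checks rules before a tool is allowed to run."""

    def __init__(self, rules: Dict[str, Dict[str, Any]]) -> None:
        self.rules = rules
        self.calls: Counter = Counter()  # successful calls in this session

    def call(self, name: str, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        rule = self.rules.get(name, {})
        # Rule type 1: some other tool must have been called first.
        for prerequisite in rule.get("requires_before", []):
            if self.calls[prerequisite] == 0:
                raise ContractViolation(f"{name} requires {prerequisite} first")
        # Rule type 2: cap on calls per session.
        limit = rule.get("max_calls_per_session")
        if limit is not None and self.calls[name] >= limit:
            raise ContractViolation(f"{name} exceeded {limit} call(s) per session")
        result = tool(*args, **kwargs)
        self.calls[name] += 1
        return result


# Assumed rule set; in the post these would live in YAML rather than in the prompt.
RULES = {
    "issue_refund": {"requires_before": ["check_policy"], "max_calls_per_session": 1},
}


def check_policy(order_id: str) -> bool:  # stub tool for illustration
    return True


def issue_refund(order_id: str) -> str:  # stub tool for illustration
    return f"refunded {order_id}"


guard = ToolGuard(RULES)

try:
    guard.call("issue_refund", issue_refund, "A1")
    blocked_without_policy_check = False
except ContractViolation:
    blocked_without_policy_check = True

guard.call("check_policy", check_policy, "A1")
receipt = guard.call("issue_refund", issue_refund, "A1")
```

Because the check runs in code outside the model, a prompt that persuades the agent to skip the policy check still cannot produce a refund call, which matches the post's thesis that hard rules should not depend on the prompt.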

🟢 qwen 3.6 27B AR-> Diffusion - local training on 5090 - score 15 Sources: reddit/r/LocalLLaMA

based on the work of open-dllm - (which achieved qwen 2.5 autoregressive -> diffusion realignment head - same exact model under the hood delivering a 4x in improvement.) TLDR I haven't got a trained model yet. just a burnt out gpu cable and a new psu on order. I did actually get the thing to do a

🟢 Multimodal adaptive optical microscope: in vivo imaging, molecules to organisms - score 12 Sources: hackernews

🟢 Aiki my local Wikipedia Retrieval-Augmented Generation system [R] - score 11 Sources: reddit/r/MachineLearning

Hey, I built Aiki, a lightweight tool that lets you chat with Wikipedia locally. https://i.redd.it/67mzfsrc6f3h1.gif What it does: * Downloads and chunks Wikipedia articles (you can choose articles by name, with the option of also downloading similar topics) * Uses a cus

| Repo | Description | Stars Today | Language |
| --- | --- | --- | --- |
| moeru-ai/airi | 💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported. | 62 | typescript |
| OpenBB-finance/OpenBB | Financial data platform for analysts, quants and AI agents. | 43 | python |
| NateBJones-Projects/OB1 | Open Brain - The infrastructure layer for your thinking. One database, one AI gateway, one chat channel - any AI plugs in. No middleware, no SaaS. | 25 | typescript |

📄 New Papers

| Title | Category | Hotness | Link |
| --- | --- | --- | --- |
| SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills | research_paper | 12 | Open |
| Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents | research_paper | 10 | Open |
| InstructSAM: Segment Any Instance with Any Instructions | research_paper | 9 | Open |
| CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models | research_paper | 6 | Open |
| A Comprehensive Dataset for Human vs. AI Generated Image Detection | cs.AI | 0 | Open |
| SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking | research_paper | 2 | Open |
| In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models | cs.AI | 0 | Open |
| Confidence Calibration in Large Language Models | cs.AI | 0 | Open |
| How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning | cs.AI | 0 | Open |
| Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction | cs.AI | 0 | Open |
| Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs | cs.AI | 0 | Open |
| Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game | cs.AI | 0 | Open |
| BODHI: Precise OS Kernel Specification Inference | cs.AI | 0 | Open |
| When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure | cs.AI | 0 | Open |
| Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model | cs.AI | 0 | Open |

🐦 Twitter/X Highlights

| Account | Tweet Summary |
| --- | --- |
| xai | Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit http://x.ai/cli to get started. |
| AnthropicAI | Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://www.anthropic.com/news/chris-olah-pope-leo-encyclical |

Repeated From Recent Briefings