🔴 High Significance
Model Releases
🔴 Waiting for Qwen 3.7 open weight... The new King has arrived... - score 90
Sources: reddit/r/LocalLLaMA
The hype is real! https://qwen.ai/blog?id=qwen3.7
🔴 Qwen3.6 35Ba3 has changed my workflows and even how I use my computer - score 77
Sources: reddit/r/LocalLLaMA
My workflow has changed basically to ask Codex to do certain tasks and then document how to do them (including errors it found on its way) into a skill. I feed that skill to pi, and suddenly my qwen3.6 gets that hard stuff done: - devops on a VPS - using docling to create epubs from old PDFs - us
Developer Tools
🔴 Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D] - score 81
Sources: reddit/r/MachineLearning
The research community has provided (already for some time) seemingly more efficient and effective tokenizations for vision. Do we have any hint on whether non-fixed-patches tokenization is being applied on the big player models? I imagine not, and I'm trying to think why: - marginal gains? - pipe
🔴 A.I. Agents: They're Fun They're Useful But Don't Give Them the Credit Card - score 81
Sources: reddit/r/AIAgents
Infrastructure & Compute
🔴 When your LLM treats data center GPUs like an optional DLC - score 70
Sources: reddit/r/LocalLLaMA
Research Papers
🔴 TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation - score 82
Sources: huggingface · arxiv/cs.CL
Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chines
🔴 LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning - score 72
Sources: huggingface · arxiv/cs.CL
Joint audio-visual reasoning is essential for omnimodal understanding, yet current multimodal large language models (MLLMs) still struggle when reasoning requires fine-grained evidence from both modalities. A central limitation is that explicit text-based chain-of-thought (CoT) compresses continuous
Other Signals
🔴 [AMA] Got laid off 3 weeks ago. Instead of updating my resume I went down a rabbit hole. Here's what I found - score 94
Sources: reddit/r/AIAgents
I've been a software engineer for a few years. Worked in bigtech, good salary, stable job, the whole thing. Then my position got cut and I had one of those forced moments of clarity that I think a lot of people in tech are having right now. I could update my LinkedIn, apply to 200 jobs, and land som
🔴 Multi-Stream LLMs: new paper on parallelizing/separating prompts, thinking, I/O - score 88
Sources: hackernews
🔴 110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp - score 83
Sources: reddit/r/LocalLLaMA
Had been getting great MTP performance with llama.cpp on my RTX 4070 Super 12GB, until they actually merged the MTP PR. Then, performance tanked and was bare
🟡 Notable
Model Releases
🟡 LatitudeGames/Equinox-31B · Hugging Face - score 63
Sources: reddit/r/LocalLLaMA
new model from LatitudeGames - Gemma 31B finetune https://huggingface.co/LatitudeGames/Equinox-31B-GGUF Equinox draws its name from the balance between extremes. Trained on a bal
🟡 Launch HN: Runtime (YC P26) - Sandboxed coding agents for everyone on a team - score 62
Sources: hackernews
🟡 For everyone that uses OpenCode / Pi - Here's your prompt processing fix! - score 50
Sources: reddit/r/LocalLLaMA
This PR deserves much more attention, as it fixes the constant prompt reprocessing that happens when using llama.cpp with OpenCode or pi. https://github.com/ggml-org/llama.cpp/pull/22929
🟡 AdventHealth advances whole-person care with OpenAI - score 50
Sources: lab_blog/OpenAI
AdventHealth is using ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care.
🟡 We're launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks - score 50
Sources: lab_blog/DeepMind
Omitted 3 additional model releases items from the main section; see raw data and source-specific sections below.
Developer Tools
🟡 Novel Problems in VLA [R] - score 56
Sources: reddit/r/MachineLearning
I'm currently doing a research internship and my supervisor is constantly pushing me to have a novel idea, I've read about 15-20 papers about VLA and I think that most of the things are saturated, I thought about an equivariant VLA based on equivariant CNN which was published in 2016 and successfull
🟡 teng-lin/notebooklm-py - score 50
Sources: github_trending
Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features, including capabilities the web UI doesn't expose, via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw.
🟡 In theory, if I have $20k-ish to spend on hardware what would actually get me closest to local coding agent that would allow me to go totally off the social grid? - score 43
Sources: reddit/r/LocalLLaMA
Let's say I'm in the market to buy a studio or RTX 6000's. At what point am I off the grid with a local coding agent? Probably a model question too.
Research Papers
🟡 Q-ARVD: Quantizing Autoregressive Video Diffusion Models - score 65
Sources: huggingface
Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical
🟡 GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation - score 55
Sources: huggingface
Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model's internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that
🟡 More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts - score 42
Sources: huggingface · arxiv/cs.CL
Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval
Other Signals
🟡 Heretic has been served a legal notice by Meta, Inc. - score 67
Sources: reddit/r/LocalLLaMA
To Whomsoever it May Concern, The individual behind the Heretic Free Software Project (henceforth called "Heretic", notwithstanding unrelated entities of the same name) has been served a notice by a legal services provider representing Meta Platforms, Inc. (henceforth called "Meta"), via the digital
🟡 We're Thursday and no one claimed AGI yet this week! - score 57
Sources: reddit/r/LocalLLaMA
U guys okay?
🟢 Incremental
Model Releases
🟢 Which agent should I use for coding and notetaking? - score 38
Sources: reddit/r/AIAgents
Hey everyone, I'm a software developer with a strong interest in psychology and photography. I'm currently subscribed to ChatGPT mainly because I use it for coding with agents, but I keep running into the limits of the $20 plan. Because of that, I'm considering other options, including GitHub Copilo
🟢 New Release of ROCm based MLX LLM Engine - lemon-mlx-engine - score 23
Sources: reddit/r/LocalLLaMA
Hey everyone lemon-mlx-engine just got done integrating TheRock / ROCm 7.13 into the lemon-mlx-engine which means you get to try the latest ROCm on your local hardware with the MLX engine! This also includes various bug fixes and kernel fixes we have been seeing in Qwen3, 3.5 and 3.6 MoE and dense.
🟢 Low-level coding dataset - score 10
Sources: reddit/r/LocalLLaMA
Hi all, I've recently been thinking about putting together a community sourced coding dataset for finetuning models, with a heavy focus on cpp and systems programming. My goal is to eventually have a model (say a finetune of Qwen3.6-27b) that is good at stuff like memory ownership, thread safety, op
🟢 ztok - a fast multithreaded tokenizer in Zig that loads tiktoken / HF / SentencePiece and is 2-5× faster - score 0
Sources: reddit/r/LocalLLaMA
I built ztok, a tokenizer library focused on being fast and format-agnostic for local pipelines. - Loads what you already have: .tiktoken, HF tokenizer.json, SentencePiece .model, TokenMonster, Mistral Tekken. Auto-detected. - Bit-identical to tiktoken / HF / SentencePiece on the equivalence gate
Developer Tools
🟢 If you've built an AI agent or chatbot - how do you know what users actually want from it? - score 38
Sources: reddit/r/AIAgents
Real question for anyone running an agent or chat product. When users just talk to your agent in natural language, you lose visibility into what they actually asked for, whether they got it, and what they kept wanting that your agent couldn't do. And when it quietly fails someone, there's no error
🟢 A customer asked why our agent believed something. We had no answer. - score 38
Sources: reddit/r/AIAgents
Prompt was clean. Model was fine. We went digging. No timestamp on the write. No source attached. No way to tell if it came from a user input, a tool call, a summarization pass, or something that got corrupted three updates ago. The memory layer had absorbed it at some point and that was the end of
🟢 Inline prompt-injection guards need to be fast enough for the agent hot path - score 38
Sources: reddit/r/AIAgents
I wrote up a benchmark note for Armorer Guard here: https://armorerlabs.com/blog/armorer-guard-inline-prompt-injection-defense The practical question I am trying to answer is not just "can we detect prompt injection?" It is: can a guard sit directly before an agent turns context into action without
🟢 Agent reliability is killing the one genius narrative - score 38
Sources: reddit/r/AIAgents
The line from Yao Shunyu's interview that stuck with me was not the spicy "AI does not need brains" bit. It was the claim that the individual hero era in AI is basically over. That sounds harsh, but it matches what the industry looks like now. Frontier model progress is less about one genius idea an
🟢 google-labs-code/stitch-skills - score 31
Sources: github_trending
A library of Agent Skills designed to work with the Stitch MCP server. Each skill follows the Agent Skills open standard, for compatibility with coding agents such as Antigravity, Gemini CLI, Claude Code, Cursor.
Omitted 5 additional developer tools items from the main section; see raw data and source-specific sections below.
Business & Funding
🟢 One thing that's been bothering me lately: benchmark performance often tells me almost nothing about whether a workflow will survive production usage. [D] - score 31
Sources: reddit/r/MachineLearning
I've seen systems score well internally and then immediately fail under: * ambiguous user intent * messy real-world context * contradictory instructions * long-running sessions Feels like evaluation still heavily rewards clean-task optimization instead of behavioral robustness. What are people using
Enterprise Adoption
🟢 Nobody talks about what AI memory looks like after six months in production. - score 38
Sources: reddit/r/AIAgents
Old preferences keep winning retrieval, sarcastic comments get stored as literal truth, and summaries outlive the facts that made them true. You're not running a memory system at that point, you're babysitting one. Your AI context should not be a black box. It should be configurable, correctable, an
Research Papers
🟢 OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding - score 15
Sources: huggingface
Omni-proactive streaming video understanding, i.e., autonomously deciding when to speak and what to say from continuous audio-visual streams, is an emerging capability of omni-modal large language models. Existing benchmarks fall short in three key aspects: they rely primarily on visual signals, ado
Other Signals
🟢 Can liveness detection models generalise to synthetic media generation techniques they were never trained on? [D] - score 39
Sources: reddit/r/MachineLearning
Most liveness detection systems in production today were built around a threat model where the attacker is submitting a static image or a basic replay video. The generation quality of current synthetic media is categorically different from what those training datasets captured. The question I keep c
🟢 Latest b9274 Addresses MTP VRAM leak - score 37
Sources: reddit/r/LocalLLaMA
B9274 I have been having an issue with MTP models unloading after a couple of minutes of use. Can't figure out why. Anyway, I don't think this is relevant to that, but I did observe the VRAM creep, so hopefully this helps. > server : free d
🟢 Anyone evaluated the difference between Qwen Code for the local qwen models vs another harness? CC, OC, LC, Aider etc. - score 17
Sources: reddit/r/LocalLLaMA
For me, OpenCode is doing fantastically, but I was wondering whether Qwen Code would be more native and have better functionality, since I don't know which agentic harness they used to get their benchmark results.
🟢 Lisbon Machine Learning School (LxMLS 2026) [D] - score 6
Sources: reddit/r/MachineLearning
Hi, did anyone apply to it or attend it previously? How was the experience? I got accepted but without a scholarship; is it worth going self-sponsored?
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| teng-lin/notebooklm-py | Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features, including capabilities the web UI doesn't expose, via Python, CLI, and AI agents like Claude Code, Codex, and OpenClaw. | 186 | python |
| google-labs-code/stitch-skills | A library of Agent Skills designed to work with the Stitch MCP server. Each skill follows the Agent Skills open standard, for compatibility with coding agents such as Antigravity, Gemini CLI, Claude Code, Cursor. | 69 | typescript |
| software-mansion/argent | An agentic toolkit to control, debug, and profile iOS and Android apps. Made by Software Mansion. | 67 | typescript |
| ryoppippi/ccusage | Analyze coding (agent) CLI token usage and costs from local data. | 58 | rust |
| google/adk-samples | A collection of sample agents built with Agent Development Kit (ADK) | 33 | python |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation | research_paper | 106 | Open |
| LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning | research_paper | 30 | Open |
| Q-ARVD: Quantizing Autoregressive Video Diffusion Models | research_paper | 14 | Open |
| GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation | research_paper | 10 | Open |
| Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration | cs.AI | 0 | Open |
| OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind | cs.AI | 0 | Open |
| AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows | cs.AI | 0 | Open |
| High Quality Embeddings for Horn Logic Reasoning | cs.AI | 0 | Open |
| Open-World Evaluations for Measuring Frontier AI Capabilities | cs.AI | 0 | Open |
| Personality Engineering with AI Agents: A New Methodology for Negotiation Research | cs.AI | 0 | Open |
| From Automated to Autonomous: Hierarchical Agent-native Network Architecture (HANA) | cs.AI | 0 | Open |
| COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space | cs.AI | 0 | Open |
| Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines | cs.AI | 0 | Open |
| Declarative Data Services: Structured Agentic Discovery for Composing Data Systems | cs.AI | 0 | Open |
| VBFDD-Agent for Electric Vehicle Battery Fault Detection and Diagnosis: Descriptive Text Modeling of Battery Digital Signals | cs.AI | 0 | Open |
📢 Lab Blog Posts
- OpenAI: AdventHealth advances whole-person care with OpenAI
- DeepMind: We're launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| OpenAI | Highlights from today's Codex Thursday launches: 1️⃣ Codex can now securely use apps on your Mac from your phone, even when your Mac is locked and the screen is off. http://developers.openai.com/codex/app/computer-use#locked-use |
| xai | You can now use your @grok or X Premium subscription in @opencode. Use the model powering Grok Build for high speed and codebase intelligence. https://x.ai/news/grok-opencode |
| simonw | I released the first alpha of Datasette Agent - a conversational AI assistant for Datasette that can answer questions about data in SQLite databases, and can be extended with plugins to add extra tools and features. Here's a demo |
Repeated From Recent Briefings
- colbymchenry/codegraph: Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, and OpenCode; fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
- Imbad0202/academic-research-skills: Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
- OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D] - first seen 2026-05-21
- NousResearch/hermes-agent: The agent that grows with you - first seen 2026-05-11
- tinyhumansai/openhuman: Your Personal AI super intelligence. Private, Simple and extremely powerful. - first seen 2026-05-11
- rohitg00/ai-engineering-from-scratch: Learn it. Build it. Ship it for others. - first seen 2026-05-21
- Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps - first seen 2026-05-19
- anthropics/claude-plugins-official: Official, Anthropic-managed directory of high quality Claude Code Plugins. - first seen 2026-05-09
- Lum1104/Understand-Anything: Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
- HKUDS/CLI-Anything: "CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub: https://clianything.cc/ - first seen 2026-05-17
- ... plus 580 more repeated items in processed data