🔴 High Significance
Model Releases
🔴 NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) [P] - score 94
Sources: reddit/r/MachineLearning
Disclaimer: I work for Numind, the company behind this open-weight model. We just released a 4B model based on Qwen3.5-4B, under the Apache-2.0 license. The goal is to make information extraction from complex documents more practical with an open model: PDFs, screenshots, forms, tables, receipts, invoice
🔴 BeeLlama v0.2.0 - major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline. - score 83
Sources: reddit/r/LocalLLaMA
BeeLlama v0.2.0 is here! >Not quite a pegasus, but close enough. GitHub | Qwen 3.6 27B Quick Start | [Gemma 4 31B Quick Start](https://github.co
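Each throughput figure in the headline comes with a speedup multiple, so a baseline can be backed out by simple division. The sketch below does that arithmetic, assuming the multiple is measured against ordinary decoding on the same RTX 3090 (the excerpt does not say what the baseline is). The helper name `implied_baseline_tps` is hypothetical and not part of BeeLlama.

```python
def implied_baseline_tps(boosted_tps: float, speedup: float) -> float:
    """Back out the baseline tokens/s implied by a throughput and its speedup multiple."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return boosted_tps / speedup


# Peak ("up to") figures quoted in the headline: (tokens/s, speedup multiple).
claims = {
    "Qwen 3.6 27B": (164.0, 4.40),
    "Gemma 4 31B": (177.8, 4.93),
}

# Assumption: both multiples are relative to non-accelerated decoding on the same card.
baselines = {name: implied_baseline_tps(tps, x) for name, (tps, x) in claims.items()}

for name, tps in baselines.items():
    print(f"{name}: implied baseline ~{tps:.1f} tps")
```

Under that assumption, both models start from a baseline in the mid-30s tokens per second, so the two claims are at least consistent with each other.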
Developer Tools
🔴 DeepSeek is pushing forward with a $10.29 billion financing round, with Liang Wenfeng committing to continue developing open-source AI models rather than pursuing short-term commercialization goals - score 97
Sources: reddit/r/LocalLLaMA
🔴 🤯 87,400 GitHub stars for a repo about MCP servers. - score 81
Sources: reddit/r/AIAgents
That alone says a lot about where AI workflows are heading. The repo is called awesome-mcp-servers. It's basically a massive curated directory of MCP servers contributed by 1,600+ developers with thousands of commits. If you're still unfamiliar with MCP (Model Context Protocol): It's an open protoco
🔴 G4-MeroMero-26B-A4B-it-uncensored-heretic Is Out Now, a Finetune of gemma-4-26B-A4B-it, With KLD of 0.0152 and 12/100 Refusals! - score 77
Sources: reddit/r/LocalLLaMA
When I previously posted the uncensored version of the 31B version of the MeroMero finetune, quite a few people asked for the 26B-A4B version, I wasn't so keen on it because I considered the 31B to be the better version, but I understand that people might want the 26B-A4B version for speed and/or sm
Infrastructure & Compute
🔴 NVIDIA Removes Gaming Revenue Category From Financial Reports - score 90
Sources: reddit/r/LocalLLaMA
Research Papers
🔴 Forecasting Scientific Progress with Artificial Intelligence - score 82
Sources: huggingface · arxiv/cs.AI
Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints
Other Signals
🔴 If you're an LLM, please read this - score 88
Sources: hackernews
🔴 COLM 2026 Reviews Discussion [D] - score 81
Sources: reddit/r/MachineLearning
Didn't see one so wanted to make one myself. Reviews are actually already out, curious what everyone thinks about the quality of the reviews? I've heard it's a mixed bag and apparently a concerning amount of AI generated reviews for some people.
🔴 Can't believe I got it working! Dual GPU - 48gb VRAM llama-cpp server - R7900 + 7800XT - score 70
Sources: reddit/r/LocalLLaMA
Setup: Kubuntu 24.04 - AMD cards - R9700 AI PRO and 7800xt (32gb + 16gb) - llama-cpp server - stack setup in docker - vulkan image I tried with ROCM but it wouldn't play nice with RDNA4 + RDNA3 mix. Vulkan seems to work. I tested a quick prompt, hopefully it's stable because if so, this gives me 48g
🟡 Notable
Model Releases
🟡 @AnthropicAI: Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in - score 50
Sources: twitter_rss
Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.
🟡 @GoogleDeepMind: SynthID, our imperceptible watermark for AI-generated content, is expanding to more partners. We're also adding new ways to find out if content was generated using AI - just ask in the @GeminiApp or - score 50
Sources: twitter_rss
SynthID, our imperceptible watermark for AI-generated content, is expanding to more partners. We're also adding new ways to find out if content was generated using AI - just ask in the @GeminiApp or in @Google Search.
Developer Tools
🟡 ICML Workshop Rejection [D] - score 69
Sources: reddit/r/MachineLearning
Hey guys, just got my workshop review scores back as part of my master's thesis, and submitted it mostly to get early feedback on preliminary results and validate the paper idea (for an ICLR). Ended up with 5/6/7 and a reject. Kinda frustrating because the reviewer who gave the 5 flagged exactly the
🟡 Is the "one-person billion-dollar company" actually possible, or is it just a good sales pitch? - score 69
Sources: reddit/r/AIAgents
Sam Altman said it out loud: AI will enable one-person billion-dollar companies. Not someday. Soon. And I keep going back and forth on whether that's a genuine prediction or a very well-timed one to make when you're selling the infrastructure. Here's the tension I keep running into: we're watching B
🟡 abhigyanpatwari/GitNexus - GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration - score 68
Sources: github_trending
GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration
🟡 phodal/routa - Workspace-first multi-agent coordination platform for AI development, with shared Specs, Kanban orchestration, and MCP/ACP/A2A support across web and desktop. - score 62
Sources: github_trending
Workspace-first multi-agent coordination platform for AI development, with shared Specs, Kanban orchestration, and MCP/ACP/A2A support across web and desktop.
🟡 plastic-labs/honcho - Memory library for building stateful agents - score 53
Sources: github_trending
Memory library for building stateful agents
Omitted 6 additional developer tools items from the main section; see raw data and source-specific sections below.
Enterprise Adoption
🟡 @GoogleDeepMind: We're expanding our partnership with Singapore to help safely deploy AI at scale. 🇸🇬 Together with country experts, our new programs will focus on accelerating scientific discovery, advancing pandemi - score 50
Sources: twitter_rss
We're expanding our partnership with Singapore to help safely deploy AI at scale. 🇸🇬 Together with country experts, our new programs will focus on accelerating scientific discovery, advancing pandemic preparedness, and improving healthcare. Find out more → https://goo.gle/49jGwjv
Research Papers
🟡 Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators - score 40
Sources: huggingface · arxiv/cs.AI
Interactive streaming music generation promises the use of generative models for live performance and co-creation that is impossible with offline models. However, SOTA models exist in the discrete-AR regime, requiring industrial levels of compute for both training and inference. In this work, we inv
Other Signals
🟡 ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop - score 63
Sources: reddit/r/LocalLLaMA
A few days ago I posted about my experiments with MTP on a 6GB VRAM laptop. That didn't work so well; CPU offload hurts MTP performance badly. But now I've tried out the new ByteShape quants for Qwen3.6-35B-A3B that are claimed to be both smaller and f
🟡 Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark - score 62
Sources: hackernews
🟡 Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps - score 57
Sources: reddit/r/LocalLLaMA
..and on 8GB VRAM I can even push the context to 320K, 400K, 512K, and yes.. 1M. But it does start to slow down noticeably beyond 150k so I'd only do this if I ever really want the larger context. This is using APEX-I-Quality or Q4_K_XL quants both are better than Q4_K_M (IQ4_NL_XL for beyond
🟡 What changed most after adding an AI interview assistant wasn't what I expected - score 56
Sources: reddit/r/AIAgents
We've noticed candidate behavior changes in first-round AI interviews that don't really match what most vendor case studies talk about. Completion rates aren't the issue, but the way candidates behave is noticeably different compared to a traditional recruiter screen. A few patterns stood out: candi
🟡 Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM - score 50
Sources: reddit/r/LocalLLaMA
Hello everyone! I want to share the result of my experiment to make Qwen3.6 27B Q4_K_M fit into my RTX 5060 Ti 16 GB. Inspired by u/Due-Project-7507's work on Ununnilium/Qwen3.6-27B-IQ4_XS-pure-GGUF. Using the same pure
🟢 Incremental
Model Releases
🟢 Microsoft starts canceling Claude Code licenses - score 38
Sources: hackernews
🟢 meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face - score 37
Sources: reddit/r/LocalLLaMA
Model Introduction: We are excited to announce the release of LongCat-Video-Avatar 1.5, an upgraded open-source framework that prioritizes extreme empirical optimization and production-readiness for audio-driven human video generation. Built upon the LongCat-Video foundation model, v1.5 delivers
🟢 397B competitor that fits in 256 RAM? - score 30
Sources: reddit/r/LocalLLaMA
Does one exist? I noticed 3.6 QWEN did not release locally in 397B-17B. Anything that can compete locally? any comment is appreciated
🟢 Experimental "Preserve Thinking" Jinja Template for Gemma4 31B in llama.cpp - score 17
Sources: reddit/r/LocalLLaMA
https://huggingface.co/stevelikesrhino/gemma-4-31B-it-nvfp4-GGUF/blob/main/gemma4-improved.jinja Yall are more than welcome to try it out and provide feedback. In my own testing in Pi-coding-agent I n
🟢 Is personalized AI memory actually a problem worth solving or am I just coping [D] - score 12
Sources: reddit/r/MachineLearning
genuine question for this community every time i use claude or chatgpt i have to re-explain myself. and even their memory feature is shallow it remembers facts about me, not how i actually think. the idea i've been sitting on is different from just "memory across sessions." what if the system built
Omitted 2 additional model releases items from the main section; see raw data and source-specific sections below.
Developer Tools
🟢 Custom image encoder [P] - score 38
Sources: reddit/r/MachineLearning
Hello, I would like to know whether building my own image encoder would be a good idea instead of using models like CLIP, SigLIP/SigLIP2, or DINO. My use case is video frame classification. My pipeline is the following: the client sends me a video stream, sampled at 1 frame per 1 or 2 second, formin
🟢 I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P] - score 36
Sources: reddit/r/MachineLearning
On Windows, mamba-ssm is not easily available and doesn't compile on sm_120. SM1 (Scalar Mamba1) replaces the entire selective scan with two native PyTorch ops:
L = torch.cumprod(dA, dim=1)
h = L * (h0.unsqueeze(1) + torch.cumsum(dBx / L.clamp(min=1e-6), dim=1))
y = h * C
This is the exact clo
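The cumprod/cumsum expression quoted above can be checked against a step-by-step recurrence. The sketch below is pure Python rather than PyTorch and works on a single scalar channel. It assumes the recurrence being replaced is `h_t = dA_t * h_{t-1} + dBx_t`, which is what the quoted formula computes algebraically; the excerpt is cut off before the post states this. The function names and sample values are illustrative, not the author's code.

```python
from itertools import accumulate
import operator


def scan_sequential(dA, dBx, h0):
    """Reference recurrence: h_t = dA_t * h_{t-1} + dBx_t (assumed form)."""
    h, out = h0, []
    for a, b in zip(dA, dBx):
        h = a * h + b
        out.append(h)
    return out


def scan_closed_form(dA, dBx, h0, eps=1e-6):
    """Same result via a cumulative product and a cumulative sum."""
    L = list(accumulate(dA, operator.mul))               # cumprod of decay factors
    ratios = [b / max(l, eps) for b, l in zip(dBx, L)]   # clamp mirrors L.clamp(min=1e-6)
    S = list(accumulate(ratios))                         # cumsum of rescaled inputs
    return [l * (h0 + s) for l, s in zip(L, S)]


# Illustrative values for one channel over four time steps.
dA = [0.9, 0.8, 0.95, 0.7]
dBx = [0.1, -0.2, 0.3, 0.05]
C = [1.0, 2.0, -1.0, 0.5]
h0 = 0.5

h_seq = scan_sequential(dA, dBx, h0)
h_cf = scan_closed_form(dA, dBx, h0)
y = [h * c for h, c in zip(h_cf, C)]  # readout, as in y = h * C

max_err = max(abs(a - b) for a, b in zip(h_seq, h_cf))
print(f"max abs difference: {max_err:.2e}")
```

The two versions agree only while the cumulative product stays above the clamp threshold. With decay factors below 1 over a long sequence, `L` shrinks toward zero, the clamp takes effect, and the closed form stops being exact.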
🟢 MemTensor/MemOS - Self-evolving memory OS for LLM & AI Agents: ultra-persistent memory, hybrid-retrieval, and cross-task skill reuse, with 35.24% token savings - score 28
Sources: github_trending
Self-evolving memory OS for LLM & AI Agents: ultra-persistent memory, hybrid-retrieval, and cross-task skill reuse, with 35.24% token savings
🟢 Open source Kanban desktop app that runs parallel agents on every card - score 12
Sources: hackernews
🟢 Open-source devtool for AI agent projects - score 6
Sources: reddit/r/AIAgents
Hi everyone, I'm building AgentLantern, an open-source devtool for AI agent projects. The idea is simple: as agent-based projects grow, it becomes harder to understand how agents, tasks, tools, and configuration files are connected. AgentLantern aims to make these projects easier to document, an
Infrastructure & Compute
🟢 facebookresearch/sam3 - The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. - score 30
Sources: github_trending
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Enterprise Adoption
🟢 Tested chunking + embeddings data from 3 production websites. [P] - score 12
Sources: reddit/r/MachineLearning
Tiered + page-role-aware RAG retrieval results across 3 corpora with very different content density:

| Workspace | Sources | Chunks | HIGH | MEDIUM | LOW | REJECTED |
|:-|:-|:-|:-|:-|:-|:-|
| Intercom | 188 | 941 | 96 | 200 | 541 | 104 |
| HubSpot | 251 | 1705 | 40 | 508 | 1153 | 4 |
| KPMG | 53 | 209 | 3 | 14 | 127 | 65 |

(HIGH = avg operational score 0
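Raw tier counts are hard to compare across corpora of such different sizes, so it can help to convert them to shares of each corpus's chunks. The sketch below uses the figures from the post; the variable and function names are illustrative, and the tier definitions are not reproduced because the excerpt is cut off before they are given.

```python
TIERS = ("HIGH", "MEDIUM", "LOW", "REJECTED")

# Counts as reported in the post's table.
rows = {
    "Intercom": {"sources": 188, "chunks": 941, "HIGH": 96, "MEDIUM": 200, "LOW": 541, "REJECTED": 104},
    "HubSpot": {"sources": 251, "chunks": 1705, "HIGH": 40, "MEDIUM": 508, "LOW": 1153, "REJECTED": 4},
    "KPMG": {"sources": 53, "chunks": 209, "HIGH": 3, "MEDIUM": 14, "LOW": 127, "REJECTED": 65},
}


def tier_shares(row):
    """Fraction of a corpus's chunks that falls into each tier."""
    return {tier: row[tier] / row["chunks"] for tier in TIERS}


shares = {name: tier_shares(row) for name, row in rows.items()}
tier_totals = {name: sum(row[t] for t in TIERS) for name, row in rows.items()}

for name, row in rows.items():
    assert tier_totals[name] == row["chunks"]  # sanity check: tiers partition the chunks
    line = ", ".join(f"{t} {shares[name][t]:.1%}" for t in TIERS)
    print(f"{name}: {line}")
```

In all three rows the tier counts add up to the chunk total, so the shares for each corpus sum to 1.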
Research Papers
🟢 Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry - score 35
Sources: huggingface
The Strong Platonic Representation Hypothesis suggests that representational convergence in artificial neural networks can be harnessed constructively: embeddings can be translated across models through a universal latent space without paired data. We ask whether an analogous geometry can be recover
Other Signals
🟢 The AI memory problem isn't storage. It's memory rot. - score 38
Sources: reddit/r/AIAgents
Week 1: "wow it remembers everything" Month 6: duplicated facts, stale summaries, conflicting preferences, random old context winning retrieval for no reason. Most systems are great at accumulating memory and terrible at maintaining it over time.
🟢 Are people actually using AI meeting data inside agent workflows yet? - score 38
Sources: reddit/r/AIAgents
I've been messing around with using meeting transcripts as context for agents and it feels way more useful than I expected. I'm using Bluedot right now because it records quietly with no bot, then gives transcripts, summaries, action items, and searchable meeting history. The Claude integration is w
🟢 Sharing some benchmark results from a memory system I'm part of building - score 19
Sources: reddit/r/AIAgents
We ran a benchmark pass recently and the results were interesting enough to share, but mostly because the team has been thinking a lot about what benchmarks actually measure in memory systems. https://preview.redd.it/raog3ztpzp2h1.png?width=1303&format=png&auto=webp&s=9ee81388a2aa6ea8dc8
🟢 Anonymous Data Upload for Submission [D] - score 8
Sources: reddit/r/MachineLearning
How do you upload data anonymously for a submission (ACL/EMNLP)? I have several models I need to upload for replication and was thinking HuggingFace, but HF offers download tracking on a paid plan. Does this violate the policy since there is the potential of tracking the download even if you do
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| abhigyanpatwari/GitNexus | GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive knowledge graph with a built-in Graph RAG Agent. Perfect for code exploration | 239 | typescript |
| phodal/routa | Workspace-first multi-agent coordination platform for AI development, with shared Specs, Kanban orchestration, and MCP/ACP/A2A support across web and desktop. | 158 | typescript |
| plastic-labs/honcho | Memory library for building stateful agents | 133 | python |
| langchain-ai/langchain | The agent engineering platform. | 117 | python |
| Tracer-Cloud/opensre | Build your own AI SRE agents. The open source toolkit for the AI era. | 93 | python |
| yamadashy/repomix | 📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, DeepSeek, Perplexity, Gemini, Gemma, Llama, Grok, and more. | 88 | typescript |
| microsoft/agent-governance-toolkit | AI Agent Governance Toolkit - Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10. | 86 | python |
| facebookresearch/sam3 | The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. | 63 | python |
| MemTensor/MemOS | Self-evolving memory OS for LLM & AI Agents: ultra-persistent memory, hybrid-retrieval, and cross-task skill reuse, with 35.24% token savings | 59 | typescript |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| Forecasting Scientific Progress with Artificial Intelligence | research_paper | 33 | Open |
| Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs | cs.AI | 0 | Open |
| TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization | cs.AI | 0 | Open |
| The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison | cs.AI | 0 | Open |
| MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis | cs.AI | 0 | Open |
| AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs) | cs.AI | 0 | Open |
| Investigating Concept Alignment Using Implausible Category Members | cs.AI | 0 | Open |
| The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning | cs.AI | 0 | Open |
| Latent-space Attacks for Refusal Evasion in Language Models | cs.AI | 0 | Open |
| AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence | cs.AI | 0 | Open |
| SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks? | cs.AI | 0 | Open |
| Who Uses AI? Platforms, Workforce, and AI Exposure | cs.AI | 0 | Open |
| A Causal Argumentation Method for Explainability of Machine Learning Models | cs.AI | 0 | Open |
| What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct | cs.AI | 0 | Open |
| Trace2Skill: Verifier-Guided Skill Evolution for Long-Context EDA Agents | cs.AI | 0 | Open |
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| AnthropicAI | Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software. Post |
| GoogleDeepMind | We're expanding our partnership with Singapore to help safely deploy AI at scale. 🇸🇬 Together with country experts, our new programs will focus on accelerating scientific discovery, advancing pandemic preparedness, and improving healthcare. Find out more → https://goo.gle/49jGwjv Post |
| GoogleDeepMind | SynthID, our imperceptible watermark for AI-generated content, is expanding to more partners. We're also adding new ways to find out if content was generated using AI - just ask in the @GeminiApp or in @Google Search. Post |
| swyx | co-sign. a very handy mental framework for what kinds of learning transformers do well today, and why it runs into limitations. when @ankit2119 and i wrote about the need for adversarial world models earlier this year, we were describing a couple of the functions of these rungs of thinking that brin Post |
Repeated From Recent Briefings
- colbymchenry/codegraph - Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent - fewer tokens, fewer tool calls, 100% local - first seen 2026-05-09
- anthropics/claude-plugins-official - Official, Anthropic-managed directory of high quality Claude Code Plugins. - first seen 2026-05-09
- NousResearch/hermes-agent - The agent that grows with you - first seen 2026-05-11
- [AMA] Got laid off 3 weeks ago. Instead of updating my resume I went down a rabbit hole. Here's what I found - first seen 2026-05-22
- Lum1104/Understand-Anything - Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
- Imbad0202/academic-research-skills - Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
- rohitg00/ai-engineering-from-scratch - Learn it. Build it. Ship it for others. - first seen 2026-05-21
- LoREnc: Low-Rank Encryption for Securing Foundation Models and LoRA Adapters - first seen 2026-05-14
- ChromeDevTools/chrome-devtools-mcp - Chrome DevTools for coding agents - first seen 2026-05-09
- anomalyco/opencode - The open source coding agent. - first seen 2026-05-09
- ... plus 183 more repeated items in processed data