🔴 High Significance
Model Releases
🔴 Apple reveals new AI architecture built around Google Gemini models — score 90
Sources: hackernews
🔴 Me: Arguing with an AI bot who just posted something on this sub about Llama 3.1. — score 77
Sources: reddit/r/LocalLLaMA
For real tho, these bots need to turn on their web search functions and quit living in the past. It’s bad enough we gotta deal with all the “Qwen3.6 27b helped me quit drinking and brought my dog back from the dead” posts. Sheesh /s
🔴 Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax — score 70
Sources: reddit/r/LocalLLaMA
Hey fellow Llamas, your time is precious, so I'll keep it short (while trying to explain everything lol). TL;DR: * 33-35B MoE on a 16 GB GPU. Qwen3.6 35B-A3B: 13.3 GiB (was ~20.5). Laguna XS.2 33B-A3B: 14.6 GiB (was 18.8). Both measured on an RTX 3090, both under 16 GiB. * **Only the active
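A quick arithmetic check of the footprints quoted above, as a sketch only: the four GiB figures come from the post (the ~20.5 figure is approximate there), while the 16 GiB budget is taken from the title and the helper name `summarize` is illustrative.

```python
# Footprints quoted in the post, in GiB: (before, after).
# The "before" value for Qwen3.6 35B-A3B is approximate in the source.
BUDGET_GIB = 16.0  # assumption: the 16 GB card from the title, treated as 16 GiB

footprints = {
    "Qwen3.6 35B-A3B": (20.5, 13.3),
    "Laguna XS.2 33B-A3B": (18.8, 14.6),
}

def summarize(before, after, budget=BUDGET_GIB):
    """Return savings and remaining headroom for one model."""
    saved = before - after
    return {
        "saved_gib": round(saved, 1),
        "saved_pct": round(100 * saved / before, 1),
        "headroom_gib": round(budget - after, 1),
    }

report = {name: summarize(b, a) for name, (b, a) in footprints.items()}
for name, row in report.items():
    print(name, row)
```

On these numbers the reduction is roughly 35% for the Qwen model and 22% for Laguna, leaving about 2.7 GiB and 1.4 GiB of headroom under the budget.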
Developer Tools
🔴 Anyone running browser-using agents at any kind of scale? What's your infrastructure looking like? — score 94
Sources: reddit/r/AIAgents
We have a research agent that browses the web to gather info (think: shopping comparison, lead enrichment, competitive intel). Works fine for low concurrency. Falls apart at 20+ concurrent browser sessions. We've tried local Playwright pool, Browserbase, and a few self-hosted setups. None of them ar
🔴 My company is having me vibecode an Argus replacement — score 81
Sources: reddit/r/AIAgents
AI has given a lot of non-technical people delusional amounts of self-confidence. The company I work at is big into AI and everyone is vibecoding things left and right. They are having me build an Argus (the real estate valuation suite) replacement and I have been given a deadline of 2 weeks to repl
🔴 google/skills — Agent Skills for Google products and technologies — score 77
Sources: github_trending
Agent Skills for Google products and technologies
Infrastructure & Compute
🔴 Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server — score 97
Sources: reddit/r/LocalLLaMA
Just saw Xiaomi MiMo announce MiMo-V2.5-Pro UltraSpeed, claiming they broke the 1,000 tokens/sec output barrier on a 1 trillion parameter MoE model. According to them, they’re doing it on a single standard 8-GPU node, not custom wafer-scale hardware like Cerebras and not SRAM-heavy hardw
Research Papers
🔴 SwiftVR: Real-Time One-Step Generative Video Restoration — score 80
Sources: huggingface
Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions an
Other Signals
🔴 STOP racist posts about Chinese researchers [D] — score 94
Sources: reddit/r/MachineLearning
Edit: the original post targeting Chinese researchers is removed by the mods. Sorry for any confusion. Yes, I'm calling it out. It IS racism. As an active member of r/MachineLearning and a researcher who is ethnic Chinese, I am DISGUSTED by unfounded accusations against the group of researchers who
🔴 When every other post is an AI generated benchmark report, a question about the best model, or a slop-coded application or engine that pretends to be groundbreaking — score 90
Sources: reddit/r/LocalLLaMA
🔴 Gemma 4 Chat Template now has preserve thinking — score 83
Sources: reddit/r/LocalLLaMA
🔴 Should ArXiv backtrack endorsement? [D] — score 81
Sources: reddit/r/MachineLearning
ArXiv has an endorsement system for a reason. I would only endorse someone with whom I have a direct academic collaboration or mentorship, since I'm putting my own academic reputation at stake. This is also the standard of almost every serious academic researcher I am aware of. Now ArXiv is mak
🔴 xAI is looking more like a datacentre REIT than a frontier lab — score 70
Sources: hackernews
🟡 Notable
Model Releases
🟡 Quick note on the QAT of recent — score 57
Sources: reddit/r/LocalLLaMA
tldr: Google's quant is broken; use unsloth UD Q4_K_XL for now. This might be a low-quality post, but oh well, we ball. llama-quantize will quant the token embed to q6k when Google really was supposed to use "--pure", but that's only the first problem. The llama-quantize quant function is hardcoded to -7 w
🟡 2X tk/s (from 19.4 -> 38.1 tk/s on 1 x MI50) Playing with a hypothesis like speculative decoding.. but instead of an additional side model, exploiting that I can run multiple computations side-by-side AS IF I had Qwen3.6-27B loaded twice in memory - small quants don't use all the available compute. — score 50
Sources: reddit/r/LocalLLaMA
MODS: if you wanna remove for slop, that's cool - once I have something like a llama.cpp patch I'll repost something people can use. I'll write a full article on my Medium account with how it works. Just got excited and wanted to share. *** Forgive the claude summary, in the readme, but the base
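The headline "2X" can be checked against the two throughput figures in the title. This is a minimal sketch using only those two numbers; the variable names are illustrative.

```python
# Throughput figures from the post title, in tokens per second on 1 x MI50.
baseline_tps = 19.4
new_tps = 38.1

speedup = new_tps / baseline_tps          # ratio behind the "2X" claim
ms_per_token_before = 1000 / baseline_tps  # per-token latency before
ms_per_token_after = 1000 / new_tps        # per-token latency after

print(f"speedup: {speedup:.2f}x")
print(f"latency: {ms_per_token_before:.1f} ms -> {ms_per_token_after:.1f} ms per token")
```

The ratio works out to about 1.96x, so "2X" is a fair rounding of the reported numbers.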
🟡 Be very FR - would someone (or you) pay a virtual AI assistant? should I build this? It can read emails, reschedule your whole week, manage your calls, and improve as it is used more. — score 50
Sources: reddit/r/AIAgents
Did not find a pinned weekly post for feedback, but do let me know if this post needs to be removed. I've been experimenting with many, many AI agents since last year, and while I love my Claude subscription, it runs out pretty quickly. So I started tinkering with self-hosted solutions and I now
Developer Tools
🟡 My team's AI usage got so expensive they quietly rolled back the mandate — score 69
Sources: reddit/r/AIAgents
Our engineering leadership went all in on AI about three months ago. Every ticket, every PR review, every design doc had to go through their shiny new enterprise copilot setup. They even started tracking adoption metrics in standups. So we used it. For everything. Pasting entire codebases into conte
danny-avila/LibreChat: Enhanced ChatGPT Clone: Features Agents, MCP, Skills, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User A
🟡 Apple Core AI Framework — score 50
Sources: hackernews
🟡 Confidential submission of draft S-1 to the SEC — score 50
Sources: lab_blog/OpenAI
OpenAI confirms a confidential S-1 submission to the SEC and has not yet determined timing for further action.
🟡 @AnthropicAI: New Science Blog: Why has AI advanced faster in coding than in biology? To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffi — score 50
Sources: twitter_rss
New Science Blog: Why has AI advanced faster in coding than in biology? To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic. How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology
Omitted 5 additional developer tools items from the main section; see raw data and source-specific sections below.
Research Papers
🟡 Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short — score 62
Sources: huggingface · arxiv/cs.AI
Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given promp
🟡 Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops — score 48
Sources: huggingface · arxiv/cs.AI
Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both le
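The quoted share can be reproduced from the two counts in the abstract. A small sketch using only those counts; the variable names are illustrative.

```python
# Counts quoted in the abstract.
total_tasks = 1968
hackable = 323

share = hackable / total_tasks        # fraction of audited tasks found hackable
not_flagged = total_tasks - hackable  # tasks not found hackable in the audit

print(f"hackable share: {share:.1%}")
print(f"tasks not flagged: {not_flagged}")
```

The share comes to about 16.4%, consistent with the rounded 16% in the abstract.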
🟡 Chiaroscuro Attention: Spending Compute in the Dark — score 48
Sources: huggingface · arxiv/cs.AI
Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators - DCT spectral mixin
🟡 Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text — score 48
Sources: huggingface · arxiv/cs.AI
Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both tex
🟡 PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment — score 48
Sources: huggingface · arxiv/cs.CL
Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is es
Other Signals
🟡 Was BitNet a dead end? What happened to ternary LLMs? — score 63
Sources: reddit/r/LocalLLaMA
They seemed so promising at one point but the biggest ternary model is still 2B. What happened? Why aren't the frontier open weights AI labs attempting to use them?
🟢 Incremental
Model Releases
🟢 ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul for Q4/Q5/Q8 and k-quants by yomaytk · Pull Request #24225 · ggml-org/llama.cpp — score 30
Sources: reddit/r/LocalLLaMA
This PR improves matmul performance for k-quants. The following table shows the improvement on the pp512 test on an M2 Pro. |quant|model|master (t/s)|PR (t/s)|speedup| |:-|:-|:-|:-|:-| |Q2_K|qwen3 0.6B Q2_K - Med
🟢 Best agentic workflows for finance research, prospecting, and team productivity? — score 25
Sources: reddit/r/AIAgents
I’m currently an intern at a small investment bank / securities firm, mostly around sales and trading, but the internship is pretty broad. I work with traders, researchers, market makers, and generally help wherever I can. One thing I’ve noticed is that there does not seem to be anyone in the office
Developer Tools
🟢 Université Paris Saclay or TU Delft for Applied Mathematics Masters [R] — score 38
Sources: reddit/r/MachineLearning
I've been admitted into both UPS and TUD for Applied Mathematics, and I wanted to hear some advice on which one would be better. For context, I'd like to work in some form of AI research, most likely within industry. At the moment, I'm most interested in privacy preserving machine learning or mechan
🟢 777genius/agent-teams-ai — You're the boss, agents are your team. They handle tasks on their own, message each other, and review each other's work. You just watch the kanban board and give high-level commands. Codex/Claude/OpenCode(200+ models, 75+ LLM providers, free models no auth). Build your AI company with multiple teams. — score 37
Sources: github_trending
You're the boss, agents are your team. They handle tasks on their own, message each other, and review each other's work. You just watch the kanban board and give high-level commands. Codex/Claude/OpenCode(200+ models, 75+ LLM providers, free models no auth). Build your AI company with multiple teams
🟢 xerrors/Yuxi — 结合知识库、知识图谱管理的 多租户 Agent Harness 平台。 An agent harness that integrates a LightRAG knowledge base and knowledge graphs. Build with LangChain + Vue + FastAPI, support DeepAgents、MinerU PDF、Neo4j 、MCP. — score 28
Sources: github_trending
结合知识库、知识图谱管理的 多租户 Agent Harness 平台。 An agent harness that integrates a LightRAG knowledge base and knowledge graphs. Build with LangChain + Vue + FastAPI, support DeepAgents、MinerU PDF、Neo4j 、MCP.
🟢 I want to create an agent that sends me an email every morning? — score 25
Sources: reddit/r/AIAgents
How do I do this? It will gather data and send me an email every morning, what’s the free way to do this? Nothing complex
🟢 marin-community/marin — Open-source framework for the research and development of foundation models. — score 21
Sources: github_trending
Open-source framework for the research and development of foundation models.
Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.
Research Papers
🟢 Text-to-Image Models Need Less from Text Encoders Than You Think — score 25
Sources: huggingface
Text-to-image models rely on text prompts as their primary interface to human intent. Prompts are encoded by a text encoder into embeddings that condition the image generation process. Beyond individual token meanings, text embeddings encode contextual information across the full prompt, such as com
Other Signals
🟢 Jetbrains Mellum 2: a really good and performant model — score 37
Sources: reddit/r/LocalLLaMA
Oh hey folks, I took the Mellum 2 model for a spin, so I wanted to share my impressions here. Disclaimer: the tests presented here are not scientific, nor do they have nice names like perplexity, etc. These tests are somewhat more akin to what I'm working on daily or how useful a model is help
🟢 Screen recording instead of prompting to pass more context fast — score 30
Sources: reddit/r/AIAgents
I had a task where I wanted to: * connect to a vpn * ssh into a server port forward * then open a grafana dashboard, capture some stat * capture the detail in the right place in an excel and send a message on slack All the tasks that I want to automate required a lot of prompting with minute details
🟢 Microsoft's open source tools were hacked to steal passwords of AI developers — score 30
Sources: hackernews
🟢 Anyone seen benchmarks comparing Gemma 4 4-bit QAT vs. 8-bit standard quants? — score 23
Sources: reddit/r/LocalLLaMA
I'm trying to find out if anyone has done any benchmarking comparing the Gemma 4 4-bit QAT models (via Unsloth) against standard 8-bit non-QAT quants. I know QAT is supposed to retain a ton of accuracy compared to the baseline BF16, but I'm curious how a 4-bit QAT model actually fares against a trad
🟢 Gemma 4 26B A4B IT QAT Comparison — score 17
Sources: reddit/r/LocalLLaMA
Hopefully this isn't too low effort of a post. I just finished the benchmarks and I figured I'd post them online because they certainly were insightful for me. I did not use any AI other than asking Gemini 3.1 Pro if it was statistically significant because I was too tired to do inferential statisti
Omitted 3 additional other signals items from the main section; see raw data and source-specific sections below.
📈 Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| google/skills | Agent Skills for Google products and technologies | 461 | python |
| danny-avila/LibreChat | Enhanced ChatGPT Clone: Features Agents, MCP, Skills, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active | 141 | typescript |
| langchain-ai/deepagents | The batteries-included agent harness. | 96 | python |
| alistaitsacle/free-llm-api-keys | Free LLM API keys for GPT-5.5, Claude, DeepSeek, Gemini, Grok — copy, paste, use. Updated 3-5x daily. No credit card needed. | 76 | python |
| magenta/magenta-realtime | Magenta RealTime 2: An Open-Weights Live Music Model | 74 | python |
| 777genius/agent-teams-ai | You're the boss, agents are your team. They handle tasks on their own, message each other, and review each other's work. You just watch the kanban board and give high-level commands. Codex/Claude/OpenCode(200+ models, 75+ LLM providers, free models no auth). Build your AI company with multiple teams. | 60 | typescript |
| xerrors/Yuxi | 结合知识库、知识图谱管理的 多租户 Agent Harness 平台。 An agent harness that integrates a LightRAG knowledge base and knowledge graphs. Build with LangChain + Vue + FastAPI, support DeepAgents、MinerU PDF、Neo4j 、MCP. | 39 | python |
| marin-community/marin | Open-source framework for the research and development of foundation models. | 33 | python |
📄 New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| SwiftVR: Real-Time One-Step Generative Video Restoration | research_paper | 9 | Open |
| Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short | research_paper | 5 | Open |
| PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow | cs.AI | 0 | Open |
| OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs | cs.AI | 0 | Open |
| Syll: Open-Source Personal Automation with Cross-Surface Execution | cs.AI | 0 | Open |
| A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline | cs.AI | 0 | Open |
| Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning | cs.AI | 0 | Open |
| Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model | cs.AI | 0 | Open |
| Some hypotheses on how chatbots work in problem-solving-driven conversations. Large Language Models as confirmation of the Innovation Illusion | cs.AI | 0 | Open |
| Land cover and flood type govern the detection limits of satellite-based flood mapping across diverse global flood events | cs.AI | 0 | Open |
| Reconstructing and forecasting disease trajectories of patients with Alzheimer's disease using routine data in resource-constrained settings | cs.AI | 0 | Open |
| Improving Multimodal Reasoning via Worst Dimension Optimization | cs.AI | 0 | Open |
| Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems | cs.AI | 0 | Open |
| Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models | cs.AI | 0 | Open |
| Scaling Participation in Modular AI Systems | cs.AI | 0 | Open |
🏢 Lab Blog Posts
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| AnthropicAI | New Science Blog: Why has AI advanced faster in coding than in biology? To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic. How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology |
| swyx | It's finally out!!! @METR_Evals found that more than half of SWEBench results is unmergeable slop. FrontierCode represents over 1000+ hours of maintainer validated software engineering work most frontier models cannot yet solve, much less solve with high quality. Cog had IOI Gold medalists and top c |
Repeated From Recent Briefings
- mvanhorn/last30days-skill — AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary - first seen 2026-06-05
- LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents - first seen 2026-06-05
- yikart/AiToEarn — Let's use AI to Earn! - first seen 2026-05-11
- aaif-goose/goose — an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM - first seen 2026-05-07
- Panniantong/Agent-Reach — Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees. - first seen 2026-06-06
- TauricResearch/TradingAgents — TradingAgents: Multi-Agents LLM Financial Trading Framework - first seen 2026-05-02
- Crosstalk-Solutions/project-nomad — Project N.O.M.A.D, is a self-contained, offline survival computer packed with critical tools, knowledge, and AI to keep you informed and empowered—anytime, anywhere. - first seen 2026-05-08
- SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating - first seen 2026-06-08
- Imbad0202/academic-research-skills — Academic Research Skills for Claude Code: research → write → review → revise → finalize - first seen 2026-05-13
- heygen-com/hyperframes — Write HTML. Render video. Built for agents. - first seen 2026-05-10
- ... plus 194 more repeated items in processed data