🔴 High Significance
Model Releases
🔴 Claude Opus 4.8 — score 92
Sources: hackernews
Developer Tools
🔴 I spent three months researching AI phone control, and trust seems more important than features — score 94
Sources: reddit/r/AIAgents
Not long ago, I wanted to find an AI agent that could control a phone, so I went on Reddit to see what people were actually using these tools for. What struck me most was how specific and everyday the use cases were. A lot of the things people want AI to handle are repetitive phone tasks, but they a
🔴 Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild — score 89
Sources: reddit/r/LocalLLaMA
Been following the infrastructure side of AI more lately and stumbled on this from Zai. They upgraded the network architecture on a thousand-GPU cluster running GLM-5.1 coding inference from the standard ROFT setup to something they built called ZCube, developed with Tsinghua University and HarnetsA
🔴 run-llama/liteparse — A fast, helpful, and open-source document parser — score 87
Sources: github_trending
🔴 Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue — score 75
Sources: hackernews
🔴 How do companies protect proprietary prompts from contractors and consulting engineers? — score 72
Sources: reddit/r/AIAgents
Prompts are a core part of the IP for my client. We’re speeding up development by bringing in 2–3 external contract engineers, but we don’t want to fully expose the underlying prompts/workflows to them. Are there any tools, gateways, or architectures people are using to partially protect prompts fro
Business & Funding
🔴 A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P] — score 81
Sources: reddit/r/MachineLearning
Hello everyone. The new dataset is named MONET; it is Apache 2.0-licensed and available on HF: https://huggingface.co/datasets/jasperai/monet MONET is an open, Apache 2.0-licensed image–text dataset. It was built from 2.9 billion images and refined to 104.9 milli
Research Papers
🔴 UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering — score 95
Sources: huggingface
Activation-based control steers large language models (LLMs) by intervening on their internal representations during inference, and has emerged as an effective paradigm for controlling behaviors such as persona and style. However, existing methods often rely on fixed steering directions or task-spec
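For context, the "fixed steering direction" baseline that the abstract contrasts against can be sketched in a few lines: take the difference of mean activations between two sets of prompts, then add a scaled copy of it to a hidden state at inference time. This is a minimal sketch of that generic baseline on toy vectors, not UniSteer's flow-matching method; the function names and the `alpha` scale are illustrative assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_direction(pos: Sequence[Vector], neg: Sequence[Vector]) -> Vector:
    """Fixed direction: mean activation on 'positive' prompts minus 'negative' ones."""
    mp, mn = mean(pos), mean(neg)
    return [a - b for a, b in zip(mp, mn)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift one hidden state along the direction: h' = h + alpha * v."""
    return [h + alpha * v for h, v in zip(hidden, direction)]
```

Because `direction` is computed once and reused for every input, the same shift is applied whatever the prompt asks for, which is the rigidity the abstract points to.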
🔴 When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems — score 78
Sources: huggingface · arxiv/cs.AI
The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substantially higher cost, and more cost-efficient small language models (SLMs), which are amenable to on-de
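One common way to combine the two extremes is a confidence-gated cascade: run the cheap on-device model first and escalate to the cloud model only when its confidence is low. The sketch below is a generic illustration under that assumption, not the routing scheme studied in the paper; `device_model`, `cloud_model`, and the 0.7 threshold are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Result:
    answer: str
    confidence: float  # assumed to be a calibrated score in [0, 1]


def cascade(
    task: str,
    device_model: Callable[[str], Result],
    cloud_model: Callable[[str], Result],
    threshold: float = 0.7,
) -> Tuple[str, str]:
    """Try the on-device SLM first; fall back to the cloud LLM if it is unsure.

    Returns (answer, where_it_ran).
    """
    local = device_model(task)
    if local.confidence >= threshold:
        return local.answer, "device"
    # Low confidence: pay for the more capable (and more expensive) cloud call.
    remote = cloud_model(task)
    return remote.answer, "cloud"
```

The design hinges on the confidence signal being trustworthy: a miscalibrated device model escalates either too often (cost) or too rarely (quality).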
🔴 RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains — score 72
Sources: huggingface · arxiv/cs.LG
Pointwise reward modeling offers critical signals for LLM post-training, yet struggles with absolute scoring in subjective, non-verifiable settings. Rubric-based methods address this by decomposing evaluation into explicit criteria, but existing approaches typically depend on frontier LLMs and suffe
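To make "decomposing evaluation into explicit criteria" concrete, a pointwise rubric reward can be as simple as a weighted average of per-criterion scores. This is a generic sketch, not RUBRIC-ARROW's alternating scheme; the criterion names and weights are hypothetical, and in practice the per-criterion scores would come from a judge model.

```python
from typing import Dict


def rubric_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1].

    Criteria missing from `scores` count as 0, so unaddressed criteria are
    penalized rather than silently ignored.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


# Hypothetical rubric: accuracy matters twice as much as clarity.
print(rubric_reward({"accuracy": 1.0, "clarity": 0.5}, {"accuracy": 2.0, "clarity": 1.0}))
```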
Other Signals
🔴 I've just benchmarked myself: — score 96
Sources: reddit/r/LocalLLaMA
🔴 StepFun 3.7 Flash — score 82
Sources: reddit/r/LocalLLaMA
StepFun dropped Step 3.7 Flash, a 196B total / 11B active MoE that runs locally on 128GB RAM. It's a multimodal MoE (196B total params, only 11B active) with a built-in 1.8B ViT for vision. Benchmark highlights vs. other flash-tier models: - SWE-Bench Pro: 56.26% (beats DeepSeek V4 Flash at 55.6%, matches
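The "196B total / 11B active" split explains why this fits on a 128GB machine yet still runs quickly: every expert's weights must sit in memory, but only the active parameters are touched per token. A back-of-the-envelope sketch, assuming roughly 4.5 bits per weight for a Q4_K-style quantization (actual GGUF sizes vary with the per-tensor mix) and ignoring the KV cache and runtime overhead:

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 196e9   # all experts must be resident in memory
ACTIVE_PARAMS = 11e9   # parameters actually used for each token
BITS_PER_WEIGHT = 4.5  # assumption: rough figure for a 4-bit k-quant

print(f"weights: ~{weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT):.0f} GB")  # weights: ~110 GB
print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")       # 5.6%
```

About 110 GB of weights leaves little headroom on a 128GB machine once the KV cache and OS are counted, which is consistent with the ~120+ GB peak reported in the M5 Max benchmark post.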
🔴 HF models page now has a "Base only" toggle to filter out finetunes/quants/etc — score 75
Sources: reddit/r/LocalLLaMA
a feature that was requested a lot: https://huggingface.co/models?base_model_relation=base
🟡 Notable
Model Releases
🟡 @AnthropicAI: We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and e — score 50
Sources: twitter_rss
We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and expand our capacity to meet growing demand for Claude.
🟡 @xai: Grok Build 0.2.7 is now out, with /usage, /login, shared terminals across subagents, and improved image understanding See all updates at https://x.ai/build/changelog — score 50
Sources: twitter_rss
🟡 @MistralAI: We're taking on the hardest problems in the real world 🏗️🚚 🛫⚛️ Today at The AI Now Summit, held at the Louvre, we announced AI solutions for aerospace, automotive, energy, and physics. Deployed in p — score 50
Sources: twitter_rss
We're taking on the hardest problems in the real world 🏗️🚚 🛫⚛️ Today at The AI Now Summit, held at the Louvre, we announced AI solutions for aerospace, automotive, energy, and physics. Deployed in production at @Airbus, @BMW, @EDFofficiel, and more. More below:
🟡 Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation [D] — score 44
Sources: reddit/r/MachineLearning
Wall-OSS-0.5 is a new 4B VLA release from X Square Robot, built on a 3B VLM backbone with action experts in a Mixture-of-Transformers layout. What caught my eye is that the report evaluates the pretrained checkpoint on real robots before task-specific fine-tuning, instead of only reporting downstrea
🟡 Claude Code – Everything You Can Configure That the Docs Don't Tell You — score 42
Sources: hackernews
Developer Tools
🟡 Making LLMs tell you how confident they really are through probe-targeted fine-tuning [R] — score 69
Sources: reddit/r/MachineLearning
Just wanted to share my research on probe-targeted fine-tuning (LoRA) for verbal confidence calibration. If you probe the hidden states of an instruct-tuned LLM, it can tell correct from incorrect answers at 0.76–0.88 AUROC. But when you ask it directly, it tends to respond with confidence at
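The 0.76–0.88 AUROC figure can be read as: pick one correct and one incorrect answer at random, and the probe's score ranks the correct one higher 76–88% of the time. A minimal sketch of that pairwise definition (ties count as half), assuming the probe has already produced one score per answer; the function name is illustrative:

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive (label 1) outscores a random negative (label 0).

    Uses the O(P * N) pairwise definition, which equals the area under the
    ROC curve; ties contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Probe scores for answers labeled correct (1) or incorrect (0).
print(auroc([0.9, 0.2, 0.4, 0.1], [1, 1, 0, 0]))  # 0.75
```

The post's point is that this ranking ability exists internally even when the model's verbalized confidence does not reflect it.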
🟡 anthropics/claude-code — Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands. — score 62
Sources: github_trending
🟡 The hidden tax of web search: 80% of my agent’s tokens are wasted on garbage — score 50
Sources: reddit/r/AIAgents
Spent the weekend digging into the unit economics of my research agent and honestly got a little horrified. Turns out most of the token usage isn’t going into actual reasoning, it’s getting burned on navigation menus, footers, and cookie banners. We’re feeding smart models gigabytes of boilerplate j
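A crude first line of defense is to drop obvious page chrome before the text ever reaches the model. A minimal sketch using Python's standard-library `html.parser`, assuming that navigation, headers, footers, and scripts live in the usual semantic tags; cookie banners built from generic `<div>` elements would need extra, site-specific rules. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Tags whose contents are assumed to be page chrome rather than article text.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript"}


class MainTextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every boilerplate element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = "<nav>Home | Pricing</nav><main><p>The actual answer.</p></main><footer>Cookie settings</footer>"
print(extract_main_text(page))  # The actual answer.
```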
🟡 The most common AI memory failure isn't a hallucination. It's a stale fact that never got corrected. — score 50
Sources: reddit/r/AIAgents
A user changes a preference, the old fact keeps winning retrieval, and your AI confidently acts on something that stopped being true months ago. No error is thrown, no alert fires, the system just quietly gets it wrong forever. That is not a model problem, that is a memory architecture problem.
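One way to keep a stale fact from winning is to key memories by what they describe rather than by their text, so a newer statement about the same attribute replaces the older one instead of competing with it at retrieval time. A minimal sketch, assuming facts can be normalized into (subject, attribute) keys with a timestamp; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Fact:
    value: str
    timestamp: float  # when the fact was observed, e.g. Unix time


@dataclass
class FactMemory:
    current: Dict[Tuple[str, str], Fact] = field(default_factory=dict)
    history: List[Tuple[str, str, Fact]] = field(default_factory=list)

    def observe(self, subject: str, attribute: str, value: str, timestamp: float) -> None:
        """Record a fact; it supersedes the stored one only if it is at least as new."""
        key = (subject, attribute)
        fact = Fact(value, timestamp)
        self.history.append((subject, attribute, fact))  # keep an audit trail
        existing = self.current.get(key)
        if existing is None or timestamp >= existing.timestamp:
            self.current[key] = fact

    def recall(self, subject: str, attribute: str) -> Optional[str]:
        fact = self.current.get((subject, attribute))
        return fact.value if fact is not None else None
```

The hard part in a real system is the normalization step that maps free-form statements onto the same key; the store itself is simple.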
🟡 How Endava builds an agentic organization with Codex — score 50
Sources: lab_blog/OpenAI
Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.
Omitted 1 additional developer tools item from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟡 LiquidAI/LFM2.5-8B-A1B · Hugging Face — score 61
Sources: reddit/r/LocalLLaMA
looks like you can run it on any potato (A1B)! https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF from LiquidAI: LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-train
Research Papers
🟡 PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions — score 65
Sources: huggingface · arxiv/cs.AI
We address the task of generating physically accurate and visually faithful 4D Human-Object Interaction (HOI). Given a static 3D human and target object represented as 3D Gaussian Splats (3DGS), our goal is to synthesize dynamic scenes where the human actively engages with the object through actions
🟡 Thinking Before Constraining: A Unified Decoding Framework for Large Language Models — score 52
Sources: huggingface · arxiv/cs.AI
Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing cons
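For readers unfamiliar with constrained decoding: at each step the decoder masks out any token that would take the output outside the allowed format, so even a high-scoring invalid token can never be emitted. A toy greedy sketch over a hand-written scorer, assuming the allowed outputs are a small finite set; this illustrates generic prefix-masked decoding, not the paper's framework, and `toy_scores` is hypothetical.

```python
from typing import Callable, Dict, Sequence


def constrained_greedy_decode(
    score_next: Callable[[str], Dict[str, float]],
    allowed_outputs: Sequence[str],
    max_steps: int = 16,
) -> str:
    """Greedy decoding restricted to strings in `allowed_outputs`.

    `score_next(prefix)` returns a score for each candidate next token. Tokens
    that would not keep the output a prefix of some allowed string are masked.
    """
    output = ""
    for _ in range(max_steps):
        if output in allowed_outputs:
            return output
        scores = score_next(output)
        valid = {
            tok: s
            for tok, s in scores.items()
            if any(a.startswith(output + tok) for a in allowed_outputs)
        }
        if not valid:
            raise ValueError(f"no valid continuation after {output!r}")
        output += max(valid, key=valid.get)
    raise ValueError("max_steps reached without a complete allowed output")


# Toy scorer that prefers an off-format answer.
def toy_scores(prefix: str) -> Dict[str, float]:
    return {"maybe": 5.0, "no": 1.0, "yes": 0.5}


print(constrained_greedy_decode(toy_scores, ["yes", "no"]))  # no
```

The trade-off the abstract describes follows directly: the mask guarantees the format but also forbids free-form reasoning tokens, which is why letting the model reason before the constraint applies can help.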
Other Signals
🟡 Beware!! Users trying to fork and steal your projects — score 68
Sources: reddit/r/LocalLLaMA
Context! User u/Worried_Goat_8604 claimed to have made a similar but unrelated project to my SmallCode. He framed it as "I made this before you, but we can collab if you make me co-founder". In reality, he made a low effort fork of MY project 2 day
🟡 Various LLM Smells — score 58
Sources: hackernews
🟡 Liquid AI releases LFM2.5-8B-A1B — score 54
Sources: reddit/r/LocalLLaMA
Liquid AI released LFM2.5-8B-A1B, an edge model designed to power real-life applications. It builds on LFM2-8B-A1B with three major upgrades: an expanded 128K context window, 38T tokens of pre-training (up from 12T), and large-scale reinforcement learning. It also comes with a doubled vocabulary to
🟡 How are people reducing inference costs in multi-step AI agents? — score 53
Sources: reddit/r/AIAgents
I’m on the Tensormesh team, and I’m trying to better understand how people building AI agents are handling inference costs when agents make many calls per task. One pattern we see is that the same context often gets processed repeatedly: - system prompts - tool definitions - retrieved docs - pol
🟡 llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp — score 46
Sources: reddit/r/LocalLLaMA
now you can download more VRAM ;) (by downloading new llama.cpp version)
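The saving is simple arithmetic: storing an attention mask in 16-bit floats takes half the bytes of 32-bit floats. A rough sketch, assuming a dense mask of `n_kv` by `n_tokens` elements; the shape and the example sizes are illustrative assumptions, not figures from the PR.

```python
def mask_bytes(n_kv: int, n_tokens: int, bytes_per_element: int) -> int:
    """Size of a dense n_kv x n_tokens attention mask."""
    return n_kv * n_tokens * bytes_per_element


MIB = 1024 * 1024
N_KV, N_TOKENS = 131_072, 512  # hypothetical: 128K context, 512-token batch

f32 = mask_bytes(N_KV, N_TOKENS, 4)
f16 = mask_bytes(N_KV, N_TOKENS, 2)
print(f"f32 mask: {f32 // MIB} MiB, f16 mask: {f16 // MIB} MiB, saved: {(f32 - f16) // MIB} MiB")
# f32 mask: 256 MiB, f16 mask: 128 MiB, saved: 128 MiB
```

Under this assumed shape, the saving grows linearly with both context length and batch size.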
🟢 Incremental
Model Releases
🟢 llama.cpp B9387 Significant AMD/ROCm PP Update — score 39
Sources: reddit/r/LocalLLaMA
https://github.com/ggml-org/llama.cpp/releases/tag/b9387 MFMA is restricted to AMD's CDNA architecture, i.e. the MI100, MI200, and MI300 series datacenter cards. Post your initial results if you try it!
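Since the MFMA path only helps on CDNA cards, it is worth checking your GPU's LLVM `gfx` target name (as reported by ROCm tools such as `rocminfo`) before expecting a speedup. A small sketch, assuming the commonly cited targets gfx908 (MI100), gfx90a (MI200), and gfx940/gfx941/gfx942 (MI300); consult AMD's documentation for the authoritative list.

```python
from typing import Dict, Optional

# Assumed mapping of LLVM gfx targets to CDNA datacenter GPUs.
CDNA_TARGETS: Dict[str, str] = {
    "gfx908": "MI100 (CDNA)",
    "gfx90a": "MI200 series (CDNA 2)",
    "gfx940": "MI300 series (CDNA 3)",
    "gfx941": "MI300 series (CDNA 3)",
    "gfx942": "MI300 series (CDNA 3)",
}


def cdna_family(target_id: str) -> Optional[str]:
    """Return the CDNA family for a target like 'gfx90a:sramecc+:xnack-', else None.

    Anything after the first ':' is a target feature flag and is ignored.
    """
    processor = target_id.strip().split(":", 1)[0].lower()
    return CDNA_TARGETS.get(processor)


print(cdna_family("gfx90a:sramecc+:xnack-"))  # MI200 series (CDNA 2)
print(cdna_family("gfx1100"))                 # None (RDNA 3 consumer card)
```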
🟢 Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days — score 17
Sources: reddit/r/AIAgents
🟢 Oculus Founders' AI Startup Sesame Launches Human-Like Voice AI App on iOS — score 17
Sources: reddit/r/AIAgents
🟢 Qwen 3.6 27B overdoing it — score 4
Sources: reddit/r/LocalLLaMA
Although I'm very impressed with Qwen3.6 and it is my most-used model, I feel that it is sometimes too proactive and starts doing things I didn't ask for, from creating tests for the last modification to reverting changes I made (e.g. removing a hardcoded value) when it thinks the old code is worth keeping
Developer Tools
🟢 ariadng/metatrader-mcp-server — Model Context Protocol (MCP) to enable AI LLMs to trade using MetaTrader platform — score 36
Sources: github_trending
🟢 The AI agent gold rush is skipping the consumer, and I think that's the actual opportunity — score 30
Sources: reddit/r/AIAgents
Quick disclosure: I'm building a vertical agent in sports, so I'm biased. But there's a gap here I can't stop thinking about and I want to know if this sub sees it too. We spent thirty years building software for a human with a cursor. That's quietly ending. The fastest-growing user of your product
🟢 OpenMOSS/MOSS-TTS
Sources: github_trending
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental soun
🟢 mastra-ai/mastra — From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack. — score 27
Sources: github_trending
🟢 Use HTML as the primary chat language for your agents so they can draw diagrams — score 25
Sources: reddit/r/LocalLLaMA
A week or two ago Thariq published an article on how good AI models were at working with HTML, arguing that there was not really any reason to use markdown anymore. And yet all of our coding agents work with markdown and output markdown and have been trained
Omitted 8 additional developer tools items from the main section; see raw data and source-specific sections below.
Research Papers
🟢 Geometry Matters: 3D Foundation Priors for Learning Semantic Correspondence — score 35
Sources: huggingface
Foundation features from self-supervised vision models and text-to-image diffusion models have proven effective for semantic correspondence estimation. However, because these features are learned primarily from 2D image objectives, they lack explicit 3D awareness and often confuse symmetric object s
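The standard baseline for semantic correspondence with such features is nearest-neighbor matching: each source keypoint's feature is matched to the most cosine-similar target feature. The sketch below, with hypothetical toy feature vectors, shows that generic baseline, not the paper's 3D-prior method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbor_matches(
    source: Sequence[Sequence[float]], target: Sequence[Sequence[float]]
) -> List[int]:
    """For each source feature, the index of the most cosine-similar target feature."""
    return [
        max(range(len(target)), key=lambda j: cosine(s, target[j]))
        for s in source
    ]


print(nearest_neighbor_matches([[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [3.0, 0.0]]))  # [1, 0]
```

Because the argmax keeps only the single best match, two almost-equal candidates (say, the left and right wheels of a car) are resolved by tiny feature differences, which is the symmetric-part confusion the abstract describes and where explicit 3D cues can help.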
🟢 Towards Consistent Video Geometry Estimation — score 10
Sources: huggingface
This work presents ViGeo, a feed-forward foundation model for recovering spatially dense and temporally consistent geometry from video sequences. Built upon a plain transformer architecture without task-specific architectural modifications, ViGeo supports streaming, full-sequence, and long-video inf
Other Signals
🟢 StepFun 3.7 Flash - Speed Benchmark in M5 Max — score 32
Sources: reddit/r/LocalLLaMA
Just ran a benchmark with the day-0 llama.cpp branch. M5 Max, 128 GB, Q4_K_S: memory peaks at around 120+ GB, which makes things sluggish but still usable once cmd+tab landed. Short context (< 16k) feels fast and very responsive. At 32k-64k, speed is not bad, usable. |PP|TG|B|N_KV|T_PP s|S_PP t/
🟢 The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin — score 25
Sources: hackernews
🟢 Orchestrating AI code review at scale — score 8
Sources: hackernews
📈 Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| run-llama/liteparse | A fast, helpful, and open-source document parser | 932 | rust |
| anthropics/claude-code | Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands. | 319 | python |
| apurvsinghgautam/robin | AI-Powered Dark Web OSINT Tool | 91 | python |
| ariadng/metatrader-mcp-server | Model Context Protocol (MCP) to enable AI LLMs to trade using MetaTrader platform | 83 | python |
| OpenMOSS/MOSS-TTS | MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenarios, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS. | 71 | python |
| mastra-ai/mastra | From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack. | 65 | typescript |
| microsoft/RAMPART | A pytest-native safety and security testing framework for agentic AI applications | 62 | python |
| CodeWithCJ/SparkyFitness | SparkyFitness: Built for Families. Powered by AI. Track food, fitness, water, and health — together. | 45 | typescript |
📄 New Papers
| Title | Category | Hotness |
|---|---|---|
| UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering | research_paper | 18 |
| When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems | research_paper | 6 |
| RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains | research_paper | 4 |
| PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions | research_paper | 3 |
| Thinking Before Constraining: A Unified Decoding Framework for Large Language Models | research_paper | 2 |
| Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction | cs.AI | 0 |
| Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction | cs.AI | 0 |
| The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling | cs.AI | 0 |
| Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems | cs.AI | 0 |
| Review Arcade: On the Human Alignment and Gameability of LLM Reviews | cs.AI | 0 |
| Orthogonal Concept Erasure for Diffusion Models | cs.AI | 0 |
| Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes | cs.AI | 0 |
| VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis | cs.AI | 0 |
| BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation | cs.AI | 0 |
| Adopt ≠ Adapt: Longitudinal Analyses of LLM Conversations in the Wild | cs.AI | 0 |
🏢 Lab Blog Posts
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| AnthropicAI | We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and expand our capacity to meet growing demand for Claude. |
| xai | Grok Build 0.2.7 is now out, with /usage, /login, shared terminals across subagents, and improved image understanding See all updates at https://x.ai/build/changelog |
| MistralAI | We're taking on the hardest problems in the real world 🏗️🚚 🛫⚛️ Today at The AI Now Summit, held at the Louvre, we announced AI solutions for aerospace, automotive, energy, and physics. Deployed in production at @Airbus, @BMW, @EDFofficiel, and more. More below: |
Repeated From Recent Briefings
- harry0703/MoneyPrinterTurbo — Generate HD short videos with one click using AI LLMs. - first seen 2026-05-28
- Lum1104/Understand-Anything — Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
- AI-generated CUDA kernels silently break training and inference [R] - first seen 2026-05-28
- NousResearch/hermes-agent — The agent that grows with you - first seen 2026-05-11
- farion1231/cc-switch — A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io - first seen 2026-05-08
- mukul975/Anthropic-Cybersecurity-Skills — 754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0 - first seen 2026-05-24
- anthropics/skills — Public repository for Agent Skills - first seen 2026-05-11
- earendil-works/pi — AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods - first seen 2026-05-09
- twentyhq/twenty — The open alternative to Salesforce, designed for AI. - first seen 2026-05-25
- anthropics/financial-services - first seen 2026-05-07
- ... plus 450 more repeated items in processed data