🔴 High Significance
Model Releases
🔴 Launch HN: TesterArmy (YC P26) - Agents that test web and mobile apps - score 83
Sources: hackernews
🔴 GLM-5.2 is above GPT-5.5 in AA-Briefcase, Artificial Analysis' new agentic knowledge work eval - score 75
Sources: reddit/r/LocalLLaMA
Developer Tools
🔴 Kilo-Org/kilocode - Kilo is the all-in-one agentic engineering platform. Build, ship, and iterate faster with the most popular open source coding agent. - score 99
Sources: github_trending
🔴 GLM-5.2 inference is free on Hugging Face for the next 6 hours - score 89
Sources: reddit/r/LocalLLaMA
Doc: https://huggingface.co/docs/inference-providers/index. A cool prompt to try first: https://huggingface.co/chat/r/aFATtCW?leafId=ed28d5b0-d99b-40be-ba8b-315b1f450e5a
🔴 It has been a while since I wrote - score 78
Sources: reddit/r/AIAgents
And it is important that rules are followed to the letter, or else this will be censored, so therefore I shall obey the rules, I swear on my life and my love for it, that I shall not break the rules of this Sub-Reddit - A Man - That is the truth so help me humanity r/aiagents truly r
🔴 browser agents work in demo and then die on auth, sessions, captcha, dom drift... what are ppl doing? - score 78
Sources: reddit/r/AIAgents
The demo-to-production cliff for browser agents is brutal and nobody warns you. Demo: agent navigates to the site, clicks around, extracts data, looks magical. Production, in order of how they killed us: - auth: login flows with MFA, SSO redirects, "verify it's you" emails. agent just stops. - *
Research Papers
🔴 S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence - score 85
Sources: huggingface
Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introduce S-Agent, a spatial tool-use agentic paradigm for underst
🔴 Playful Agentic Robot Learning - score 82
Sources: huggingface · arxiv/cs.AI
Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied c
🔴 DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects - score 75
Sources: huggingface
Dexterous interaction with articulated objects is important for household, assistive, and humanoid manipulation, where multi-finger hands can provide compliant contact patterns beyond parallel-jaw grasping. However, articulated-object manipulation differs from static-object manipulation: the target
Other Signals
🔴 GLM's founder says GLM-fable before the end of the year?! - score 96
Sources: reddit/r/LocalLLaMA
🟡 Notable
Model Releases
🟡 @xai: Grok models are now available on Databricks Agent Bricks. Bring SpaceXAI's latest models to your enterprise data to power capable AI agents. https://x.ai/news/grok-databricks - score 60
Sources: twitter_rss
🟡 New usage analytics and updated spend controls for enterprises - score 50
Sources: lab_blog/OpenAI
OpenAI introduces new spend controls and usage analytics for ChatGPT Enterprise, helping organizations manage costs and scale AI with confidence.
🟡 Improving health intelligence in ChatGPT - score 50
Sources: lab_blog/OpenAI
Learn how GPT-5.5 Instant improves ChatGPT's health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations.
🟡 @AnthropicAI: New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (T - score 50
Sources: twitter_rss
New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (The robodog, alas, still failed to fetch a beach ball.) https://www.anthropic.com/research/project-fet
🟡 Giving GLM-5.2 a spin locally on CPU only! (poor man's rig for big models) - score 46
Sources: reddit/r/LocalLLaMA
This is the UD-Q2-K_XL quant. Hardware is: Model: Dell PowerEdge R740 CPU: Dual Xeon 6248R (24 cores each) RAM: 768 GB (All memory channels populated) I'm using ik_llama.cpp which provides some significant performance improvements over the base llama.cpp for CPU-only inference. Unfortunately, we d
Developer Tools
🟡 K-Dense-AI/scientific-agent-skills - Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science, used by 160,000+ scientists worldwide. 140 ready-to-use skills plus 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Pi, Antigravity, and the open Agent Skills standard. - score 69
Sources: github_trending
🟡 BuilderIO/agent-native - A framework for building agent-native applications. - score 67
Sources: github_trending
🟡 garrytan/gbrain - Garry's Opinionated OpenClaw/Hermes Agent Brain - score 64
Sources: github_trending
🟡 Is Your Agent a Liar? How to Tell and How to Overcome it - score 56
Sources: reddit/r/AIAgents
How to Tell If Your AI Agent Is Lying, Hallucinating, or Building Vaporware. I have spent a lot of time working with autonomous AI agents. After running them on real projects, I have learned one expensive lesson: most of them are very good at lying. They sound confident. They produce clean-looking
🟡 poolside/Laguna-M.1 · Hugging Face - 225B-A23B - score 54
Sources: reddit/r/LocalLLaMA
Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token, designed for agentic coding and long-horizon work. Highlights: Large sparse MoE for agentic coding: Laguna M.1 is a 70-layer MoE transformer with 225B total parameters and 23B ac
Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟡 Fearless Concurrency on the GPU: Safe GPU inference in Rust, competitive with vLLM/SGLang [R] - score 69
Sources: reddit/r/MachineLearning
I maintain cuTile Rust and just posted the paper "Fearless Concurrency on the GPU." As more GPU code gets AI-generated, the bottleneck moves from writing it to trusting it. cuTile Rust lets you write or generate GPU kernels whose memory safety and data-race freedom are verified by the compiler, thro
🟡 Lightricks/LTX-2 - Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. - score 43
Sources: github_trending
Research Papers
🟡 JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising - score 65
Sources: huggingface
Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geome
🟡 FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows - score 55
Sources: huggingface
Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator--the depth predictor defining the constraint-
🟡 DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis - score 55
Sources: huggingface · arxiv/cs.AI
Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a
Other Signals
🟡 GLM-5.2 Is The Best Open Weight Creative Writing Model - score 68
Sources: reddit/r/LocalLLaMA
As per Sam Paech's Creative Writing Benchmark on EQ Bench: https://eqbench.com/creative_writing.html
🟡 OSS models decisively overtook Proprietary models in market share (based on the last 3 months of OpenRouter data) - score 61
Sources: reddit/r/LocalLLaMA
🟡 The Korean telecom giant at the center of Anthropic's Mythos controversy - score 50
Sources: hackernews
🟢 Incremental
Model Releases
🟢 Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter - score 25
Sources: reddit/r/LocalLLaMA
Hey! We heard the feedback on making the model more portable and accessible. So in light of that we have 2 updates to share. First, you can pull a new 4-bit quant straight from Hugging Face, so it's now small enough to run on a Mac or wh
🟢 [NEW MODEL] SupraLabs just released SupraVL-Nano-900k, a Vision-Language Model built entirely from scratch! - score 4
Sources: reddit/r/LocalLLaMA
Hey r/LocalLLaMA! We just released SupraVL-Nano-900k, our first VLM. It has ~900k parameters, was trained from scratch on Flickr8k, and the entire architecture fits in a single Jupyter notebook. This is not a production model, it's a fully transparent, readable blueprint for anyone who wants to
Developer Tools
🟢 Looking for 3–4 people with running AI agents to test a multi-agent collaboration platform ($20/hour) - score 39
Sources: reddit/r/AIAgents
Hey everyone, I'm looking for 3–4 people who already have AI agents running and are willing to help test a multi-agent collaboration platform I've built. The platform allows agents to connect with other agents in a controlled/supervised environment. An agent can create a session, invite “friend” age
🟢 labring/FastGPT - FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without the need for extensive setup or configuration. - score 37
Sources: github_trending
🟢 livekit/agents - A framework for building realtime voice AI agents 🤖🎙️📹 - score 26
Sources: github_trending
🟢 Google DeepMind unveils plan to protect itself from its own rogue AI agents - score 17
Sources: reddit/r/AIAgents
🟢 Sharing my DIY framework that gives AI coding agents eyes - they can finally see the UI they build (open source) - score 8
Sources: reddit/r/AIAgents
I kept hitting the same wall with coding agents: they're blind. An agent writes a web page, a chart, an SVG, a PDF… and never actually sees the result. It reasons from source code and terminal output, then confidently says "done" while the button overflows, the text fails contrast, the chart
Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🟢 GLM-5.2 (744B, 2-bit) at 7.3 tok/s on 4×3090 + 192GB - and why IQ1_M wasn't any faster - score 18
Sources: reddit/r/LocalLLaMA
TLDR: For the first time, I feel relief that they could shut down the cloud services and I would be ok. I got my 4th 3090 and then unsloth dropped the Q2 and Q1. I wrote nothing else here; it's from CC, so it might be wrong. GLM-5.2 UD-IQ2_M runs across 4×3090 + RAM expert offload at ~7.3 tok/s. Two
Business & Funding
🟢 LQ50/LQ50-24GB cost around $1200 - score 39
Sources: reddit/r/LocalLLaMA
Well found this shit on TAOBAO very expensive
Enterprise Adoption
🟢 Voice debugging at the conversation level seems far more useful than isolated benchmark metrics [D] - score 19
Sources: reddit/r/MachineLearning
I have been thinking a lot about how poorly isolated benchmark metrics capture real conversational system quality once models are deployed into multi-turn environments. You can have strong STT scores, decent latency, high task completion rates, and still end up with conversations that humans perceiv
Research Papers
🟢 HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining - score 25
Sources: huggingface
Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, yet their scalabilit
🟢 No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages - score 5
Sources: huggingface
Large Language Models (LLMs) have significantly advanced the automation of software engineering tasks. One prominent example is code generation, where an LLM produces code in a specified programming language based on a natural language description. Most research in this area has focused on high-reso
Other Signals
🟢 Researchers trained a Deep Research agent with 32 H100s and open-sourced everything - score 32
Sources: reddit/r/LocalLLaMA
Ohio State University's NLP team released QUEST-35B, an open-source Deep Research agent trained using ~32 H100s and ~8K synthetic samples. The team open-sourced the training recipe, code, weights and datasets. Benchmark results show competitive performance against several frontier Deep Research
🟢 Latent space interpretation [R] - score 31
Sources: reddit/r/MachineLearning
Hi all, I have trained a convolutional autoencoder on a set of medical images. I then classified the latent feature maps using a random forest to find the top-scoring feature map. Now my goal is to understand which input image is captured in the top-scoring latent feature map. Any suggestions? I have tried e
🟢 Zen and the Art of Machine Learning Research - score 17
Sources: hackernews
🟢 GLM-5.2 can now run locally in llama.cpp and Unsloth Studio. - score 11
Sources: reddit/r/LocalLLaMA
The 2-bit model retains ~82% accuracy after we shrunk it from 1.51TB to 238GB (-84% size). Run on a 256GB Mac or RAM/VRAM setups. GLM-5.2 is the strongest open model to date. Check the graph for the accuracy of each GLM-5.2-GGUF quantization. Full guide: https://unsloth.ai/docs/models/glm-5.2 GGUF:
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| Kilo-Org/kilocode | Kilo is the all-in-one agentic engineering platform. Build, ship, and iterate faster with the most popular open source coding agent. | 1345 | typescript |
| K-Dense-AI/scientific-agent-skills | Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science, used by 160,000+ scientists worldwide. 140 ready-to-use skills plus 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Pi, Antigravity, and the open Agent Skills standard. | 174 | python |
| BuilderIO/agent-native | A framework for building agent-native applications. | 172 | typescript |
| garrytan/gbrain | Garry's Opinionated OpenClaw/Hermes Agent Brain | 167 | typescript |
| microsoft/qlib | Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process. | 92 | python |
| openai/skills | Skills Catalog for Codex | 75 | python |
| Lightricks/LTX-2 | Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. | 51 | python |
| cocoindex-io/cocoindex-code | A super light-weight embedded code search engine CLI (AST based) that just works - saves 70% token and improves speed for coding agent. Star if you like it! | 48 | python |
| labring/FastGPT | FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without the need for extensive setup or configuration. | 42 | typescript |
| livekit/agents | A framework for building realtime voice AI agents 🤖🎙️📹 | 19 | python |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence | research_paper | 22 | Open |
| Playful Agentic Robot Learning | research_paper | 29 | Open |
| DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects | research_paper | 14 | Open |
| JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising | research_paper | 12 | Open |
| FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows | research_paper | 8 | Open |
| DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis | research_paper | 5 | Open |
| Deontic Policies for Runtime Governance of Agentic AI Systems | cs.AI | 0 | Open |
| Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023 | cs.AI | 0 | Open |
| Diffusion Language Models: An Experimental Analysis | cs.AI | 0 | Open |
| Hidden Anchors in Multi-Agent LLM Deliberation | cs.AI | 0 | Open |
| DeXposure-Claw: An Agentic System for DeFi Risk Supervision | cs.AI | 0 | Open |
| LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data | cs.AI | 0 | Open |
| REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk | cs.AI | 0 | Open |
| Emergent Alignment | cs.AI | 0 | Open |
| ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence | cs.AI | 0 | Open |
📢 Lab Blog Posts
- OpenAI: New usage analytics and updated spend controls for enterprises
- OpenAI: Improving health intelligence in ChatGPT
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| xai | Grok models are now available on Databricks Agent Bricks. Bring SpaceXAI's latest models to your enterprise data to power capable AI agents. https://x.ai/news/grok-databricks |
| AnthropicAI | New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (The robodog, alas, still failed to fetch a beach ball.) https://www.anthropic.com/research/project-fet |
| GoogleDeepMind | Pinned: Instead of assuming AI will always do what we intend, we ask: what if it doesn't? That's why we've developed our AI Control Roadmap: a framework for building and managing the advanced AI we deploy within Google. 🧵 |
Repeated From Recent Briefings
- google-research/timesfm - TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting. - first seen 2026-05-02
- Egonex-AI/Understand-Anything - Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more. - first seen 2026-05-21
- Next-Latent Prediction Transformers [R] - first seen 2026-06-17
- I think most AI voice agent demos hide the hardest part: the listening layer - first seen 2026-06-18
- calesthio/OpenMontage - World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio. - first seen 2026-06-18
- anthropics/financial-services - first seen 2026-05-07
- unsloth GLM-5.2-GGUF, including 2bit at 238GB - first seen 2026-06-18
- Is foundational AI research still something that can be done without access to HPC? [D] - first seen 2026-06-18
- openai/codex - Lightweight coding agent that runs in your terminal - first seen 2026-05-10
- continuedev/continue - open-source coding agent - first seen 2026-06-18
- ... plus 87 more repeated items in processed data