🔴 High Significance

Model Releases

🔴 Launch HN: TesterArmy (YC P26) – Agents that test web and mobile apps - score 83 Sources: hackernews

🔴 GLM-5.2 is above GPT-5.5 in AA-Briefcase, Artificial Analysis' new agentic knowledge work eval - score 75 Sources: reddit/r/LocalLLaMA

Developer Tools

🔴 Kilo-Org/kilocode - Kilo is the all-in-one agentic engineering platform. Build, ship, and iterate faster with the most popular open source coding agent. - score 99 Sources: github_trending

🔴 GLM-5.2 inference is free on Hugging Face for the next 6 hours - score 89 Sources: reddit/r/LocalLLaMA

Doc: https://huggingface.co/docs/inference-providers/index. A cool prompt to try first: https://huggingface.co/chat/r/aFATtCW?leafId=ed28d5b0-d99b-40be-ba8b-315b1f450e5a

🔴 It has been a while since I wrote - score 78 Sources: reddit/r/AIAgents

And it is important that rules are followed to the letter, or else this will be censored, so therefor I shall obey the rules, I swear on my life and my love for it, that I shall not break the rules of this Sub-Reddit - A Man - That is the truth so help me humanity r/aiagents truly r

🔴 browser agents work in demo and then die on auth, sessions, captcha, dom drift... what are ppl doing? - score 78 Sources: reddit/r/AIAgents

The demo-to-production cliff for browser agents is brutal and nobody warns you. Demo: agent navigates to the site, clicks around, extracts data, looks magical. Production, in order of how they killed us: - auth: login flows with MFA, SSO redirects, "verify it's you" emails. agent just stops. - *

Research Papers

🔴 S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence - score 85 Sources: huggingface

Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introduce S-Agent, a spatial tool-use agentic paradigm for underst

🔴 Playful Agentic Robot Learning - score 82 Sources: huggingface · arxiv/cs.AI

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied c

🔴 DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects - score 75 Sources: huggingface

Dexterous interaction with articulated objects is important for household, assistive, and humanoid manipulation, where multi-finger hands can provide compliant contact patterns beyond parallel-jaw grasping. However, articulated-object manipulation differs from static-object manipulation: the target

Other Signals

🔴 GLM's founder says GLM-fable before the end of the year?! - score 96 Sources: reddit/r/LocalLLaMA

🟡 Notable

Model Releases

🟡 @xai: Grok models are now available on Databricks Agent Bricks. Bring SpaceXAI's latest models to your enterprise data to power capable AI agents. https://x.ai/news/grok-databricks - score 60 Sources: twitter_rss

🟡 New usage analytics and updated spend controls for enterprises - score 50 Sources: lab_blog/OpenAI

OpenAI introduces new spend controls and usage analytics for ChatGPT Enterprise, helping organizations manage costs and scale AI with confidence.

🟡 Improving health intelligence in ChatGPT - score 50 Sources: lab_blog/OpenAI

Learn how GPT-5.5 Instant improves ChatGPT’s health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations.

🟡 @AnthropicAI: New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (T - score 50 Sources: twitter_rss

New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (The robodog, alas, still failed to fetch a beach ball.) https://www.anthropic.com/research/project-fet

🟡 Giving GLM-5.2 a spin locally on CPU only! (poor man's rig for big models) - score 46 Sources: reddit/r/LocalLLaMA

This is the UD-Q2-K_XL quant. Hardware is: Model: Dell PowerEdge R740 CPU: Dual Xeon 6248R (24 cores each) RAM: 768 GB (All memory channels populated) I'm using ik_llama.cpp which provides some significant performance improvements over the base llama.cpp for CPU-only inference. Unfortunately, we d

Developer Tools

🟡 K-Dense-AI/scientific-agent-skills - Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science, used by 160,000+ scientists worldwide. 140 ready-to-use skills plus 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Pi, Antigravity, and the open Agent Skills standard. - score 69 Sources: github_trending

🟡 BuilderIO/agent-native - A framework for building agent-native applications. - score 67 Sources: github_trending

🟡 garrytan/gbrain - Garry's Opinionated OpenClaw/Hermes Agent Brain - score 64 Sources: github_trending

🟡 Is Your Agent a Liar? How to Tell and How to Overcome it: - score 56 Sources: reddit/r/AIAgents

How to Tell If Your AI Agent Is Lying, Hallucinating, or Building Vaporware I have spent a lot of time working with autonomous AI agents. After running them on real projects I have learned one expensive lesson. Most of them are very good at lying. They sound confident. They produce clean looking

🟡 poolside/Laguna-M.1 · Hugging Face - 225B-A23B - score 54 Sources: reddit/r/LocalLLaMA

Laguna M.1 Laguna M.1 is a 225B total parameter Mixture-of-Experts model with 23B activated parameters per token designed for agentic coding and long-horizon work. # Highlights * Large sparse MoE for agentic coding: Laguna M.1 is a 70-layer MoE transformer with 225B total parameters and 23B ac

Omitted 4 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟡 Fearless Concurrency on the GPU: Safe GPU inference in Rust, competitive with vLLM/SGLang [R] - score 69 Sources: reddit/r/MachineLearning

I maintain cuTile Rust and just posted the paper "Fearless Concurrency on the GPU." As more GPU code gets AI-generated, the bottleneck moves from writing it to trusting it. cuTile Rust lets you write or generate GPU kernels whose memory safety and data-race freedom are verified by the compiler, thro

🟡 Lightricks/LTX-2 - Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. - score 43 Sources: github_trending

Research Papers

🟡 JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising - score 65 Sources: huggingface

Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geome

🟡 FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows - score 55 Sources: huggingface

Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator--the depth predictor defining the constraint-

🟡 DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis - score 55 Sources: huggingface · arxiv/cs.AI

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a

Other Signals

🟡 GLM-5.2 Is The Best Open Weight Creative Writing Model - score 68 Sources: reddit/r/LocalLLaMA

As per Sam Paech's Creative Writing Benchmark on EQ Bench: https://eqbench.com/creative_writing.html

🟡 OSS models decisively overtook Proprietary models in market share (based on the last 3 months of OpenRouter data) - score 61 Sources: reddit/r/LocalLLaMA

🟡 The Korean telecom giant at the center of Anthropic's Mythos controversy - score 50 Sources: hackernews

🟢 Incremental

Model Releases

🟢 Updates on North Mini Code: 4 bit quant + Ollama + OpenRouter - score 25 Sources: reddit/r/LocalLLaMA

Hey! We heard the feedback on making the model more portable and accessible. So in light of that we have 2 updates to share. First, you can pull a new 4-bit quant straight from Hugging Face, so it’s now small enough to run on a Mac or wh

🟢 [NEW MODEL] SupraLabs just released SupraVL-Nano-900k, a Vision-Language Model built entirely from scratch! - score 4 Sources: reddit/r/LocalLLaMA

Hey r/LocalLLaMA! We just released SupraVL-Nano-900k, our first VLM. It has ~900k parameters, was trained from scratch on Flickr8k, and the entire architecture fits in a single Jupyter notebook. This is not a production model, it's a fully transparent, readable blueprint for anyone who wants to

Developer Tools

🟢 Looking for 3–4 people with running AI agents to test a multi-agent collaboration platform ($20/hour) - score 39 Sources: reddit/r/AIAgents

Hey everyone, I’m looking for 3–4 people who already have AI agents running and are willing to help test a multi-agent collaboration platform I’ve built. The platform allows agents to connect with other agents in a controlled/supervised environment. An agent can create a session, invite “friend” age

🟢 labring/FastGPT - FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without the need for extensive setup or configuration. - score 37 Sources: github_trending

🟢 livekit/agents - A framework for building realtime voice AI agents 🤖🎙️📹 - score 26 Sources: github_trending

🟢 Google DeepMind unveils plan to protect itself from its own rogue AI agents - score 17 Sources: reddit/r/AIAgents

🟢 Sharing my DIY framework that gives AI coding agents eyes - they can finally see the UI they build (open source) - score 8 Sources: reddit/r/AIAgents

I kept hitting the same wall with coding agents: they're blind. An agent writes a web page, a chart, an SVG, a PDF… and never actually sees the result. It reasons from source code and terminal output, then confidently says "done" while the button overflows, the text fails contrast, the chart

Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.

Infrastructure & Compute

🟢 GLM-5.2 (744B, 2-bit) at 7.3 tok/s on 4×3090 + 192GB - and why IQ1_M wasn't any faster - score 18 Sources: reddit/r/LocalLLaMA

TLDR: For the first time, I feel relief that they could shut down the cloud services and I would be ok. I got my 4th 3090 and then unsloth dropped the Q2 and Q1. I wrote nothing else here its from CC, so it might be wrong. GLM-5.2 UD-IQ2_M runs across 4×3090 + RAM expert offload at ~7.3 tok/s. Two

Business & Funding

🟢 LQ50/LQ50-24GB cost around $1200 - score 39 Sources: reddit/r/LocalLLaMA

Well found this shit on TAOBAO very expensive

Enterprise Adoption

🟢 Voice debugging at the conversation level seems far more useful than isolated benchmark metrics [D] - score 19 Sources: reddit/r/MachineLearning

I have been thinking a lot about how poorly isolated benchmark metrics capture real conversational system quality once models are deployed into multi-turn environments. You can have strong STT scores, decent latency, high task completion rates, and still end up with conversations that humans perceiv

Research Papers

🟢 HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining - score 25 Sources: huggingface

Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, yet their scalabilit

🟢 No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages - score 5 Sources: huggingface

Large Language Models (LLMs) have significantly advanced the automation of software engineering tasks. One prominent example is code generation, where an LLM produces code in a specified programming language based on a natural language description. Most research in this area has focused on high-reso

Other Signals

🟢 Researchers trained a Deep Research agent with 32 H100s and open-sourced everything - score 32 Sources: reddit/r/LocalLLaMA

Ohio State University's NLP team released QUEST-35B, an open-source Deep Research agent trained using ~32 H100s and ~8K synthetic samples. The team open-sourced the training recipe, code, weights and datasets. Benchmark results show competitive performance against several frontier Deep Research

🟢 Latent space interpretation [R] - score 31 Sources: reddit/r/MachineLearning

Hi all, I have trained a convolutional autoencoder on a set of medical images. Further classified latent feature maps using random forest to find the top scoring feature map. Now my goal is to understand which input image is captured in top scoring latent feature map. Any suggestions? I have tried e

🟢 Zen and the Art of Machine Learning Research - score 17 Sources: hackernews

🟢 GLM-5.2 can now run locally in llama.cpp and Unsloth Studio. - score 11 Sources: reddit/r/LocalLLaMA

The 2-bit model retains ~82% accuracy after we shrunk it from 1.51TB to 238GB (-84% size). Run on a 256GB Mac or RAM/VRAM setups. GLM-5.2 is the strongest open model to date. Check the graph for the accuracy of each GLM-5.2-GGUF quantization. Full guide: https://unsloth.ai/docs/models/glm-5.2 GGUF:
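
The size figures quoted in this item can be sanity-checked with a few lines of Python. This is only a rough sketch: the function name is hypothetical, and it assumes decimal units (1.51 TB treated as 1510 GB), which the post does not specify.

```python
def size_reduction_percent(original_gb: float, quantized_gb: float) -> float:
    """Return the percentage by which the quantized file is smaller than the original."""
    if original_gb <= 0:
        raise ValueError("original_gb must be positive")
    return (1.0 - quantized_gb / original_gb) * 100.0


# Figures quoted in the post: 1.51 TB original, 238 GB for the 2-bit GGUF.
# Assumption: decimal units, so 1.51 TB is treated as 1510 GB.
reduction = size_reduction_percent(1510.0, 238.0)
print(f"{reduction:.1f}% smaller")
```

Under that assumption the result rounds to the roughly 84% reduction stated in the post; with binary units the figure would come out slightly higher.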

| Repo | Description | Stars Today | Language |
| --- | --- | --- | --- |
| Kilo-Org/kilocode | Kilo is the all-in-one agentic engineering platform. Build, ship, and iterate faster with the most popular open source coding agent. | 1345 | typescript |
| K-Dense-AI/scientific-agent-skills | Turn any AI agent into an AI Scientist. The #1 Agent Skills library for science, used by 160,000+ scientists worldwide. 140 ready-to-use skills plus 100+ scientific databases covering biology, chemistry, medicine, and drug discovery. Compatible with Cursor, Claude Code, Codex, Pi, Antigravity, and the open Agent Skills standard. | 174 | python |
| BuilderIO/agent-native | A framework for building agent-native applications. | 172 | typescript |
| garrytan/gbrain | Garry's Opinionated OpenClaw/Hermes Agent Brain | 167 | typescript |
| microsoft/qlib | Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process. | 92 | python |
| openai/skills | Skills Catalog for Codex | 75 | python |
| Lightricks/LTX-2 | Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. | 51 | python |
| cocoindex-io/cocoindex-code | A super light-weight embedded code search engine CLI (AST based) that just works - saves 70% token and improves speed for coding agent 🌟 Star if you like it! | 48 | python |
| labring/FastGPT | FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without the need for extensive setup or configuration. | 42 | typescript |
| livekit/agents | A framework for building realtime voice AI agents 🤖🎙️📹 | 19 | python |

📄 New Papers

| Title | Category | Hotness | Link |
| --- | --- | --- | --- |
| S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence | research_paper | 22 | Open |
| Playful Agentic Robot Learning | research_paper | 29 | Open |
| DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects | research_paper | 14 | Open |
| JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising | research_paper | 12 | Open |
| FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows | research_paper | 8 | Open |
| DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis | research_paper | 5 | Open |
| Deontic Policies for Runtime Governance of Agentic AI Systems | cs.AI | 0 | Open |
| Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023 | cs.AI | 0 | Open |
| Diffusion Language Models: An Experimental Analysis | cs.AI | 0 | Open |
| Hidden Anchors in Multi-Agent LLM Deliberation | cs.AI | 0 | Open |
| DeXposure-Claw: An Agentic System for DeFi Risk Supervision | cs.AI | 0 | Open |
| LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data | cs.AI | 0 | Open |
| REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk | cs.AI | 0 | Open |
| Emergent Alignment | cs.AI | 0 | Open |
| ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence | cs.AI | 0 | Open |

🏢 Lab Blog Posts

🐦 Twitter/X Highlights

| Account | Tweet Summary |
| --- | --- |
| xai | Grok models are now available on Databricks Agent Bricks. Bring SpaceXAI's latest models to your enterprise data to power capable AI agents. https://x.ai/news/grok-databricks |
| AnthropicAI | New Frontier Red Team blog: Phase 2 of Project Fetch, where we test how well Claude can program a robodog. Opus 4.7, on its own, was ~20x faster than last year's best human team aided by Opus 4.1. (The robodog, alas, still failed to fetch a beach ball.) https://www.anthropic.com/research/project-fet |
| GoogleDeepMind | Pinned: Instead of assuming AI will always do what we intend, we ask: what if it doesn't? That’s why we’ve developed our AI Control Roadmap: a framework for building and managing the advanced AI we deploy within Google. 🧵 |

Repeated From Recent Briefings