🔴 High Significance
Model Releases
🔴 Qwen3.6-27B vs Coder-Next - score 88
Sources: reddit/r/LocalLLaMA
Burned about 20 hours of side-by-side compute on my two RTX PRO 6000 Blackwells trying to get a definitive answer on which of these two models was clearly better. As with many things in life, many tokens and kWh later, the answer was "it depends."
These models in the aggregate are actually cr
🔴 DeepClaude - Claude Code agent loop with DeepSeek V4 Pro - score 75
Sources: hackernews
Developer Tools
🔴 TauricResearch/TradingAgents - TradingAgents: Multi-Agents LLM Financial Trading Framework - score 99
Sources: github_trending
🔴 ruvnet/ruflo - The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integration, and native Claude Code / Codex Integration - score 97
Sources: github_trending
🔴 One bash permission slipped... - score 96
Sources: reddit/r/LocalLLaMA
How? It kept getting chained bash commands wrong, with wrong escapes. So it created many bad directories, and tried "fixing" its mistake. It offered to run a large bash command, with `rm -rf` inside, and stupid me missed it.
I'm glad I push everything often. But the disruption is massive.
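One way to make a slip like this harder is to scan a proposed command for destructive programs before approving it. The following is a minimal Python sketch, not something from the post: the `DESTRUCTIVE` deny-list, the `SEPARATORS` set, and the `risky_segments` helper are all hypothetical names, and the check is deliberately crude. It does not look inside `$(...)`, `bash -c "..."` strings, or scripts that the command invokes, and it assumes Python 3.8 or later.

```python
import shlex

# Hypothetical deny-list of program names that should force a manual review.
DESTRUCTIVE = {"rm", "dd", "mkfs", "shred"}
# Shell operators that chain one sub-command to the next.
SEPARATORS = {"&&", "||", ";", "|"}


def risky_segments(command: str) -> list:
    """Split a chained shell command and return the sub-commands that
    mention any program on the deny-list."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # keep flags and paths as single tokens

    segments, current = [], []
    for token in lexer:
        if token in SEPARATORS:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(token)
    if current:
        segments.append(current)

    # Match any token, not only the first, so that "sudo rm ..." is caught.
    # This trades false positives (e.g. "echo rm") for fewer misses.
    return [" ".join(seg) for seg in segments
            if any(tok in DESTRUCTIVE for tok in seg)]


if __name__ == "__main__":
    proposed = "mkdir -p build && cd build && rm -rf ../out; echo done"
    for segment in risky_segments(proposed):
        print("needs review:", segment)
```

A check like this only surfaces the dangerous part of a long chained command for a human to read; it is not a substitute for sandboxing or for restricting what the agent is allowed to run.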
FAQ:
🔴 Are modern ML PhDs becoming too incremental, or is this just what research looks like now? [D] - score 94
Sources: reddit/r/MachineLearning
I've been thinking about the current state of machine learning PhDs, including my own work, and I'd like to hear how others see it.
My impression is that a large fraction of modern ML PhD work follows a fairly predictable pattern: take an existing idea, connect it to another existing idea, apply i
🔴 Why do AI responses get worse after a while of working on them? And what to do with it - score 93
Sources: reddit/r/AIAgents
AIs have a known problem (it's called context rot): the longer the chat, the worse the responses. Even staying on the same topic. The model begins to confuse old decisions with new ones, re-proposes ideas that have already been discarded, loses the thread of what is current and what is not.
It'
Omitted 8 additional developer tools items from the main section; see raw data and source-specific sections below.
Infrastructure & Compute
🔴 AMD Strix Halo refresh with 192GB! - score 81
Sources: reddit/r/LocalLLaMA
Looks like the next strix halo, the Gorgon halo 495 max will have more than 128GB! I already bought a strix halo mini forms a couple of months ago since the 2026 refresh rumors were not interesting. Was not planning on getting another till 2027 with the bigger refresh, and linking them together. But was pl
Research Papers
🔴 UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors - score 95
Sources: huggingface
Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We pre
🔴 Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies - score 85
Sources: huggingface
Generalist robot policies increasingly benefit from large-scale pretraining, but offline data alone is insufficient for robust real-world deployment. Deployed robots encounter distribution shifts, long-tail failures, task variations, and human correction opportunities that fixed demonstration datase
🔴 From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills - score 75
Sources: huggingface
LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts, including SKILL.md-style documents and structured records whose mac
Other Signals
🔴 A Qwen finetune, that feels VERY human - score 73
Sources: reddit/r/LocalLLaMA
Hello guys,
So TL;DR, I was asked by multiple people to make an Assistant_Pepe_32B version, but the best base model contender was Qwen3-32B, a model that is very hard to tune on anything other than STEM.
The concept of Assistant_Pepe is an assistant without a typical 'assistant brain', that is
🟡 Notable
Model Releases
🟡 What a time to be alive from 1tk/sec to 20-100tk/sec for huge models - score 65
Sources: reddit/r/LocalLLaMA
https://www.reddit.com/r/LocalLLaMA/comments/1eb6to7/llama_405b_q4_k_m_quantization_running_locally/
[https://www.reddit.com/r/LocalLLaMA/comments/1ebbgkr/llama_31_405b_q5_k_m_runnin
🟡 **[@OpenAI: One week since the launch of GPT-5.5, and it's already our strongest model launch yet.](https://x.com/OpenAI/status/2050250926888468929)** - score 60
Sources: twitter_rss
One week since the launch of GPT-5.5, and it's already our strongest model launch yet.
API revenue is growing more than 2x faster than any prior release, while Codex doubled revenue in under seven days as enterprise demand for agentic coding tools keeps climbing.
🟡 **[@xai: Voice Cloning is now live via the xAI API!](https://x.com/xai/status/2050355373052223585)** - score 60
Sources: twitter_rss
Voice Cloning is now live via the xAI API!
Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more.
http://x.ai/news/grok-custom-voices
🟡 **[@xai: Introducing Grok Voice Think Fast 1.0](https://x.com/xai/status/2047441173569216721)** - score 60
Sources: twitter_rss
Introducing Grok Voice Think Fast 1.0
A state-of-the-art voice model built for complex, multi-step workflows with snappy responses and high accuracy.
It takes the top spot on the Tau Voice Bench and handles real-world messiness like noise, accents, and interruptions better than any other model in
🟡 **[@MistralAI: Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI.](https://x.com/MistralAI/status/2049128071874179091)** - score 60
Sources: twitter_rss
Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI.
Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes from prototype to pro
Omitted 3 additional model releases items from the main section; see raw data and source-specific sections below.
Developer Tools
🟡 LearningCircuit/local-deep-research - Local Deep Research achieves ~95% on SimpleQA benchmark (tested with Qwen 3.6). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted. - score 66
Sources: github_trending
🟡 iOfficeAI/AionUi - Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | Star if you like it! - score 64
Sources: github_trending
🟡 harvard-edge/cs249r_book - Machine Learning Systems - score 61
Sources: github_trending
🟡 Spent 6 months building one platform that replaces my LLM proxy + agent framework + workflow engine + observability stack - sharing before I keep adding features forever - score 59
Sources: reddit/r/AIAgents
Motivation: I wanted one tool that handles every aspect of building an agent. Didn't want to pay for a stack of products (LiteLLM, n8n, LangSmith, etc.) and didn't want five dashboards, five auth setups, and traces that don't connect across layers. We're already dependent on the model providers
🟡 Q00/ouroboros - Agent OS: Stop prompting. Start specifying. - score 59
Sources: github_trending
Omitted 13 additional developer tools items from the main section; see raw data and source-specific sections below.
Research Papers
🟡 Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction - score 65
Sources: huggingface
Agentic web search increasingly faces two distinct demands: deep reasoning over a single target, and structured aggregation across many entities and heterogeneous sources. Current systems struggle on both fronts. Breadth-oriented tasks demand schema-aligned outputs with wide coverage and cross-entit
🟡 Online Self-Calibration Against Hallucination in Vision-Language Models - score 55
Sources: huggingface · arxiv/cs.LG
Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introdu
🟡 LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation - score 55
Sources: huggingface · arxiv/cs.CL
A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pair Western-accented voice corpus across English, Hindi, Telugu, and Ta
🟡 Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization - score 40
Sources: huggingface
Distributed blackbox consensus optimization is a fundamental problem in multi-agent systems, where agents must improve a global objective using only local objective queries and limited neighbor communication. Existing methods largely rely on handcrafted update rules and static cooperation patterns,
🟡 AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval - score 40
Sources: huggingface
Analog circuit design relies heavily on reusing existing intellectual property (IP), yet searching across heterogeneous representations such as SPICE netlists, schematics, and functional descriptions remains challenging. Existing methods are largely limited to exact matching within a single modality
Omitted 1 additional research papers items from the main section; see raw data and source-specific sections below.
Other Signals
🟡 Anyone submit ML articles to ACM journals (e.g., TOPML or TIST)? [D] - score 69
Sources: reddit/r/MachineLearning
Have any of you submitted ML articles to ACM journals (e.g., TOPML or TIST)? How long did the process take, and were the reviews high-quality? How does it compare to other journals (e.g., TMLR) in terms of difficulty? Thanks.
🟢 Incremental
Model Releases
🟢 Open source models are going to be the future on Cursor, OpenCode etc. - score 35
Sources: reddit/r/LocalLLaMA
I just wanted to share my experience. At work we have Cursor with the Enterprise tier. Today I burned $10 with 2 prompts, one on gpt-5.5 and one on claude-opus-4.6-thinking. Last month I burned $80 in one week with claude-opus-4.7 even with the 50% off they had with the launch. If they continue with
🟢 Excellent discussion about LLM scaling [D] - score 25
Sources: reddit/r/MachineLearning
I came across an excellent in-depth discussion of memory and compute scaling analysis for LLMs. One takeaway is that running LLMs locally or on private cloud is wasteful. Memory / compute scaling makes large batching during inference very efficient.
Highly recommend. [How GPT, Claude, and Gemini
Developer Tools
🟢 njbrake/agent-of-empires - Manage multiple Claude Code, OpenCode agents from either TUI or Web for easy access on mobile. Also supports Mistral Vibe, Codex CLI, Gemini CLI, Pi.dev, Copilot CLI, Factory Droid Coding. Uses tmux and git worktrees. - score 36
Sources: github_trending
🟢 xingkongliang/skills-manager - A lightweight desktop app to manage, sync, and organize AI agent skills across 15+ coding tools: Cursor, Claude Code, Codex, Copilot, and more. - score 34
Sources: github_trending
🟢 nexu-io/nexu - The simplest desktop client for OpenClaw: bridge your Agent to WeChat, Feishu, Slack & Discord in one click. Works with Claude Code, Codex & any LLM. BYOK, OAuth, local-first, chat from your phone 24/7. - score 24
Sources: github_trending
🟢 Should agentic systems have models specialized only for code? - score 21
Sources: reddit/r/AIAgents
Most current agents feel like they rely on one big general-purpose model for everything: planning, reasoning, and actually writing code. But coding is a different beast compared to normal text.
What if we had dedicated coding models inside the agent stack? One model trained only for code understand
🟢 Xiaomi mimo coding plan is an absolute scam/misleading marketing - score 21
Sources: reddit/r/AIAgents
Their page says the plan includes 1.6 billion credits; mimo v2.5 pro costs 2 credits per token and mimo v2.5 costs 1 credit per token. But here is how they get you: cached tokens are still billed at the same credit rate on every round trip, which makes the plan absolutely unsuitable for a coding CLI, because every single one of them by d
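To see why full-price cached tokens matter for an agentic coding loop, the arithmetic can be sketched as follows. The plan size and per-token rates are the figures quoted in the post; everything else is an assumption for illustration: the 100,000-token context resent on each round trip, the hypothetical 90% cache discount used for comparison, and the `round_trips` helper itself. Output tokens are ignored to keep the sketch short.

```python
# Figures quoted in the post.
PLAN_CREDITS = 1_600_000_000
CREDITS_PER_TOKEN = {"mimo v2.5 pro": 2, "mimo v2.5": 1}


def round_trips(model: str, context_tokens: int, cache_discount_pct: int = 0) -> int:
    """Number of round trips the plan covers if every trip resends
    `context_tokens` tokens of context.

    cache_discount_pct is a HYPOTHETICAL discount on resent (cached) tokens,
    between 0 and 99. The post reports that the real discount is 0.
    """
    # Integer math: cost per trip = rate * tokens * (100 - pct) / 100.
    cost_times_100 = CREDITS_PER_TOKEN[model] * context_tokens * (100 - cache_discount_pct)
    return PLAN_CREDITS * 100 // cost_times_100


if __name__ == "__main__":
    context = 100_000  # assumed context size per round trip
    print(round_trips("mimo v2.5 pro", context))       # no cache discount
    print(round_trips("mimo v2.5", context))           # cheaper model, no discount
    print(round_trips("mimo v2.5 pro", context, 90))   # hypothetical 90% discount
```

Under these assumptions the headline credit count shrinks to a few thousand agent steps, which is the gap between the advertised number and the usable one that the post is complaining about.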
Omitted 3 additional developer tools items from the main section; see raw data and source-specific sections below.
Research Papers
🟢 Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling - score 10
Sources: huggingface
Joint audio-video generation models have shown that unified generation yields stronger cross-modal coherence than cascaded approaches. However, existing models couple modalities throughout denoising via pervasive attention, treating high-level semantics and low-level details in a fully entangled man
Other Signals
🟢 Mistral-Medium-3.5-128B-Q3_K_M on 3x3090 (72GB VRAM) - score 27
Sources: reddit/r/LocalLLaMA
Here is the actual speed of Mistral Medium Q3 running locally on 3x3090
first some Python
https://preview.redd.it/76a3j6u7o0zg1.png?width=1620&format=png&auto=w
🟢 Built a LangChain middleware that enforces signed authorization receipts before every tool call. Here is why wrap_tool_call is the right enforcement point. - score 27
Sources: reddit/r/AIAgents
Been building a pre-execution authorization layer for AI agents. The core idea is that a signed delegation receipt needs to exist before any tool call executes. Not a policy. Not a system prompt. A cryptographic constraint the agent cannot reason around.
For LangChain specifically wrap_tool_ca
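The core idea, that a tool call only runs if a valid signed receipt for exactly that call already exists, can be sketched in plain Python. This is an illustration under stated assumptions and not the author's middleware: it does not use any LangChain API, it swaps in a symmetric HMAC where the post's "signed delegation receipt" may well use asymmetric signatures, and the names `SECRET`, `sign_receipt`, and `guarded_call` are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, held by the authorizer and never shown to the agent.
SECRET = b"demo-key-do-not-use"


def sign_receipt(tool: str, args: dict, key: bytes = SECRET) -> str:
    """Authorizer side: sign the exact tool name and arguments being delegated."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def guarded_call(tool_fn, tool: str, args: dict, receipt: str, key: bytes = SECRET):
    """Enforcement point: recompute the signature and refuse to execute
    the tool unless the receipt matches this exact call."""
    expected = sign_receipt(tool, args, key)
    if not hmac.compare_digest(expected, receipt):
        raise PermissionError(f"no valid receipt for tool call: {tool}")
    return tool_fn(**args)


if __name__ == "__main__":
    def add(a, b):
        return a + b

    receipt = sign_receipt("add", {"a": 1, "b": 2})
    print(guarded_call(add, "add", {"a": 1, "b": 2}, receipt))  # authorized call runs

    try:
        # Same receipt, different arguments: the signature no longer matches.
        guarded_call(add, "add", {"a": 1, "b": 99}, receipt)
    except PermissionError as err:
        print("blocked:", err)
```

Because the receipt covers the arguments as well as the tool name, the agent cannot reuse one authorization for a different call, which is the property the post contrasts with a policy or a system prompt.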
🟢 UAI Reviews disappeared [D] - score 25
Sources: reddit/r/MachineLearning
Did everyone else's reviews disappear on their submissions?
🟢 OpenAI's o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors - score 25
Sources: hackernews
🟢 Where should you invest in the coming 25 years? Check the screenshot - score 21
Sources: reddit/r/AIAgents
While I am building for the future at Layout.dev (Acquired by Incorta), I wondered which areas I should invest in for my kids' development so that they would have good opportunities in their professional careers.
So I went to [https://layout.dev](h
Omitted 3 additional other signals items from the main section; see raw data and source-specific sections below.
Trending Repos
| Repo | Description | Stars Today | Language |
|---|---|---|---|
| TauricResearch/TradingAgents | TradingAgents: Multi-Agents LLM Financial Trading Framework | 3313 | python |
| ruvnet/ruflo | The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coordinate autonomous workflows, and build conversational AI systems. Features enterprise-grade architecture, self-learning swarm intelligence, RAG integration, and native Claude Code / Codex Integration | 1840 | typescript |
| 1jehuang/jcode | Coding Agent Harness | 591 | rust |
| AIDC-AI/Pixelle-Video | AI Fully Automated Short Video Engine | 497 | python |
| firecrawl/firecrawl | 🔥 The API to search, scrape, and interact with the web for AI | 462 | typescript |
| virattt/dexter | An autonomous agent for deep financial research | 418 | typescript |
| Hmbown/DeepSeek-TUI | Coding agent for DeepSeek models that runs in your terminal | 343 | rust |
| czlonkowski/n8n-mcp | A MCP for Claude Desktop / Claude Code / Windsurf / Cursor to build n8n workflows for you | 282 | typescript |
| cocoindex-io/cocoindex | Incremental engine for long horizon agents. Star if you like it! | 163 | python |
| LearningCircuit/local-deep-research | Local Deep Research achieves ~95% on SimpleQA benchmark (tested with Qwen 3.6). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted. | 143 | python |
New Papers
| Title | Category | Hotness | Link |
|---|---|---|---|
| UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors | research_paper | 61 | Open |
| Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies | research_paper | 7 | Open |
| From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills | research_paper | 6 | Open |
| Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction | research_paper | 3 | Open |
| Online Self-Calibration Against Hallucination in Vision-Language Models | research_paper | 2 | Open |
| LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation | research_paper | 2 | Open |
| TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data | cs.AI | 0 | Open |
| AgentReputation: A Decentralized Agentic AI Reputation Framework | cs.AI | 0 | Open |
| Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models | cs.AI | 0 | Open |
| Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents | cs.AI | 0 | Open |
| TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization | cs.AI | 0 | Open |
| ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts | cs.AI | 0 | Open |
| Causal Foundations of Collective Agency | cs.AI | 0 | Open |
| Agentic AI for Trip Planning Optimization Application | cs.AI | 0 | Open |
| Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference | cs.AI | 0 | Open |
🐦 Twitter/X Highlights
| Account | Tweet Summary |
|---|---|
| OpenAI | One week since the launch of GPT-5.5, and it's already our strongest model launch yet. API revenue is growing more than 2x faster than any prior release, while Codex doubled revenue in under seven days as enterprise demand for agentic coding tools keeps climbing. |
| xai | Voice Cloning is now live via the xAI API! Create a custom voice in less than 2 minutes or select from our library of 80+ voices across 28 languages to personalize your voice agents, audiobooks, video game characters, and more. http://x.ai/news/grok-custom-voices |
| xai | Introducing Grok Voice Think Fast 1.0 A state-of-the-art voice model built for complex, multi-step workflows with snappy responses and high accuracy. It takes the top spot on the Tau Voice Bench and handles real-world messiness like noise, accents, and interruptions better than any other model in the world. https://x.ai/news/grok-voice-think-fast-1 |
| MistralAI | Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI. Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes from prototype to production, with the durability, observability, and fault tolerance that production actually requires. Leading organisations like ASML, ABANCA, CMA-CGM, France Travail, La Banque Postale, Moeve, and many others are already using Workflows to automate critical processes. |
| OpenAI | Bring your workflow to Codex in just a few clicks. Import settings, plugins, agents, project configuration, and more so you can keep working with fewer interruptions. Your move. |
| MistralAI | Mistral AI made the TIME100 Most Influential Companies list for 2026, and the top 10 for AI. Why we're proud: customers run frontier models in production on their own terms, on their own infrastructure. Thank you to our customers for their trust and for joining us on the journey. Grateful to our incredible team members around the world and congrats to all the businesses recognized this year. Learn more at: https://time.com/collection/time100-most-influential-companies/2026/mistral/ #TIME100Companies #TIME100CompaniesIndustryLeader |
| karpathy | Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights: The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons: 1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image and an LLM can natively do the thing. 2. install .md skills instead of install .sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM". The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc. 3. LLM knowledge bases as an example of something that was impossible with classical code because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc. I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1,2), or was fundamentally not possible before (3). The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base and 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain, here I expand on this as having to also do with economics because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying or you're off-roading in the jungle with a machete, in relative terms. Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to... Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors. |
| simonw | I released LLM 0.32a0 this morning, a major backwards-compatible refactor of my LLM Python library and CLI tool for working with language models - the new changes should help LLM work better with reasoning models and other new frontier capabilities https://simonwillison.net/2026/Apr/29/llm/ |