Paper 011
March 9, 2026
AI
Research
How Robots Are Learning to Remember Like Us
A new study introduces RoboMME, a benchmark that tests whether robots can remember and apply knowledge across everyday tasks. Most robots today forget everything between jobs; this research measures how close we are to fixing that.
Paper 008
March 6, 2026
Policy
AI Safety
National Security
The Pentagon Just Labeled Anthropic a "Supply-Chain Risk"
The U.S. Department of Defense formally designated Anthropic a supply-chain risk after the company refused to allow its technology to be used for fully autonomous lethal weapons and mass surveillance. This is the first time a major American AI company has been hit with this label.
Paper 007
March 5, 2026
Agents
Research
Safety
Coding Agents Just Got Much More Trustworthy
A new semi-formal reasoning method from Meta pushes patch-equivalence accuracy from 78% to 93% without executing a single line of code. No new model needed: just a structured checklist prompt you can drop in today.
Paper 006
March 3, 2026
AI
Cognition
Opinion
Why Today's AI Models Are Shockingly Good at Doing Exactly What Humans Do When They Don't Remember
They call it "hallucination." The real word is confabulation: the same unconscious gap-filling humans have always done. These models aren't broken. They're doing exactly what we do.
Paper 005
March 3, 2026
Agents
Open Source
Privacy
Khoj Just Gave You a True Self-Hosted AI Second Brain
Khoj (khoj-ai/khoj) turns any local or cloud LLM into a persistent, private AI companion that indexes your entire life, builds custom agents, schedules real tasks, and runs deep research, all on your machine.
Paper 004
March 2, 2026
Agents
112 AI Agents Just Turned Claude Code Into a Full Dev Team
wshobson/agents ships 112 specialized agents, 72 plugins, and 146 modular skills, turning Claude Code into a composable AI development team that loads only what you need, at roughly 1,000 tokens per plugin.
Paper 003
March 1, 2026
Agents
The First Open-Source AI That Actually Remembers You and Gets Smarter Every Day
Nous Research releases Hermes-Agent, a fully open-source, self-hosted personal AI agent with persistent memory, autonomous skill-building, and multi-platform support. It never forgets who you are.
Paper 002
February 28, 2026
Agents
Why AI Route Planners Still Get Your Preferences Wrong (And the New Benchmark That Proves It)
Amap's MobilityBench is the first large-scale benchmark built from 100,000 real navigation queries across 22 countries. It reveals exactly where today's best AI agents break down on personalized route planning.
Paper 001
February 27, 2026
Inference
DeepSeek Just Solved the #1 Hidden Bottleneck Killing AI Agents
Your AI agent is slow and expensive, and it's not the model's fault. DeepSeek's DualPath paper fixes the storage bandwidth bottleneck that has been quietly capping every agent deployment. Here's what changed and why it matters for your stack.