Praxis: A Local Multi-Agent AI Operations System

I've been building software with AI assistants for over a year now, and I've hit the same wall repeatedly. One AI tries to do everything: requirements, architecture, code, infrastructure. It's a jack of all trades and a master of none. Each chat starts fresh, so yesterday's decisions vanish. And I'm constantly micromanaging: "Do this. Now do that. Wait, check this first." It's exhausting.

Existing multi-agent tools like CrewAI, AutoGen, and LangGraph solve the expertise problem by giving you specialized agents. But they introduce new headaches: API keys, cloud services, complex setup, steep learning curves. They're built for enterprise teams, not individual developers who just want to ship features.

So I built Praxis: a personal, local multi-agent system that runs entirely inside Claude Code. No API costs. No cloud dependencies. One command coordinates an entire team of five specialized agents.

The Team

Praxis coordinates five agents, each with a specific responsibility and quality gate:

Manager — The strategic coordinator and your "direct report." Escalates strategic decisions to you but handles the day-to-day coordination autonomously.

Business Analyst — Owns requirements, user stories, and API contracts. Must score above 4/5 on a quality checklist before marking work complete.

Software Engineer — Handles architecture, code, and tests. Must score above 28/35 on code quality, security, and testing criteria.

Ops Specialist — Manages infrastructure, deployment, and monitoring. Must score above 20/25 on standards compliance and safety checks.

Specialist — Creates proof-of-concepts and documentation. Validates approaches before the Engineer builds them out fully.
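
To make the roles concrete, here's a minimal sketch of what an agent definition could look like. The field names are illustrative, not the actual Praxis template schema:

  # Illustrative agent definition -- field names are hypothetical,
  # not the actual Praxis template schema.
  agent: software-engineer
  role: Architecture, implementation, and tests
  responsibilities:
    - Design the technical approach
    - Write production code with tests
    - Complete a security review before handoff
  quality_gate:
    max_score: 35
    passing_score: 29   # "above 28/35" from the Engineer's gate
  escalates_to: manager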

Autonomous Coordination Mode

Here's where it gets interesting. You don't manage these agents individually. You talk to the Manager, and the Manager runs the show.

You say: @manager Build user notifications for order updates

Behind the scenes, the Manager creates a session, then convenes a 15-30 second "team meeting" where all five agents debate scope, approach, timeline, and potential conflicts. The complete meeting log gets saved to team_meeting.md. The Manager returns with a unified plan and specific recommendations. You only step in for strategic decisions — not tactical coordination.

This isn't simulated for show. The agents actually debate, challenge each other's assumptions, and resolve conflicts. When they can't agree, the Manager escalates to you with specific options and trade-offs.

Local-First by Design

Praxis stores everything in YAML files: session.yml for overall status, assignments.yml for who's doing what, messages.yml for inter-agent communication, and decisions.yml with full rationale for every architectural choice.
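
To give a feel for the format, here's roughly what a session.yml could contain (the keys are illustrative, not the shipped schema):

  # Hypothetical session.yml -- keys are illustrative, not the shipped schema.
  session:
    id: 2024-11-02-user-notifications
    goal: Build user notifications for order updates
    status: in_progress
  agents:
    manager: coordinating
    business_analyst: complete        # requirements scored 5/5
    software_engineer: active
    ops_specialist: blocked           # awaiting deployment target decision
    specialist: idle
  escalations:
    - to: user
      question: Email-only notifications, or email plus in-app?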

State is stored in a local praxis/ directory:

  • templates/ — The five agent definitions (shared via git)
  • sessions/ — Active work (gitignored, personal)
  • memory/ — Cross-session learning (gitignored, personal)
  • custom/ — Personal agents you create (gitignored)

Zero costs. Works offline. No vendor lock-in. Complete privacy.

Cross-Session Memory

Most AI conversations vanish when you close the tab. Praxis remembers.

The system tracks patterns in praxis/memory/patterns.yml: what technical approaches worked (with success rates), what went wrong and why, each agent's performance metrics over time, and architecture decisions with their outcomes.
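
A patterns entry might look something like this (structure assumed for illustration):

  # Hypothetical patterns.yml shape -- structure assumed for illustration.
  patterns:
    - approach: SQS for asynchronous message delivery
      outcome: success
      sessions_used: 3
      success_rate: 1.0
      notes: Simpler retry semantics than the earlier Redis attempt
  agent_metrics:
    software_engineer:
      avg_quality_score: 31          # out of 35
      weak_areas: [websockets]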

The Manager reads this before making recommendations. It will tell you: "Based on past sessions, I recommend SQS over Redis for reliability" or "The Engineer has struggled with WebSocket implementations — consider having the Specialist validate this approach first."

Quality Validation Built-In

Every agent must self-assess before marking work "done." Explicit checklists prevent cutting corners. The Engineer can't submit code without completing security review and tests. The Ops Specialist can't approve infrastructure without verifying monitoring and rollback procedures.

This isn't bureaucracy — it's accountability. The agents know their work will be reviewed against specific criteria, so they think through edge cases upfront rather than discovering them in production.
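
As an example, the Engineer's gate could be expressed as a scored self-assessment along these lines (the seven criteria are invented to illustrate the "above 28/35" rule):

  # Hypothetical self-assessment for the Engineer's gate.
  self_assessment:
    agent: software_engineer
    criteria:                        # each scored 0-5
      code_quality: 5
      readability: 4
      test_coverage: 4
      security_review: 5
      error_handling: 4
      performance: 4
      documentation: 3
    total: 29                        # above the 28/35 threshold, so it passes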

Transparency Over Magic

Full meeting transcripts are saved. Decision rationale is preserved. State files are human-readable YAML. You can debug by simply reading the files.

If something goes wrong, you don't guess. You read team_meeting.md to see what the agents discussed. You check decisions.yml to understand why they chose approach A over B. You review messages.yml to see how information flowed between agents.
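
To give a flavor of that audit trail, a decisions.yml entry might read (format assumed for illustration):

  # Hypothetical decisions.yml entry -- format assumed for illustration.
  - decision: Use SQS instead of Redis pub/sub for notification delivery
    proposed_by: software_engineer
    approved_by: manager
    alternatives: [redis-pubsub, direct-webhooks]
    rationale: >
      Past sessions showed better delivery reliability with SQS; the
      Redis approach needed custom retry logic that failed twice.
    session: 2024-11-02-user-notifications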

Interruptible and Modifiable

The system doesn't lock you out while agents work. @manager Wait, change scope... works mid-coordination. You can edit agent templates to match your style. You can create custom agents with a single command using generate-agent.py.
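
A generated custom agent might end up looking something like this in custom/ (a made-up example, simplified):

  # custom/security-reviewer.yml -- a made-up example of a generated agent.
  agent: security-reviewer
  role: Dedicated security review before release
  responsibilities:
    - Threat-model new features
    - Audit dependencies for known vulnerabilities
  quality_gate:
    max_score: 10
    passing_score: 9
  escalates_to: manager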

This isn't a black-box automation tool. It's a transparent coordination system that you control.

What I Learned Building This

The hardest part wasn't the technical implementation. It was designing the right boundaries between agents so they could collaborate without stepping on each other. The Manager's role is crucial: without a clear coordinator, agents either duplicate work or create conflicting approaches.

I also had to resist the temptation to over-engineer. It's easy to add more agents, more coordination layers, more abstractions. But the goal was zero setup and immediate utility. Every feature had to earn its place by solving a real problem I hit while building with AI assistants.

Want to test it out?

It's available at github.com/clarissafchen/praxis.

What's Next

Potential enhancements include auto-pattern extraction from session logs, weekly "what we learned" summaries, an agent performance dashboard, and a template marketplace for custom agents.

But the core system works today. Five agents. One command. Zero setup. Zero costs. Your personal AI operations team, coordinated.