ECO-AIM-AI-002 — No inference batching

Category: AI/ML (AIM)
Family: AI (AI)
Layer: AI
Tier: 2
Severity: warning
Tags: ai, batching
Legacy ID: ECO-AI-002

Summary

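Serving inference requests one at a time pays the fixed per-call overhead on every request and lowers throughput.

Batching spreads that overhead across all requests in the batch, which improves throughput and reduces the overhead per token or request.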
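As a rough illustration of why overhead per request falls with batch size, the sketch below uses a simplified cost model: a fixed overhead per model call plus a constant cost per item. The function name and the millisecond figures are hypothetical and are not measurements from any real system; on real hardware the per-item cost is not constant and depends on the model and device.

```python
def per_request_cost_ms(fixed_overhead_ms: float, per_item_ms: float, batch_size: int) -> float:
    """Amortized cost of one request under a simplified (assumed) cost model.

    The fixed overhead of a model call is shared by every request in the
    batch; the per-item work is paid once per request regardless of batching.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return fixed_overhead_ms / batch_size + per_item_ms


# Hypothetical figures: 8 ms fixed overhead per call, 2 ms of work per item.
unbatched = per_request_cost_ms(8.0, 2.0, batch_size=1)  # 10.0 ms per request
batched = per_request_cost_ms(8.0, 2.0, batch_size=8)    # 3.0 ms per request
```

Under this model the saving comes entirely from the fixed term, so the benefit shrinks once the batch is large enough that per-item work dominates.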

{
  "confidence": 0.7,
  "notes": "Often significant for GPU inference.",
  "type": "cpu"
}

{
  "languages": [
    "python",
    "infra"
  ],
  "method": "trace"
}

{
  "guidance": "Introduce batching at gateway/inference server.",
  "tradeoffs": "Added latency for low volume; needs tuning."
}
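The guidance above can be sketched as a small micro-batcher placed in front of the model at the gateway. This is a minimal illustration under stated assumptions, not the API of any particular inference server: `MicroBatcher`, `run_batch`, `max_batch_size`, and `max_wait_s` are hypothetical names, and `run_batch` stands in for whatever batched model call the deployment provides.

```python
import asyncio
from typing import Any, Callable, List, Sequence


class MicroBatcher:
    """Groups concurrent requests and sends them to the model as one batch.

    Hypothetical sketch: run_batch is assumed to accept a list of inputs and
    return one output per input, in the same order.
    """

    def __init__(
        self,
        run_batch: Callable[[List[Any]], Sequence[Any]],
        max_batch_size: int = 8,
        max_wait_s: float = 0.01,
    ) -> None:
        self._run_batch = run_batch
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: Any) -> Any:
        """Called once per request; resolves when that request's batch has run."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((item, future))
        return await future

    async def run(self) -> None:
        """Worker loop: wait for a first request, hold a short window, flush."""
        while True:
            batch = [await self._queue.get()]
            # Collection window. This is the added latency at low volume:
            # a lone request still waits max_wait_s before it is served.
            await asyncio.sleep(self._max_wait_s)
            while len(batch) < self._max_batch_size and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            inputs = [item for item, _ in batch]
            try:
                outputs = self._run_batch(inputs)
            except Exception as exc:
                # Fail every request in the batch rather than hang the callers.
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), output in zip(batch, outputs):
                if not future.done():
                    future.set_result(output)


async def demo() -> List[Any]:
    """Six concurrent requests served in batches of at most four."""
    def fake_model(inputs: List[int]) -> List[int]:
        # Stand-in for a real batched forward pass.
        return [x * 2 for x in inputs]

    batcher = MicroBatcher(fake_model, max_batch_size=4, max_wait_s=0.01)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(6)))
    worker.cancel()
    return list(results)
```

Running `asyncio.run(demo())` serves the six requests with fewer than six model calls. For simplicity this sketch always waits the full window, even when the batch is already full; flushing early on a full batch would reduce latency under load. `max_batch_size` and `max_wait_s` are the two values that need tuning against the deployment's latency budget and traffic volume.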

{
  "system_layers": [
    "ai"
  ]
}