ECO-AIM-AI-002 — No inference batching

  • Category: AI/ML (AIM)
  • Family: AI (AI)
  • Layer: AI
  • Tier: 2
  • Severity: warning
  • Tags: ai, batching
  • Legacy ID: ECO-AI-002

Summary

Serving inference requests one at a time raises per-request overhead and lowers throughput.

Rationale

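Batching amortizes fixed per-call costs (model invocation, kernel launches, data transfer) across many requests, which raises throughput and lowers overhead per token or request.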
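
The rationale above can be made concrete with a deliberately simple cost model. This is a sketch under stated assumptions, not a measurement: it assumes each model call pays a fixed overhead plus a cost that grows linearly with batch size. The function name and the numbers (8 ms fixed, 0.5 ms per item) are hypothetical and chosen only for illustration.

```python
def per_request_cost_ms(batch_size: int, fixed_overhead_ms: float, marginal_ms: float) -> float:
    """Average cost per request when one call serves `batch_size` requests.

    Assumed model: call_cost = fixed_overhead_ms + batch_size * marginal_ms.
    Real hardware is not perfectly linear; treat this as a rough guide only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The fixed overhead is shared by every request in the batch.
    return fixed_overhead_ms / batch_size + marginal_ms


# Hypothetical numbers: 8 ms fixed overhead per call, 0.5 ms per item.
unbatched = per_request_cost_ms(1, 8.0, 0.5)   # 8.5 ms per request
batched = per_request_cost_ms(16, 8.0, 0.5)    # 1.0 ms per request
```

Under this model the per-request cost approaches the marginal cost as the batch grows, so most of the gain comes from the first few doublings of batch size.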

Impact

{
  "confidence": 0.7,
  "notes": "Often significant for GPU inference.",
  "type": "cpu"
}

Detection

{
  "languages": [
    "python",
    "infra"
  ],
  "method": "trace"
}

Remediation

{
  "guidance": "Introduce request batching at the gateway or inference server.",
  "tradeoffs": "Adds latency at low request volume; batch size and wait timeout need tuning."
}
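
As one way to picture the remediation, here is a minimal sketch of dynamic micro-batching using Python's `asyncio`. It is not tied to any particular gateway or inference server. `MicroBatcher`, `run_batch`, `max_batch_size`, and `max_wait_s` are hypothetical names introduced for illustration, and `run_batch` stands in for whatever batched model call the system actually exposes.

```python
import asyncio
from typing import Any, Callable, List, Sequence, Tuple


class MicroBatcher:
    """Groups single requests into batches, bounded by size and wait time."""

    def __init__(self, run_batch: Callable[[Sequence[Any]], Sequence[Any]],
                 max_batch_size: int = 8, max_wait_s: float = 0.01) -> None:
        self._run_batch = run_batch          # hypothetical batched model call
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s        # latency budget added at low volume
        self._queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: Any) -> Any:
        """Enqueue one request and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        """Worker loop: collect a batch, run it once, fan results back out."""
        loop = asyncio.get_running_loop()
        while True:
            item, fut = await self._queue.get()   # block until the first request
            items, futures = [item], [fut]
            deadline = loop.time() + self._max_wait_s
            while len(items) < self._max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    item, fut = await asyncio.wait_for(self._queue.get(), timeout)
                except asyncio.TimeoutError:
                    break                          # wait budget spent; run what we have
                items.append(item)
                futures.append(fut)
            try:
                # Synchronous call for simplicity; a real server would likely
                # offload this so the event loop is not blocked.
                results = self._run_batch(items)
            except Exception as exc:
                for f in futures:
                    if not f.done():
                        f.set_exception(exc)
                continue
            for f, r in zip(futures, results):
                if not f.done():
                    f.set_result(r)


async def demo() -> Tuple[List[int], List[int]]:
    """Send 8 concurrent requests; return the results and observed batch sizes."""
    batch_sizes: List[int] = []

    def fake_model(batch: Sequence[int]) -> List[int]:
        # Stand-in for a model forward pass: doubles every input.
        batch_sizes.append(len(batch))
        return [x * 2 for x in batch]

    batcher = MicroBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(8)))
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return list(results), batch_sizes
```

The two knobs mirror the tradeoff noted above: a larger `max_batch_size` favors throughput, while `max_wait_s` caps how long a lone request waits for companions when traffic is light. Running `asyncio.run(demo())` returns the per-request results in submission order together with the batch sizes the stand-in model saw.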

Ontology

{
  "system_layers": [
    "ai"
  ]
}