Performance Benchmarks
Enterprise-grade performance verified through rigorous testing. These metrics represent production-validated results across sovereign deployments.
Detailed Performance Metrics
CAG vs Traditional RAG
Traditional RAG
- ❌ 5-15% hallucination rate
- ❌ Vector similarity ≠ factual relevance
- ❌ No audit trail for retrieved chunks
- ❌ Stochastic token generation
- ❌ Chunk boundaries lose context
ArcaQ CAG
- ✓ ≈0% hallucination (deterministic)
- ✓ Graph traversal = semantic reasoning
- ✓ Full provenance for every fact
- ✓ Grounded caching via SCAG
- ✓ Entity-relation context preserved
Testing Methodology
All benchmarks are conducted in production-equivalent environments using standardized testing frameworks. Latency metrics are measured end-to-end from API gateway to response completion.
Factual accuracy is validated using the RAGAS framework with human-annotated ground truth datasets. Each benchmark includes 10,000+ queries across diverse domains (finance, healthcare, legal, technical documentation).
Load testing uses distributed k6 runners simulating realistic user patterns including burst traffic scenarios. Infrastructure: Kubernetes clusters with AMD EPYC processors, NVMe storage, 100Gbps network fabric.
Domain-Specific Performance
ArcaQ has been validated across five enterprise verticals, each with different data volumes, compliance constraints, and reasoning complexity. The Knowledge Graph architecture adapts to domain ontologies while maintaining consistent latency and accuracy profiles.
Banking & Finance
Risk analysis queries over 50M+ financial entities resolved in under 60ms P95. GDPR and NDMO compliance layers add <8ms overhead. Fraud-detection pattern matching across 3-hop graph traversals achieves 99.97% precision; sanctioned-entity screening recorded zero false positives.
Government & Public Sector
Deployed on air-gapped infrastructure for critical national decision support. Sovereign clusters in offline mode sustain <100ms median latency with zero external API calls. Supports Arabic, French, and Tamazight (Tifinagh) natively with no translation overhead. Full audit trail for every decision as required by regulatory mandates.
Healthcare & Life Sciences
Clinical knowledge graphs with 20M+ medical entity relationships. PII anonymisation pipeline processes 10,000 patient records per second with ≈0% re-identification risk. Drug interaction reasoning across 4-hop traversal completes in 73ms P95. HIPAA and CNDP compliance validated with zero data residency violations.
Legal & Compliance
Contract analysis ingests 500-page documents in under 12 seconds. Regulatory conflict arbitration across 60+ jurisdictions resolves ambiguity in <200ms using the SCAG multi-layer filter. Precedent retrieval achieves a 98.4% relevance score versus 71% for traditional keyword search, with complete citation traceability to the source paragraph.
Industrial & Manufacturing
Predictive maintenance knowledge graphs correlating 1,000+ sensor streams per asset. Anomaly detection latency under 35ms enables real-time intervention. Cross-plant knowledge transfer achieves 94% accuracy on new facility deployments without re-training, leveraging the institutional expertise preservation module of the Refinery Agent.
Seven-Agent Architecture — Per-Agent Metrics
ArcaQ's seven specialized AI agents each carry specific performance contracts. Agents operate in parallel on a shared Knowledge Graph, allowing compound queries to resolve faster than the sum of their individual latencies. The Orchestrator Agent coordinates sub-second fan-out and merge cycles.
Measured on an 8-node Kubernetes cluster (AMD EPYC 7763, 256GB RAM, NVMe). DMS throughput varies with document size and extraction complexity.
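The parallel fan-out/merge behavior described above — wall-clock time tracking the slowest agent rather than the sum of all agents — can be illustrated with a minimal concurrency sketch. The agent names and latencies below are invented for illustration; they do not reflect the actual Orchestrator implementation.

```python
# Hedged sketch of orchestrator fan-out/merge: sub-queries are dispatched
# to agents concurrently, so total latency approximates the slowest agent
# (0.08s here), not the sequential sum (0.16s).
import asyncio
import time

async def agent(name, latency_s):
    await asyncio.sleep(latency_s)   # stands in for real agent work
    return f"{name}:ok"

async def orchestrate():
    tasks = [agent("retriever", 0.05), agent("reasoner", 0.08), agent("compliance", 0.03)]
    return await asyncio.gather(*tasks)   # fan-out, then merge

start = time.perf_counter()
results = asyncio.run(orchestrate())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```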
SCAG Security Overhead: Zero-Compromise Performance
The Sovereign Contextual Alignment Gate (SCAG, Patent Claim 11) is a 4-layer security filter applied to every query. A key design objective was that security must not degrade user experience. The SCAG pipeline is fully parallelized — all four layers (Legal, Hierarchical, Cultural, Strategic Secrets) execute concurrently, not sequentially.
How SCAG Achieves Sub-10ms Security
Pre-compiled jurisdiction rules are stored as in-memory Bloom filters. GDPR, CNDP, NDMO, and 57 other data-protection laws are checked with bit operations, not database queries.
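The Bloom-filter idea can be sketched as follows. This is a toy illustration, not the production filter: the hash construction, bit-array size, and rule-key format are all invented. A Bloom filter never misses a compiled rule (no false negatives), so a negative answer skips the database entirely; a positive answer escalates to the exact rule table.

```python
# Toy Bloom filter for prohibited (jurisdiction, data-category) rules.
# Rules are hashed into a bit array at compile time; query-time checks
# are a few hashes plus bit tests, with no database round-trip.
import hashlib

M = 1 << 16   # bit-array size (illustrative)
K = 3         # number of hash probes (illustrative)

def _probes(key: str):
    for i in range(K):
        h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
        yield int.from_bytes(h, "big") % M

def compile_rules(prohibited):
    bits = 0
    for rule in prohibited:
        for p in _probes(rule):
            bits |= 1 << p
    return bits

def maybe_prohibited(bits, rule):
    # False => definitely not a compiled rule (fast allow);
    # True  => possible hit, escalate to the exact rule table.
    return all(bits >> p & 1 for p in _probes(rule))

bits = compile_rules({"GDPR:pii-export", "CNDP:health-transfer"})
print(maybe_prohibited(bits, "GDPR:pii-export"))
```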
ReBAC (Relationship-Based Access Control) graph lookups are cached at the L1 CPU cache level. Role hierarchies are pre-materialized into adjacency matrices for O(1) authorization checks.
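Pre-materialization means the transitive closure of the role hierarchy is computed once at startup, so each authorization check reduces to a single table lookup. The sketch below uses invented role names and Warshall's algorithm; it illustrates the technique, not the actual ReBAC engine.

```python
# Toy pre-materialized role authorization: build the transitive closure
# of the "inherits-from" hierarchy once, then answer every check with
# one matrix lookup (O(1) per check).
roles = ["analyst", "manager", "director", "admin"]
idx = {r: i for i, r in enumerate(roles)}
n = len(roles)

# Reflexive base matrix (a role can always act as itself)
closure = [[i == j for j in range(n)] for i in range(n)]

# Direct inheritance edges (illustrative)
for hi, lo in [("manager", "analyst"), ("director", "manager"), ("admin", "director")]:
    closure[idx[hi]][idx[lo]] = True

# Warshall's algorithm: materialize all indirect inheritance paths
for k in range(n):
    for i in range(n):
        if closure[i][k]:
            for j in range(n):
                closure[i][j] = closure[i][j] or closure[k][j]

def can_act_as(role, target):
    # O(1) authorization check against the pre-materialized matrix
    return closure[idx[role]][idx[target]]

print(can_act_as("admin", "analyst"))     # inherited through two hops
print(can_act_as("analyst", "manager"))   # no upward inheritance
```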
Institutional values are encoded as vector embeddings. Semantic alignment scoring against organization policy runs on a dedicated SIMD-accelerated microkernel, fully independent of KG traversal.
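Alignment scoring of this kind typically reduces to a cosine similarity between a query embedding and pre-computed policy embeddings, with a threshold deciding pass or flag. The vectors and threshold below are toy values chosen for illustration; the real embeddings and scoring kernel are not shown here.

```python
# Minimal cosine-similarity alignment check: queries scoring below the
# policy threshold are flagged for review instead of being answered.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

policy = [0.2, 0.9, 0.1]            # pre-computed policy embedding (toy)
aligned_query = [0.25, 0.85, 0.05]
off_policy_query = [0.9, -0.4, 0.3]

THRESHOLD = 0.8
for q in (aligned_query, off_policy_query):
    score = cosine(q, policy)
    print("pass" if score >= THRESHOLD else "flag", round(score, 3))
```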
The strategic-secrets classifier runs as a lightweight ONNX model (3M parameters). It detects sensitive strategic information patterns with 99.8% recall, triggering data masking or access denial before any content is returned.
ArcaQ vs. Alternative Architectures
The following comparison is based on benchmark data published by each vendor and on independent evaluations. Cloud-hosted AI platforms (Azure OpenAI, AWS Bedrock) aggregate user data and provide no data residency guarantees. General-purpose RAG tools lack the semantic reasoning layer required for deterministic enterprise decisions.
Benchmark FAQ
How is the 99.9% accuracy figure calculated?
Accuracy is measured using the RAGAS (Retrieval-Augmented Generation Assessment) framework on a 10,000-query benchmark dataset with human-annotated ground truth across five domains (finance, healthcare, legal, government, manufacturing). "Factual accuracy" is defined as the fraction of responses where every stated fact is traceable to a verified source node in the Knowledge Graph. The ≈0% hallucination rate reflects that ArcaQ's CAG architecture does not perform stochastic token generation — all outputs are grounded to explicit graph paths.
What infrastructure are benchmarks measured on?
Latency benchmarks use an 8-node Kubernetes cluster: AMD EPYC 7763 (64-core), 256GB ECC RAM, 4×NVMe 3.84TB in RAID-0, 100Gbps InfiniBand interconnect. This configuration is representative of a mid-tier sovereign deployment. Smaller single-server deployments (16-core, 64GB) achieve P95 latency under 180ms for standard queries. Cloud deployments on equivalent hardware show 15–25% higher latency due to network virtualization overhead.
How does performance scale with Knowledge Graph size?
ArcaQ uses a sharded graph database architecture. Latency scales sub-linearly with node count: a graph of 1M nodes has P95 latency of ~35ms; 10M nodes ~48ms; 100M nodes ~87ms. This is achieved via graph partitioning aligned to domain ontology boundaries, ensuring most queries remain within a single shard. Cross-shard queries (typically complex multi-domain reasoning) account for the P99 latency of 156ms.
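The sub-linear scaling described above follows from ontology-aligned partitioning: a query that touches a single domain stays inside one shard, and only multi-domain reasoning pays the cross-shard cost. The sketch below illustrates that routing decision with invented domain names and shard assignments.

```python
# Toy ontology-aligned shard routing: single-domain queries resolve
# within one shard; multi-domain queries trigger a cross-shard plan.
shard_of = {"finance": 0, "healthcare": 1, "legal": 2, "manufacturing": 3}

def route(query_domains):
    shards = {shard_of[d] for d in query_domains}
    plan = "single-shard" if len(shards) == 1 else "cross-shard"
    return plan, sorted(shards)

print(route({"finance"}))            # stays within shard 0
print(route({"finance", "legal"}))   # fans out to shards 0 and 2
```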
Can these benchmarks be independently verified?
Yes. ArcaQ's Proof-of-Concept program deploys a full sovereign instance on your infrastructure or in an isolated cloud tenancy. You provide your own benchmark dataset and run the evaluation framework autonomously. The POC environment includes the complete seven-agent stack, SCAG security layer, and your domain ontology. Typical POC completion time is 4–6 weeks from contract signature.
What is the minimum hardware for a production deployment?
Minimum viable production: 1 server with 16-core CPU, 64GB RAM, 1TB NVMe storage. This supports up to 500 concurrent users and graphs up to 5M nodes with P95 latency under 200ms. Recommended production entry: 2-node cluster (16-core × 2, 128GB RAM) for high availability. Full enterprise scale (10,000+ concurrent users, 100M+ nodes) requires the 8-node reference configuration above or equivalent cloud resources.
Run Your Own Benchmark
Validate ArcaQ performance with your own data in a proof-of-concept deployment.
Request POC