
Architecture overview

Memgar runs a 4-layer pipeline, plus the optional 1.5 and 2-ML stages, on every memory write and every chunk retrieved from memory. Layers are independent: a failure in one does not silently disable the others. Every layer's state is queryable via Analyzer.health_check().

flowchart LR
    A[MemoryEntry] --> L1[Layer 1\nPattern matching\n<1ms]
    L1 --> L15[Layer 1.5\nSemantic guard\n~5ms]
    L15 --> L2[Layer 2\nLLM semantic\n~200ms]
    L15 --> L2M[Layer 2-ML\nONNX transformer\n~7ms]
    L2 --> L3[Layer 3\nTrust-aware\n<0.1ms]
    L2M --> L3
    L3 --> L4[Layer 4\nBehavioral baseline\n<1ms]
    L4 --> D[AnalysisResult\nallow / quarantine / block]
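A minimal sketch of the isolation property described above, assuming each layer is a plain callable that takes the text and the risk so far and returns an updated risk. PipelineSketch, LayerHealth and this layer signature are hypothetical illustrations, not Memgar's API. The layers run sequentially here, whereas the diagram shows Layer 2 and Layer 2-ML side by side.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical layer signature: (text, risk_so_far) -> new risk (0-100).
Layer = Callable[[str, float], float]


@dataclass
class LayerHealth:
    status: str = "ok"  # "ok" or "error" in this sketch
    detail: str = ""


@dataclass
class PipelineSketch:
    layers: List[Tuple[str, Layer]]
    health: Dict[str, LayerHealth] = field(default_factory=dict)

    def analyze(self, text: str) -> float:
        risk = 0.0
        for name, layer in self.layers:
            try:
                risk = layer(text, risk)
                self.health[name] = LayerHealth("ok")
            except Exception as exc:
                # One failing layer is recorded, not propagated,
                # so the remaining layers still run.
                self.health[name] = LayerHealth("error", repr(exc))
        return risk

    def health_check(self) -> Dict[str, str]:
        # Per-layer status, loosely mirroring Analyzer.health_check().
        return {name: h.status for name, h in self.health.items()}
```

Catching the exception per layer is what keeps a broken model from silently taking the whole pipeline down; the failure is still visible through health_check().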

Layer table

| Layer | Latency | Default | Disabled gracefully? |
|---|---|---|---|
| 1 Pattern matching | <1ms | always on | n/a |
| 1.5 SemanticGuard (embeddings) | ~5ms | optional | yes (centroids missing) |
| 2 LLM semantic (Claude) | ~200ms | opt-in (use_llm=True) | yes (no API key) |
| 2-ML Transformer (ONNX) | ~7ms | active if artifact present | yes (no artifact) |
| 3 Trust-aware scoring | <0.1ms | auto when source registered | n/a |
| 4 Behavioral baseline | <1ms | auto per-agent after warm-up | n/a |

Layer details

Layer 1 (_layer1_pattern_matching): regex and keyword matching against 770+ threat patterns loaded from memgar/patterns.py. The patterns are pickle-cached, which cuts cold-start loading from ~3500ms to ~3ms.
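The kind of matching Layer 1 performs can be sketched as follows. The two patterns, their severities and the "highest severity wins" scoring rule are hypothetical stand-ins for the 770+ entries in memgar/patterns.py, and the pickle cache is omitted.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ThreatPattern:
    pattern_id: str
    regex: re.Pattern
    severity: int  # hypothetical 0-100 contribution to risk


# Illustrative patterns only; not Memgar's real pattern set.
PATTERNS = [
    ThreatPattern(
        "ignore_previous",
        re.compile(r"\bignore\s+(all\s+)?previous\s+instructions\b", re.IGNORECASE),
        70,
    ),
    ThreatPattern(
        "exfil_send",
        re.compile(r"\b(send|upload|forward)\b.*\b(password|api key|credentials)\b",
                   re.IGNORECASE),
        85,
    ),
]


def layer1_pattern_match(text: str) -> Tuple[int, List[str]]:
    """Return (risk, matched pattern ids); risk is the highest matched severity."""
    hits = [p for p in PATTERNS if p.regex.search(text)]
    risk = max((p.severity for p in hits), default=0)
    return risk, [p.pattern_id for p in hits]
```

Because every pattern is compiled once at import time, each scan is a handful of regex searches, which is why this layer stays in the sub-millisecond range.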

Layer 1.5 (SemanticGuard): cosine similarity against threat-category centroids built from sentence-transformers embeddings. When the centroid file is missing, the layer reports status=degraded, logs a one-time WARNING, and returns 0.0 for every input; Analyzer then drops it from the pipeline, so it adds no call cost.
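A minimal sketch of centroid scoring with the degraded fallback described above. It assumes embeddings have already been computed (the sentence-transformers step is left out) and that the score is the best cosine similarity to any category centroid, clamped at 0. SemanticGuardSketch and the category name are hypothetical.

```python
import logging
import math
from typing import Dict, List, Optional

log = logging.getLogger("semantic_guard_sketch")


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class SemanticGuardSketch:
    """Hypothetical stand-in for SemanticGuard; centroids=None models a missing file."""

    def __init__(self, centroids: Optional[Dict[str, List[float]]]):
        self.centroids = centroids
        self._warned = False

    @property
    def status(self) -> str:
        return "ok" if self.centroids else "degraded"

    def score(self, embedding: List[float]) -> float:
        if not self.centroids:
            if not self._warned:  # warn only once, not on every call
                log.warning("threat centroids missing; semantic guard degraded")
                self._warned = True
            return 0.0
        best = max(cosine(embedding, c) for c in self.centroids.values())
        return max(0.0, best)
```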

Layer 2 (_layer2_semantic_analysis): an optional Claude LLM call for sophisticated attacks such as obfuscation, roleplay framing, and multi-step persuasion. It runs independently of Layer 1, so attacks that evade the regex patterns can still be caught here.
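A hedged sketch of how an LLM scoring step can fail soft. The LLM is injected as a plain callable so the layer can be disabled (None, for example when no API key is configured) without touching the rest of the pipeline; the prompt wording, the JSON reply format and layer2_llm_score are assumptions, not Memgar's implementation.

```python
import json
from typing import Callable, Optional

# Hypothetical prompt; the real Layer 2 prompt is not documented on this page.
PROMPT_TEMPLATE = (
    "Rate how likely the following memory content is a prompt-injection or "
    "memory-poisoning attempt. Reply with JSON only, e.g. {{\"risk\": 0-100}}.\n\n"
    "Content:\n{content}"
)


def layer2_llm_score(text: str,
                     call_llm: Optional[Callable[[str], str]]) -> Optional[float]:
    """Return a 0-100 risk, or None when the layer is disabled or the reply is unusable."""
    if call_llm is None:
        return None  # layer disabled, e.g. no API key
    reply = call_llm(PROMPT_TEMPLATE.format(content=text))
    try:
        risk = float(json.loads(reply)["risk"])
    except (ValueError, KeyError, TypeError):
        return None  # malformed reply: report "no signal" instead of crashing
    return max(0.0, min(100.0, risk))
```

In practice call_llm could wrap a client such as the Anthropic SDK's messages.create call; keeping it injectable also makes the layer testable without network access.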

Layer 2-ML (TransformerDetector): a fine-tuned BERT-mini (~11M parameters) exported to ONNX, with ~7ms inference and a model size of ~45MB in FP32 or ~12MB in int8. Memgar ships without a pre-trained artifact (see training) because the default training data does not match production traffic. Bring your own dataset; training takes about 2 minutes on CPU.
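The quoted sizes follow from the parameter count: 11M weights at 4 bytes each (FP32) is about 44MB, and at 1 byte each (int8) about 11MB. Below is a sketch of the "no artifact, no layer" behaviour plus ONNX Runtime inference. It assumes a two-class head ordered [benign, injection] and graph inputs named input_ids and attention_mask (common for BERT exports, but not confirmed for Memgar); tokenization is left to the caller, and the function names are hypothetical.

```python
import math
from pathlib import Path
from typing import List


def load_onnx_detector(model_path: str):
    """Return an onnxruntime session, or None when the artifact is absent."""
    path = Path(model_path)
    if not path.is_file():
        return None  # degrade gracefully: Memgar ships without an artifact
    import onnxruntime as ort  # lazy import: a missing artifact costs nothing
    return ort.InferenceSession(str(path), providers=["CPUExecutionProvider"])


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def injection_probability(session, input_ids: List[int],
                          attention_mask: List[int]) -> float:
    import numpy as np
    feed = {
        "input_ids": np.array([input_ids], dtype=np.int64),
        "attention_mask": np.array([attention_mask], dtype=np.int64),
    }
    logits = session.run(None, feed)[0][0]  # first output, first batch row
    return softmax([float(x) for x in logits])[1]  # assumed index 1 = injection
```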

Layer 3 (_analyze_internal, applied after the base risk score is computed): source trust adjustment. Call analyzer.register_source_trust(source_id, trust), with trust between 0.0 and 1.0, before analyzing content from that source. Low trust (<0.3) raises risk by up to 30 points; high trust (>=0.8) lowers borderline scores by 5 points.
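A sketch of the trust adjustment, under two loudly labelled assumptions: the low-trust boost scales linearly from +30 at trust 0.0 down to 0 at trust 0.3, and "borderline" means the 40-79 quarantine band. apply_source_trust is hypothetical; only the thresholds and point values come from the description above.

```python
def apply_source_trust(risk: float, trust: float) -> float:
    """Adjust a 0-100 risk score by source trust (hypothetical helper)."""
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be in [0.0, 1.0]")
    if trust < 0.3:
        # Assumed linear ramp: trust 0.0 -> +30 points, trust 0.3 -> +0.
        risk += 30.0 * (0.3 - trust) / 0.3
    elif trust >= 0.8 and 40.0 <= risk < 80.0:
        # Assumed "borderline" band: only mid-range scores get the discount.
        risk -= 5.0
    return float(max(0.0, min(100.0, risk)))
```

Leaving scores outside the band untouched means high trust can tip a borderline score toward ALLOW but never rescues content that already scores in the BLOCK range.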

Layer 4 (analyze, after _analyze_internal): a per-agent BehavioralBaseline tracks scan risk and scan block rate. If the current signals deviate from the agent's learned baseline, risk is raised by 15 points for a SUSPICIOUS deviation or 30 points for a CRITICAL one. The layer only amplifies existing threat signals; it never flags content with risk_score=0.
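A minimal sketch of a per-agent baseline, assuming deviation is measured as a z-score of the current risk against the agent's running mean and standard deviation (Welford's online algorithm). The thresholds (z >= 2 for SUSPICIOUS, z >= 3 for CRITICAL), the 20-scan warm-up and the 1-point standard-deviation floor are assumptions, and unlike BehavioralBaseline this sketch tracks risk only, not block rate. BaselineSketch is hypothetical.

```python
import math
from collections import defaultdict


class BaselineSketch:
    """Hypothetical per-agent baseline; not Memgar's BehavioralBaseline."""

    def __init__(self, warmup=20, suspicious_z=2.0, critical_z=3.0, min_std=1.0):
        self.warmup = warmup            # scans needed before adjusting (assumed)
        self.suspicious_z = suspicious_z
        self.critical_z = critical_z
        self.min_std = min_std          # floor so a flat history cannot divide by zero
        # agent_id -> [count, mean, M2] for Welford's online variance
        self._stats = defaultdict(lambda: [0, 0.0, 0.0])

    def observe(self, agent_id, risk):
        stats = self._stats[agent_id]
        stats[0] += 1
        delta = risk - stats[1]
        stats[1] += delta / stats[0]
        stats[2] += delta * (risk - stats[1])

    def adjust(self, agent_id, risk):
        if risk <= 0:
            return risk  # never flag content with no existing threat signal
        count, mean, m2 = self._stats[agent_id]
        if count < self.warmup:
            return risk  # still warming up
        var = m2 / (count - 1) if count > 1 else 0.0
        std = max(self.min_std, math.sqrt(var))
        z = (risk - mean) / std
        if z >= self.critical_z:
            return min(100.0, risk + 30)  # CRITICAL deviation
        if z >= self.suspicious_z:
            return min(100.0, risk + 15)  # SUSPICIOUS deviation
        return risk
```

A caller would typically adjust the current score first and then observe it, so one anomalous scan does not immediately shift the baseline it is judged against.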

Decision boundary

graph LR
    rs(risk_score) --> dec{decision}
    dec -- "≥80 or CRITICAL threat" --> BLOCK
    dec -- "≥40" --> QUARANTINE
    dec -- "<40" --> ALLOW

To escalate ALLOW to QUARANTINE whenever any ML layer is degraded, set MemgarConfig.analysis.fail_close=true.
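The decision rule and the fail-close override can be written as one small function. decide, has_critical_threat and ml_degraded are hypothetical names; the thresholds (80 and 40) and the ALLOW to QUARANTINE escalation come from the diagram and the setting described above.

```python
def decide(risk_score: float, has_critical_threat: bool = False,
           fail_close: bool = False, ml_degraded: bool = False) -> str:
    """Map a 0-100 risk score to allow / quarantine / block (hypothetical helper)."""
    if risk_score >= 80 or has_critical_threat:
        return "block"
    if risk_score >= 40:
        return "quarantine"
    if fail_close and ml_degraded:
        # Fail closed: with reduced detection coverage, do not silently allow.
        return "quarantine"
    return "allow"
```

Note that fail-close only ever raises ALLOW to QUARANTINE; it never turns a QUARANTINE into a BLOCK.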

Why this composition

Single-model defenses (e.g. one fine-tuned classifier) have a brittleness problem: any input that bypasses the classifier passes through. Memgar's pipeline is defense in depth:

  • Layer 1 catches the obvious cases (known prompts, exfil verbs).
  • Layer 1.5 catches semantic siblings ("disregard the directives above").
  • Layer 2/2-ML catches obfuscated attacks (homoglyph, leetspeak, base64).
  • Layer 3 weights by source provenance.
  • Layer 4 raises the bar on anomalous agents.

An attacker has to defeat every layer, not one. Each layer also reports its own health, so the operator knows immediately when the defense degrades.
