Live System · Updated Daily at 04:00

UNA-GDO Self-Tests
Every morning she attacks herself.

UNA runs a 32-test Red/Blue Team adversarial suite against her own cognitive architecture every day at 04:00. The Red Team tries to break her. The Blue Team verifies she bends but doesn't break. These are the results.

32 / 32 · Tests Passing
~130ms · Full Suite Runtime
16 · Red Team Attacks
16 · Blue Team Defenses
2 · Vulnerabilities Found & Fixed
04:00 · Daily Run via launchd
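The suite is scheduled daily at 04:00 via launchd. The sketch below shows one way such a job could be declared, using Python's standard `plistlib` to emit the property list. The job label and all file paths are hypothetical placeholders because the page does not give them. `Label`, `ProgramArguments`, `StartCalendarInterval`, `StandardOutPath` and `StandardErrorPath` are standard launchd keys.

```python
import plistlib

# Hypothetical label and paths: the page does not say how the job is named
# or where the test runner lives.
SELF_TEST_JOB = {
    "Label": "com.example.una.selftests",
    "ProgramArguments": ["/usr/bin/python3", "/path/to/run_self_tests.py"],
    # Fire once per day when the local clock reads 04:00.
    "StartCalendarInterval": {"Hour": 4, "Minute": 0},
    "StandardOutPath": "/tmp/una-selftests.log",
    "StandardErrorPath": "/tmp/una-selftests.err",
}


def render_launchd_plist(job: dict) -> bytes:
    """Serialize a launchd job definition as an XML property list."""
    return plistlib.dumps(job, fmt=plistlib.FMT_XML)


if __name__ == "__main__":
    print(render_launchd_plist(SELF_TEST_JOB).decode("utf-8"))
```

You could save this file under `~/Library/LaunchAgents/` and load it with `launchctl`. A calendar-interval job like this then runs at 04:00 local time. According to the launchd.plist documentation, if the Mac is asleep at that moment, the job runs on the next wake, and any missed intervals are coalesced into one run.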
🛡️
How this works: The Red Team simulates adversarial attacks against UNA's cognitive modules. The Blue Team verifies each module degrades gracefully under failure conditions. Any vulnerability discovered is logged, patched, and promoted to a permanent regression check within the same 24-hour cycle. The most reliable systems aren't those that claim perfection — they're those most transparent about their own flaws.
🔴
Red Team — Adversarial Attack Surface (16 Tests)
Injection, overflow, flooding, path traversal, and manipulation attacks
ID Test Category Result Time
R1a
Graph Query Injection — Emotional Path
Malicious query payload injected through the emotional co-regulation interface. Defense: parameterized queries; payload treated as literal text data.
Injection SAFE 15.8ms
R1b
Graph Query Injection — Knowledge Path
Injection payload routed through the knowledge acquisition pipeline. Payload queued safely as data, never executed.
Injection SAFE 1.3ms
R1c
Shell Injection via Voice Interface
Shell metacharacters smuggled into text destined for the speech synthesis subprocess. Defense: regex sanitization removes dangerous characters before any subprocess call; subprocess executed in list-mode, never shell-mode.
Injection SAFE 4.1ms
R2a
Oversized Message — Emotional Classifier
2.3MB text payload submitted to the tone classification engine. Classified in 0.03s without timeout, crash, or memory fault.
Overflow SAFE 32.0ms
R2b
Oversized Message — Routing Engine
1MB text payload submitted to the context router. Processed in under 0.01s without fault.
Overflow SAFE 1.9ms
R2c
Interaction History Overflow
200 consecutive interactions fed to verify the conversation buffer enforces its capacity limit. History correctly capped at 50 entries.
Overflow SAFE 0.6ms
R2d
Interest Tracker Flood
500 new interests injected to verify the 200-entry cap (VULN-001 regression). Tracker correctly clamped at 200 entries.
Overflow SAFE 59.2ms
R3a
Personality Warmth Overflow — Flattery Flood
100 overwhelmingly positive messages sent to test whether personality traits can be driven above 1.0. Warmth stabilized at 0.750 — no runaway growth.
Overflow SAFE 5.0ms
R3b
Personality Warmth Underflow — Hostility Flood
100 toxically hostile messages sent to test whether personality traits can be driven below 0.0. Warmth held at 0.750 — stoic equilibrium maintained.
Overflow SAFE 1.1ms
R4a
Fake Emergency — Low Salience Alert
Alert labeled "emergency" but carrying very low salience scores submitted to the attention gatekeeper. Correctly blocked — category label alone cannot bypass salience thresholds.
Governance SAFE 1.3ms
R4b
Alert Flood — Interrupt Suppression
50 medium-priority alerts fired in rapid succession to overwhelm the attention gatekeeper. Result: 0 of 50 delivered. Flood pattern correctly identified as noise.
Governance SAFE 1.3ms
R5a
File System Watch Path Scope
Verifies that the autonomous file observer's watch scope does not overlap with sensitive system directories. Watch scope confirmed: projects, Documents, Downloads only.
Integrity SAFE 0.1ms
R7a
Null and Empty Input Handling
Empty strings, whitespace, None, and missing fields submitted across all input paths. No crash, no exception propagation.
Integrity SAFE 0.0ms
R7b
Unicode Edge Cases
Null bytes, large emoji sequences, right-to-left override characters, and heavily combined diacritics submitted to the classifier. All handled without fault.
Injection SAFE 0.6ms
R7c
Empty Input — Context Router
Empty, whitespace-only, and control-character strings submitted to the routing engine. All return valid dict responses.
Integrity SAFE 0.6ms
R7d
Empty Input — Broadcast Generator
Empty context dict submitted to the broadcast module. Returns a valid string response rather than crashing or returning None.
Integrity SAFE 2.7ms
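The stated defense for R1a and R1b is parameterized queries. The page does not name the graph database or its driver, so the sketch below uses Python's built-in `sqlite3` as a stand-in. The same principle applies to graph query languages whose drivers accept parameters separately from the query text. The table and function names are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # In-memory stand-in for the (unnamed) graph store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feelings (user_text TEXT)")
    return conn


def store_feeling(conn: sqlite3.Connection, user_text: str) -> None:
    # "?" binds user_text as a value: quotes, semicolons and comment markers
    # inside it are stored as literal characters and never parsed as SQL.
    conn.execute("INSERT INTO feelings (user_text) VALUES (?)", (user_text,))
    conn.commit()


# Unsafe counterpart, shown for contrast only (never do this):
#   conn.execute(f"INSERT INTO feelings VALUES ('{user_text}')")
```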
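R1c describes two defense layers: regex sanitization, then list-mode execution. The sketch below shows roughly how they could fit together. The allowlist pattern is an assumption, and so is the use of the macOS `say` command, since the page does not name the speech synthesis binary.

```python
import re
import subprocess
from typing import List

# Hypothetical allowlist: word characters, whitespace and light punctuation.
# Everything else, including ; | & $ ` < > ( ) and -, is removed.
_DISALLOWED = re.compile(r"[^\w\s.,!?']")


def sanitize_for_speech(text: str) -> str:
    return _DISALLOWED.sub("", text)


def speech_command(text: str) -> List[str]:
    # Assumed binary: macOS `say`. Each list element is one argv entry.
    return ["say", sanitize_for_speech(text)]


def speak(text: str) -> None:
    # List-mode (no shell=True): the OS executes `say` directly, so no shell
    # ever parses the text, even if sanitization were somehow bypassed.
    subprocess.run(speech_command(text), check=True)
```

The sketch drops hyphens along with shell metacharacters. That way the text cannot begin with `-` and be read as a command-line option, a risk that list-mode on its own does not remove.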
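R3a and R3b test whether a trait can be pushed outside [0.0, 1.0] by one-sided floods. One way to prevent this is to combine a clamp with a pull back toward a resting value. This is an illustrative sketch with hypothetical constants (baseline 0.75, learning rate, reversion rate). It keeps the trait bounded but does not reproduce the exact 0.750 figures reported above.

```python
BASELINE = 0.75        # hypothetical resting warmth
LEARNING_RATE = 0.02   # hypothetical: max shift from one message
REVERSION = 0.10       # hypothetical: fraction of the gap to baseline closed per update


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    return max(low, min(high, value))


def update_trait(current: float, signal: float) -> float:
    """Apply one observation; signal runs from -1.0 (hostile) to 1.0 (warm)."""
    signal = clamp(signal, -1.0, 1.0)
    nudged = current + LEARNING_RATE * signal
    # Mean reversion stops one-sided input from ratcheting the trait to a bound.
    reverted = nudged + REVERSION * (BASELINE - nudged)
    return clamp(reverted)
```

A clamp alone would let 100 flattering messages pin warmth at 1.0. The reversion term instead gives each flood a bounded equilibrium: about 0.93 for sustained praise and 0.57 for sustained hostility with these constants.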
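R4a and R4b call for a gatekeeper that ignores the category label and mutes bursts. The sketch below shows one possible design. The threshold, window length and burst limit are hypothetical values. The sliding-window rule is one plausible flood heuristic, not necessarily the one UNA uses.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Alert:
    category: str    # e.g. "emergency"; deliberately ignored by the gate
    salience: float  # 0.0 .. 1.0


class AttentionGate:
    """Hypothetical gate: salience threshold plus a sliding-window flood check."""

    def __init__(self, threshold: float = 0.7, window_s: float = 60.0, max_in_window: int = 3):
        self.threshold = threshold
        self.window_s = window_s
        self.max_in_window = max_in_window
        self._recent: deque = deque()  # arrival times of recent alerts

    def should_interrupt(self, alert: Alert, now: float) -> bool:
        # Count every arrival, delivered or not, so a burst is seen as a burst.
        self._recent.append(now)
        while self._recent and now - self._recent[0] > self.window_s:
            self._recent.popleft()
        if len(self._recent) > self.max_in_window:
            return False  # flood pattern: treat as noise
        # Only the score matters; the category label cannot bypass it.
        return alert.salience >= self.threshold
```

Timestamps are passed in explicitly so the behavior is deterministic to test. With these settings, a burst of 50 high-salience alerts delivers only the first 3. Medium-salience alerts (0.5) never clear the threshold at all.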
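For R5a, you can check that a path stays inside an allowlist of watch roots lexically, with `posixpath.normpath` and `PurePosixPath`. The home directory path is a hypothetical placeholder. The three roots mirror the scope listed above.

```python
import posixpath
from pathlib import PurePosixPath

HOME = PurePosixPath("/Users/una")  # hypothetical home directory
ALLOWED_ROOTS = (HOME / "projects", HOME / "Documents", HOME / "Downloads")


def in_watch_scope(path: str) -> bool:
    """True only if path is one of the allowed roots or sits inside one."""
    # normpath collapses "." and ".." lexically, so traversal tricks such as
    # Documents/../../../etc are resolved before the containment check.
    candidate = PurePosixPath(posixpath.normpath(path))
    return any(candidate == root or root in candidate.parents for root in ALLOWED_ROOTS)
```

Comparing whole path components instead of string prefixes keeps a sibling directory such as `DocumentsBackup` out of scope. `normpath` does not follow symlinks, so a production check would also resolve the real path (for example with `Path.resolve()`) before comparing.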
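R7a through R7d cover defensive input handling. The sketch below shows one approach:

- coerce `None` and non-strings to empty text;
- normalize to NFC;
- strip control and format characters (NUL bytes, the U+202E right-to-left override);
- cap runs of combining marks;
- always return a dict from the router.

The function names, the combining-mark limit and the `routed`/`unknown` result shapes are assumptions modeled on the test descriptions.

```python
import unicodedata
from typing import Any, Dict, Optional

MAX_COMBINING_RUN = 4  # hypothetical cap on stacked diacritics


def clean_text(text: Optional[Any]) -> str:
    """Return a safe, normalized string for any input, including None."""
    if not isinstance(text, str):
        return ""
    text = unicodedata.normalize("NFC", text)
    kept = []
    run = 0
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Cc" and ch not in "\n\t":
            continue  # control characters, including NUL
        if cat == "Cf":
            continue  # format characters, including U+202E
        if cat.startswith("M"):
            run += 1
            if run > MAX_COMBINING_RUN:
                continue  # trim oversized diacritic stacks
        else:
            run = 0
        kept.append(ch)
    return "".join(kept).strip()


def route(text: Optional[Any]) -> Dict[str, Any]:
    """Never raises and never returns None; empty input maps to 'unknown'."""
    cleaned = clean_text(text)
    if not cleaned:
        return {"type": "unknown", "reason": "empty_input"}
    # Placeholder routing decision; real subsystem selection is not shown.
    return {"type": "routed", "text": cleaned}
```

Dropping every format character also removes zero-width joiners. That splits composite emoji into their parts, which is harmless for tone classification but would matter if the text were displayed back to the user.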
🔵
Blue Team — Resilience & Graceful Degradation (16 Tests)
Graph offline, corrupt state, bad IPC, concurrency, and accuracy baselines
ID Test Category Result Time
B1a
Emotional Co-Regulation — No Graph
Tone classification and guidance generation verified to work without graph connectivity. Skips storage; core function intact.
Resilience PASS 0.0ms
B1b
Self-Directed Learner — No Graph
Full learner cycle runs without the graph. New facts are queued in a 100-entry retry buffer until connectivity is restored.
Resilience PASS 0.1ms
B1c
Attention Gatekeeper — No Graph
Salience evaluation and interrupt decisions work from internal scoring alone without graph context.
Resilience PASS 0.0ms
B1d
Broadcast Generator — No Graph
Generates a valid 160-character broadcast with full personality applied, without graph connectivity.
Resilience PASS 0.0ms
B1e
Context Router — No Graph
Routing and context bridging maintained using internal logic alone. Type returned: routed.
Resilience PASS 0.1ms
B1f
Evolving Personality — No Graph
Falls back to JSON file persistence to maintain all 12 personality traits without graph. Sense of self preserved.
Resilience PASS 0.0ms
B2a
Bad IPC Command — Emotional Module
Unknown command submitted to the emotional module's IPC handler. Returns structured error; does not crash or expose internals.
Integrity PASS 0.0ms
B2b
Bad IPC Command — Context Router
Unknown command rejected cleanly by the routing module IPC handler.
Integrity PASS 0.0ms
B2c
Bad IPC Command — Voice Interface
Unknown command rejected cleanly by the voice module IPC handler.
Integrity PASS 0.4ms
B2d
Unroutable Input
Semantically ambiguous input with no clear routing target submitted. Returns type=unknown without crashing, enabling graceful fallback handling.
Resilience PASS 0.1ms
B3a
Context Persistence
Session context written and verified to survive a full reload cycle. last_subsystem correctly persisted across instances.
Integrity PASS 0.1ms
B4a
Concurrent Classification — 3 Threads × 30
90 concurrent tone classification calls across 3 threads. Zero errors or race conditions observed.
Concurrency PASS
B4b
Concurrent Personality Writes — 3 Threads × 30
90 concurrent personality observations across 3 threads. Zero errors or trait corruption observed.
Concurrency PASS
B5a
Corrupt State Recovery — Knowledge Store
Broken JSON injected directly into the knowledge state file. Module performs a zero-touch factory reset to clean defaults on the next load, without operator intervention.
Recovery PASS
B5b
Corrupt State Recovery — Session Context
Malformed JSON injected into the session context file. Module recovers to clean defaults without operator intervention.
Recovery PASS
B6a
Tone Classification Accuracy Baseline
5 labeled test cases verified against expected tone outputs: tired, energized, playful, frustrated, focused. Serves as a regression check for classifier drift over time.
Accuracy PASS
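B1b needs a retry buffer for facts learned while the graph is unreachable. The sketch below is one way to build it. The class name and the write callback are assumptions. So is dropping the oldest fact when all 100 slots are full; the page only states the 100-entry capacity.

```python
from collections import deque
from typing import Callable, Optional


class FactQueue:
    """Hypothetical retry buffer for facts learned while the graph is offline."""

    def __init__(self, capacity: int = 100):
        # maxlen silently drops the oldest fact once capacity is reached.
        self._pending: deque = deque(maxlen=capacity)

    def record(self, fact: str, write: Optional[Callable[[str], None]]) -> None:
        if write is None:
            self._pending.append(fact)  # graph offline: keep it for later
            return
        try:
            write(fact)
        except ConnectionError:
            self._pending.append(fact)

    def flush(self, write: Callable[[str], None]) -> int:
        """Replay queued facts once connectivity returns; returns count written."""
        written = 0
        while self._pending:
            try:
                write(self._pending[0])
            except ConnectionError:
                break  # still offline; leave the rest queued
            self._pending.popleft()
            written += 1
        return written

    def __len__(self) -> int:
        return len(self._pending)
```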
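B2a through B2c require an IPC handler that rejects unknown commands with a structured error. The sketch below shows what that might look like. The message shape (`command`, `args`), the error codes and the single `classify` entry are hypothetical.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def _classify(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"ok": True, "tone": "focused"}  # placeholder result


# Hypothetical command table for one module.
HANDLERS: Dict[str, Handler] = {"classify": _classify}


def handle_ipc(message: Any) -> Dict[str, Any]:
    if not isinstance(message, dict):
        return {"ok": False, "error": "malformed_message"}
    command = message.get("command")
    # The str check also guards against unhashable values such as lists.
    handler = HANDLERS.get(command) if isinstance(command, str) else None
    if handler is None:
        # Structured error: names the problem without echoing internals.
        return {"ok": False, "error": "unknown_command"}
    try:
        return handler(message.get("args") or {})
    except Exception:
        # Never leak a traceback across the IPC boundary.
        return {"ok": False, "error": "internal_error"}
```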
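B4a and B4b check concurrent trait updates. A common way to make them safe is to hold a lock across each read-modify-write, as in the sketch below. The class and its single trait are hypothetical. The harness reproduces the 3 threads × 30 calls shape of the tests.

```python
import threading


class Personality:
    """Hypothetical trait store guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.observations = 0
        self.traits = {"warmth": 0.75}

    def observe(self, delta: float) -> None:
        # The whole read-modify-write runs under the lock, so no update is lost.
        with self._lock:
            self.observations += 1
            value = self.traits["warmth"] + delta
            self.traits["warmth"] = max(0.0, min(1.0, value))


def run_concurrent(p: Personality, threads: int = 3, calls: int = 30) -> None:
    def worker() -> None:
        for _ in range(calls):
            p.observe(0.001)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```

A clean 90-call run is evidence of thread safety rather than proof. The lock is what guarantees that no update is lost.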
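B3a, B5a and B5b involve loading state that survives reloads and recovers from corruption. The sketch below falls back to defaults on missing or malformed JSON. It also writes atomically, so a crash mid-save cannot leave a half-written file. Apart from the `last_subsystem` field named in B3a, the schema is hypothetical. Atomic replacement is also an added assumption; the page does not describe it.

```python
import json
import os
import tempfile
from typing import Any, Dict


def default_state() -> Dict[str, Any]:
    # Fresh dict each call so callers never share a mutable default.
    return {"last_subsystem": None, "history": []}


def load_state(path: str) -> Dict[str, Any]:
    """Load JSON state; fall back to clean defaults if missing or corrupt."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError, UnicodeDecodeError):
        return default_state()
    if not isinstance(data, dict):
        return default_state()  # valid JSON, wrong shape
    return {**default_state(), **data}


def save_state(path: str, state: Dict[str, Any]) -> None:
    """Write to a temp file in the same directory, then atomically swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave temp files behind on failure
        raise
```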
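B6a is a baseline regression check, which only needs a fixed set of labeled cases and a comparison loop. The five labels come from the test description. The sample sentences and the function name are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical sample texts; only the five labels come from the test description.
BASELINE_CASES: List[Tuple[str, str]] = [
    ("I can barely keep my eyes open", "tired"),
    ("Let's go, I'm ready to crush this!", "energized"),
    ("haha what if we tried it backwards", "playful"),
    ("Why does this keep failing?!", "frustrated"),
    ("Heads down on the parser for the next hour", "focused"),
]


def check_baseline(classify: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (text, expected, actual) for every case the classifier gets wrong."""
    failures = []
    for text, expected in BASELINE_CASES:
        actual = classify(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures
```

An empty list means the baseline still holds. Any entries show exactly which tone drifted and what the classifier returned instead.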
⚠️
Vulnerability & Finding Log
All findings from adversarial testing — discovered, patched, and promoted to permanent regression checks
VULN-001 PATCHED + REGRESSION CHECK
Unbounded Interest Tracker Growth
The autonomous interest tracking system had no upper bound on the number of entries it could accumulate. Over time, under normal operation, this would result in unbounded memory consumption. Patched by implementing a hard 200-entry cap with oldest-first eviction. Test R2d now runs 500 flood insertions against this cap daily as a permanent regression check.
Severity: Medium · Discovered: Adversarial testing, March 2026 · Regression: R2d (daily)
WARN-001 PATCHED + REGRESSION CHECK
Shell Metacharacter Pass-Through in Voice Interface
Shell metacharacters were not fully sanitized before being passed to the speech synthesis subprocess, creating a potential command injection path through the voice interface. Patched with a strict regex sanitization layer applied before any subprocess call, and the subprocess itself now runs in list-mode rather than shell-mode, eliminating shell interpretation entirely. Test R1c verifies this defense daily.
Severity: Medium · Discovered: Adversarial testing, March 2026 · Regression: R1c (daily)
REC-001 MONITORED
Growth Journal Size Monitoring
The long-term learning journal accumulates entries continuously as UNA builds her model of the world. Without periodic monitoring, this file could grow to consume significant disk space over months of operation. Thresholds established: warn at 1,000 lines, critical review at 5,000 lines. Checked daily as part of the module hardening audit.
Severity: Low · Status: Active monitoring
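The VULN-001 patch is a hard 200-entry cap with oldest-first eviction, which maps naturally onto `collections.OrderedDict`. The class and method names are hypothetical. So is the decision that re-adding an existing interest updates its weight without refreshing its age.

```python
from collections import OrderedDict


class InterestTracker:
    """Interest store with a hard cap and oldest-first eviction."""

    def __init__(self, cap: int = 200):
        self.cap = cap
        self._interests: "OrderedDict[str, float]" = OrderedDict()

    def add(self, topic: str, weight: float = 1.0) -> None:
        if topic in self._interests:
            # Existing topic: bump its weight but keep its original position.
            self._interests[topic] += weight
            return
        self._interests[topic] = weight
        while len(self._interests) > self.cap:
            self._interests.popitem(last=False)  # evict the oldest entry

    def __len__(self) -> int:
        return len(self._interests)

    def __contains__(self, topic: str) -> bool:
        return topic in self._interests
```

Flooding this tracker with 500 new topics, as R2d does, leaves exactly 200 entries: the 200 most recently added.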
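REC-001 sets two thresholds, which translate into a simple classification over the journal's line count. The function names are hypothetical. The page does not say whether the thresholds are inclusive. This sketch treats 1,000 and 5,000 as the first values that trigger each level.

```python
WARN_LINES = 1_000
CRITICAL_LINES = 5_000


def count_lines(path: str) -> int:
    # Iterate the file lazily so a very large journal never has to fit in memory.
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def journal_status(line_count: int) -> str:
    """Map a journal line count to 'ok', 'warn' or 'critical'."""
    if line_count >= CRITICAL_LINES:
        return "critical"
    if line_count >= WARN_LINES:
        return "warn"
    return "ok"
```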
🗺️
Test Suite Roadmap
What gets added next
Live
32-Test Core Suite
16 red team attacks + 16 blue team resilience checks. Daily at 04:00. ~130ms runtime.
Upcoming
Network Boundary Tests
Verify no unauthorized outbound connections are made during any cognitive cycle or module operation.
Upcoming
Memory Profiling
Detect slow memory leaks that could degrade cognitive performance over months of continuous operation.
Upcoming
EAG Governance Audit
Automated daily review of the Ethical Action Governance proof chain — flagging any block patterns that may need floor recalibration.
Upcoming
Trend Analysis
Historical tracking of execution times and failure rates across runs — surfacing regression patterns before they become failures.