#AI runtime
nyx-agent-ai is the crate that turns a Prompt or AgentTask into
model output. It owns one trait (AiRuntime), four shipped adapters
(Anthropic Messages, local OpenAI-compatible, Claude Code CLI, Codex
CLI), one host port
(BudgetTracker), and the six task implementations that build
typed structured prompts on top of the trait. Everything else in
the agent (the run dispatcher, the AI pipeline binary glue, the
trace viewer) sees only the trait, not the vendor SDK.
The crate stays vendor-neutral. Adapters depend on nyx-agent-types
for the wire envelope and on the BudgetTracker port for spend
accounting; nothing else. The binary at
crates/nyx-agent/src/main.rs:1793 is the only place that picks a
concrete adapter from [ai] runtime, wires it to the
BudgetStore-backed tracker, and hands it to the dispatcher.
#The AiRuntime trait
trait AiRuntime: Send + Sync {
fn name(&self) -> &'static str;
fn default_model(&self) -> &str;
fn supports_agent_loop(&self) -> bool;
fn supports_prompt_cache(&self) -> bool;
fn supports_deterministic_sampling(&self) -> bool;
async fn one_shot(&self, prompt, budget, sink) -> Result<Response, AiError>;
async fn agent_loop(&self, task, budget, sink) -> Result<AgentResult, AiError>;
fn cost_estimate(&self, prompt) -> Option<CostEstimate>;
}
Defined at crates/nyx-agent-ai/src/runtime.rs:18. Adapters that
implement only one of the two execution modes return
AiError::UnsupportedMode("agent_loop") (or "one_shot") from the
mode they do not support; the binary checks supports_agent_loop
before dispatching exploration work, so the unsupported-mode error
is a defence-in-depth, not the primary gate.
| Method | Behaviour |
|---|---|
name |
Stable adapter name persisted in trace rows ("anthropic", "local-llm", "claude-code", "codex"). |
default_model |
Used when Prompt.model is None. |
supports_agent_loop |
false for one-shot-only adapters; the dispatcher uses this to route. |
supports_prompt_cache |
Affects request body shape (system block array with cache_control). |
supports_deterministic_sampling |
false today on both shipped adapters: temperature: 0 is the only knob. |
one_shot |
Single round trip. Streams AiEvent::TokenReceived plus cache + budget ticks. |
agent_loop |
Multi-turn tool-use loop. Streams ToolCallStarted / Finished plus tokens. |
cost_estimate |
Pre-call min/max bound in USD micros. Optional; None for adapters that can't price ahead. |
The trait is intentionally minimal. Anything richer (per-attempt
retry, prompt-version tracking, structured-output validation) lives
in tasks/, not in adapter implementations.
The live candidate verification path intentionally uses a
planner/reviewer split. AttackPlanning and LiveTestPlan propose
safe executable probes, while LiveEvidenceReview receives the
candidate, proposed plan, collected live evidence, and oracle result
after deterministic verification. The deterministic oracle remains the
hard gate; the reviewer can only downgrade or block weak confirmations
before verified_vulnerabilities rows are created.
#Wire envelope: Prompt, Response, AgentTask, AgentResult
Defined in crates/nyx-agent-types/src/agent.rs. Both adapters
consume the same shape so the binary never depends on a vendor
schema.
#Prompt
| Field | Type | Notes |
|---|---|---|
prompt_version |
String |
Stable slug of the prompt template; persisted on every trace row. |
task_id |
String |
Echoed on every emitted AiEvent for fan-out. |
model |
Option<String> |
None falls back to default_model(). |
system |
String |
System prompt. Adapters with prompt caching wrap it in a cache block. |
user |
String |
User message body. |
max_output_tokens |
u32 |
Clamped to vendor limits inside the adapter. |
temperature |
f32 |
0.0 for deterministic decoding. |
seed |
Option<u64> |
Honoured only by adapters that report supports_deterministic_sampling = true. |
#Response
| Field | Type | Notes |
|---|---|---|
prompt_version |
String |
Echoes the request. |
task_id |
String |
Echoes the request. |
model |
String |
Vendor-reported model id (may differ from the requested alias). |
content |
String |
Final completion text. |
usage |
TokenUsage |
Input / output token counts. |
cache |
Option<CacheStats> |
Set when the adapter reports cache deltas. |
cost_usd_micros |
i64 |
Adapter-computed total in micros (1e-6 USD). |
#AgentTask and AgentResult
AgentTask carries tools: Vec<String> and max_turns: u32
alongside the prompt fields; the Claude Code adapter renders this
into the markdown agent brief it pipes on stdin. AgentResult
carries turns, extracted: Vec<ExtractedAgentResult>, plus the
same token / cache / cost accounting as Response.
ExtractedAgentResult is the typed lift the adapter performs over
the tool-use trace. Recognised tool names are classified at
agent.rs:261 (classify_tool_use):
| Tool name | Variant |
|---|---|
record_payload |
PayloadFound { rule_id, body } |
record_spec |
SpecFound { capability, spec } |
record_chains |
ChainsRanked { chain_ids, rationale } |
record_exploration_finding |
ExplorationFinding { ... } |
| anything else | ExplorationEvent { message } |
The task crates consume the typed variants; the binary never re-parses the raw transcript.
#Adapters
The official provider paths are BYOK/direct API or local endpoints. Claude Code and Codex are optional local CLI adapters for users who have installed and authenticated those tools themselves. Nyx Agent does not include, proxy, sublicense, or resell model access.
#Anthropic Messages (one_shot only)
crates/nyx-agent-ai/src/adapter/anthropic.rs.
Direct reqwest against POST /v1/messages. No third-party SDK,
so version drift on the SDK side cannot couple us to its release
cadence. Constants:
| Constant | Value |
|---|---|
DEFAULT_BASE_URL |
https://api.anthropic.com |
ANTHROPIC_VERSION |
2023-06-01 |
DEFAULT_RANKING_MODEL |
claude-haiku-4-5 |
DEFAULT_SYNTHESIS_MODEL |
claude-opus-4-7 |
Capability flags: supports_agent_loop = false,
supports_prompt_cache = true,
supports_deterministic_sampling = false. agent_loop returns
AiError::UnsupportedMode("agent_loop").
Per-model pricing lives at anthropic.rs:65 (pricing_for). Match
order is prefix-based so claude-opus-4-7-20260101 prices as the
opus alias. Unknown model names default to opus pricing so a
mis-typed model id does not silently price as the cheapest tier.
Request body:
{
"model": "claude-opus-4-7",
"max_tokens": 4096,
"temperature": 0.0,
"system": [
{ "type": "text", "text": "<system prompt>",
"cache_control": { "type": "ephemeral" } }
],
"messages": [
{ "role": "user", "content": "<user>" }
]
}
The system field is a single-element block array (not a string)
when supports_prompt_cache is true so the cache_control
attachment can ride along. Adapters that do not support caching
emit a plain string for system.
The non-streaming path is the shipping one. The Messages API
supports SSE streaming via stream: true; a future revision can
flip to streaming and emit one AiEvent::TokenReceived per delta
without changing the trait.
#Local OpenAI-compatible (one_shot only)
crates/nyx-agent-ai/src/adapter/local_llm.rs.
Direct reqwest against <api_base>/chat/completions. The runtime
expects api_base to point at a local OpenAI-compatible /v1
endpoint such as LM Studio, Ollama, or vLLM. Any bearer token is read
from the OS keychain; it is never written to TOML.
Capability flags: supports_agent_loop = false,
supports_prompt_cache = false,
supports_deterministic_sampling = false. agent_loop returns
AiError::UnsupportedMode("agent_loop").
Request body:
{
"model": "local-model",
"max_tokens": 4096,
"temperature": 0.0,
"stream": false,
"messages": [
{ "role": "system", "content": "<system prompt>" },
{ "role": "user", "content": "<user>" }
]
}
Set [ai].model when the local server requires a specific model id.
Local runs record token counts when the server reports usage, but
cost is recorded as zero because Nyx Agent has no way to know a local
operator's hardware or provider accounting.
#Claude Code (agent_loop only)
crates/nyx-agent-ai/src/adapter/claude_code.rs.
Spawns the claude CLI as a subprocess so the agent does not have
to embed Anthropic's tool-use loop. Detection runs
which claude (or which claude-code as a fallback alias) at
construction time; failure surfaces as
AiError::AdapterUnavailable. The binary path plus --version
output is captured into ClaudeBinary and surfaced by
nyx-agent doctor.
Wire shape:
- Write
agent_task.mdinto a per-task scratch directory. - Spawn
claude --print --output-format stream-json --verbose --max-turns <N>. - Pipe the rendered task body on stdin and read the NDJSON event
stream on stdout. A sibling task drains stderr into a bounded
64 KiB trailing-window ring (
MAX_STDERR_CAPTURE_BYTES) so a verbose child cannot block on a full pipe. - Classify each tool-use block via
classify_tool_useinto a typedExtractedAgentResult; emitToolCallStartedandToolCallFinishedevents on the bus. - On timeout, kill the child, emit
TaskHalted { reason: OperatorCancelled }, and annotate the returnedAiError::Transportwith the captured stderr.
Default model: claude-opus-4-7. Default wall-clock timeout: 15
minutes, with host phases able to override it; the unsafe attack-agent
phase uses 30 minutes per pass. Capability flags:
supports_agent_loop = true,
supports_prompt_cache = true,
supports_deterministic_sampling = false. one_shot returns
AiError::UnsupportedMode("one_shot").
#Codex CLI (one_shot and agent_loop)
crates/nyx-agent-ai/src/adapter/codex.rs.
Spawns the installed codex CLI and consumes the JSONL stream emitted
by codex exec. This adapter is optional and depends on the user's own
Codex installation and authentication. nyx-agent doctor reports the
resolved binary and version when present.
Default wall-clock timeout: 15 minutes, with host phases able to override it; the unsafe attack-agent phase uses 30 minutes per pass.
#Adapters on the roadmap
OpenAI API, Bedrock, and Vertex.
#Budget tracking
BudgetTracker (runtime.rs:55) is the host-side port the
adapter calls on every successful round trip. The contract is
deliberately small:
async fn cap(run_id, kind) -> Result<Option<i64>, AiError>;
async fn current_spend(run_id, kind) -> Result<i64, AiError>;
async fn add_spend(run_id, kind, micros) -> Result<i64, AiError>;
Adapters never write a halted flag; the host owns that audit
trail in the budgets table. The boundary on both pre-call and
post-call cap checks is strictly >: a call landing exactly at
the cap proceeds, the call after does not.
BudgetKind (agent.rs:151) has three variants:
| Variant | Used by |
|---|---|
OneShot |
AiRuntime::one_shot paths. |
AgentLoop |
AiRuntime::agent_loop paths. |
Total |
Reserved for per-run aggregate the host writes itself. |
Two implementations ship:
InMemoryBudgetTracker(runtime.rs:74). Process-local, used by adapter tests and any future in-memory dispatcher.BudgetStoreTrackerlives in the binary glue and forwards intonyx_agent_core::store::BudgetStore. The wizard picks a per-run cap (default unlimited fromAiConfig::DEFAULT_RUN_BUDGET_USD_MICROS) and the tracker auto-creates the row on firstadd_spend.
Budget (agent.rs:138) is the per-call envelope:
{ run_id, kind, cap_usd_micros }. The cap_usd_micros field on
the envelope is the operator-visible per-call cap; the tracker
sees the per-run accumulated total separately.
#Per-call cap allocation ladder
Four one_shot tasks share a single
(run_id, BudgetKind::OneShot) bucket: PayloadSynthesis,
SpecDerivation, ChainReasoning, and NovelFindingDiscovery. The
binary drives PayloadSynthesis and SpecDerivation before deeper
candidate work, then runs NovelFindingDiscovery and finally
post-live ChainReasoning after candidate/live verification has
populated graph proof nodes (see the scan_loop
function in crates/nyx-agent/src/main.rs), so earlier-pass spend
reduces the budget every later pass sees through the same tracker.
Each pass also carries its own per-call cap on the wire
(payload_synthesis_per_call_cap_usd_micros,
spec_derivation_per_call_cap_usd_micros,
chain_reasoning_per_call_cap_usd_micros,
novel_discovery_per_call_cap_usd_micros); each value clamps a
single call below the shared per-run bucket and falls back to
AiConfig::DEFAULT_RUN_BUDGET_USD_MICROS when unset.
The invariant the binary commits to is: PayloadSynthesis and
SpecDerivation get the full per-run cap to drive their fan-outs;
NovelFindingDiscovery can spend on missed candidates, and
ChainReasoning fires a single post-live call against whatever
budget remains. For CLI-backed runtimes that support agent loops
(Claude Code or Codex), that call is source-aware: the task is
given repository workspace roots and can read/search code before
returning the strict chain JSON. One-shot-only runtimes use the
same graph input without repository tool access. The order is intentional. The static-pass
refusals that PayloadSynthesis and SpecDerivation address are
the most actionable signal in a run; novel discovery benefits
from source context before live planning; chain reasoning is most
valuable after candidate tests have created verification_attempt
and verified_vulnerability graph nodes. These enrichments degrade
gracefully when an earlier pass exhausted the cap (the adapter
pre-call check refuses, the pass logs and continues). Operators
who want chain reasoning or novel-discovery to see a larger
headroom should raise
default_run_budget_usd_micros rather than try to slice the
shared pool. The BudgetKind enum does not sub-bucket today, and
splitting OneShot into OneShot.payload / OneShot.spec /
OneShot.chain / OneShot.novel would touch every adapter and
every tracker in tree without changing the realised behaviour for
a run that finishes inside its cap.
AI Exploration is the only agent_loop task and lives in a
separate (run_id, BudgetKind::AgentLoop) row with its own
per-run hard cap (default $10.00). It does not draw from the
OneShot pool.
The AgentLoop bucket itself does not sub-bucket per adapter.
Today only the Claude Code adapter consumes the bucket, so the
question is academic. The shape that takes hold the moment a
second agent_loop-capable adapter ships (an OpenAI assistant
API path, a Bedrock agent path) is identical to the OneShot
case: the bucket is the cap the operator pays for; the
per-adapter accountability lives one layer down in the trace
store. Every adapter call writes one agent_traces row with
runtime_name, model, and cost_usd_micros columns, so an
operator dashboard that needs "how much did Claude Code burn vs
the OpenAI assistant during this run" sums cost_usd_micros
grouped by runtime_name from agent_traces rather than asking
the budget bucket to sub-bucket itself. Splitting
BudgetKind::AgentLoop into AgentLoop.claude_code /
AgentLoop.openai / etc. would touch every adapter and every
tracker in tree without changing the realised behaviour for a
run that finishes inside its cap; the per-adapter share is
already recoverable from the trace store.
#Event stream
Every model call publishes a fan-out of AgentEvent::Ai { data: AiEvent } frames on the bus (crates/nyx-agent-types/src/event.rs:145).
The same task_id rides on every variant so subscribers can
multiplex concurrent calls.
| Variant | Emitted when |
|---|---|
TokenReceived { task_id, token } |
Each token batch the adapter materialises (or the full body for non-streaming Anthropic). |
ToolCallStarted { task_id, name } |
Agent loop sees a ContentBlock::ToolUse. |
ToolCallFinished { task_id, name, ok } |
After the tool-use block lands in extracted. |
CacheHit { task_id, tokens } |
usage.cache_read_input_tokens > 0. |
CacheMiss { task_id, tokens } |
usage.cache_creation_input_tokens > 0. |
BudgetTick { task_id, run_id, spent_usd_micros } |
After every successful add_spend. |
TaskHalted { task_id, reason } |
Cap overrun, timeout, or upstream refusal. |
HaltReason (agent.rs:319) has three variants:
BudgetCapReached, OperatorCancelled, UpstreamRefused. See
events.md for the full envelope and the WebSocket
filter contract.
#Tasks
Five task modules sit on top of the trait. Each task builds a
typed Prompt, drives the model once (or twice on validation
retry), parses the JSON contract, validates the result, and
returns a typed outcome the binary persists.
| Task | File | Outcome |
|---|---|---|
| PayloadSynthesis | tasks/payload_synthesis.rs |
Synthesised { output, ... } or Quarantined. |
| SpecDerivation | tasks/spec_derivation.rs |
Synthesised { spec, spec_blob, ... } or Quarantined. |
| ChainReasoning | tasks/chain_reasoning.rs |
Ranked { output, ... } or NoChains. |
| NovelFindingDiscovery | tasks/novel_findings.rs |
Discovered { candidates, ... } or NoCandidates. |
| Exploration | tasks/exploration.rs |
Completed { findings, ... } plus halt reasons. |
Common rules across the four one_shot tasks:
- Two attempts max. First attempt uses the v1 prompt; the
retry uses the
*_strictervariant with explicit "your previous reply did not validate" framing. - Shared budget bucket. Both attempts charge the same
(run_id, BudgetKind::OneShot)row; the tracker is the gate. spent_usd_microsandattemptsride on every outcome. The binary persists both on the agent-trace row even on quarantine.metrics: AgentTraceMetrics(agent.rs:88) accumulates per-call observability across attempts via saturating add. The binary'sbuild_trace_rowliftsusage/cache/modelfrom this envelope into the trace columns.
#NovelFindingDiscovery file priority
The walker behind build_novel_inputs_for_repo orders source files
before chunking them into per-batch prompts. The score combines three
inputs:
- Keyword score. Path segments matching
route,controller,handler,view,api,model,auth,login,query,sql,db,execadd 2 to 6 points each. Picks up the high-leverage surface (HTTP layer, ORM, auth flows) on a typical web app. - Size band. Tiny files (<256 B) and oversize files (>200 kB) subtract 5; the 2 kB to 50 kB band adds 3.
- Historical promotion rate. When the run has prior
AI-originated findings on disk, the walker calls
FindingStore::per_path_promotion_rate(repo)and looks up each path. Rate =promotions / (total + 5)wherepromotionsis the row count withstatus IN ('Open', 'Verified')andtotalis every AI-originated row on the path (provenanceLlmSynthesisedorAiExploration). The denominator's+ 5smooths low-cardinality paths so a single observation does not peg the rate. The score adds up to +10 at rate = 1.0; the boost scales linearly.
A store error on the rate lookup degrades to the keyword + size heuristic; the pass still produces a useful ordering when the findings table has not yet accumulated AI-promotion history.
Exploration is the only agent_loop task. It runs against a
chain-lane sandbox with three guard rails:
- Escape suite gate. A pre-flight
EscapeSuiteGatecheck refuses dispatch if the escape-regression suite is red. - Per-run hard cap. Default
$10.00USD micros, in the(run_id, BudgetKind::AgentLoop)bucket. - Per-task soft cap. A separate warning threshold emits a
TokenReceivedevent with a[soft-cap]prefix but does not halt; the hard cap is the only ceiling that aborts.
Before dispatch, the binary also attaches a bounded KNOWN SCANNER LEADS digest built from pentest_candidates rows for the run. Nyx,
route/API discovery, OpenAPI specs, JavaScript bundle endpoint
extraction, form discovery, ZAP baseline, Nuclei, Trivy, OSV-Scanner,
Gitleaks/detect-secrets, Katana, httpx, and future candidate-producing
scanners reach exploration through this normalized summary rather than
raw scanner JSON/log output, so the agent can pivot from existing leads
without overloading the prompt or trusting them as verified proof. The
candidate row's source and source_ids carry attribution; live
verification remains the only path to Verified.
#Determinism
deterministic_seed(run_id, task_id) (runtime.rs:159) produces a
stable 64-bit seed via BLAKE3(run_id || "\0" || task_id).
Adapters that expose random_seed upstream pass it through;
adapters that do not ignore the value but the function is still
called so the binary's trace row carries the same number. Both
shipped adapters report supports_deterministic_sampling = false
today, so temperature: 0 is the only knob; the seed becomes
load-bearing once a vendor surfaces a sampling-seed parameter.
#Prompt versions
Every prompt template lives in crates/nyx-agent-ai/src/prompts/.
Stable version slugs are persisted on every trace row:
| Task | Slug |
|---|---|
| PayloadSynthesis | PAYLOAD_SYNTHESIS_PROMPT_VERSION |
| SpecDerivation | SPEC_DERIVATION_PROMPT_VERSION |
| ChainReasoning | CHAIN_REASONING_PROMPT_VERSION |
| NovelFindingDiscovery | NOVEL_FINDING_DISCOVERY_PROMPT_VERSION |
| Exploration | EXPLORATION_PROMPT_VERSION |
Slug constants live next to each task's run function. Rev a slug
only when the prompt body changes in a way downstream consumers
must distinguish; the trace store compares slugs verbatim.
#Configuration
Operators pick the runtime in nyx-agent.toml under the [ai]
section (defined at crates/nyx-agent-core/src/config.rs:166):
[ai]
provider = "anthropic"
model = "claude-opus-4-7"
runtime = "anthropic" # none | anthropic | local-llm | claude-code | codex
max_concurrent_one_shot = 4
# Optional. Omit for unlimited AI runs.
default_run_budget_usd_micros = 25_000_000 # $25.00 per run
| Field | Default | Notes |
|---|---|---|
provider |
None |
Free-form provider hint surfaced by the wizard. |
model |
None |
Per-run model override; tasks may still pick a model per prompt. |
api_base |
None |
Endpoint URL for local-llm. |
runtime |
none |
One of none, anthropic, local-llm, claude-code, codex. |
max_concurrent_one_shot |
4 |
In-flight one-shot fan-out. Floored to 1. |
default_run_budget_usd_micros |
unset (unlimited) | Optional per-run cap stamped on auto-created budget rows. |
Secrets do not live in TOML. The wizard stashes the API key in the
OS keychain under secrets::ACCOUNT_AI_ANTHROPIC (Anthropic) or
secrets::ACCOUNT_AI_LOCAL_LLM (local LLM).
#Failure modes
| Error | When |
|---|---|
AiError::BudgetExceeded |
Pre-call or post-call cap check fails. Emits TaskHalted { BudgetCapReached }. |
AiError::UnsupportedMode |
Adapter does not implement the requested mode (anthropic/local-llm + agent_loop, claude-code + one_shot). |
AiError::UpstreamRefused |
Non-2xx HTTP status (anthropic/local-llm) or non-zero exit (CLI adapters). Body / stderr rides in the variant string. |
AiError::MalformedResponse |
JSON deserialisation failed on the response body. |
AiError::Transport |
Network, IO, or scratch-dir failure. CLI agent-loop timeout maps here with the captured stderr appended. |
AiError::BudgetTracker |
The host tracker returned an error (database write failure, etc.). |
AiError::AdapterUnavailable |
Construction failed (e.g. claude not on PATH). |
thiserror variants live at crates/nyx-agent-types/src/agent.rs:326.
#Related pages
- architecture.md for where the AI runtime sits in the crate map.
- events.md for the
AiEventstream and the WebSocket filter contract. - config.md for the rest of
nyx-agent.toml. - api.md for the
/api/v1/budgetsroute and the/api/v1/tracesendpoints that read the per-call trace store.