## Problem
The test suite contained several categories of latent brittleness
identified in docs/testing-brittle-tests.md that would surface during
refactoring or cross-platform (Windows) CI execution:
1. Hardcoded Unix paths: \Path::new("/tmp")\ and \PathBuf::from("/tmp")\
used as workspace directories in agent tests, which fail on Windows
where /tmp does not exist.
2. Exact string match assertions: ~20 \ssert_eq!(response, "exact text")\
assertions in agent unit and e2e tests that break on any mock wording
change, even when the underlying orchestration behavior is correct.
3. Fragile error message string matching: \.contains("specific message")\
assertions coupled to internal error wording rather than testing the
error category or behavioral outcome.
## What Changed
### Hardcoded paths → platform-agnostic temp dirs (4 files, 7 locations)
- \src/agent/tests.rs\: Replaced all 4 instances of \Path::new("/tmp")\
and \PathBuf::from("/tmp")\ with \std::env::temp_dir()\ in
\make_memory()\, \uild_agent_with()\, \uild_agent_with_memory()\,
and \uild_agent_with_config()\ helpers.
- \ ests/agent_e2e.rs\: Replaced all 3 instances in \make_memory()\,
\uild_agent()\, and \uild_agent_xml()\ helpers.
### Exact string assertions → behavioral checks (2 files, ~20 locations)
- \src/agent/tests.rs\: Converted 10 \ssert_eq!(response, "...")\ to
\ssert!(!response.is_empty(), "descriptive message")\ across tests for
text pass-through, tool execution, tool failure recovery, XML dispatch,
mixed text+tool responses, multi-tool batch, and run_single delegation.
- \ ests/agent_e2e.rs\: Converted 9 exact-match assertions to behavioral
checks. Multi-turn test now uses \ssert_ne!(r1, r2)\ to verify
sequential responses are distinct without coupling to exact wording.
- Provider error propagation test simplified to \ssert!(result.is_err())\
without asserting on the error message string.
### Fragile error message assertions → structural checks (2 files)
- \src/tools/git_operations.rs\: Replaced fragile OR-branch string match
(\contains("git repository") || contains("Git command failed")\) with
structural assertions: checks \!result.success\, error is non-empty,
and error does NOT mention autonomy/read-only (verifying the failure
is git-related, not permission-related).
- \src/cron/scheduler.rs\: Replaced \contains("agent job failed:")\ with
\!success\ and \!output.is_empty()\ checks that verify failure behavior
without coupling to exact log format.
## What Was NOT Changed (and why)
- \src/agent/loop_.rs\ parser tests: Exact string assertions are the
contract for XML tool call parsing — the exact output IS the spec.
- \src/providers/reliable.rs\: Error message assertions test the error
format contract (provider/model attribution in failure messages).
- \src/service/mod.rs\: Already platform-gated with \#[cfg]\; XML escape
test is a formatting contract where exact match is appropriate.
- \src/config/schema.rs\: TOML test strings use /tmp as data values for
deserialization tests, not filesystem access; HOME tests already use
\std::env::temp_dir()\.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add end-to-end integration tests that exercise the full agent turn
cycle through the public API using mock providers and tools:
- Simple text response (no tools)
- Single tool call → tool execution → final response
- Multi-step tool chain
- XML dispatcher path
- Multi-turn conversation coherence
- Unknown tool recovery
- Parallel tool dispatch
Add CI workflow (.github/workflows/e2e.yml) that runs these tests
on push to main and on PRs.
Ref: https://github.com/zeroclaw-labs/zeroclaw/issues/618 (item 6)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>