Systems Programming

SysRift

Deterministic syscall-level execution replay engine using Linux ptrace.

C, Linux, ptrace, Process Tracing

~15%

Overhead vs Native

100%

Syscalls Intercepted

Faithful

Replay Fidelity

~200 bytes/syscall

Event Log Size

01

Problem

Problem statement, constraint shape, and the gap this project explores.

Problem Statement

How do you reproduce a non-deterministic program failure without modifying the program?

Challenge

Built a low-level execution capture and replay system using Linux ptrace to deterministically record and reproduce program behavior at the syscall boundary.

Why Existing Approaches Failed

Debugging non-deterministic failures in production systems is hard. Race conditions, timing-dependent crashes, and environment-specific behavior are nearly impossible to reproduce by re-running. SysRift captures the syscall boundary — the interface between user-space and the kernel — and records enough state to replay program execution faithfully without re-executing kernel interactions.

02

Constraints

Non-negotiable boundaries that shaped the implementation.

Interface

Syscall boundary (no source modification)

Overhead

Must not significantly alter timing behavior

Correctness

Replay must produce identical observable behavior

Scope

User-space memory buffers and return values

Platform

Linux x86-64 with ptrace support

03

Architecture

The primary design surface: flow, subsystem roles, and state boundaries.

Architecture Brief

SysRift attaches to a target process using ptrace(PTRACE_ATTACH). At each syscall entry and exit, it captures arguments, return values, and relevant memory regions. The record phase produces an event log. The replay phase re-executes the program, intercepting syscalls and substituting recorded return values instead of forwarding to the kernel.

Execution Flow
01

Target process

02

ptrace attach

03

Syscall entry intercept

04

Record args

05

Syscall exit intercept

06

Record return value + buffers

07

Event log

08

Replay: inject recorded returns
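The entry/exit steps in the flow above can be sketched as a small pairing state machine: arguments are stored at the entry stop and the event is completed at the exit stop. The struct layout and the names `syscall_event`, `pairing_state`, `pair_entry`, and `pair_exit` are assumptions for illustration, not SysRift's actual types. On x86-64 the syscall number comes from `orig_rax`, the six arguments from `rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9`, and the return value from `rax`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One completed syscall: nr and args are filled at entry, ret at exit. */
struct syscall_event {
    uint64_t nr;
    uint64_t args[6];
    int64_t  ret;
};

/* Tracks whether the tracee is between an entry stop and an exit stop. */
struct pairing_state {
    bool in_syscall;
    struct syscall_event pending;
};

/* Entry stop: remember the syscall number and arguments. */
void pair_entry(struct pairing_state *st, uint64_t nr, const uint64_t args[6])
{
    assert(st != NULL && args != NULL);
    st->pending.nr = nr;
    memcpy(st->pending.args, args, sizeof st->pending.args);
    st->pending.ret = 0;
    st->in_syscall = true;
}

/* Exit stop: complete the pending event and hand it to the caller.
 * Returns false if no matching entry stop was seen. */
bool pair_exit(struct pairing_state *st, int64_t ret, struct syscall_event *out)
{
    assert(st != NULL && out != NULL);
    if (!st->in_syscall)
        return false;
    st->pending.ret = ret;
    *out = st->pending;
    st->in_syscall = false;
    return true;
}
```

Returning `false` on an unmatched exit, rather than aborting, lets the tracer resynchronize if the first stop it sees after attaching turns out to be an exit.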

01

Tracer

ptrace-based process monitor that intercepts SIGTRAP on syscall entry/exit.

02

Event Recorder

Captures syscall number, arguments, return values, and memory snapshots.

03

Memory Capturer

Uses process_vm_readv to snapshot output buffers at syscall exit.

04

Replay Engine

Re-runs process under ptrace, substituting recorded return values for live kernel calls.

05

Event Log

Binary-serialized event stream with timestamps and context.

04

Engineering Tradeoffs

Design review notes: what was optimized and what was deliberately left behind.

EDR-01
Decision

ptrace vs eBPF

Why Chosen

ptrace allows direct process control and memory reads without requiring a kernel module. eBPF requires a newer kernel version and more setup.

Alternative Rejected

Lower overhead (eBPF is faster)

Impact

ptrace

EDR-02
Decision

Full memory snapshot vs selective capture

Why Chosen

Full memory snapshots are prohibitively large. Capturing only the output buffers of I/O syscalls covers the majority of replay correctness requirements.

Alternative Rejected

Complete state reconstruction

Impact

Selective buffer capture at syscall exit
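Selective capture at syscall exit might look like the sketch below, which copies a single output buffer out of the tracee with `process_vm_readv`. The helper names are hypothetical, and the `read(2)` rule (buffer in the second argument, valid length equal to a positive return value) is used only as an example of deciding how many bytes are worth recording.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

/* For read(2): only the bytes the kernel actually wrote are worth capturing. */
size_t read_capture_len(long ret)
{
    return ret > 0 ? (size_t)ret : 0;
}

/* Copy up to `len` bytes from `remote_addr` in process `pid`.
 * Returns a malloc'd copy (caller frees) and stores the byte count in
 * *out_len, or returns NULL with *out_len == 0 if nothing was captured. */
void *capture_buffer(pid_t pid, unsigned long remote_addr, size_t len,
                     size_t *out_len)
{
    assert(out_len != NULL);
    *out_len = 0;
    if (len == 0)
        return NULL;

    void *copy = malloc(len);
    if (copy == NULL)
        return NULL;

    struct iovec local  = { .iov_base = copy, .iov_len = len };
    struct iovec remote = { .iov_base = (void *)remote_addr, .iov_len = len };

    ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
    if (n <= 0) {
        free(copy);
        return NULL;
    }
    *out_len = (size_t)n;
    return copy;
}
```

At the exit stop of a `read`, a recorder could call `capture_buffer(pid, regs.rsi, read_capture_len((long)regs.rax), &n)` and append the result to the event as its payload.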

EDR-03
Decision

Binary vs text event log

Why Chosen

Binary logs are significantly smaller and faster to write/read. Text logs add I/O overhead that affects the timing fidelity of the recording.

Alternative Rejected

Human readability

Impact

Binary serialized log
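A binary record for one event could be encoded as a fixed header followed by the captured buffer. The field order, widths, and little-endian layout below are assumptions for illustration; the source does not describe SysRift's on-disk format, and `log_event`, `encode_event`, and `decode_event` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct log_event {
    uint64_t timestamp_ns;
    uint64_t nr;
    uint64_t args[6];
    int64_t  ret;
    uint32_t payload_len;     /* bytes of captured output buffer */
    const uint8_t *payload;   /* not owned by the event */
};

/* timestamp + nr + 6 args + ret + payload_len */
#define EVENT_HEADER_SIZE (8 + 8 + 6 * 8 + 8 + 4)

static uint8_t *put_u64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        *p++ = (uint8_t)(v >> (8 * i));
    return p;
}

static const uint8_t *get_u64(const uint8_t *p, uint64_t *v)
{
    *v = 0;
    for (int i = 0; i < 8; i++)
        *v |= (uint64_t)p[i] << (8 * i);
    return p + 8;
}

/* Encode one event into `out`. Returns bytes written, or 0 if `cap` is too small. */
size_t encode_event(const struct log_event *ev, uint8_t *out, size_t cap)
{
    assert(ev != NULL && out != NULL);
    size_t need = EVENT_HEADER_SIZE + (size_t)ev->payload_len;
    if (cap < need)
        return 0;
    uint8_t *p = put_u64(out, ev->timestamp_ns);
    p = put_u64(p, ev->nr);
    for (int i = 0; i < 6; i++)
        p = put_u64(p, ev->args[i]);
    p = put_u64(p, (uint64_t)ev->ret);
    for (int i = 0; i < 4; i++)
        *p++ = (uint8_t)(ev->payload_len >> (8 * i));
    if (ev->payload_len > 0)
        memcpy(p, ev->payload, ev->payload_len);
    return need;
}

/* Decode one event from `in`. The payload pointer aliases `in`.
 * Returns bytes consumed, or 0 if the input is truncated. */
size_t decode_event(const uint8_t *in, size_t len, struct log_event *ev)
{
    assert(in != NULL && ev != NULL);
    if (len < EVENT_HEADER_SIZE)
        return 0;
    uint64_t ret;
    const uint8_t *p = get_u64(in, &ev->timestamp_ns);
    p = get_u64(p, &ev->nr);
    for (int i = 0; i < 6; i++)
        p = get_u64(p, &ev->args[i]);
    p = get_u64(p, &ret);
    ev->ret = (int64_t)ret;
    ev->payload_len = 0;
    for (int i = 0; i < 4; i++)
        ev->payload_len |= (uint32_t)p[i] << (8 * i);
    p += 4;
    if (len - EVENT_HEADER_SIZE < ev->payload_len)
        return 0;
    ev->payload = p;
    return EVENT_HEADER_SIZE + (size_t)ev->payload_len;
}
```

Writing explicit byte order instead of dumping the struct keeps the log independent of compiler padding, at the cost of a few extra lines.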

05

Failure Modes

Incident-style notes for the ways the design can break.

Signal delivery during syscall interception

FM-01
Impact

Signals arriving between syscall entry and exit corrupt the intercept state.

Mitigation

Handled with the PTRACE_O_TRACESYSGOOD option and explicit signal masking.

Multi-threaded processes

FM-02
Impact

ptrace operates per thread, not per process. Multi-threaded programs require tracing every thread independently.

Mitigation

Current scope is single-threaded processes only.

Exec-based process replacement

FM-03
Impact

execve replaces the process image while the trace is in progress.

Mitigation

Replay must handle re-attachment after exec or use PTRACE_O_TRACEEXEC.

Memory layout ASLR mismatch on replay

FM-04
Impact

ASLR assigns different base addresses each run, so pointer arguments recorded in one run do not match the next.

Mitigation

Replay requires ASLR disabled or base address fixups for pointer arguments.
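One way to disable ASLR for a single target is to set the `ADDR_NO_RANDOMIZE` personality flag in the child before it calls exec. The sketch below does that and starts the child under `PTRACE_TRACEME`. This is a swap from the `PTRACE_ATTACH` flow described earlier, since the flag has to be set before the image is loaded. Function names are hypothetical, and recording and replay would both need to launch the target this way for addresses to line up.

```c
#include <assert.h>
#include <sys/personality.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* Add the "no address randomization" flag to an existing persona value. */
unsigned long persona_without_aslr(unsigned long current)
{
    return current | ADDR_NO_RANDOMIZE;
}

/* Fork, disable ASLR in the child, and exec argv under ptrace.
 * Returns the child's pid to the parent, or -1 if fork failed.
 * The child stops with SIGTRAP once the exec succeeds. */
pid_t spawn_traced_no_aslr(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid != 0)
        return pid;                      /* parent, or fork error (-1) */

    /* Child: 0xffffffff queries the current persona without changing it. */
    int current = personality(0xffffffff);
    if (current == -1 ||
        personality(persona_without_aslr((unsigned long)current)) == -1)
        _exit(127);
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
        _exit(127);
    execvp(argv[0], argv);
    _exit(127);                          /* exec failed */
}
```

Some sandboxed environments reject this `personality` call, in which case the child exits with status 127 and the tracer would need a fallback such as base address fixups.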

06

Benchmarks

Environment first, numbers second. Metrics should be inspectable, not ornamental.

Test Environment
Runtime

C / Linux

Workload

Syscall-level capture

Stack

C, Linux, ptrace, Process Tracing

Scope

Systems Programming

Evidence

Project-level benchmark notes

Performance Results
Overhead vs Native

~15% slowdown

Syscalls Intercepted

100% coverage

Replay Fidelity

Faithful for single-threaded programs

Event Log Size

~200 bytes/syscall

Supported Platform

Linux x86-64

07

Lessons Learned

Engineering takeaways from the implementation, including remaining work.

01

ptrace is powerful but has sharp edges: signal handling and multi-threading require significant additional complexity.

02

The syscall boundary is sufficient for replay correctness in most single-threaded programs, but misses in-memory concurrency.

03

Binary event logs are essential: text logs introduce enough I/O latency to alter the timing of the recorded execution.

04

Future: extend to multi-threaded programs using per-thread ptrace handles and a global ordering constraint on event replay.

Akshat