Daniel Bodnar
Building reliable platforms for high-throughput services.
San Francisco, CA
dan@bodnar.sh
roles.sh/u/dan
github.com/danielbodnar
Summary
Staff-level platform engineer with a decade of building high-throughput Go services and the Kubernetes substrate they run on. At HashiCorp, owned the multi-region K8s platform for Terraform Cloud (99.97% SLO, 5 regions). At Cloudflare, took Workers to GA at 300M req/s on the edge and built the bot-management control plane that classifies ~2T requests/day. I like infrastructure that's boring on purpose.
Experience
Staff Platform Engineer
HashiCorp · Terraform Cloud
Mar 2022 — Present
San Francisco
- Owned the multi-region Kubernetes platform backing Terraform Cloud — 14 clusters across 5 AWS regions, sustained 99.97% SLO over 11 trailing quarters.
- Designed and shipped the internal platform layer (Go + custom controllers) that 80+ application teams ship against — reduced average onboarding from 3 weeks to 4 days.
- Led the migration off Consul-as-source-of-truth to a custom CRD-based service catalog; cut cross-cluster lookup p99 from 240ms to 18ms.
- Started chaos-friday, the team's chaos-engineering practice; MTTR dropped 38% over the year we ran it.
Senior Site Reliability Engineer
Cloudflare · Workers & Bot Management
Jun 2019 — Mar 2022
San Francisco
- Took Cloudflare Workers from late beta to GA on the SRE side; sustained 300M req/s of edge compute across 270 PoPs with sub-millisecond cold starts.
- Built the bot-management control plane (Rust + V8 isolates) that classifies ~2T requests/day; tuned the model-serving path to keep p99 under 8ms at the edge.
- On-call responder for 4 incidents with potential customer impact; 3 were resolved without external-facing impact thanks to fast-path circuit breakers we had added the previous quarter.
- Authored polite-bot, a public spec for HTTP bot identification; adopted by 6 companies and now on the IETF RFC track.
Linux Solutions Engineer
Red Hat · OpenShift Field
Aug 2017 — Jun 2019
Remote
- Field engineer for OpenShift enterprise rollouts — embedded with 8 Fortune-100 financial and government customers; learned what "production" actually means.
- Wrote the platform-hardening playbook still used by the field team (Ansible + CIS benchmarks), reducing average customer-side install time from 6 weeks to 9 days.
DevOps Engineer
Stitch Fix · Data Platform
Feb 2015 — Aug 2017
San Francisco
- First infrastructure hire on the data platform team. Built and operated Spark/Airflow infra for ~120 analysts and ML engineers.
Skills
- Languages: Go, Rust, Python, TypeScript, Bash
- Orchestration: Kubernetes (CKAD, CKS), Nomad, custom CRD controllers
- Cloud: AWS (EKS, IAM, VPC, KMS), GCP, bare-metal Linux at scale
- Infra-as-Code: Terraform, Pulumi, Crossplane, Ansible
- Observability: Prometheus, Grafana, OpenTelemetry, Honeycomb, Tempo
- Distributed: Consensus, leader election, leaderless replication, queues, exactly-once semantics
Education & Notes
- B.S. Computer Science · University of Waterloo · 2014
- Open-source: maintain kontain (4.2k stars), publish linuxlife.sh (18k subscribers).
- Speaker at SREcon '22, KubeCon EU '23 ("The platform that fits in your head").
tailored by roles.sh · v3 · 2026-04-14 14:32 PT