Joshua Gilman

Lead Site Reliability Engineer

Portland, OR | JoshuaGilman@gmail.com | (707) 580-8473 | LinkedIn

Veteran SRE with 15+ years' experience building and operating large-scale systems across the big tech, Web3, data center, and manufacturing industries. I design resilient cloud platforms, cut waste through automation, and lead cross-functional teams to ship reliably for user bases ranging from 1K to 400K+.

At a glance

Lead SRE

IOG / Project Catalyst

Built and ran the Cardano governance platform for 400K+ users.

SRE, Core Infra

Google

Modernized delivery pipelines for 1K+ engineers; reduced toil across DC ops.

Controls & SCADA

Google DCs

Automated security/monitoring, preventing incidents and saving 400K+ hours.

Technical Skills

Cloud & Kubernetes

AWS, Google Cloud, EKS, Kubernetes, Argo CD, Helm, Timoni

IaC & Automation

Terraform, Ansible, Nix, Jsonnet, GitLab CI, Docker

Languages & Scripting

Go, Python, Rust, JavaScript, SQL

Reliability & Observability

SLO/SLI design, Observability (Prometheus, Grafana), Incident response, Runbooks & playbooks, Chaos testing, Cost optimization, Security automation

Experience

Input Output HK

Portland, OR

Lead Site Reliability Engineer

Oct 2022 – Present
  • Develop, iterate on, and maintain infrastructure (Kubernetes, Terraform, Earthly) for operating Cardano's decentralized innovation system (Project Catalyst), completing projects 4-6 months faster than the prior average, supporting 4 funds annually ($50M+ annual distributions), and improving community engagement with the platform (200K-400K+ members)
  • Architect complete software stack (CUE, Timoni, Argo CD) capable of running blockchain workloads and implement the safe, reliable methodologies this requires, improving public confidence in an on-chain governance system
  • Design and implement automated software testing and delivery pipeline, decreasing testing times by 92% and reducing testing effort by 100+ hours weekly
  • Create documentation and training to enable 20+ software engineers to take ownership of systems and promote a strong DevOps culture focused on platform engineering, saving $250K annually on infrastructure costs

Google

Portland, OR

Site Reliability Engineer (Core Infrastructure Team)

Mar 2022 – Oct 2022
  • Created multiple software delivery pipelines with Python, improving software delivery reliability by 10% for 1K+ SWEs
  • Migrated legacy applications used by 10K+ employees to newer orchestration systems, increasing developer velocity and reducing cognitive load
  • Spearheaded first Data Center Modernization Initiative at Google and integrated SRE principles into data center operations, improving visibility into critical data center services and reducing toil

Data Center Technician I/II

Jul 2014 – Mar 2022
  • Led 2-3 year data center build projects to commission 3 data centers (total 400MW+), achieving top ranking for controls engineering and compliance and securing first-ever promotion from Data Center Technician to SRE at Google
  • Collaborated with Controls Engineer to maintain and improve complex SCADA software and networking deployments, including medium-sized Cisco-based network and 4 FactoryTalk deployments
  • Designed, led, and implemented software-based improvements to SCADA systems (automated security systems, software-based monitoring tools, disaster recovery software), leading to no major cybersecurity incidents and saving 400K+ manual labor hours for data center operations annually
  • Received 15+ Google awards and recognitions, including Top Awards for Control Systems Automated Anti-Virus Solution

Johnson Controls

San Francisco, CA

HVAC Controls Technician

May 2013 – Jul 2014
  • Supported 20+ sites in the Greater San Francisco Bay Area and performed preventative maintenance on building management solutions, ranking in the top 5% of technicians with a 100% on-time project delivery rate
  • Responded to and resolved 15-20 BMS service requests weekly, completing 90% of requests within 2-3 hours
  • Designed and commissioned 9 HVAC process plants ($100K-$3M+) for technology companies, data centers, skyscraper HVAC systems, manufacturing plants, and laboratory facilities

Previous Experience

US Navy, Electrical Maintenance Technician (Nuclear)
Alias Projects, Software Developer

Certifications

Programming for Data Science Nano Certification (Udacity)

Interests

Locally Hosted AI Server, Home IoT, Hiking, Camping, Reading, Music, Guitar