Resume

Professional Experience

Designed and built the infrastructure for Ouroboros, a service for automating SLO creation and monitoring
Worked with junior SRE team members on the design and implementation of Ouroboros backend components
Successfully executed the transition of the Metrics Service from the Platform team to the SRE team
Performed SRE engagements with partner teams to assess the reliability of their services and develop SLOs and SLIs for measuring success
Approved by the Security team as a Level 2 reviewer for assessing code quality and security issues

Member of the SRE team for the Tanzu Mission Control service
Designed, built, and maintained the infrastructure for Tanzu Mission Control
Implemented Kubernetes controllers for managing infrastructure programmatically, including an operator for Vault secrets and an operator for server lifecycle management
Integrated Vault clusters into the infrastructure for managing PKI, Kubernetes authentication, and secrets storage.
Integrated monitoring and alerting into our platform using Prometheus and Alertmanager
Member of the inaugural SRE team at Heptio to build the HQ SaaS offering.

Team lead for the Microservices Paved Road squad focused on building tooling and infrastructure to enable other dev teams to build, run, deploy, and operate microservices.
Architected and implemented production and staging Kubernetes infrastructure on AWS using kops, terraform, and helm.
Authored Helm charts for common services running on Kubernetes. Built companion scripts to enable canary, blue/green, and shadow deployments.
Worked with service teams to launch multiple services into production on Kubernetes.
Authored the Datadog adapter for Istio, released in version 0.6.0
Integrated Datadog into the infrastructure and built monitors for critical services
Migrated the production database to Amazon Aurora for increased performance and capacity.

Led the SRE team, consisting of myself and five other members
Architected and implemented the infrastructure for Ping’s future microservice platform, which leveraged the latest container and infrastructure technologies, including Mesos, Marathon, Consul, Linkerd, SaltStack, and more.
Implemented best practices for the team, including on-call reviews, operational readiness reviews, and design reviews for new services.
Authored a self-service application for managing Cassandra keyspaces and users.

Member of the on-call rotation for PingOne, Ping Identity’s IDaaS product, which spans multiple AWS regions and hundreds of instances.
Responsible for managing production and development environments, server configuration, monitoring and alerting configuration, public DNS, and AWS infrastructure.
Initiated the adoption of Terraform to manage Ping’s AWS infrastructure as code.
Authored numerous scripts and chat-bots to assist with day-to-day activities.

Member of the Fleet Operations and Region Operations teams for S3, an online storage service that hosts trillions of objects and serves millions of requests per second.
Led a program to retire thousands of older server models from production. Developed scripts, tools, and automation to assist with server identification and retirement.
Supported multiple on-call rotations and operations by responding to critical issues, deploying new code, and developing tools and libraries to support the system.
Authored scripts to improve hardware utilization by identifying servers that had failed to transition to production, uncovering over 10 PB of lost storage.
Responsible for launching new regions of S3, including a private cloud isolated from the rest of the internet. This involved provisioning hardware, making code updates, deploying new software, troubleshooting, and testing.
Member of a volunteer team within S3 that worked to raise the bar for the standards and quality of S3’s change management process for delivering code and updates to production systems.

Member of the IDOS-G2 program supporting GIMS, a Geo-spatial Intelligence System built on Linux, J2EE, and WebLogic using a service-oriented architecture (SOA).
Primary responsibility was maintaining 24/7/365 availability of the operational system, which consisted of over 200 servers.
Responsibilities included deploying the latest versions of software, developing monitoring tools in Bash and Python, troubleshooting issues and errors, and testing interfaces with external partners located around the globe.
Subject-matter expert for the program's SOA infrastructure.

Installed, configured, and maintained dozens of WebLogic servers across multiple development, test, and operational environments
Developed Jython/WLST deployment scripts for deploying custom code to WebLogic servers and configuring FJNDI links, JDBC connections, and JMS resources.
Developed utility scripts and websites using Python, PHP, MySQL, and Apache to build and maintain internal sites for environment status, resource allocation, and property configuration.