Capacity and Observability Management Lead, Apple, Cupertino, California
November 2020 to April 2025
Designed and implemented a budget forecasting and reporting platform for public and
private cloud spend in the Ads Platforms organization. Drove adoption, leading to a 95%+
rate of public cloud cost attribution to specific products and capabilities, despite almost
all products being served from the same cloud infrastructure. Led a team to implement and
deploy production business metrics observation tools that reduced the impact of
business-level incidents across the board. Worked with SRE, Engineering, Product, and Finance
teams to create spending forecasts, track alignment to those forecasts, and update
forecasts with the impacts of changing business priorities.
Principal Site Reliability Engineer, Optimizely, San Francisco, California
November 2019 to August 2020
Centralized most SRE roles into a new SRE organization while preserving the benefits of
embedded SREs in each engineering team. Built a framework to absorb planned business
growth with minimal additional staff. Identified key infrastructure, database, and cloud
technologies, then revised production architectures to leverage existing staff expertise in
those areas. Worked with Product, Sales, and Sales Engineering to enable the business’s
pivot to an SRE and Engineering audience, including deploying internal use cases,
documenting them for Sales Engineering teams, and creating a demo for
Sales Engineers to use with live, anonymized production data.
Senior Staff Site Reliability Engineer, LinkedIn, Sunnyvale, California
March 2019 to September 2019
Led a cross-team, cross-functional effort to automate OS upgrades across a fleet of more
than 100,000 servers with no impact to service availability or reliability. Reduced the
time required to complete the project by more than 13 months. Developed a process to
upgrade the Linux OS in-place without a hard reboot, saving more than 5,000 person-hours
for on-site disk drive, firmware, and battery replacement. Created a separate process to
ensure these tasks were completed without affecting delivery of the OS upgrades.
Mentored the on-site operations team in developing their first custom tool to automate the
upgrades in more than 20 distinct scenarios of hardware, UEFI firmware, management
network availability, and other factors, to allow rolling upgrades to be automated and
scheduled, avoiding higher-risk deployments without full on-site staffing levels.
Platform Solutions Architect, Rundeck Inc., Redwood City, California
January 2018 to January 2019
Joined an early-stage startup offering data center operations automation tools. Performed a wide
range of functions, including the development of third-party integrations such as the
Rundeck App for Splunk. Performed Sales Engineering duties while training the first
Sales Engineers. Provided on-site and remote post-sales training and consulting.
Performed competitive analysis. Developed the corporate security policy. Created and
maintained business analytics reports for senior staff. Participated in weekly, quarterly,
and yearly senior staff planning sessions. Participated in product roadmap planning.
Identified key areas of improvement. Created technical marketing content for the
company blog and website.
Operations Architect, Proofpoint Inc., Sunnyvale, California
May 2014 to December 2017
Provided technical leadership in a VP-level role at the world leader in messaging
security, working across Software Development and Service Reliability Engineering
teams to design highly available public cloud, managed, and customer-premises products
while reducing time to delivery and improving reliability and efficiency. Worked with
Product and Engineering teams to plan Splunk integration strategies for Proofpoint
products, including design and planning, marketing and release strategy,
prioritization of product integrations, and development of custom integrations for
Fortune 100 customers.
Grew the Operations team into a world-class, globally distributed organization.
Enabled efficient and accurate deployment at scale. Set technical and organizational
standards for service delivery via private data centers, public clouds, and on-premises
infrastructure. Developed processes and standards for development, testing, and delivery
of SaaS solutions to meet the needs of dozens of software engineering teams, including a
containerization strategy, meeting security and compliance requirements, and maintaining
customer data confidentiality while allowing continuous integration tools to have access
to production-like datasets. Reduced customer-visible incident volumes by 40%.