Paul M. Lambert’s résumé

Professional Summary

SRE/DevOps Leader who transforms engineering organizations through strategic automation, process optimization, and data-driven decision making. Combines hands-on technical expertise in systems architecture, cloud platforms, and container orchestration with business acumen in forecasting, vendor management, and cross-functional leadership. Track record includes leading SRE practices at Apple, LinkedIn, and multiple high-growth companies.

Experience

Capacity and Observability Management Lead, Apple, Cupertino, California
November 2020 to April 2025

Designed and implemented a budget forecasting and reporting platform for public and private cloud spend in the Ads Platforms organization. Drove adoption, leading to a 95%+ rate of public cloud cost attribution to specific products and capabilities, despite almost all products being served from the same cloud infrastructure. Led a team to implement and deploy production business metrics observation tools, which reduced the impact of business-level incidents across the board. Worked with SRE, Engineering, Product, and Finance teams to create spending forecasts, track alignment to those forecasts, and update forecasts with the impacts of changing business priorities.

Principal Site Reliability Engineer, Optimizely, San Francisco, California
November 2019 to August 2020

Centralized most SRE roles into a new SRE organization while preserving the benefits of embedded SREs in each engineering team. Built a framework to absorb planned business growth with minimal additional staff. Identified key infrastructure, database, and cloud technologies, then revised production architectures to leverage existing staff expertise in those areas. Worked with Product, Sales, and Sales Engineering to enable the business’s pivot to an SRE and Engineering audience, including deploying internal use cases, documenting them for sales engineering teams, and creating an excellent demo for Sales Engineers to run against live, anonymized production data.

Senior Staff Site Reliability Engineer, LinkedIn, Sunnyvale, California
March 2019 to September 2019

Led cross-team, cross-functional effort to automate OS upgrades across a fleet of more than 100,000 servers with no impact to service availability or reliability. Reduced the time required to complete the project by more than 13 months. Developed a process to upgrade the Linux OS in-place without a hard reboot, saving more than 5,000 man-hours for on-site disk drive, firmware, and battery replacement. Created a separate process to ensure these tasks were completed without affecting delivery of the OS upgrades. Mentored on-site operations team in developing their first custom tool to automate the upgrades in more than 20 distinct scenarios of hardware, UEFI firmware, management network availability, and other factors, to allow rolling upgrades to be automated and scheduled, avoiding higher-risk deployments without full on-site staffing levels.

Platform Solutions Architect, Rundeck Inc., Redwood City, California
January 2018 to January 2019

Joined an early-stage startup offering data center operations automation tools. Performed a wide range of functions, including the development of third-party integrations such as the Rundeck App for Splunk. Performed Sales Engineering duties while training the first Sales Engineers. Provided on-site and remote post-sales training and consulting. Performed competitive analysis. Developed the corporate security policy. Created and maintained business analytics reports for senior staff. Participated in weekly, quarterly, and yearly senior staff planning sessions. Participated in product roadmap planning. Identified key areas of improvement. Created technical marketing content for the company blog and website.

Operations Architect, Proofpoint Inc., Sunnyvale, California
May 2014 to December 2017

Provided technical leadership in a VP-level role at the world leader in messaging security, working across Software Development and Service Reliability Engineering teams to design highly available public cloud, managed, and customer-premise products while reducing time to delivery and improving reliability and efficiency. Worked with Product and Engineering teams to plan Splunk integration strategies for Proofpoint products, including design and planning, marketing considerations and release strategy, prioritizing product integrations, and developing custom integrations for Fortune 100 customers. Grew the Operations team into a world-class, globally distributed organization. Enabled efficient and accurate deployment at scale. Set technical and organizational standards for service delivery via private data centers, public clouds, and on-premise infrastructure. Developed processes and standards for development, testing, and delivery of SaaS solutions to meet the needs of dozens of software engineering teams, including a containerization strategy, meeting security and compliance requirements, and maintaining customer data confidentiality while allowing continuous integration tools to have access to production-like datasets. Reduced customer-visible incident volumes by 40%.

Languages

English (Native)
Spanish (Intermediate)

Relevant Skills

Technologies: Incident Management, Kubernetes, Containerization, Security, IP Networking, Application Performance Observability, Kernel Performance, Systems Performance Tuning, Configuration Management, Log Management, Business Performance Observability

Server Operating Systems: Linux, macOS, FreeBSD, Solaris, and more

Programming/Scripting Languages: Crystal, Python, JavaScript, Perl, Ruby, Bash shell, C/C++, and more

Education

August 1991 to June 1996

Pennsylvania State University, Bachelor of Science candidate, Computer Science.