Site Reliability Engineering Lead (SRE) 3158815
The Technology division partners with our business units and leading technology companies to redefine how we do business in ever more global and dynamic financial markets.
Our sizeable investment in technology results in leading-edge tools, software, and systems. Our insights, applications, and infrastructure give a competitive edge to clients’ businesses—and to our own.
Enterprise Technology & Services (ETS) delivers shared technology services for the Firm supporting all business applications and end users. ETS provides capabilities for all stages of the Firm’s software development lifecycle, enabling productive coding, functional and integration testing, application releases, and ongoing monitoring and support for over 3,000 production applications.
ETS also delivers all workplace technologies (desktop, mobile, voice, video, productivity, intranet/internet) in integrated configurations that boost the personal productivity of our employees. Application and end user services are delivered on a scalable, secure, and reliable infrastructure composed of seamlessly integrated datacenter, network, compute, cloud, storage, and database services.
The company’s Application Infrastructure organization is seeking a Site Reliability Engineering lead to establish a small team of SREs that will own reliability and operational efficiency for the portfolio of products that make up the Morgan Stanley Development Environment (MSDE). MSDE squads shape the SDLC within Morgan Stanley by implementing the tools, systems, and processes that 12,000+ developers in the Firm use for software development and deployment, with the aim of increasing release velocity.
Reporting to the Head of Site Reliability Engineering & Operations for Application Infrastructure, the role involves establishing SRE capabilities for the MSDE products, building a small global team that delivers reliable systems without wasteful operational effort, and understanding MSDE’s products front to back to maximize developer productivity across the entire Firm. Establishing SRE is a shared organizational goal.
This is a production-side, operational role that requires participation in an on-call rotation and strong influencing skills within a technical stakeholder group. The successful candidate might be a developer today looking to evolve site reliability engineering as a practice, an infrastructure engineer, or a strong operational lead with software development experience. Strong Linux troubleshooting skills and hands-on experience with Python to develop task automation are essential, as is an interest in building and motivating a team. Prior SaaS toolchain, Cloud, Docker, and Kubernetes experience will be an advantage as Morgan Stanley evolves towards these. However, a candidate with the right technical aptitude and potential can learn these skills on the job.
Prior experience in the financial industry is not required; candidates from software companies and other industries are welcome.
- Building and maintaining knowledge front to back of Morgan Stanley’s development environment
- Ensuring SRE & Ops team members are skilled to consult with clients (the Firm’s development community) to maximize productivity, including troubleshooting the issues they have using MSDE’s solutions
- Maximizing the availability and performance of supported systems through optimized and automated plant management, ongoing problem management, and architecture reviews with product/feature delivery peers
- Reducing the cost of support (hours of effort) by eliminating operational issues, optimizing and automating tasks, developing operational tools, and driving client self-service to minimize constraints
- Identifying and prioritizing technical debt that impacts client developer productivity, reliability, or the efficiency of the ops team
- Troubleshooting complex front-to-back development environment issues
- Minimizing the escalation rate to the dev-side team members to ensure the department has the greatest possible flow of feature delivery
- Removing friction from the operational onboarding of MSDE’s new solutions
- Being operationally responsive, including sharing on-call rotation with the rest of the global team (with a time-off in lieu system)
- 7 years of relevant experience
- Hands-on development experience in Python, including developing task automation
- Strong Linux troubleshooting skills
- Ability to organize and prioritize the work of staff
- Familiarity with the tools of the trade: experience with multiple SCM systems, code review systems, issue tracking tools, build tools, test frameworks, code quality tools, CI systems, and IDEs
- Prior experience supporting or developing tools for build automation, version control (Bitbucket, Github), issue tracking (Jira), continuous integration (Jenkins, Azure DevOps, Github Actions), automated testing, or deployment automation
- Prior experience in a client onboarding, client consulting or training role