Job Summary

NOC Incident Management Analyst

  • Location:
    Los Angeles, California
  • Job reference:
  • Category:
    System Administrator
  • Contract Type:

Our client, located in Santa Monica, CA, has an opportunity for a Network Operations Center (NOC) Incident Management Analyst to join their team.
Candidates with a background in DevOps or a service reliability engineering environment are highly preferred. Ideally, you will also be familiar with Python or a similar scripting language.
As an Operations Center Incident Management Analyst, you will need technical expertise and strong problem-solving skills to manage, triage, and maintain the necessary infrastructure for our platform. Candidates should be able to help develop the policies and procedures needed to operate efficiently and effectively, while providing technical support to stakeholders.
The Operations Center Incident Management Analyst will help establish, and be responsible for adherence to, policies, procedures, support coordination, and all other functions relating to the management and administration of the data centers and cloud operations. The right person for this role has 3+ years’ experience operating at scale, is a fantastic communicator, and stays cool under pressure.
If you are someone who believes operational excellence is just as important as awesome features, then this is a great role for you.  
• Monitor, detect, communicate, and handle all operations-related service disruptions.
• Analyze and act upon data to make informed decisions and identify potential problems and risks.
• Ensure timely declaration and escalation of unplanned/planned service disruptions, including the associated data collection, reporting, and tracking of follow-ups.
• Establish incident timelines with technology operations and development teams so that incidents are recorded accurately for measuring Mean Time to Detect, Escalate, and Resolve.
• Work with Service Reliability Engineers to help ensure service availability levels are met and adverse impacts are kept to a minimum.
• Proactively communicate with stakeholders on potential customer-impacting issues (e.g. network issues, site performance, service disruptions, planned maintenance, equipment failures).
• Help ensure we’re always pursuing better ways of working by maintaining training manuals, policies, checklists, and incident management procedures.
• 24x7 on-call operations support, with a willingness to cover shift gaps if necessary.
• 3 years’ experience working in a high-traffic, 24x7, mission-critical operations environment.
• A high degree of technical knowledge to understand the environment and provide clear updates to the teams.
• Excellent communication, follow-up, and conflict resolution abilities.
• ITIL v3 certification is preferred.
• Experience with administering and tuning application/system monitoring and management tools (e.g. Graphite, Grafana, New Relic, PagerDuty, Nagios, SolarWinds, and Splunk) is required.
• Knowledge of incident and change management controls and processes.
• Experience with compliance activities associated with SOX and PCI DSS.
• Experience in a DevOps or service reliability engineering environment.
• Familiarity with Python scripting.
• Experience in broadcast operations and CDN technology.



All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.

Equal employment opportunity information:
EEO is the Law (poster) | EEO is the Law (poster supplement) | Reaffirmation of Affirmative Action Policy Statement