ALTA IT Services is a wholly owned subsidiary of System One, a leading provider of specialized workforce solutions and integrated services. ALTA is an established leader in IT Staffing and Services for both government and commercial enterprises across the United States, specializing in Program & Project Management, Application Development, Cybersecurity, Data & Advanced Analytics, and Agile Transformation Services.
Major Incident Manager
100% Remote
Shift Details:
– Night Shift (6:00 PM – 6:00 AM)
Rotating Two-Week Schedules:
– Week 1 – Work: Mon-Tues, Off: Wed-Thurs, Work: Fri-Sun
– Week 2 – Off: Mon-Tues, Work: Wed-Thurs, Off: Fri-Sun
Weekends and Holidays:
– Contractors will be expected to work every other weekend and on some holidays
Job Description
Major Incident Management is responsible for driving the coordination and recovery efforts of major outages at the client. When issues impact the client's services or systems, major outages may occur, resulting in serious interruptions to business and member activities. The Major Incident Management team operates 24×7 to ensure that impacted services are restored as efficiently and effectively as possible. The team actively monitors systems and services, documents recovery efforts and maintains their timelines, manages and coordinates various support team activities, and notifies business units of potential impacts and ongoing recovery efforts. The team is also responsible for providing continual process improvement suggestions for the major incident management service, and for monitoring weekend change activities and military pay days.
Major Responsibilities
Monitors Service Desk ticket queues, system alerts, and escalation methods to identify possible trends or outages
Serves as the main point of contact for all incident and service issue escalations directed to the Major Incident Management team
Ensures that incident management processes are efficiently and effectively followed
Determines the impact and priority of incidents based on affected customers and/or business units
Communicates operational issues to respective IT management, support teams, and incident communication managers
Provides outage notification and recovery effort updates to business units via the Status Page
Brings various support teams and resources onto major incident bridges
Manages and coordinates troubleshooting and recovery efforts between support teams and vendors
Ensures continuous collaboration with IT Operations Management and other areas or teams
Documents initial issues, recovery activities, and resolution steps taken via MIM timelines
Ensures prompt resolution and coordination of incident management activities during recovery efforts
Updates and validates outage information in availability management tools for reporting and tracking purposes
Makes recommendations, proposals, and suggestions for improvement within the service to reduce severity and frequency of incidents
Attends Post-Incident Review Meetings, or reviews the meeting notes once the meetings conclude, to ensure compliance with service improvement initiatives
Attends and participates in TCABs (technical change advisory board meetings) to review, discuss, and approve or reject upcoming changes or releases to the environment
Coordinates, communicates, and manages Sunday Maintenance Windows for weekend scheduled activities
Works with Problem Management and Change Management to resolve incidents
Coordinates, communicates, and manages Military Pay Bridge activities
Prepares operational status reports for IT Operations Management
Updates and publishes Morning Reports
Required Qualifications
Bachelor's degree in a related field, or the equivalent combination of education, training, and/or experience
Extensive IT experience that demonstrates knowledge of hardware and infrastructure protocols used to provide services to customers
Extensive IT experience in at least one of the following areas: mainframe, networking, middleware (WebSphere), or Azure
Prior experience leading incident bridge calls from initial triage to guiding recovery efforts, maintaining a timeline and ensuring that service is restored as quickly as possible
Experience in leading or supervising an IT team
Demonstrated ability to lead others in a challenging and fast-paced large enterprise environment
Strong research, analytical, and problem-solving skills
Strong planning, organizational, and multitasking skills
Demonstrated ability in exercising initiative to produce desired results and achieve objectives
Ability to effectively interface with various levels of employees, management, and vendors
Excellent interpersonal, verbal, and written communication skills
Practical incident management work experience
Desired Qualifications
ITIL v3 or v4 Foundations Certificate
CCNA / Networking Training and Certificates
Middleware Training and Certificates
Azure Training and Certificates
System One and its divisions and subsidiaries, including Joulé, ALTA IT Services, CM Access, and MOUNTAIN, LTD., are leaders in delivering workforce solutions and integrated services across North America. We help clients get work done more efficiently and economically, without compromising quality. System One not only serves as a valued partner for our clients but also offers eligible full-time employees health and welfare benefits coverage options, including medical, dental, vision, spending accounts, life insurance, and voluntary plans, as well as participation in a 401(k) plan.
System One is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, age, national origin, disability, family care or medical leave status, genetic information, veteran status, marital status, or any other characteristic protected by applicable federal, state, or local law.