
Smart IMS Inc.

Service Reliability Engineer (Production Support, Banking Domain)

7-9 Years
  • Posted a month ago

Job Description

Role Purpose

The Service Reliability Engineer is responsible for the stability, availability, and support of production systems across multiple business units.

This role owns application support, incident management, vendor coordination, and service reliability, ensuring critical platforms remain operational during market hours and beyond.

The role is hands-on, business-facing, and focused on preventing incidents, managing live issues effectively, and improving operational resilience.

Key Responsibilities

Service Reliability & Application Support

  • Own day-to-day production support for business-critical applications.
  • Ensure systems are stable, available, and performant, especially during market hours.
  • Act as the primary point of contact for production issues impacting the business.
  • Coordinate support coverage, including extended hours where required.

Incident & Problem Management

  • Lead major incidents end to end, including coordination, communication, and resolution.
  • Run war rooms and ensure timely updates to stakeholders.
  • Conduct post-incident reviews, identify root causes, and drive permanent fixes.
  • Track recurring issues and work with technology teams to reduce repeat incidents.

Operational Readiness

  • Own and coordinate daily system health checks before market open.
  • Ensure operational readiness for key events, releases, and business changes.
  • Support go/no-go decisions for releases and production changes.

Vendor & SaaS Management

  • Manage incidents involving third-party vendors and SaaS platforms.
  • Track SLAs, response times, and vendor performance.
  • Challenge vendors on gaps in their monitoring, alerting, and communication.
  • Drive corrective actions and follow-ups with vendors.

Change & Stability Improvement

  • Partner with delivery and engineering teams to ensure changes do not compromise stability.
  • Identify and recommend improvements in monitoring, alerting, and support processes.
  • Help establish clear runbooks, escalation paths, and support procedures.

Stakeholder Communication

  • Communicate clearly with business, operations, compliance, and senior management.
  • Provide executive-level incident summaries and status updates.
  • Act as a calm and trusted presence during high-pressure situations.

Qualifications & Experience

Essential

  • 7+ years of experience in production support, incident management, or IT operations.
  • Strong experience supporting business-critical applications in financial services, trading, or regulated environments.
  • Proven ability to manage major incidents and high-pressure situations.
  • Experience working with vendors and SaaS platforms.
  • Strong communication skills, with the ability to engage business and senior stakeholders.
  • Good understanding of ITIL / ITSM concepts (formal certification not mandatory).

Desirable

  • Experience in securities brokerage, trading platforms, or market-facing systems.
  • Exposure to problem management, change management, or service reliability roles.
  • Understanding of regulatory and audit expectations in financial services.
  • Experience with monitoring tools, incident tracking systems, and runbooks.

Key Attributes

  • Calm under pressure
  • Strong sense of ownership and accountability
  • Practical and outcome-focused
  • Comfortable being hands-on when required
  • Respected by both technical and business teams

Please share your resume at [Confidential Information]



Job ID: 142596243