DBS Bank

SVP, Site Reliability Engineering Domain Lead, SRE & Governance, Group Technology

10-15 Years

Job Description

Roles & Responsibilities

  • Manage a large team of Production Support Personnel across multiple geographical locations covering Applications and Infrastructure
  • Proactively manage SLAs on Alerts and Incidents (Application & Infra) and reduce Mean Time To Recover (MTTR) by 20%
  • Ensure strict adherence to Standard Operating Procedures for recovery across Application and Infrastructure layers
  • Deliver a playbook for onboarding new tasks/activities covering both Application and Infrastructure support models
  • Identify opportunities to automate Production support activities (App & Infra) and reduce manual interventions
  • Drive application and infrastructure improvements including performance, capacity, resilience, and operational stability; eliminate toil through automation
  • Automate manual activities/processes and system health checks for Production Applications and Infrastructure; ensure SLIs/SLOs are defined and met
  • Follow Production Support processes and provide input to continuously strengthen them for App & Infra operations
  • Provide status updates to leads and stakeholders, and work with vendors to review Infra/Application design, fixes, and production deployments
  • Coordinate the resolution of recurring issues and ensure long-term fixes through robust Incident and Problem Management across Infra and Application domains
  • Work with Infrastructure, Development, and Platform teams for root cause analysis of complex issues and outages
  • Drive strong stakeholder management with focus on service stability, continuous improvement, and delivery excellence across Infra and Applications
  • Lead Root Cause Analysis with technology partners and facilitate RCA reviews after incident resolution
  • Work with Risk teams to respond to Audit & Risk RFIs; manage audit walkthroughs covering Infrastructure and Application controls

Requirements

  • 10–15 years of experience in Banking, including at least 5 years in a Run-the-Bank (RTB) Lead role covering Application and Infrastructure Support
  • Strong track record of implementing Site Reliability Engineering (SRE) principles across Applications and Infrastructure, including performance, reliability, monitoring, alerting, and maintenance
  • Experience in proactive capacity monitoring and observability of Production Infrastructure (compute, storage, network, platform, MF and DB) with automated alerting and reporting
  • Proven experience in automation of Infra & Application support tasks and reducing manual toil
  • Ability to build and maintain monitoring and automation solutions for Infrastructure and Application stacks
  • Track record of driving service improvements by tracking SLIs/SLOs/SLAs and improving system and infrastructure performance KPIs
  • Strong technical understanding across RDBMS, Unix/Linux, Cloud platforms, and Infrastructure components (servers, network, middleware, containers)
  • Hands-on knowledge of infrastructure technologies, particularly Linux, databases, and OpenShift (or other container platforms)
  • Solid understanding of BAU support, Incident/Problem Management, and escalation management across distributed Infra-App environments
  • Good understanding of Infrastructure architecture, capacity planning, DR/BCP, IT security, and regulatory compliance
  • Strong collaborator with experience working across global teams and vendors
  • Ability to present recommendations effectively in both written and verbal formats
  • Proactive, independent, resourceful, and team-oriented mindset

Location:

DBS Asia Hub

Job:

Technology

Schedule:

Regular

Employee Status:

Full time

Job ID: 148963017