The Role:
The Senior Cloud Database Operations Engineer is responsible for the design, deployment, optimization, and day-to-day operational management of the company's mission-critical cloud database infrastructure. This role focuses on ensuring financial-grade data reliability, high availability, and performance across AWS-managed and self-managed database services. You will serve as the database domain expert within the operations team, driving database automation, disaster recovery strategies, and data security compliance initiatives.
Job Description:
- Cloud Database Management: Lead the deployment, configuration, and lifecycle management of AWS-managed database services, including Amazon RDS (MySQL, PostgreSQL, Oracle), Amazon Aurora, Amazon DynamoDB, Amazon ElastiCache (Redis/Memcached), and Amazon Redshift.
- High Availability & Disaster Recovery: Design and maintain database HA architectures (Multi-AZ, Read Replicas, Global Databases) and DR strategies (cross-region replication, automated failover) to meet financial-grade RPO/RTO SLA requirements.
- Performance Tuning & Optimization: Conduct proactive database performance monitoring, slow query analysis, index optimization, and capacity planning to ensure optimal throughput and low-latency response for core business systems.
- Backup & Recovery: Establish and maintain robust backup policies (automated snapshots, point-in-time recovery, cross-region backup), and regularly conduct recovery drills to validate data integrity and recoverability.
- Database Automation & Tooling: Design and develop internal database automation tools and scripts (using Python, Shell, or Golang) for routine operations such as automated provisioning, schema migration, parameter tuning, health checks, and alerting integration.
- Database Security & Compliance: Manage database access controls, encryption at rest and in transit (KMS, SSL/TLS), audit logging, and data masking to meet strict financial security auditing and regulatory compliance standards (e.g., MAS TRM).
- Migration & Upgrades: Plan and execute database migration projects, including on-premises to cloud (using AWS DMS, SCT), cross-engine migrations, and major version upgrades with minimal downtime.
- Monitoring & Alerting: Build and maintain comprehensive database observability dashboards using CloudWatch, Prometheus, and Grafana, and integrate them with centralized alerting platforms for proactive incident detection.
- Incident Management: Lead database-related incident troubleshooting (replication lag, connection storms, deadlocks, storage issues), perform root cause analysis, and participate in on-call rotations to ensure 24/7 database uptime.
- Documentation & Knowledge Sharing: Maintain up-to-date database runbooks, SOPs, and architecture documentation; mentor junior team members on database best practices.
Job Requirements:
- Education: Bachelor's degree in Computer Science, Software Engineering, Information Systems, or a related technical field.
- Experience: 3-5 years of solid experience in Database Administration (DBA) or Database Operations roles, with at least 2 years focused on cloud database environments.
- AWS Database Proficiency: Expert knowledge of the AWS database ecosystem, including Amazon RDS (MySQL/PostgreSQL/Oracle), Aurora, DynamoDB, ElastiCache, Redshift, DMS, and related services (KMS, CloudWatch, Secrets Manager).
- RDBMS Expertise: Deep hands-on experience with at least two of the following: MySQL, PostgreSQL, Oracle, or SQL Server, including performance tuning, replication, partitioning, and high-availability configuration.
- NoSQL & Caching: Practical experience with NoSQL databases (DynamoDB, MongoDB) and in-memory caching systems (Redis, Memcached) in production environments.
- Programming & Scripting: Proficient in Python and Shell scripting for database automation; familiarity with Golang is a plus.
- Infrastructure as Code: Experience with Terraform or CloudFormation for database resource provisioning and management.
- Observability: Familiar with database monitoring and alerting using CloudWatch, Prometheus, Grafana, and log analysis tools (CloudWatch Logs, ELK, or Loki).
- Networking Fundamentals: Solid understanding of VPC, Security Groups, NACLs, and network connectivity as they relate to database access and security.
- Industry Context: Prior experience in Financial Services (FinTech, Banking, or Payments) is highly preferred; familiarity with financial regulatory requirements for data management is a strong advantage.
- Certifications (Preferred): AWS Certified Database - Specialty, AWS Solutions Architect - Associate, or equivalent cloud database certifications.
- Soft Skills: Strong analytical and problem-solving abilities; proven ability to work under pressure with tight timelines; excellent communication skills for cross-team collaboration; proactive team player with strong professional ethics.