Senior Splunk Engineer

HCL TechBee

Singapore

8-10 Years

Posted 16 hours ago

Job Description

Job Summary

We are looking for a highly skilled Senior Splunk Engineer to design, implement, and manage enterprise-scale SIEM and observability solutions. This role will focus on enhancing system visibility, ensuring platform reliability, and supporting security and compliance requirements within a regulated environment. The ideal candidate will have strong expertise in Splunk, cloud platforms, and SRE practices, along with the ability to troubleshoot complex issues and drive continuous improvements.

Responsibilities

Design, implement, and maintain Splunk-based SIEM and observability platforms.
Develop and optimize log ingestion, parsing, correlation searches, dashboards, and alerts.
Integrate Splunk with cloud platforms (AWS, Azure) and enterprise tools such as ServiceNow and Datadog.
Define and implement monitoring strategies, including SLIs/SLOs, service health models, and alerting frameworks.
Perform incident investigation, troubleshooting, and root cause analysis (RCA) for system and application issues.
Build and implement automation and auto-remediation solutions using Terraform, Ansible, and Python.
Support CI/CD pipelines for Splunk configurations and infrastructure deployments.
Ensure adherence to security, compliance, and regulatory standards, particularly within financial services environments.
Collaborate with cross-functional teams (Infrastructure, Security, DevOps, and Application teams) to improve observability and reliability.
Drive continuous improvement initiatives and adopt SRE best practices.

Requirements

Bachelor's degree in Computer Science, Engineering, or a related discipline.
Minimum 8 years of experience in Infrastructure, Cloud, or SRE roles, with at least 5 years specializing in Splunk/SIEM engineering or observability.
Strong hands-on expertise in:
    SIEM Platforms: Splunk (mandatory), Elastic (ELK Stack)
    Automation & IaC: Terraform, Ansible, Python, CI/CD tools
    Cloud Platforms & Integrations:
        AWS (CloudWatch, X-Ray, CloudTrail)
        Azure (Monitor, Log Analytics, Application Insights)
        Datadog, ServiceNow
Deep understanding of SRE principles, including service health modeling, SLIs/SLOs, error budgets, and auto-remediation.
Strong analytical and troubleshooting skills, with experience in deep-dive investigations and in designing long-term solutions.
Familiarity with financial sector operational resilience, regulatory compliance, and incident governance frameworks.
Excellent written and verbal communication skills.
Strong interpersonal skills with the ability to engage and collaborate with diverse stakeholders.
Agile mindset with the ability to learn quickly and adapt to changing environments.