Responsibilities:
- Engage with product, architecture, development, certification, project management, and operations & infrastructure teams from the earliest phase of the SDLC.
- Become a subject matter expert for the assigned product verticals. Analyze complex systems from a reliability and resilience perspective.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Understand the end-to-end product topology from both infrastructure and application perspectives.
- Identify sources of instability in large-scale distributed systems and drive operational excellence. Dive deep into every issue that occurs and own it completely through to end-to-end closure.
- Perform functional analysis of products by gathering and analyzing metrics from both operating systems and applications to support performance tuning and fault finding, including integration and operational challenges.
- Fix code bugs in production and recommend architectural improvements identified during issue and incident analysis.
- Work closely with development and product teams to suggest new features and enhancements based on live issues.
- Reduce toil through tooling and automation to improve operational efficiency and deliver a smoother customer experience.
- Provide technical consultancy for monitoring, incident management, and problem management. Lead technical bridges and interact with both technical staff and management during the incident and change management processes.
- Engage regularly with technical and non-technical partners to analyze solutions in depth from both functional and technical perspectives.
- Understand new changes to production systems and assess their risk from an application perspective to drive reliability and availability.
- Provide guidance and technical expertise to junior team members.
Preferred Qualifications:
- 5+ years of experience with Java applications, SQL/NoSQL databases, and handling production operations.
- Experience with log analysis and observability tools such as Grafana, Prometheus, and Splunk.
- Excellent communication, collaboration, and problem-solving skills.
- Knowledge of Docker/Kubernetes, caching, Kafka, etc.
Preferred Requirements:
- Strong work ethic, leadership skills, excellent judgment, good time management for prioritizing work, and the ability to work in a fast-paced, team-oriented environment.
- Strong technical background with full-stack application knowledge and hands-on experience with REST technologies.
- Knowledge of Java or related technologies, which aids in fixing code bugs, building platform enhancements, better understanding the supported products, and resolving integration-related issues.
- Excellent understanding of systems and product architecture, from application components to infrastructure such as networks, load balancers, firewalls, and gateway services.