Team:
Our Tech & Product team is defining the next generation of trusted enterprise computing in the cloud. We're a fast-paced, agile, and innovative team. We're highly collaborative and work across all areas of our technology stack. We enable critical services for the business, qualify complex compute changes, and trail-blaze new engineering solutions for the cloud.
Responsibilities:
- Balance live-site management, feature delivery, and retirement of technical debt across an entire development team.
- Develop tools/products for service deployments, system configuration management, monitoring, diagnostics, and performance measurement
- Support homegrown self-serve services and software design solutions
- Review and manage version-controlled deployment technologies
- An unwavering love of shipping software.
- Excited about building reliable, self-healing services on unreliable hardware.
- Design, deploy, and continuously improve important infrastructure services
- Provide input into long-range platform requirements and operational guidelines, with a focus on automation and continuous improvement of Platform Service availability.
- Analyze how stakeholders use the platform and use those findings to help drive continuous improvement of the offering.
- Ensure that we are continuously raising our standard of engineering excellence by implementing best practices for coding, testing, code coverage and deployment
- Scope work, manage the backlog, and keep project delivery predictable.
- Share daily operational responsibilities with the team
Required Skills:
- BS or MS in Computer Science or equivalent experience.
- Strong CS fundamentals including data structures, algorithms, and distributed systems.
- You care about code simplicity and performance
- Minimum of 5-8 years of industry experience designing, building, and supporting scalable, available, low-latency distributed systems.
- Understanding of object-oriented programming and concepts (Java, C++, C#, Python)
- Experience designing, developing, debugging, and operating resilient distributed systems that run across thousands of compute nodes in multiple data centres.
- Fluency in one or more scripting languages such as Python or Ruby.
- Ability to solve challenging technical problems related to security, parallel and distributed systems, programming, resource management, large-scale system maintenance, and more!
- Experience with AWS, GCP, or another cloud PaaS provider.
- Solid understanding of how to configure, deploy, manage and maintain large cloud-hosted systems including auto-scaling, monitoring, performance tuning, troubleshooting and disaster recovery.
- Proficiency with source control, continuous integration, and testing pipelines.
- Being a great listener, collaborator, communicator, and mentor.
- Championing a culture and work environment that promotes diversity and inclusion
- A strong background in open source technology.
- Experience using telemetry and metrics to drive operational excellence
- Experience with building APIs and services using REST, SOAP, etc.
- Experience in the development of distributed/scalable systems and high-volume transaction applications
- Knowledge of professional software engineering and best practices for the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience with building services on top of relational and non-relational stores like SQL Server, MySQL, PostgreSQL, Cassandra, or MongoDB
Preferred Skills:
- Experience with containers and orchestration services such as Docker and Kubernetes.
- Experience with HBase, Hadoop, and large-scale big data systems.
- Knowledge of cloud security and best practices.
- Deep understanding of fundamental network technologies like DNS, load balancing, SSL, TCP/IP, and HTTP.