Senior Cloud Service Reliability Engineer

Recruiter
PVH
Location
Bangalore, India
Posted
22 Sep 2022
Closes
29 Oct 2022
Ref
PVH1USR33589WDINTERNALENGLOBALEXTERNAL
Function
Technology
Hours
Full Time
POSITION SUMMARY:

The Cloud Service Reliability Engineer is responsible for the effective design, implementation, operation and maintenance of infrastructure on premises and in the cloud. The engineer contributes as a core team member to the design, development, testing, and support of data analytics systems. This position requires evaluation, implementation and management of software tools and practices to mitigate risks and introduce operational efficiencies.

PRIMARY RESPONSIBILITIES/ACCOUNTABILITIES OF THE JOB:
  • Ensure that our hybrid cloud environment, specifically Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure, meets requirements for redundancy, scalability, performance and security.
  • Expert-level knowledge of Microsoft technologies including Active Directory, Azure Active Directory, Office 365, and Windows Server
  • Work collaboratively with server, storage, network and applications teams to help set the directions on all current and future projects and processes related to hybrid cloud infrastructure
  • Provide hands-on technical expertise to design, deploy, secure and optimize cloud services
  • Familiarity with container solutions (Kubernetes, Docker, etc.)
  • Be proficient in one or more configuration management tools such as Puppet, Chef, Fabric, Ansible, or Salt
  • Working knowledge of infrastructure as code (IaC) tools such as Terraform or Ansible, with a demonstrated implementation.
  • Design and implement DevOps best practices, and establish standards and policies for managing source code and continuous integration/delivery using Jenkins and GitHub.
  • Manage multi-tenant infrastructure and data analytics systems built on technologies such as Hadoop, MapR, Informatica and other data-related technologies.
  • Collaborate with product managers, lead engineers and data scientists on all facets of the Big Data ecosystem.
  • In-depth understanding of networking, distributed systems, cloud design patterns, APIs, and security
  • Engage in service capacity planning and demand forecasting, performance analysis and system tuning.
  • Investigate, evaluate, test and recommend technical solutions for future systems.
  • Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations
  • Ability to work on complicated projects with multiple stages and convert long-term strategy into short- and long-term objectives
  • Participate in architecture reviews.

QUALIFICATIONS & EXPERIENCE:

Experience:
  • Minimum of 8 years of experience in site reliability engineering, Linux, Windows, DevOps, and maintaining infrastructure on premises and in cloud environments.
  • At least 3 years of experience managing a multi-tenant production Hadoop or other data analysis environment.

Education:
  • Bachelor's Degree in Information Technology
  • Cloud Systems Administrator or Developer certification considered a plus

Skills:
  • A deep understanding of operating systems and computer architecture
  • Well versed in DevOps and SRE practices
  • Strong knowledge and understanding of microservices-based architectures, APIs, etc.
  • Ability to write scripts from scratch using Python, Perl or Ruby
  • Strong analytical and troubleshooting skills
  • Experience with Splunk, SolarWinds and other operational monitoring tools
  • Highly collaborative with effective written and verbal communication skills
  • Ability to concentrate on a wide range of loosely defined complex situations, which require creativity and originality, where guidance and counsel may be unavailable.

Internal: This position will be required to work with technical resources through the leadership level (up to the VP level) of the Corporate Applications organization that is responsible for database and business analytics services. Must be able to liaise between the multiple organizations within the Infrastructure & Operations team, often coordinating project and BAU efforts across the systems, network and operations teams. Must be able to present issues and suggestions to Infrastructure & Operations management up to and including the SVP responsible for the area.

External: This position will be required to interact with multiple external service providers and companies, including managed services and support providers in the big data technology space. Responsible for coordinating activities around platform health, stability, architecture, design and disaster recovery. The position will also be required to engage external public cloud service providers and discuss cloud services, service levels, and the costs associated with their services. These interactions will span both technical and account management vendor resources.

______________________________________________________________

SUPERVISORY RESPONSIBILITIES:

Direct: N/A

Indirect: N/A

________________________________________________________________

BUDGETARY RESPONSIBILITIES:
  • Monitors and reviews technology and project budgets.

DECISION MAKING:
  • Recommend best methods for technical resolutions
  • Develop infrastructure standards
  • Recommend and evaluate vendors for projects
  • Delegate day-to-day problems and/or issues that may arise

RESOURCEFULNESS/CREATIVITY:

Enjoy technology. Ability to adapt as business needs evolve. Keep pace with the rapid change in the cloud computing space.

ENVIRONMENT:

Ability to work in a fast-paced environment where change is the norm.

This is a global role.