Here's your chance to join a diverse global team and play a central role in building the next generation of Internet Services that touch hundreds of millions of users across the globe every day.
Our client's Cloud Platform Department is building very-large-scale, high-availability platforms that power the company's entire range of Internet Services worldwide.
What you will be doing as a Data Reliability Engineer:
Continuously re-evaluate the existing architecture, infrastructure, and processes, and take action to improve them.
Guide the team toward new technologies and best practices.
Develop new features for, and maintain, the operations tools and configuration management system.
Automate operations for the existing platform.
Handle incidents and troubleshoot issues as a member of the 24x7 team.
Work with team members located in different time zones.
Qualifications (the skills you need):
- 3+ years of experience managing Cassandra or Couchbase.
- 3+ years of experience in Linux system administration.
- 2+ years of experience writing Chef cookbooks.
- 2+ years of experience using Prometheus, Grafana, and Kibana.
- 3+ years of experience using public cloud infrastructure.
- Deep understanding of networking protocols (TCP/IP, SSH, DHCP, HTTP, HTTPS, DNS, and gossip protocols), packet structure, and load-balancing equipment.
- Preferred: experience writing Groovy scripts and using Jenkins.
- Excellent written and verbal communication skills.
- A strong drive to automate everything.
- A strong eagerness to learn new technologies.
- Ability to work effectively with team members in different time zones.
※ Applicants must reside in the Tokyo area and be authorized to work in Japan.