Site Reliability Engineer
We are looking for an experienced Site Reliability Engineer for a long-term role on a project in Lisbon or Porto. This person will help build, support, and manage the activities needed to implement best practices.
This person will:
- Participate in the solution definition to ensure its operability;
- Ensure the solution's resilience, acting as a single point of contact (SPOC) within the team:
  - Collaborate in the definition of performance tests;
  - Participate in the definition of resilience tests;
- Ensure the solution's observability:
  - Define monitoring requirements (e.g. log types);
  - Validate performance metrics and monitoring KPIs;
- Challenge the best practices for the CI/CD solution and its evolution;
- Work with stakeholders to fully understand and communicate root cause analyses and to implement the lessons learnt;
- Review monitoring KPIs and logging efficiency, and introduce new tools that make the solution more reliable;
- Drive initiatives to make the solution (and all its components) more reliable – that is, less prone to cause support tickets;
- Work with developers during the software development lifecycle to ensure that developed services are operationalized.
What are we looking for?
- Familiarity with DevOps culture;
- Experience with application reliability practices for internal and client-facing experiences;
- Experience with Environments & Infrastructure (Unix/Linux);
- Experience with Cloud (AWS, Oracle, Azure);
- Experience with Containers (Docker, Kubernetes);
- Experience in business/technical assessments of solution life cycle and asset management processes.
Personal traits:
- Ability to adapt to different contexts, teams, and clients;
- Teamwork skills combined with a sense of autonomy;
- Motivation for international projects and openness to travel;
- Willingness to collaborate with other players;
- Strong communication skills.