Basic Information
Ref Number
Last day to apply
Primary Location
Country
Job Type
Work Style
Description and Requirements
- Bachelor’s degree in Information Technology, Computer Science, or Computer Engineering
- Experience with cloud platforms
- Linux/Windows skills (using the command line, PowerShell, Bash shell, etc.)
- Experience with scripting and automation tools such as Bash and Python
- Ability to work effectively as part of a team
- Excellent problem-solving skills
- Proficiency in Kubernetes and containerized workloads
- Experience with writing automated performance and/or disaster recovery tests
- Strong understanding of networking concepts, including TCP/IP, DNS, and HTTP
- Ability to adapt to new technologies and changing business needs
- Cloud Provider: Azure
- Database: Microsoft SQL Server
- Message Broker: RabbitMQ
- Applications: .NET, Docker, Kubernetes
- Monitoring: Grafana, Prometheus, Loki, Tempo
Additional Job Description
- Collaborate with software engineering teams to learn and document how their applications are designed to run in production, in order to identify points of failure and bottlenecks
- Establish and implement monitoring requirements and incident response procedures for both system and application level availability and performance
- Assemble, create, and maintain monitoring applications and resources
- Design and implement automated performance and disaster recovery tests to proactively identify issues
- Troubleshoot incidents, identify root causes, fix and document problems, and implement preventive measures
- Define and document standard runbooks and operating procedures
- Work on-call shifts to remediate issues before SLAs are violated
- Actively facilitate continuous improvement by staying current with industry trends to advance our processes and workflows
EEO Statement