Job Description
- Strong Site Reliability Engineering (SRE) background with 5 to 8+ years of hands-on experience
- Experience working on Linux-based infrastructure and prior experience managing infrastructure with VMs and non-VMs
- Monitoring logs with Splunk or similar tools, maintaining CX and KPIs, and performance optimization
- Hands-on scripting experience with Bash or Python
- Hands-on experience with Ansible/Spinnaker for configuration management, including prior experience managing Ansible playbooks
- Good knowledge of AWS cloud infrastructure and its services in an enterprise setup, and orchestration with Kubernetes
- System / Software Engineering Background – Production code changes, alerting and monitoring, complex troubleshooting, implementing changes
- Defining and setting development, test, release, update, and support processes for DevOps operations
- Troubleshooting Support Escalation – Understanding critical issues well enough to route support escalation incidents to the relevant teams
- Process Optimization – Increase reliability and performance through process optimization
- Experience working with enterprise-grade Java or Python applications; any experience with Kafka is useful for this role (nice to have)
- Awareness of critical Agile principles