Engineer, Systems Reliability
The Systems Reliability Engineer (SRE) improves and protects the software and systems behind all of T-Mobile's IT services. This includes managing scalability, availability, latency, performance, security, and capacity, and delivering software faster, better, and cheaper. From designing and maintaining CI/CD pipelines to building the next generation of T-Mobile applications on cloud-native platforms, SREs enable a great customer experience and product innovation through continuous improvement of operational support.
What you'll do in your role.
Technology and System
- Demonstrates fluency in emerging DevOps-centric automation tools and technologies for CI/CD, configuration management, etc. across all environments.
- Creates, manages, and uses dashboards for continuous monitoring and health checks of applications and the underlying infrastructure, and uses the monitoring feedback to improve the quality of services in non-production environments.
- Contributes to future improvements of software delivery processes and operations, e.g., cloud enablement and the use of microservices with containerization.
- Monitors and oversees comprehensive data warehouse systems to balance optimization of data access with batch loading and resource utilization factors, according to customer requirements.
- Coordinates with auditors and stakeholders on SOX and CPNI compliance requests.
- Provides guidance to DevOps teams on compliance and security, including the associated processes, roles, and responsibilities for solutions and remediation.
The experience you'll bring.
- 2-4 years of relevant experience as a Systems Reliability Engineer with data warehousing platforms
- Experience with data analytics databases such as Teradata and Big Data platforms (Hadoop)
- Experience with data analytics tools, including data virtualization and data visualization tools such as Qlik, Denodo, Informatica, and Power BI
- Experience in one or more of C, C#, Java, Python, or Go, or scripting experience in Shell and Perl.
- Requires proficiency using Microsoft tools (including SharePoint Online, MS Teams, and Office products)
- Requires proficiency using Atlassian tools (including Confluence and JIRA)
- Experience with Continuous Integration/Continuous Delivery tools such as Jenkins and CloudBees, and with other automation tools.
- Experience with DevOps tools such as Ansible, Chef, and Puppet. Experience with Docker, Kubernetes, etc. is preferred.
- Experience with an APM tool such as AppDynamics and a logging tool such as Splunk.
- Experience working in a cloud environment (public/private).
- Experience with data security, SOX audits, and CPNI compliance
- Experience with managing SLAs and KPIs for multiple operational systems
- Experience with standardizing playbooks, operational processes and procedures
- Experience with Change Management and Release Management of data analytics solutions in an Agile DevOps environment.
- Minimum of two years working in an ITIL service or similar incident/problem management framework (strongly preferred)
- Bachelor's degree in Computer Science, Information Technology, or a related field