About the role: The CX SRE team includes expert software and systems engineers who are custodians of the availability, scalability and performance of the SaaS products. We build tools and frameworks for monitoring and load testing, and sometimes build full platform features that other products use. We undertake architecture reviews and help individual product teams identify performance bottlenecks. We look at applications from a system perspective, bottom-up rather than top-down. Our engineers have the freedom to pick the challenges they work on and own each task through to completion.
Roles and Responsibilities:
● Strong software development skills in at least one technology (Python, Ruby, Java, etc.)
● Flag potential performance issues early, at the development stage
● Provide technical assistance and ensure performance SLAs are met
● Ability to derive and define NFR goals and requirements (SLIs, SLOs)
● Experience with APM/tracing tools, working collaboratively with development teams to enable quicker resolution of performance issues
● Ability to deep-dive and find the root cause of performance and scalability issues in production and non-production environments
● Independently undertake system-level debugging on Linux
● Mentor junior engineers in performance engineering and debugging
● Collaborate closely with the product engineering teams to share knowledge, tools and best practices in performance engineering and scalability
● Knowledge of infrastructure technologies such as load balancers, firewalls and database servers
● Broad technical and working knowledge of Kubernetes, Docker, Windows, Linux, Prometheus and Grafana
● Expert knowledge of telemetry configuration and alerting
● Introduce automation practices wherever applicable
● Experience building and scaling large-scale performance services
● Ensure the availability, scalability and resilience of our SaaS platform and services
Requirements:
● 7-12 years of experience in software design, development and architecture
● Experience with open source load testing tools such as Apache JMeter and httperf, and with APM and infrastructure monitoring tools such as Sumo Logic, New Relic and AppDynamics
● Working knowledge of at least one cloud platform (AWS, Azure or GCP)
● Automation development skills in at least one of Python, Go or Java
● Experience working with SQL and NoSQL databases
● Passionate about performance, reliability, scalability and resilience
● A self-starter, able to architect, build, drive and advocate for Snow SRE solutions
● Proven ability to work across multiple teams, multitask and prioritise effectively