Site Reliability Engineer I

CROSSLINK Professional Tax Solutions - Florida

Job Details

Full-time
$78,000 - $105,800 a year
Posted 9 days ago

Qualifications

Jira
Computer science
Cloud infrastructure
Go
Kubernetes
Ansible
DevOps
Google Cloud Platform
Microservices
AWS
Docker
Bachelor's degree
SRE
Distributed systems
Terraform
Scripting
GitHub
Cloud computing
IT
Chef
Jenkins
Communication skills
Python
Shell Scripting

Full Job Description

Description:

A Site Reliability Engineer (SRE) is an advanced DevOps role that combines software engineering and systems administration to ensure the scalability, performance, and reliability of large-scale, cloud-based applications and infrastructure.

An SRE is responsible for keeping systems up and reliable by taking a proactive approach: detecting issues, handling failures automatically, preparing disaster recovery plans, repairing broken systems, and preventing them from causing future disruptions.

PRIMARY RESPONSIBILITIES

Ensure system reliability and availability

Monitor system issues.
Create strategies to detect issues.
Address those issues.
Design systems to troubleshoot automatically.
Write and review post-mortems.

Mitigate operational risks

Collaborate with development teams and other stakeholders to identify potential risks.
Once risks are identified, analyze and evaluate potential impact and likelihood of occurrence.
Based on the risk assessment, implement appropriate mitigation strategies.
Continuously monitor and review the effectiveness of risk strategies.

Monitor system health

Study historical performance trends using charts and graphs of system metrics.
Trace problems using system monitoring tools.
Monitor log files to manage infrastructure at scale.
Minimize the need for emergency response.

Maintain internal tooling

GitHub workflows
AWS
Jenkins
Jira

Other tasks as assigned

Requirements:

Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience).
Proven experience in designing, building, and operating large-scale distributed systems or cloud-based infrastructure.
Proficiency in scripting and programming languages such as Python, Go, or shell.
Deep understanding of networking and distributed systems.
Experience working in cloud computing environments (e.g., AWS, GCP).
Hands-on experience with containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
Strong knowledge of infrastructure as code principles and tools (e.g., Terraform, Ansible, Chef).
Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Excellent problem-solving skills and a proactive approach to troubleshooting complex issues.
Effective communication skills and the ability to collaborate with cross-functional teams in a fast-paced environment.

Must reside in one of the following states to qualify:

AR, AZ, CA, CO, FL, GA, ID, IL, KS, MN, SC, TN, WA
