Site Reliability Engineer (Cloud Engineering)

Location: Anywhere in North America (Eastern time zone preferred)

At Network to Code, our dedication to pioneering network automation technologies sets us apart from the rest. We don’t just keep up with trends; we define them. Our innovative solutions revolutionize the way organizations deploy, manage, and utilize their networks.

Through a combination of managed and professional services, we implement data-driven network automation strategies grounded in NetDevOps principles. This approach enhances reliability, boosts efficiency, fortifies security, and slashes costs for our clients.

As proud sponsors of Nautobot, the premier open source Network Source of Truth and Automation platform, we’re not just contributing to the industry; we’re leading it. Our efforts haven’t gone unnoticed. We’ve been honored as an Inc. Best Workplace and featured in the prestigious Inc. 5000 list. Additionally, our groundbreaking work has earned recognition in multiple Gartner reports, solidifying our position as trailblazers in the field of network automation.

As a Site Reliability Engineer (SRE) on the Nautobot Cloud Engineering team, you will help deliver and maintain our managed Nautobot SaaS offering. Your primary focus will be operating, supporting, and evolving customer environments in AWS—especially EKS, EC2, and related services—while ensuring uptime, performance, and security. You will also handle occasional escalations for legacy customers running on AKS or on-premises deployments.

This role combines operational excellence with a mindset for continuous improvement. You will work across infrastructure, CI/CD pipelines, and observability tooling, applying DevOps best practices to deliver a reliable, scalable, and secure platform for our customers.

A day in the life

Operate and support Nautobot Cloud deployments in AWS, including EKS, EC2, RDS, and associated services.
Use Jira to manage operational and project-related tasks, track incidents, and document changes.
Support resolution of escalated issues for customers on other Kubernetes platforms, including AKS or on-premises deployments, as needed.
Deploy and update Nautobot instances using Helm charts, Kubernetes manifests, and automation workflows.
Automate improvements to CI/CD pipelines (GitHub Actions, Terraform, Ansible) for provisioning, upgrades, and configuration management.
Maintain observability tools (Prometheus, Loki, Grafana) to ensure accurate monitoring, alerting, and logging.
Troubleshoot application and infrastructure issues across containerized environments.
Collaborate with engineers across Cloud Operations, Nautobot Core, and Nautobot Apps teams to deliver cross-functional solutions.
Contribute to documentation for operational runbooks, troubleshooting guides, and architecture diagrams.
Participate in Agile ceremonies, including standups and retrospectives.

What you bring

Passion for reliability, customer success, and operational excellence.
Ability to troubleshoot complex distributed systems and quickly identify root causes.
Strong communication skills—able to clearly convey technical concepts to both peers and customers.
A proactive mindset, looking for opportunities to improve processes and prevent issues before they occur.
Flexibility to adapt to changing priorities and technologies.

What you have

3–5 years of experience applying DevOps or SRE practices to production systems.
2+ years of experience operating workloads in AWS, with a focus on EKS, EC2, IAM, and networking.
2+ years working with Kubernetes (preferably in production) and Helm.
Experience with IaC tools such as Terraform and configuration management tools like Ansible.
Familiarity with CI/CD pipelines (GitHub Actions, Jenkins, CircleCI, etc.).
Proficiency in scripting languages such as Python or Bash.
Comfortable working in Linux-based environments.
Familiarity with monitoring, logging, and alerting solutions (Prometheus, Loki, Grafana, Datadog, ELK).
Skilled in using Jira to manage operational tasks, incident response, sprint planning, and project tracking. Experience with similar ticketing systems is also a plus.
Analytical and troubleshooting skills, including the use of k9s for real-time Kubernetes management and Terraform for diagnosing and resolving Infrastructure-as-Code deployment issues. Prior experience with these tools is a plus.
Knowledge of networking fundamentals (equivalent to CCNA-level understanding) is a plus.

SUBMIT RESUME

Why Us

At Network to Code, you won’t just work with the brightest minds in network automation—you’ll grow alongside them. We’re a team of builders, innovators, and mentors who believe in pushing the boundaries of what’s possible while lifting each other up along the way.

Our culture is rooted in collaboration, inclusivity, and shared purpose. With a remote-first mindset and a globally distributed team, we stay closely connected through virtual events, team-building sessions, and day-to-day collaboration—so no matter where you are, you’ll feel part of something bigger.

We’re proud to foster an environment where every voice is heard, every idea is valued, and every individual can thrive, whether you’re a seasoned expert or a rising star.

We’ve been recognized as an Inc. Best Workplace and named to the Inc. 5000 list of fastest-growing private companies—proof that we’re not only building cutting-edge solutions, but also a company where people love to work. Our impact has also been acknowledged in multiple Gartner reports, highlighting our leadership and innovation in the field.

Network to Code is proud to be an Equal Opportunity Employer dedicated to celebrating diversity and promoting a culture of inclusivity. We believe in equal opportunities for all employees and qualified applicants irrespective of race, religion, gender identity, sexual orientation, age, disability, national origin, genetic information, veteran status, or any other characteristic protected by law. In addition to federal law requirements, Network to Code is committed to complying with applicable state and local laws governing nondiscrimination in employment.

At Network to Code, we hire individuals who demonstrate aligned talent, merit, experience, and the potential to contribute significantly to our business’s success.

Our compensation packages are designed to be competitive, equitable, and tailored to each employee’s unique experience and qualifications. In addition to a base salary range, we offer discretionary bonuses, option grants, and a comprehensive benefits package. We believe in transparency and aim to ensure that our compensation structures reflect the value our employees bring to the organization and our customers.

Featured benefits

Medical insurance, Vision insurance, Dental insurance, 401(k), Paid maternity leave, Paid paternity leave, Tuition assistance

Author

Parth Shah
