Sr Site Reliability Engineer
Who we are:
The Trade Desk is a global technology company with a mission to create a better, more open internet for everyone through principled, intelligent advertising.
Handling over 1 trillion queries per day, our platform operates at an unprecedented scale. We have also built something even stronger and more valuable: an award-winning culture based on trust, ownership, empathy, and collaboration. We value the unique experiences and perspectives that each person brings to The Trade Desk, and we are committed to fostering inclusive spaces where everyone can bring their authentic selves to work every day.
Do you have a passion for solving hard problems at scale? Are you eager to join a dynamic, globally connected team where your contributions will make a meaningful difference in building a better media ecosystem? Come and see why Fortune magazine consistently ranks The Trade Desk among the best small- to medium-sized workplaces globally.
About the team:
The Trade Desk Network Team owns end-to-end networking across one of the industry's most demanding infrastructures, spanning large-scale bare-metal datacenters and major public cloud platforms. We work at the intersection of network engineering and software development, embedding deeply with application, datacenter, and SRE teams to design and operate networks that power a global, high-performance ad tech platform. Our team takes a software-first approach to everything we build, and we actively integrate modern AI-assisted development tools like Cursor and Claude into our daily workflows. You will be engineering the future of network automation, not just maintaining the status quo.
Who we are looking for:
We are looking for a Senior Site Reliability Engineer who thrives at the intersection of deep networking expertise and software craftsmanship. You will partner closely with SRE and infrastructure teams to shape strategy and build the next generation of network automation, grounded in industry best practices and a bias toward scalable, maintainable solutions. You have an almost obsessive drive to keep networks healthy, performant, and resilient.
What you will be doing:
- Design, build, and scale a global network platform spanning physical datacenters and multi-cloud environments across AWS, Azure, and Alibaba Cloud.
- Support thousands of hosts worldwide, engineering reliable and efficient solutions to petabyte-scale data challenges.
- Own troubleshooting and resolution of complex network issues, upholding high availability and performance across the entire infrastructure footprint.
- Lead root cause analysis and postmortems, turning incidents into actionable improvements that raise the bar for operational excellence.
- Eliminate toil by building tools, automating workflows, and continuously improving the processes your team depends on every day.
- Share responsibility for network integrity through participation in a global, follow-the-sun on-call rotation.
What you bring to the table:
- You have 6-8 years of hands-on network automation and operational experience supporting large-scale production infrastructure.
- You have a software-first mindset with strong development and networking experience, able to think like an engineer and operate like an architect.
- You bring deep expertise in TCP/IP, the OSI model, and large-scale IP networking protocols including BGP and OSPF.
- You have hands-on experience with Kubernetes networking technologies such as Cilium and Calico, and a solid understanding of the Container Network Interface (CNI) and CNI plugins.
- You have managed software load balancers like NGINX Ingress, Envoy, or HAProxy in large-scale production environments.
- You are skilled at troubleshooting and performance tuning in Kubernetes and Docker environments, with a focus on networking. Experience running Kubernetes clusters on bare-metal is a plus.
- You are proficient in advanced networking technologies, including:
  - IPv6 configuration and transition strategies
  - Software-Defined Networking (SDN) and SDN controllers
  - Quality of Service (QoS) implementation and bandwidth management
- You have operated network devices at scale using network operating systems such as SONiC, Cisco IOS, JunOS, Arista EOS, or Nokia SR Linux/SR OS.
- You are comfortable with monitoring and alerting systems, writing complex rules and time-series queries using tools like Prometheus and Grafana.
- You practice infrastructure-as-code and apply DevOps and SRE principles to build and manage networks programmatically.
- You know how to build robust workflows and pipelines to test and safely deploy changes to production.
- You have an interest or background in platform engineering and can plan and build infrastructure to support large-scale, distributed systems.
- You are proficient at creating automation and building tools in Python or Go.
- You have experience integrating AI tools (LLMs, MCP, agentic workflows) into engineering processes to automate tasks and improve development velocity.
Key attributes:
Technical Contributions:
- Demonstrated experience building resilient, always-on networks across diverse technologies and layers.
- Data-driven decision maker who evaluates ROI, implementation complexity, and customer impact before committing to a direction.
- Operationally minded: you reduce complexity, mitigate risk, and keep scaling cost-effective as systems grow.
- A track record of self-directed, high-impact contributions to large-scale, long-horizon projects.
Collaboration and Communication:
- Strong communication and documentation skills, with the ability to distill complex technical topics and drive alignment across teams.
- Empathetic thinker who understands the broader context and motivations behind objectives, not just the immediate ask.
- Highly collaborative by nature, able to work fluidly across engineering disciplines and bring people together around a shared goal.
One of the best things about working at The Trade Desk is the breadth of technical opportunity, and we do not expect you to walk in knowing every technology we use. What we care about is your ability to learn quickly, think critically, and reach for the right tool for the job. What you know matters less than how fast you grow and how creatively you solve problems. We are not looking for engineers who have all the answers; we are looking for engineers who can invent answers no one has thought of yet, to questions no one has thought to ask.
As an Equal Opportunity Employer, The Trade Desk is committed to creating an inclusive hiring experience where everyone has the opportunity to thrive.
Please reach out to us at accommodations@thetradedesk.com to request an accommodation or to discuss any accessibility needs you may have when accessing our Company Website or navigating any part of the hiring process.
When you contact us, please include your preferred contact details and specify the nature of your accommodation request or questions. Any information you share will be handled confidentially and will not impact our hiring decisions.