Lead Systems Reliability Engineer (Linux & Distributed Systems)
The Trade Desk is a global technology company with a mission to create a better, more open internet for everyone through principled, intelligent advertising. Handling over 1 trillion queries per day, our platform operates at an unprecedented scale. We have also built something even stronger and more valuable: an award-winning culture based on trust, ownership, empathy, and collaboration. We value the unique experiences and perspectives that each person brings to The Trade Desk, and we are committed to fostering inclusive spaces where everyone can bring their authentic selves to work every day.
Do you have a passion for solving hard problems at scale? Are you eager to join a dynamic, globally connected team where your contributions will make a meaningful difference in building a better media ecosystem? Come and see why Fortune magazine consistently ranks The Trade Desk among the best small- to medium-sized workplaces globally.
What we do
We are looking to hire a Lead Systems Reliability Engineer to join our engineering team and continue building and maintaining our data-driven platform. We use technologies like Aerospike, MongoDB, and Kafka to perform many real-time activities with a p99 latency under 1 millisecond on the back end!
Do you enjoy tuning, performance testing, troubleshooting, automation, and operating at scale? Does testing next-gen hardware, evaluating data access patterns, and designing automation around distributed systems excite you?
What makes this role different:
- First in the Industry: The Trade Desk is the first company to run over 5 million QPS to NVMe in Aerospike on a single node, forcing core software redesigns to achieve this scale.
- Work on Cutting-Edge Hardware: Design clusters with nodes featuring 300TB of NVMe, 3TB of RAM, and 512 cores, delivering 2,500GB/s of aggregate global throughput directly from flash.
- Shape the Future of Infrastructure: Spec your own systems and collaborate directly with AMD and NoSQL vendors to run PoCs and optimize bleeding-edge technology for internet-scale workloads.
- Deep Performance Engineering: Dive into kernel, hardware, and system interactions, leveraging tools like flamegraphs, NUMA counters, BIOS tuning, and synthetic testing to achieve world-class performance.
- Push Hardware Endurance Limits: Build clusters engineered for over 1 zettabyte of write endurance.
What you’ll do:
- Lead a team to influence, manage, and plan work streams, systems, and data structures at scale within a global ecosystem, spanning multiple infrastructure providers (cloud and traditional datacenters).
- Encourage, improve, and build infrastructure automation in a way that works with stateful systems at scale.
- Own operations for Linux-based systems running Aerospike, Kafka, and MongoDB.
- Serve as a point of contact to review new use cases, answer questions, and participate in on-call rotation.
- Learn to be a NoSQL SME. You do not need prior NoSQL experience to apply; we will train you.
- Benchmark and analyze next-generation hardware offerings.
Who you are:
Skills and Experience
- Linux operating systems
- Leadership experience and the ability to mentor
- Troubleshooting
- Problem isolation techniques and the scientific method
- Identifying bottlenecks (is it CPU? I/O?)
Nice-to-have experience:
- Physical hardware (on-prem) internals, management, and operation
- Performance testing and tuning
- Databases (relational or NoSQL)
- Ansible/PyInfra/Chef
- Prometheus
- Kubernetes
- Python/Ruby/Rust/Bash/Golang/C#
- Thinking beyond the task at hand to deeply understand the 'why' behind an objective.
- You welcome ideas and seek to understand perspectives that differ from your own, and you are interested in finding and building on common ground.
- You are a creative thinker, not bound by "the way things have always been done," who thinks of the questions nobody has thought of and that are "yet to be asked." What you know is less important than how well you learn, innovate, collaborate, and adapt.
- We are a global team of many diverse backgrounds, experiences, and perspectives, and you value and seek out ways to foster diversity.
The Trade Desk does not accept unsolicited resumes from search firm recruiters. Fees will not be paid in the event a candidate submitted by a recruiter without an agreement in place is hired; such resumes will be deemed the sole property of The Trade Desk. The Trade Desk is an equal opportunity employer. All aspects of employment will be based on merit, competence, performance, and business needs. We do not discriminate on the basis of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.
As an Equal Opportunity Employer, The Trade Desk is committed to creating an inclusive hiring experience where everyone has the opportunity to thrive.
Please reach out to us at accommodations@thetradedesk.com to request an accommodation or to discuss any accessibility needs you may have in accessing our Company Website or navigating any part of the hiring process.
When you contact us, please include your preferred contact details and specify the nature of your accommodation request or questions. Any information you share will be handled confidentially and will not impact our hiring decisions.