Data Center Production Operations Engineer (Third Shift)
Meta
New Albany, Ohio, United States
(on-site)
Posted
12 hours ago
Industry Categories
Internet / E-Commerce
Job Function
Other
Description
Meta is seeking a Data Center Production Operations Engineer to support the reliability, efficiency, and scalability of our global data center infrastructure. In this role, you will be responsible for the day-to-day operational health of server fleets and production systems that underpin Meta's family of apps and services. You will work at the intersection of hardware lifecycle management, systems reliability, and operational process improvement, ensuring that production environments meet the demands of billions of users worldwide.
Data Center Production Operations Engineer (Third Shift) Responsibilities:
- Manage and maintain large-scale server fleets across data center environments, including hardware triage, failure analysis, and coordinating repair and replacement workflows
- Monitor production systems health using observability tooling and telemetry data to proactively identify and resolve infrastructure anomalies before they impact service availability
- Develop and refine operational runbooks, escalation procedures, and incident response playbooks specific to data center server environments
- Collaborate with hardware engineering, network operations, and capacity planning teams to support server deployment, decommissioning, and lifecycle transitions
- Analyze failure trends and operational data to identify systemic issues in server hardware or firmware, and drive root cause analysis and corrective action
- Contribute to automation initiatives that reduce manual toil in server provisioning, health checks, and fleet management workflows, including leveraging AI-integrated tooling
- Partner with cross-functional teams to evaluate and implement process improvements that increase operational efficiency and reduce mean time to resolution for production incidents
- Communicate infrastructure status, incident timelines, and risk assessments to engineering and operations stakeholders through clear written and verbal updates
- Support capacity readiness activities by validating server acceptance criteria and coordinating with data center technicians during hardware bring-up and commissioning
- Identify gaps in monitoring coverage or operational tooling and propose solutions that improve fleet visibility and production reliability
- Participate in a 24/7 on-call rotation
- Travel up to 15% of the time
- Work a shifted schedule (includes nights and weekends)
Minimum Qualifications:
- 6+ years of experience in data center operations, site operations, or production infrastructure engineering supporting large-scale server environments
- 6+ years of experience with server hardware components including CPUs, memory, storage, and network interface cards, including hands-on troubleshooting and failure diagnosis
- Experience using systems monitoring and observability platforms to track fleet health, identify anomalies, and drive incident resolution in production data center environments
- Experience developing or improving operational processes, runbooks, or automation scripts to support server fleet management at scale
- Experience collaborating with hardware engineering, network, and capacity teams to coordinate infrastructure deployments and lifecycle activities
Preferred Qualifications:
- Experience contributing to post-incident reviews and translating findings into durable operational improvements that reduce recurrence across a server fleet
- Background in capacity planning or hardware acceptance testing processes within a large-scale cloud or hyperscale data center organization
- Familiarity with server firmware management, BIOS configuration, and out-of-band management interfaces such as IPMI or Redfish in hyperscale data center environments
- Experience with scripting languages such as Python or Bash to automate data center operations tasks including health checks, inventory management, or alerting workflows
About Meta:
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today, beyond the constraints of screens, the limits of distance, and even the rules of physics.
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.
Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at [email protected].
$111,010/year to $158,995/year + bonus + equity + benefits
Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.
Job ID: 84982341
