Server Operations Engineer
About Centific
Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem—comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets—to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.
Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.
About Job
Service Description
Install operating systems on newly mounted servers
Reinstall server operating systems and resolve abnormalities
Perform daily server maintenance, troubleshooting, repair, and follow-up break-fix for servers and other hardware
Maintain data in internal systems, including asset management, ticketing, and rack-related data
Work with remote vendors/manufacturers or other teams to resolve batch server failures and other problems
Serve on-call duty and handle problems raised by business owners
Collect and check the status of online assets and any related issues
Erase drives or other configurations on servers being retired or relocated
Retrofit tools by updating or writing scripts
Submit and track part RMAs or media destruction requests as needed
Server network troubleshooting
Server lifecycle management, including managing the performance of OxMs
Other server operation related work
Service Requirements
Bachelor's degree in Computer Science, Electrical Engineering, or another relevant field
Strong ability to work under pressure; strong learning ability and broad technical interest; strong sense of responsibility and enthusiasm for the work
Good communication skills in English (Mandarin preferred); good teamwork spirit; ability to work independently
Knowledge of the interdependencies of Data Center functions and technologies, including Facilities
Familiarity with basic data analytics
Experience with large-scale remote OS installation, such as PXE boot
Ability to understand and run Bash shell or Python scripts
Familiar with simple automation tools, such as Ansible
Familiar with Linux systems and able to locate server hardware/bare-metal faults; strong analytical and problem-solving skills
Basic knowledge of TCP/IP concepts (subnetting, VLANs, DNS, IPv6); ability to perform general troubleshooting
Knowledge of out-of-band/lights-out server communication methods, such as IPMI and NC-SI
Strong documentation skills and habits
Centific is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.