Job Type: Contract
Work Mode: Remote
Main technologies in the High-Performance Computing (HPC) environments include:
• Operating Systems: Linux (Red Hat)
• Provisioning: Red Hat Image Builder, Red Hat Satellite
• Storage: GPFS from IBM
• Internal Network: InfiniBand
• Workload manager: Slurm
• Automation and CI/CD: Ansible, Python, Jenkins, GitLab
• Monitoring tools: Grafana, SNMP, LogicMonitor, Splunk
• Ticketing tools: ServiceNow, Jira
Role responsibilities:
• Service Request Management
• Configuration and Troubleshooting activities
• Automation: maintain existing code and develop new code that helps manage HPC environments as Infrastructure as Code (IaC).
• Installation of new environments (hardware installation coordination, OS deployment, scientific application stack deployment and configuration).
• Decommissioning of old environments/systems.
• Provide responsive and knowledgeable support for end-users and administrators.
• Assist users with job scheduling, workload management, and resource allocation.
• Assist customers with migrations, upgrades, configuration changes, etc.
• Proposal and Implementation Management. Identify new requirements and provide a detailed proposal outlining the implementation approach, plan, and associated charges.
• Conduct training sessions and create documentation to enhance user proficiency with HPC systems.
• Maintain and improve monitoring features.
Soft skills:
• Fluent and effective communication in English.
• Adaptive to customer and service needs.
• Experience executing mid-term and long-term projects.
• Team collaboration and leadership.
• Strategic Thinking. Ability to analyze business needs and design solutions aligned with organizational goals.
• Requirements Gathering. Identify customer pain points and translate them into system requirements.
• Solution Design. Create end-to-end architectures or processes across multiple systems/modules.
• Change Management. Plan and guide customers through technology and process change.
• Stakeholder Management. Communicate with different technical teams.
• Analytical and problem-solving ability. Propose and push for improvements.
• Adaptability to customer needs and constraints.
• Proactive: anticipate future problems and provide solutions.
• Adaptive: the person will need to handle both reactive work (with SLAs) and project tasks, ranging from simple to complex.
• DevOps background and experience: automation (pipelines, GitLab, Ansible, IaC, CaaS (Kubernetes), Python, CI/CD, etc.).