Site Reliability Engineer (x/f/m)

Full-time (Remote optional; EU time zones preferred)

Emphasis: AWS and Docker

Salary: €60K-80K

About Meshcapade

Meshcapade is a spin-off from the Max Planck Institute for Intelligent Systems in Tübingen, Germany. We create realistic human avatars for use in research, apparel, biomechanics, virtual reality, and film. Using machine learning and computer vision, we model the nuances of human shape and movement. We can automatically convert photos, 3D and 4D scans, RGB-D sequences, mocap, and IMU data into realistic 3D humans. Our methods derive from state-of-the-art, patented research. Our core product, digidoppel, is an online platform for the creation, modification, and delivery of our automated 3D human avatars and related assets. Our clients include global names across a broad mix of tech, media, health and fitness, apparel, and education.

What we offer

We are a diverse team of passionate creators from a variety of backgrounds, seeking to change how people generate, think about, and make use of digital human avatars. Compensation and benefits are competitive. Our offices are based in Tübingen, Germany. Remote work is available within EU time zones or nearby. Work hours are flexible, with no advance notice required, and there are no scheduled meetings on Fridays. Our team is keen to help out wherever they can, and because we each have our own specialisation, collaboration is always fun and educational.

Description

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. While we are still small, we aim to do things well from the start so that we can scale the team and our products without causing issues for our customers. You will work closely with the core platform team to develop our infrastructure, visibility, deployment automation, and security solutions. You are expected to learn on the job and take ownership of your projects and solutions.

You will be helping to maintain and improve:

  • AWS infrastructure that uses EKS and ECS for compute and is deployed with Terraform
  • Applications and cluster services running in EKS clusters using Kustomize and Helm, where an automated deployment strategy needs to be implemented
  • Logging, monitoring, authentication, and authorization solutions within our products
  • Job scheduling and autoscaling solutions

You will also:

  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve them by pushing for changes that improve reliability and velocity
  • Engage with the team for incident response and postmortems

Requirements

  • BS or higher in a technical field, or comparable practical experience
  • 3-5 years of experience in software engineering or a related field
  • Working experience with AWS, Azure, or Google Cloud (AWS preferred)
  • Golang and/or Python programming skills
  • Extensive experience with Docker, Kubernetes, containerised environments and cloud infrastructure
  • Experience with Infrastructure as Code and associated tools such as Terraform
  • Knowledge of observability stacks (Grafana, Prometheus, Loki)
  • Experience with configuration management and CI/CD pipeline tools such as Git and GitLab CI/CD
  • Ability to work in a fast-paced environment and solve problems independently
  • Excellent communication and teamwork skills

Bonus Skills

Nice-to-haves, definitely not required

  • DevSecOps experience
  • Active involvement with CNCF open source projects
  • Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification
  • Linux systems and Bash scripting experience

Apply here!