Categories
notes tech

auth0 jupyterhub authenticator

i’m sysadmin of some linux gpu servers in the phd working group and having to manually setup new users is a massive PITA.

a few options for replacements

  • Auth0 Authentication
  • KeyCloak OIDC / OAuth2 Authentication
    • https://www.keycloak.org/
    • functional+tested implementation up and running
    • only problem is existing user data — the user’s UID inside the spawned container depends on the order in which users start logging in, rather than ownership of local files.
    • could bind mount /home/{user} into container at /mnt/server then change $NB_UID for container user during usermod to match the UID of the user’s /home/{user}/.bashrc file — means another change to jupyterhub’s start.sh script in the Docker images.
    • Could also symlink /mnt/server to /home/{user}/server at end of start.sh with everything else in /home/{user} mounted in a docker volume? although docker volumes are NOT stored on the SSD for some servers because of disk space issue so user data wouldn’t be on the SSD :[
    • Better idea would probably be mount /home/{user} as before and change $NB_UID of $NB_USER to the uid of .bashrc (without chown-ing user files)
    • If no /home/{user} exists… then…. what happens? JupyterHub server does create the user account… so should be a simple case of look at uid of .bashrc … see it’s the same as $NB_UID… do nothing…?
    • BUT … JupyterHub will only create the /home/{user} directory inside its container (if using the completely containerised version) … which means that /home would still need to be mounted in the jupyterhub-{type} containers…. This will probably cause an absolute mess for user data though there will be multiple users that have files with UID=1000 etc. And because students can still SSH onto the servers it means they might be able to access other user’s data… :[
  • Native Authenticator
Categories
notes tech

sig-mlops

https://lists.cd.foundation/g/sig-mlops

This is a public list for the CDF MLOps SIG. All meetings and discussions are held in the open, and everyone is welcome to join. The current membership, calendar, and meeting documents can be found at https://github.com/cdfoundation/sig-mlops.

the Sig-MLOps 2021 roadmap speaks volumes to me and my experience as a machine learning phd student…

At this point in the development of the practice, it perhaps helps to understand that much of ML and AI research and development activity has been driven by Data Science rather than Computer Science teams. This specialisation has enabled great leaps in the ML field but at the same time means that a significant proportion of ML practitioners have never been exposed to the lessons of the past seventy years of managing software assets in commercial environments.

As we shall see, this can result in large conceptual gaps between what is involved in creating a viable proof of concept of a trained ML model on a Data Scientist’s laptop vs what it subsequently takes to be able to safely transition that asset into a commercial product in production environments. It is not therefore unfair to describe the current state of MLOps in 2020 as still on the early path towards maturity and to consider that much of the early challenge for adoption will be one of education and communication rather than purely technical refinements to tooling.

most other phd/professor type folks were not the least bit interested in dealing with these sorts of problems during development — basic codebase documentation often seems to be considered a waste of time based on the hundreds of github repos I’ve looked over.

it’s likely these are a subset of SEP fields and i’ve settled on calling the phenomena “Someone Else Will Clean Up the Mess” fields (or SEWCUM fields for short).