If The Data Will Not Come To The Astronomer (late 2019)...

Tucson Python Meetup, 3 December 2019

Adam Thornton

LSST

athornton@lsst.org

This Talk

Licensed under Creative Commons Attribution 4.0 International license (CC BY 4.0)

Overview

Large Synoptic Survey Telescope

Astronomical Analysis Methodologies

Interactive Notebook Environment



LSST

The Large Synoptic Survey Telescope

LSST is funded by the National Science Foundation and the Department of Energy, as well as a host of other generous donors, public and private.

LSST Science Goals

Just say "objects at every astronomical scale" and list the four. Don't describe them.

Data Collection Scale

Data Scale

Observations of Celestial Objects

LSST Resources

Depth

Deepest ground-based survey telescope

However...

Field of view

Depth isn't everything.

Image by Nate Lust, based on data from the HSC collaboration.

Camera

LSST Mirror Design

Larger mirrors are generally segmented rather than monolithic.

Cost

Site



Astronomical Status Quo

Historically, astronomical research has been done with:

Obvious Failure Modes

Data

A Different Way To Do Astronomy

Interactive vs. Batch

We expect that a researcher will use the "interactive notebook aspect of the LSST Science Platform" (by which we mean JupyterLab, or perhaps its successors) to perform this iteration. It is a rapid prototyping tool with the following characteristics:

What does this imply?

What Do We Want?

Let's imagine a better world:

Community Acceptance

The trickiest design goal is that we cannot make any user's life significantly worse than the status quo.

Obviously the current system isn't ideal:

But...it also gets the job done. The analysis software encodes literally hundreds, perhaps thousands, of astronomer-years of work on difficult problems. It is inherently complex.

We have to please several different groups of users.

User Community

Analysis Pipeline Consumers

We have this one covered. If you want to use the existing toolset to analyze collected data, and you're not coming to the project with a lot of prior experience or actively developing the pipeline software, we're delivering a way to get your work done that is far superior to the prior art.

User Community

Analysis Pipeline Developers

The LSST stack is big. No one works on the whole thing. The way it's developed is that someone takes a version (either a release version, approximately every 6 months, or a weekly build) and works on their own little corner of it in a conda or pip environment. We must support that.

User Community

Established Astronomers

The people who have tenure and bring in the grants already have a workflow that works well for them. Sure, it's based on FORTRAN IV and FITS files, but they've gotten really, really good at it.

In practice: you need a Terminal window that gives you shell access to something that looks like a Unix system. We mimic a system on which you have an unprivileged account, which is very familiar to academic users.

There is something of an Uncanny Valley problem here.

User Community

Security; generally, Operational Support

Image composition by Abbey Yacoe.

It's a fair cop, but if we make it look like an existing multi-user system, where the user doesn't have root or sudo within the container and has write access only to ${HOME} and scratch space but not to the OS, and furthermore we show that we can completely characterize the container's contents, it's a much easier sell.

The Big Reveal

(Not actually a surprise to anyone here.)

Kubernetes + JupyterHub + JupyterLab

Banek et al., ADASS 2019: Why is the LSST Science Platform built on Kubernetes?

Abstraction and Layering

The Long Bet

Kubernetes will save astronomy.

Modularity

This lets you have your cake and eat it too. You get to use whatever insanely complex analysis framework you want, wrapped inside a general-purpose, self-healing application architecture.

Presenting the Analysis Component

Replacing the payload is a matter of replacing the JupyterLab container that is spawned for the user. All you need is:

I would be flabbergasted if this approach were not portable to other physical sciences and very possibly to other (and very general) analytic problem spaces.
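For example, with JupyterHub's KubeSpawner the swap is a few lines of configuration. A minimal sketch, assuming KubeSpawner; the image name is a placeholder for whatever analysis container you build:

    # jupyterhub_config.py fragment; "c" is the config object JupyterHub provides.
    c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

    # Swapping the payload means pointing the spawner at a different Lab image.
    c.KubeSpawner.image = "registry.example.com/sciplat-lab:w_2019_46"
    c.KubeSpawner.cmd = ["jupyter-labhub"]   # start JupyterLab under the Hub
    c.KubeSpawner.cpu_limit = 4
    c.KubeSpawner.mem_limit = "8G"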

Parallelization

Resource Management

Scaling

Step one: Add more nodes to your cluster. (Or take some away.)

Step two: Change the replica counts in your deployments.

There is no step three.
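A sketch of step two with the official kubernetes Python client; the deployment name and namespace are placeholders:

    from kubernetes import client, config

    config.load_kube_config()              # or load_incluster_config() inside a pod
    apps = client.AppsV1Api()

    # Scale a (hypothetical) "jupyterhub-proxy" deployment to 3 replicas.
    apps.patch_namespaced_deployment_scale(
        name="jupyterhub-proxy",
        namespace="jupyterhub",
        body={"spec": {"replicas": 3}},
    )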

Contributing

The Jupyter community is awesome.

JupyterLab has stabilized a lot. 1.0.x is current, and 2.0 is aimed at New Year 2020.



LSST JupyterLab Implementation

Overview

SQR-018 describes the architecture.

The complete implementation is available at GitHub.

Deployment

Just use Helm
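A sketch of driving such a deployment from Python, using (purely for illustration) the community JupyterHub chart; the release name, namespace, and values file are placeholders:

    import subprocess

    subprocess.run(
        ["helm", "repo", "add", "jupyterhub", "https://jupyterhub.github.io/helm-chart/"],
        check=True,
    )
    subprocess.run(
        [
            "helm", "upgrade", "--install", "lsst-jupyter", "jupyterhub/jupyterhub",
            "--namespace", "jupyterhub",
            "--values", "values.yaml",     # site-specific configuration
        ],
        check=True,
    )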

Problem 1: Authentication

Authentication is annoying and hard. Let's outsource it.

[cilogon_screenshot]
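A minimal sketch of outsourcing login with the oauthenticator package (GitHub shown here; oauthenticator also ships a CILogon authenticator). The callback URL and OAuth app credentials are placeholders:

    # jupyterhub_config.py fragment.
    from oauthenticator.github import GitHubOAuthenticator

    c.JupyterHub.authenticator_class = GitHubOAuthenticator
    c.GitHubOAuthenticator.oauth_callback_url = "https://lab.example.com/hub/oauth_callback"
    c.GitHubOAuthenticator.client_id = "<github-oauth-app-id>"
    c.GitHubOAuthenticator.client_secret = "<github-oauth-app-secret>"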

Problem 2: Authorization

How do we restrict beyond "has a GitHub/NCSA account"?

Both have a concept of group membership.

[auth_screenshot]
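One way to sketch that restriction, assuming a recent oauthenticator that returns an access token in auth_state: subclass the GitHub authenticator and reject logins from users outside a set of allowed organizations. The organization names below are hypothetical, and oauthenticator also provides built-in organization-restriction settings.

    import json
    from oauthenticator.github import GitHubOAuthenticator
    from tornado.httpclient import AsyncHTTPClient

    ALLOWED_ORGS = {"example-org", "example-collab"}   # hypothetical org names

    class GroupRestrictedGitHub(GitHubOAuthenticator):
        async def authenticate(self, handler, data=None):
            userdict = await super().authenticate(handler, data)
            if not userdict:
                return None
            token = userdict["auth_state"]["access_token"]
            resp = await AsyncHTTPClient().fetch(
                "https://api.github.com/user/orgs",
                headers={"Authorization": f"token {token}",
                         "Accept": "application/vnd.github.v3+json"},
            )
            orgs = {org["login"] for org in json.loads(resp.body)}
            return userdict if orgs & ALLOWED_ORGS else None

    c.JupyterHub.authenticator_class = GroupRestrictedGitHub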

Problem 3: Global User Consistency

GitHub's user account ID fits into a 32-bit value. Each GitHub Organization also has an ID. Those are our UID/GID maps.

CILogon + NCSA IDP does something similar.

Now you have globally consistent users and groups.

[uid_screenshot]
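A sketch of how those IDs fall straight out of the GitHub API; the token is a placeholder:

    import requests

    headers = {"Authorization": "token <github-oauth-token>"}

    user = requests.get("https://api.github.com/user", headers=headers).json()
    orgs = requests.get("https://api.github.com/user/orgs", headers=headers).json()

    uid = user["id"]                                  # globally unique across GitHub
    gids = {org["login"]: org["id"] for org in orgs}  # one POSIX group per organization

    print(f"{user['login']} -> uid={uid}, groups={gids}")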

Problem 4: Persistent Storage

We have globally unique UIDs and GIDs.

[filesystem_screenshot]
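With globally unique IDs, a shared POSIX filesystem can simply be mounted into every Lab pod. A sketch assuming KubeSpawner and a pre-existing PersistentVolumeClaim named lab-home (a placeholder):

    # jupyterhub_config.py fragment.
    c.KubeSpawner.volumes = [
        {"name": "home", "persistentVolumeClaim": {"claimName": "lab-home"}},
    ]
    c.KubeSpawner.volume_mounts = [
        {"name": "home", "mountPath": "/home"},
    ]
    # Because UIDs/GIDs are globally consistent, ownership on the shared
    # filesystem is meaningful no matter which node or pod a user lands on.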

Problem 5: User Access Restriction

Don't give your users sudo. Don't even give them passwords.

You already have globally consistent UIDs and GIDs. Use a semiprivileged user to provision the user with the correct name, UID, and GIDs via sudo at container startup. Start JupyterLab as that user.

You're done.

Users can still override bits of the stack with pip install --user.

[sudo_screenshot]
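A sketch of such a container entrypoint; the EXTERNAL_UID and EXTERNAL_GROUPS variable names and the sudoers policy are hypothetical:

    import os
    import subprocess

    username = os.environ["JUPYTERHUB_USER"]
    uid = os.environ["EXTERNAL_UID"]                          # hypothetical variable
    parts = os.environ.get("EXTERNAL_GROUPS", "").split(":")  # "name:gid:name:gid:..."

    # sudoers in the image is assumed to allow exactly these provisioning commands.
    subprocess.run(["sudo", "useradd", "--uid", uid, "--create-home", username], check=True)
    for name, gid in zip(parts[0::2], parts[1::2]):
        subprocess.run(["sudo", "groupadd", "--gid", gid, name], check=True)
        subprocess.run(["sudo", "usermod", "--append", "--groups", name, username], check=True)

    # Replace this process with JupyterLab running as the (unprivileged) user.
    os.execvp("sudo", ["sudo", "--set-home", "-u", username, "jupyter-labhub"])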

Problem 6: User Resource Restriction

If you spawn users into individual namespaces:

[quota_screenshot]
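For example, a per-namespace ResourceQuota caps what any one user can consume. A sketch with the official kubernetes Python client; the namespace name and limits are placeholders:

    from kubernetes import client, config

    config.load_incluster_config()        # running inside the cluster, e.g. in the Hub pod
    core = client.CoreV1Api()

    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="user-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={"limits.cpu": "4", "limits.memory": "12Gi", "pods": "3"},
        ),
    )
    core.create_namespaced_resource_quota(namespace="u-athornton", body=quota)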

Problem 7: Auditability and Maintainability

It's a container. You know how you built it (at least if you pin particular package versions rather than using latest). It's repeatable and immutable. Keep a bleeding-edge build that floats, and an occasionally-updated version derived from it that pins all OS-level, Python-level, and JupyterLab-extension-level components.

We look for regressions in the stack by creating an options form that scans our repository and presents a menu of recent builds. This also allows users to choose their risk tolerance.

[options_screenshot]
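A sketch of that menu using KubeSpawner's profile_list, generated by scanning recent tags in the image repository; the Docker Hub repository name is a placeholder and the registry query is simplified:

    import requests

    def scan_repo_for_profiles(spawner):
        tags = requests.get(
            "https://hub.docker.com/v2/repositories/example/sciplat-lab/tags",
            params={"page_size": 10},
        ).json()["results"]
        return [
            {
                "display_name": t["name"],
                "kubespawner_override": {"image": f"example/sciplat-lab:{t['name']}"},
            }
            for t in tags
        ]

    c.KubeSpawner.profile_list = scan_repo_for_profiles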

Problem 8: Startup Time and User Frustration

Our images are huge and take on the order of 15 minutes to pull.

Within, say, an hour and a half of building (which usually happens in the middle of the night), each image is available on every node and therefore starts quickly.

[prepuller_screenshot]
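One common approach is a prepuller: a DaemonSet whose pods do nothing but hold the big image, forcing every node to pull it ahead of time. A sketch with the kubernetes Python client; the image name and namespace are placeholders:

    from kubernetes import client, config

    config.load_incluster_config()
    apps = client.AppsV1Api()

    image = "example/sciplat-lab:w_2019_46"
    ds = client.V1DaemonSet(
        metadata=client.V1ObjectMeta(name="prepuller"),
        spec=client.V1DaemonSetSpec(
            selector=client.V1LabelSelector(match_labels={"app": "prepuller"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "prepuller"}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="prepull",
                            image=image,
                            command=["sleep", "infinity"],  # do nothing; the pull is the point
                        )
                    ],
                ),
            ),
        ),
    )
    apps.create_namespaced_daemon_set(namespace="prepuller", body=ds)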

JupyterLab Resources
