An Astro-interlude

I decided to spend two weeks hopping from continent to continent to take part in back-to-back astro-statistics-tech events: the COIN Residence Program and AstroHackWeek. A year after formally leaving the field, I’ve chosen to make astronomy my hobby, taking “leave” to do research. It’s maybe not entirely sensible, but I’m doing this on my own terms. This post is a report on the things I learned during that sleep-deprived, mostly-barefoot fortnight.

First, a little background about the events.

The Cosmostatistics Initiative (COIN) is a collaboration that began in 2014 as a section of the International Astrostatistics Association (IAA). It brings together people across the Astronomer–Statistician spectrum to do some left-of-field research, introducing new data-analytic, statistical, and visualisation techniques to the astronomy community. The Residence Program happens once a year: we hang out in an apartment for a week, work intensely on 2-3 projects well into the wee hours, write up half the papers, and still get some sun. This year we found ourselves in the lovely, warm city of Budapest.


Some of COIN on our day off to go sightseeing around Budapest. Credit: Pierre-Yves LaBlanche

AstroHackWeek (AHW), on the other hand, is a free-form event with elements of a workshop (pre-defined lectures) and a lot more making-it-up-as-we-go-along. Early on, the 50 participants suggest topics they would like to learn about, identify an expert amongst the group, and let that person teach a class of 10-20 for an hour (learning collectives are a brilliant idea!). Hack projects are the highlight; they are proposed both before and throughout the event, and many of us work on 2-4 at once. AHW also started in 2014, and this year it was held at the Berkeley Institute for Data Science (BIDS).


AstroHackWeek getting settled in at GitHub HQ, San Francisco.

For completeness, I’m also going to mention dotAstronomy, a similar out-of-the-box unconference that started way back in 2008/9. It has evolved over the years, but by the time I attended dotAstro7 in Sydney in 2015, it had become a combination of idea-lectures, just one day of hack-projects, and a lot of unconference group discussions. There, the emphasis is more on software/tech and education/communication.

OK, so here’s my brain-dump:

Mixture models

Mixture models combine models for different sub-populations or classes. This makes them relevant both to clustering and classification routines and to dealing with outliers. You never really tease the sub-populations apart; the point is to model the combined dataset, and maybe provide, for each data point, the probability that it belongs to a specific class.
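Here’s a minimal sketch of that “probability for each data point” idea: a one-dimensional, two-component Gaussian mixture fitted by expectation-maximisation (EM). The Gaussian components, the simulated data and the function names (`fit_gmm_1d`, `responsibilities`) are illustrative assumptions, not anything we used at COIN; for real work a library implementation such as scikit-learn’s `GaussianMixture` (with its `predict_proba` method) is the safer bet.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def responsibilities(data, weights, mus, sigmas):
    """E-step: probability that each point belongs to each component.

    For far outliers the densities can underflow; a production version
    would work in log space.
    """
    resp = []
    for x in data:
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
        total = sum(dens)
        resp.append([d / total for d in dens])
    return resp


def fit_gmm_1d(data, mus, sigmas, n_iter=100):
    """Fit a Gaussian mixture by EM, starting from the given means and widths."""
    k = len(mus)
    weights = [1.0 / k] * k
    mus, sigmas = list(mus), list(sigmas)
    for _ in range(n_iter):
        resp = responsibilities(data, weights, mus, sigmas)
        # M-step: re-estimate each component from its soft-assigned points.
        for j in range(k):
            r = [row[j] for row in resp]
            nj = sum(r)
            weights[j] = nj / len(data)
            mus[j] = sum(ri * x for ri, x in zip(r, data)) / nj
            var = sum(ri * (x - mus[j]) ** 2 for ri, x in zip(r, data)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))  # guard against a collapsing component
    return weights, mus, sigmas


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated data: a main class plus a broader, smaller second class.
    data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.5) for _ in range(100)]
    weights, mus, sigmas = fit_gmm_1d(data, mus=[0.0, 4.0], sigmas=[1.0, 1.0])
    print("weights:", weights, "means:", mus, "widths:", sigmas)
    # Soft class memberships rather than hard labels.
    points = [0.0, 2.5, 6.0]
    for x, probs in zip(points, responsibilities(points, weights, mus, sigmas)):
        print(x, [round(p, 3) for p in probs])
```

A point sitting between the two classes (like 2.5 above) gets split membership instead of being forced into one class, which is also what makes the same machinery handy for down-weighting outliers.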

Hierarchical models

In a hierarchical model, the parameters sit in layers, with different parameters describing different levels of the data. For example, for supernova data one needs to model individual light-curves (layer 1), the properties of Type Ia supernovae as a population (layer 2), and cosmology (layer 3). I’m now convinced that at least half of all models are actually hierarchical, just not recognised and named as such.
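A toy two-layer version may help: each object j has noisy measurements scattered around its own true value θ_j (layer 1), and the θ_j are themselves drawn from a population with mean μ and scatter τ (layer 2). This minimal sketch assumes Gaussians everywhere and treats the measurement scatter σ and population scatter τ as known, so everything is analytic; a real analysis (like the supernova one) would put priors on these and sample them, e.g. with MCMC. The function names and data are hypothetical.

```python
from statistics import fmean


def population_mean(groups, sigma, tau):
    """Layer 2: estimate the population mean mu from each object's measurements.

    Marginally, object j's sample mean is N(mu, tau**2 + sigma**2 / n_j), so the
    maximum-likelihood estimate of mu is an inverse-variance weighted average.
    """
    weights = [1.0 / (tau**2 + sigma**2 / len(y)) for y in groups]
    means = [fmean(y) for y in groups]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


def object_estimates(groups, sigma, tau, mu):
    """Layer 1: posterior mean of each object's true value theta_j, given mu."""
    estimates = []
    for y in groups:
        data_precision = len(y) / sigma**2  # information from the object's own data
        prior_precision = 1.0 / tau**2      # information from the population layer
        estimates.append((data_precision * fmean(y) + prior_precision * mu)
                         / (data_precision + prior_precision))
    return estimates


if __name__ == "__main__":
    # Hypothetical measurements for three objects.
    groups = [[0.1, -0.2, 0.05, 0.0], [0.3, 0.2], [1.5]]
    mu = population_mean(groups, sigma=0.5, tau=0.3)
    print("population mean: ", round(mu, 3))
    print("raw means:       ", [round(fmean(y), 3) for y in groups])
    print("shrunk estimates:", [round(t, 3) for t in object_estimates(groups, 0.5, 0.3, mu)])
```

Note how an object with a single measurement is pulled further towards the population mean than one with many: the population layer “borrows strength” across objects, which is the main practical pay-off of writing the model hierarchically.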

Probabilistic Graphical Models

Probabilistic Graphical Models (PGMs) are diagrams that are very helpful for communicating how a model is parametrised. You have to learn the “notation”, but once you do, they make great visual aids (see an example in this paper). Each node is a quantity in the model: a parameter described by a distribution, observed data, or a fixed constant. Arrows between nodes show which quantities depend on which. This makes PGMs particularly good for describing hierarchical models.
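One way to see what a PGM encodes: a directed graph of parents and children implies a factorisation of the joint probability into one conditional distribution per node. This sketch writes that factorisation out for a hypothetical, heavily simplified version of the three-layer supernova model; the node names, and treating redshift as a fixed constant, are assumptions for illustration. It relies on `graphlib`, which is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified PGM for the supernova example: node -> list of parents.
# (Plates, i.e. repetition over many supernovae, are left implicit.)
PARENTS = {
    "cosmology": [],                # layer 3
    "snia_population": [],          # layer 2: Type Ia population properties
    "redshift": [],                 # treated here as a known constant
    "light_curve": ["cosmology", "snia_population", "redshift"],  # layer 1
    "photometry": ["light_curve"],  # observed data (a shaded node in the diagram)
}
CONSTANTS = {"redshift"}


def factorise(parents, constants=frozenset()):
    """Write the joint density implied by a directed PGM as a product of conditionals."""
    terms = []
    # static_order() yields every node after all of its parents.
    for node in TopologicalSorter(parents).static_order():
        if node in constants:
            continue  # constants are conditioned on, not given a distribution
        given = ", ".join(parents[node])
        terms.append(f"p({node} | {given})" if given else f"p({node})")
    return " ".join(terms)


if __name__ == "__main__":
    print(factorise(PARENTS, CONSTANTS))
```

Reading the diagram and reading the product of conditionals are the same thing, which is why a PGM is such a compact way to communicate a hierarchical model.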

Gaussian Processes

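A Gaussian Process (GP) treats your data as a draw from a multivariate Gaussian whose covariance matrix is built from a kernel function, so that nearby points can have correlated errors; this makes GPs a natural first step to modelling correlated noise. This is a complicated subject, and GPs certainly have limitations (maybe Gaussian isn’t appropriate!), but they’re better than assuming a diagonal covariance matrix (i.e. completely uncorrelated errors), and besides, they have useful properties that make things easier to calculate.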
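To make this concrete, here is a minimal sketch of a GP log-likelihood: a squared-exponential kernel fills in the off-diagonal covariances between nearby points, the measurement errors go on the diagonal, and a Cholesky factorisation gives both the inverse and the determinant. Setting the kernel amplitude to zero recovers the plain diagonal (uncorrelated) case. The kernel choice and parameter names (`amp`, `length`) are assumptions; pure Python is used only to keep it self-contained, and libraries such as george, celerite or scikit-learn’s `GaussianProcessRegressor` do this far more efficiently.

```python
import math


def sq_exp_kernel(x1, x2, amp, length):
    """Squared-exponential covariance between two inputs."""
    return amp**2 * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def covariance_matrix(x, yerr, amp, length):
    """GP covariance: correlated part from the kernel plus independent errors on the diagonal."""
    n = len(x)
    return [[sq_exp_kernel(x[i], x[j], amp, length) + (yerr[i] ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gp_log_likelihood(x, y, yerr, amp, length, mean=0.0):
    """log N(y | mean, K) for a Gaussian process with a constant mean."""
    L = cholesky(covariance_matrix(x, yerr, amp, length))
    r = [yi - mean for yi in y]
    # Forward substitution: solve L z = r, so that r^T K^{-1} r = z . z
    z = []
    for i in range(len(r)):
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(r)))
    return -0.5 * (sum(zi * zi for zi in z) + log_det + len(r) * math.log(2.0 * math.pi))


if __name__ == "__main__":
    x = [0.0, 0.5, 1.0, 1.5, 2.0]
    y = [0.1, 0.3, 0.4, 0.35, 0.2]  # hypothetical smooth-ish data
    yerr = [0.1] * 5
    for length in (0.01, 1.0):  # nearly uncorrelated vs. strongly correlated
        print(length, gp_log_likelihood(x, y, yerr, amp=0.3, length=length))
```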

Jupyter (IPython) Notebooks

This was the first time I actively used Jupyter Notebooks for writing Python code, and I was pleasantly surprised by the interactive features and the formatted (Markdown) comments. They’re perfect for small pieces of code and for teaching/demonstration. However, I do have some questions/gripes (please let me know if there are solutions):

  • Can you import a package/module written in a notebook? Sometimes we end up with a notebook version for development and a standard Python file for importing.
  • I can’t use all my Emacs commands, which means more clicking with the mouse; that’s why I tend to avoid interactive editors in general.
  • How does one work collaboratively on the same notebook? Can git handle that?

To be fair, I have an old version of IPython Notebook, so maybe these gripes no longer apply. I should talk to the Jupyter crew, one of whom I met at AHW.
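On the import question: a notebook has to be exported to a plain `.py` file before it can be imported (the official route is `jupyter nbconvert --to script`). The underlying idea is simple, because an `.ipynb` file is just JSON holding a list of cells. Below is a minimal sketch assuming the nbformat 4 layout; `notebook_to_script` and `export_notebook` are hypothetical helpers, and unlike nbconvert they don’t translate IPython magics (lines starting with `%`), which would still need removing by hand.

```python
import json


def notebook_to_script(nb):
    """Concatenate the code cells of a parsed .ipynb (nbformat 4) into one Python source string."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip Markdown and raw cells
        src = cell.get("source", "")
        # nbformat allows the source as a single string or as a list of lines.
        if isinstance(src, list):
            src = "".join(src)
        chunks.append(src.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def export_notebook(ipynb_path, py_path):
    """Write the code cells of a notebook file to an importable .py module."""
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(py_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(nb))
```

Usage would be something like `export_notebook("analysis.ipynb", "analysis.py")` followed by `import analysis` (the file names are placeholders). The same JSON structure is also why git can version notebooks, although merge conflicts then have to be resolved in that JSON.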

Parallel programming in Python

I had thought that parallel programming wasn’t really possible in Python: you could run code on multiple threads, yes, but because of the Global Interpreter Lock (GIL) those threads can’t run Python code on multiple cores at once. People sometimes use multiprocessing, but now I need to look into mpipool, which could be useful if you have the mpiexec job launcher set up on your cluster.
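For CPU-bound work on a single machine, the standard library’s `multiprocessing` module side-steps the GIL by using separate processes rather than threads, each with its own interpreter. A minimal sketch: `expensive_task` is a hypothetical placeholder for real work, and spreading work across the nodes of a cluster is where MPI-based tools such as mpipool or mpi4py come in instead.

```python
import math
import multiprocessing as mp


def expensive_task(seed):
    """Hypothetical stand-in for a slow, independent computation (e.g. one simulation per seed)."""
    total = 0.0
    for k in range(1, 100_000):
        total += math.sin(seed * k) / k
    return total


def parallel_map(func, items, processes=2):
    """Apply func to each item in separate worker processes, preserving order."""
    # Prefer "fork" where the OS supports it, so functions defined interactively
    # (e.g. in a notebook) don't need to be importable by the workers.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    with mp.get_context(method).Pool(processes=processes) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    # Guarding the entry point is required when workers are started with "spawn".
    results = parallel_map(expensive_task, range(8), processes=4)
    print(results)
```

Because each task is independent, `Pool.map` just farms the items out and collects results in their original order; the speed-up is roughly the number of cores, minus the overhead of starting processes and pickling arguments.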

Natural Language Processing & Web-scraping

Despite being astronomers-by-trade, you’ll often find us talking excitedly about everything fascinating from outside our field. At a hack-week, we’re happy to give anything a shot. So after free dinner and drinks at GitHub HQ, we dreamt up the Happiness Hack (under a different name) and, within 2 hours, created this.

It was going to end there, but the next day, we drummed up interest from the group and ended up extending the hack to grab** and analyse participants’ commit messages, as a bit of a joke, I guess, but here you go.

**Beautiful Soup: holy crap!! So powerful, so beautiful…


Mock Turtle sings “beautiful soup”. Snippet of the drawing by Sir John Tenniel
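For a flavour of the grab-and-analyse step: we used Beautiful Soup, but to keep this sketch dependency-free it swaps in the standard library’s `html.parser` instead. It pulls commit messages out of a page and gives each a toy “happiness” score. The CSS class `commit-message`, the word lists and the scoring rule are all hypothetical, not what the hack actually used. With Beautiful Soup, the extraction step shrinks to roughly a loop over `soup.find_all(class_="commit-message")`.

```python
from html.parser import HTMLParser

# Hypothetical word lists; real sentiment analysis would use a curated lexicon.
HAPPY = {"fix", "fixed", "works", "finally", "yay", "nice", "great"}
SAD = {"broken", "bug", "oops", "ugh", "revert", "fail", "failing", "wip"}


class CommitMessageParser(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class="commit-message"):
        super().__init__()
        self.target_class = target_class
        self.messages = []
        self.depth = 0      # > 0 while inside a matching element
        self._tag = None    # tag name of the matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self._tag:
                self.depth += 1  # nested element of the same type
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._tag, self.depth, self._buffer = tag, 1, []

    def handle_endtag(self, tag):
        if self.depth and tag == self._tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace and store the finished message.
                self.messages.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def happiness(message):
    """Toy score: (# happy words - # sad words) / # words, so it lies in [-1, 1]."""
    words = [w.strip(".,!?:;()'\"").lower() for w in message.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return (sum(w in HAPPY for w in words) - sum(w in SAD for w in words)) / len(words)


if __name__ == "__main__":
    page = ('<ul><li class="commit-message">Fixed the bug, finally!</li>'
            '<li class="commit-message">ugh, <b>revert</b> WIP</li></ul>')
    parser = CommitMessageParser()
    parser.feed(page)
    for msg in parser.messages:
        print(f"{happiness(msg):+.2f}  {msg}")
```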

Failing efficiently

Pair coding has been part of my life for the last few months, and I really appreciate how it can be more efficient despite the extra person-hours. Just enough cooks. The small collaborations formed at both events worked wonderfully together, and several papers have been spawned. But really the big lesson, particularly from hacking at AHW, is that we benefit from learning to fail efficiently, because that sets us free to explore high-risk projects. One person could hack away for weeks or months at an idea, while two or three people could declare it a lost cause in a mere day or two. Besides being efficient, this approach helps prevent frustration and burn-out. Trying and failing was actively encouraged at AHW and, better yet, demonstrated by senior participants.

Career transitions & Imposter Syndrome

Every time I meet with astronomers these days, the discussion turns to the process of leaving astronomy and imposter syndrome. The global community only really started talking about these on open forums about three years ago, and now it’s a recurring theme. At hack days/weeks, in particular, imposter syndrome is rife. Trying to prove your skills and worth and produce something spectacular on a short timescale is a recipe for mental health disaster. The pressure to dazzle with our hacking skillz certainly got to me back at dotAstro, but not as much this time, partly because the organisers made it a point to tackle the problem head-on (thank you!) and make the most of everyone’s diverse skill-sets, and partly because this time I knew better and put more emphasis on play and fun, and less on achieving goals.


So yeah, amongst the astronomy, statistics, computing, collaborating, hacking, and playing, I managed to learn a ton of stuff, see lovely places, and make new friends, which made the trip very worthwhile. My most important lesson, however, was:

Try not to doze off while on your laptop on the sofa near your colleagues, otherwise you end up with photos of creepy teddy bears watching you sleeping…


7 thoughts on “An Astro-interlude”

  1. linfinit

    Hiya Mud!
    It’s good to hear about your experiences of leaving astronomy and of hackfests. I’ve “left” mathematics and am working in the public service while still doing research as a “hobby”.
    Two points:
    1. Impostor syndrome and burnout were also featured at PyConAU in Melbourne this year.
    2. Have you tried mpi4py? It worked well for me on Raijin and even on my 4 core PC.
    https://github.com/penguian/pysaft/tree/MPI?files=1

    1. Matt

      +1 for mpi4py, straightforward interface at least for the basic tasks.

      I think I had imposter syndrome when I was in academia, except that when I read about other people’s imposter syndrome experience I get meta imposter syndrome where I feel my imposter syndrome wasn’t really up to standard and feel bad.

      1. Joel Nothman

        I’m not sure what kind of parallelism you need, but you might want to look into IPython.parallel (while you’re using Jupyter), and dask, a promising newcomer.

      2. Madhura Killedar Post author

        @Matt meta imposter syndrome still counts! Double points even!
        @Joel our parallelism requirements were for a much bigger code that doesn’t use Jupyter. Wouldn’t have thought to try! I mean, where would it run?

    2. Madhura Killedar Post author

      Nice to meet another research hobbyist! (Edit: Paul, didn’t realise it was you! And didn’t know you’d “left” either – your hobby is clearly very productive!)
      1. yeah I do have the feeling that impostering is being discussed more universally (articles online certainly transcend research-fields), but I haven’t been to these other in-person meetings.
      2. no I haven’t… I get the feeling my colleagues decided against using it for our team-project for some reason but I’ll ask why not. I’d heard of it.

  2. Joel Nothman

    Concurrently working on the same notebook isn’t necessarily a good idea, but git shouldn’t have a big problem with it. The main issue is that any merge conflicts will need to be resolved within their JSON representation, which of course you never look at as a user.

    As to importing a notebook as a library: you can’t do so without exporting it. But it’s not really intended for library development. I find that I use it to develop some functionality and then strip it out into a library (leaving just the use of that library in the notebook).

    And I haven’t really understood your description of hierarchical models, but I’m no astronomer… 🙂

    1. Madhura Killedar Post author

      The notebook does seem better for dev and teaching, but I was kind of hoping it could be more. And besides, even small codes are sometimes developed by more than one person (in a few months, we’ll be posting a notebook produced for a COIN project; two of us made it, and we had to just send our files back and forth).

      No, I can’t imagine anyone using my “description” as a learning tool 🙂 OK, it might be worth me setting out a more formal description, and more universal examples, and diagrams, like a proper tutorial.

