The Mother of Invention

Our paper Simulation-based marginal likelihood for cluster strong lensing cosmology got accepted for publication a few days ago. The astronomy problem posed was about how we can use observations of the phenomenon of strong gravitational lensing by galaxy clusters to infer the values of cosmological parameters (or potentially the cosmological model itself). Each hypothetical universe would result in different galaxy cluster compactness and universe expansion history. Both consequences ultimately affect the data: which galaxy clusters would be selected in a flux-limited survey, their masses, and their effectiveness as gravitational lenses.

Anyway, we hit a snag because one of the things you need to do this is to be able to say how likely you are to observe what you did under different hypothetical universe models (or something more specific, like the amount of dark matter) – this is the likelihood. Since we use large-scale computer simulations to model non-linear structure formation, we can’t write down a likelihood function – a common problem in fields like epidemiology, genetics, and geology as well.

Our solution ended up being something in the vein of approximate Bayesian computation: use summary statistics. Except in this case, the summary statistics have to be something you infer using the usual Bayesian approach and you end up with a joint PDF (rather than a set of scalar values) for a given dataset (whether real or simulated). Then instead of a kernel distance metric and threshold distance, you calculate the zero-shift cross-correlation of the summary statistics PDFs (ssPDFs) for the real and mock datasets – that’s your likelihood!
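
To make the zero-shift cross-correlation idea concrete, here is a minimal sketch (not the implementation from the paper). It assumes the two ssPDFs are one-dimensional and sampled on the same regular grid, in which case the zero-shift cross-correlation is just the integral of their product, approximated by a Riemann sum. The Gaussian ssPDFs, the grid, and all function names are made up for illustration.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D Gaussian, used here as a stand-in ssPDF."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def zero_shift_cross_correlation(pdf_a, pdf_b, dx):
    """Riemann-sum approximation to the integral of pdf_a(x) * pdf_b(x) dx.

    Both PDFs must be sampled on the same regular grid with spacing dx.
    """
    if len(pdf_a) != len(pdf_b):
        raise ValueError("PDFs must be sampled on the same grid")
    return sum(a * b for a, b in zip(pdf_a, pdf_b)) * dx


# Common grid for the (hypothetical) summary statistic.
n_points = 2001
x_min, x_max = -10.0, 10.0
dx = (x_max - x_min) / (n_points - 1)
grid = [x_min + i * dx for i in range(n_points)]

# Toy ssPDFs: one for the "real" data, two for mock universes.
real = [gaussian_pdf(x, 0.0, 1.0) for x in grid]
mock_close = [gaussian_pdf(x, 0.5, 1.0) for x in grid]
mock_far = [gaussian_pdf(x, 4.0, 1.0) for x in grid]

like_close = zero_shift_cross_correlation(real, mock_close, dx)
like_far = zero_shift_cross_correlation(real, mock_far, dx)
print(like_close, like_far)
```

The mock whose ssPDF overlaps more with the real one gets the larger value, which is the behaviour you want from something playing the role of a likelihood.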


This paper is particularly important to me because

  • it was my last as a professional astronomer
  • it involved throwing away the original paper we had written, re-framing the question (away from the notions of tension and consistency with standard cosmology), and implementing creative ideas.
  • we also had the best (anonymous) reviewer. We were both stubborn and pedantic but utterly respectful. This is how the refereeing process should be!

The idea is finally out there and I can now get on with several follow-up validation tests and applications in other fields of research. We could (and hopefully will) demonstrate the full posterior inference/model comparison in the kind of research problem where the simulations are relatively quick or emulators are available. There’s also much work to be done in the choice of summary statistics and inference of ssPDFs. Plus the logistics of calculating the cross-correlation with discretely sampled functions will get tricky when the ssPDFs are higher-dimensional, so any help with that would be greatly appreciated!


An Astro-interlude

I decided to spend two weeks hopping from continent to continent to take part in back-to-back astro-statistics-tech events: the COIN Residence Program and AstroHackWeek. A year after having left the field, formally speaking, I’ve chosen to make astronomy my hobby, taking “leave” to do research. It’s maybe not entirely sensible, but I’m doing this on my own terms. This post is a report on the things I learned during that sleep-deprived, mostly-barefoot fortnight.

First, a little background about the events.

The Cosmostatistics Initiative (COIN) is a collaboration that began in 2014 as a section of the International Astrostatistics Association (IAA) and brings together people across the Astronomer–Statistician spectrum to do some left-of-field research introducing new data analytic, statistical, and visualisation techniques to the astronomy community. The Residence Program happens once a year: we hang out in an apartment for a week, do some intense work on 2-3 projects well into the wee hours, write up half the papers, and still get some sun. This year we found ourselves in the lovely, warm city of Budapest.


Some of COIN on our day off to go sightseeing around Budapest. Credit: Pierre-Yves LaBlanche

AstroHackWeek (AHW), on the other hand, is a free-form event with elements of a workshop (pre-defined lectures) and a lot more making-it-up-as-we-go-along. Early on, 50 participants suggest topics they would like to learn about, identify one expert amongst the group and allow them to become teacher for an hour to a class of 10-20 (learning collectives are a brilliant idea!). Hack projects are the highlight, and are proposed both before and throughout the event; many of us will work on 2-4 at once. AHW also started in 2014, and was held this year at the Berkeley Institute for Data Science (BIDS).


AstroHackWeek getting settled in at GitHub HQ, San Francisco.

For completeness, I’m also going to mention dotAstronomy, a similar out-of-the-box unconference that started way back in 2008/9. It has evolved over the years, but by the time I attended dotAstro7 in Sydney in 2015, it had become a combination of idea-lectures, just one day of hack-projects, and a lot of unconference group discussions. More of the emphasis is on software/tech and education/communication.

OK, so here’s my brain-dump:

Mixture models

Mixture models are the result of combining models for different sub-populations or classes. This makes them relevant both to clustering and classification routines and to dealing with outliers. You can never really tease the sub-populations apart; the point is to model the combined dataset, and maybe to provide a probability for each data-point that it belongs to a specific class.
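
As a minimal sketch of that last point, assume a two-component Gaussian mixture in one dimension: a narrow "main" population plus a broad "outlier" population. The weights, means, widths, and function names below are invented for illustration. The probability that a data-point belongs to each class is its weighted component density divided by the total mixture density.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, weights, means, sigmas):
    """Probability that x belongs to each mixture component."""
    weighted = [w * gaussian_pdf(x, m, s)
                for w, m, s in zip(weights, means, sigmas)]
    total = sum(weighted)  # mixture density evaluated at x
    return [v / total for v in weighted]


# Hypothetical mixture: 90% narrow main population, 10% broad outliers.
weights = [0.9, 0.1]
means = [0.0, 0.0]
sigmas = [1.0, 10.0]

for x in (0.5, 3.0, 8.0):
    p_main, p_outlier = membership_probabilities(x, weights, means, sigmas)
    print(f"x = {x:4.1f}  P(main) = {p_main:.3f}  P(outlier) = {p_outlier:.3f}")
```

Note that no data-point is ever assigned outright to one class; each just gets a probability, which is the sense in which the sub-populations are never really teased apart.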

Hierarchical models

In a hierarchical model, the parameters sit at different levels: some describe individual objects, while others describe the population those objects are drawn from. For example, for supernova data one needs to model individual light-curves (layer 1), properties of supernovae type Ia (layer 2), and cosmology (layer 3). I’m now convinced that at least half of all models are actually hierarchical, just not recognised and named as such.
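
One way to see the layering is to write the model down generatively. The sketch below is a generic two-layer toy, not a supernova model: population-level parameters (a mean and an intrinsic scatter) generate a true value for each object, and each object's true value generates its noisy observations. All names and numbers are assumptions for illustration.

```python
import random


def simulate_hierarchy(n_objects, n_obs, pop_mean, pop_scatter,
                       noise_sigma, seed=0):
    """Draw mock data from a two-layer hierarchical model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_objects):
        # Population layer: each object's true value is drawn
        # from the population distribution.
        true_value = rng.gauss(pop_mean, pop_scatter)
        # Object layer: each observation scatters around that
        # object's own true value.
        observations = [rng.gauss(true_value, noise_sigma)
                        for _ in range(n_obs)]
        data.append({"true_value": true_value,
                     "observations": observations})
    return data


data = simulate_hierarchy(n_objects=5, n_obs=3, pop_mean=10.0,
                          pop_scatter=2.0, noise_sigma=0.5)
for obj in data:
    print(round(obj["true_value"], 2),
          [round(v, 2) for v in obj["observations"]])
```

Inference runs this in reverse: the per-object values and the population parameters are all fitted together, so each object borrows information from the rest of the sample.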

Probabilistic Graphical Models

Probabilistic Graphical Models (PGMs) are diagrams that are very helpful for communicating parametrizations of models. You have to learn the “notation”, but once you do, they make great visual aids (see an example in this paper). Each quantity is marked as a distribution, data, or a constant, and arrows show which quantities depend on which. This is particularly good for describing hierarchical models.

Gaussian Processes

Modelling your data as a draw from a multivariate Gaussian whose covariance matrix has off-diagonal terms (set by a kernel function) is the first step to modelling correlated errors. This is a complicated subject, and GPs certainly have limitations (maybe Gaussian isn’t appropriate!), but it’s better than assuming a purely diagonal matrix, and besides, Gaussians have useful properties that make things easier to calculate.
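
A minimal sketch of what the non-diagonal covariance matrix looks like, assuming a squared-exponential kernel (one common choice among many) plus independent measurement noise on the diagonal. The kernel parameters and sample positions are invented for illustration.

```python
import math


def squared_exponential(x1, x2, amplitude, length_scale):
    """Squared-exponential kernel: covariance decays with separation."""
    r = (x1 - x2) / length_scale
    return amplitude ** 2 * math.exp(-0.5 * r * r)


def covariance_matrix(xs, amplitude, length_scale, noise_sigma):
    """Kernel covariance between all pairs of points, plus white noise."""
    n = len(xs)
    cov = [[squared_exponential(xs[i], xs[j], amplitude, length_scale)
            for j in range(n)] for i in range(n)]
    for i in range(n):
        cov[i][i] += noise_sigma ** 2  # uncorrelated measurement noise
    return cov


# Hypothetical sample positions (e.g. observation times).
xs = [0.0, 1.0, 2.0, 5.0]
cov = covariance_matrix(xs, amplitude=1.0, length_scale=1.5, noise_sigma=0.1)
for row in cov:
    print("  ".join(f"{value:6.3f}" for value in row))
```

Points that are close together get large off-diagonal entries and points far apart get entries near zero; setting the amplitude to zero recovers the usual diagonal matrix.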

Jupyter (IPython) Notebooks

This was the first time I actively used Jupyter Notebooks for writing Python code, and I was pleasantly surprised by the interactive features and formatted commenting. Perfect for small pieces of code and teaching/demonstration. However, I do have some questions/gripes (please let me know if there are solutions):

  • can you import a package/module written in a notebook? Sometimes we end up with a notebook version for development, and then a standard python file for importing.
  • can’t use all emacs commands meaning I have to do more clicking with the mouse, which is why I tend to avoid interactive editors in general.
  • how does one work collaboratively on the same notebook? Can git handle that?

To be fair, I have an old version of ipython notebook, so maybe these gripes no longer apply. I should talk to the Jupyter crew, one of whom I met at AHW.

Parallel programming in Python

I had thought that parallel programming wasn’t really possible in Python: you could run code on multiple threads, yes, but not really on multiple cores. People use multiprocessing sometimes, but now I need to look into mpipool. Could be useful, if you have the mpiexec job launcher set up on your cluster.
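
For reference, here is a minimal sketch of the multiprocessing route from the standard library, which runs work in separate processes and so can use several cores on a single machine. The worker function and helper name are placeholders; this does not cover the MPI-based approach.

```python
from multiprocessing import Pool


def slow_square(x):
    """Stand-in for an expensive, independent calculation."""
    return x * x


def parallel_map(func, inputs, processes=2):
    """Map func over inputs using a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(func, inputs)


if __name__ == "__main__":
    # The guard stops worker processes from re-running this block
    # on platforms that start workers by re-importing the script.
    print(parallel_map(slow_square, range(8)))
```

This only pays off when each call is expensive compared with the cost of shipping its inputs and outputs between processes.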

Natural Language Processing & Web-scraping

Despite being astronomers-by-trade, you’ll often find us talking excitedly about everything fascinating from outside our field. At a hack-week, we’re happy to give anything a shot. So after free dinner and drinks at GitHub HQ, we dreamt up the Happiness Hack (under a different name) and within 2 hours, created this.

It was going to end there, but the next day, we drummed up interest from the group and ended up extending the hack to grab** and analyse participants’ commit messages, as a bit of a joke, I guess, but here you go.

**beautiful-soup : holy crap!! So powerful, so beautiful…


Mock Turtle sings “beautiful soup”. Snippet of the drawing by Sir John Tenniel

Failing efficiently

Pair coding has been part of my life for the last few months, and I totally appreciate how it can really be more efficient despite the extra person investment. Just enough cooks. The small collaborations formed at both events worked wonderfully together, and several papers have been spawned. But really the big lesson, particularly from hacking at AHW, is that we benefit from learning to fail efficiently, because that sets us free to explore high risk projects. One person could hack away for weeks or months at an idea, while two or three people could declare it a lost cause in a mere day or two. Besides efficiency, this system prevents frustration and burn-out. Trying and failing was actively encouraged at AHW, and, better yet, demonstrated by senior participants.

Career transitions & Imposter Syndrome

Every time I meet with astronomers these days, the discussion turns to the process of leaving astronomy and imposter syndrome. The global community only really started talking about these on open forums about three years ago, and now it’s a recurring theme. At hack days/weeks, in particular, imposter syndrome is rife. Trying to prove your skills and worth and produce something spectacular on a short timescale is a recipe for mental health disaster. The pressure to dazzle with our hacking skillz certainly got to me back at dotAstro, but not as much this time, partly because the organisers made it a point to tackle the problem head-on (thank you!) and make the most of everyone’s diverse skill-sets, and partly because this time I knew better and put more emphasis on play and fun, and less on achieving goals.


So yeah, amongst the astronomy, statistics, computing, collaborating, hacking, and playing, I managed to learn a ton of stuff, see lovely places, and make new friends, which made the trip very worthwhile. My most important lesson, however, was:

Try not to doze off while on your laptop on the sofa near your colleagues, otherwise you end up with photos of creepy teddy bears watching you sleeping…

Rational Agents

Recently I attended the second ever Bayesian Young Statisticians’ Meeting (BAYSM`14) in Vienna, which was a really stimulating experience, and something pretty new for me, being my first non-astronomy conference. I won a prize for my talk too, which was pretty sweet!

Swanky BAYSM`14 venue at WU Vienna designed by architect Zaha Hadid

During the two-day overview of theory and a variety of applications by the newest people in the field (read about the highlights over at the blogs of Ewan Cameron and Christian Robert), we heard from a few Keynote Speakers including Chris Holmes. In his talk, he mentioned the world of rational decision makers as envisioned by Leonard J. Savage in his 1954/1972 tome The Foundations of Statistics (adding that to my ‘to read’ list), and went on to describe the application of a loss function and minimax to avoid worst-case scenarios. Minimax isn’t the only approach to decision-making; I think other approaches are more relevant to our behaviour, as I’ll describe later.
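
A minimal sketch of the minimax rule, using an invented loss table: for each action, find the worst loss over all possible states of the world, then pick the action whose worst case is smallest. For contrast, the sketch also minimises expected loss under assumed probabilities for the states; that is only one possible alternative and not necessarily the one discussed in the talk. The actions, states, losses, and probabilities are all made up.

```python
def minimax_action(loss):
    """Pick the action with the smallest worst-case loss.

    loss maps each action to a dict of {state: loss}.
    """
    worst_case = {action: max(by_state.values())
                  for action, by_state in loss.items()}
    return min(worst_case, key=worst_case.get), worst_case


def expected_loss_action(loss, state_probs):
    """Pick the action with the smallest probability-weighted loss."""
    expected = {action: sum(state_probs[s] * value
                            for s, value in by_state.items())
                for action, by_state in loss.items()}
    return min(expected, key=expected.get), expected


# Hypothetical loss table (arbitrary units).
loss = {
    "stay in bed": {"sunny": 2, "rainy": 2},
    "go out": {"sunny": 0, "rainy": 5},
}
state_probs = {"sunny": 0.8, "rainy": 0.2}  # assumed beliefs

minimax_choice, worst = minimax_action(loss)
expected_choice, expected = expected_loss_action(loss, state_probs)
print("minimax:", minimax_choice, worst)
print("expected loss:", expected_choice, expected)
```

With these numbers minimax keeps you in bed, because going out has the worse worst case, while minimising expected loss sends you out the door.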

“If you lived your life according to minimax, you’d never get out of bed” – C. Holmes

We are all made of stats

Children are very good at science. They start with broad priors (anything is possible) and learn through collecting data (see picture below) what conclusions are supported best by the evidence. They experiment, make mistakes, and test the variations on a theme. They learn what is dangerous; they learn what is tasty; they learn how to speak.

Kids doing science

Our responses to experiences are very similar to Bayesian reasoning. Take trust as an example. If some dudette off the street – let’s call her Margaret – were to recommend a movie, say Moon, we might not heed her words since we have no reason to think we’d have the same taste in movies as her, but if upon watching Moon we found that we quite enjoyed it – we’d be more likely to rely on Margaret’s next tip, say Wadjda. And if Wadjda was also to our liking, we’d probably trust Margaret’s advice when she suggests Fast & Furious 6 (oops). But that blunder would reduce our confidence in her next recommendation, etc. If we define our experience of the movie in binary terms such as “liked” and “disliked”, the situation resembles the classic coin-toss experiment in which one tries to determine if a coin is biased by flipping it many times.
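
The coin-toss analogy can be sketched in a few lines, assuming the standard Beta-Binomial set-up: trust in Margaret is the unknown probability that we like her next recommendation, we start from a uniform Beta(1, 1) prior, and each “liked” or “disliked” outcome adds one to the corresponding Beta parameter. The prior and the function names are assumptions for illustration.

```python
def update_trust(alpha, beta, liked):
    """Update Beta(alpha, beta) after one liked/disliked outcome."""
    return (alpha + 1, beta) if liked else (alpha, beta + 1)


def expected_trust(alpha, beta):
    """Posterior mean probability of liking the next recommendation."""
    return alpha / (alpha + beta)


alpha, beta = 1, 1  # uniform prior: no idea whether our tastes match
history = [("Moon", True), ("Wadjda", True), ("Fast & Furious 6", False)]

trust_trace = [expected_trust(alpha, beta)]
for title, liked in history:
    alpha, beta = update_trust(alpha, beta, liked)
    trust_trace.append(expected_trust(alpha, beta))
    print(f"after {title}: trust = {trust_trace[-1]:.2f}")
```

Trust climbs with each film we enjoy and drops after the blunder, but it does not fall back to where it started, because the earlier evidence still counts.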