Shoestring Psycholing

A language science blog

Getting on With Git

Back from a long hiatus, aiming for shorter posts.

Git really is great. I’ve been thinking about it for a long time, and I went through a period early this year of using it regularly while working on some personal projects, but then I stopped for a while, mostly because I went back to the office, where (I assumed) we didn’t have it.

Git is a type of version control software. For those of us who aren’t software developers by profession, this basically means that git is a kind of file management system that works like a combination of “track changes,” an “undo button,” and an all-around time machine for an entire project. I’m more and more a believer that some kind of good version control is really the foundation of reproducible research. More on that in other posts.

Kruschke Chapter 5, Part 3

Finally, it’s time for some actual examples of (very very simple) Bayesian analysis. The last part of the chapter gives a summary of “how to do Bayesian analysis,” and Kruschke walks through the skeleton of steps that you need to do in order to use the code and concepts developed up to this point to analyze some (binary categorical) data. If you are working through this book yourself, I highly recommend the exercises. There is a lot of pedagogical value there, so much that I think this chapter would be incomplete if you skipped them. In this post, I will walk through some analyses with some additional data, comparing the Bayesian results to a traditional analysis, and illustrating a few points I find interesting.
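
As a rough sketch of the kind of analysis this involves (not Kruschke's code, which is in R), here is a conjugate beta-binomial update in Python. The data (7 successes in 10 trials) and the uniform Beta(1, 1) prior are made up for illustration, and the function names are my own.

```python
# A minimal sketch of a beta-binomial update for binary data.
# The prior and the data below are hypothetical, not from the book.

def beta_update(a, b, successes, failures):
    """Return the posterior Beta(a, b) parameters after observing the data."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

prior_a, prior_b = 1, 1   # uniform prior over the proportion (assumption)
z, n = 7, 10              # hypothetical data: 7 successes in 10 trials

post_a, post_b = beta_update(prior_a, prior_b, z, n - z)
posterior_mean = beta_mean(post_a, post_b)   # Bayesian point summary
sample_proportion = z / n                    # traditional point estimate

print(f"posterior: Beta({post_a}, {post_b}), mean = {posterior_mean:.3f}")
print(f"sample proportion = {sample_proportion:.3f}")
```

With so few data points, the posterior mean sits a little closer to the prior mean of 0.5 than the raw sample proportion does; with more data, the two estimates converge.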

Kruschke Chapter 5, Part 2

So way back in November, I promised the next post would elaborate a bit on model comparison. This part of the chapter, and the accompanying exercises, blew my mind a little.
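
To give a flavor of what model comparison looks like in the beta-binomial setting, here is a small Python sketch (again, not Kruschke's R code). It compares two hypothetical priors by the probability each assigns to the same made-up sequence of outcomes; the priors, the data, and the function names are all assumptions for illustration.

```python
from math import exp, lgamma

# A minimal sketch of model comparison for binary data with Beta priors.
# Each "model" here is just a different prior over the same proportion.

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence(z, n, a, b):
    """Log probability of one specific sequence with z successes in n trials
    under a Beta(a, b) prior: B(z + a, n - z + b) / B(a, b)."""
    return log_beta(z + a, n - z + b) - log_beta(a, b)

z, n = 7, 10  # hypothetical data

log_ev_uniform = log_evidence(z, n, 1, 1)    # vague prior, Beta(1, 1)
log_ev_fair = log_evidence(z, n, 20, 20)     # prior concentrated near 0.5

# Ratio of evidences (the Bayes factor), uniform-prior model over fair-ish model
bayes_factor = exp(log_ev_uniform - log_ev_fair)
print(f"Bayes factor (uniform vs. near-0.5 prior): {bayes_factor:.3f}")
```

Working in logs keeps the arithmetic stable when the number of trials gets large enough that the raw evidence values underflow.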

Back With Some Tips

Finally getting back to the blog after a long hiatus. I got a little hung up with putting together a good post for the next segment in the Kruschke/Bayesian saga, and I’ve been too busy with other things to have the time to put some polish on it. So I’ve decided instead to try out the other route and just post frequently, without worrying too much about polish. We’ll see how it goes. I will pick back up with Kruschke, but I want to prioritize posting something frequently, instead of getting too perfectionist about it.

One of the continuing themes I intend to post on is the use of various software tools, with a special focus on tools for reproducible research. So I’ll finish off today’s post with three random tips about two of my favorite pieces of software.

Kruschke Chapter 5, Part 1

So Chapter 5 is a big one. I’m going to end up breaking it across several posts. It’s a big one for a few reasons. It’s the first chapter in Part 2 of the book, representing the first “real” Bayesian analysis. In other words, we’re finally getting to the point where we will actually apply Bayesian analysis to some data! In the interest of making these blog posts maximally useful for myself (and maybe some readers), I’m going to go through some additional data sets of my own, parallel to the examples Kruschke uses.

The other thing I’m going to try to do, which will end up stretching out these posts a bit, is to re-work some of the code that Kruschke supplies. In a nutshell, I feel like Kruschke’s code is probably aimed pretty well at students, who may want to be able to complete the exercises in the book, but who may or may not (a) know much about R, or (b) want to be able to apply the functions to more general situations (i.e., other data). I think in order for the code to be more useful to me personally, I’d like to re-work it a bit. I’ve started a github repo for the Kruschke book here (see also the links on the side of the blog).

In this post, I’ll give an overview of the conceptual issues, and we’ll get to the code and actual analysis in following posts.

Really Reproducible Research

I think a lot about the related notions of reproducibility and replicability in research. I intend for these to be major recurring themes. Today I saw a big special issue of Perspectives on Psychological Science on the topic of reproducibility, so I thought it would be good to break up the Bayesian stuff with a short rant about reproducibility.

Kruschke Chapter 4

Woohoo! Bayes’ Rule! Or should it be Bayes’s? Kruschke goes with Bayes’, so I guess I will, too, but the linguist in me really thinks it ought to be Bayes’s, or at least pronounced that way. I’d certainly say Jonas’s rule, if I knew a guy named Jonas with a rule. Ok, sorry, back to Bayes and Kruschke.

I really like the initial examples he starts off with. For whatever reason, equations are a lot of hard work for me, even though I like them, and it’s hard for me to understand them deeply enough to have an intuitive feel for what they are saying. The rain/clouds example is way more accessible, and the playing card probabilities are a nice enough “toy” example that actual numbers can be calculated, so overall I think he’s done a great job in choosing examples.

The gist of Bayes’ Rule is that it sets up a relationship between conditional probabilities, allowing you to calculate something you want to know from quantities that you already have (or can estimate). Back to the idea of beliefs as probabilities, Bayesian inference boils down to the idea of calculating beliefs (probability of some parameter) given data, which is a conditional probability, like calculating the probability of rain (a parameter) given clouds (data). And the point and magic of Bayes’ Rule is that this can be calculated as a function of other probabilities that we can access more directly.
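
To make that concrete, here is a minimal Python sketch of Bayes' Rule for the rain/clouds case. The three input probabilities are invented for illustration (they are not Kruschke's numbers), and the function name is my own.

```python
# A minimal sketch of Bayes' Rule for a yes/no hypothesis:
#   P(H | D) = P(D | H) * P(H) / P(D)
# where P(D) = P(D | H) * P(H) + P(D | not H) * P(not H).

def posterior(prior, likelihood, likelihood_if_false):
    """Probability of the hypothesis given the data."""
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers, chosen only to illustrate the calculation
p_rain = 0.2                # P(rain): belief before looking at the sky
p_clouds_given_rain = 0.9   # P(clouds | rain)
p_clouds_given_dry = 0.3    # P(clouds | no rain)

p_rain_given_clouds = posterior(p_rain, p_clouds_given_rain, p_clouds_given_dry)
print(f"P(rain | clouds) = {p_rain_given_clouds:.3f}")
```

The quantity we want, P(rain | clouds), is never measured directly; it falls out of the three probabilities that are easier to get at.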

Kruschke Chapter 3

This is the first really substantial chapter; the first two were really just a warm-up. Kruschke presents this chapter as a kind of intro to probability, but (naturally) he makes some very specific choices about what to approach and how to approach it. A lot of it comes across as kind of foreshadowing, which makes it a little hard to follow or grasp the point of at times. Ultimately, I think this chapter should go pretty quickly on the first read-through, to prime some ideas, but the ideas probably won’t really start to sink in until they get applied later.

Reproducibility of New Methods

One of the main themes I’d like to talk about in this blog is the notion of reproducibility in research. I’ll leave more in-depth discussion of the fundamental ideas for other posts; today I just want to comment on something that occurred to me while working on a paper re-submission, and I think it’s also really relevant to the ongoing Bayesian stats posts as well. The issue is: new methods are inherently less reproducible. Let me explain.

No, there is too much. Let me sum up.

Kruschke Chapter 2

(Here’s a link to the intro and Chapter 1 discussion)

In this chapter, Kruschke does a nice job of succinctly laying out the primary goals of statistical inference, and gives a pretty clear, intuitive description of what prior and posterior beliefs are about. To paraphrase in a nutshell, prior beliefs represent our beliefs (including level of uncertainty in those beliefs) before data collection or observation, and posterior beliefs reflect what we believe after taking data into account.