
Ideophones in Japanese modulate the P2 and late positive complex responses: MS Paint version

I just had my first paper published:

Lockwood, G., & Tuomainen, J. (2015). Ideophones in Japanese modulate the P2 and late positive complex responses. Frontiers in Psychology, 6, 933. http://doi.org/10.3389/fpsyg.2015.00933

It’s completely open access, so have a look (and download the PDF, because it looks a lot nicer than the full text).

It’s a fuller, better version of my MSc thesis, which means that I’ve been working on this project on and off since about April 2013. Testing was done in June/July 2013 and November 2013. Early versions of this paper have been presented at an ideophone workshop in Tokyo in December 2013, a synaesthesia conference in Hamburg in February 2014, and a neurobiology of language conference in Amsterdam in August 2014. It was rejected once from one journal in August 2014, and was submitted to this journal in October 2014. It feels great to have it finally published, but also kind of anticlimactic, given that I’m focusing on some different research now.

I feel like the abstract and full article describe what’s going on quite well; this is a generally under-researched area within the (neuro)science of language as it is, so it’s written for the sizeable number of people who aren’t knowledgeable about ideophones in the first place. However, if you can’t explain your research using shoddy MS Paint figures, then you can’t explain it at all, so here goes.

Ideophones are “marked words which depict sensory imagery” (Dingemanse, 2012). In essence, this means that ideophones stick out compared to regular words, ideophones are real words (not just off-the-cuff onomatopoeia), ideophones try to imitate the thing they mean rather than just describing it, and ideophones mean things to do with sensory experiences. This sounds like onomatopoeia, but it’s a lot more than that. Ideophones have been kind of sidelined within traditional approaches to language because of a strange fluke whereby the original languages of academia (i.e. European languages, and especially French, German, and English) come from one of the very few language families across the world which don’t have ideophones. Since ideophones aren’t really present in the languages of the people who wrote about languages most often, those writers kind of just ignored them. The less well-known linguistic literature on ideophones, meanwhile, goes back decades, and variously describes ideophones as vivid, quasi-synaesthetic, expressive, and so on.

What this boils down to is that for speakers of languages with ideophones, listening to somebody say a regular word is like this:

[Figure: listening to a regular word]

and listening to somebody say an ideophone is like this:

[Figure: listening to an ideophone]

Why, though?

Ideophones are iconic and/or sound-symbolic. These terms are slightly different but are often used interchangeably and both mean that there’s a link between the sound of something language-y (or the shape/form of something language-y in signed languages) and its meaning. This means that, when you’re listening to a regular word, you’re generally just relying on your existing knowledge of the combinations of sounds in your language to know what the meaning is:

[Figure: regular word processing]

…whereas when a speaker of a language with ideophones listens to an ideophone, they feel a rather more direct connection between what the ideophone sounds like and what the meaning of the ideophone is:

[Figure: ideophone processing]

These links between sound and meaning are known as cross-modal correspondences.

The thing is, it’s one thing for various linguists and speakers of languages with ideophones to identify and describe what’s happening; it’s quite another to see whether that has any psycho/neurolinguistic basis. This is where my research comes in.

I took a set of Japanese ideophones (e.g. perapera, which means “fluently” when talking about somebody’s language skills; I certainly wish my Japanese was a lot more perapera) and compared them with regular Japanese words (e.g. ryuuchou-ni, which also means “fluently” when talking about somebody’s language skills, but isn’t an ideophone). My Japanese participants read sentences which were the same apart from swapping the ideophones and the arbitrary words around, like:

花子は ぺらぺらと フランス語を話す
Hanako-wa perapera-to furansugo-o hanasu
Hanako speaks French fluently (where “fluently” = perapera).

花子は りゅうちょうに フランス語を話す
Hanako-wa ryuuchou-ni furansugo-o hanasu
Hanako speaks French fluently (where “fluently” = ryuuchou-ni).

While they read these sentences, I used EEG (or electroencephalography) to measure their brain activity. This is done by putting a load of electrodes in a swimming cap like this:

[Figure: electrode set-up]

After measuring a lot of participants reading a lot of sentences in the two conditions, I averaged them together to see if there was a difference between the two conditions… and indeed there was:

[Figure 1 from the paper: grand average waveforms for both conditions]

The red line shows the brain activity in response to the ideophones, and the blue line shows the brain activity in response to the arbitrary words. The red line is higher than the blue line at two important points: the peak at about 250ms after the word was presented (the P2 component), and the consistent bit for the last 400ms (the late positive complex).

Various other research has found that a higher P2 component is elicited by cross-modally congruent stimuli… i.e. this particular brain response is bigger when two things match nicely (such as a high-pitched sound and a small object). Finding this in response to the Japanese ideophones suggests that the brain recognises that the sounds of the ideophones cross-modally match the meanings of the ideophones much more than the sounds of the arbitrary words match the meanings of the arbitrary words. This may be why ideophones are experienced more vividly than arbitrary words.

[Figure: higher P2 for ideophones]

[Figure: lower P2 for arbitrary words]

As for the late positive complex, it’s hard to say. It could be that the cross-modal matching of sound and meaning in ideophones actually makes it harder for the brain to work out the ideophone’s role in a sentence because it has to do all the cross-modal sensory processing on top of all the grammatical stuff it’s doing in the first place. It’s very much up for discussion.


Putting the graph into electroencephalography

ERPists – feel free to skip ahead to the main point of this post, which is about plotting measures of confidence and variance. Or just read on from the start, because there are a lot of highly proficient MS Paint figures in here.

UPDATE! The paper where this dataset comes from is now published in Collabra. You can read the paper here and download all the raw data and analysis scripts here.

ERP graphs are often subjected to daft plotting practices that make them highly frustrating to look at.

[Figure: ERPing the derp]

Negative voltage is often (but not always) plotted upwards, which is counterintuitive but generally justified with “oh, but that’s how we’ve always done it”. Axes are rarely labelled, apart from a small key tucked away somewhere in the corner of the graph, which still doesn’t give you precise temporal accuracy (which is kind of the point of using EEG in the first place). And finally, these graphs are often generated in ERP programmes and then saved in particular file formats, which get cramped or blurry when resized to fit journals’ image criteria. This means that a typical ERP graph looks a little like this:

[Figure: a typical ERP graph]

…and the graph is supposed to be interpreted a little like this:

[Figure: an intuitive reading of an ERP graph]

…although realistically, reading a typical ERP graph is a bit more like this:

[Figure: reading an ERP graph in context]

Some of these problems are down to standard practices; others, to a lack of expertise in generating graphics; and still others to journal requirements, which generally specify that graphics must fit a size too small to allow proper visual inspection of somebody’s data, and which charge approximately four million dollars for the privilege of having these little graphs in colour because of printing costs, despite the fact that nobody really reads actual print journals anymore.

Anyway. Many researchers grumble about these pitfalls, but accept that it comes with the territory.

However, one thing I’ve rarely heard discussed, and even more rarely seen plotted, is the representation of different statistical information in ERP graphs.

ERP graphs show the mean voltage across participants on the y-axis at each time point represented on the x-axis (although because of sampling rates, it generally isn’t a separate mean voltage for each millisecond; at a typical 500 Hz sampling rate, for example, it’s one mean voltage every two milliseconds). Taking the mean readings across trials and across participants is exactly what ERPs are for – they average out the many, many random or irrelevant fluctuations in the EEG data to generate a relatively consistent measure of the brain response to a particular stimulus.
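Here’s a minimal R sketch of that averaging, with made-up numbers standing in for real recordings (the variable names are mine, not from any particular toolbox):

    # 60 trials, sampled every 2 ms (i.e. a 500 Hz recording)
    set.seed(1)
    n_trials <- 60
    time_ms  <- seq(0, 998, by = 2)

    # Each row is one trial: a noisy waveform with a small positive bump at ~300 ms
    epochs <- t(replicate(n_trials,
      2 * exp(-((time_ms - 300)^2) / (2 * 50^2)) + rnorm(length(time_ms), sd = 5)))

    # The ERP is just the mean voltage across trials at each time point;
    # the random noise cancels out, the consistent bump survives
    erp <- colMeans(epochs)
    plot(time_ms, erp, type = "l", xlab = "Time (ms)", ylab = "Voltage (µV)")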

Decades of research have shown that many of these ERPs are reliably generated, so if you get a group of people to read two sentences – one where the sentence makes perfect sense, like the researcher wrote the blog, and one where the final word is replaced with something that’s kind of weird, like the researcher wrote the bicycle – you can bet that there will be a bigger (i.e. more negative) N400 after the kind-of-weird final words than after the ones that make sense. The N400 is so named because it’s a negative-going wave that normally peaks at around 400ms.

Well, that is, it’ll look like that when you average across the group. You’ll get a nice clean average ERP showing quite clearly what the effect is (I’ve plotted it with positive-up axes, with time points labelled in 100ms intervals, and with two different colours to show the conditions):

[Figure: a standard N400 effect]
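For reference, here’s roughly how those conventions look in R with ggplot2. This is a sketch assuming you’ve already got a long-format data frame of grand-average voltages; the column names are placeholders rather than anything standard:

    library(ggplot2)

    # Assumed input: one row per condition per time point,
    # e.g. grand_avg <- data.frame(time_ms, voltage, condition)
    ggplot(grand_avg, aes(x = time_ms, y = voltage, colour = condition)) +
      geom_line() +
      scale_x_continuous(breaks = seq(0, 1000, by = 100)) +  # 100ms intervals
      labs(x = "Time (ms)", y = "Voltage (µV)") +
      theme_minimal()
    # positive is up by default, so no axis-flipping required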

But, the strength of the ERP – that it averages out noisy data – is also its major weakness. As Steve Levinson points out in a provocative and entertaining jibe at the cognitive sciences, individual variation is huge, both between different groups across the world and between the thirty or so undergraduates who are doing ERP studies for course credit or beer money. The original sin of the cognitive sciences is to deny the variation and diversity in human cognition in an attempt to find universal human cognitive capabilities. This means that averaging across participants in ERP studies and plotting that average gives quite a misleading picture of what’s actually going on… even if the group average is totally predictable. To test this out, I had a look at the ERP plot of a study that I’m writing up now (and to generate my plots, I use R and the ggplot2 package, both of which are brilliant). When I average across all 29 participants and plot the readings from the electrode right in the middle of the top of the head, it looks like this:

[Figure: Cz electrode, group average across all 29 participants]

There’s a fairly clear effect of the green condition; there’s a P3 followed by a late positivity. This comes out as hugely statistically significant using ANOVAs (the traditional tool of the ERPist) and cluster-based permutation tests in the FieldTrip toolbox (which is also brilliant).

But. What’s it like for individual participants? Below, I’ve plotted some of the participants where no trials were lost to artefacts, meaning that the ERPs for each participant are clearer since they’ve been averaged over all the experimental trials.

Here’s participant 9:

[Figure: Cz electrode, participant 9]

Participant 9 reflects the group average quite well. The green line is much higher than the orange line, peaking at about 300ms, and then the green line is also more positive than the orange line for the last few hundred milliseconds. This is nice.

Here’s participant 13:

[Figure: Cz electrode, participant 13]

Participant 13 is not reflective of the group average. There’s no P3 effect, and the late positivity effect is actually reversed between conditions. There might even be a P2 effect in the orange condition. Oh dear. I wonder if this individual variation will get lost in the averaging process?

Here’s participant 15:

[Figure: Cz electrode, participant 15]

Participant 15 shows the P3 effect, albeit about 100ms later than participant 9 does, but there isn’t really a late positivity here. Swings and roundabouts, innit.

However, despite this variation, if I average the three of them together, I get a waveform that is relatively close to the group average:

[Figure: Cz electrode, average of participants 9, 13, and 15]

The P3 effect is fairly clear, although the late positivity isn’t… but then again, it’s only from three participants, and EEG studies should generally use at least 20-25 participants. It would also be ideal if participants could do hundreds or thousands of trials so that the ERPs for each participant are much more reliable, but this experiment took an hour and a half as it is; nobody wants to sit in a chair strapped into a swimming cap full of electrodes for a whole day.

So, on the one hand, this shows that ERPs from a tenth of the sample size can actually be quite reflective of the group average ERPs… but on the other hand, this shows that even ERPs averaged over only three participants can still obscure the highly divergent readings of one of them.

Now, if only there were a way of calculating an average, knowing how accurate that average is, and also knowing what the variation in the sample size is like…

…which, finally, brings me onto the main point of this blog:

Why do we only plot the mean across all participants when we could also include measures of confidence and variance?

In behavioural data, it’s relatively common to plot line graphs where the line is the mean across participants, while there’s also a shaded area around the line which typically shows 95% confidence intervals. Graphs with confidence intervals look a bit like this (although normally a bit less like an earthworm with a go-faster stripe on it):

[Figure: line graph with shaded 95% confidence intervals]

This is pretty useful in visualising data. It’s taking a statistical measure of how reliable the measurement is, and plotting it in a way that’s easy to see.

So. Why aren’t ERPs plotted with confidence intervals? The obvious stumbling point is the ridiculous requirements of journals (see above), which would make the shading quite hard to do. But, if we all realised that everything happens on the internet now, where colour printing isn’t a thing, then we could plot and publish ERPs that look like this:

[Figure: Cz electrode, group average with shaded 95% confidence intervals]

It’s nice, isn’t it? It also makes it fairly clear where the main effects are; not only do the lines diverge, the shaded areas do too. This might even go some way towards addressing Steve Levinson’s valid concerns about cognitive science ignoring individual variation… although only within one population. My data was acquired from 18–30 year old Dutch university students, and cannot be generalised to, say, 75 year old illiterate Hindi speakers with any degree of certainty, let alone 95%.
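If you want to try this on your own data, here’s roughly how I’d build it in ggplot2, assuming a long-format data frame of single-participant averages (one row per participant per condition per time point; again, the column names are placeholders, and I’m assuming the amplitudes are well-behaved enough for a t-based interval):

    library(dplyr)
    library(ggplot2)

    # Assumed input:
    # erps <- data.frame(participant, condition, time_ms, voltage)
    summary_df <- erps %>%
      group_by(condition, time_ms) %>%
      summarise(mean_v = mean(voltage),
                se     = sd(voltage) / sqrt(n()),
                ci     = qt(0.975, n() - 1) * se,  # half-width of a 95% CI
                .groups = "drop")

    ggplot(summary_df, aes(x = time_ms, y = mean_v,
                           colour = condition, fill = condition)) +
      geom_ribbon(aes(ymin = mean_v - ci, ymax = mean_v + ci),
                  alpha = 0.3, colour = NA) +  # the shaded confidence band
      geom_line() +
      scale_x_continuous(breaks = seq(0, 1000, by = 100)) +
      labs(x = "Time (ms)", y = "Voltage (µV)") +
      theme_minimal()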

This isn’t really measuring the variance within a sample, though. How can we plot an ERP graph which gives some indication of how participant 13 had a completely different response from participants 9 and 15? Well, we could try plotting it with the shaded areas showing one standard deviation either side of the mean instead. It looks like this:

[Figure: Cz electrode, group average with shading showing one standard deviation either side of the mean]

…which, let’s face it, is pretty gross. The colours overlap a lot, and it’s just kind of messy. But, it’s still informative; it indicates a fair chunk of the variation within my 29 participants, and it’s still fairly clear where the main effects are.
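(In the sketch above, that’s just a one-line change: compute sd_v = sd(voltage) in the summarise() step, and give the ribbon ymin = mean_v - sd_v and ymax = mean_v + sd_v instead of the confidence-interval bounds.)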

Is this a valid way of showing ERP data? I quite like it, but I’m not sure whether other ERP researchers would find it useful (or indeed sensible). I’m also not sure whether I’ve missed something obvious which makes it impractical or incorrect. It could well be that the amplitudes at each time point aren’t normally distributed, which would require some more advanced approach to the confidence intervals (bootstrapping them, for example), but it’s something to go on at least.

I’d love to hear people’s opinions in the comments below.

To summarise, then:

– ERP graphs aren’t all that great

– but they could be if we plotted them logically

– and they could be really great if we plotted more than just the sample mean
