EEG/ERP, Sound-symbolism

Ideophones in Japanese modulate the P2 and late positive complex responses: MS Paint version

I just had my first paper published:

Lockwood, G., & Tuomainen, J. (2015). Ideophones in Japanese modulate the P2 and late positive complex responses. Frontiers in Psychology (Language Sciences), 6, 933.

It’s completely open access, so have a look (and download the PDF, because it looks a lot nicer than the full text).

It’s a fuller, better version of my MSc thesis, which means that I’ve been working on this project on and off since about April 2013. Testing was done in June/July 2013 and November 2013. Early versions of this paper were presented at an ideophone workshop in Tokyo in December 2013, a synaesthesia conference in Hamburg in February 2014, and a neurobiology of language conference in Amsterdam in August 2014. It was rejected by one journal in August 2014, and was submitted to this journal in October 2014. It feels great to have it finally published, but also kind of anticlimactic, given that I’m focusing on some different research now.

I feel like the abstract and full article describe what’s going on quite well; this is a generally under-researched area within the (neuro)science of language as it is, so it’s written for the sizeable number of people who aren’t knowledgeable about ideophones in the first place. However, if you can’t explain your research using shoddy MS Paint figures, then you can’t explain it at all, so here goes.

Ideophones are “marked words which depict sensory imagery” (Dingemanse, 2012). In essence, this means that ideophones stick out compared to regular words, ideophones are real words (not just off the cuff onomatopoeia), ideophones try and imitate the thing they mean rather than just describing it, and ideophones mean things to do with sensory experiences. This sounds like onomatopoeia, but it’s a lot more than that. Ideophones have been kind of sidelined within traditional approaches to language because of a strange fluke whereby the original languages of academia (i.e. European languages, and especially French, German, and English) are from one of the very few language families across the world which don’t have ideophones. Since ideophones aren’t really present in the languages of the people who wrote about languages most often, those writers kind of just ignored them. The less well-known linguistic literature on ideophones has been going on for decades, and variously describes ideophones as vivid, quasi-synaesthetic, expressive, and so on.

What this boils down to is that for speakers of languages with ideophones, listening to somebody say a regular word is like this:

listening to a regular word

and listening to somebody say an ideophone is like this:

listening to an ideophone

Why, though?

Ideophones are iconic and/or sound-symbolic. These terms are slightly different but are often used interchangeably and both mean that there’s a link between the sound of something language-y (or the shape/form of something language-y in signed languages) and its meaning. This means that, when you’re listening to a regular word, you’re generally just relying on your existing knowledge of the combinations of sounds in your language to know what the meaning is:

regular word processing

…whereas when a speaker of a language with ideophones listens to an ideophone, they feel a rather more direct connection between what the ideophone sounds like and what the meaning of the ideophone is:

ideophone processing

These links between sound and meaning are known as cross-modal correspondences.

Thing is, it’s one thing for various linguists and speakers of languages with ideophones to identify and describe what’s happening; it’s quite another to see if that has any psycho/neurolinguistic basis. This is where my research comes in.

I took a set of Japanese ideophones (e.g. perapera, which means “fluently” when talking about somebody’s language skills; I certainly wish my Japanese was a lot more perapera) and compared them with regular Japanese words (e.g. ryuuchou-ni, which also means “fluently” when talking about somebody’s language skills, but isn’t an ideophone). My Japanese participants read sentences which were the same apart from swapping the ideophones and the arbitrary words around, like:

花子は ぺらぺらと フランス語を話す
Hanako speaks French fluently (where “fluently” = perapera).

花子は りゅうちょうに フランス語を話す
Hanako speaks French fluently (where “fluently” = ryuuchou-ni).

While they read these sentences, I used EEG (or electroencephalography) to measure their brain activity. This is done by putting a load of electrodes in a swimming cap like this:

electrode set up

After measuring a lot of participants reading a lot of sentences in the two conditions, I averaged them together to see if there was a difference between the two conditions… and indeed there was:

figure 1 from japanese natives paper

The red line shows the brain activity in response to the ideophones, and the blue line shows the brain activity in response to the arbitrary words. The red line is higher than the blue line at two important points: the peak at about 250ms after the word was presented (the P2 component), and the consistent bit for the last 400ms (the late positive complex).
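One common way to put a number on "the red line is higher than the blue line" is the mean amplitude within a time window. Here's a minimal Python sketch of that idea. The voltages are made up, the 200–300ms window is an assumption for illustration rather than the window used in the paper, and `mean_amplitude` is a hypothetical helper name.

```python
from statistics import mean


def mean_amplitude(times_ms, voltages, start_ms, end_ms):
    """Mean voltage over all samples whose time falls inside [start_ms, end_ms]."""
    window = [v for t, v in zip(times_ms, voltages) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the requested window")
    return mean(window)


# Made-up averaged waveforms in microvolts, one value per listed time point.
times_ms = [0, 100, 200, 250, 300, 400]
ideophone = [0.0, 1.0, 3.0, 5.0, 3.0, 2.0]
arbitrary = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]

# Assumed window around the peak at roughly 250ms.
p2_ideophone = mean_amplitude(times_ms, ideophone, 200, 300)
p2_arbitrary = mean_amplitude(times_ms, arbitrary, 200, 300)
p2_difference = p2_ideophone - p2_arbitrary
```

With real data you would compute this per participant and per condition, and then run the statistics on those numbers.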

Various other research has found that a higher P2 component is elicited by cross-modally congruent stimuli… i.e. this particular brain response is bigger to two things that match nicely (such as a high pitched sound and a small object). Finding this in response to the Japanese ideophones suggests that the brain recognises that the sounds of the ideophones cross-modally match the meanings of the ideophones much more than the sounds of the arbitrary words match the meanings of the arbitrary words. This may be why ideophones are experienced more vividly than arbitrary words.

higher P2 for ideophones

lower P2 for arbitrary words

As for the late positive complex, it’s hard to say. It could be that the cross-modal matching of sound and meaning in ideophones actually makes it harder for the brain to work out the ideophone’s role in a sentence because it has to do all the cross-modal sensory processing on top of all the grammatical stuff it’s doing in the first place. It’s very much up for discussion.


Putting the graph into electroencephalography

ERPists – click to jump to the main point of this blog, which is about plotting measures of confidence and variance. Or just read on from the start, because there’s a lot of highly proficient MS Paint figures in here.

UPDATE! The paper where this dataset comes from is now published in Collabra. You can read the paper here and download all the raw data and analysis scripts here.

ERP graphs are often subjected to daft plotting practices that make them highly frustrating to look at.

ERPing the derp

Negative voltage is often (but not always) plotted upwards, which is counterintuitive but generally justified with “oh but that’s how we’ve always done it”. Axes are rarely labelled, apart from a small key tucked away somewhere in the corner of the graph which still doesn’t give you precise temporal accuracy (which is kind of the point of using EEG in the first place). And finally, these graphs are often generated using ERP programmes then saved as particular file extensions, which then get cramped up or kind of blurry when resized to fit journals’ image criteria. This means that a typical ERP graph looks something a little like this:

typical erp graph

…and the graph is supposed to be interpreted something a little like this:

erp intuitive 2

…although realistically, reading a typical ERP graph is a bit more like this:

erp context

Some of these problems are to do with standard practices; others, due to lack of expertise in generating graphics; and more still are due to journal requirements, which generally specify that graphics must conform to a size which is too small to allow for proper visual inspection of somebody’s data, and also charge approximately four million dollars for the privilege of having these little graphs in colour because of printing costs despite the fact that nobody really reads actual print journals anymore.

Anyway. Many researchers grumble about these pitfalls, but accept that it comes with the territory.

However, one thing I’ve rarely heard discussed, and even more rarely seen plotted, is the representation of different statistical information in ERP graphs.

ERP graphs show the mean voltage across participants on the y-axis at each time point represented on the x-axis (although because of sampling rates, it generally isn’t a different mean voltage for each millisecond; it’s more often a mean voltage for every two milliseconds). Taking the mean readings across trials and across participants is exactly what ERPs are for – they average out the many, many random or irrelevant fluctuations in the EEG data to generate a relatively consistent measure of a brain response to a particular stimulus.
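To make the averaging concrete, here's a minimal Python sketch using only the standard library. It assumes a 500 Hz sampling rate (which is what gives one sample every two milliseconds) and uses tiny made-up numbers. The function names are illustrative, and this is not the code behind any of the plots here.

```python
from statistics import mean

SAMPLING_RATE_HZ = 500  # assumed; 500 samples per second means one sample every 2 ms


def sample_times_ms(n_samples, sampling_rate_hz=SAMPLING_RATE_HZ, start_ms=0.0):
    """Time (in ms, relative to stimulus onset) of each sample."""
    step_ms = 1000.0 / sampling_rate_hz
    return [start_ms + i * step_ms for i in range(n_samples)]


def average_waveforms(waveforms):
    """Point-by-point mean of equally long waveforms (lists of microvolts)."""
    return [mean(samples) for samples in zip(*waveforms)]


def grand_average(trials_by_participant):
    """Average trials within each participant, then average across participants."""
    participant_erps = [average_waveforms(trials) for trials in trials_by_participant]
    return average_waveforms(participant_erps)


# Toy data: 2 participants x 2 trials x 4 samples (made-up values).
toy_trials = [
    [[0.0, 1.0, 2.0, 1.0], [0.0, 3.0, 4.0, 1.0]],
    [[0.0, 0.0, 2.0, 3.0], [0.0, 2.0, 0.0, 1.0]],
]
times = sample_times_ms(4)
erp = grand_average(toy_trials)
```

Averaging within each participant first means every participant contributes equally to the grand average, even if some of them lost more trials to artefacts than others.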

Decades of research have shown that many of these ERPs are reliably generated, so if you get a group of people to read two sentences – one where the sentence makes perfect sense, like “the researcher wrote the blog”, and one where the final word is replaced with something that’s kind of weird, like “the researcher wrote the bicycle” – you can bet that there will be a bigger (i.e. more negative) N400 after the kind of weird final words than the ones that make sense. The N400 is named like that because it’s a negative-going wave that normally peaks at around 400ms.

Well, that is, it’ll look like that when you average across the group. You’ll get a nice clean average ERP showing quite clearly what the effect is (I’ve plotted it with positive-up axes, with time points labelled in 100ms intervals, and with two different colours to show the conditions):

standard N400

But, the strength of the ERP – that it averages out noisy data – is also its major weakness. As Steve Levinson points out in a provocative and entertaining jibe at the cognitive sciences, individual variation is huge, both between different groups across the world and between the thirty or so undergraduates who are doing ERP studies for course credit or beer money. The original sin of the cognitive sciences is to deny the variation and diversity in human cognition in an attempt to find the universal human cognitive capabilities. This means that averaging across participants in ERP studies and plotting that average can give quite a misleading picture of what’s actually going on… even if the group average is totally predictable. To test this out, I had a look at the ERP plot of a study that I’m writing up now (and to generate my plots, I use R and the ggplot2 package, both of which are brilliant). When I average across all 29 participants and plot the readings from the electrode right in the middle of the top of the head, it looks like this:

Cz electrode (RG onset timelock + NO GUIDE)

There’s a fairly clear effect of the green condition; there’s a P3 followed by a late positivity. This comes out as hugely statistically significant using ANOVAs (the traditional tool of the ERPist) and cluster-based permutation tests in the FieldTrip toolbox (which is also brilliant).
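The cluster-based procedure in FieldTrip is too involved to reproduce here, but the basic permutation logic can be sketched for the much simpler case of one number per participant (say, the green-minus-orange difference in a single time window). The Python below is a plain sign-flip permutation test, not the FieldTrip method. The data are made up and `sign_flip_p_value` is a hypothetical name.

```python
import random
from statistics import mean


def sign_flip_p_value(differences, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the mean of paired differences.

    If the conditions did not differ, each participant's difference would be
    as likely to be positive as negative, so we randomly flip the signs and
    count how often the shuffled mean is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(differences))
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        flipped = [d * rng.choice((-1, 1)) for d in differences]
        if abs(mean(flipped)) >= observed:
            at_least_as_extreme += 1
    # Adding 1 to both counts keeps the estimated p-value above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up condition differences (microvolts) for ten participants.
consistent_effect = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0, 1.4, 0.6]
p_consistent = sign_flip_p_value(consistent_effect)

# Differences that cancel out exactly: nothing to detect.
p_cancelled = sign_flip_p_value([1.0, -1.0, 1.0, -1.0])
```

The cluster-based version applies the same shuffling idea across many time points and electrodes at once, which is how it deals with the multiple comparisons problem.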

But. What’s it like for individual participants? Below, I’ve plotted some of the participants where no trials were lost to artefacts, meaning that the ERPs for each participant are clearer since they’ve been averaged over all the experimental trials.

Here’s participant 9:

ppt09 Cz electrode (RG onset timelock + NO GUIDE)

Participant 9 reflects the group average quite well. The green line is much higher than the orange line, peaking at about 300ms, and then the green line is also more positive than the orange line for the last few hundred milliseconds. This is nice.

Here’s participant 13:

ppt13 Cz electrode (RG onset timelock + NO GUIDE)

Participant 13 is not reflective of the group average. There’s no P3 effect, and the late positivity effect is actually reversed between conditions. There might even be a P2 effect in the orange condition. Oh dear. I wonder if this individual variation will get lost in the averaging process?

Here’s participant 15:

ppt15 Cz electrode (RG onset timelock + NO GUIDE)

Participant 15 shows the P3 effect, albeit about 100ms later than participant 9 does, but there isn’t really a late positivity here. Swings and roundabouts, innit.

However, despite this variation, if I average the three of them together, I get a waveform that is relatively close to the group average:

ppt9-13-15 Cz electrode (RG onset timelock + NO GUIDE)

The P3 effect is fairly clear, although the late positivity isn’t… but then again, it’s only from three participants, and EEG studies should generally use at least 20-25 participants. It would also be ideal if participants could do hundreds or thousands of trials so that the ERPs for each participant are much more reliable, but this experiment took an hour and a half as it is; nobody wants to sit in a chair strapped into a swimming cap full of electrodes for a whole day.

So, on the one hand, this shows that ERPs from a tenth of the sample size can actually be quite reflective of the group average ERPs… but on the other hand, this shows that even ERPs averaged over only three participants can still obscure the highly divergent readings of one of them.
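The same point in numbers, as a small Python sketch. The amplitudes below are invented to mimic the pattern described above (one clear effect, one reversed, one weak) and are not read off the real data.

```python
from statistics import mean

# Hypothetical mean amplitudes (microvolts) in a late time window.
late_window = {
    "ppt09": {"green": 4.0, "orange": 1.0},  # clear effect
    "ppt13": {"green": 0.5, "orange": 2.0},  # reversed effect
    "ppt15": {"green": 3.0, "orange": 2.5},  # weak effect
}


def condition_difference(amplitudes):
    """Green minus orange; positive means green is more positive."""
    return amplitudes["green"] - amplitudes["orange"]


differences = {ppt: condition_difference(amps) for ppt, amps in late_window.items()}
group_difference = mean(list(differences.values()))
reversed_participants = [ppt for ppt, diff in differences.items() if diff < 0]
```

The group difference comes out positive even though one of the three participants goes the other way, and the mean on its own gives no hint of that.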

Now, if only there were a way of calculating an average, knowing how accurate that average is, and also knowing what the variation in the sample is like…

…which, finally, brings me onto the main point of this blog:

Why do we only plot the mean across all participants when we could also include measures of confidence and variance?

In behavioural data, it’s relatively common to plot line graphs where the line is the mean across participants, while there’s also a shaded area around the line which typically shows 95% confidence intervals. Graphs with confidence intervals look a bit like this (although normally a bit less like an earthworm with a go-faster stripe on it):

graph with CIs

This is pretty useful in visualising data. It’s taking a statistical measure of how reliable the measurement is, and plotting it in a way that’s easy to see.

So. Why aren’t ERPs plotted with confidence intervals? The obvious stumbling point is the ridiculous requirements of journals (see above), which would make the shading quite hard to do. But, if we all realised that everything happens on the internet now, where colour printing isn’t a thing, then we could plot and publish ERPs that look like this:

Cz electrode (RG onset timelock + 95pc CIs + NO GUIDE)

It’s nice, isn’t it? It also makes it fairly clear where the main effects are; not only do the lines diverge, the shaded areas do too. This might even go some way towards addressing Steve Levinson’s valid concerns about cognitive science data ignoring individual data… although only within one population. My data was acquired from 18-30 year old Dutch university students, and cannot be generalised to, say, 75 year old illiterate Hindi speakers with any degree of certainty, let alone 95%.
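For anyone wanting to try this, here's a minimal Python sketch of how a 95% confidence band can be computed at each time point from per-participant ERPs. It uses a normal approximation for the critical value (about 1.96), which is slightly narrower than a t-based interval when the sample is small. That choice, the toy numbers, and the name `confidence_band` are all assumptions for illustration.

```python
from statistics import NormalDist, mean, stdev


def confidence_band(participant_erps, level=0.95):
    """Return (lower, mean, upper) at each time point across participants.

    participant_erps: one equally long list of voltages per participant.
    Uses a normal approximation: mean +/- z * standard error of the mean.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n = len(participant_erps)
    band = []
    for samples in zip(*participant_erps):
        centre = mean(samples)
        half_width = z * stdev(samples) / n ** 0.5
        band.append((centre - half_width, centre, centre + half_width))
    return band


# Toy data: 3 participants x 2 time points (made-up microvolts).
toy_erps = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
band = confidence_band(toy_erps)
```

The lower and upper values are what you would hand to a ribbon or shaded-area plotting function, with the mean drawn as the line on top.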

This isn’t really measuring the variance within a sample, though. How can we plot an ERP graph which gives some indication of how participant 13 had a completely different response from participants 9 and 15? Well, we could try plotting it with the shaded areas showing one standard deviation either side of the mean instead. It looks like this:

Cz electrode (RG onset timelock + SDs + NO GUIDE)

…which, let’s face it, is pretty gross. The colours overlap a lot, and it’s just kind of messy. But, it’s still informative; it indicates a fair chunk of the variation within my 29 participants, and it’s still fairly clear where the main effects are.
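Part of the reason the standard deviation version looks so much messier is just arithmetic: a standard deviation band does not shrink as you add participants, whereas a confidence interval does. A rough Python sketch, assuming a normal-approximation 95% interval and toy numbers:

```python
from statistics import NormalDist, mean, stdev


def sd_band(participant_erps):
    """Return (mean - SD, mean, mean + SD) at each time point across participants."""
    band = []
    for samples in zip(*participant_erps):
        centre = mean(samples)
        spread = stdev(samples)
        band.append((centre - spread, centre, centre + spread))
    return band


# Toy data: 3 participants x 2 time points (made-up microvolts).
toy_erps = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
band = sd_band(toy_erps)

# How much wider is +/- 1 SD than a 95% interval (z * SD / sqrt(n)) when n = 29?
n_participants = 29
width_ratio = n_participants ** 0.5 / NormalDist().inv_cdf(0.975)
```

With 29 participants, the band of one standard deviation either side of the mean is roughly 2.7 times as wide as the 95% confidence band, so a lot more overlap between conditions is to be expected.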

Is this a valid way of showing ERP data? I quite like it, but I’m not sure if other ERP researchers would find this useful (or indeed sensible). I’m also not sure if I’ve missed something obvious about this which makes it impractical or incorrect. It could well be that the amplitudes at each time point aren’t normally distributed, which would require some more advanced approaches to showing confidence intervals, but it’s something to go on at least.
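One option that does not lean on normality is a percentile bootstrap: resample the participants with replacement many times, recompute the mean each time, and take the middle 95% of those means. A minimal Python sketch for a single time point follows. This is one possible approach rather than something established in this post, and the data and the name `bootstrap_ci` are made up.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample participants with replacement and record each resampled mean.
    resampled_means = sorted(
        mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = round((1 - level) / 2 * n_resamples)
    upper_index = round((1 + level) / 2 * n_resamples) - 1
    return resampled_means[lower_index], resampled_means[upper_index]


# Made-up amplitudes at one time point, with one outlying participant.
amplitudes = [1.0, 1.2, 0.8, 1.1, 0.9, 6.0]
lower, upper = bootstrap_ci(amplitudes)
```

To get a band for a whole waveform, the same function can be applied to each time point in turn, e.g. `[bootstrap_ci(list(samples)) for samples in zip(*participant_erps)]`, where `participant_erps` holds one list of voltages per participant.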

I’d love to hear people’s opinions in the comments below.

To summarise, then:

– ERP graphs aren’t all that great

– but they could be if we plotted them logically

– and they could be really great if we plotted more than just the sample mean


Papers of the Year: 2014

I’m not really one for new year’s resolutions, but they are a useful crutch for getting things done sometimes. And so, 2015 will herald the dawn of a brand new academic blog, packed full of information and insights from the business end of sound-symbolism and synaesthesia research, along with a sprinkling of observations and anecdotes about life in early academia in general.

December, though, is a great time to start. What better way to begin a new blog than tapping into the Buzzfeed zeitgeist and having a listicle with gifs? Without further ado, I hereby present the moderately prestigious, barely anticipated, inaugural annual Papers of the Year awards listicle. In no particular order, here are the five most interesting and/or important papers I’ve read this year.

1. Behme (2014). “A ‘Galilean’ Science of Language.” Journal of Linguistics 50, no. 03: 671–704. doi:10.1017/S0022226714000061.

(.pdf here)


Far more august minds than mine have spilled a lot of virtual ink over Behme’s book review … well, I say book review, but it’s more like a brief section on Chomsky’s book The Science of Language which is then used as a launchpad to critically assess Chomsky’s entire scholarship. From the strictly academic side of things, I’d say that the majority of the criticism is justified, although I’m not sure I agree with Behme’s rather absolutist stance that ignoring or discarding any single piece of evidence that conflicts with your theory is absolutely reprehensible and invalidates your entire research programme. To do so on a massive scale is of course problematic, but I think there is a little more leeway in linguistics than Behme makes out. This is also a really interesting paper because of the reactions it inspires. We had a journal club session in the Neurobiology of Language department at MPI about this paper, and it was fascinating to see people’s opinions about the tone and style. Some (myself included) believe that reviews like this are perfectly fine if the author accepts that they have to stand behind their rather direct points of view; others feel that the tone was aggressive and that there’s no place in science for this kind of attack. Either way, it’s beautifully written and addresses some hugely important and uncomfortable truths about the science of language and The Science of Language.

2. Revill, Namy, DeFife, and Nygaard (2014). “Cross-Linguistic Sound Symbolism and Crossmodal Correspondence: Evidence from fMRI and DTI.” Brain and Language 128, no. 1: 18–24. doi:10.1016/j.bandl.2013.11.002.

(no free .pdf available)

excited duck

I’ve been reading and re-reading this paper quite a lot this year. It’s an fMRI study on sound-symbolism which finds increased activation for sound-symbolic words in the left superior parietal cortex, which the authors take to mean the engagement of cross-modal sensory integration networks. That is to say, it seems that monolingual native English speakers are able to integrate sound and sensory meaning when the sound of the word naturally fits the meaning. My experiments use a similar approach with EEG, so it was very exciting to read a paper which independently expressed the same kind of ideas using a different imaging technique. Sadly, the wider behavioural experiment which they used to test the stimuli hasn’t been published yet – I’m interested to see the variation in the words they used, as some words were from languages without much sound-symbolism (Dutch, for example), while other words were from languages with lots of ideophones (e.g. Yoruba). I’m looking forward to reading about that in more detail.

3. Skipper (2014). “Echoes of the Spoken Past: How Auditory Cortex Hears Context during Speech Perception.” Philosophical Transactions of the Royal Society B: Biological Sciences 369, no. 1651: 20130297. doi:10.1098/rstb.2013.0297.

(open access paper available here)

husky hearing questioning

This paper addresses context beyond language and asks why neuroimaging meta-analyses show that the auditory cortex is less active (and sometimes deactivated) when people listen to meaningful speech compared to less meaningful sounds. Skipper’s model suggests that the auditory cortex doesn’t “listen” to speech, but instead matches the input to predictions made from context; the closer the prediction matches the input, the less error checking there is, and consequently the less activation of the auditory cortex there is. The role of the auditory cortex, therefore, is to confirm or deny internal predictions about the identity of sounds. When predictions originating from PVF-SP (posterior ventral frontal regions for speech perception) are accurate, no error signal is generated in the auditory cortex and so less processing is required. More accurate predictions could be generated from verbal and non-verbal context (indeed, Skipper argues that verbal and non-verbal is a false distinction), resulting in less error signal, and therefore less metabolic expenditure (suggesting a metabolic conservation basis for the existence of the predictive model).

It’s interesting, and definitely plausible, but I think he goes too far. He throws the baby out with the bathwater when arguing against the necessity of traditional linguistic units; just because context (rather than specifically phonemes, syllables, etc.) seems to be the basis for predictions and error checking, that doesn’t mean that well-attested traditional linguistic units aren’t important or aren’t there. Indeed, if they’re not important, why are they there, and why are they so consistently distinctive?

Linguistic reservations aside, this is one of the most interesting ideas I’ve read this year.

4. Perniss and Vigliocco (2014). “The Bridge of Iconicity: From a World of Experience to the Experience of Language.” Philosophical Transactions of the Royal Society B: Biological Sciences 369, no. 1651: 20130300. doi:10.1098/rstb.2013.0300.

(open access paper available here)

Another paper from the special edition of Phil.Trans.Royal Society B on language as a multimodal phenomenon. I like how the three functions of iconicity are made clear here: displacement, referentiality, and embodiment. I also like how an attempt is made at categorising and more precisely defining iconicity, as pinning it down precisely has been quite tricky and different researchers use different terms in different ways. Their definition of iconicity has undergone a (welcome) narrowing compared to their definition in Perniss et al. (2010); they now equate it directly to sound-symbolism (which I’m not sure I fully agree with), and define it as “putatively universal as well as language-specific mappings between given sounds and properties of referents”. This version of iconicity does not include systematicity, or any “non-arbitrary mappings achieved simply through regularity or systematicity of mappings between phonology and meaning”. I’m neutral on this. Certainly, statistical sound-symbolism is different from sensory sound-symbolism, but where do we draw the line between conventionalised language-specific sound-symbolism and statistical sound-symbolism? How is it possible to differentiate them, given that language-specific sound-symbolism will also be statistically overrepresented with certain concepts? Moreover, what are phonaesthemes now? Can you distinguish between statistical phonaesthemes and sensory phonaesthemes which are also very common? This paper goes further than most in terms of categorising and defining the casserole of concepts related to iconicity and it defines the state and purpose of iconicity very well.

5. Shin and Kim (2014). “Both ‘나’ and ‘な’ Are Yellow: Cross-Linguistic Investigation in Search of the Determinants of Synesthetic Color.” Neuropsychologia. doi:10.1016/j.neuropsychologia.2014.09.032.

(no free .pdf available)

adventure time nice fist bump

This is a study of four trilingual Korean-Japanese-English speakers who also have grapheme-colour synaesthesia (which wins the award of “most niche participant group of 2014” for me). They found that all four of them had broadly similar colours for the same characters across languages, and that the effect was driven more strongly by sound than by the visual features of the characters. This means that grapheme-colour synaesthesia seems to be driven by the sounds of the graphemes more than their shapes. This is rather an exciting find, because it hints that a phenomenon previously considered non-linguistic may well be rooted in language, and this may have interesting implications for the processing of cross-modal correspondences in language in non-synaesthetes too.