I'm doing a corpus analysis to look at how word repetition is influenced by the phrase it appears in. I got some preliminary results with a small number of items but I really need to scale things up. I'm thinking of switching my preprocessing code to Python because R is just too goddamn slow for this sort of thing. I've also been itching to try Julia and this project might be a good excuse to do that. Has anyone here used Julia before?
Can you expand on what you're actually trying to do? I'm having trouble parsing your first sentence. As far as switching to Python goes, one of my old professors who works on big data stuff really liked it, so there's at least that endorsement. Unfortunately I can't give more than that; I like my languages statically typed.
Yeah, that was not nearly enough detail! I do research in psycholinguistics, and I'm currently designing an experiment that tests how we use mental representations of phrases during language processing and production. In this case, I've hypothesized that words which appear in highly frequent phrases undergo less "lexical priming": activating a strong multiword representation reduces the amount of activation for any one of the individual words within that phrase. (This assumes that "representation strength" scales in a nice way with expression frequency, which we have some evidence for.) This is based on some provocative data from a previous experiment, but that experiment wasn't explicitly designed to test this hypothesis, so I'm designing another one! As a first step, I'm analyzing naturally occurring speech in a corpus of telephone conversations to see whether my hypothesis is supported in natural data. If it's promising, I'll follow up with a tightly controlled experiment in the lab. Unfortunately, doing fancy calculations on hundreds of thousands of rows is a huge pain in the ass in the language I'm most comfortable with, R. R is great for statistical analysis but is painfully slow otherwise...
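For the scaling problem, a minimal Python sketch of the preprocessing step might look like the following. It assumes each conversation turn is already tokenized into a list of lowercase words, and it approximates a "phrase" as an adjacent word pair (bigram). The function names `bigram_counts` and `annotate_tokens` and the toy data are hypothetical, not part of any existing pipeline. `collections.Counter` does the counting in a single pass, so the work grows roughly linearly with corpus size.

```python
from collections import Counter
from typing import Iterable


def bigram_counts(utterances: Iterable[list[str]]) -> Counter:
    """Count adjacent word pairs within each utterance.

    Hypothetical helper: pairs are never counted across utterance boundaries.
    """
    counts = Counter()
    for tokens in utterances:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def annotate_tokens(utterances: Iterable[list[str]], counts: Counter) -> list[tuple[str, int]]:
    """Pair each token with the corpus frequency of the bigram it starts.

    Hypothetical helper: the last word of an utterance starts no bigram, so it gets 0.
    """
    rows = []
    for tokens in utterances:
        for i, word in enumerate(tokens):
            if i + 1 < len(tokens):
                rows.append((word, counts[(word, tokens[i + 1])]))
            else:
                rows.append((word, 0))
    return rows
```

For example, with the toy turns `[["good", "job", "man"], ["good", "job"], ["parmesan", "cheese"]]`, the pair ("good", "job") gets a count of 2, so both tokens of "good" are annotated with 2. Whether to count pairs across turn boundaries, and whether a bigram is the right unit for a "phrase", are design choices that would need to be checked against the actual corpus.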
Idioms are difficult to use for studying this precisely because they have multiple possible meanings. I'm talking about phrases that still appear to be fully compositional but are very frequent, with words that co-occur more often than expected by chance; some examples are "parmesan cheese", "academic achievement", "good job", etc.
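One common way to quantify "co-occur more often than expected by chance" is pointwise mutual information (PMI): log2 of P(w1, w2) / (P(w1) P(w2)), which is positive when a pair appears together more often than independence would predict. The thread doesn't name a specific association measure, so PMI here is an assumed choice, and the function name `bigram_pmi` and the toy corpus are hypothetical. This sketch estimates the joint probability from bigram counts and the single-word probabilities from unigram counts, using only the standard library.

```python
import math
from collections import Counter


def bigram_pmi(utterances: list[list[str]]) -> dict[tuple[str, str], float]:
    """Pointwise mutual information (log base 2) for every observed adjacent bigram.

    Hypothetical helper: P(w1, w2) comes from bigram counts, P(w) from unigram counts.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in utterances:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_joint = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # Positive values mean the pair co-occurs more often than chance would predict.
        pmi[(w1, w2)] = math.log2(p_joint / (p_w1 * p_w2))
    return pmi
```

On the toy corpus `[["parmesan", "cheese"], ["the", "cheese"], ["the", "job"], ["good", "job"]]`, ("parmesan", "cheese") scores 3.0 while ("the", "cheese") scores 2.0, because "the" is spread across more contexts. PMI is known to inflate scores for rare pairs, so a minimum-frequency cutoff is usually applied before ranking phrases.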