April 20, 2008
How to Measure Whether One Language is More Efficient Than Another
Mark Liberman has a couple of fascinating recent posts over at Language Log, one on comparing the vocabulary of different languages and one on comparing their efficiency:
Alex Baumans described a bilingual magazine's problems in equalizing space and word-count allocations between Dutch and French...Alex's discussion of Dutch compounds underlines a point that I made in the earlier post, namely that spaces are not a very helpful way to define the boundaries of words, especially in comparisons across languages. But what I'd like to follow up on today is his observation about comparisons of word and character counts.
As discussed in a post a few years ago ("One world, how many bytes?", 8/5/2005), based on a variety of large collections of English-Chinese parallel texts, English texts are larger than their Chinese counterparts by a factor of between 1.37 and 2.27 before compression, or 1.19 to 1.41 after compression.
My impression is that there are several different factors at work here -- but they don't seem to me to account fully for the differences in length, especially in comparing compressed texts.
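For anyone who wants to try the before-and-after-compression comparison on their own parallel texts, here is a minimal sketch in Python. It is not Liberman's procedure: the function names are made up for illustration, zlib stands in for whatever compressor he used, and UTF-8 is an assumed encoding (it spends three bytes on most Chinese characters, so the uncompressed ratio depends heavily on that choice).

```python
import zlib


def byte_sizes(text, encoding="utf-8"):
    """Return (raw_bytes, compressed_bytes) for one text.

    Assumption: zlib at its highest level is only a stand-in compressor;
    a different compressor will give different numbers.
    """
    raw = text.encode(encoding)
    return len(raw), len(zlib.compress(raw, 9))


def size_ratios(text_a, text_b, encoding="utf-8"):
    """Return (raw_ratio, compressed_ratio) of text_a relative to text_b.

    Both texts should be translations of the same content; a ratio above 1
    means text_a takes more bytes than text_b.
    """
    raw_a, comp_a = byte_sizes(text_a, encoding)
    raw_b, comp_b = byte_sizes(text_b, encoding)
    return raw_a / raw_b, comp_a / comp_b


# Hypothetical usage, with file names invented for illustration:
#   english = open("parallel.en.txt", encoding="utf-8").read()
#   chinese = open("parallel.zh.txt", encoding="utf-8").read()
#   raw_ratio, compressed_ratio = size_ratios(english, chinese)
```

The texts need to be long for the compressed ratio to mean much, since on a few sentences the compressor's fixed overhead dominates the result.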
Posted by Robin Varghese at 09:59 AM | Permalink





Comments
And an even more recent one: http://languagelog.ldc.upenn.edu/nll/?p=22
Posted by: anon | Apr 20, 2008 1:19:17 PM
Considering this as a matter of efficiency implies that "information" can be separated from the media it is transmitted through. This may be true on a superficial level, as in translating articles on HVAC repair from one language to another, but we should be careful not to infer a pure, uncorrupted semantic essence (like the LOT) that may not actually exist. Not to get all Marshall McLuhan, but the so-called inefficiency is part of the message.
Posted by: Chris Schoen | Apr 21, 2008 3:05:30 PM