July 22, 2008
Wordle
If you give Wordle some text, it generates a "word cloud" based on word frequencies in the text, which can be quite nice looking. I entered the entire text of Hamlet and got this:
Try it. [Via Carl Zimmer.]
Posted by Abbas Raza at 09:11 AM
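For readers curious about the mechanics, here is a minimal sketch of the word-frequency idea described in the post. It is not Wordle's actual code: the function names, the tokenizing regex, and the linear count-to-font-size mapping are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word occurs, keeping case as written."""
    words = re.findall(r"[A-Za-z']+", text)
    return Counter(words)

def font_sizes(freqs, min_size=10, max_size=72):
    """Scale counts linearly into font sizes (an assumed mapping, not Wordle's)."""
    top = max(freqs.values())
    return {w: min_size + (max_size - min_size) * c / top for w, c in freqs.items()}

sample = "To be, or not to be, that is the question"
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)

# Print the words from most to least frequent with their sketch font size.
for word, count in freqs.most_common():
    print(f"{word:10s} count={count} size={sizes[word]:.0f}")
```

Because case is kept as written here, "To" and "to" are counted as two different words, and any words with equal counts come out the same size.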










Comments
It would be interesting to explore how difficult it would be to discover a partial inverse. That is, given a wordle, what is the gist of the original piece?
Posted by: J.A. Paulos | Jul 22, 2008 1:09:40 PM
Well, judging from this wordle, Hamlet has something to do with Hamlet.
(In all seriousness, J.A. Paulos' idea is quite neat.)
Posted by: April | Jul 22, 2008 1:21:31 PM
Also, the program appears to be caps-sensitive. Interesting.
Posted by: April | Jul 22, 2008 1:23:30 PM
The best 3QD post ever. OK, I deal with word frequencies on a daily basis.
Posted by: Chandan Narayan | Jul 22, 2008 5:40:48 PM
This is so cool...but what if it should fall into the wrong hands and just anybody...and I really mean anybody...starts making concrete poetry and text deconstructions that look like groovy, Sixties, Saul Bass titles...what kind of world will that create? I guess we'll know by this time tomorrow.
P.S. I've already input a John Donne poem and an H.P. Lovecraft story... this is so much fun. Hmmm... where can I get the entire text of "Last Exit to Brooklyn"?
Posted by: Pete Chapman | Jul 22, 2008 7:59:18 PM
It can be fun to play with Google's translator. Here are some lines from The Love Song of J. Alfred Prufrock. I then had them translated into Chinese and back into English:
LET us go then, you and I,
When the evening is spread out against the sky
Like a patient etherised upon a table;
Let us go, through certain half-deserted streets,
The muttering retreats 5
Of restless nights in one-night cheap hotels
And sawdust restaurants with oyster-shells:
Streets that follow like a tedious argument
Of insidious intent
To lead you to an overwhelming question … 10
Oh, do not ask, “What is it?”
Let us go and make our visit.
After Chinese translation:
Let's go, and then, you and I,
When scattered on the evening sky
Like a patient etherised, a table;
Let's go, through certain half of the abandoned streets,
The muttering retreats 5
Disturbed nights, one night in a cheap hotel
And sawdust restaurants with oyster-shell:
Follow-up streets like a lengthy argument
Sinister intention
Lead you to an overwhelming problem… 10
Oh, do not ask, "What is this" »
Let's go and make our visit.
Posted by: Jared | Jul 22, 2008 9:48:50 PM
I'll go out on a limb and say that finding any meaningful inverse of these things is going to be impossible. The results of a simple word frequency count are simply too skewed in favor of stuff like names and extremely common words (articles, prepositions, pronouns, etc.) and other semantic detritus that does nothing in and of itself to establish narrative context.
Now, you might be able to pull something together if you had a near-complete frequency list to work with, but at that point you pretty much have the complete text anyway, just scrambled up.
Posted by: Chris | Jul 22, 2008 11:25:35 PM
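A small sketch of why a frequency count alone cannot be inverted: two texts with opposite meanings can produce exactly the same counts. The function name and the example sentences are hypothetical, chosen only to illustrate the point made above.

```python
from collections import Counter

def frequency_list(text):
    """Reduce a text to its word counts, discarding word order entirely."""
    return Counter(text.lower().split())

a = frequency_list("dog bites man")
b = frequency_list("man bites dog")

# Identical counts, opposite stories: the order is what carried the meaning.
same = a == b
print(same)
```

Anything built only from these counts, a word cloud included, would look the same for both sentences.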
I couldn't resist, and keyed in 10 words that were meaningful to me, nouns all. I think I must have been hoping wordle had a word association program -- if so I was disappointed, because I got back those same 10 words arranged into a jazzy graphic, with each word the same size. So syntax must be important in cuing it, and repeated appearances of any word.
Posted by: Elatia Harris | Jul 22, 2008 11:39:16 PM
Personally, I think that instead of an abstract in a scientific article, we should just put the Wordle of the text at the start. It would be more fun. At least, I'm going to put one on my next poster at a conference. (If I can remember to do so, about 9 months from now.)
Posted by: Robert the Red | Jul 23, 2008 6:34:15 AM
I just discovered this today at Edge of the American West. Very addictive.
Thought you might like some Obama wordles...
http://www.flickr.com/photos/25652913@N03/tags/obamawordles/
Posted by: josh w | Jul 23, 2008 6:56:52 AM
Jared;
I pretty much agree with you.
April;
In the program's settings you can make it ignore the upper or lower case or deal with case as written.
Elatia;
Syntax isn't even in the running. It's all about word frequency, which is why the Hamlet example works as well as it does. Other plays by Shakespeare look very similar: character names sized by number of speeches, with lots of "thy" and "thou". That's funny, because the program has a default in the Layout menu to ignore "common English words" (if it didn't, every word cloud would be dominated by articles and pronouns), and for Shakespeare "thy" and "thou" were common. In the Donne poem that I used, "one" was the dominant word, but he was using it as a pronoun as often as he meant it as a number, which was common in the English of his time. There is an editing feature which allows you to tweak the results by omitting certain words.
So maybe poems and fiction aren't the most interesting things to feed the program. Pop songs work very well, especially if the title is in the chorus. The result is pretty much the gist of the piece. Political speeches written after the advent of television would give you a goofy readout of the speaker's obsessions and whatever point the speaker is trying to drive home. So in a sense you do get a very crude form of word association if you think of the speaker as a repetitious neurotic.
If I understand the partial inverse idea at all, to make the program really interesting you'd also have to look for the unique words, the ones of low frequency but high impact. Enter "Richard III" and be reasonably sure that his "horse" in "a horse, a horse, my kingdom for a horse" really does show up and doesn't get shunted aside by a sea of thy, thou and methinks. And for the reasons Jared stated, that just ain't gonna happen in this version of Wordle; you'd need genuine Artificial Intelligence to pull that off. And why do I think the first A.I. will be either a doctor or a fighter pilot and not an English professor?
At present Wordle is a very amusing bean counter. Fun to play with on a mindless pop ditty or "a tedious argument of insidious intent".
Posted by: Pete Chapman | Jul 23, 2008 7:06:26 AM
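The settings mentioned in the comment above (case handling, ignoring common English words, omitting chosen words) can be sketched roughly as follows. This is an illustration under stated assumptions, not Wordle's implementation: `cloud_counts`, its parameters, and the tiny `COMMON_WORDS` list are all hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; Wordle's real list is not given in this thread.
COMMON_WORDS = {"a", "an", "the", "of", "for", "my", "and", "to", "in"}

def cloud_counts(text, fold_case=True, drop_common=True, omit=()):
    """Count words, optionally folding case, dropping common words, and omitting chosen ones."""
    words = re.findall(r"[A-Za-z']+", text)
    if fold_case:
        words = [w.lower() for w in words]
    omitted = {w.lower() for w in omit}
    kept = [
        w for w in words
        if not (drop_common and w.lower() in COMMON_WORDS)
        and w.lower() not in omitted
    ]
    return Counter(kept)

line = "A horse, a horse, my kingdom for a horse"
counts = cloud_counts(line)
raw = cloud_counts(line, fold_case=False, drop_common=False)
edited = cloud_counts(line, omit=["kingdom"])
print(counts.most_common())
print(raw.most_common())
```

With the common words dropped, "horse" leads the list; with them kept, it has to share the top spot with "a". A stop-word list tuned to modern English would still let "thy" and "thou" through, which matches what the Shakespeare clouds show.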
Thanks, Pete!
Elatia, it only measures word frequency, as far as I can gather. Perhaps you could do something like enter a personal journal and see how many times those meaningful words show up--if at all--in your writing. That could be pretty neat.
Posted by: April | Jul 23, 2008 9:50:02 AM
Pete and April, thank you! I think I'll tell it all the stuff I'd rather not think about, entering x 3 those things I really can't bear, x 2 subjects that are rather inimical to peace of mind, and x 1 that which is merely very annoying. Perhaps this way it would generate a Priorities Cloud that would remind me to address some truly wearying problems.
Posted by: Elatia Harris | Jul 23, 2008 10:56:44 AM