Word Cloud From Google Chats

by wtlebo in Circuits > Art

snap3056.jpg
I created a word cloud from one year of Google chats between my wife and me. I used wordle.net to generate the cloud; getting the chats into a single text file was the tricky part. If you have any other tips or tricks to make this easier, please let me know.


After going through this again: you might be able to skip steps 7 and 8 ("Prepare Text Files From a Single Person Into a Single Directory" and "Combine All Chats Into a Single Text File") if you create a label in Gmail that contains all chats from a single person. You can then use the Thunderbird export tool to put the entire folder into one text file: "Tools" -> "ImportExportTools" -> "Export all messages in the folder" -> "Plain Text Format (one file)". I haven't tested this, but it might work.

Enable IMAP in Gmail

snap3057.jpg
In Gmail settings, under "Forwarding and POP/IMAP", select "Enable IMAP"

Set Chats to Be Shown in IMAP

snap3058.jpg
In Gmail settings, find "Chats" and select "Show in IMAP".

Install Thunderbird

snap3062.jpg
Install Thunderbird and set it up with your Gmail account. Make sure the account is configured as IMAP and not POP.

Subscribe to "Chats" in Thunderbird

snap3059.jpg
Make sure you are subscribed to "Chats" in the folder tree on the left side of Thunderbird (right-click to change the subscription).

Install Export Tools Into Thunderbird

snap3063.jpg
Install the Thunderbird add-on ImportExportTools.

Export All Chats to a Directory

snap3061.jpg
In Thunderbird, with the "Chats" folder highlighted, go to "Tools" -> "ImportExportTools" -> "Export all messages in the folder" -> "Plain Text Format" and select a folder to save the files to.

Prepare Text Files From a Single Person Into a Single Directory.

snap3066.jpg
snap3064.jpg
Copy only the chats from the desired person into their own directory. Search the export folder for that person's name and copy only the matching files to a simple directory like "C:\Chats".
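If you would rather not search and copy by hand, the same step can be sketched in Python. This is only a rough sketch, not something I used for the original cloud: the function name copy_chats_mentioning and the export folder "C:\ChatExport" are made up for illustration, and it assumes the exported chats are plain .txt files.

```python
import shutil
from pathlib import Path


def copy_chats_mentioning(name, source_dir, target_dir):
    """Copy every .txt file whose file name or text contains `name`.

    The match is case-insensitive. Returns the names of the copied files.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    needle = name.lower()
    copied = []
    for path in sorted(source.glob("*.txt")):
        # errors="ignore" skips bytes that are not valid UTF-8 instead of failing.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in path.name.lower() or needle in text.lower():
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied


# Hypothetical usage; both paths are placeholders:
# copy_chats_mentioning("Bob Dole", r"C:\ChatExport", r"C:\Chats")
```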

Run a batch renaming file to remove the spaces from the file names and shorten them. Save the "rename.bat" file in that same directory [I got it here]. In the command prompt, go to C:\Chats and type in:  rename.bat -files " " "_" C:\Chats  to get rid of the spaces. You may also want to replace the person's name in the file names with initials to make them shorter:  rename.bat -files " Bob Dole " "bd" C:\Chats  If you do this, run it before replacing the spaces, because the name will no longer contain spaces afterwards.
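If you don't have the batch file, a few lines of Python can do the same kind of renaming. This is a minimal sketch and not the script I used: replace_in_filenames is a name I made up, and it simply swaps one piece of text for another in every file name in a folder.

```python
from pathlib import Path


def replace_in_filenames(directory, old, new):
    """Rename each file in `directory`, replacing `old` with `new` in its name.

    Returns a list of (old_name, new_name) pairs for the files that changed.
    """
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the loop.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or old not in path.name:
            continue
        target = path.with_name(path.name.replace(old, new))
        if target.exists():
            # Skip rather than overwrite a file that already has that name.
            continue
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed


# Hypothetical usage, shortening the name first and then removing spaces:
# replace_in_filenames(r"C:\Chats", " Bob Dole ", "bd")
# replace_in_filenames(r"C:\Chats", " ", "_")
```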

Combine All Chats Into a Single Text File.

snap3067.jpg
From the Start menu, choose "Run", type in "cmd" and press Enter.
Navigate to the correct folder using commands like "cd .." and "cd Chats".
Use the DOS command:  copy /a *.txt allchats.txt
This combines all of the text files into a single file called "allchats.txt".
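The same merge can be sketched in Python if you are not on Windows or prefer a script. This is an illustration only: combine_text_files is a made-up name, and it assumes the chat files are plain text that can be read as UTF-8.

```python
from pathlib import Path


def combine_text_files(directory, output_name="allchats.txt"):
    """Concatenate every .txt file in `directory` into one output file.

    The output file itself is left out of the inputs, so running this twice
    does not copy the combined text into itself. Returns the number of
    files that were merged.
    """
    folder = Path(directory)
    output = folder / output_name
    sources = [p for p in sorted(folder.glob("*.txt")) if p.name != output_name]

    with output.open("w", encoding="utf-8") as combined:
        for path in sources:
            combined.write(path.read_text(encoding="utf-8", errors="ignore"))
            # A newline keeps the last word of one chat apart from the next file.
            combined.write("\n")
    return len(sources)


# Hypothetical usage:
# combine_text_files(r"C:\Chats")
```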

Edit Chat Text File to Get Rid of Some Common Words/expressions

snap3068.jpg
Open "allchats.txt" in Notepad++ (or similar) and:
 - Use Find/Replace to remove strings like " me ", " AM ", " AM:", " PM ", " PM:", " minutes ", etc. Notice the spaces and colons in those strings. I ended up going back to this step several times, because otherwise some unimportant words come out really big in the cloud. You can also remove words on the Wordle site.
 - Optional: select all and right-click to change everything to upper or lower case.
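Since I kept coming back to this step, here is a rough Python sketch of the same cleanup. It is not what I actually used: remove_filler and top_words are made-up names, and the filler list is just the example strings from above. The second function counts words so you can see which ones would dominate the cloud before you paste anything into Wordle.

```python
from collections import Counter


def remove_filler(text, fillers, lowercase=True):
    """Replace each filler string with a single space, then optionally lowercase."""
    for filler in fillers:
        if not filler.strip():
            # Skip empty or whitespace-only fillers, which would never finish.
            continue
        # Repeat so back-to-back matches such as " me me " are all removed.
        while filler in text:
            text = text.replace(filler, " ")
    return text.lower() if lowercase else text


def top_words(text, count=10):
    """Return the `count` most frequent whitespace-separated words."""
    return Counter(text.split()).most_common(count)


# Hypothetical usage with the strings mentioned in this step:
# fillers = [" me ", " AM ", " AM:", " PM ", " PM:", " minutes "]
# with open("allchats.txt", encoding="utf-8", errors="ignore") as f:
#     cleaned = remove_filler(f.read(), fillers)
# print(top_words(cleaned, 20))
```

Fillers are removed before lowercasing, so case-sensitive strings like " AM:" still match.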

Run Text Through Wordle.net Website.

snap3069.jpg
Copy all of the text into the wordle.net website.

After doing this you can set different color and text options. You can also remove words by right-clicking them.