Dealing with emojis in mined social media data can be tricky for a number of reasons. First, you have to decode them and then… well, I guess that is it. After you decode them, there are a number of cool things you can look at, though!
As mentioned, if you are working with social media data, chances are there will be emojis in that data. You can ‘transform’ these emojis into prose using this code as well as a CSV file I’ve put together of how each emoji is encoded in R. (The idea for this comes from Jessica Peterka-Bonetta’s work – she has a list of emojis as well, but it does not include the newest batch of emojis, Unicode Version 9.0, nor the different skin color options for human-based emojis.) If you use this emoji list for your own research, please make sure to acknowledge both Jessica and me.
Load in the CSV file. You want to make sure it is located in the correct working directory so R can find it when you tell it to read it in.
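# Read text columns in as character strings (not factors) so the later string replacement works on them directly
tweets <- read.csv("Col_Sep_INSTACORPUS.csv", header = TRUE, stringsAsFactors = FALSE)
emoticons <- read.csv("Decoded Emojis Col Sep.csv", header = TRUE, stringsAsFactors = FALSE)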
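To transform the emojis, you first need to convert your tweet text to ASCII. With sub = "byte", every byte that has no ASCII equivalent is written out as its hex code in angle brackets (such as <f0>), so each emoji becomes a short, searchable string: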
tweets$text <- iconv(tweets$text, from = "latin1", to = "ascii",
sub = "byte")
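To see what this conversion does to a single emoji, here is a minimal sketch on one made-up caption. The `caption` string is an illustration only and is not taken from the corpus.

```r
# A made-up caption ending in the "smiling face with heart-shaped eyes" emoji (U+1F60D)
caption <- "I love this \U0001F60D"

# Same call as above: bytes with no ASCII equivalent are written as <xx> hex codes
decoded <- iconv(caption, from = "latin1", to = "ascii", sub = "byte")

print(decoded)
```

The four UTF-8 bytes that make up the emoji come out as `<f0><9f><98><8d>`, and the plain ASCII part of the caption is left untouched.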
To ‘count’ the emojis, you do a find and replace using the CSV file of ‘Decoded Emojis’ as a reference. Here I am using the DataCombine package. This identifies emojis in the tweeted Instagram posts and replaces them with a prose version. For the names, I used whatever description pops up when hovering one’s cursor over an emoji on an Apple emoji keyboard. Even if a name is not exactly the same on other platforms, it provides enough information to find the emoji in question if you are not sure which one was used in the post.
library(DataCombine)
tweets <- FindReplace(data = tweets, Var = "text",
replaceData = emoticons,
from = "R_Encoding", to = "Name",
exact = FALSE)
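If it helps to see the idea behind the find and replace without the package, here is a rough base R sketch. It is not how FindReplace is implemented: it uses `gsub()` with `fixed = TRUE` in a loop. The two-row `emoji_dict`, the `posts` data frame, and the spaces around each name are all assumptions made for illustration.

```r
# Hypothetical two-row stand-in for the emoji dictionary (same column names as above)
emoji_dict <- data.frame(
  R_Encoding = c("<f0><9f><98><8d>", "<e2><9c><a8>"),
  Name = c(" SMILINGFACEWITHHEARTSHAPEDEYES ", " SPARKLES "),
  stringsAsFactors = FALSE
)

# Made-up posts, already converted to ASCII with sub = "byte"
posts <- data.frame(
  text = c("brunch <f0><9f><98><8d>",
           "new shoes <e2><9c><a8><e2><9c><a8>",
           "no emoji here"),
  stringsAsFactors = FALSE
)

# Replace each encoded emoji with its name, one dictionary row at a time
for (i in seq_len(nrow(emoji_dict))) {
  posts$text <- gsub(emoji_dict$R_Encoding[i], emoji_dict$Name[i],
                     posts$text, fixed = TRUE)
}

print(posts$text)
```

Every occurrence is replaced, so a post with two sparkles ends up with the name twice, which is what makes counting possible later on.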
Now I’m going to subset the data to just look at those posts that have emojis in them. I got help in doing this from here. Again I use my emoji dictionary available here.
emoticons <- read.csv("Decoded Emojis Col Sep.csv", header = T)
emogrepl <- grepl(paste(emoticons$Name, collapse = "|"), tweets$text)
emogreplDF<-as.data.frame(emogrepl)
tweets$ID7 <- 1:nrow(tweets)
emogreplDF$ID7 <- 1:nrow(emogreplDF)
tweets <- merge(tweets,emogreplDF,by="ID7")
emosub <- tweets[tweets$emogrepl == "TRUE", ]
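The same subset can also be taken without the ID column and the merge, because `grepl()` returns one TRUE/FALSE value per post, in the same order as the posts. A minimal sketch of that shortcut, using made-up posts and a hypothetical two-name dictionary:

```r
# Made-up posts in which emojis have already been replaced by their names
posts <- data.frame(
  text = c("brunch  SPARKLES ", "no emoji here", "date night  HEAVYBLACKHEART "),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for emoticons$Name
emoji_names <- c(" SPARKLES ", " HEAVYBLACKHEART ")

# One TRUE/FALSE per post: does the text contain any of the emoji names?
has_emoji <- grepl(paste(emoji_names, collapse = "|"), posts$text)

# Use the logical vector directly to index the rows
with_emoji    <- posts[has_emoji, , drop = FALSE]
without_emoji <- posts[!has_emoji, , drop = FALSE]

print(nrow(with_emoji))
print(nrow(without_emoji))
```

Negating the same vector gives the posts without emojis, which is handy for the comparison between the two groups.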
Now that you have a subset of posts with emojis, you can compare posts with emojis vs. posts without etc. etc.!
How about subsetting BY emoji? Let’s look just at posts that have certain emojis in them, like the red heart emoji or the face with tears of joy.
First we do pattern matching. The first command looks through the text of the emosub data frame, finds all posts in which the string ‘HEAVYBLACKHEART’ is present, and generates a vector of TRUE/FALSE values, one per post:
heartgrepl <- grepl(paste(" HEAVYBLACKHEART "), emosub$text)
# Turn that list of T/F values into a data frame so we can link it back to the original posts
heartgreplDF<-as.data.frame(heartgrepl)
# Make a new ID column so as to smush them together (the T/F designation and your data frame of posts)
emosub$ID7 <- 1:nrow(emosub)
heartgreplDF$ID7 <- 1:nrow(heartgreplDF)
emosub <- merge(emosub,heartgreplDF,by="ID7")
redheart <- emosub[emosub$heartgrepl == "TRUE", ]
Let’s do the same with FACEWITHTEARSOFJOY
lolfacegrepl <- grepl(paste(" FACEWITHTEARSOFJOY "), emosub$text)
lolfacegreplDF<-as.data.frame(lolfacegrepl)
emosub$ID7 <- 1:nrow(emosub)
lolfacegreplDF$ID7 <- 1:nrow(lolfacegreplDF)
emosub <- merge(emosub,lolfacegreplDF,by="ID7")
lolface <- emosub[emosub$lolfacegrepl == "TRUE", ]
Now SMILINGFACEWITHHEARTSHAPEDEYES
hearteyesgrepl <- grepl(paste(" SMILINGFACEWITHHEARTSHAPEDEYES "), emosub$text)
hearteyesgreplDF<-as.data.frame(hearteyesgrepl)
emosub$ID7 <- 1:nrow(emosub)
hearteyesgreplDF$ID7 <- 1:nrow(hearteyesgreplDF)
emosub <- merge(emosub,hearteyesgreplDF,by="ID7")
hearteyes <- emosub[emosub$hearteyesgrepl == "TRUE", ]
Sparkles!!!!
sparklesgrepl <- grepl(paste(" SPARKLES "), emosub$text)
sparklesgreplDF<-as.data.frame(sparklesgrepl)
emosub$ID7 <- 1:nrow(emosub)
sparklesgreplDF$ID7 <- 1:nrow(sparklesgreplDF)
emosub <- merge(emosub,sparklesgreplDF,by="ID7")
sparkles <- emosub[emosub$sparklesgrepl == "TRUE", ]
Face savouring delicious food!!!!!!!!!!!!!!!
savourfoodgrepl <- grepl(paste(" FACESAVOURINGDELICIOUSFOOD "), emosub$text)
savourfoodgreplDF<-as.data.frame(savourfoodgrepl)
emosub$ID7 <- 1:nrow(emosub)
savourfoodgreplDF$ID7 <- 1:nrow(savourfoodgreplDF)
emosub <- merge(emosub,savourfoodgreplDF,by="ID7")
savourfood <- emosub[emosub$savourfoodgrepl == "TRUE", ]
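Since the five blocks above differ only in the emoji name, the pattern can be wrapped in a small helper. This is a sketch under assumptions: `posts_with_emoji()` and the toy `emosub_toy` data frame are hypothetical names introduced here. Unlike the merge approach, it does not add a TRUE/FALSE column to the data frame.

```r
# Hypothetical helper: keep only the rows whose text contains the given emoji name
posts_with_emoji <- function(df, emoji_name) {
  df[grepl(emoji_name, df$text, fixed = TRUE), , drop = FALSE]
}

# Made-up stand-in for the emosub data frame
emosub_toy <- data.frame(
  text = c("date night  HEAVYBLACKHEART ",
           "new shoes  SPARKLES  HEAVYBLACKHEART ",
           "so funny  FACEWITHTEARSOFJOY "),
  stringsAsFactors = FALSE
)

# Emoji names to subset by, labelled with the data frame names used above
emoji_names <- c(redheart = " HEAVYBLACKHEART ",
                 sparkles = " SPARKLES ",
                 lolface  = " FACEWITHTEARSOFJOY ")

# One subset per emoji, kept together in a named list
subsets <- lapply(emoji_names, function(nm) posts_with_emoji(emosub_toy, nm))

# Number of posts containing each emoji
counts <- sapply(subsets, nrow)
print(counts)
```

Each subset is then available as, for example, `subsets$redheart`, and `counts` gives a quick look at which emojis show up in the most posts.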
Let’s have a little fun and try to map where some of these emojis occur. I am using the emoGG package.
# devtools::install_github("dill/emoGG")
library(emoGG)
# Find the emojis we want to use for a graph (might take a few times to get your search query right)
emoji_search("heart face")
## emoji code keyword
## 1 grinning 1f600 face
## 5 grin 1f601 face
## 9 joy 1f602 face
## 15 smiley 1f603 face
## 19 smile 1f604 face
## 26 sweat_smile 1f605 face
## 35 laughing 1f606 face
## 37 innocent 1f607 face
## 46 wink 1f609 face
## 50 blush 1f60a face
## 58 relaxed 263a face
## 66 yum 1f60b face
## 69 relieved 1f60c face
## 74 heart_eyes 1f60d face
## 81 sunglasses 1f60e face
## 86 smirk 1f60f face
## 94 expressionless 1f611 face
## 102 sweat 1f613 face
## 107 pensive 1f614 face
## 112 confused 1f615 face
## 117 confounded 1f616 face
## 124 kissing 1f617 face
## 128 kissing_heart 1f618 face
## 134 kissing_smiling_eyes 1f619 face
## 138 kissing_closed_eyes 1f61a face
## 144 stuck_out_tongue 1f61b face
## 150 stuck_out_tongue_winking_eye 1f61c face
## 156 stuck_out_tongue_closed_eyes 1f61d face
## 161 disappointed 1f61e face
## 165 worried 1f61f face
## 169 angry 1f620 face
## 176 cry 1f622 face
## 181 persevere 1f623 face
## 186 triumph 1f624 face
## 191 disappointed_relieved 1f625 face
## 195 frowning 1f626 face
## 198 anguished 1f627 face
## 201 fearful 1f628 face
## 207 weary 1f629 face
## 213 sleepy 1f62a face
## 221 grimacing 1f62c face
## 224 sob 1f62d face
## 230 open_mouth 1f62e face
## 234 hushed 1f62f face
## 237 cold_sweat 1f630 face
## 239 scream 1f631 face
## 243 astonished 1f632 face
## 247 flushed 1f633 face
## 251 sleeping 1f634 face
## 259 no_mouth 1f636 face
## 261 mask 1f637 face
## 513 ear 1f442 face
## 514 ear 1f442 hear
## 515 ear 1f442 sound
## 516 ear 1f442 listen
## 526 kiss 1f48b face
## 1467 heart 2764 love
## 1468 heart 2764 like
## 1469 heart 2764 valentines
## 1502 cupid 1f498 heart
## 1667 art 1f3a8 design
## 1668 art 1f3a8 paint
## 1669 art 1f3a8 draw
## 2733 a 1f170 red-square
## 2734 a 1f170 alphabet
## 2735 a 1f170 letter
## 3358 bowtie face
## 3366 neckbeard face
# We find the code "1f60d" for the smiling face with heart shaped eyes. Let's try to graph this on a map!
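# Using the ggmap package here
# (newer ggmap releases need a Google API key, registered with register_google(), before this call will work)
library(ggmap)
map <- get_map(location = 'Capp St. and 20th, San Francisco, California', zoom = 15)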
lat <- hearteyes$latitude
lon <- hearteyes$longitude
# Without the background
# mapPointshearteyes <- ggplot(hearteyes, aes(lon,lat)) + geom_emoji(emoji="1f60d")
mapPointshearteyes <- ggmap(map) + geom_emoji(aes(x = lon, y = lat),
data=hearteyes, emoji="1f60d")
mapPointshearteyes
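Only posts that carry a location can be placed on the map, so it can be worth dropping the rest before plotting. A minimal sketch, assuming missing coordinates are stored as NA in the latitude and longitude columns (the `hearteyes_toy` data frame and its values are made up):

```r
# Made-up stand-in for the hearteyes subset; two posts lack a coordinate
hearteyes_toy <- data.frame(
  text = c("brunch", "sunset", "tacos"),
  latitude  = c(37.7586, NA, 37.7601),
  longitude = c(-122.4180, -122.4195, NA),
  stringsAsFactors = FALSE
)

# Keep only the rows where both coordinates are present
has_coords <- complete.cases(hearteyes_toy[, c("latitude", "longitude")])
mappable <- hearteyes_toy[has_coords, , drop = FALSE]

print(nrow(mappable))
```

Comparing `nrow(mappable)` with the size of the full subset also tells you how many emoji posts the map leaves out.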
Now let’s try multiple emojis at once (help from here).
# Can we do this with plain old layering?
# emoji_search("sparkles")
# sparkles = "2728"
# red heart = "2764"
# One geom_emoji layer per emoji; each layer reads the coordinate columns from its own data frame
mapPointsmulti <- ggmap(map) +
  geom_emoji(aes(x = longitude, y = latitude), data = hearteyes, emoji = "1f60d") +
  geom_emoji(aes(x = longitude, y = latitude), data = sparkles, emoji = "2728") +
  geom_emoji(aes(x = longitude, y = latitude), data = redheart, emoji = "2764")
mapPointsmulti
How about emojis that are associated with food?
# apparently called the 'yum' emoji: 1f60b
mapPointssavourface <- ggmap(map) +
  geom_emoji(aes(x = longitude, y = latitude), data = savourfood, emoji = "1f60b")
mapPointssavourface