Thesis Time!

Introduction

The following is all of the code used to run the analyses in my dissertation.

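# Duplicate entries ("stringr" and "wordcloud" were each listed twice) removed
packs <- c("twitteR","RCurl","RJSONIO","stringr","ggplot2","devtools","DataCombine","ggmap","topicmodels","slam","Rmpfr","tm","wordcloud","plyr","tidytext","dplyr","tidyr","xlsx","ggrepel","lubridate","purrr","broom","emoGG","ldatuning")

# Attach every package; invisible() hides the long list that lapply() returns
invisible(lapply(packs, library, character.only = TRUE))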

Getting the Data

To collect data, I used the twitteR package. I’m interested in the Mission District neighborhood in San Francisco, California. I obtain a set of coordinates using Google Maps, plug them into the ‘geocode’ parameter, and set a radius of 1 kilometer. I know from experience that each search only returns around 1,000 to 2,000 posts, so I set the number of tweets (n) requested from Twitter to 7,000, comfortably above what actually comes back.

# key = "YOUR KEY HERE"
# secret = "YOUR SECRET HERE"

# tok = "YOUR TOK HERE"
# tok_sec = "YOUR TOK_SEC HERE"

twitter_oauth <- setup_twitter_oauth(key, secret, tok, tok_sec)

# To collect tweets
geo <- searchTwitter('',n=7000, geocode='37.76,-122.42,1km',
                     retryOnRateLimit=1)
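The geocode argument is just a "latitude,longitude,radius" string. As a small sketch (the helper name make_geocode is my own invention for illustration, not part of twitteR), it can be built from its parts so the coordinates are easier to change later:

```r
# Hypothetical helper: assemble the "lat,lon,radius" string used in the
# geocode argument of the search above
make_geocode <- function(lat, lon, radius_km) {
  paste0(lat, ",", lon, ",", radius_km, "km")
}

# Same values as in the search above
mission_geocode <- make_geocode(37.76, -122.42, 1)
mission_geocode
```

The resulting string could then be passed as geocode = mission_geocode.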

Now I want to identify emojis and separate out just those posts that came from Instagram. I then save those to a CSV file and compile the files by hand (copy-pasting) to get a corpus.

# Now you have a list of tweets. Lists are very difficult to deal with in R, so you convert this into a data frame:
geoDF<-twListToDF(geo)

Chances are there will be emojis in your Twitter data. You can ‘transform’ these emojis into prose using this code as well as a CSV file I’ve put together of what all of the emojis look like in R. (The idea for this comes from Jessica Peterka-Bonetta’s work – she has a list of emojis as well, but it does not include the newest batch of emojis, Unicode Version 9.0, nor the different skin color options for human-based emojis). If you use this emoji list for your own research, please make sure to acknowledge both myself and Jessica.

Load in the CSV file. You want to make sure it is located in the correct working directory so R can find it when you tell it to read it in.

emoticons <- read.csv("Decoded Emojis Col Sep.csv", header = T)

To transform the emojis, you first need to transform the tweet data into ASCII:

geoDF$text <- iconv(geoDF$text, from = "latin1", to = "ascii", 
                    sub = "byte")
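To see what this conversion does, here is a small sketch on a made-up post. The bytes of an emoji cannot be represented in ASCII, so sub = "byte" writes each one out as a hex code in angle brackets. Presumably these codes are what the R_Encoding column of the CSV is matched against in the next step.

```r
# A made-up post containing the ice cream emoji (U+1F368)
x <- "Best scoop in the Mission \U0001F368"

# Same call as above: treat the bytes as latin1 and spell out anything
# that is not ASCII as a <xx> hex code
decoded <- iconv(x, from = "latin1", to = "ascii", sub = "byte")
decoded

# The result now contains only plain ASCII characters
all(utf8ToInt(decoded) < 128)
```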

To ‘count’ the emojis, you do a find and replace using the CSV file of ‘Decoded Emojis’ as a reference. Here I am using the DataCombine package, which identifies emojis in the tweets and replaces them with a prose version. For the names, I used whatever description pops up when hovering one’s cursor over an emoji on an Apple emoji keyboard. Even where this differs from the name used on other platforms, it provides enough information to find the emoji in question if you are not sure which one was used in the post.

data <- FindReplace(data = geoDF, Var = "text", 
                      replaceData = emoticons,
                      from = "R_Encoding", to = "Name", 
                      exact = FALSE)
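If it helps to see the idea without DataCombine, the same kind of lookup-table replacement can be sketched in base R. The column names R_Encoding and Name match the CSV, but the two rows below are made-up placeholders, not real entries from the file, and this loop is only an illustration of the technique, not how FindReplace is implemented.

```r
# Toy lookup table; the encodings here are placeholders, not real emoji bytes
lookup <- data.frame(
  R_Encoding = c("<aa><bb>", "<cc><dd>"),
  Name       = c(" SPARKLES ", " HOTBEVERAGE "),
  stringsAsFactors = FALSE
)

text <- c("so pretty<aa><bb>", "morning<cc><dd><aa><bb>", "no emoji here")

# Replace every occurrence of each encoding with its name, one row at a time.
# fixed = TRUE treats the encoding as literal text rather than a regex.
for (i in seq_len(nrow(lookup))) {
  text <- gsub(lookup$R_Encoding[i], lookup$Name[i], text, fixed = TRUE)
}
text
```

One thing to watch with any sequential replacement like this: if one encoding is a prefix of another, the longer one needs to be replaced first.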

Now might be a good time to save this file, perhaps in CSV format with the date of when the data was collected:

write.csv(data,file=paste("ALL",Sys.Date(),".csv"))

Subset to just those posts that come from Instagram

Now you have a data frame which you can manipulate in various ways. For my research, I’m just interested in posts that have occurred on Instagram. (Why not just access them via Instagram’s API, you ask? Long story short: they are very conservative about providing access for academic research.) I’ve found a workaround, which is to filter the mined tweets down to those that have Instagram as their source:

data <- data[data$statusSource == "<a href=\"http://instagram.com\" rel=\"nofollow\">Instagram</a>", ]

#Save this file
write.csv(data,file=paste("INSTA",Sys.Date(),".csv"))
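The filter above relies on an exact match of the whole statusSource string. A slightly more forgiving variant, sketched here on a toy data frame (the rows are invented for illustration), matches only the visible label of the link, so small changes in the surrounding HTML would not silently drop posts. This is an alternative I am suggesting, not what was run for the dissertation.

```r
# Toy data frame with the same statusSource column as the tweet data
toy <- data.frame(
  text = c("post a", "post b"),
  statusSource = c(
    "<a href=\"http://instagram.com\" rel=\"nofollow\">Instagram</a>",
    "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"
  ),
  stringsAsFactors = FALSE
)

# Keep rows whose source link is labelled "Instagram"
is_insta <- grepl(">Instagram</a>", toy$statusSource, fixed = TRUE)
insta <- toy[is_insta, ]
insta$text
```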

Having done this for eight months, we have a nice corpus! Let’s load that in.

Analyzing the Data

Preparing data for Topic Modeling and Sentiment Analysis

The data need to be processed a bit more in order to analyze them. Let’s try from the start with Silge and Robinson.

# Get rid of stuff particular to the data (here encodings of links and such)
# Most of these are characters I don't have encodings for (other scripts, etc.)

tweets$text = gsub("Just posted a photo","", tweets$text)
tweets$text = gsub( "<.*?>", "", tweets$text)

# Get rid of super frequent spam posters
tweets <- tweets[! tweets$screenName %in% c("4AMSOUNDS","BruciusTattoo","LionsHeartSF","hermesalchemist","Mrsourmash","AaronTheEra","AmnesiaBar","audreymose2","audreymosez","Bernalcutlery","blncdbrkfst","BrunosSF","chiddythekidd","ChurchChills","deeXiepoo","fabricoutletsf","gever","miramirasf","papalote415","HappyHoundsMasg","faern_me"),]

# If you want to combine colors, run this at least 3 times over to make sure it 'sticks'
# coltweets <- tweets
# coltweets$text <- gsub(" COLONE ", "COLONE", coltweets$text)
# coltweets$text <- gsub(" COLTWO ", "COLTWO", coltweets$text)
# coltweets$text <- gsub(" COLTHREE ", "COLTHREE", coltweets$text)
# coltweets$text <- gsub(" COLFOUR ", "COLFOUR", coltweets$text)
# coltweets$text <- gsub(" COLFIVE ", "COLFIVE", coltweets$text)

# Let's just use this for now. Maybe good to keep these things together
# tweets <- coltweets

# This makes a larger list of stop words combining those from the tm package and tidy text -- even though the tm package stop word list is pretty small anyway, just doing this just in case
data(stop_words)
mystopwords <- c(stopwords('english'),stop_words$word, stopwords('spanish'))

# Now for Silge and Robinson's code. What this is doing is getting rid of 
# URLs, re-tweets (RT) and ampersands. This also gets rid of stop words 
# without having to get rid of hashtags and @ signs by using 
# str_detect and filter! 
reg <- "([^A-Za-z_\\d#@']|'(?![A-Za-z_\\d#@]))"
tidy_tweets <- tweets %>% 
  filter(!str_detect(text, "^RT")) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT|https", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% mystopwords,
         str_detect(word, "[a-z]"))
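To get a feel for what the reg pattern does, here is a base R sketch on a made-up sentence. It splits on anything that is not a letter, digit, underscore, #, @ or apostrophe, and on apostrophes that are not followed by one of those characters, so hashtags, mentions and contractions survive as single tokens. strsplit() stands in for unnest_tokens() here, so this only approximates the tidytext behaviour.

```r
reg <- "([^A-Za-z_\\d#@']|'(?![A-Za-z_\\d#@]))"

x <- "Love #SF, it's great!"

# perl = TRUE is needed because the pattern uses a lookahead
tokens <- strsplit(tolower(x), reg, perl = TRUE)[[1]]

# Adjacent separators (", ") leave empty strings behind; drop them
tokens <- tokens[nzchar(tokens)]
tokens
```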

Frequency analysis and Sentiment analysis

freq <- tidy_tweets %>% 
  group_by(latitude,longitude) %>% 
  count(word, sort = TRUE) %>% 
  left_join(tidy_tweets %>% 
              group_by(latitude,longitude) %>% 
              summarise(total = n())) %>%
  mutate(freq = n/total)

The n here is the number of times a term shows up at a particular coordinate, and total is how many terms are present at that coordinate overall. Now we have a representation of terms, their frequency and their position. One way to plot this is to map just the most frequent terms (n > 50). (Some help on how to do this was taken from here and here.)

freq2 <- subset(freq, n > 50) 

map <- get_map(location = 'Valencia St. and 20th, San Francisco,
               California', zoom = 15)

freq2$longitude<-as.numeric(freq2$longitude)
freq2$latitude<-as.numeric(freq2$latitude)

mapPoints <- ggmap(map) + geom_jitter(alpha = 0.1, size = 2.5, width = 0.25, height = 0.25) +
  geom_label_repel(data = freq2, aes(x = longitude, y = latitude, label = word),size = 3)

Let’s zoom into that main central area to see what’s going on!

map2 <- get_map(location = 'Valencia St. and 19th, San Francisco,
               California', zoom = 16)
mapPoints2 <- ggmap(map2) + geom_jitter(alpha = 0.1, size = 2.5, width = 0.25, height = 0.25) +
  geom_label_repel(data = freq2, aes(x = longitude, y = latitude, label = word),size = 3)

What about 24th?

# Have to go a bit bigger to get more terms
freq3 <- subset(freq, n > 15) 

map3 <- get_map(location = 'Folsom St. and 24th, San Francisco,
               California', zoom = 16)
mapPoints3 <- ggmap(map3) + geom_jitter(alpha = 0.1, size = 2.5, width = 0.25, height = 0.25) +
  geom_label_repel(data = freq3, aes(x = longitude, y = latitude, label = word),size = 3)

Sentiment analysis

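# We can also look at counts of negative and positive words
# get_sentiments() returns the Bing lexicon directly; recent versions of
# tidytext no longer have a lexicon column in sentiments to filter on
bingsenti <- get_sentiments("bing")

bing_word_counts <- tidy_tweets %>%
  inner_join(bingsenti, by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()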
# If you wanted to look at these
# bing_word_counts

# Now we can graph these
bing_word_counts %>%
  filter(n > 25) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_bar(alpha = 0.8, stat = "identity") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  coord_flip()

Word Cloud

In order to do a word cloud we need a document term matrix. This will also be used for topic modeling later.

# First have to make a document term matrix, which involves a few steps
tidy_tweets %>%
  count(document, word, sort=TRUE)

tweet_words <- tidy_tweets %>%  
  count(document, word) %>%
  ungroup()

total_words <- tweet_words %>% 
  group_by(document) %>% 
  summarize(total = sum(n))

post_words <- left_join(tweet_words, total_words)

dtm <- post_words %>% 
  cast_dtm(document, word, n)

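freqw = data.frame(sort(colSums(as.matrix(dtm)), decreasing=TRUE))
# brewer.pal() needs n >= 3; asking for 1 only triggers a warning and
# returns 3 colors anyway, so ask for 3 directly
wordcloud(rownames(freqw), freqw[,1], max.words=100, 
          colors=brewer.pal(3, "Dark2"))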
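For intuition, a document term matrix is just a table of documents by words, and the word cloud sizes come from its column sums. A toy version in base R (documents and words invented for illustration) looks like this:

```r
# One row per (document, word) occurrence, as in the tidy format above
words <- data.frame(
  document = c(1, 1, 1, 2, 2, 3),
  word     = c("coffee", "coffee", "morning", "taco", "coffee", "taco"),
  stringsAsFactors = FALSE
)

# Rows are documents, columns are words, cells are counts
toy_dtm <- table(words$document, words$word)
toy_dtm

# Overall frequency of each word across all documents
term_freq <- sort(colSums(toy_dtm), decreasing = TRUE)
term_freq
```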

Emojis

What if I want to look at just those posts that have emojis in them? Or specific emojis in general?

# Identify emojis
emoticons <- read.csv("Decoded Emojis Col Sep.csv", header = T)
# This also takes time so I will not run it, but this is how you go through and identify emojis in your corpus and 'tag' whether or not they are there!
# emogrepl <- grepl(paste(emoticons$Name, collapse = "|"), tweets$text)
# save(emogrepl,file=paste("emo.Rda"))
# Emo here: https://www.dropbox.com/s/fqlvqfnx0n8npf2/emo.Rda?dl=0
load("emo.Rda")
emogreplDF<-as.data.frame(emogrepl)
tweets$id <- 1:nrow(tweets)
emogreplDF$id <- 1:nrow(emogreplDF)
tweets <- merge(tweets,emogreplDF,by="id")
emosub <- tweets[tweets$emogrepl == "TRUE", ]

# to get JUST emojis, no text

tidy_emos <- emosub %>% 
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT|https", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% mystopwords,
         str_detect(word, "[a-z]"))

# Have to do this so they will recognize each other
tidy_emoticons <- emoticons %>% 
  mutate(Name = str_replace_all(Name, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT|https", "")) %>%
  unnest_tokens(word, Name, token = "regex", pattern = reg) %>%
  filter(!word %in% mystopwords,
         str_detect(word, "[a-z]"))

# I think a semi_join will work: "Return all rows from X where there are matching rows in Y, just keeping columns from X" (http://stat545.com/bit001_dplyr-cheatsheet.html)

emoonly <- semi_join(tidy_emos, tidy_emoticons, by="word")
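If the semi_join is unfamiliar, it behaves like a filter: keep the rows of the first table whose word also appears in the second, without adding any columns. A base R sketch on toy data (all values invented) shows the same idea with %in%:

```r
# Tokens from posts, and the tokens that make up emoji names
tokens <- data.frame(
  id   = c(1, 1, 2, 3),
  word = c("sparkles", "brunch", "hotbeverage", "park"),
  stringsAsFactors = FALSE
)
emoji_words <- data.frame(
  word = c("sparkles", "hotbeverage", "icecream"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose word is an emoji name; columns stay as they were
emoji_only <- tokens[tokens$word %in% emoji_words$word, ]
emoji_only
```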

freqe <- emoonly %>% 
  group_by(latitude,longitude) %>% 
  count(word, sort = TRUE) %>% 
  left_join(emoonly %>% 
              group_by(latitude,longitude) %>% 
              summarise(total = n())) %>%
  mutate(freq = n/total)

# freqe

# Map it
freqe2 <- subset(freqe, n > 20) 

map <- get_map(location = 'Valencia St. and 20th, San Francisco,
               California', zoom = 15)

freqe2$longitude<-as.numeric(freqe2$longitude)
freqe2$latitude<-as.numeric(freqe2$latitude)

mapPointse <- ggmap(map) + geom_jitter(alpha = 0.1, size = 2.5, width = 0.25, height = 0.25) +
  geom_label_repel(data = freqe2, aes(x = longitude, y = latitude, label = word),size = 3)

mapPointse

Mapping Emojis

To visualize emojis in our corpus, we use the emoGG package. (See also here!) I will do a map of the most common emoji (SPARKLES) and ones related to food. This might be better on a subset so we can try that too…

# Let's do coffee, the egg pan thing, face savouring delicious food + ice cream?
# ice cream 1f368
# To find the codes for each emoji:
# emoji_search("ice_cream")
# First create a subset of just those that have ICE CREAM emoji present
icecreamg <- grepl(paste(" ICECREAM "), emosub$text)
icecreamgD<-as.data.frame(icecreamg)
emosub$ID7 <- 1:nrow(emosub)
icecreamgD$ID7 <- 1:nrow(icecreamgD)
emosub <- merge(emosub,icecreamgD,by="ID7")
icecream <- emosub[emosub$icecreamg == "TRUE", ]

# Same for 'Face Savouring Delicious Food'
# savourfood: 1f60b
savourfoodgrepl <- grepl(paste(" FACESAVOURINGDELICIOUSFOOD "), emosub$text)
savourfoodgreplDF<-as.data.frame(savourfoodgrepl)
emosub$ID7 <- 1:nrow(emosub)
savourfoodgreplDF$ID7 <- 1:nrow(savourfoodgreplDF)
emosub <- merge(emosub,savourfoodgreplDF,by="ID7")
savourfood <- emosub[emosub$savourfoodgrepl == "TRUE", ]

#coffee: 2615
hotbevg <- grepl(paste(" HOTBEVERAGE "), emosub$text)
hotbevgD<-as.data.frame(hotbevg)
emosub$id <- 1:nrow(emosub)
hotbevgD$id <- 1:nrow(hotbevgD)
emosub <- merge(emosub,hotbevgD,by="id")
coffee <- emosub[emosub$hotbevg == "TRUE", ] 

#knifeandfork: 1f374
mackg <- grepl(paste(" FORKANDKNIFE "), emosub$text)
mackgD<-as.data.frame(mackg)
emosub$id <- 1:nrow(emosub)
mackgD$id <- 1:nrow(mackgD)
emosub <- merge(emosub,mackgD,by="id")
mack <- emosub[emosub$mackg == "TRUE", ]

#cooking: # Frying pan egg - Food
# 1f373
cookg <- grepl(paste(" COOKING "), emosub$text)
cookgD<-as.data.frame(cookg)
emosub$id <- 1:nrow(emosub)
cookgD$id <- 1:nrow(cookgD)
emosub <- merge(emosub,cookgD,by="id")
cook <- emosub[emosub$cookg == "TRUE", ]


# Map this
foodmap <- ggmap(map) + geom_emoji(aes(x = longitude, y = latitude), 
                                     data=savourfood, emoji="1f60b") +
                              geom_emoji(aes(x=longitude, y=latitude),
                                     data=cook, emoji="1f373") +
                              geom_emoji(aes(x=longitude, y=latitude),
                                     data=coffee, emoji="2615") +
                              geom_emoji(aes(x=longitude, y=latitude),
                                     data=mack, emoji="1f374") +
                              geom_emoji(aes(x=longitude, y=latitude),
                                     data=icecream, emoji="1f368")

foodmap
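The five blocks above all repeat the same tag-then-merge-then-subset steps. A more compact way to get the same subsets, sketched here with a helper I am calling subset_by_emoji (a hypothetical name, and run on an invented toy data frame rather than emosub), is to index the data frame directly with the result of grepl():

```r
# Hypothetical helper: rows of df whose text contains the given emoji name,
# surrounded by spaces as in the decoded text
subset_by_emoji <- function(df, emoji_name) {
  hit <- grepl(paste0(" ", emoji_name, " "), df$text, fixed = TRUE)
  df[hit, , drop = FALSE]
}

# Toy stand-in for emosub
toy_posts <- data.frame(
  text      = c("brunch COOKING  HOTBEVERAGE ", "dessert ICECREAM ", "no emoji"),
  latitude  = c(37.761, 37.759, 37.760),
  longitude = c(-122.421, -122.419, -122.420),
  stringsAsFactors = FALSE
)

coffee_toy   <- subset_by_emoji(toy_posts, "HOTBEVERAGE")
icecream_toy <- subset_by_emoji(toy_posts, "ICECREAM")
coffee_toy
```

This avoids adding a new column and re-merging emosub for every emoji.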

# Artist palette
#1f3a8
arg <- grepl(paste(" ARTISTPALETTE "), emosub$text)
argD<-as.data.frame(arg)
emosub$id <- 1:nrow(emosub)
argD$id <- 1:nrow(argD)
emosub <- merge(emosub,argD,by="id")
art <- emosub[emosub$arg == "TRUE", ]

artmap <- ggmap(map) + geom_emoji(aes(x = longitude, y = latitude), 
                                     data=art, emoji="1f3a8")

artmap

sparklesgrepl <- grepl(paste(" SPARKLES "), emosub$text)
sparklesgreplDF<-as.data.frame(sparklesgrepl)
emosub$ID7 <- 1:nrow(emosub)
sparklesgreplDF$ID7 <- 1:nrow(sparklesgreplDF)
emosub <- merge(emosub,sparklesgreplDF,by="ID7")
sparkles <- emosub[emosub$sparklesgrepl == "TRUE", ]

sparkplug <- ggmap(map) + geom_emoji(aes(x = longitude, y = latitude), 
                                     data=sparkles, emoji="2728")

sparkplug

Topic Modeling

LDA Tuning

Before running a topic model, I am going to try the LDA tuning package to assess what might be a good number of topics.

# devtools::install_github("nikita-moor/ldatuning")
# install.packages("ldatuning")
library("ldatuning")
library("topicmodels")
# I will not run this at the moment because it takes forever!
# result <- FindTopicsNumber(
#   dtm,
#   topics = seq(from = 2, to = 15, by = 1),
#   metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
#   method = "Gibbs",
#   control = list(seed = 77),
#   mc.cores = 2L,
#   verbose = TRUE
# )

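When this finally finishes running, we can graph the results to look for the ‘best’ number of topics. The plot splits the metrics into two groups: those to minimize (CaoJuan2009 and Arun2010) and those to maximize (Griffiths2004 and Deveaud2014). You want the number of topics where the first group is at its lowest and the second is at its highest, so look for where those line up.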

# From here: https://www.dropbox.com/s/qplfwb0pazmk7c1/ldatuning.RData?dl=0
load("ldatuning.RData")

FindTopicsNumber_plot(result)

From this, it appears that the metrics to maximize peak, and the metrics to minimize bottom out, at about 22 topics. I’ll use that as my number of topics.
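Reading the number off the plot is a judgment call. As a rough cross-check, the same idea can be sketched numerically: rescale each metric to the range 0 to 1, then look for the number of topics where the metrics to maximize are high and the metrics to minimize are low. The data frame below is invented, and the composite score is my own illustration, not something ldatuning computes.

```r
# Invented stand-in for the data frame returned by FindTopicsNumber()
result_toy <- data.frame(
  topics        = c(5, 10, 15, 20, 25),
  Griffiths2004 = c(-9000, -8700, -8500, -8400, -8450),
  CaoJuan2009   = c(0.30, 0.22, 0.15, 0.10, 0.12),
  Arun2010      = c(400, 330, 280, 250, 260),
  Deveaud2014   = c(1.0, 1.4, 1.7, 1.9, 1.8)
)

rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

maximize <- c("Griffiths2004", "Deveaud2014")
minimize <- c("CaoJuan2009", "Arun2010")

# Put every metric on a common 0-1 scale
scaled <- as.data.frame(lapply(result_toy[c(maximize, minimize)], rescale01))

# Illustrative score: high when "maximize" metrics are high and
# "minimize" metrics are low
score  <- rowMeans(scaled[maximize]) - rowMeans(scaled[minimize])
best_k <- result_toy$topics[which.max(score)]
best_k
```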

# load("dtm.Rda")
# Set parameters for Gibbs sampling (the same as those used in
# Grun and Hornik 2011)
# burnin <- 4000
# iter <- 2000
# thin <- 500
# seed <-list(2003,5,63,100001,765)
# nstart <- 5
# best <- TRUE
# k <- 22
# This also takes a while to run, so will just load in results
# lda <-LDA(dtm,k, method="Gibbs", 
#              control=list(nstart=nstart, seed = seed, best=best, 
#                           burnin = burnin, iter = iter, thin=thin))
# 
# # Save this (so you don't have to keep running it all the time)
# save(lda,file=paste("LDA",k,".Rda"))

# Let's check out the results

# test_lda_td <- tidy(test_lda)
# From here: https://www.dropbox.com/s/4fp81smd3dbrpd6/LDA%2022%20.Rda?dl=0
load("LDA 22 .Rda")
# Make it tidy to visualize it, etc.
lda_td <- tidy(lda)

# To graph these results (too many for now, looks messy)
# lda_top_terms <- lda_td %>%
#   group_by(topic) %>%
#   top_n(10, beta) %>%
#   ungroup() %>%
#   arrange(topic, -beta)
# 
# top_terms <- lda_top_terms %>%
#   mutate(term = reorder(term, beta)) %>%
#   ggplot(aes(term, beta, fill = factor(topic))) +
#   geom_bar(stat = "identity", show.legend = FALSE) +
#   facet_wrap(~ topic, scales = "free") +
#   coord_flip()
# 
# top_terms

Pairing this back with original tweets

Pair this information back with the original tweets to see how topics are distributed, learn more about what each topic entails, etc.

# Also to link things back
# Look at results
# Maybe a little easier to see than tidy graph
lda.topics <- as.matrix(topics(lda))
terms(lda,10)

##       Topic 1       Topic 2 Topic 3     Topic 4        Topic 5   
##  [1,] "happy"       "home"  "day"       "amazing"      "bar"     
##  [2,] "birthday"    "live"  "beautiful" "chapel"       "time"    
##  [3,] "party"       "black" "favorite"  "ready"        "taqueria"
##  [4,] "fire"        "rose"  "check"     "friend"       "friday"  
##  [5,] "friends"     "music" "days"      "special"      "taco"    
##  [6,] "weekend"     "sweet" "putt"      "water"        "sushi"   
##  [7,] "holiday"     "real"  "urban"     "awesome"      "baby"    
##  [8,] "family"      "days"  "heath"     "flour"        "theater" 
##  [9,] "paypopper"   "sf"    "lovely"    "@thechapelsf" "lunch"   
## [10,] "celebrating" "basil" "ceramics"  "shot"         "ramen"   
##       Topic 6              Topic 7         Topic 8    Topic 9    
##  [1,] "facewithtearsofjoy" "love"          "night"    "francisco"
##  [2,] "armory"             "heavyblackhea" "tonight"  "san"      
##  [3,] "alamo"              "city"          "saturday" "mission"  
##  [4,] "drafthouse"         "building"      "tomorrow" "district" 
##  [5,] "club"               "guys"          "bay"      "#igerssf" 
##  [6,] "life"               "people"        "monday"   "yeah"     
##  [7,] "video"              "time"          "playing"  "cookie"   
##  [8,] "fun"                "techo"         "books"    "streets"  
##  [9,] "kink"               "trip"          "free"     "dr"       
## [10,] "posted"             "hard"          "amnesia"  "beer"     
##       Topic 10                     Topic 11       Topic 12 
##  [1,] "tartine"                    "food"         "dinner" 
##  [2,] "manufactory"                "week"         "photo"  
##  [3,] "bakery"                     "dog"          "bear"   
##  [4,] "stop"                       "cheese"       "lazy"   
##  [5,] "@sfmanufactory"             "school"       "foreign"
##  [6,] "cream"                      "wineglass"    "cinema" 
##  [7,] "bread"                      "trick"        "ladies" 
##  [8,] "ice"                        "perfect"      "#repost"
##  [9,] "facesavouringdeliciousfood" "tour"         "miss"   
## [10,] "pizza"                      "forkandknife" "painted"
##       Topic 13                          Topic 14    Topic 15          
##  [1,] "colone"                          "san"       "#sanfrancisco"   
##  [2,] "sparkles"                        "mission"   "#sf"             
##  [3,] "coltwo"                          "francisco" "#mission"        
##  [4,] "colthree"                        "district"  "#missiondistrict"
##  [5,] "twoheas"                         "#igerssf"  "#california"     
##  [6,] "okhandsign"                      "fran"      "#dolorespark"    
##  [7,] "personraisinghandsincelebration" "reading"   "#bayarea"        
##  [8,] "personwithfoldedhands"           "#sfo"      "#themission"     
##  [9,] "colfour"                         "break"     "#usa"            
## [10,] "signofthehorns"                  "bright"    "#sanfran"        
##       Topic 16                       Topic 17           Topic 18  
##  [1,] "smilingfacewithheashapedeyes" "park"             "mission" 
##  [2,] "studios"                      "dolores"          "street"  
##  [3,] "finally"                      "sf"               "art"     
##  [4,] "sunday"                       "blacksunwithrays" "valencia"
##  [5,] "yesterday"                    "#dolorespark"     "24th"    
##  [6,] "pacific"                      "palmtree"         "station" 
##  [7,] "studio"                       "afternoon"        "st"      
##  [8,] "southern"                     "bridgeatnight"    "16th"    
##  [9,] "pretty"                       "sunny"            "cha"     
## [10,] "fur"                          "summer"           "gray"    
##       Topic 19        Topic 20     Topic 21    Topic 22     
##  [1,] "alley"         "kitchen"    "time"      "coffee"     
##  [2,] "clarion"       "#foodporn"  "elbo"      "morning"    
##  [3,] "#sf"           "restaurant" "chocolate" "cafe"       
##  [4,] "#streetart"    "super"      "fun"       "hotbeverage"
##  [5,] "#art"          "story"      "mission"   "barrel"     
##  [6,] "#graffiti"     "brunch"     "christmas" "#coffee"    
##  [7,] "#clarionalley" "chicken"    "theatre"   "tea"        
##  [8,] "#mural"        "#food"      "house"     "shop"       
##  [9,] "mural"         "thai"       "hot"       "ritual"     
## [10,] "link"          "craftsman"  "church"    "acrylic"

# Check the top 15 terms in each topic
# lda.terms <- as.matrix(terms(lda,15))
# Save as CSV file to look at a bit closer
# write.csv(lda.terms,file=paste("TIDY_LDA",k,"TopicstoTerms.csv"))

# Actual probabilities
topicProbabilities <- as.data.frame(lda@gamma)
# write.csv(topicProbabilities,
#          file=paste("TIDYLDA",k,"TopicProbabilities.csv"))
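As far as I understand it, the topic assigned to each post by topics() is simply the column with the highest probability in that post's row of the gamma matrix. A toy matrix (values invented) makes the relationship concrete:

```r
# Invented gamma matrix: 3 posts (rows) by 3 topics (columns), rows sum to 1
gamma_toy <- matrix(
  c(0.70, 0.20, 0.10,
    0.10, 0.30, 0.60,
    0.25, 0.50, 0.25),
  nrow = 3, byrow = TRUE
)

# Most likely topic per post, and how strongly the post belongs to it
assigned <- apply(gamma_toy, 1, which.max)
top_prob <- apply(gamma_toy, 1, max)

data.frame(assigned, top_prob)
```

Keeping top_prob around can be useful for spotting posts that are only weakly tied to their assigned topic.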

#Write out the topics to a data frame so you can work with them
test <- as.data.frame(lda.topics)
# We won't label these topics bc too many, difficult to label. If you wanted to label, however, this is how you would do it. 
# a<-c('Evaluation', 'Food','Performance Promos', 'Leisure', 'Places',
# 'Nightlife', 'Activism/Campaigns','Art','Outdoors','Service/Product Promos')
# b<-c(1,2,3,4,5,6,7,8,9,10)
# namesdf<-data.frame("Name"=a,"Number"=b)
# test$V1<-as.factor(test$V1)
# newtopics <- FindReplace(data = test, Var = "V1", replaceData = namesdf,
#                        from = "Number", to = "Name", exact = TRUE)

#Merge topics with tweet corpus
tweets$id <- 1:nrow(tweets)
test$id <- 1:nrow(test)
tweets <- merge(tweets,test,by="id")
# Save this
# save(tweets,file=paste("tweets",Sys.Date(),".Rda"))
# load("tweets 2017-03-22 .Rda")

#Merge topic probabilities with tweet corpus
topicProbabilities$id <- 1:nrow(topicProbabilities)
tweets <- merge(tweets, topicProbabilities,by="id")
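One side effect of these two merges is worth spelling out, because it explains the V1.x column used further down. Both the assigned topics and the topic probabilities have a column called V1, so merge() renames them with .x and .y suffixes. A toy version (all values invented) shows what happens:

```r
posts          <- data.frame(id = 1:3, text = c("a", "b", "c"))
assigned_topic <- data.frame(V1 = c(2L, 1L, 2L), id = 1:3)          # like test
probs          <- data.frame(V1 = c(0.2, 0.9, 0.4),                 # like topicProbabilities
                             V2 = c(0.8, 0.1, 0.6), id = 1:3)

step1 <- merge(posts, assigned_topic, by = "id")
step2 <- merge(step1, probs, by = "id")

# V1.x is the assigned topic, V1.y is the probability of topic 1
names(step2)
```

Renaming the assigned-topic column (to something like topic) before merging would avoid the suffixes altogether.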

Visualizing Topic Model Results

You can now map your posts and see where assigned topics are happening!

tweets$longitude<-as.numeric(tweets$longitude)
tweets$latitude <- as.numeric(tweets$latitude)
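# Both merges above brought in a column named V1, so merge() renamed them:
# V1.x is the assigned topic (from test), V1.y is the probability of topic 1
tweets$V1.x <- factor(tweets$V1.x)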
Topics<-tweets$V1.x
mapPointstopics <- ggmap(map) + geom_point(aes(x = longitude, y = latitude, 
                                         color=Topics), 
                                     data=tweets, alpha=0.5, size = 3)

mapPointstopics

What a mess!

How about over time?

Visualizing the data

We can also look at WHEN the posts were generated. We can make a graph of post frequency over time. Graphs constructed with help from here, here, here, here, here, here, here and here.

tweets$created2 <- as.POSIXct(tweets$created, format="%m/%d/%Y %H:%M")
tweets$created3<-format(tweets$created2,'%H:%M:%S')
d3 <- as.data.frame(table(tweets$created3))
d3 <- d3[order(d3$Freq, decreasing=T), ]
names(d3) <- c("created3","freq3")
tweets <- merge(tweets,d3,by="created3")
tweets$created3 <- as.POSIXct(tweets$created3, format="%H:%M:%S")
minutes <- 60

Topics<-tweets$V1.x
Time <- tweets$created3

ggplot(tweets, aes(Time, color = Topics)) + 
  geom_freqpoly(binwidth=60*minutes)

# For a more general trend
ggplot(tweets, aes(Time)) + 
  geom_freqpoly(binwidth=60*minutes)
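The plots above bin posts into one-hour windows by time of day. If you just want the counts behind such a plot, a base R sketch on invented timestamps (in the same month/day/year format as the created column) would be the following. The tz = "UTC" setting is an assumption for the example; the real data may need a different time zone.

```r
# Invented timestamps in the same format as tweets$created
created <- c("03/22/2017 08:15", "03/22/2017 08:50", "03/23/2017 19:05",
             "03/24/2017 19:40", "03/24/2017 23:10")

stamp <- as.POSIXct(created, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Hour of day, ignoring the date
hour_of_day <- as.integer(format(stamp, "%H"))

# Count posts per hour, keeping hours with no posts as zeros
posts_per_hour <- table(factor(hour_of_day, levels = 0:23))
posts_per_hour
```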

Matching tweets with LL data

What we are trying to do is to match up locations in the physical LL with the digital LL and then find the most common topic associated with a physical location. Because we do not have exact matches, we will try the fuzzyjoin package.

library(fuzzyjoin)
library(dplyr)
pairsdf <- ll %>%
  geo_inner_join(tweets, unit='km',distance_col="distance") %>%
  filter(distance <= 0.018288)

# What does this look like on a map?

# mapPointsall <- ggmap(map) + geom_point(aes(x = longitude.x, y = latitude.x), 
#                                     data=pairsdf, alpha=0.5)
# mapPointsall

Now I have a data frame with one row for each time a post occurred in the vicinity of an LL object, where vicinity means within the 0.018288 km cutoff used in the filter above (about 18 meters, or 60 feet). What I would like to do is figure out the most common topic that is associated with a particular sign. We’ll use the idea of ‘mode’ here with our topics and the group_by() function from dplyr as suggested here.
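Since the cutoff in the filter is given in kilometers, it is worth checking what it means in feet: 0.018288 km works out to 60 feet, while a 30 foot cutoff would be 0.009144 km. The sketch below does that arithmetic and also shows a plain haversine distance, which I believe is the default method for geo joins in fuzzyjoin. The function names and the Earth radius of 6371 km are my own choices for the illustration.

```r
# Unit conversions (1 foot = 0.3048 m)
feet_to_km <- function(feet) feet * 0.3048 / 1000
km_to_feet <- function(km) km * 1000 / 0.3048

km_to_feet(0.018288)   # the cutoff used in the filter, in feet
feet_to_km(30)         # what a 30 foot cutoff would be, in km

# Great-circle distance between two points, in km (radius is an assumption)
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Two points 0.0001 degrees of latitude apart are roughly 11 meters apart
haversine_km(37.76, -122.42, 37.7601, -122.42)
```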

As R does not have a built in function for mode, we build one. Code for this available here.

# To get the mode
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}
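A quick check of how this function behaves, on made-up topic numbers. Note what happens with ties: which.max() returns the first maximum, so the value that appears first in the vector wins.

```r
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Clear winner: 7 appears three times
mode_clear <- getmode(c(3, 7, 7, 3, 7))

# Tie between 5 and 2: the first value seen (5) is returned
mode_tie <- getmode(c(5, 2, 2, 5))

c(mode_clear, mode_tie)
```

So for signs where two topics are equally common, the reported mode depends on row order.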

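# Tell R your topic categories are numbers so it can deal with them.
# V1.x is a factor, so go through as.character() to recover the topic labels
# rather than the internal level codes
pairsdf$V1.x <- as.numeric(as.character(pairsdf$V1.x))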

# Now calculate things about the topics per sign
topicmode <- pairsdf %>%
  group_by(SIGN_ID) %>% 
  summarise(Mode = getmode(V1.x))

Let’s now combine this with our other data, but just include those instances that have a topic assigned (not all signs got a corresponding tweet)

topicsigns <- inner_join(ll, topicmode, by = "SIGN_ID")

This is kind of messy, so let’s subset the data frame to just have the things we are interested in. Help from here.

topicsigns <- topicsigns[,c("SIGN_ID","latitude","longitude","LOCATION","LANGUAGE","COMMUNICATIVE_ROLE","MATERIALITY","CONTEXT_FRAME","YELP","CLOSED","Mode")]    # get all rows, only relevant columns

# Rename columns so they make more sense (help from here: http://stackoverflow.com/questions/21502465/replacement-for-rename-in-dplyr/26146202#26146202)
topicsigns <- rename(topicsigns, Topic = Mode)

GAMs!

Now onto statistics. We want to see what has the most influence on language displayed in a sign. Let’s use a generalized additive model.

library(mgcv)
# Let's visualize our LL data
# We want to change the order on the plot so it's easier to look at (help from http://stackoverflow.com/questions/12774210/how-do-you-specifically-order-ggplot2-x-axis-instead-of-alphabetical-order)
ll$LANGUAGE <- as.character(ll$LANGUAGE)
Language <- factor(ll$LANGUAGE, levels=c("English", "Eng_Span",'Equal','Spanish', 'Span_Eng',"Other (Chinese)","Other (Thai)","Other (Tagalog)"))

# Different colors help from http://stackoverflow.com/questions/19778612/change-color-for-two-geom-point-in-ggplot2

# Colorblind palette (help from http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#a-colorblind-friendly-palette)
# cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# mapPoints <- ggmap(map) + geom_point(aes(x = lon, y = lat,color=Language),data=newdata, alpha = 0.7, size=2) + scale_colour_manual(values=cbPalette)

mapll <- get_map(location = 'Van Ness and 22nd, San Francisco,
               California', zoom = 15)


ll$longitude <- as.numeric(ll$longitude)
ll$latitude <- as.numeric(ll$latitude)

Longitude <- ll$longitude
Latitude <- ll$latitude

mapPointsll <- ggmap(mapll) + geom_point(aes(x = Longitude, y = Latitude,color=Language),data=ll, size=1.5) + scale_colour_manual(values = c("Spanish" = "blue", "English" = "magenta", "Eng_Span" = "red", "Span_Eng" = "#339900", "Equal" = "orange", "Other (Chinese)"="purple","Other (Thai)" ="#FFCC00","Other (Tagalog)" = "grey" ))

mapPointsll

Model Selection

Generalized Linear Models (Logistic Regression, Multinomial Logistic Regression)
    Pros: Enable a categorical output
    Cons: Difficult to capture nonlinear patterns, involves transformation (logit)
    Cons: Difficult to include coordinates
Generalized Additive Model
    Pros: Relationship between IV and DV not assumed to be linear
    Pros: Can deal with coordinates with a smooth! Allows the trend of DV to be summarized as a function of more than one IV (latitude and longitude)
    Pros: Can deal with my weird time distribution with a smooth as well!

Results

Let’s explore the multinom and see what it can tell us about all of these things

# Plotting the data (help on how to manipulate this graph from here: http://r.789695.n4.nabble.com/Ordering-of-stack-in-ggplot-package-ggplot2-td3917159.html)

# Overall counts

dat <- data.frame(table(ll$LOCATION,ll$LANGUAGE))
dat$Var1 <- factor(dat$Var1, levels = c("Mission", "24th", "Valencia","18th"))

dat$Var2 <- factor(dat$Var2, levels = c("English", "Eng_Span", "Equal","Span_Eng","Spanish","Other (Chinese)","Other (Thai)","Other (Tagalog)"))

names(dat) <- c("Location","Language","Count")
# levels(dat$Language)
ggplot(data=dat, aes(x=Location, y=Count, fill=Language)) + geom_bar(stat="identity")

# Now percentages
please=prop.table(table(ll$LOCATION, ll$LANGUAGE))
please2 <- data.frame(please)
please2$Var1 <- factor(please2$Var1, levels = c("Mission", "24th", "Valencia","18th"))

please2$Var2 <- factor(please2$Var2, levels = c("English", "Eng_Span", "Equal","Span_Eng","Spanish","Other (Chinese)","Other (Thai)","Other (Tagalog)"))

names(please2) <- c("Location","Language", "Frequency")
# Help from http://stackoverflow.com/questions/9563368/create-stacked-percent-barplot-in-r
library(scales)
ggplot(please2,aes(x = Location, y = Frequency,fill = Language)) + 
    geom_bar(position = "fill",stat = "identity") + 
    scale_y_continuous(labels = percent_format())
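A note on the proportions: prop.table() without a margin gives each cell's share of all signs, and it is position = "fill" in the plot that rescales each bar to 100%. If you want a table of within-location percentages that matches the bars, margin = 1 does that. A toy example (values invented):

```r
loc  <- c("Mission", "Mission", "Mission", "24th", "24th", "Valencia")
lang <- c("English", "Spanish", "Spanish", "Spanish", "English", "English")
tab  <- table(loc, lang)

# Share of ALL signs in each cell: the whole table sums to 1
overall <- prop.table(tab)

# Share WITHIN each location: every row sums to 1
within_loc <- prop.table(tab, margin = 1)
within_loc
```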

Multinomial Logistic Regression

The results of this are so ugly, and the p-values also have to be computed separately. But here is how it is done.

library(nnet)
ll$LANGUAGE <- as.factor(ll$LANGUAGE)
multi <- multinom(LANGUAGE ~ LOCATION, data=ll)

## # weights:  40 (28 variable)
## initial  value 2145.983671 
## iter  10 value 1081.616009
## iter  20 value 1027.458362
## iter  30 value 1025.668541
## iter  40 value 1025.609658
## iter  50 value 1025.585372
## final  value 1025.584871 
## converged

summary(multi)

## Call:
## multinom(formula = LANGUAGE ~ LOCATION, data = ll)
## 
## Coefficients:
##                   (Intercept) LOCATION24th LOCATIONMission
## English          3.713392e+00   -1.6606712      -1.1414267
## Equal            9.161193e-01   -1.6633643      -1.2345010
## Other (Chinese) -9.539999e+00   -2.5206831       7.8354395
## Other (Tagalog) -1.005856e+01   -2.9770588       6.9682681
## Other (Thai)    -9.266575e+00    6.3219050      -0.4741931
## Span_Eng        -1.264373e-04    0.3878855       0.3748708
## Spanish          1.945735e+00   -0.3576242      -0.3272004
##                 LOCATIONValencia
## English               -0.1162223
## Equal                -10.6594671
## Other (Chinese)       -2.5346037
## Other (Tagalog)       -2.6799698
## Other (Thai)          -3.8007603
## Span_Eng             -10.0641616
## Spanish               -1.7226796
## 
## Std. Errors:
##                 (Intercept) LOCATION24th LOCATIONMission LOCATIONValencia
## English           0.7156184    0.7559734       0.7490253        0.8768839
## Equal             0.8366090    0.9293296       0.8988160       65.2725028
## Other (Chinese)  83.3773119  126.7018959      83.3790835      225.3583589
## Other (Tagalog) 108.0554714  189.2203510     108.0603056      311.1603627
## Other (Thai)     72.7242951   72.7315331      77.8555077      351.5400629
## Span_Eng          0.9999461    1.0431848       1.0375929       76.6336199
## Spanish           0.7558726    0.7966966       0.7910811        1.0105941
## 
## Residual Deviance: 2051.17 
## AIC: 2107.17

# Wald z-scores (coefficient / standard error) and two-tailed p-values
multi_sum <- summary(multi)
z <- multi_sum$coefficients / multi_sum$standard.errors
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
p

##                  (Intercept) LOCATION24th LOCATIONMission LOCATIONValencia
## English         2.113511e-07   0.02803958       0.1275380       0.89455708
## Equal           2.734997e-01   0.07347739       0.1696048       0.87027660
## Other (Chinese) 9.089052e-01   0.98412746       0.9251301       0.99102639
## Other (Tagalog) 9.258344e-01   0.98744717       0.9485841       0.99312804
## Other (Thai)    8.986075e-01   0.93073423       0.9951404       0.99137365
## Span_Eng        9.998991e-01   0.71002077       0.7178835       0.89551562
## Spanish         1.004847e-02   0.65351549       0.6791585       0.08826521
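
As a sanity check on the table above, the same Wald test can be reproduced by hand for individual cells. The sketch below uses only base R and plugs in two coefficient and standard-error pairs copied from the summary(multi) output; the helper name wald_p is my own label for illustration and is not part of nnet.

```r
# Hypothetical helper: two-tailed Wald p-value from an estimate and its standard error
wald_p <- function(estimate, std_error) {
  z <- estimate / std_error
  2 * pnorm(abs(z), lower.tail = FALSE)
}

# Values copied from the summary(multi) output above (row "English")
p_intercept <- wald_p(3.713392, 0.7156184)
p_24th      <- wald_p(-1.6606712, 0.7559734)

signif(c(Intercept = p_intercept, LOCATION24th = p_24th), 3)
```

Using lower.tail = FALSE rather than 1 - pnorm(...) keeps precision when z is large and the p-value is very small.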

# Get the odds ratios (exponentiated coefficients)
# exp(coef(multi))
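
Because the coefficients are on the log-odds scale, exponentiating one gives an odds ratio relative to the reference categories (the language and location levels that do not appear in the coefficient table). The following is a minimal base R sketch for a single cell, with an approximate 95% Wald interval; the numbers are copied from the output above, and the interval is an addition of mine that assumes the usual normal approximation holds.

```r
# Coefficient and standard error copied from the summary(multi) output above
# (row "English", column "LOCATION24th")
b  <- -1.6606712
se <- 0.7559734

odds_ratio <- exp(b)

# Approximate 95% Wald interval: built on the log-odds scale, then exponentiated
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)

round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 3)
```

An interval that excludes 1 corresponds to a Wald p-value below 0.05 for the same cell.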

GAMs!

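Let’s turn to generalized additive models (GAMs) to look at LL distributions. Each model below is a binomial GAM for one language category, with a two-dimensional smooth over latitude and longitude plus the categorical predictors.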

library(mgcv)  # provides gam(), concurvity() and gam.check()

gamELL <- gam(I(LANGUAGE=="English") ~ s(latitude, longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED, family=binomial, data=ll)

gamSLL <- gam(I(LANGUAGE=="Spanish") ~ s(latitude, longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED, family=binomial, data=ll)

gamESLL <- gam(I(LANGUAGE=="Eng_Span") ~ s(latitude, longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED, family=binomial, data=ll)

gamSELL <- gam(I(LANGUAGE=="Span_Eng") ~ s(latitude, longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED, family=binomial, data=ll)

gamEQLL <- gam(I(LANGUAGE=="Equal") ~ s(latitude, longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED, family=binomial, data=ll)
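
The five calls above differ only in the target language, so the same family of models could also be fitted in a loop. The sketch below illustrates the idea on simulated data. The data frame toy, the helper fit_language_gam, the reduced formula and the smaller k are all assumptions made so the example runs on its own; none of it is the real ll data.

```r
library(mgcv)

set.seed(42)
n <- 400
# Simulated stand-in for the `ll` data frame; none of these values are real
toy <- data.frame(
  latitude  = runif(n),
  longitude = runif(n),
  YELP      = sample(1:4, n, replace = TRUE),
  LANGUAGE  = sample(c("English", "Spanish", "Equal"), n, replace = TRUE)
)

# Hypothetical helper: one binomial GAM per language category
fit_language_gam <- function(lang, data) {
  data$is_lang <- as.integer(data$LANGUAGE == lang)
  gam(is_lang ~ s(latitude, longitude, k = 20) + YELP,
      family = binomial, data = data)
}

langs <- c("English", "Spanish", "Equal")
fits  <- lapply(setNames(langs, langs), fit_language_gam, data = toy)

# Effective degrees of freedom of the spatial smooth in each model
sapply(fits, function(m) summary(m)$edf)
```

Keeping the models in a named list also makes it easy to run the same diagnostics over all of them, for example lapply(fits, concurvity).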

GAM diagnostics

concurvity(gamELL)

##               para s(latitude,longitude)
## worst    0.9993302             0.6214292
## observed 0.9993302             0.3477924
## estimate 0.9993302             0.3560434

concurvity(gamSLL)

##               para s(latitude,longitude)
## worst    0.9993302             0.6214292
## observed 0.9993302             0.2714755
## estimate 0.9993302             0.3560434

concurvity(gamESLL)

##               para s(latitude,longitude)
## worst    0.9993302             0.6214292
## observed 0.9993302             0.1284416
## estimate 0.9993302             0.3560434

concurvity(gamSELL)

##               para s(latitude,longitude)
## worst    0.9993302             0.6214292
## observed 0.9993302             0.3752692
## estimate 0.9993302             0.3560434

concurvity(gamEQLL)

##               para s(latitude,longitude)
## worst    0.9993302             0.6214292
## observed 0.9993302             0.2880259
## estimate 0.9993302             0.3560434

gam.check(gamELL)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 2 iterations.
## Gradient range [2.937159e-09,2.937159e-09]
## (score 0.07031652 & scale 1).
## Hessian positive definite, eigenvalue range [0.002898905,0.002898905].
## Model rank =  106 / 106 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                           k'    edf k-index p-value
## s(latitude,longitude) 59.000 21.456   0.881       0

gam.check(gamSLL)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 3 iterations.
## Gradient range [-3.796155e-10,-3.796155e-10]
## (score -0.1045091 & scale 1).
## Hessian positive definite, eigenvalue range [0.00330123,0.00330123].
## Model rank =  106 / 106 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                           k'    edf k-index p-value
## s(latitude,longitude) 59.000 35.459   0.887       0

gam.check(gamESLL)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 2 iterations.
## Gradient range [-5.881589e-09,-5.881589e-09]
## (score -0.5780793 & scale 1).
## Hessian positive definite, eigenvalue range [0.001517681,0.001517681].
## Model rank =  106 / 106 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                           k'    edf k-index p-value
## s(latitude,longitude) 59.000 18.706   0.949     0.5

gam.check(gamSELL)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 4 iterations.
## Gradient range [-6.671218e-08,-6.671218e-08]
## (score -0.5312013 & scale 1).
## Hessian positive definite, eigenvalue range [0.001353063,0.001353063].
## Model rank =  106 / 106 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                          k'   edf k-index p-value
## s(latitude,longitude) 59.00  6.55    0.92    0.18

gam.check(gamEQLL)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 3 iterations.
## Gradient range [-8.346794e-10,-8.346794e-10]
## (score -0.7151074 & scale 1).
## Hessian positive definite, eigenvalue range [0.001447556,0.001447556].
## Model rank =  106 / 106 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                           k'    edf k-index p-value
## s(latitude,longitude) 59.000  7.110   0.917    0.24

GAM results

# English
summary(gamELL)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "English") ~ s(latitude, longitude, k = 60) + YELP + 
##     COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                    44.0237 51896.9721   0.001
## YELP                                            0.7416     0.1410   5.260
## COMMUNICATIVE_ROLEEstablishment_Description    -1.0166     0.4430  -2.295
## COMMUNICATIVE_ROLEEstablishment_Name           -0.8597     0.4229  -2.033
## COMMUNICATIVE_ROLEGraffiti                      0.1732     0.7926   0.219
## COMMUNICATIVE_ROLEinformation                 -23.1788 24511.2101  -0.001
## COMMUNICATIVE_ROLEInformation                  -1.3411     0.4417  -3.036
## COMMUNICATIVE_ROLEInstructions                  0.3908     1.4229   0.275
## COMMUNICATIVE_ROLELeaflet                      -1.1885     1.4932  -0.796
## COMMUNICATIVE_ROLESlogan                       -0.9822     1.3350  -0.736
## COMMUNICATIVE_ROLEStreet_Signs                  0.5547     1.0081   0.550
## COMMUNICATIVE_ROLETrademark                    18.4011 48196.1446   0.000
## MATERIALITYHand_Written                         0.6733     0.8570   0.786
## MATERIALITYHome_Printed                         0.5714     0.8598   0.665
## MATERIALITYPermanent                            1.7312     1.0364   1.670
## MATERIALITYProfessionally_Printed               1.4481     0.8451   1.714
## CONTEXT_FRAMEAuto_Mechanic                     -0.6429 51897.1058   0.000
## CONTEXT_FRAMEBakery                           -24.9360 19246.8495  -0.001
## CONTEXT_FRAMEBar                              -22.3820 19246.8495  -0.001
## CONTEXT_FRAMEBeauty_Hair_Salon                -23.2895 19246.8494  -0.001
## CONTEXT_FRAMEBusiness                         -22.4638 19246.8494  -0.001
## CONTEXT_FRAMECafe                              -1.4916 20584.5040   0.000
## CONTEXT_FRAMEClothing                         -23.2780 19246.8495  -0.001
## CONTEXT_FRAMECommentary                       -22.1075 19246.8495  -0.001
## CONTEXT_FRAMEExternal                         -22.5786 19246.8494  -0.001
## CONTEXT_FRAMEFlier                            -45.8507 39139.1784  -0.001
## CONTEXT_FRAMEGallery_Museum                   -24.1304 19246.8495  -0.001
## CONTEXT_FRAMEGrocery                           -2.7168 29903.7031   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store             -23.5920 19246.8494  -0.001
## CONTEXT_FRAMEGym_Fitness_Studio                 0.0302 28792.2798   0.000
## CONTEXT_FRAMEHardware                          -1.9022 29882.6610   0.000
## CONTEXT_FRAMEHotel                            -22.6879 19246.8495  -0.001
## CONTEXT_FRAMEInstitution                      -23.0377 19246.8494  -0.001
## CONTEXT_FRAMEJewelry_Store                    -22.3474 19246.8495  -0.001
## CONTEXT_FRAMELandromat                        -22.3911 19246.8495  -0.001
## CONTEXT_FRAMEMenu                              -5.0706 51897.1058   0.000
## CONTEXT_FRAMEMovie_Theater                    -21.4862 19246.8495  -0.001
## CONTEXT_FRAMENightclub                        -22.0964 19246.8495  -0.001
## CONTEXT_FRAMENotary_Financial_Services        -23.2806 19246.8494  -0.001
## CONTEXT_FRAMEResidential                      -45.3584 32847.9910  -0.001
## CONTEXT_FRAMERestaurant                       -23.8055 19246.8494  -0.001
## CONTEXT_FRAMEShop                             -22.5694 19246.8494  -0.001
## CONTEXT_FRAMESpecialty_Foods                  -22.0218 19246.8495  -0.001
## CONTEXT_FRAMESupermarket                      -23.3643 19246.8494  -0.001
## CONTEXT_FRAMETravel_Agency                    -47.3202 28896.6233  -0.002
## CLOSEDFALSE                                   -21.1499 48196.0008   0.000
## CLOSEDTRUE                                    -21.8697 48196.0008   0.000
##                                             Pr(>|z|)    
## (Intercept)                                  0.99932    
## YELP                                        1.44e-07 ***
## COMMUNICATIVE_ROLEEstablishment_Description  0.02174 *  
## COMMUNICATIVE_ROLEEstablishment_Name         0.04206 *  
## COMMUNICATIVE_ROLEGraffiti                   0.82698    
## COMMUNICATIVE_ROLEinformation                0.99925    
## COMMUNICATIVE_ROLEInformation                0.00239 ** 
## COMMUNICATIVE_ROLEInstructions               0.78358    
## COMMUNICATIVE_ROLELeaflet                    0.42607    
## COMMUNICATIVE_ROLESlogan                     0.46189    
## COMMUNICATIVE_ROLEStreet_Signs               0.58216    
## COMMUNICATIVE_ROLETrademark                  0.99970    
## MATERIALITYHand_Written                      0.43208    
## MATERIALITYHome_Printed                      0.50631    
## MATERIALITYPermanent                         0.09483 .  
## MATERIALITYProfessionally_Printed            0.08661 .  
## CONTEXT_FRAMEAuto_Mechanic                   0.99999    
## CONTEXT_FRAMEBakery                          0.99897    
## CONTEXT_FRAMEBar                             0.99907    
## CONTEXT_FRAMEBeauty_Hair_Salon               0.99903    
## CONTEXT_FRAMEBusiness                        0.99907    
## CONTEXT_FRAMECafe                            0.99994    
## CONTEXT_FRAMEClothing                        0.99904    
## CONTEXT_FRAMECommentary                      0.99908    
## CONTEXT_FRAMEExternal                        0.99906    
## CONTEXT_FRAMEFlier                           0.99907    
## CONTEXT_FRAMEGallery_Museum                  0.99900    
## CONTEXT_FRAMEGrocery                         0.99993    
## CONTEXT_FRAMEGrocery_Liquor_Store            0.99902    
## CONTEXT_FRAMEGym_Fitness_Studio              1.00000    
## CONTEXT_FRAMEHardware                        0.99995    
## CONTEXT_FRAMEHotel                           0.99906    
## CONTEXT_FRAMEInstitution                     0.99904    
## CONTEXT_FRAMEJewelry_Store                   0.99907    
## CONTEXT_FRAMELandromat                       0.99907    
## CONTEXT_FRAMEMenu                            0.99992    
## CONTEXT_FRAMEMovie_Theater                   0.99911    
## CONTEXT_FRAMENightclub                       0.99908    
## CONTEXT_FRAMENotary_Financial_Services       0.99903    
## CONTEXT_FRAMEResidential                     0.99890    
## CONTEXT_FRAMERestaurant                      0.99901    
## CONTEXT_FRAMEShop                            0.99906    
## CONTEXT_FRAMESpecialty_Foods                 0.99909    
## CONTEXT_FRAMESupermarket                     0.99903    
## CONTEXT_FRAMETravel_Agency                   0.99869    
## CLOSEDFALSE                                  0.99965    
## CLOSEDTRUE                                   0.99964    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq  p-value    
## s(latitude,longitude) 21.46  28.48  64.21 0.000144 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.266   Deviance explained =   28%
## UBRE = 0.070317  Scale est. = 1         n = 1032

# To get odds ratios (commented out for clarity)
# exp(coef(gamELL))
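
The coefficient table above contains both COMMUNICATIVE_ROLEinformation and COMMUNICATIVE_ROLEInformation, which suggests the same category may have been coded with two spellings. If that split is unintended, the levels could be merged before refitting. The sketch below shows one way to do this on a made-up vector; cap_first is a hypothetical helper, and whether the two levels really should be merged is a judgment about the data that I cannot verify from the output alone.

```r
# Made-up stand-in for ll$COMMUNICATIVE_ROLE
role <- factor(c("Information", "information", "Graffiti", "Slogan"))

# Hypothetical helper: upper-case the first letter, leave the rest unchanged
cap_first <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))

# Rebuild the factor so that case variants collapse into a single level
role_clean <- factor(cap_first(as.character(role)))
levels(role_clean)
```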

# Spanish
summary(gamSLL)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "Spanish") ~ s(latitude, longitude, k = 60) + YELP + 
##     COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -5.187e+01  6.352e+05   0.000
## YELP                                        -6.091e-01  1.745e-01  -3.490
## COMMUNICATIVE_ROLEEstablishment_Description  5.948e-01  5.321e-01   1.118
## COMMUNICATIVE_ROLEEstablishment_Name         5.442e-01  5.122e-01   1.063
## COMMUNICATIVE_ROLEGraffiti                  -2.188e+00  1.106e+00  -1.978
## COMMUNICATIVE_ROLEinformation               -1.127e-01  1.416e+00  -0.080
## COMMUNICATIVE_ROLEInformation                3.734e-01  5.304e-01   0.704
## COMMUNICATIVE_ROLEInstructions               3.725e-01  1.602e+00   0.233
## COMMUNICATIVE_ROLELeaflet                   -1.945e-01  1.445e+00  -0.135
## COMMUNICATIVE_ROLESlogan                     2.172e-01  1.458e+00   0.149
## COMMUNICATIVE_ROLEStreet_Signs              -8.371e-01  1.373e+00  -0.610
## COMMUNICATIVE_ROLETrademark                 -2.306e+01  5.871e+05   0.000
## MATERIALITYHand_Written                     -9.566e-01  9.062e-01  -1.056
## MATERIALITYHome_Printed                     -1.217e+00  9.193e-01  -1.324
## MATERIALITYPermanent                        -2.994e+00  1.254e+00  -2.388
## MATERIALITYProfessionally_Printed           -2.166e+00  8.988e-01  -2.410
## CONTEXT_FRAMEAuto_Mechanic                   9.136e-01  6.351e+05   0.000
## CONTEXT_FRAMEBakery                          2.950e+01  2.420e+05   0.000
## CONTEXT_FRAMEBar                             2.672e+01  2.420e+05   0.000
## CONTEXT_FRAMEBeauty_Hair_Salon               2.752e+01  2.420e+05   0.000
## CONTEXT_FRAMEBusiness                        2.676e+01  2.420e+05   0.000
## CONTEXT_FRAMECafe                            1.552e+00  2.580e+05   0.000
## CONTEXT_FRAMEClothing                        2.849e+01  2.420e+05   0.000
## CONTEXT_FRAMECommentary                      2.818e+01  2.420e+05   0.000
## CONTEXT_FRAMEExternal                        2.729e+01  2.420e+05   0.000
## CONTEXT_FRAMEFlier                           5.829e+01  4.805e+05   0.000
## CONTEXT_FRAMEGallery_Museum                  2.657e+01  2.420e+05   0.000
## CONTEXT_FRAMEGrocery                         3.085e+00  3.651e+05   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store            2.732e+01  2.420e+05   0.000
## CONTEXT_FRAMEGym_Fitness_Studio             -6.323e-02  3.655e+05   0.000
## CONTEXT_FRAMEHardware                        1.891e+00  3.722e+05   0.000
## CONTEXT_FRAMEHotel                           1.565e+00  2.992e+05   0.000
## CONTEXT_FRAMEInstitution                     2.809e+01  2.420e+05   0.000
## CONTEXT_FRAMEJewelry_Store                   2.580e+01  2.420e+05   0.000
## CONTEXT_FRAMELandromat                       2.648e+01  2.420e+05   0.000
## CONTEXT_FRAMEMenu                            6.346e+00  6.351e+05   0.000
## CONTEXT_FRAMEMovie_Theater                   2.658e+01  2.420e+05   0.000
## CONTEXT_FRAMENightclub                       1.394e+00  3.179e+05   0.000
## CONTEXT_FRAMENotary_Financial_Services       2.759e+01  2.420e+05   0.000
## CONTEXT_FRAMEResidential                     5.575e+01  3.985e+05   0.000
## CONTEXT_FRAMERestaurant                      2.805e+01  2.420e+05   0.000
## CONTEXT_FRAMEShop                            2.635e+01  2.420e+05   0.000
## CONTEXT_FRAMESpecialty_Foods                 2.655e+01  2.420e+05   0.000
## CONTEXT_FRAMESupermarket                     2.723e+01  2.420e+05   0.000
## CONTEXT_FRAMETravel_Agency                   3.183e+01  2.420e+05   0.000
## CLOSEDFALSE                                  2.481e+01  5.873e+05   0.000
## CLOSEDTRUE                                   2.529e+01  5.873e+05   0.000
##                                             Pr(>|z|)    
## (Intercept)                                 0.999935    
## YELP                                        0.000482 ***
## COMMUNICATIVE_ROLEEstablishment_Description 0.263683    
## COMMUNICATIVE_ROLEEstablishment_Name        0.288004    
## COMMUNICATIVE_ROLEGraffiti                  0.047951 *  
## COMMUNICATIVE_ROLEinformation               0.936542    
## COMMUNICATIVE_ROLEInformation               0.481422    
## COMMUNICATIVE_ROLEInstructions              0.816079    
## COMMUNICATIVE_ROLELeaflet                   0.892933    
## COMMUNICATIVE_ROLESlogan                    0.881545    
## COMMUNICATIVE_ROLEStreet_Signs              0.541943    
## COMMUNICATIVE_ROLETrademark                 0.999969    
## MATERIALITYHand_Written                     0.291186    
## MATERIALITYHome_Printed                     0.185457    
## MATERIALITYPermanent                        0.016922 *  
## MATERIALITYProfessionally_Printed           0.015944 *  
## CONTEXT_FRAMEAuto_Mechanic                  0.999999    
## CONTEXT_FRAMEBakery                         0.999903    
## CONTEXT_FRAMEBar                            0.999912    
## CONTEXT_FRAMEBeauty_Hair_Salon              0.999909    
## CONTEXT_FRAMEBusiness                       0.999912    
## CONTEXT_FRAMECafe                           0.999995    
## CONTEXT_FRAMEClothing                       0.999906    
## CONTEXT_FRAMECommentary                     0.999907    
## CONTEXT_FRAMEExternal                       0.999910    
## CONTEXT_FRAMEFlier                          0.999903    
## CONTEXT_FRAMEGallery_Museum                 0.999912    
## CONTEXT_FRAMEGrocery                        0.999993    
## CONTEXT_FRAMEGrocery_Liquor_Store           0.999910    
## CONTEXT_FRAMEGym_Fitness_Studio             1.000000    
## CONTEXT_FRAMEHardware                       0.999996    
## CONTEXT_FRAMEHotel                          0.999996    
## CONTEXT_FRAMEInstitution                    0.999907    
## CONTEXT_FRAMEJewelry_Store                  0.999915    
## CONTEXT_FRAMELandromat                      0.999913    
## CONTEXT_FRAMEMenu                           0.999992    
## CONTEXT_FRAMEMovie_Theater                  0.999912    
## CONTEXT_FRAMENightclub                      0.999997    
## CONTEXT_FRAMENotary_Financial_Services      0.999909    
## CONTEXT_FRAMEResidential                    0.999888    
## CONTEXT_FRAMERestaurant                     0.999908    
## CONTEXT_FRAMEShop                           0.999913    
## CONTEXT_FRAMESpecialty_Foods                0.999912    
## CONTEXT_FRAMESupermarket                    0.999910    
## CONTEXT_FRAMETravel_Agency                  0.999895    
## CLOSEDFALSE                                 0.999966    
## CLOSEDTRUE                                  0.999966    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value   
## s(latitude,longitude) 35.46   43.9  72.91 0.00415 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.243   Deviance explained = 29.5%
## UBRE = -0.10451  Scale est. = 1         n = 1032

# Odds ratios
# exp(coef(gamSLL))
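
Many of the CONTEXT_FRAME and CLOSED terms in the summaries above have estimates in the tens with standard errors in the thousands or more. A common cause of that pattern in logistic models is (quasi-)complete separation, where a factor level contains only successes or only failures. I have not checked this against the actual data, so the sketch below only shows, on made-up values, how such levels might be spotted with a cross-tabulation.

```r
# Made-up data: the "Bakery" level never has English signage
toy <- data.frame(
  CONTEXT_FRAME = c("Bakery", "Bakery", "Bar", "Bar", "Shop", "Shop", "Shop"),
  LANGUAGE      = c("Spanish", "Spanish", "English", "Spanish",
                    "English", "English", "Equal")
)

# Cross-tabulate each level against the binary outcome used in the model
tab <- table(toy$CONTEXT_FRAME, toy$LANGUAGE == "English")

# Levels with an empty cell tend to produce very large estimates and standard errors
separated <- rownames(tab)[apply(tab == 0, 1, any)]
separated
```

Levels flagged this way are candidates for merging into broader categories before the model is refitted.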

# Mostly English with some Spanish
summary(gamESLL)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "Eng_Span") ~ s(latitude, longitude, k = 60) + 
##     YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + 
##     CLOSED
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -7.440e+01  4.502e+05   0.000
## YELP                                         4.208e-03  2.567e-01   0.016
## COMMUNICATIVE_ROLEEstablishment_Description  9.883e-01  1.150e+00   0.860
## COMMUNICATIVE_ROLEEstablishment_Name         1.158e+00  1.112e+00   1.041
## COMMUNICATIVE_ROLEGraffiti                   2.157e+00  1.442e+00   1.496
## COMMUNICATIVE_ROLEinformation               -2.275e+01  2.287e+05   0.000
## COMMUNICATIVE_ROLEInformation                1.488e+00  1.131e+00   1.316
## COMMUNICATIVE_ROLEInstructions              -2.221e+01  1.465e+05   0.000
## COMMUNICATIVE_ROLELeaflet                   -1.970e+01  1.068e+05   0.000
## COMMUNICATIVE_ROLESlogan                     3.117e+00  1.712e+00   1.821
## COMMUNICATIVE_ROLEStreet_Signs              -2.277e+01  7.992e+04   0.000
## COMMUNICATIVE_ROLETrademark                 -2.167e+01  4.078e+05   0.000
## MATERIALITYHand_Written                      2.143e+01  8.507e+04   0.000
## MATERIALITYHome_Printed                      2.057e+01  8.507e+04   0.000
## MATERIALITYPermanent                         2.155e+01  8.507e+04   0.000
## MATERIALITYProfessionally_Printed            2.131e+01  8.507e+04   0.000
## CONTEXT_FRAMEAuto_Mechanic                   1.795e+00  4.421e+05   0.000
## CONTEXT_FRAMEBakery                          2.402e+01  1.707e+05   0.000
## CONTEXT_FRAMEBar                             3.610e-01  1.884e+05   0.000
## CONTEXT_FRAMEBeauty_Hair_Salon               2.429e+01  1.707e+05   0.000
## CONTEXT_FRAMEBusiness                        2.354e+01  1.707e+05   0.000
## CONTEXT_FRAMECafe                            5.885e-01  1.834e+05   0.000
## CONTEXT_FRAMEClothing                        6.496e-02  1.935e+05   0.000
## CONTEXT_FRAMECommentary                      2.744e-01  1.982e+05   0.000
## CONTEXT_FRAMEExternal                        2.429e+01  1.707e+05   0.000
## CONTEXT_FRAMEFlier                           2.057e+01  3.517e+05   0.000
## CONTEXT_FRAMEGallery_Museum                  2.447e+01  1.707e+05   0.000
## CONTEXT_FRAMEGrocery                         6.510e-01  2.611e+05   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store            2.496e+01  1.707e+05   0.000
## CONTEXT_FRAMEGym_Fitness_Studio              7.352e-01  2.615e+05   0.000
## CONTEXT_FRAMEHardware                        6.482e-01  2.636e+05   0.000
## CONTEXT_FRAMEHotel                           2.572e+01  1.707e+05   0.000
## CONTEXT_FRAMEInstitution                     1.031e+00  1.925e+05   0.000
## CONTEXT_FRAMEJewelry_Store                   2.431e+01  1.707e+05   0.000
## CONTEXT_FRAMELandromat                       2.533e+01  1.707e+05   0.000
## CONTEXT_FRAMEMenu                            1.820e+00  4.421e+05   0.000
## CONTEXT_FRAMEMovie_Theater                   1.285e-01  2.191e+05   0.000
## CONTEXT_FRAMENightclub                       2.516e+01  1.707e+05   0.000
## CONTEXT_FRAMENotary_Financial_Services       5.360e-01  1.964e+05   0.000
## CONTEXT_FRAMEResidential                     8.533e-01  2.642e+05   0.000
## CONTEXT_FRAMERestaurant                      2.431e+01  1.707e+05   0.000
## CONTEXT_FRAMEShop                            2.440e+01  1.707e+05   0.000
## CONTEXT_FRAMESpecialty_Foods                 9.096e-01  2.038e+05   0.000
## CONTEXT_FRAMESupermarket                     2.522e+01  1.707e+05   0.000
## CONTEXT_FRAMETravel_Agency                   1.447e+00  2.498e+05   0.000
## CLOSEDFALSE                                  2.440e+01  4.078e+05   0.000
## CLOSEDTRUE                                   2.556e+01  4.078e+05   0.000
##                                             Pr(>|z|)  
## (Intercept)                                   0.9999  
## YELP                                          0.9869  
## COMMUNICATIVE_ROLEEstablishment_Description   0.3900  
## COMMUNICATIVE_ROLEEstablishment_Name          0.2980  
## COMMUNICATIVE_ROLEGraffiti                    0.1348  
## COMMUNICATIVE_ROLEinformation                 0.9999  
## COMMUNICATIVE_ROLEInformation                 0.1881  
## COMMUNICATIVE_ROLEInstructions                0.9999  
## COMMUNICATIVE_ROLELeaflet                     0.9999  
## COMMUNICATIVE_ROLESlogan                      0.0686 .
## COMMUNICATIVE_ROLEStreet_Signs                0.9998  
## COMMUNICATIVE_ROLETrademark                   1.0000  
## MATERIALITYHand_Written                       0.9998  
## MATERIALITYHome_Printed                       0.9998  
## MATERIALITYPermanent                          0.9998  
## MATERIALITYProfessionally_Printed             0.9998  
## CONTEXT_FRAMEAuto_Mechanic                    1.0000  
## CONTEXT_FRAMEBakery                           0.9999  
## CONTEXT_FRAMEBar                              1.0000  
## CONTEXT_FRAMEBeauty_Hair_Salon                0.9999  
## CONTEXT_FRAMEBusiness                         0.9999  
## CONTEXT_FRAMECafe                             1.0000  
## CONTEXT_FRAMEClothing                         1.0000  
## CONTEXT_FRAMECommentary                       1.0000  
## CONTEXT_FRAMEExternal                         0.9999  
## CONTEXT_FRAMEFlier                            1.0000  
## CONTEXT_FRAMEGallery_Museum                   0.9999  
## CONTEXT_FRAMEGrocery                          1.0000  
## CONTEXT_FRAMEGrocery_Liquor_Store             0.9999  
## CONTEXT_FRAMEGym_Fitness_Studio               1.0000  
## CONTEXT_FRAMEHardware                         1.0000  
## CONTEXT_FRAMEHotel                            0.9999  
## CONTEXT_FRAMEInstitution                      1.0000  
## CONTEXT_FRAMEJewelry_Store                    0.9999  
## CONTEXT_FRAMELandromat                        0.9999  
## CONTEXT_FRAMEMenu                             1.0000  
## CONTEXT_FRAMEMovie_Theater                    1.0000  
## CONTEXT_FRAMENightclub                        0.9999  
## CONTEXT_FRAMENotary_Financial_Services        1.0000  
## CONTEXT_FRAMEResidential                      1.0000  
## CONTEXT_FRAMERestaurant                       0.9999  
## CONTEXT_FRAMEShop                             0.9999  
## CONTEXT_FRAMESpecialty_Foods                  1.0000  
## CONTEXT_FRAMESupermarket                      0.9999  
## CONTEXT_FRAMETravel_Agency                    1.0000  
## CLOSEDFALSE                                   1.0000  
## CLOSEDTRUE                                    0.9999  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value
## s(latitude,longitude) 18.71  25.06  23.19   0.568
## 
## R-sq.(adj) =  0.041   Deviance explained = 20.5%
## UBRE = -0.57808  Scale est. = 1         n = 1032

# Odds ratios
# exp(coef(gamESLL))

# Mostly Spanish with some English
summary(gamSELL)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "Span_Eng") ~ s(latitude, longitude, k = 60) + 
##     YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + 
##     CLOSED
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -1.219e+02  3.702e+07   0.000
## YELP                                        -9.442e-01  2.748e-01  -3.436
## COMMUNICATIVE_ROLEEstablishment_Description  1.607e+00  1.093e+00   1.470
## COMMUNICATIVE_ROLEEstablishment_Name         1.024e+00  1.079e+00   0.949
## COMMUNICATIVE_ROLEGraffiti                  -2.759e+01  1.917e+07   0.000
## COMMUNICATIVE_ROLEinformation                2.203e+00  1.728e+00   1.274
## COMMUNICATIVE_ROLEInformation                1.501e+00  1.109e+00   1.354
## COMMUNICATIVE_ROLEInstructions              -9.506e-01  2.834e+07   0.000
## COMMUNICATIVE_ROLELeaflet                   -2.553e+01  3.663e+07   0.000
## COMMUNICATIVE_ROLESlogan                    -3.002e+01  2.740e+07   0.000
## COMMUNICATIVE_ROLEStreet_Signs              -2.743e+01  3.036e+06   0.000
## COMMUNICATIVE_ROLETrademark                 -2.870e+01  6.711e+07   0.000
## MATERIALITYHand_Written                      3.043e+01  2.167e+07   0.000
## MATERIALITYHome_Printed                      3.140e+01  2.167e+07   0.000
## MATERIALITYPermanent                         3.166e+01  2.167e+07   0.000
## MATERIALITYProfessionally_Printed            3.120e+01  2.167e+07   0.000
## CONTEXT_FRAMEAuto_Mechanic                   4.002e+01  7.351e+07   0.000
## CONTEXT_FRAMEBakery                          7.124e+01  3.001e+07   0.000
## CONTEXT_FRAMEBar                             7.258e+01  3.001e+07   0.000
## CONTEXT_FRAMEBeauty_Hair_Salon               7.214e+01  3.001e+07   0.000
## CONTEXT_FRAMEBusiness                        7.179e+01  3.001e+07   0.000
## CONTEXT_FRAMECafe                            4.060e+01  3.221e+07   0.000
## CONTEXT_FRAMEClothing                        7.157e+01  3.001e+07   0.000
## CONTEXT_FRAMECommentary                      4.359e+01  4.121e+07   0.000
## CONTEXT_FRAMEExternal                        4.199e+01  3.107e+07   0.000
## CONTEXT_FRAMEFlier                           6.552e+01  6.704e+07   0.000
## CONTEXT_FRAMEGallery_Museum                  7.347e+01  3.001e+07   0.000
## CONTEXT_FRAMEGrocery                         4.239e+01  4.502e+07   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store            7.224e+01  3.001e+07   0.000
## CONTEXT_FRAMEGym_Fitness_Studio              3.940e+01  4.502e+07   0.000
## CONTEXT_FRAMEHardware                        4.143e+01  4.502e+07   0.000
## CONTEXT_FRAMEHotel                           4.115e+01  3.676e+07   0.000
## CONTEXT_FRAMEInstitution                     3.949e+01  3.373e+07   0.000
## CONTEXT_FRAMEJewelry_Store                   7.331e+01  3.001e+07   0.000
## CONTEXT_FRAMELandromat                       7.152e+01  3.001e+07   0.000
## CONTEXT_FRAMEMenu                            4.482e+01  7.351e+07   0.000
## CONTEXT_FRAMEMovie_Theater                   4.067e+01  3.826e+07   0.000
## CONTEXT_FRAMENightclub                       4.036e+01  3.929e+07   0.000
## CONTEXT_FRAMENotary_Financial_Services       7.042e+01  3.001e+07   0.000
## CONTEXT_FRAMEResidential                     4.161e+01  5.024e+07   0.000
## CONTEXT_FRAMERestaurant                      7.267e+01  3.001e+07   0.000
## CONTEXT_FRAMEShop                            7.233e+01  3.001e+07   0.000
## CONTEXT_FRAMESpecialty_Foods                 4.140e+01  3.580e+07   0.000
## CONTEXT_FRAMESupermarket                     7.308e+01  3.001e+07   0.000
## CONTEXT_FRAMETravel_Agency                   7.418e+01  3.001e+07   0.000
## CLOSEDFALSE                                  1.528e+01  6.242e+03   0.002
## CLOSEDTRUE                                   1.474e+01  6.242e+03   0.002
##                                             Pr(>|z|)    
## (Intercept)                                  1.00000    
## YELP                                         0.00059 ***
## COMMUNICATIVE_ROLEEstablishment_Description  0.14161    
## COMMUNICATIVE_ROLEEstablishment_Name         0.34273    
## COMMUNICATIVE_ROLEGraffiti                   1.00000    
## COMMUNICATIVE_ROLEinformation                0.20249    
## COMMUNICATIVE_ROLEInformation                0.17569    
## COMMUNICATIVE_ROLEInstructions               1.00000    
## COMMUNICATIVE_ROLELeaflet                    1.00000    
## COMMUNICATIVE_ROLESlogan                     1.00000    
## COMMUNICATIVE_ROLEStreet_Signs               0.99999    
## COMMUNICATIVE_ROLETrademark                  1.00000    
## MATERIALITYHand_Written                      1.00000    
## MATERIALITYHome_Printed                      1.00000    
## MATERIALITYPermanent                         1.00000    
## MATERIALITYProfessionally_Printed            1.00000    
## CONTEXT_FRAMEAuto_Mechanic                   1.00000    
## CONTEXT_FRAMEBakery                          1.00000    
## CONTEXT_FRAMEBar                             1.00000    
## CONTEXT_FRAMEBeauty_Hair_Salon               1.00000    
## CONTEXT_FRAMEBusiness                        1.00000    
## CONTEXT_FRAMECafe                            1.00000    
## CONTEXT_FRAMEClothing                        1.00000    
## CONTEXT_FRAMECommentary                      1.00000    
## CONTEXT_FRAMEExternal                        1.00000    
## CONTEXT_FRAMEFlier                           1.00000    
## CONTEXT_FRAMEGallery_Museum                  1.00000    
## CONTEXT_FRAMEGrocery                         1.00000    
## CONTEXT_FRAMEGrocery_Liquor_Store            1.00000    
## CONTEXT_FRAMEGym_Fitness_Studio              1.00000    
## CONTEXT_FRAMEHardware                        1.00000    
## CONTEXT_FRAMEHotel                           1.00000    
## CONTEXT_FRAMEInstitution                     1.00000    
## CONTEXT_FRAMEJewelry_Store                   1.00000    
## CONTEXT_FRAMELandromat                       1.00000    
## CONTEXT_FRAMEMenu                            1.00000    
## CONTEXT_FRAMEMovie_Theater                   1.00000    
## CONTEXT_FRAMENightclub                       1.00000    
## CONTEXT_FRAMENotary_Financial_Services       1.00000    
## CONTEXT_FRAMEResidential                     1.00000    
## CONTEXT_FRAMERestaurant                      1.00000    
## CONTEXT_FRAMEShop                            1.00000    
## CONTEXT_FRAMESpecialty_Foods                 1.00000    
## CONTEXT_FRAMESupermarket                     1.00000    
## CONTEXT_FRAMETravel_Agency                   1.00000    
## CLOSEDFALSE                                  0.99805    
## CLOSEDTRUE                                   0.99812    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value  
## s(latitude,longitude) 6.546  9.113  15.16  0.0895 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0607   Deviance explained = 19.7%
## UBRE = -0.5312  Scale est. = 1         n = 1032

# Odds ratios
# exp(coef(gamSELL))

# Equal
summary(gamEQLL)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "Equal") ~ s(latitude, longitude, k = 60) + YELP + 
##     COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -8.083e+01  2.224e+06   0.000
## YELP                                        -1.041e+00  4.483e-01  -2.322
## COMMUNICATIVE_ROLEEstablishment_Description -3.665e-01  1.036e+00  -0.354
## COMMUNICATIVE_ROLEEstablishment_Name        -1.530e-01  9.492e-01  -0.161
## COMMUNICATIVE_ROLEGraffiti                   1.354e+00  1.562e+00   0.867
## COMMUNICATIVE_ROLEinformation                3.844e+00  1.737e+00   2.212
## COMMUNICATIVE_ROLEInformation                1.634e+00  9.271e-01   1.762
## COMMUNICATIVE_ROLEInstructions              -2.572e+01  6.156e+05   0.000
## COMMUNICATIVE_ROLELeaflet                   -2.445e+01  4.656e+05   0.000
## COMMUNICATIVE_ROLESlogan                    -2.640e+01  6.700e+05   0.000
## COMMUNICATIVE_ROLEStreet_Signs               2.472e+00  2.320e+00   1.066
## COMMUNICATIVE_ROLETrademark                 -2.287e+01  2.033e+06   0.000
## MATERIALITYHand_Written                      2.674e+01  4.162e+05   0.000
## MATERIALITYHome_Printed                      2.704e+01  4.162e+05   0.000
## MATERIALITYPermanent                         2.494e+01  4.162e+05   0.000
## MATERIALITYProfessionally_Printed            2.639e+01  4.162e+05   0.000
## CONTEXT_FRAMEAuto_Mechanic                  -2.323e+00  2.184e+06   0.000
## CONTEXT_FRAMEBakery                          2.667e+01  7.987e+05   0.000
## CONTEXT_FRAMEBar                             2.817e-01  8.847e+05   0.000
## CONTEXT_FRAMEBeauty_Hair_Salon              -6.883e-01  8.407e+05   0.000
## CONTEXT_FRAMEBusiness                        2.493e+01  7.987e+05   0.000
## CONTEXT_FRAMECafe                           -2.876e-01  8.560e+05   0.000
## CONTEXT_FRAMEClothing                       -1.417e+00  8.909e+05   0.000
## CONTEXT_FRAMECommentary                     -1.986e+00  9.232e+05   0.000
## CONTEXT_FRAMEExternal                        2.376e+01  7.987e+05   0.000
## CONTEXT_FRAMEFlier                           2.326e+01  1.709e+06   0.000
## CONTEXT_FRAMEGallery_Museum                  2.653e+01  7.987e+05   0.000
## CONTEXT_FRAMEGrocery                         8.718e-01  1.289e+06   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store            2.614e+01  7.987e+05   0.000
## CONTEXT_FRAMEGym_Fitness_Studio             -1.957e+00  1.271e+06   0.000
## CONTEXT_FRAMEHardware                       -3.103e-01  1.247e+06   0.000
## CONTEXT_FRAMEHotel                           2.717e+01  7.987e+05   0.000
## CONTEXT_FRAMEInstitution                     2.406e+01  7.987e+05   0.000
## CONTEXT_FRAMEJewelry_Store                  -1.158e+00  9.500e+05   0.000
## CONTEXT_FRAMELandromat                      -8.332e-01  9.580e+05   0.000
## CONTEXT_FRAMEMenu                            3.488e-01  2.184e+06   0.000
## CONTEXT_FRAMEMovie_Theater                  -1.482e+00  1.035e+06   0.000
## CONTEXT_FRAMENightclub                      -2.489e-01  1.057e+06   0.000
## CONTEXT_FRAMENotary_Financial_Services       2.650e+01  7.987e+05   0.000
## CONTEXT_FRAMEResidential                    -1.179e+00  1.269e+06   0.000
## CONTEXT_FRAMERestaurant                      2.383e+01  7.987e+05   0.000
## CONTEXT_FRAMEShop                            2.496e+01  7.987e+05   0.000
## CONTEXT_FRAMESpecialty_Foods                -1.884e-01  9.262e+05   0.000
## CONTEXT_FRAMESupermarket                     2.449e+01  7.987e+05   0.000
## CONTEXT_FRAMETravel_Agency                  -1.314e+00  1.210e+06   0.000
## CLOSEDFALSE                                  2.602e+01  2.033e+06   0.000
## CLOSEDTRUE                                   1.650e+00  2.046e+06   0.000
##                                             Pr(>|z|)  
## (Intercept)                                   1.0000  
## YELP                                          0.0202 *
## COMMUNICATIVE_ROLEEstablishment_Description   0.7235  
## COMMUNICATIVE_ROLEEstablishment_Name          0.8720  
## COMMUNICATIVE_ROLEGraffiti                    0.3862  
## COMMUNICATIVE_ROLEinformation                 0.0269 *
## COMMUNICATIVE_ROLEInformation                 0.0781 .
## COMMUNICATIVE_ROLEInstructions                1.0000  
## COMMUNICATIVE_ROLELeaflet                     1.0000  
## COMMUNICATIVE_ROLESlogan                      1.0000  
## COMMUNICATIVE_ROLEStreet_Signs                0.2866  
## COMMUNICATIVE_ROLETrademark                   1.0000  
## MATERIALITYHand_Written                       0.9999  
## MATERIALITYHome_Printed                       0.9999  
## MATERIALITYPermanent                          1.0000  
## MATERIALITYProfessionally_Printed             0.9999  
## CONTEXT_FRAMEAuto_Mechanic                    1.0000  
## CONTEXT_FRAMEBakery                           1.0000  
## CONTEXT_FRAMEBar                              1.0000  
## CONTEXT_FRAMEBeauty_Hair_Salon                1.0000  
## CONTEXT_FRAMEBusiness                         1.0000  
## CONTEXT_FRAMECafe                             1.0000  
## CONTEXT_FRAMEClothing                         1.0000  
## CONTEXT_FRAMECommentary                       1.0000  
## CONTEXT_FRAMEExternal                         1.0000  
## CONTEXT_FRAMEFlier                            1.0000  
## CONTEXT_FRAMEGallery_Museum                   1.0000  
## CONTEXT_FRAMEGrocery                          1.0000  
## CONTEXT_FRAMEGrocery_Liquor_Store             1.0000  
## CONTEXT_FRAMEGym_Fitness_Studio               1.0000  
## CONTEXT_FRAMEHardware                         1.0000  
## CONTEXT_FRAMEHotel                            1.0000  
## CONTEXT_FRAMEInstitution                      1.0000  
## CONTEXT_FRAMEJewelry_Store                    1.0000  
## CONTEXT_FRAMELandromat                        1.0000  
## CONTEXT_FRAMEMenu                             1.0000  
## CONTEXT_FRAMEMovie_Theater                    1.0000  
## CONTEXT_FRAMENightclub                        1.0000  
## CONTEXT_FRAMENotary_Financial_Services        1.0000  
## CONTEXT_FRAMEResidential                      1.0000  
## CONTEXT_FRAMERestaurant                       1.0000  
## CONTEXT_FRAMEShop                             1.0000  
## CONTEXT_FRAMESpecialty_Foods                  1.0000  
## CONTEXT_FRAMESupermarket                      1.0000  
## CONTEXT_FRAMETravel_Agency                    1.0000  
## CLOSEDFALSE                                   1.0000  
## CLOSEDTRUE                                    1.0000  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df Chi.sq p-value
## s(latitude,longitude) 7.11  9.628  15.22   0.102
## 
## R-sq.(adj) =  0.144   Deviance explained = 31.5%
## UBRE = -0.71511  Scale est. = 1         n = 1032

# Odds ratios
# exp(coef(gamEQLL))
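
As a minimal sketch of what the commented-out odds-ratio lines do, the example below fits a small binomial GAM to simulated data and exponentiates only the parametric coefficients, with rough Wald 95% intervals. The data frame `toy` and its columns `x`, `z`, `g` and `y` are made up for illustration and are not my sign data. Note that calling exp(coef(...)) directly on a gam object also returns the exponentiated smooth basis coefficients, which are not interpretable as odds ratios.

```r
library(mgcv)

# Simulated stand-in data (hypothetical): one two-level factor and two coordinates
set.seed(42)
n <- 400
toy <- data.frame(x = runif(n), z = runif(n),
                  g = factor(sample(c("a", "b"), n, replace = TRUE)))
toy$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * (toy$g == "b") + sin(2 * pi * toy$x)))

fit <- gam(y ~ s(x, z, k = 20) + g, family = binomial, data = toy)

# p.table holds the parametric terms only (no smooth basis coefficients)
ptab <- summary(fit)$p.table
est <- ptab[, "Estimate"]
se <- ptab[, "Std. Error"]

# Exponentiate the log-odds estimates and their Wald interval limits
odds <- cbind(OR = exp(est),
              lower = exp(est - 1.96 * se),
              upper = exp(est + 1.96 * se))
round(odds, 3)
```

A coefficient with a very large standard error, like many of the CONTEXT_FRAME rows above, gives an interval so wide that the odds ratio is uninformative.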

We can see from these results a lot of information – what is significant, deviance explained, coefficients, etc. But it is also useful to plot probabilities.

# Plot probabilities? (Adapted from http://myweb.uiowa.edu/pbreheny/publications/visreg.pdf)
library(visreg)
# We will just look at those flagged as 'significant'
# Probability of English by coordinate
visreg2d(gamELL, "longitude", "latitude", plot.type="image")

# Spanish
visreg2d(gamSLL, "longitude", "latitude", plot.type="image")
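
As far as I understand the visreg documentation, visreg2d draws on the scale of the linear predictor (log-odds) by default and takes scale="response" to draw probabilities instead. The sketch below shows the same idea by hand, under the assumption of a simulated data set: `toy`, its `english` outcome and the grid size are hypothetical. It predicts over a regular grid of coordinates and checks that the probability is the inverse logit of the linear predictor.

```r
library(mgcv)

# Simulated stand-in for the sign data (hypothetical coordinates and outcome)
set.seed(7)
n <- 500
toy <- data.frame(longitude = runif(n, -122.43, -122.40),
                  latitude = runif(n, 37.75, 37.77))
toy$english <- rbinom(n, 1, plogis(-1 + 150 * (toy$latitude - 37.76)))

fit <- gam(english ~ s(latitude, longitude, k = 20), family = binomial, data = toy)

# Regular grid over the study area
lon <- seq(min(toy$longitude), max(toy$longitude), length.out = 30)
lat <- seq(min(toy$latitude), max(toy$latitude), length.out = 30)
grid <- expand.grid(longitude = lon, latitude = lat)

# Predictions on the log-odds scale and on the probability scale
grid$eta <- as.numeric(predict(fit, newdata = grid, type = "link"))
grid$prob <- as.numeric(predict(fit, newdata = grid, type = "response"))

# The probability is the inverse logit of the linear predictor
stopifnot(isTRUE(all.equal(grid$prob, plogis(grid$eta))))

# expand.grid varies longitude fastest, so the rows of the matrix follow lon
image(lon, lat, matrix(grid$prob, nrow = length(lon)),
      xlab = "longitude", ylab = "latitude")
```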

Combining LL and Instagram Data

Remember to remind R that YELP and Topic are actually categories, not continuous variables.

topicsigns$YELP <- as.factor(topicsigns$YELP)
topicsigns$Topic <- as.factor(topicsigns$Topic)

# Subset to get rid of Trademark, which has no observations
keep_roles <- c("Establishment_Name", "Establishment_Description", "Graffiti", "Advertisement", "Information", "Instructions", "Leaflet", "Slogan", "Street_Signs")
topicsigns <- subset(topicsigns, COMMUNICATIVE_ROLE %in% keep_roles)
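
To see what the factor conversion changes, here is a minimal sketch with made-up data (the data frame `toy` and its `tier` column are hypothetical). A numeric code gets a single slope, while a factor gets one coefficient per non-reference level, which is why the model summaries list separate rows such as YELP1 to YELP4.

```r
# Hypothetical toy data: a numeric code with five values and a binary outcome
set.seed(1)
toy <- data.frame(tier = rep(0:4, each = 20))
toy$y <- rbinom(nrow(toy), 1, 0.5)

# As a number, the code is treated as continuous and gets a single slope
m_num <- glm(y ~ tier, family = binomial, data = toy)

# As a factor, each non-reference level gets its own coefficient
toy$tier_f <- as.factor(toy$tier)
m_fac <- glm(y ~ tier_f, family = binomial, data = toy)

names(coef(m_num))  # "(Intercept)" "tier"
names(coef(m_fac))  # "(Intercept)" "tier_f1" "tier_f2" "tier_f3" "tier_f4"
```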

# On to the first GAM!
# We have adjusted k to 60.
# English
gamE= gam(I(LANGUAGE=="English")~s(latitude,longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED + Topic,family=binomial, data=topicsigns)

# Spanish
gamS= gam(I(LANGUAGE=="Spanish")~s(latitude,longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED + Topic,family=binomial, data=topicsigns)

# Mostly English with some Spanish
gamES = gam(I(LANGUAGE=="Eng_Span")~s(latitude,longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED + Topic,family=binomial, data=topicsigns)

# Mostly Spanish with some English
gamSE = gam(I(LANGUAGE=="Span_Eng")~s(latitude,longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED + Topic,family=binomial, data=topicsigns)

# Equal
gamEQ = gam(I(LANGUAGE=="Equal")~s(latitude,longitude, k=60) + YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED + Topic,family=binomial, data=topicsigns)
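
The five calls above differ only in the LANGUAGE category being modelled, so the same pattern can be written as a loop. This is just a sketch on simulated data, not how I fitted the models: `toy`, `langs` and `is_lang` are hypothetical names, and the formula is cut down to the spatial smooth to keep the example short.

```r
library(mgcv)

# Hypothetical toy data standing in for topicsigns
set.seed(3)
n <- 600
toy <- data.frame(latitude = runif(n), longitude = runif(n),
                  LANGUAGE = sample(c("English", "Spanish", "Equal"), n,
                                    replace = TRUE, prob = c(0.6, 0.3, 0.1)))

langs <- c("English", "Spanish", "Equal")

# Fit one binomial GAM per category: that category vs. everything else
fits <- lapply(setNames(langs, langs), function(lang) {
  # This modifies a local copy only; toy itself is left unchanged
  toy$is_lang <- as.numeric(toy$LANGUAGE == lang)
  gam(is_lang ~ s(latitude, longitude, k = 20), family = binomial, data = toy)
})

# Deviance explained for each model
sapply(fits, function(m) summary(m)$dev.expl)
```

Keeping the models in a named list also makes it easy to run the same diagnostic on all of them, e.g. lapply(fits, concurvity).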

Model checking

It’s also time to check for concurvity, to see if “a smooth term in [my] model could be approximated by one or more of the other smooth terms in the model”. I may be at risk of this, as “this is often the case when a smooth of space is included in a model, along with smooths of other covariates that also vary more or less smoothly in space”.

concurvity(gamE)

##               para s(latitude,longitude)
## worst    0.9995429             0.8130711
## observed 0.9995429             0.4924878
## estimate 0.9995429             0.4007821

concurvity(gamS)

##               para s(latitude,longitude)
## worst    0.9995429             0.8130711
## observed 0.9995429             0.4332782
## estimate 0.9995429             0.4007821

concurvity(gamES)

##               para s(latitude,longitude)
## worst    0.9995429             0.8130711
## observed 0.9995429             0.3382393
## estimate 0.9995429             0.4007821

concurvity(gamSE)

##               para s(latitude,longitude)
## worst    0.9995429             0.8130711
## observed 0.9995429             0.5602935
## estimate 0.9995429             0.4007821

concurvity(gamEQ)

##               para s(latitude,longitude)
## worst    0.9995429             0.8130711
## observed 0.9995429             0.4685437
## estimate 0.9995429             0.4007821

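Concurvity measures suggest that my smooths are okay: the observed and estimate values for s(latitude,longitude) are all well below 1 (roughly 0.34 to 0.56), although the worst-case value is 0.81 and the para column (the parametric terms) is close to 1.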
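
To show what a high concurvity value looks like, here is a minimal sketch on simulated data (the variables `x1`, `x2`, `x3` and `y` are made up). `x2` is built to be almost a smooth function of `x1`, so the smooths of `x1` and `x2` can approximate each other, while `x3` is unrelated to both.

```r
library(mgcv)

# Simulated covariates (hypothetical)
set.seed(11)
n <- 300
x1 <- runif(n)
x2 <- sin(2 * pi * x1) + rnorm(n, sd = 0.05)  # nearly a smooth function of x1
x3 <- runif(n)                                 # unrelated to x1 and x2
y <- rbinom(n, 1, plogis(x1 + x3 - 1))

fit <- gam(y ~ s(x1) + s(x2) + s(x3), family = binomial)

# Rows are the three measures; columns are the parametric part and each smooth.
# Values run from 0 (no problem) to 1 (the term is fully explained by the others).
cc <- concurvity(fit, full = TRUE)
round(cc, 2)
```

With full = FALSE, concurvity() instead returns pairwise matrices, which helps to find out which terms are responsible for a high value.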

Now onto gam.check to look at more diagnostics. A k-index below 1 with a low p-value suggests that k may be too low; in the output below this happens for the English model (k-index 0.938, p = 0.02). This doesn’t get solved until k is up to around 300 or so, which would not be the best solution (it would make the model prone to overfitting!). So, while this suggests k is too low, I keep k as is to avoid overfitting the model.

gam.check(gamE)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 3 iterations.
## Gradient range [2.738027e-09,2.738027e-09]
## (score 0.04053542 & scale 1).
## Hessian positive definite, eigenvalue range [0.001719122,0.001719122].
## Model rank =  126 / 126 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                           k'    edf k-index p-value
## s(latitude,longitude) 59.000 20.107   0.938    0.02

gam.check(gamS)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 9 iterations.
## Gradient range [2.463495e-07,2.463495e-07]
## (score -0.172996 & scale 1).
## Hessian positive definite, eigenvalue range [0.002485206,0.002485206].
## Model rank =  126 / 126 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                          k'   edf k-index p-value
## s(latitude,longitude) 59.00 57.96    1.07    0.98

gam.check(gamES)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 3 iterations.
## Gradient range [-5.25853e-09,-5.25853e-09]
## (score -0.5617174 & scale 1).
## Hessian positive definite, eigenvalue range [0.002522951,0.002522951].
## Model rank =  126 / 126 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                          k'   edf k-index p-value
## s(latitude,longitude) 59.00 17.64    1.11       1

gam.check(gamSE)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 7 iterations.
## Gradient range [-6.972236e-07,-6.972236e-07]
## (score -0.5066816 & scale 1).
## Hessian positive definite, eigenvalue range [6.96943e-07,6.96943e-07].
## Model rank =  126 / 126 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                          k'   edf k-index p-value
## s(latitude,longitude) 59.00  2.00    1.03    0.84

gam.check(gamEQ)

## 
## Method: UBRE   Optimizer: outer newton
## step failed after 35 iterations.
## Gradient range [3.18731e-05,3.18731e-05]
## (score -0.7286916 & scale 1).
## Hessian positive definite, eigenvalue range [0.001106209,0.001106209].
## Model rank =  126 / 126 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                          k'   edf k-index p-value
## s(latitude,longitude) 59.00 12.76    1.26       1
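
For reference, the basis dimension table can also be obtained without the rest of the gam.check output by using k.check from mgcv. The sketch below uses simulated data (`x`, `y` and the two k values are hypothetical) to compare a deliberately small k with a larger one; the k-index and p-value come from a randomisation test, so they vary a little from run to run.

```r
library(mgcv)

# Simulated wiggly signal (hypothetical data)
set.seed(5)
n <- 400
x <- runif(n)
y <- rbinom(n, 1, plogis(3 * sin(6 * pi * x)))

# Same model with a deliberately small and a larger basis dimension
fit_small <- gam(y ~ s(x, k = 4), family = binomial)
fit_large <- gam(y ~ s(x, k = 20), family = binomial)

# Columns: k', edf, k-index, p-value (as in the gam.check output)
kc_small <- k.check(fit_small)
kc_large <- k.check(fit_large)
kc_small
kc_large

# An edf close to k' is the other warning sign that k may be too low
AIC(fit_small, fit_large)
```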

Results

# English
summary(gamE)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "English") ~ s(latitude, longitude, k = 60) + YELP + 
##     COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED + 
##     Topic
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                  4.990e+01  5.036e+05   0.000
## YELP1                                        2.984e-01  3.220e-01   0.927
## YELP2                                        1.588e+00  4.304e-01   3.688
## YELP3                                        2.082e+00  1.254e+00   1.660
## YELP4                                        2.518e+01  2.451e+05   0.000
## COMMUNICATIVE_ROLEEstablishment_Description -1.026e+00  6.068e-01  -1.690
## COMMUNICATIVE_ROLEEstablishment_Name        -7.267e-01  5.730e-01  -1.268
## COMMUNICATIVE_ROLEGraffiti                   8.023e-01  1.078e+00   0.744
## COMMUNICATIVE_ROLEInformation               -8.694e-01  5.884e-01  -1.478
## COMMUNICATIVE_ROLEInstructions               2.497e+01  1.296e+05   0.000
## COMMUNICATIVE_ROLELeaflet                   -3.804e-01  1.913e+00  -0.199
## COMMUNICATIVE_ROLESlogan                    -1.360e+00  1.391e+00  -0.978
## COMMUNICATIVE_ROLEStreet_Signs               1.046e+00  1.450e+00   0.722
## MATERIALITYHand_Written                      6.629e-01  1.495e+00   0.443
## MATERIALITYHome_Printed                      7.129e-01  1.525e+00   0.468
## MATERIALITYPermanent                         2.042e+00  1.721e+00   1.186
## MATERIALITYProfessionally_Printed            1.567e+00  1.500e+00   1.044
## CONTEXT_FRAMEBakery                         -2.696e+01  3.561e+05   0.000
## CONTEXT_FRAMEBar                            -2.457e+01  3.561e+05   0.000
## CONTEXT_FRAMEBeauty_Hair_Salon              -2.574e+01  3.561e+05   0.000
## CONTEXT_FRAMEBusiness                       -2.529e+01  3.561e+05   0.000
## CONTEXT_FRAMECafe                            1.666e-02  3.606e+05   0.000
## CONTEXT_FRAMEClothing                       -2.576e+01  3.561e+05   0.000
## CONTEXT_FRAMECommentary                     -2.541e+01  3.561e+05   0.000
## CONTEXT_FRAMEExternal                       -2.548e+01  3.561e+05   0.000
## CONTEXT_FRAMEFlier                          -5.344e+01  4.362e+05   0.000
## CONTEXT_FRAMEGallery_Museum                 -2.710e+01  3.561e+05   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store           -2.641e+01  3.561e+05   0.000
## CONTEXT_FRAMEGym_Fitness_Studio              4.657e-01  3.964e+05   0.000
## CONTEXT_FRAMEHardware                       -1.595e+00  4.191e+05   0.000
## CONTEXT_FRAMEHotel                          -2.606e+01  3.561e+05   0.000
## CONTEXT_FRAMEInstitution                    -2.557e+01  3.561e+05   0.000
## CONTEXT_FRAMEJewelry_Store                  -2.339e+01  3.561e+05   0.000
## CONTEXT_FRAMELandromat                      -4.914e-01  4.305e+05   0.000
## CONTEXT_FRAMEMenu                           -5.225e+00  5.036e+05   0.000
## CONTEXT_FRAMEMovie_Theater                  -2.367e+01  3.561e+05   0.000
## CONTEXT_FRAMENightclub                      -4.828e-01  4.089e+05   0.000
## CONTEXT_FRAMENotary_Financial_Services      -2.855e+01  3.561e+05   0.000
## CONTEXT_FRAMEResidential                    -7.777e+01  4.673e+05   0.000
## CONTEXT_FRAMERestaurant                     -2.664e+01  3.561e+05   0.000
## CONTEXT_FRAMEShop                           -2.552e+01  3.561e+05   0.000
## CONTEXT_FRAMESpecialty_Foods                -2.583e+01  3.561e+05   0.000
## CONTEXT_FRAMESupermarket                    -2.544e+01  3.561e+05   0.000
## CONTEXT_FRAMETravel_Agency                  -5.426e+01  3.901e+05   0.000
## CLOSEDFALSE                                 -2.463e+01  3.561e+05   0.000
## CLOSEDTRUE                                  -2.430e+01  3.561e+05   0.000
## Topic2                                      -1.673e-01  5.049e-01  -0.331
## Topic3                                       3.205e-01  6.109e-01   0.525
## Topic4                                       9.554e-01  4.934e-01   1.936
## Topic5                                       3.184e-01  5.598e-01   0.569
## Topic6                                       3.246e-01  7.153e-01   0.454
## Topic7                                       1.838e+00  1.023e+00   1.798
## Topic8                                       1.427e+00  8.291e-01   1.721
## Topic9                                       7.251e-01  5.262e-01   1.378
## Topic10                                      1.284e+00  6.455e-01   1.989
## Topic11                                      3.955e-01  1.101e+00   0.359
## Topic12                                     -9.880e-02  6.304e-01  -0.157
## Topic13                                      2.616e+01  1.375e+05   0.000
## Topic14                                      2.721e+01  1.291e+05   0.000
## Topic15                                     -1.580e+00  1.345e+00  -1.175
## Topic16                                      2.033e+00  9.328e-01   2.179
## Topic17                                      7.820e-01  7.100e-01   1.101
## Topic18                                     -1.331e-01  7.560e-01  -0.176
## Topic19                                     -4.293e-01  6.751e-01  -0.636
## Topic20                                      2.557e+01  1.684e+05   0.000
## Topic21                                     -1.319e+00  9.210e-01  -1.433
## Topic22                                      1.759e+00  1.237e+00   1.421
##                                             Pr(>|z|)    
## (Intercept)                                 0.999921    
## YELP1                                       0.353962    
## YELP2                                       0.000226 ***
## YELP3                                       0.096956 .  
## YELP4                                       0.999918    
## COMMUNICATIVE_ROLEEstablishment_Description 0.091033 .  
## COMMUNICATIVE_ROLEEstablishment_Name        0.204729    
## COMMUNICATIVE_ROLEGraffiti                  0.456689    
## COMMUNICATIVE_ROLEInformation               0.139481    
## COMMUNICATIVE_ROLEInstructions              0.999846    
## COMMUNICATIVE_ROLELeaflet                   0.842426    
## COMMUNICATIVE_ROLESlogan                    0.328013    
## COMMUNICATIVE_ROLEStreet_Signs              0.470596    
## MATERIALITYHand_Written                     0.657435    
## MATERIALITYHome_Printed                     0.640091    
## MATERIALITYPermanent                        0.235460    
## MATERIALITYProfessionally_Printed           0.296401    
## CONTEXT_FRAMEBakery                         0.999940    
## CONTEXT_FRAMEBar                            0.999945    
## CONTEXT_FRAMEBeauty_Hair_Salon              0.999942    
## CONTEXT_FRAMEBusiness                       0.999943    
## CONTEXT_FRAMECafe                           1.000000    
## CONTEXT_FRAMEClothing                       0.999942    
## CONTEXT_FRAMECommentary                     0.999943    
## CONTEXT_FRAMEExternal                       0.999943    
## CONTEXT_FRAMEFlier                          0.999902    
## CONTEXT_FRAMEGallery_Museum                 0.999939    
## CONTEXT_FRAMEGrocery_Liquor_Store           0.999941    
## CONTEXT_FRAMEGym_Fitness_Studio             0.999999    
## CONTEXT_FRAMEHardware                       0.999997    
## CONTEXT_FRAMEHotel                          0.999942    
## CONTEXT_FRAMEInstitution                    0.999943    
## CONTEXT_FRAMEJewelry_Store                  0.999948    
## CONTEXT_FRAMELandromat                      0.999999    
## CONTEXT_FRAMEMenu                           0.999992    
## CONTEXT_FRAMEMovie_Theater                  0.999947    
## CONTEXT_FRAMENightclub                      0.999999    
## CONTEXT_FRAMENotary_Financial_Services      0.999936    
## CONTEXT_FRAMEResidential                    0.999867    
## CONTEXT_FRAMERestaurant                     0.999940    
## CONTEXT_FRAMEShop                           0.999943    
## CONTEXT_FRAMESpecialty_Foods                0.999942    
## CONTEXT_FRAMESupermarket                    0.999943    
## CONTEXT_FRAMETravel_Agency                  0.999889    
## CLOSEDFALSE                                 0.999945    
## CLOSEDTRUE                                  0.999946    
## Topic2                                      0.740359    
## Topic3                                      0.599904    
## Topic4                                      0.052811 .  
## Topic5                                      0.569532    
## Topic6                                      0.649956    
## Topic7                                      0.072237 .  
## Topic8                                      0.085337 .  
## Topic9                                      0.168237    
## Topic10                                     0.046698 *  
## Topic11                                     0.719444    
## Topic12                                     0.875468    
## Topic13                                     0.999848    
## Topic14                                     0.999832    
## Topic15                                     0.240117    
## Topic16                                     0.029330 *  
## Topic17                                     0.270734    
## Topic18                                     0.860228    
## Topic19                                     0.524893    
## Topic20                                     0.999879    
## Topic21                                     0.151954    
## Topic22                                     0.155201    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value  
## s(latitude,longitude) 20.11  26.84  41.76  0.0355 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.335   Deviance explained = 38.2%
## UBRE = 0.040535  Scale est. = 1         n = 700

# To get odds ratios (commented out for clarity)
# exp(coef(gamE))

# Spanish
summary(gamS)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "Spanish") ~ s(latitude, longitude, k = 60) + YELP + 
##     COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED + 
##     Topic
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -6.405e+02  9.491e+07   0.000
## YELP1                                       -3.300e-01  5.326e-01  -0.620
## YELP2                                       -2.298e+00  8.522e-01  -2.696
## YELP3                                        5.881e-02  2.561e+00   0.023
## YELP4                                       -4.461e+01  4.745e+07   0.000
## COMMUNICATIVE_ROLEEstablishment_Description  2.528e+00  1.290e+00   1.959
## COMMUNICATIVE_ROLEEstablishment_Name         2.189e+00  1.264e+00   1.732
## COMMUNICATIVE_ROLEGraffiti                  -1.206e+00  1.800e+00  -0.670
## COMMUNICATIVE_ROLEInformation                1.447e+00  1.239e+00   1.168
## COMMUNICATIVE_ROLEInstructions              -2.627e+02  3.032e+07   0.000
## COMMUNICATIVE_ROLELeaflet                    1.924e+00  2.592e+00   0.742
## COMMUNICATIVE_ROLESlogan                     2.383e+00  2.011e+00   1.185
## COMMUNICATIVE_ROLEStreet_Signs              -1.667e+02  1.628e+07   0.000
## MATERIALITYHand_Written                      3.010e+00  3.356e+00   0.897
## MATERIALITYHome_Printed                      2.265e+00  3.597e+00   0.630
## MATERIALITYPermanent                        -1.213e+00  3.832e+00  -0.317
## MATERIALITYProfessionally_Printed            6.240e-02  3.447e+00   0.018
## CONTEXT_FRAMEBakery                          3.139e+02  6.711e+07   0.000
## CONTEXT_FRAMEBar                             2.685e+02  6.946e+07   0.000
## CONTEXT_FRAMEBeauty_Hair_Salon               3.114e+02  6.711e+07   0.000
## CONTEXT_FRAMEBusiness                        3.129e+02  6.711e+07   0.000
## CONTEXT_FRAMECafe                            2.752e+02  6.838e+07   0.000
## CONTEXT_FRAMEClothing                        3.139e+02  6.711e+07   0.000
## CONTEXT_FRAMECommentary                      3.167e+02  6.711e+07   0.000
## CONTEXT_FRAMEExternal                        3.146e+02  6.711e+07   0.000
## CONTEXT_FRAMEFlier                           3.613e+02  8.219e+07   0.000
## CONTEXT_FRAMEGallery_Museum                  3.003e+02  6.711e+07   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store            3.121e+02  6.711e+07   0.000
## CONTEXT_FRAMEGym_Fitness_Studio              2.746e+02  7.503e+07   0.000
## CONTEXT_FRAMEHardware                        2.675e+02  8.219e+07   0.000
## CONTEXT_FRAMEHotel                           2.680e+02  7.249e+07   0.000
## CONTEXT_FRAMEInstitution                     3.120e+02  6.711e+07   0.000
## CONTEXT_FRAMEJewelry_Store                   3.081e+02  6.711e+07   0.000
## CONTEXT_FRAMELandromat                       7.204e+01  8.268e+07   0.000
## CONTEXT_FRAMEMenu                            6.990e+03  9.491e+07   0.000
## CONTEXT_FRAMEMovie_Theater                   3.129e+02  6.711e+07   0.000
## CONTEXT_FRAMENightclub                       2.521e+02  7.749e+07   0.000
## CONTEXT_FRAMENotary_Financial_Services       3.141e+02  6.711e+07   0.000
## CONTEXT_FRAMEResidential                     4.009e+02  8.878e+07   0.000
## CONTEXT_FRAMERestaurant                      3.149e+02  6.711e+07   0.000
## CONTEXT_FRAMEShop                            3.119e+02  6.711e+07   0.000
## CONTEXT_FRAMESpecialty_Foods                 3.495e+02  7.045e+07   0.000
## CONTEXT_FRAMESupermarket                     3.127e+02  6.711e+07   0.000
## CONTEXT_FRAMETravel_Agency                   3.212e+02  6.711e+07   0.000
## CLOSEDFALSE                                  5.548e+01  6.711e+07   0.000
## CLOSEDTRUE                                   5.434e+01  6.711e+07   0.000
## Topic2                                      -1.388e+00  1.392e+00  -0.997
## Topic3                                      -1.043e+00  1.299e+00  -0.803
## Topic4                                       3.978e-01  1.390e+00   0.286
## Topic5                                      -3.210e+00  1.909e+00  -1.682
## Topic6                                      -1.250e+00  1.590e+00  -0.786
## Topic7                                      -7.038e-01  3.401e+00  -0.207
## Topic8                                      -6.842e+00  2.277e+00  -3.005
## Topic9                                      -3.714e+00  1.972e+00  -1.883
## Topic10                                     -2.015e+00  1.988e+00  -1.014
## Topic11                                      1.331e+00  2.402e+00   0.554
## Topic12                                      3.456e-01  1.050e+00   0.329
## Topic13                                     -4.195e+01  3.047e+07   0.000
## Topic14                                     -4.890e+01  3.001e+07   0.000
## Topic15                                     -4.179e+00  2.514e+00  -1.662
## Topic16                                     -7.043e+00  2.499e+00  -2.818
## Topic17                                     -7.767e+00  2.821e+00  -2.753
## Topic18                                     -3.160e+00  2.059e+00  -1.534
## Topic19                                     -1.841e+00  2.599e+00  -0.708
## Topic20                                     -5.276e+01  3.355e+07   0.000
## Topic21                                     -2.558e-01  2.339e+00  -0.109
## Topic22                                     -4.039e+01  1.799e+07   0.000
##                                             Pr(>|z|)   
## (Intercept)                                  0.99999   
## YELP1                                        0.53552   
## YELP2                                        0.00702 **
## YELP3                                        0.98168   
## YELP4                                        1.00000   
## COMMUNICATIVE_ROLEEstablishment_Description  0.05014 . 
## COMMUNICATIVE_ROLEEstablishment_Name         0.08326 . 
## COMMUNICATIVE_ROLEGraffiti                   0.50272   
## COMMUNICATIVE_ROLEInformation                0.24273   
## COMMUNICATIVE_ROLEInstructions               0.99999   
## COMMUNICATIVE_ROLELeaflet                    0.45787   
## COMMUNICATIVE_ROLESlogan                     0.23612   
## COMMUNICATIVE_ROLEStreet_Signs               0.99999   
## MATERIALITYHand_Written                      0.36988   
## MATERIALITYHome_Printed                      0.52884   
## MATERIALITYPermanent                         0.75151   
## MATERIALITYProfessionally_Printed            0.98556   
## CONTEXT_FRAMEBakery                          1.00000   
## CONTEXT_FRAMEBar                             1.00000   
## CONTEXT_FRAMEBeauty_Hair_Salon               1.00000   
## CONTEXT_FRAMEBusiness                        1.00000   
## CONTEXT_FRAMECafe                            1.00000   
## CONTEXT_FRAMEClothing                        1.00000   
## CONTEXT_FRAMECommentary                      1.00000   
## CONTEXT_FRAMEExternal                        1.00000   
## CONTEXT_FRAMEFlier                           1.00000   
## CONTEXT_FRAMEGallery_Museum                  1.00000   
## CONTEXT_FRAMEGrocery_Liquor_Store            1.00000   
## CONTEXT_FRAMEGym_Fitness_Studio              1.00000   
## CONTEXT_FRAMEHardware                        1.00000   
## CONTEXT_FRAMEHotel                           1.00000   
## CONTEXT_FRAMEInstitution                     1.00000   
## CONTEXT_FRAMEJewelry_Store                   1.00000   
## CONTEXT_FRAMELandromat                       1.00000   
## CONTEXT_FRAMEMenu                            0.99994   
## CONTEXT_FRAMEMovie_Theater                   1.00000   
## CONTEXT_FRAMENightclub                       1.00000   
## CONTEXT_FRAMENotary_Financial_Services       1.00000   
## CONTEXT_FRAMEResidential                     1.00000   
## CONTEXT_FRAMERestaurant                      1.00000   
## CONTEXT_FRAMEShop                            1.00000   
## CONTEXT_FRAMESpecialty_Foods                 1.00000   
## CONTEXT_FRAMESupermarket                     1.00000   
## CONTEXT_FRAMETravel_Agency                   1.00000   
## CLOSEDFALSE                                  1.00000   
## CLOSEDTRUE                                   1.00000   
## Topic2                                       0.31881   
## Topic3                                       0.42176   
## Topic4                                       0.77471   
## Topic5                                       0.09262 . 
## Topic6                                       0.43168   
## Topic7                                       0.83605   
## Topic8                                       0.00266 **
## Topic9                                       0.05965 . 
## Topic10                                      0.31071   
## Topic11                                      0.57937   
## Topic12                                      0.74211   
## Topic13                                      1.00000   
## Topic14                                      1.00000   
## Topic15                                      0.09648 . 
## Topic16                                      0.00483 **
## Topic17                                      0.00591 **
## Topic18                                      0.12492   
## Topic19                                      0.47879   
## Topic20                                      1.00000   
## Topic21                                      0.91292   
## Topic22                                      1.00000   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value
## s(latitude,longitude) 57.96  58.61  64.08   0.294
## 
## R-sq.(adj) =  0.438   Deviance explained = 53.4%
## UBRE = -0.173  Scale est. = 1         n = 700
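
The z value and Pr(>|z|) columns in these summaries are Wald statistics: the estimate divided by its standard error, with a two-sided p-value from the standard normal distribution. As a quick sanity check, here is a minimal sketch that recomputes the Topic8 row from the rounded numbers printed above. The variable names are mine and are not part of the original analysis.

```r
# Rounded values copied from the Topic8 row of the summary above
estimate  <- -6.842
std_error <-  2.277

# Wald z statistic: estimate divided by its standard error
z <- estimate / std_error

# Two-sided p-value under a standard normal reference distribution
p <- 2 * pnorm(-abs(z))

round(z, 3)   # approximately -3.005, in line with the z value column
signif(p, 3)  # approximately 0.00266, in line with the Pr(>|z|) column
```

The same arithmetic explains the rows whose standard errors are in the millions: the z value collapses to roughly 0 and the p-value to roughly 1, so those rows say essentially nothing about the size of the effect.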

# Odds ratios
# exp(coef(gamES))

# Mostly English with some Spanish
summary(gamES)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "Eng_Span") ~ s(latitude, longitude, k = 60) + 
##     YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + 
##     CLOSED + Topic
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -9.157e+01  6.723e+07   0.000
## YELP1                                        1.712e+00  1.024e+00   1.672
## YELP2                                       -3.042e-01  1.165e+00  -0.261
## YELP3                                       -3.202e+01  1.331e+07   0.000
## YELP4                                        9.460e-01  5.054e+07   0.000
## COMMUNICATIVE_ROLEEstablishment_Description -2.680e-02  1.475e+00  -0.018
## COMMUNICATIVE_ROLEEstablishment_Name         2.765e-02  1.424e+00   0.019
## COMMUNICATIVE_ROLEGraffiti                   3.348e+01  8.920e+06   0.000
## COMMUNICATIVE_ROLEInformation               -9.775e-01  1.595e+00  -0.613
## COMMUNICATIVE_ROLEInstructions              -2.237e+00  3.135e+07   0.000
## COMMUNICATIVE_ROLELeaflet                    1.298e+01  1.978e+07   0.000
## COMMUNICATIVE_ROLESlogan                     2.938e+00  2.178e+00   1.349
## COMMUNICATIVE_ROLEStreet_Signs              -4.105e+00  1.867e+07   0.000
## MATERIALITYHand_Written                      2.620e+01  2.552e+06   0.000
## MATERIALITYHome_Printed                      2.584e+01  2.552e+06   0.000
## MATERIALITYPermanent                         2.910e+01  2.552e+06   0.000
## MATERIALITYProfessionally_Printed            2.669e+01  2.552e+06   0.000
## CONTEXT_FRAMEBakery                          2.951e+01  6.994e+07   0.000
## CONTEXT_FRAMEBar                             3.607e+01  6.949e+07   0.000
## CONTEXT_FRAMEBeauty_Hair_Salon               6.884e+01  6.711e+07   0.000
## CONTEXT_FRAMEBusiness                        6.671e+01  6.711e+07   0.000
## CONTEXT_FRAMECafe                            3.592e+01  6.835e+07   0.000
## CONTEXT_FRAMEClothing                        3.707e+01  6.932e+07   0.000
## CONTEXT_FRAMECommentary                      5.441e+00  7.104e+07   0.000
## CONTEXT_FRAMEExternal                        3.740e+01  6.770e+07   0.000
## CONTEXT_FRAMEFlier                           2.216e+01  8.454e+07   0.000
## CONTEXT_FRAMEGallery_Museum                  7.062e+01  6.711e+07   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store            6.907e+01  6.711e+07   0.000
## CONTEXT_FRAMEGym_Fitness_Studio              3.604e+01  7.503e+07   0.000
## CONTEXT_FRAMEHardware                        3.687e+01  8.219e+07   0.000
## CONTEXT_FRAMEHotel                           7.043e+01  6.711e+07   0.000
## CONTEXT_FRAMEInstitution                     3.782e+01  7.127e+07   0.000
## CONTEXT_FRAMEJewelry_Store                   3.541e+01  7.045e+07   0.000
## CONTEXT_FRAMELandromat                       4.067e+01  8.232e+07   0.000
## CONTEXT_FRAMEMenu                            4.285e+01  9.491e+07   0.000
## CONTEXT_FRAMEMovie_Theater                   3.530e+01  7.187e+07   0.000
## CONTEXT_FRAMENightclub                       3.814e+01  7.786e+07   0.000
## CONTEXT_FRAMENotary_Financial_Services       3.813e+01  7.351e+07   0.000
## CONTEXT_FRAMEResidential                     6.612e+01  8.893e+07   0.000
## CONTEXT_FRAMERestaurant                      6.800e+01  6.711e+07   0.000
## CONTEXT_FRAMEShop                            6.791e+01  6.711e+07   0.000
## CONTEXT_FRAMESpecialty_Foods                 3.632e+01  7.062e+07   0.000
## CONTEXT_FRAMESupermarket                     6.970e+01  6.711e+07   0.000
## CONTEXT_FRAMETravel_Agency                   3.740e+01  7.351e+07   0.000
## CLOSEDFALSE                                 -7.310e+00  3.126e+06   0.000
## CLOSEDTRUE                                  -7.519e+00  3.126e+06   0.000
## Topic2                                       2.534e-01  1.140e+00   0.222
## Topic3                                      -5.211e-01  1.581e+00  -0.330
## Topic4                                      -2.192e+00  1.360e+00  -1.611
## Topic5                                      -3.224e+01  1.140e+07   0.000
## Topic6                                       9.174e-01  1.343e+00   0.683
## Topic7                                       1.400e+00  1.672e+00   0.838
## Topic8                                      -3.084e+01  3.122e+06   0.000
## Topic9                                       8.210e-01  1.016e+00   0.808
## Topic10                                     -1.965e+00  1.470e+00  -1.336
## Topic11                                     -3.118e+01  2.340e+07   0.000
## Topic12                                     -3.255e+01  1.469e+07   0.000
## Topic13                                     -3.089e+01  3.046e+07   0.000
## Topic14                                     -2.990e+01  3.006e+07   0.000
## Topic15                                      6.170e+00  1.775e+00   3.476
## Topic16                                     -5.035e-01  1.513e+00  -0.333
## Topic17                                     -3.304e+01  1.684e+07   0.000
## Topic18                                      1.425e-01  2.359e+00   0.060
## Topic19                                     -3.344e-01  1.301e+00  -0.257
## Topic20                                     -3.063e+01  3.395e+07   0.000
## Topic21                                     -3.061e+01  1.837e+07   0.000
## Topic22                                     -5.892e+01  9.203e+06   0.000
##                                             Pr(>|z|)    
## (Intercept)                                 0.999999    
## YELP1                                       0.094554 .  
## YELP2                                       0.793932    
## YELP3                                       0.999998    
## YELP4                                       1.000000    
## COMMUNICATIVE_ROLEEstablishment_Description 0.985504    
## COMMUNICATIVE_ROLEEstablishment_Name        0.984507    
## COMMUNICATIVE_ROLEGraffiti                  0.999997    
## COMMUNICATIVE_ROLEInformation               0.539858    
## COMMUNICATIVE_ROLEInstructions              1.000000    
## COMMUNICATIVE_ROLELeaflet                   0.999999    
## COMMUNICATIVE_ROLESlogan                    0.177245    
## COMMUNICATIVE_ROLEStreet_Signs              1.000000    
## MATERIALITYHand_Written                     0.999992    
## MATERIALITYHome_Printed                     0.999992    
## MATERIALITYPermanent                        0.999991    
## MATERIALITYProfessionally_Printed           0.999992    
## CONTEXT_FRAMEBakery                         1.000000    
## CONTEXT_FRAMEBar                            1.000000    
## CONTEXT_FRAMEBeauty_Hair_Salon              0.999999    
## CONTEXT_FRAMEBusiness                       0.999999    
## CONTEXT_FRAMECafe                           1.000000    
## CONTEXT_FRAMEClothing                       1.000000    
## CONTEXT_FRAMECommentary                     1.000000    
## CONTEXT_FRAMEExternal                       1.000000    
## CONTEXT_FRAMEFlier                          1.000000    
## CONTEXT_FRAMEGallery_Museum                 0.999999    
## CONTEXT_FRAMEGrocery_Liquor_Store           0.999999    
## CONTEXT_FRAMEGym_Fitness_Studio             1.000000    
## CONTEXT_FRAMEHardware                       1.000000    
## CONTEXT_FRAMEHotel                          0.999999    
## CONTEXT_FRAMEInstitution                    1.000000    
## CONTEXT_FRAMEJewelry_Store                  1.000000    
## CONTEXT_FRAMELandromat                      1.000000    
## CONTEXT_FRAMEMenu                           1.000000    
## CONTEXT_FRAMEMovie_Theater                  1.000000    
## CONTEXT_FRAMENightclub                      1.000000    
## CONTEXT_FRAMENotary_Financial_Services      1.000000    
## CONTEXT_FRAMEResidential                    0.999999    
## CONTEXT_FRAMERestaurant                     0.999999    
## CONTEXT_FRAMEShop                           0.999999    
## CONTEXT_FRAMESpecialty_Foods                1.000000    
## CONTEXT_FRAMESupermarket                    0.999999    
## CONTEXT_FRAMETravel_Agency                  1.000000    
## CLOSEDFALSE                                 0.999998    
## CLOSEDTRUE                                  0.999998    
## Topic2                                      0.824161    
## Topic3                                      0.741752    
## Topic4                                      0.107155    
## Topic5                                      0.999998    
## Topic6                                      0.494642    
## Topic7                                      0.402244    
## Topic8                                      0.999992    
## Topic9                                      0.418832    
## Topic10                                     0.181480    
## Topic11                                     0.999999    
## Topic12                                     0.999998    
## Topic13                                     0.999999    
## Topic14                                     0.999999    
## Topic15                                     0.000509 ***
## Topic16                                     0.739367    
## Topic17                                     0.999998    
## Topic18                                     0.951830    
## Topic19                                     0.797135    
## Topic20                                     0.999999    
## Topic21                                     0.999999    
## Topic22                                     0.999995    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value
## s(latitude,longitude) 17.64  23.38  23.22   0.467
## 
## R-sq.(adj) =  0.256   Deviance explained = 47.1%
## UBRE = -0.56172  Scale est. = 1         n = 700

# Odds ratios
# exp(coef(gamES))
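
The commented-out line above gets odds ratios by exponentiating the coefficients. Below is a minimal sketch of how that could be extended with approximate 95% Wald intervals, assuming the models were fit with `mgcv::gam` (the printed output suggests this, but the fitting code is not shown in this section). The simulated data frame `toy`, the model `toy_gam`, and the table `odds` are hypothetical stand-ins, not the thesis data or models.

```r
library(mgcv)

# Simulated stand-in data: coordinates plus one two-level factor
set.seed(1)
n <- 300
toy <- data.frame(
  latitude  = runif(n, 37.75, 37.77),
  longitude = runif(n, -122.43, -122.41),
  CLOSED    = factor(sample(c("FALSE", "TRUE"), n, replace = TRUE))
)
toy$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * (toy$CLOSED == "TRUE")))

# Binomial GAM with a spatial smooth, loosely mirroring the formulas above
toy_gam <- gam(y ~ s(latitude, longitude, k = 20) + CLOSED,
               family = binomial, data = toy)

# p.table holds only the parametric terms; coef() would also return
# the basis coefficients of the smooth, which are not odds ratios
p_tab <- summary(toy_gam)$p.table
est   <- p_tab[, "Estimate"]
se    <- p_tab[, "Std. Error"]

# Odds ratios with approximate 95% Wald intervals on the odds scale
odds <- data.frame(
  OR    = exp(est),
  lower = exp(est - 1.96 * se),
  upper = exp(est + 1.96 * se)
)
odds
```

For a coefficient with an enormous standard error, this interval runs from essentially 0 to `Inf`, so the odds ratio for that row should not be interpreted.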

# Mostly Spanish with some English
summary(gamSE)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "Span_Eng") ~ s(latitude, longitude, k = 60) + 
##     YELP + COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + 
##     CLOSED + Topic
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                 -8.388e+06  6.712e+07  -0.125
## YELP1                                       -5.546e-01  5.436e-01  -1.020
## YELP2                                       -1.772e+00  9.086e-01  -1.950
## YELP3                                       -2.471e+01  1.482e+05   0.000
## YELP4                                       -2.476e+01  7.382e+05   0.000
## COMMUNICATIVE_ROLEEstablishment_Description  6.912e-01  1.237e+00   0.559
## COMMUNICATIVE_ROLEEstablishment_Name         2.719e-02  1.204e+00   0.023
## COMMUNICATIVE_ROLEGraffiti                  -2.871e-02  3.738e+05   0.000
## COMMUNICATIVE_ROLEInformation                9.792e-01  1.248e+00   0.785
## COMMUNICATIVE_ROLEInstructions              -1.003e-01  3.930e+05   0.000
## COMMUNICATIVE_ROLELeaflet                   -2.249e+01  3.507e+05   0.000
## COMMUNICATIVE_ROLESlogan                     1.047e+00  4.646e+05   0.000
## COMMUNICATIVE_ROLEStreet_Signs               2.317e+01  2.708e+05   0.000
## MATERIALITYHand_Written                     -2.686e+00  3.420e+05   0.000
## MATERIALITYHome_Printed                     -1.065e+00  3.420e+05   0.000
## MATERIALITYPermanent                        -2.458e+01  3.602e+05   0.000
## MATERIALITYProfessionally_Printed           -8.574e-01  3.420e+05   0.000
## CONTEXT_FRAMEBakery                          8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEBar                             8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEBeauty_Hair_Salon               8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEBusiness                        8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMECafe                            8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEClothing                        8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMECommentary                      8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEExternal                        8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEFlier                           8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEGallery_Museum                  8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEGrocery_Liquor_Store            8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEGym_Fitness_Studio              8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEHardware                        8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEHotel                           8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEInstitution                     8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEJewelry_Store                   8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMELandromat                       8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEMenu                            8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEMovie_Theater                   8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMENightclub                       8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMENotary_Financial_Services       8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEResidential                     8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMERestaurant                      8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMEShop                            8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMESpecialty_Foods                 8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMESupermarket                     8.388e+06  6.711e+07   0.125
## CONTEXT_FRAMETravel_Agency                   8.388e+06  6.711e+07   0.125
## CLOSEDFALSE                                  1.256e+00  1.179e+06   0.000
## CLOSEDTRUE                                   1.712e+00  1.179e+06   0.000
## Topic2                                       2.975e-01  7.916e-01   0.376
## Topic3                                      -9.652e-01  1.170e+00  -0.825
## Topic4                                      -1.114e+00  8.622e-01  -1.292
## Topic5                                       1.321e+00  6.948e-01   1.902
## Topic6                                      -5.116e+01  1.801e+05   0.000
## Topic7                                      -2.609e+01  2.034e+05   0.000
## Topic8                                      -2.550e+01  2.359e+05   0.000
## Topic9                                      -2.607e+01  1.266e+05   0.000
## Topic10                                     -1.364e+00  1.061e+00  -1.285
## Topic11                                     -2.577e+01  2.807e+05   0.000
## Topic12                                      1.118e-01  9.142e-01   0.122
## Topic13                                     -2.633e+01  4.527e+05   0.000
## Topic14                                     -2.553e+01  4.402e+05   0.000
## Topic15                                     -2.618e+01  4.851e+05   0.000
## Topic16                                     -2.533e+01  1.231e+05   0.000
## Topic17                                     -2.112e-01  1.219e+00  -0.173
## Topic18                                     -2.023e+00  1.422e+00  -1.423
## Topic19                                     -6.111e-01  9.939e-01  -0.615
## Topic20                                     -2.548e+01  4.378e+05   0.000
## Topic21                                      1.126e+00  1.005e+00   1.120
## Topic22                                     -2.517e+01  2.460e+05   0.000
##                                             Pr(>|z|)  
## (Intercept)                                   0.9005  
## YELP1                                         0.3076  
## YELP2                                         0.0512 .
## YELP3                                         0.9999  
## YELP4                                         1.0000  
## COMMUNICATIVE_ROLEEstablishment_Description   0.5763  
## COMMUNICATIVE_ROLEEstablishment_Name          0.9820  
## COMMUNICATIVE_ROLEGraffiti                    1.0000  
## COMMUNICATIVE_ROLEInformation                 0.4326  
## COMMUNICATIVE_ROLEInstructions                1.0000  
## COMMUNICATIVE_ROLELeaflet                     0.9999  
## COMMUNICATIVE_ROLESlogan                      1.0000  
## COMMUNICATIVE_ROLEStreet_Signs                0.9999  
## MATERIALITYHand_Written                       1.0000  
## MATERIALITYHome_Printed                       1.0000  
## MATERIALITYPermanent                          0.9999  
## MATERIALITYProfessionally_Printed             1.0000  
## CONTEXT_FRAMEBakery                           0.9005  
## CONTEXT_FRAMEBar                              0.9005  
## CONTEXT_FRAMEBeauty_Hair_Salon                0.9005  
## CONTEXT_FRAMEBusiness                         0.9005  
## CONTEXT_FRAMECafe                             0.9005  
## CONTEXT_FRAMEClothing                         0.9005  
## CONTEXT_FRAMECommentary                       0.9005  
## CONTEXT_FRAMEExternal                         0.9005  
## CONTEXT_FRAMEFlier                            0.9005  
## CONTEXT_FRAMEGallery_Museum                   0.9005  
## CONTEXT_FRAMEGrocery_Liquor_Store             0.9005  
## CONTEXT_FRAMEGym_Fitness_Studio               0.9005  
## CONTEXT_FRAMEHardware                         0.9005  
## CONTEXT_FRAMEHotel                            0.9005  
## CONTEXT_FRAMEInstitution                      0.9005  
## CONTEXT_FRAMEJewelry_Store                    0.9005  
## CONTEXT_FRAMELandromat                        0.9005  
## CONTEXT_FRAMEMenu                             0.9005  
## CONTEXT_FRAMEMovie_Theater                    0.9005  
## CONTEXT_FRAMENightclub                        0.9005  
## CONTEXT_FRAMENotary_Financial_Services        0.9005  
## CONTEXT_FRAMEResidential                      0.9005  
## CONTEXT_FRAMERestaurant                       0.9005  
## CONTEXT_FRAMEShop                             0.9005  
## CONTEXT_FRAMESpecialty_Foods                  0.9005  
## CONTEXT_FRAMESupermarket                      0.9005  
## CONTEXT_FRAMETravel_Agency                    0.9005  
## CLOSEDFALSE                                   1.0000  
## CLOSEDTRUE                                    1.0000  
## Topic2                                        0.7070  
## Topic3                                        0.4094  
## Topic4                                        0.1962  
## Topic5                                        0.0572 .
## Topic6                                        0.9998  
## Topic7                                        0.9999  
## Topic8                                        0.9999  
## Topic9                                        0.9998  
## Topic10                                       0.1986  
## Topic11                                       0.9999  
## Topic12                                       0.9026  
## Topic13                                       1.0000  
## Topic14                                       1.0000  
## Topic15                                       1.0000  
## Topic16                                       0.9998  
## Topic17                                       0.8625  
## Topic18                                       0.1548  
## Topic19                                       0.5387  
## Topic20                                       1.0000  
## Topic21                                       0.2626  
## Topic22                                       0.9999  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value  
## s(latitude,longitude) 2.001  2.001  5.263  0.0721 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.074   Deviance explained = 31.1%
## UBRE = -0.50668  Scale est. = 1         n = 700

# Odds ratios
# exp(coef(gamSE))

# Equal
summary(gamEQ)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(LANGUAGE == "Equal") ~ s(latitude, longitude, k = 60) + YELP + 
##     COMMUNICATIVE_ROLE + MATERIALITY + CONTEXT_FRAME + CLOSED + 
##     Topic
## 
## Parametric coefficients:
##                                               Estimate Std. Error z value
## (Intercept)                                  2.852e+02  1.036e+08   0.000
## YELP1                                       -2.500e+01  1.868e+03  -0.013
## YELP2                                       -2.659e+01  3.278e+03  -0.008
## YELP3                                       -1.986e+03  1.424e+07   0.000
## YELP4                                        3.948e+02  5.071e+07   0.000
## COMMUNICATIVE_ROLEEstablishment_Description  2.430e+01  2.547e+05   0.000
## COMMUNICATIVE_ROLEEstablishment_Name         2.329e+01  2.547e+05   0.000
## COMMUNICATIVE_ROLEGraffiti                  -1.301e+02  2.252e+07   0.000
## COMMUNICATIVE_ROLEInformation                2.568e+01  2.547e+05   0.000
## COMMUNICATIVE_ROLEInstructions              -1.092e+03  3.071e+07   0.000
## COMMUNICATIVE_ROLELeaflet                   -2.776e+02  4.703e+07   0.000
## COMMUNICATIVE_ROLESlogan                    -1.116e+01  3.092e+07   0.000
## COMMUNICATIVE_ROLEStreet_Signs               4.654e+01  1.937e+07   0.000
## MATERIALITYHand_Written                     -8.454e+01  4.152e+07   0.000
## MATERIALITYHome_Printed                     -8.813e+01  4.152e+07   0.000
## MATERIALITYPermanent                        -3.476e+02  4.277e+07   0.000
## MATERIALITYProfessionally_Printed           -2.691e+02  4.152e+07   0.000
## CONTEXT_FRAMEBakery                         -6.589e+02  6.794e+07   0.000
## CONTEXT_FRAMEBar                            -4.177e+02  6.957e+07   0.000
## CONTEXT_FRAMEBeauty_Hair_Salon              -4.185e+02  6.851e+07   0.000
## CONTEXT_FRAMEBusiness                       -2.529e+02  6.711e+07   0.000
## CONTEXT_FRAMECafe                           -4.759e+02  6.842e+07   0.000
## CONTEXT_FRAMEClothing                       -4.444e+02  6.929e+07   0.000
## CONTEXT_FRAMECommentary                     -4.744e+02  7.478e+07   0.000
## CONTEXT_FRAMEExternal                       -2.628e+02  6.711e+07   0.000
## CONTEXT_FRAMEFlier                           3.668e+02  9.538e+07   0.000
## CONTEXT_FRAMEGallery_Museum                  7.260e+02  6.711e+07   0.000
## CONTEXT_FRAMEGrocery_Liquor_Store           -5.364e+02  6.794e+07   0.000
## CONTEXT_FRAMEGym_Fitness_Studio             -3.441e+02  7.543e+07   0.000
## CONTEXT_FRAMEHardware                       -2.966e+02  8.248e+07   0.000
## CONTEXT_FRAMEHotel                          -2.207e+02  6.711e+07   0.000
## CONTEXT_FRAMEInstitution                    -4.852e+02  7.126e+07   0.000
## CONTEXT_FRAMEJewelry_Store                  -5.329e+02  7.048e+07   0.000
## CONTEXT_FRAMELandromat                      -5.956e+02  8.219e+07   0.000
## CONTEXT_FRAMEMenu                            1.656e+03  9.553e+07   0.000
## CONTEXT_FRAMEMovie_Theater                  -3.023e+02  7.194e+07   0.000
## CONTEXT_FRAMENightclub                       8.276e+02  7.817e+07   0.000
## CONTEXT_FRAMENotary_Financial_Services      -2.366e+02  7.449e+07   0.000
## CONTEXT_FRAMEResidential                    -2.834e+02  8.922e+07   0.000
## CONTEXT_FRAMERestaurant                     -5.356e+02  6.740e+07   0.000
## CONTEXT_FRAMEShop                           -2.492e+02  6.711e+07   0.000
## CONTEXT_FRAMESpecialty_Foods                -2.741e+02  7.066e+07   0.000
## CONTEXT_FRAMESupermarket                    -4.127e+02  6.711e+07   0.000
## CONTEXT_FRAMETravel_Agency                   3.185e+02  7.351e+07   0.000
## CLOSEDFALSE                                 -2.562e+02  6.711e+07   0.000
## CLOSEDTRUE                                  -1.812e+02  6.895e+07   0.000
## Topic2                                       1.122e+02  5.694e+03   0.020
## Topic3                                      -7.341e+01  1.513e+04  -0.005
## Topic4                                       2.010e+01  1.017e+04   0.002
## Topic5                                      -9.792e+01  1.166e+07   0.000
## Topic6                                      -1.909e+02  1.354e+07   0.000
## Topic7                                      -2.990e+02  1.544e+07   0.000
## Topic8                                      -4.571e+01  1.567e+04  -0.003
## Topic9                                      -3.349e+02  9.304e+06   0.000
## Topic10                                     -2.491e+02  1.086e+07   0.000
## Topic11                                      3.518e+02  3.937e+04   0.009
## Topic12                                     -1.031e+02  1.503e+07   0.000
## Topic13                                      1.268e+02  3.076e+07   0.000
## Topic14                                     -4.679e+02  3.017e+07   0.000
## Topic15                                      1.865e+02  3.031e+07   0.000
## Topic16                                     -1.588e+02  1.731e+07   0.000
## Topic17                                      2.371e+02  1.731e+07   0.000
## Topic18                                      2.497e+02  1.060e+07   0.000
## Topic19                                     -1.098e+00  1.019e+04   0.000
## Topic20                                      2.688e+02  3.471e+07   0.000
## Topic21                                      1.669e+02  3.945e+04   0.004
## Topic22                                      3.523e+02  9.783e+03   0.036
##                                             Pr(>|z|)
## (Intercept)                                    1.000
## YELP1                                          0.989
## YELP2                                          0.994
## YELP3                                          1.000
## YELP4                                          1.000
## COMMUNICATIVE_ROLEEstablishment_Description    1.000
## COMMUNICATIVE_ROLEEstablishment_Name           1.000
## COMMUNICATIVE_ROLEGraffiti                     1.000
## COMMUNICATIVE_ROLEInformation                  1.000
## COMMUNICATIVE_ROLEInstructions                 1.000
## COMMUNICATIVE_ROLELeaflet                      1.000
## COMMUNICATIVE_ROLESlogan                       1.000
## COMMUNICATIVE_ROLEStreet_Signs                 1.000
## MATERIALITYHand_Written                        1.000
## MATERIALITYHome_Printed                        1.000
## MATERIALITYPermanent                           1.000
## MATERIALITYProfessionally_Printed              1.000
## CONTEXT_FRAMEBakery                            1.000
## CONTEXT_FRAMEBar                               1.000
## CONTEXT_FRAMEBeauty_Hair_Salon                 1.000
## CONTEXT_FRAMEBusiness                          1.000
## CONTEXT_FRAMECafe                              1.000
## CONTEXT_FRAMEClothing                          1.000
## CONTEXT_FRAMECommentary                        1.000
## CONTEXT_FRAMEExternal                          1.000
## CONTEXT_FRAMEFlier                             1.000
## CONTEXT_FRAMEGallery_Museum                    1.000
## CONTEXT_FRAMEGrocery_Liquor_Store              1.000
## CONTEXT_FRAMEGym_Fitness_Studio                1.000
## CONTEXT_FRAMEHardware                          1.000
## CONTEXT_FRAMEHotel                             1.000
## CONTEXT_FRAMEInstitution                       1.000
## CONTEXT_FRAMEJewelry_Store                     1.000
## CONTEXT_FRAMELandromat                         1.000
## CONTEXT_FRAMEMenu                              1.000
## CONTEXT_FRAMEMovie_Theater                     1.000
## CONTEXT_FRAMENightclub                         1.000
## CONTEXT_FRAMENotary_Financial_Services         1.000
## CONTEXT_FRAMEResidential                       1.000
## CONTEXT_FRAMERestaurant                        1.000
## CONTEXT_FRAMEShop                              1.000
## CONTEXT_FRAMESpecialty_Foods                   1.000
## CONTEXT_FRAMESupermarket                       1.000
## CONTEXT_FRAMETravel_Agency                     1.000
## CLOSEDFALSE                                    1.000
## CLOSEDTRUE                                     1.000
## Topic2                                         0.984
## Topic3                                         0.996
## Topic4                                         0.998
## Topic5                                         1.000
## Topic6                                         1.000
## Topic7                                         1.000
## Topic8                                         0.998
## Topic9                                         1.000
## Topic10                                        1.000
## Topic11                                        0.993
## Topic12                                        1.000
## Topic13                                        1.000
## Topic14                                        1.000
## Topic15                                        1.000
## Topic16                                        1.000
## Topic17                                        1.000
## Topic18                                        1.000
## Topic19                                        1.000
## Topic20                                        1.000
## Topic21                                        0.997
## Topic22                                        0.971
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value
## s(latitude,longitude) 12.76  13.06  0.002       1
## 
## R-sq.(adj) =  0.698   Deviance explained = 83.3%
## UBRE = -0.72869  Scale est. = 1         n = 700

# Odds ratios
# exp(coef(gamEQ))
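
The commented-out line above exponentiates the model coefficients. As a minimal sketch of what that gives, the example below fits an ordinary logistic regression with glm() to simulated data and converts the log-odds coefficients to odds ratios. The data frame `toy` and its columns are made up for illustration and are not the dissertation data. The same exp(coef()) call works on a fitted gam object, but there it also returns the spline basis coefficients, and only the parametric terms read naturally as odds ratios.

```r
# Simulated data: 'toy', 'closed', and 'english' are hypothetical names
set.seed(1)
toy <- data.frame(closed = rep(c(FALSE, TRUE), each = 100))
# True log-odds used for the simulation: 0 when open, 1 when closed
toy$english <- rbinom(nrow(toy), size = 1, prob = plogis(0 + 1 * toy$closed))

fit <- glm(english ~ closed, family = binomial, data = toy)

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 2))
```

The second value is the odds of the outcome when closed is TRUE divided by the odds when it is FALSE; values above 1 mean higher odds.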

These results show which terms are flagged as significant, the deviance explained, and the coefficients. Because the coefficients are on the log-odds scale, it is also useful to plot predicted probabilities.

# Plot probabilities? (Adapted from http://myweb.uiowa.edu/pbreheny/publications/visreg.pdf)
library(visreg)
# We will just look at those flagged as 'significant'
# Probability of English by coordinate
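# scale="response" plots probabilities; the visreg2d default is the linear predictor (log-odds) scale
visreg2d(gamE, "longitude", "latitude", plot.type="image", scale="response")

# Spanish
visreg2d(gamS, "longitude", "latitude", plot.type="image", scale="response")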
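
visreg2d draws the fitted surface directly, but the same kind of picture can be built by hand: predict over a grid of coordinates with type = "response", which returns probabilities rather than log-odds. The sketch below assumes simulated coordinates and a made-up north-south pattern; the object names `sim`, `fit`, and `grid` are hypothetical, and none of this is the Mission District data.

```r
library(mgcv)

set.seed(42)
n <- 500
sim <- data.frame(latitude  = runif(n, 37.75, 37.77),
                  longitude = runif(n, -122.43, -122.41))
# Made-up pattern: the outcome becomes more likely toward the north
sim$english <- rbinom(n, 1, plogis(-1 + 150 * (sim$latitude - 37.75)))

fit <- gam(english ~ s(latitude, longitude, k = 20),
           family = binomial, data = sim)

# Regular grid over the observed range of coordinates
grid <- expand.grid(
  latitude  = seq(min(sim$latitude),  max(sim$latitude),  length.out = 30),
  longitude = seq(min(sim$longitude), max(sim$longitude), length.out = 30))

# type = "response" returns probabilities; the default is the log-odds scale
grid$prob <- as.numeric(predict(fit, newdata = grid, type = "response"))

# Heat map of predicted probability by coordinate.
# expand.grid varies latitude fastest, so transpose to get longitude on rows.
image(x = sort(unique(grid$longitude)),
      y = sort(unique(grid$latitude)),
      z = t(matrix(grid$prob, nrow = 30)),
      xlab = "longitude", ylab = "latitude")
```

Working from predict() also leaves the probabilities in a data frame, which is handy if they need to go on top of a base map later.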

GAMs for Instagram only

Let’s look at GAMs within our social media data set.

emogam <- gam(I(emogrepl=="TRUE")~s(latitude,longitude, k=60) + V1.x, family=binomial,data=tweets)
concurvity(emogam)

##               para s(latitude,longitude)
## worst    0.9168534            0.09941700
## observed 0.9168534            0.02028529
## estimate 0.9168534            0.01043618

# Concurvity seems ok: the spatial smooth is low on all three measures (worst case about 0.10)
gam.check(emogam)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 3 iterations.
## Gradient range [5.004738e-08,5.004738e-08]
## (score -0.008444863 & scale 1).
## Hessian positive definite, eigenvalue range [0.0001851222,0.0001851222].
## Model rank =  81 / 81 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                           k'    edf k-index p-value
## s(latitude,longitude) 59.000 36.914   0.915       0

# Same issue with gam.check (k-index below 1, p-value of 0), but edf (36.9) is well below k' (59), so I will keep k on the lower side to avoid overfitting
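
One way to see whether a low k-index actually matters is to refit with a larger basis and check whether the effective degrees of freedom or the AIC move much. The following is only a sketch on simulated data, with hypothetical object names (`sim`, `fit_small`, `fit_large`) and arbitrary values of k; it is not a rerun of the models above.

```r
library(mgcv)

set.seed(7)
n <- 1000
sim <- data.frame(latitude = runif(n), longitude = runif(n))
# Made-up spatial pattern on the log-odds scale
sim$emoji <- rbinom(n, 1,
                    plogis(-1.5 + sin(2 * pi * sim$latitude) * sim$longitude))

fit_small <- gam(emoji ~ s(latitude, longitude, k = 30),
                 family = binomial, data = sim)
fit_large <- gam(emoji ~ s(latitude, longitude, k = 60),
                 family = binomial, data = sim)

# Same table that gam.check() prints: k', edf, k-index, p-value
print(k.check(fit_small))
print(k.check(fit_large))

# Total effective degrees of freedom (intercept plus smooth) for each fit
edf_small <- sum(fit_small$edf)
edf_large <- sum(fit_large$edf)
print(c(small = edf_small, large = edf_large))

# If edf and AIC barely change, the smaller basis is probably adequate
print(AIC(fit_small, fit_large))
```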

# Results!
summary(emogam)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(emogrepl == "TRUE") ~ s(latitude, longitude, k = 60) + V1.x
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.51773    0.06987 -21.723  < 2e-16 ***
## V1.x2       -0.10551    0.10678  -0.988 0.323073    
## V1.x3        0.07772    0.10716   0.725 0.468282    
## V1.x4        0.01201    0.11055   0.109 0.913452    
## V1.x5        0.00548    0.11107   0.049 0.960649    
## V1.x6        0.18434    0.11212   1.644 0.100142    
## V1.x7        0.38492    0.11253   3.421 0.000625 ***
## V1.x8        0.11688    0.11947   0.978 0.327903    
## V1.x9       -0.01001    0.09944  -0.101 0.919822    
## V1.x10       0.24011    0.11187   2.146 0.031849 *  
## V1.x11       0.20051    0.12685   1.581 0.113942    
## V1.x12       0.16066    0.12227   1.314 0.188856    
## V1.x13       0.78768    0.11795   6.678 2.43e-11 ***
## V1.x14       0.07260    0.10874   0.668 0.504380    
## V1.x15       0.05380    0.12258   0.439 0.660701    
## V1.x16       0.27443    0.13654   2.010 0.044448 *  
## V1.x17       0.15745    0.10518   1.497 0.134416    
## V1.x18      -0.02877    0.13462  -0.214 0.830771    
## V1.x19       0.12386    0.12564   0.986 0.324213    
## V1.x20       0.09485    0.14550   0.652 0.514487    
## V1.x21       0.04794    0.15142   0.317 0.751550    
## V1.x22       0.09597    0.13291   0.722 0.470255    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq p-value    
## s(latitude,longitude) 36.91     46  193.8  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0158   Deviance explained = 1.87%
## UBRE = -0.0084449  Scale est. = 1         n = 16744
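
The estimates in this table are on the log-odds scale, so a little arithmetic makes them easier to read. The numbers below are copied from the summary above; the variable names are only for illustration. The intercept-based probability is for the reference level of V1.x with the spatial smooth at zero (mgcv centres its smooths), so treat it as a rough baseline rather than a prediction for any particular location.

```r
# Estimates copied from summary(emogam) above
intercept <- -1.51773   # reference level of V1.x
b_13      <-  0.78768   # V1.x13
se_13     <-  0.11795   # standard error for V1.x13

# Probability of an emoji at the reference level, with the smooth at zero
p_ref <- plogis(intercept)

# Odds ratio for V1.x13 relative to the reference level, with a Wald 95% interval
or_13    <- exp(b_13)
or_13_ci <- exp(b_13 + c(-1, 1) * qnorm(0.975) * se_13)

# Probability at level 13, again with the smooth at zero
p_13 <- plogis(intercept + b_13)

print(round(c(p_ref = p_ref, or_13 = or_13, p_13 = p_13), 3))
print(round(or_13_ci, 2))
```

Read this way, emoji use sits at roughly 18% for the reference level, and level 13 has a little over twice the odds.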

# Now the Sparkles emoji: flag tweets whose text contains " SPARKLES "
# (fixed=TRUE matches the literal string; no separate data frame or merge is needed)
tweets$id <- 1:nrow(tweets)
tweets$sparky <- grepl(" SPARKLES ", tweets$text, fixed=TRUE)
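
Flagging a single emoji comes down to a substring match, assuming (as the " SPARKLES " pattern suggests) that each emoji has already been replaced by an upper-case name padded with spaces. A toy illustration with invented texts:

```r
# Invented example texts; 'toy' is a hypothetical stand-in for the tweets data frame
toy <- data.frame(text = c("brunch in the mission SPARKLES  so good",
                           "Sparkles Cafe on Valencia",
                           "no emoji here",
                           "new mural SPARKLES "),
                  stringsAsFactors = FALSE)

# fixed = TRUE treats the pattern as a literal string; matching is case-sensitive,
# and the surrounding spaces keep it from matching inside longer words
toy$sparky <- grepl(" SPARKLES ", toy$text, fixed = TRUE)

print(toy)

# Share of texts flagged
print(mean(toy$sparky))
```

The second text is not flagged because the match is case-sensitive, which is what keeps ordinary uses of the word from being counted as the emoji.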

sparkygam <- gam(I(sparky=="TRUE")~s(latitude,longitude, k=60) + V1.x, family=binomial, data=tweets)
concurvity(sparkygam)

##               para s(latitude,longitude)
## worst    0.9168534           0.099417003
## observed 0.9168534           0.009350418
## estimate 0.9168534           0.010436183

gam.check(sparkygam)

## 
## Method: UBRE   Optimizer: outer newton
## full convergence after 3 iterations.
## Gradient range [7.38181e-08,7.38181e-08]
## (score -0.8937008 & scale 1).
## Hessian positive definite, eigenvalue range [0.0003198463,0.0003198463].
## Model rank =  81 / 81 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                           k'    edf k-index p-value
## s(latitude,longitude) 59.000 31.750   0.859    0.14

summary(sparkygam)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## I(sparky == "TRUE") ~ s(latitude, longitude, k = 60) + V1.x
## 
## Parametric coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.043769   0.308248 -16.363  < 2e-16 ***
## V1.x2        0.374535   0.421174   0.889 0.373860    
## V1.x3        0.521652   0.414222   1.259 0.207903    
## V1.x4        0.540606   0.421866   1.281 0.200031    
## V1.x5       -0.333582   0.542760  -0.615 0.538817    
## V1.x6        0.406962   0.453892   0.897 0.369929    
## V1.x7        0.610551   0.441826   1.382 0.167008    
## V1.x8        0.459427   0.469569   0.978 0.327876    
## V1.x9       -0.006097   0.429996  -0.014 0.988687    
## V1.x10      -0.073187   0.513893  -0.142 0.886751    
## V1.x11      -0.088493   0.588622  -0.150 0.880497    
## V1.x12       0.105301   0.543732   0.194 0.846439    
## V1.x13       1.363123   0.403352   3.379 0.000726 ***
## V1.x14       0.396858   0.430658   0.922 0.356781    
## V1.x15      -0.551737   0.654946  -0.842 0.399555    
## V1.x16      -1.265384   1.047887  -1.208 0.227217    
## V1.x17       0.066183   0.454785   0.146 0.884297    
## V1.x18      -0.010056   0.588844  -0.017 0.986375    
## V1.x19      -0.449025   0.657493  -0.683 0.494648    
## V1.x20       0.043034   0.656281   0.066 0.947718    
## V1.x21      -0.942557   1.047574  -0.900 0.368252    
## V1.x22      -0.214021   0.655523  -0.326 0.744055    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df Chi.sq  p-value    
## s(latitude,longitude) 31.75  40.41  86.64 3.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0116   Deviance explained = 7.43%
## UBRE = -0.8937  Scale est. = 1         n = 16744

Kate Lyons
