As a group project, I practiced sentiment analysis on music streaming services reviews. Purpose of the project is to find insights from the reviews, and I focused on disclosing relationships between events of the services and user reactions. It can be software updates, offline events, or increasing membership fee. For checking this out, I wanted to see review trends by a word, period and sentiment. The reviews that I handled for the project are from Pandora, Spotify and Amazon music and the total sum of those is about 39,290 reviews
- Manual Scoring
- Resampling
- Extract Data
- Findings
- Limitations
Manual Scoring
For practicing sentiment analysis, I had to decide which lexicon(sentiment dictionary) should I use for the analysis. R provides lexicons such as ‘Bing’, ‘NRC’, and ‘AFINN’. I chose ‘bing’ for the project. This decision is based on comparisons between manual judgments and lexicon scores. The manual judgement datasets are about 1% of total reviews and were selected randomly. Here is the result of the comparison.
Spotify
Pandora
Amazon music
‘Bing’ determined the highest accuracy compared two lexicons on every service. Somehow, the conclusions on negative reviews were shallow, but we couldn’t find out a solution this time. (I assume this is related to the length of reviews and sarcasm)
Also, another interesting point of the result is neutral reviews. The proportion of neutral reviews are around 14% on every services review. I pondered how to resolve this because 14% of data is not small. Then, I decided to deal this with probability, because the sentiment trend of keywords is the only thing that I want to know. I am not interested in each review.
Resampling
I have 1% of manually scored reviews. Now, let’s think this as a population and practice resampling. When we have enough number of samples, we can extract samples from it and infer a normal distribution. So, I extract 50 reviews randomly from the manually scored samples multiple times and deduce proportions of positive and negative in neutral reviews.
Here is an R code for resampling and a result of Amazon music.
library(tidytext)
library(readr)
library(dplyr)
library(anytime)
# Load manually scored review files here. ("../service_scored.csv")
service <- read_csv("https://raw.githubusercontent.com/seonnyseo/Streaming_Sentiment_Analysis/master/Data/amazon_scored.csv")
# I decided to consider ambiguous reviews as negative
service$score <- ifelse(service$score == 0, -1, 1)
# Split reviews by each word
tidy_service <- service %>% unnest_tokens(word, review)
bing_sentiments <- tidy_service %>% inner_join(get_sentiments("bing"), by = "word")
bing_sentiments$score <- ifelse(bing_sentiments$sentiment == "negative", -1, 1)
bing_aggregate <- bing_sentiments %>%
select(review_id, score) %>%
group_by(review_id) %>%
summarise(bing_score = sum(score))
score_compare_service <- merge(x = service, y = bing_aggregate, all.x = TRUE, by = 'review_id')
score_compare_service[is.na(score_compare_service)] <- 0
# score_compare_service(review_id, date, review, score, bing_score)
# Pick 50 each time
resampling <- function(service){
neutral_average <- 0
postive_average <- 0
negative_average <- 0
for(i in c(2000:3000)){
set.seed(i)
random_data <- service[sample(nrow(service), 50),]
neutral_count <- sum(random_data$bing_score == 0)
positive_neutral <- sum(random_data$bing_score == 0 & random_data$score == 1)
negative_neutral <- sum(random_data$bing_score == 0 & random_data$score == -1)
neutral_average <- neutral_average + neutral_count/50
postive_average <- postive_average + positive_neutral/neutral_count
negative_average <- negative_average + negative_neutral/neutral_count
}
cat(sprintf("Neutral : %.3f Positive : %.3f Negative : %.3f\n",
neutral_average/1000, postive_average/1000, negative_average/1000))
}
# Run resampling
resampling(score_compare_service)
## Neutral : 0.140 Positive : 0.710 Negative : 0.291
This is the results of resampling. (Neutral / (Positive|Neutral) / (Negative|Neutral) # Pandora 0.14 0.54 0.46 # Spotify 0.13 0.59 0.41 # Amazon 0.14 0.70 0.30
Extract Data
My purpose is to see a change in trend of sentiment by keywords. Thus, expected outcome is a graph that shows time period in x axis and quantity of reviews by sentiment in y axis. R code is composed of three functions to implement a graph.
- Pre-Process
- Extract reviews
- Create a graph
Pre-Process
pre_process <- function(service)
{
# PreProcessing
service$review_id <- 1:nrow(service)
service$date <- anydate(service$date)
service <- service[c(3,1,2)]
# Sentiment Score
tidy_service <- service %>% unnest_tokens(word, review)
# Edit Dictionary, I only add 4 words with sentiment at this time. This can be expanded later.
bing_edit <- rbind(get_sentiments("bing"), c("commercial", "negative"))
bing_edit <- rbind(bing_edit, c("commercials", "negative"))
bing_edit <- rbind(bing_edit, c("ad", "negative"))
bing_edit <- rbind(bing_edit, c("ads", "negative"))
bing_edit <- rbind(bing_edit, c("wish", "negative"))
bing_sentiments <- tidy_service %>% inner_join(bing_edit, by = "word")
bing_sentiments$score <- ifelse(bing_sentiments$sentiment == "negative", -1, 1)
bing_aggregate <- bing_sentiments %>% select(review_id, score) %>% group_by(review_id) %>% summarise(bing_score = sum(score))
service <- merge(x = service, y = bing_aggregate, all.x = TRUE, by = 'review_id')
service[is.na(service)] <- 0
service$bing_judgement <- ifelse(service$bing_score > 0, "positive",
ifelse(service$bing_score < 0, "negative", "neutral" ))
return(service)
}
Pre-Process function is called only once for each service dataset. On this step, the function cleans data and judge sentiment of reviews based on bing lexicon. Also, I edited bing dictionary by adding some words are not included in the dictionary such as ‘commercials’, ‘ad’, and ‘wish’. Those words connote negative emotion in many reviews, so I intend to treat these as negative words.
So, dataset acquires sentiment score column after pass this function.
Extract reviews
word_data
word_data <-function(service, start, end, word, sentiment){
word <- tolower(word)
# Filter Data between start date & end date
extracted <- service[service$date >= start & service$date <= end,]
# Filter Date that only contains word
extracted <- extracted[grepl(word, tolower(extracted$review)),]
set.seed(101)
# Neutral / (Positive|Neutral) / (Negative/Neutral)
# Pandora 0.14 0.54 0.46
# Spotify 0.13 0.59 0.41
# Amazon 0.14 0.70 0.30
ifelse(service$service == "pandora", positive_weight <- 0.54,
ifelse(service$service == "spotify", positive_weight <- 0.59, positive_weight <- 0.70))
neutral_reviews <- extracted[extracted$bing_judgement == "neutral",] %>% select(review_id)
positive_neutral <- neutral_reviews[sample(nrow(neutral_reviews), nrow(neutral_reviews) * positive_weight),]
negative_neutral <- neutral_reviews[!(neutral_reviews$review_id %in% positive_neutral),]
extracted$bing_judgement <- ifelse(extracted$review_id %in% positive_neutral, "positive",
ifelse(extracted$review_id %in% negative_neutral, "negative",
extracted$bing_judgement))
extracted <- extracted[extracted$bing_judgement == sentiment,]
return(extracted)
}
This function does two operations. This function extracts data by service, period, word, and sentiment from pre-processed raw dataset. As I mentioned above, the extracted dataset should includes around 14% of neutral reviews. So, the function handles neutral reviews with estimated probability from resampling results. It splits neutral reviews as positive or negative randomly follow the weights.
frequnecy_month
frequency_month <- function(service, start, end, word, sentiment){
extracted <- word_data(service, start, end, word, sentiment)
# Make year-month column
extracted$year_month <- anydate(format(as.Date(extracted$date), "%Y-%m"))
frequency_df <- extracted %>% group_by(year_month) %>% summarise(frequency = n())
frequency_df <- frequency_df %>% pad(interval = 'month', start_val = anydate(start), end_val = anydate(end))
frequency_df[is.na(frequency_df)] <- 0
return (frequency_df)
}
I only want to know frequency of reviews each months. This function counts how many reviews were posted in a certain period by keywords.
word_graph
word_graph <- function(service, word, start, end){
positive = frequency_month(service, start, end, word, "positive")
negative = frequency_month(service, start, end, word, "negative")
positive$sentiment <- 'positive'
negative$sentiment <- 'negative'
frequency_df <- positive %>% full_join(negative)
ret <- ggplot(frequency_df, aes(x = year_month)) +
geom_line(aes(y = frequency, col = sentiment)) +
theme(axis.title.x=element_blank())
return(ret)
}
User calls this function to see results.