News Sentiment with ML/AI – Part 3

This is part 3 on evaluating news sentiment using ML/AI. Here are part 1 and part 2.

Recap

Previously, in part 2 of this news analysis series, we saw that Google Cloud Natural Language can detect both the sentiment of an article and its emotional level. We looked at a CNBC article reviewing the iPhone, which turned out to be an emotional piece.

Let’s look at more articles and their analyses.

CNBC With Highest Positive Score

The CNBC news article with the highest positive score is titled: Victoria Beckham on juggling a fashion brand with family life: ‘I just do the best I can’.

It has a score of 0.4 with a magnitude of 6.4. The average score for all CNBC articles is -0.051 with an average magnitude of 5.68.

The article’s tone is positive as it presents hope for working women with a family. Here are some quotes:

When you’re a working mum, you feel torn, you feel guilty, but I just do the best that I can do. My kids and (soccer star husband) David will always come first.

that’s why we need to support each other, first and foremost.”

“I’m doing the best I can creatively, as a wife, as a mum

CNBC With Lowest Score

The CNBC news article with the lowest score is titled: Mattis relationship with Trump reportedly frays as a decision on his fate looms, but White House dismisses.

It has a score of -0.7 with a magnitude of 2.1. The average score for all CNBC articles is -0.051 with an average magnitude of 5.68.

The score is negative, which indicates negative emotion. The article describes President Trump’s soured relationship with Secretary of Defense James Mattis. Here are some quotes:

The relationship between President Donald Trump and Secretary of Defense James Mattis may have “soured” to the point of no return

Trump is … resentful of unflattering comparisons between the two men, the publication reported.

…the president is reportedly looking to replace the four star general…

NYTimes With Highest Positive Score

This NYTimes article has the highest score: 20 Wines Under $20: When Any Night Can Be a Weeknight.

It has a score of 0.4 with a whopping magnitude of 63.7. The average score for all NYTimes articles is -0.037 with an average magnitude of 18.08.

The article is slightly positive but has a high magnitude of 63.7. Reading through parts of the article, I found that it is written with expressive and descriptive words. Here are some quotes:

Greatness in a wine is not solely a measure of complexity or profundity.

…represents a people and a culture and a love of wine, then a few extra dollars is a worthwhile investment.

But good ones, like this wine from La Staffa, grown in the Castelli di Jesi region in the northern Marche near the Adriatic, reawaken curiosity.

NYTimes With Lowest Score

This NYTimes article has the lowest score: Myanmar’s ‘Gravest Crimes’ Against Rohingya Demand Action, U.N. Says.

It has a score of -0.5 with a magnitude of 10.7. The average score for all NYTimes articles is -0.037 with an average magnitude of 18.08.

The article is slightly negative as it details the Myanmar army’s crimes against a Muslim minority group in Rakhine. It is written in a somber tone about a grave injustice. Here are some quotes:

…“the gravest crimes under international law”…

…troops shot some of the children and snatched infants from their mothers, throwing some into the river to drown while tossing others onto a fire, …

“The killing of civilians of all ages, including babies, cannot be argued to be a counterterrorism measure…“

Conclusion

After looking at articles from both CNBC and NYTimes, I am further impressed by Google’s ability to gauge human expression. CNBC has an average magnitude of 5.68 compared to NYTimes’ 18.08. CNBC uses more common words, whereas NYTimes articles are written with expressive and descriptive words. It is fascinating that an algorithm can make that distinction.

Articles from both sites are written in a fairly neutral tone. Depending on the content, some are slightly positive while others are slightly negative. I do not see exaggeration from either side.

Article Sentiment Through AI

This is part 1 on evaluating the sentiment of news articles using ML/AI.

This post builds on work from last week as I explore news articles with ML/AI. To recap, I aggregated the top news from CNBC and NYTimes and calculated their overall sentiment score. However, since all the news articles are combined together, there is no way to evaluate them individually.

In this post, I will examine the sentiment of individual articles.

Methodology

Last week I used AWS Comprehend; this week, however, I will be using Google Cloud Natural Language.

Why the change?

Because of AWS’s limitations. According to their guidelines and limits, the maximum size for sentiment detection is 5KB. At five or six bytes per English word, that is fewer than 1,000 words! If an article goes over the limit, I have to split it into pieces and analyze each piece separately. Then I need to weight the results appropriately and do a final calculation. I am lazy, so I sought a better solution. I found it with Google Cloud Natural Language.
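For the curious, here is roughly what the split-and-weight workaround would look like. This is only a sketch of the idea, not code from my pipeline: the function names are made up for illustration, the 5,000-byte limit is taken from the 5KB figure above, and weighting each chunk by its size in bytes is just one reasonable choice.

```python
from typing import List

MAX_BYTES = 5000  # assumed per-request limit, based on the 5KB figure above


def split_into_chunks(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split text on whitespace into chunks whose UTF-8 size fits max_bytes.

    A single word longer than max_bytes still becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for word in text.split():
        word_size = len(word.encode("utf-8"))
        # +1 accounts for the joining space when the chunk is not empty
        extra = word_size + (1 if current else 0)
        if current and current_size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, current_size = [word], word_size
        else:
            current.append(word)
            current_size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def weighted_average(values: List[float], weights: List[int]) -> float:
    """Combine per-chunk scores, weighting each by its chunk size."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total
```

Each chunk would be sent to the sentiment service on its own, and the per-chunk scores combined with weighted_average using the chunk sizes as weights. That is a lot of extra plumbing compared to sending the whole article in one call.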

Note to businesses: This is a reason why customers switch to a different service.

Google Cloud Natural Language

Google Cloud Natural Language derives insights from unstructured text using Google machine learning.

Google’s sentiment analysis is less specific than AWS’s. Google provides two values: score and magnitude. A score of 0 is neutral, a score below 0 indicates negative emotion, and a score above 0 indicates positive emotion. The magnitude indicates the overall amount of emotional content. Pretty vague, but let’s take a look at some samples. You can read more about Google’s sentiment analysis here.
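To make the two values a little more concrete, here is a small sketch of how a score and magnitude pair could be turned into a rough label. The function name and both thresholds (0.1 for the neutral band, 10.0 for "high" magnitude) are arbitrary assumptions on my part, not values recommended by Google.

```python
from typing import Tuple


def describe_sentiment(
    score: float,
    magnitude: float,
    neutral_band: float = 0.1,       # assumed cutoff, not from Google
    strong_magnitude: float = 10.0,  # assumed cutoff, not from Google
) -> Tuple[str, str]:
    """Return (polarity, emotional intensity) labels for one document."""
    if score > neutral_band:
        polarity = "positive"
    elif score < -neutral_band:
        polarity = "negative"
    else:
        polarity = "neutral"
    intensity = "high" if magnitude >= strong_magnitude else "low"
    return polarity, intensity
```

Keep in mind that, according to Google’s documentation, magnitude is not normalized by length, so longer texts tend to produce larger magnitudes. Any fixed magnitude cutoff like the one above should be treated with caution.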

Results

Enough theory! Let’s analyze some examples of the output and see if they make sense.

Here are the results for the average magnitude and score for CNBC and NYTimes:

source   avg(gcloud_magnitude)  avg(gcloud_score)
cnbc     4.6567                 -0.0828
nytimes  14.8842                -0.0603

NYTimes is more emotional based on its average gcloud_magnitude: 14.88 vs 4.66 for CNBC. The sentiment scores for both are very close to 0, so neither is clearly positive or negative.
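The averages above are a simple group-by over the stored per-article results. Here is a minimal sketch using an in-memory SQLite database. The table name articles is an assumption for illustration, and the sample rows in the usage comment are made-up placeholders, not my real data.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, float, float]  # source, url, gcloud_magnitude, gcloud_score


def average_by_source(rows: Iterable[Row]) -> List[Tuple[str, float, float]]:
    """Load per-article results and return the average magnitude and score per source."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE articles ("
            "source TEXT, url TEXT, gcloud_magnitude REAL, gcloud_score REAL)"
        )
        conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", rows)
        return conn.execute(
            "SELECT source, AVG(gcloud_magnitude), AVG(gcloud_score) "
            "FROM articles GROUP BY source ORDER BY source"
        ).fetchall()
    finally:
        conn.close()


# Usage with placeholder rows (values are invented for illustration):
# average_by_source([
#     ("cnbc", "https://example.com/a", 2.0, 0.25),
#     ("cnbc", "https://example.com/b", 4.0, -0.75),
#     ("nytimes", "https://example.com/c", 10.0, 0.5),
# ])
```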

In the last article’s analysis, CNBC had a 92% and NYTimes an 87% probability of being neutral. AWS and Google seem to agree that the sentiments are most likely neutral.

Individual Article Analysis

Here is the focus of this article: let’s evaluate one of the articles.

CNBC With Highest Magnitude

url: https://www.cnbc.com/2018/09/18/iphone-xs-and-iphone-xs-max-review.html
gcloud_magnitude: 30.4
gcloud_score: 0.2

This CNBC article reviews Apple’s new iPhones, and it generates a magnitude of 30.4, much higher than the CNBC average of 4.66.

I read the article, and there are phrases that show how emotionally and dramatically it is written. Here are some quotes:

They’re the best phones Apple has ever made.

The iPhone X, even a year later, is still arguably the best phone on the market.

It’s one of the best screens on the market…

The speakers sound awesome.

I love how shiny it is on the new gold and white models.

I love that iOS 12 gives you so much more control over notifications.

…these are the best phones Apple has made…

Judging from some of these statements, I can understand why Google’s algorithm gives it a high magnitude score. It’s full of dramatic words like “best” and “love”.

Conclusion

The CNBC article reviewing Apple’s new iPhone XS does seem dramatic and emotional. I am pretty impressed that Google Cloud Natural Language can pick up on that. In part 2, I will dive deeper into articles that have low and high scores from both sources.

Analyzing news sources with Amazon AI/ML

I will be analyzing news sources with Amazon AI/ML to see how positive or negative they are.
Why did I decide to explore this?
There are a few reasons.
One, I wanted more positivity in my life.
Two, I believe there are still good things happening around the world today.
Three, AI/ML is interesting and I wanted to learn more about it and use it to get positive results!
I started looking at the top news stories from CNBC, Buzzfeed, and NYTimes.
I read CNBC because I am interested in economics and how our financial markets are doing.
I included Buzzfeed and NYTimes because, according to the Wibbitz blog, they are the two sites most visited by millennials. [1]
I architected a system that automatically fetches the top news from each of these sites, scrapes the content, and runs it through Amazon’s sentiment analysis.
Amazon’s sentiment analysis is a web service that uses ML/AI to determine the sentiment of a piece of text.
You send it some text and it returns positive, negative, neutral, and mixed scores.
It also returns an overall sentiment label of positive, negative, neutral, or mixed.
One example they give is running sentiment analysis on the comments of a blog to determine whether your readers liked the post. [2]
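To give a feel for what comes back from the service, here is a minimal sketch of calling it through boto3 and pulling out the label and the four scores. The helper names are mine and purely illustrative, the call assumes AWS credentials and a region are already configured, and the sample response in the usage comment uses invented values that only show the shape of the result.

```python
from typing import Any, Dict, Tuple


def parse_sentiment_response(response: Dict[str, Any]) -> Tuple[str, Dict[str, float]]:
    """Extract the overall label and the four scores from a detect_sentiment response."""
    label = response["Sentiment"]
    scores = {name.lower(): float(value)
              for name, value in response["SentimentScore"].items()}
    return label, scores


def fetch_sentiment(text: str, language_code: str = "en") -> Dict[str, Any]:
    """Call the sentiment API for one piece of text (requires AWS credentials)."""
    import boto3  # imported lazily so the parser above works without AWS set up

    client = boto3.client("comprehend")
    return client.detect_sentiment(Text=text, LanguageCode=language_code)


# Usage with an invented response, to show the shape only:
# parse_sentiment_response({
#     "Sentiment": "NEUTRAL",
#     "SentimentScore": {"Positive": 0.02, "Negative": 0.03,
#                        "Neutral": 0.94, "Mixed": 0.01},
# })
```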
So far I have analyzed 1,030 articles and here are the results:
CNBC has a positive score of .025, a negative score of .039, a neutral score of .92, and a mixed score of .014.
NYTimes has a positive score of .055, a negative score of .044, a neutral score of .87, and a mixed score of .022.
I have results for Buzzfeed, but after reviewing the values I found that my code was not parsing all of the Buzzfeed pages correctly, so the results may be incorrect. I will work on it and post an update soon.
So, what does this mean?
The Amazon AI sentiment analysis tells you the probability that something is either positive, negative, neutral or mixed.
Supposedly, you can take these numbers and convert them into a probability that something is positive.
I asked on the Amazon forums and got an answer from a rep on Sep 15, 2018 that the example is outdated.
However, my instinct is that each number represents the percentage of each sentiment.
For example, when you add the positive, negative, neutral, and mixed scores together, you get approximately 1.
Therefore, each number is the probability of the text having that sentiment.
In this case, CNBC has a 92% probability of being neutral and NYTimes has 87%.
NYTimes’ positive score is 5.5% vs CNBC’s 2.5%.
This may sound good but is it better for a news article to be neutral or positive?
These are questions that I love to explore.
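The claim that the four scores add up to about 1 is easy to check with the numbers reported above. This is a small sketch; the helper names are just for illustration.

```python
from typing import Dict

# Scores as reported in this post
cnbc = {"positive": 0.025, "negative": 0.039, "neutral": 0.92, "mixed": 0.014}
nytimes = {"positive": 0.055, "negative": 0.044, "neutral": 0.87, "mixed": 0.022}


def total(scores: Dict[str, float]) -> float:
    """Sum of the four sentiment scores; should be close to 1 if they are probabilities."""
    return sum(scores.values())


def as_percentages(scores: Dict[str, float]) -> Dict[str, float]:
    """Express each score as a percentage, rounded to one decimal place."""
    return {name: round(value * 100, 1) for name, value in scores.items()}


def most_likely(scores: Dict[str, float]) -> str:
    """Name of the sentiment with the largest score."""
    return max(scores, key=scores.get)


for name, scores in (("cnbc", cnbc), ("nytimes", nytimes)):
    print(name, round(total(scores), 3), as_percentages(scores), most_likely(scores))
```

The totals come out slightly below 1 because the reported scores are rounded, which fits the reading that each number is a probability.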
Since I combined all the articles into a single number, I do not know which articles are deemed positive or negative.
Next time, I will break it down per article and see how accurate the sentiment analysis is.