Entity Analysis with ML/AI – Part 4

This is part 4 of a series on analyzing news articles using ML/AI. Here are part 1, part 2, and part 3.

Recap

In part 3 of this series, Google Cloud Natural Language's sentiment analysis detected expressive articles and assigned them a higher magnitude, while most of the news articles had a neutral tone.

In this article, I explore Google's entity analysis service.

Entity Analysis

Google describes its entity analysis as follows:

Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities.

Google Cloud's entity analysis returns a list of entities, each with a type (e.g. PERSON) and a salience score. The salience score is a number between 0 and 1 describing how important or central the entity is to the text. Read more in the official documentation.
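As a rough sketch of what this looks like in code, the snippet below assumes the `google-cloud-language` Python client (`language_v1`) is installed and credentials are configured. It requests entity analysis for a plain-text article and flattens each entity into its name, type, salience, and suggested Wikipedia URL (if any). The helper names `analyze_entities` and `summarize` are hypothetical, not part of the library.

```python
from typing import Any, List, Optional, Tuple

# (name, type, salience, wikipedia_url) for one entity
EntityRow = Tuple[str, str, float, Optional[str]]


def analyze_entities(text: str) -> Any:
    """Call the Natural Language API's entity analysis on plain text.

    Assumes `google-cloud-language` is installed and application default
    credentials are configured. The import is deferred so the rest of this
    module can be used without the library.
    """
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    return client.analyze_entities(
        document=document, encoding_type=language_v1.EncodingType.UTF8
    )


def summarize(response: Any) -> List[EntityRow]:
    """Flatten an entity analysis response into simple tuples."""
    rows: List[EntityRow] = []
    for entity in response.entities:
        # `type_` is an enum such as PERSON or ORGANIZATION; `.name` is its label.
        # `metadata` is a string map that may contain a "wikipedia_url" key.
        rows.append(
            (
                entity.name,
                entity.type_.name,
                entity.salience,
                entity.metadata.get("wikipedia_url"),
            )
        )
    return rows
```

Usage would look like `for row in summarize(analyze_entities(article_text)): print(row)`, where `article_text` is a hypothetical variable holding the article body. Keeping the API call separate from the flattening step makes the latter easy to test without network access.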

Entity analysis returns many entities per article, so I filter them down to those with a salience score greater than 0.1. I am only interested in entities that are significant in the context of the articles.
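A minimal sketch of this salience filter, assuming each entity exposes a `salience` attribute (as the Python client's entity objects do). The `Entity` dataclass and `filter_salient` helper are hypothetical stand-ins for illustration. The comparison is strictly greater than 0.1, matching the rule above.

```python
from dataclasses import dataclass
from typing import Iterable, List

SALIENCE_THRESHOLD = 0.1  # keep only entities scoring above this


@dataclass
class Entity:
    """Hypothetical stand-in for an entity returned by entity analysis."""
    name: str
    entity_type: str
    salience: float  # 0.0 to 1.0; higher means more central to the text


def filter_salient(
    entities: Iterable[Entity], threshold: float = SALIENCE_THRESHOLD
) -> List[Entity]:
    """Return only the entities whose salience is strictly greater than `threshold`."""
    return [e for e in entities if e.salience > threshold]
```

For example, `Entity("Red Tape Theater", "ORGANIZATION", 0.1361)` survives the filter, while any entity scoring exactly 0.1 or less is dropped.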

Entity Analysis Article #1

I picked a news article that caught my eye, titled How Chicago Is Changing Theater, One Storefront at a Time. I recently visited the beautiful city of Chicago but didn't get a chance to see any shows.

One entity that Google returns is Red Tape Theater, with a type of ORGANIZATION and a salience score of 0.1361. This entity barely passes the 0.1 threshold. Google also returns some metadata, including a suggested Wikipedia link to Red Tape, an article about excessive bureaucratic regulation. Google got that one wrong: it cannot differentiate between Red Tape Theater and red tape the idiom.

The article mentions many other entities, such as WildClaw Theater, The Den, Firebrand Theater, Broken Nose Theater, and First Floor Theater. Since most of them are mentioned only briefly, their salience scores are probably below 0.1. Given that Red Tape Theater has a salience score of only 0.1361, it seems that no single theater is central to this article. Instead, many theaters are presented, and Red Tape Theater is the most prominent one. That matches my own reading of the article.
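The reasoning above amounts to ranking entities by salience and reading off the top one. A minimal sketch, again using a hypothetical `Entity` dataclass in place of the client's objects. The optional `entity_type` filter (for example `"ORGANIZATION"`, the type Google assigned to Red Tape Theater) is my own addition for narrowing the ranking to theaters and other organizations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for an entity returned by entity analysis."""
    name: str
    entity_type: str
    salience: float


def rank_by_salience(
    entities: Iterable[Entity], entity_type: Optional[str] = None
) -> List[Entity]:
    """Sort entities from most to least salient, optionally keeping one type only."""
    pool = [e for e in entities if entity_type is None or e.entity_type == entity_type]
    return sorted(pool, key=lambda e: e.salience, reverse=True)


def most_prominent(
    entities: Iterable[Entity], entity_type: Optional[str] = None
) -> Optional[Entity]:
    """Return the most salient entity, or None if nothing matches."""
    ranked = rank_by_salience(entities, entity_type)
    return ranked[0] if ranked else None
```

A low top score, like Red Tape Theater's 0.1361, suggests that salience is spread across many entities rather than concentrated on one.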

Towards the end, Red Tape Theater is introduced and the next several paragraphs are dedicated to it. If I were to pick a theater to visit, I would go with Red Tape Theater and see The Shipment. The second most prominent might be the Steppenwolf Theater Company, but only one paragraph is dedicated to it.

Conclusion

This is another impressive feat by Google's machine learning: it detected the most prominent entity in the article, presumably by relating several paragraphs to Red Tape Theater.