In this article, I explore Google's entity detection service.
Google describes its entity analysis as such:
Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities.
Gcloud's entity analysis returns a list of entities, each with a type (e.g. PERSON) and a salience score. The salience score is a number between 0 and 1 describing how important or central the entity is to the text. Read more in the official documentation.
Entity analysis returns many entities per article, so I keep only those with a salience score greater than 0.1. I am only interested in entities that are significant in the context of the articles.
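The filtering step can be sketched in a few lines of Python. This is a minimal sketch, not the code I ran for this article: it assumes the response has already been parsed into a dictionary shaped like the JSON the service returns (an `entities` list whose items carry `name`, `type` and `salience`). The function name `filter_salient` and the sample data are hypothetical. Only the Red Tape Theater values come from the article discussed below, and the other salience is a made-up placeholder.

```python
from typing import Any

SALIENCE_THRESHOLD = 0.1  # cutoff used in this article


def filter_salient(
    response: dict[str, Any], threshold: float = SALIENCE_THRESHOLD
) -> list[dict[str, Any]]:
    """Keep entities with salience strictly above the threshold, most salient first."""
    entities = response.get("entities", [])
    kept = [e for e in entities if e.get("salience", 0.0) > threshold]
    return sorted(kept, key=lambda e: e["salience"], reverse=True)


# Illustrative response. The Broken Nose Theater salience is a placeholder,
# not a value returned by the service.
sample_response = {
    "entities": [
        {"name": "Broken Nose Theater", "type": "ORGANIZATION", "salience": 0.02},
        {"name": "Red Tape Theater", "type": "ORGANIZATION", "salience": 0.1361},
    ]
}

salient = filter_salient(sample_response)
for entity in salient:
    print(entity["name"], entity["type"], entity["salience"])
```

Sorting by salience is optional, but it makes it easy to read off the most prominent entity first.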
Entity Analysis Article #1
I picked a news article that caught my eye, titled How Chicago Is Changing Theater, One Storefront at a Time. I recently visited the beautiful city of Chicago but didn’t get a chance to watch any shows.
One entity that Google returns is Red Tape Theater, with a category of ORGANIZATION and a salience score of 0.1361. This entity barely passes the 0.1 threshold. Google also gave me some metadata, including a suggested Wikipedia link to Red Tape, which is an article about excessive bureaucratic regulation. Google is wrong on that one: here it fails to differentiate between the Red Tape Theater and Red Tape as an idiom.
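A mismatch like this can be caught with a crude check that compares the entity name against the title of the linked Wikipedia page. The sketch below assumes the entity is a dictionary whose `metadata` holds a `wikipedia_url` key, as in the service's JSON output. The helper names are hypothetical, and the URL in the sample is my assumption of what the link looks like, not copied from the actual response.

```python
from urllib.parse import unquote, urlparse


def wikipedia_title(url: str) -> str:
    """Turn a Wikipedia URL into its article title ('.../wiki/Red_tape' -> 'Red tape')."""
    slug = urlparse(url).path.rsplit("/", 1)[-1]
    return unquote(slug).replace("_", " ")


def link_looks_suspicious(entity: dict) -> bool:
    """Flag an entity whose linked Wikipedia title differs from its name (ignoring case)."""
    url = entity.get("metadata", {}).get("wikipedia_url")
    if not url:
        return False  # nothing linked, nothing to check
    return wikipedia_title(url).casefold() != entity["name"].casefold()


# Assumed shape of the entity discussed above; the URL is an assumption.
red_tape = {
    "name": "Red Tape Theater",
    "type": "ORGANIZATION",
    "salience": 0.1361,
    "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Red_tape"},
}

suspicious = link_looks_suspicious(red_tape)
print(suspicious)
```

This heuristic is deliberately strict: it would also flag correct links whose titles carry a disambiguation suffix, so it is better suited to queuing entities for manual review than to rejecting them outright.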
Other entities are mentioned in this article, such as Broken Nose Theater, First Floor Theater, and many more. Since most of those entities are mentioned only briefly, their salience scores are probably below 0.1. Considering that Red Tape Theater has a salience score of only 0.1361, it seems that no single theater is central to this article. Instead, many theaters are presented, with Red Tape Theater being the most prominent one. That matches my impression from reading the article.
Towards the end, Red Tape Theater is mentioned and the next several paragraphs are dedicated to it. If I were to pick a theater to visit, I would go with the Red Tape Theater and see The Shipment. The second most prominent might be the Steppenwolf Theater Company, but only one paragraph is dedicated to it.
This is another impressive feat by Google’s machine learning. It is able to detect the most prominent entity in the article, presumably by relating several paragraphs to the Red Tape Theater.