Introduction to Event Registry Python SDK
This page introduces the Python SDK that lets you search for the articles and events provided by NewsApi.ai. It describes how to install and import the SDK, and how to use its most relevant classes to search for news articles or events.
You can also try and modify this Jupyter notebook interactively here.
Registering for a free account
To use the SDK you need your own API key. To get one, register for a free account, which gives you 2,000 tokens for free. Once you have used them, you'll need to subscribe to a paid plan to continue using the service. You can monitor your token usage on your dashboard.
Importing the Event Registry module
To use Event Registry, you need the module called `eventregistry`. To install it, open a command prompt and run:
```shell
pip install eventregistry
```
Then import the module:
```python
from eventregistry import *
import json, os, sys
```
There is one main class that interacts with the Event Registry service, called `EventRegistry`. The class accepts an input parameter `apiKey`, which you will need to provide in order to make more than a trivial number of requests.
```python
er = EventRegistry(apiKey = "YOUR_API_KEY", allowUseOfArchive=False)
```
The additional parameter I've provided in the constructor, `allowUseOfArchive`, states that I only want to search data published less than 1 month ago. If you also need to search older content, remove that parameter (note that free users don't have access to the archive, regardless of this parameter's value).
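The imports above include `os`, which you can use to keep the API key out of your source code. Below is a minimal sketch that assumes you store the key in an environment variable. The variable name `EVENT_REGISTRY_API_KEY` and the helper `load_api_key` are hypothetical and not part of the SDK:

```python
import os

def load_api_key(var_name="EVENT_REGISTRY_API_KEY"):
    """Read the API key from an environment variable (hypothetical variable name)."""
    api_key = os.environ.get(var_name)
    if not api_key:
        # Fail early with a clear message instead of sending requests without a key
        raise RuntimeError(f"Please set the {var_name} environment variable")
    return api_key

# Usage with the SDK (not executed here):
# er = EventRegistry(apiKey = load_api_key(), allowUseOfArchive=False)
```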
A few example queries
Here is a short list of example queries. The queries and the properties they use are described in detail in the individual sections below.
Ex 1: Getting the most relevant articles about Donald Trump or Boris Johnson written by the New York Times on the topic of Business:
```python
q = QueryArticlesIter(
    keywords = QueryItems.OR(["Donald Trump", "Boris Johnson"]),
    sourceUri = er.getSourceUri("New York Times"),
    categoryUri = er.getCategoryUri("Business"))
print("Number of results: %d" % q.count(er))
for art in q.execQuery(er, sortBy = "rel", maxItems = 2):
    print(json.dumps(art, indent=4))
```
Ex 2: The latest articles in Chinese or Arabic about Apple:
```python
q = QueryArticlesIter(
    conceptUri = er.getConceptUri("Apple"),
    lang = QueryItems.OR(["ara", "zho"]))
for art in q.execQuery(er, sortBy = "date", maxItems = 2):
    print(json.dumps(art, indent=4))
```
Ex 3: Largest recent events on the topic of Brexit:
```python
q = QueryEventsIter(keywords = "Brexit")
for event in q.execQuery(er, sortBy = "size", maxItems = 1):
    print(json.dumps(event, indent=4))
```
Auto-suggestion methods
Several API calls accept parameters that are unique identifiers, such as concepts, categories and sources. If you only know the name or label of such an item, you can use the auto-suggest methods to obtain its unique identifier.
If you know that there is a category for Investing, you can get its URI like this:
```python
print(er.getCategoryUri("investing"))
```
Similarly, if you want to filter by source, you can get the source URI by providing the source name or the domain name:
```python
print(er.getSourceUri("new york times"))
print(er.getSourceUri("nytimes"))
```
For concepts, the URIs are the URLs of the corresponding Wikipedia pages.
```python
print(er.getConceptUri("Obama"))
```
Auto-suggestion even works for company tickers:
```python
print(er.getConceptUri("AAPL"))
```
Searching for articles
There are two classes for searching for articles: `QueryArticlesIter` and `QueryArticles`. Use `QueryArticlesIter` when you simply want to download the articles matching a query. Use `QueryArticles` instead when you need various summaries of the results, such as top concepts, top sources or top authors.
Both classes let you specify several filters in the constructor, such as:
- `keywords`: find articles that mention the keywords or phrases
- `conceptUri`: find articles that mention the concept(s)
- `categoryUri`: find articles that are about one or more categories
- `lang`: find articles written in the given language
- `dateStart`: find articles that were written on the given date or later (in the `YYYY-MM-DD` format)
- `dateEnd`: find articles that were written on or before the given date (in the `YYYY-MM-DD` format)
- `sourceUri`: find articles written by the given publisher(s)
- `sourceLocationUri`: find articles written by publishers located in the given location (city or country)
- `authorUri`: find articles written by the given author(s)
- `locationUri`: find articles that mention the given location in the article dateline
- `keywordsLoc`: if keywords are provided, where to search for them (`title` or `body` (default))
- `minSentiment`, `maxSentiment`: minimum and maximum sentiment value (from -1 to 1)
- `startSourceRankPercentile`: starting percentile rank of the sources to consider in the results (default: 0). The value should be in the range 0-90 and divisible by 10.
- `endSourceRankPercentile`: ending percentile rank of the sources to consider in the results (default: 100). The value should be in the range 10-100 and divisible by 10.
- `ignoreKeywords`, `ignoreConceptUri`, `ignoreCategoryUri`, ...: from the articles that match the rest of the conditions, exclude those that match any of the provided filters
- `dataType`: which data types to include in the results: `news` (default), `blog` or `pr`
When multiple filters are specified, the results have to match all of the provided filters. For example, when keywords and sources are specified, the results will be articles written by these sources that mention the provided keywords.
If you want a search where a match on any one of the specified filters is enough, you'll have to use the Advanced Query Language.
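Several of these filters only accept constrained values. The hypothetical helper below is a hedged sketch that checks the constraints listed above locally before you build a query: dates written as `YYYY-MM-DD` strings, sentiment between -1 and 1, and source rank percentiles divisible by 10 within their ranges. It is not part of the SDK, and the service may apply further rules of its own.

```python
import re
from datetime import datetime

def _is_calendar_date(value):
    """True if value is a real calendar date (e.g. rejects 2019-02-30)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def check_article_filters(dateStart=None, dateEnd=None,
                          minSentiment=None, maxSentiment=None,
                          startSourceRankPercentile=0, endSourceRankPercentile=100):
    """Return a list of problems with the filter values (empty list if none found)."""
    problems = []
    # Dates must be strings in the YYYY-MM-DD format and valid calendar dates
    for name, value in (("dateStart", dateStart), ("dateEnd", dateEnd)):
        if value is not None and not (re.fullmatch(r"\d{4}-\d{2}-\d{2}", value)
                                      and _is_calendar_date(value)):
            problems.append(f"{name} must be a valid date in YYYY-MM-DD format")
    # Sentiment values range from -1 to 1
    for name, value in (("minSentiment", minSentiment), ("maxSentiment", maxSentiment)):
        if value is not None and not -1 <= value <= 1:
            problems.append(f"{name} must be between -1 and 1")
    # Percentiles: start in 0-90, end in 10-100, both divisible by 10
    if startSourceRankPercentile % 10 != 0 or not 0 <= startSourceRankPercentile <= 90:
        problems.append("startSourceRankPercentile must be in 0-90 and divisible by 10")
    if endSourceRankPercentile % 10 != 0 or not 10 <= endSourceRankPercentile <= 100:
        problems.append("endSourceRankPercentile must be in 10-100 and divisible by 10")
    return problems

print(check_article_filters(dateStart="2019-09-10", minSentiment=0.2, endSourceRankPercentile=30))  # → []
print(check_article_filters(dateStart="10/09/2019", startSourceRankPercentile=15))
```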
Executing a search
Once you have created an instance of the `QueryArticlesIter` class, you can retrieve the resulting articles by calling its `execQuery` method. The `execQuery` method iterates over the resulting articles, so you can simply use it in a `for` loop. In the method call you also need to provide your `EventRegistry` instance, since it is used to download the matching articles iteratively in batches of 100 items per call.
```python
q = QueryArticlesIter(keywords="Tesla")
for art in q.execQuery(er, sortBy = "date", maxItems = 300):
    print(art)
```
Please note two important parameters:
- `sortBy` determines how the articles are sorted before they are retrieved. Besides the date, you can also sort by relevance, source importance, shares on social media and others.
- `maxItems` determines how many of the matching articles to retrieve before the `for` loop finishes. It is very important to set this parameter if you don't want to download all matching results.

See the documentation page for the full list of `execQuery` parameters and their descriptions.
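Because `execQuery` downloads matching articles in batches of 100 items per call, `maxItems` also sets an upper bound on the number of calls made. Here is a small worked sketch of that arithmetic. The helper is hypothetical, and if fewer articles match the query, fewer calls are made:

```python
import math

def calls_needed(maxItems, batchSize=100):
    """Upper bound on the calls needed to download maxItems results in batches of batchSize."""
    return math.ceil(maxItems / batchSize)

print(calls_needed(300))  # → 3 (three full batches of 100)
print(calls_needed(250))  # → 3 (two full batches plus a partial batch of 50)
print(calls_needed(2))    # → 1
```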
Using `QueryItems.AND()` and `QueryItems.OR()` when providing a list of filters of the same type
When you provide several keywords, concepts, categories, etc., you have to state explicitly whether the results should mention all of them or any of them.
To do that, use the `QueryItems.AND()` and `QueryItems.OR()` methods:
```python
# articles that mention any of the three companies
q = QueryArticlesIter(keywords = QueryItems.OR(["Samsung", "Apple", "Google"]))
print("Count with any of the companies: %d" % q.count(er))
# articles that mention Samsung, for comparison
q = QueryArticlesIter(keywords = "Samsung")
print("Count mentioning Samsung: %d" % q.count(er))
```
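To make the difference concrete, here is a conceptual sketch of AND vs. OR semantics in plain Python. It is not how the service matches keywords, since matching happens server-side. It only runs case-insensitive substring checks on a few made-up headlines. This shows why an OR query returns at least as many results as a query for one of its keywords, and an AND query at most as many:

```python
def matches(text, keywords, mode):
    """True if text contains all (mode="AND") or any (mode="OR") of the keywords."""
    text = text.lower()
    found = [kw.lower() in text for kw in keywords]
    return all(found) if mode == "AND" else any(found)

# Made-up headlines used only for illustration
headlines = [
    "Samsung and Apple unveil new phones",
    "Google updates its search engine",
    "Apple reports quarterly earnings",
]

def count(keywords, mode):
    """Number of headlines matching the keywords under the given mode."""
    return sum(matches(h, keywords, mode) for h in headlines)

print(count(["Samsung", "Apple", "Google"], "OR"))   # → 3
print(count(["Samsung"], "OR"))                      # → 1
print(count(["Samsung", "Apple"], "AND"))            # → 1
print(count(["Samsung", "Apple", "Google"], "AND"))  # → 0
```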
Retrieving different properties about articles
Articles have many properties that you can retrieve. Some of them are not returned by default, such as the list of mentioned concepts, categories, links and videos.
To modify which properties are returned, specify the `returnInfo` parameter of type `ReturnInfo`. With `ReturnInfo` you can specify which properties are returned for each type of returned object, such as articles, concepts, categories and events. With `QueryArticlesIter`, `returnInfo` is passed to the `execQuery` method:
```python
q.execQuery(er, ..., returnInfo = ReturnInfo(...))
```
`ReturnInfo` and its available parameters are described in detail here.
```python
ReturnInfo(
    articleInfo = ArticleInfoFlags(),                # details about the articles to return
    eventInfo = EventInfoFlags(),                    # details about the events to return
    sourceInfo = SourceInfoFlags(),                  # details about the news sources to return
    categoryInfo = CategoryInfoFlags(),              # details about the categories to return
    conceptInfo = ConceptInfoFlags(),                # details about the concepts to return
    locationInfo = LocationInfoFlags(),              # details about the locations to return
    storyInfo = StoryInfoFlags(),                    # details about the stories to return
    conceptClassInfo = ConceptClassInfoFlags(),      # details about the concept classes to return
    conceptFolderInfo = ConceptFolderInfoFlags())    # details about the concept folders to return
```
The following example query returns each article's list of concepts, categories, location and potential duplicates, as well as the location and image of its source:
```python
q = QueryArticlesIter(keywords = "Trump", sourceUri = "nytimes.com")
for art in q.execQuery(er,
        sortBy = "date",
        maxItems = 1,
        returnInfo = ReturnInfo(
            articleInfo = ArticleInfoFlags(concepts=True, categories=True, duplicateList=True, location=True),
            sourceInfo = SourceInfoFlags(location=True, image=True)
        )):
    print(json.dumps(art, indent=4))
```
Creating complex queries
In some cases you might want to create a query that cannot be expressed with the simple `QueryArticlesIter` constructor. An example of such a query would be:
Give me articles that are on the topic of business or mention Tesla Inc.
Keep in mind that the query
```python
QueryArticlesIter(
    conceptUri = er.getConceptUri("Tesla"),
    categoryUri = er.getCategoryUri("business"))
```
would return only articles that are on the topic of business and also mention Tesla Inc.
So how can we create the correct query? In such cases you have to use the Advanced Query Language.
Using this language, the query above would correctly look like this:
```python
qStr = """
{
    "$query": {
        "$or": [
            { "conceptUri": "%s" },
            { "categoryUri": "%s" }
        ]
    }
}
""" % (er.getConceptUri("Tesla"), er.getCategoryUri("business"))
print(qStr)
q = QueryArticlesIter.initWithComplexQuery(qStr)
for art in q.execQuery(er, maxItems = 1):
    print(art)
```
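Building the query string with `%` formatting works, but you are responsible for producing valid JSON yourself. Here is a minimal alternative sketch that builds the query as a Python dict and serializes it with `json.dumps`, which takes care of quoting and escaping. `or_query` is a hypothetical helper, and the URIs are placeholders for the values returned by `er.getConceptUri("Tesla")` and `er.getCategoryUri("business")`:

```python
import json

def or_query(*conditions):
    """Combine condition dicts with $or and return the query as a JSON string."""
    return json.dumps({"$query": {"$or": list(conditions)}}, indent=4)

# Placeholder URIs; in practice use er.getConceptUri(...) and er.getCategoryUri(...)
qStr = or_query({"conceptUri": "CONCEPT_URI"}, {"categoryUri": "CATEGORY_URI"})
print(qStr)
```

The resulting `qStr` can then be passed to `QueryArticlesIter.initWithComplexQuery(qStr)`, just like the hand-written string.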
A more complex example could look like this:
```python
qStr = """{
    "$query": {
        "$or": [
            { "conceptUri": "http://en.wikipedia.org/wiki/Artificial_Intelligence" },
            {
                "keyword": {
                    "$and": [ "deep learning", "machine learning" ]
                }
            }
        ],
        "$not": {
            "keyword": "data mining",
            "keywordLoc": "title"
        }
    },
    "$filter": {
        "dataType": ["news", "blog"],
        "isDuplicate": "skipDuplicates",
        "startSourceRankPercentile": 0,
        "endSourceRankPercentile": 30,
        "minSentiment": 0.2
    }
}"""
q = QueryArticlesIter.initWithComplexQuery(qStr)
for art in q.execQuery(er, maxItems = 1, sortBy = "date",
        returnInfo = ReturnInfo(articleInfo = ArticleInfoFlags(bodyLen=300))):
    print(json.dumps(art, indent=4))
```
Retrieving summaries of search results
The `QueryArticlesIter` class is great for obtaining the list of articles that match certain criteria. In some cases, however, you want a summary of the search results. Examples of such summaries are the top mentioned concepts, the top keywords, the timeline of the results and the top news sources.
We call such summaries aggregates. To obtain them, you have to use the `QueryArticles` class. `QueryArticles` accepts the same constructor arguments, plus an additional `requestedResult` argument, which can be an instance of any of these classes:
- `RequestArticlesInfo`: use to retrieve a list of articles
- `RequestArticlesUriWgtList`: returns a long list of article URIs
- `RequestArticlesTimeAggr`: returns the time distribution of search results
- `RequestArticlesConceptAggr`: returns the top concepts mentioned in the search results
- `RequestArticlesKeywordAggr`: returns the top keywords matching the search results
- `RequestArticlesCategoryAggr`: returns the top categories matching the search results
- `RequestArticlesSourceAggr`: returns the top news sources that authored the search results
- `RequestArticlesConceptGraph`: returns which top mentioned concepts frequently co-occur with other concepts
- `RequestArticlesDateMentionAggr`: returns which dates are frequently mentioned in the search results
To execute a search using the `QueryArticles` class, call the `execQuery` method of your `EventRegistry` instance:
```python
er.execQuery(q)
```
Two examples, requesting the time distribution and the top concepts of articles about Tesla, look like this:
```python
# time distribution of articles about Tesla
q = QueryArticles(
    conceptUri = er.getConceptUri("tesla"),
    requestedResult = RequestArticlesTimeAggr())
res = er.execQuery(q)
print(json.dumps(res, indent=4))

# top concepts mentioned in articles about Tesla
q = QueryArticles(
    conceptUri = er.getConceptUri("tesla"),
    requestedResult = RequestArticlesConceptAggr())
res = er.execQuery(q)
print(json.dumps(res, indent=4))
```
Searching for events
Events are collections of articles that we automatically identify as discussing the same thing that happened. Examples of events include the launch of the new iPhone on Sept 10, 2019, Trump firing John Bolton as national security adviser, Nissan's CEO resigning, etc.
Searching for events is very similar to searching for articles. There are two main classes for the search: `QueryEventsIter` and `QueryEvents`.
Use `QueryEventsIter` to retrieve the list of events that match a certain set of conditions.
Use the `QueryEvents` class to obtain various kinds of summaries about the events that match the search conditions.
Both classes let you specify several filters in the constructor, such as:
- `keywords`: find events that mention the keywords or phrases
- `conceptUri`: find events that mention the concept(s)
- `categoryUri`: find events that are about the category(s)
- `sourceUri`: find events covered by the given publisher(s)
- `sourceLocationUri`: find events covered by publishers located in the given location
- `authorUri`: find events reported in articles written by the given author(s)
- `locationUri`: find events that mention the given location in the dateline
- `lang`: find events reported in the given language(s)
- `dateStart`: find events that occurred on the given date or later (in the `YYYY-MM-DD` format)
- `dateEnd`: find events that occurred on or before the given date (in the `YYYY-MM-DD` format)
- `keywordsLoc`: if keywords are provided, where to search for them (`title` or `body` (default))
- `minSentiment`, `maxSentiment`: minimum and maximum sentiment value (from -1 to 1)
- `minArticlesInEvent`, `maxArticlesInEvent`: limit the results to events covered by a minimum or maximum number of articles
- `startSourceRankPercentile`: starting percentile of the sources that should cover the event (default: 0). The value should be in the range 0-90 and divisible by 10.
- `endSourceRankPercentile`: ending percentile of the sources that should cover the event (default: 100). The value should be in the range 10-100 and divisible by 10.
- `ignoreKeywords`, `ignoreConceptUri`, `ignoreCategoryUri`, ...: from the events that match the rest of the conditions, exclude those that match any of the provided filters
When multiple filters are specified, the results have to match all of the provided filters. For example, when keywords and sources are specified, the results will be events covered by these sources that mention the provided keywords.
If you want a search where a match on any one of the specified filters is enough, you'll have to use the Advanced Query Language.
An example query could look like this:
```python
q = QueryEventsIter(conceptUri = er.getConceptUri("Apple"))
for event in q.execQuery(er, sortBy = "size", maxItems = 1):
    print(json.dumps(event, indent = 4))
```
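The `dateStart` and `dateEnd` filters expect dates in the `YYYY-MM-DD` format. Here is a minimal sketch of computing such a window, from N days ago up to today, with the standard library. `last_n_days` is a hypothetical helper, and the fixed `today` value is only there to make the example reproducible:

```python
from datetime import date, timedelta

def last_n_days(n, today=None):
    """Return (dateStart, dateEnd) as YYYY-MM-DD strings, from n days ago up to today."""
    today = today or date.today()
    return (today - timedelta(days=n)).isoformat(), today.isoformat()

dateStart, dateEnd = last_n_days(7, today=date(2019, 9, 10))
print(dateStart, dateEnd)  # → 2019-09-03 2019-09-10

# Usage with the SDK (not executed here):
# q = QueryEventsIter(keywords = "Brexit", dateStart = dateStart, dateEnd = dateEnd)
```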
Retrieving a list of articles about an event
To retrieve a list of articles that discuss a single event, use the `QueryEventArticlesIter` class. The class requires the `eventUri` value, which is the unique ID of the event. You can also specify additional constraints that determine which subset of the event's articles to retrieve:
- `lang`: the language of the articles
- `keywords`: keywords that should be mentioned in the articles
- `conceptUri`: concepts that should be mentioned in the articles
- `categoryUri`: category that should be assigned to the articles
- `sourceLocationUri`: where the publisher of the articles should be located
- `authorUri`: the author of the articles
- `locationUri`: location that should be mentioned in the dateline of the articles
- `dateStart`: the date on or after which the articles should be published
- `dateEnd`: the date on or before which the articles should be published
- `keywordsLoc`: if keywords are set, where to search for them (`body` (default) or `title`)
- `startSourceRankPercentile`: minimum source rank of the returned articles (0-90, divisible by 10)
- `endSourceRankPercentile`: maximum source rank of the returned articles (10-100, divisible by 10)
- `minSentiment`: minimum sentiment of the articles (between -1 and 1)
- `maxSentiment`: maximum sentiment of the articles (between -1 and 1)
An example for a single event, identified by its event URI, could be:
```python
q = QueryEventArticlesIter("eng-5059598",
    lang = "eng",
    sourceLocationUri = er.getLocationUri("United States"),
    minSentiment = 0.2,
    endSourceRankPercentile = 30)
for art in q.execQuery(er, maxItems = 2,
        returnInfo = ReturnInfo(articleInfo = ArticleInfoFlags(bodyLen=300))):
    print(art)
```
Searching for events using complex queries
As with article search, each filter you add to `QueryEventsIter` can only narrow down the set of matching events. If you want a more complex query, with a Boolean OR between two different types of filters, you have to use the Advanced Query Language.
The syntax is the same as when searching for articles. An example of such a query could look like this:
```python
qStr = """{
    "$query": {
        "$or": [
            { "locationUri": "%s" },
            {
                "categoryUri": "%s",
                "conceptUri": "%s"
            }
        ]
    }
}""" % (er.getLocationUri("Washington"), er.getCategoryUri("politics"), er.getConceptUri("Trump"))
print(qStr)
q = QueryEventsIter.initWithComplexQuery(qStr)
for event in q.execQuery(er, sortBy = "size", maxItems = 1):
    print(json.dumps(event, indent = 4))
```
Retrieving summaries of search results
In addition to obtaining a list of events that match the search conditions, you can also obtain various summaries of the search results. To do so, use the `QueryEvents` class. The class accepts the same filtering parameters as the `QueryEventsIter` class, and it also accepts the `requestedResult` parameter, which should be set to one of the following values:
- `RequestEventsInfo`: returns a list of events
- `RequestEventsUriWgtList`: returns a long list of event URIs that match the search conditions
- `RequestEventsTimeAggr`: retrieves the time distribution of events in the search results
- `RequestEventsKeywordAggr`: retrieves the top keywords in the events that match the search conditions
- `RequestEventsLocAggr`: retrieves the locations where the events happened
- `RequestEventsLocTimeAggr`: retrieves the locations and times when the events happened
- `RequestEventsConceptAggr`: retrieves the top concepts mentioned in the events
- `RequestEventsConceptGraph`: retrieves the top concepts and their co-occurrences
- `RequestEventsSourceAggr`: retrieves the top sources in the events
- `RequestEventsDateMentionAggr`: retrieves the top dates mentioned in the events
- `RequestEventsCategoryAggr`: retrieves the top categories in the events
```python
# what is being mentioned the most in the events about China and US?
q = QueryEvents(
    conceptUri = QueryItems.AND([er.getConceptUri("China"), er.getConceptUri("United States")]),
    requestedResult = RequestEventsConceptAggr())
res = er.execQuery(q)
print(json.dumps(res, indent=4))

# what are the top categories in recent events about AI?
q = QueryEvents(
    conceptUri = er.getConceptUri("artificial intelligence"),
    requestedResult = RequestEventsCategoryAggr())
res = er.execQuery(q)
print(json.dumps(res, indent=4))
```