Accessing Scopus via Python

text mining
Python
bibliography
Author

Manish Datt

Published

January 12, 2023

Learn to parse data from Scopus using Python.

Any sound scientific inquiry is grounded in a meticulous review of the relevant literature. Today, we have multiple online repositories for storing and accessing literature from different disciplines. For instance, research articles in the biological sciences can be accessed through databases like PubMed, Scopus, Web of Science, etc. The information available through these databases is of immense value to the research community. However, the rapid growth of these databases over the last few years has made it challenging to keep up with new information. Also, modern research increasingly cuts across disciplines, so researchers now need to access and process literature from different sources. To address these issues, we need computational methods to search the online literature archives effectively. Literature mining is an emerging field that enables researchers to programmatically parse text data (scientific literature), not only to extract the required information but also to generate testable hypotheses. Here, we’ll look at one of the Python libraries for mining literature from the Scopus database. The example provided here can serve as a foundation for designing comprehensive and rigorous text-mining strategies.

Elsevier Developer Portal

The first thing you need to do is get an API key from the Elsevier Developer Portal to access Elsevier resources programmatically. The APIs are available at no charge for non-commercial use. However, an important point to note is that if you are not a Scopus subscriber, there’ll be a limit of up to 5,000 articles per search. On the other hand, if you have an active subscription, there is no such limit.

Pybliometrics library

Next, you need the pybliometrics library to access the Scopus database programmatically. The library can be installed by running the command pip install pybliometrics. Once you have the API key and have installed pybliometrics, the key must be saved in the configuration file (config.ini) as described in the library's documentation, or you may execute the following two lines of code to create the configuration file.

import pybliometrics
pybliometrics.scopus.utils.create_config()
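If you want to confirm that the key was actually saved, a quick check with the standard library can help. The following is only a minimal sketch: it assumes the configuration file is INI-formatted with an Authentication section holding an APIKey entry, and read_api_key is a hypothetical helper, not part of pybliometrics. The location of the file depends on your pybliometrics version, so the path is passed in as an argument.

```python
import configparser
import tempfile
from pathlib import Path


def read_api_key(config_path):
    """Return the first API key found in an INI-style config file, or None.

    Assumption: the key is stored under [Authentication] as APIKey,
    possibly as a comma-separated list of several keys.
    """
    parser = configparser.ConfigParser()
    parser.read(config_path)  # a missing file is skipped without an error
    raw = parser.get("Authentication", "APIKey", fallback="")
    keys = [key.strip() for key in raw.split(",") if key.strip()]
    return keys[0] if keys else None


# Demonstration on a throwaway file with a placeholder key (not a real one).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.ini"
    demo_path.write_text("[Authentication]\nAPIKey = 0123456789abcdef\n")
    demo_key = read_api_key(demo_path)
    missing_key = read_api_key(Path(tmp) / "absent.ini")

print(demo_key)
print(missing_key)
```

To check your own setup, call read_api_key with the path of the configuration file that pybliometrics created on your machine.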

We’ll now import the AbstractRetrieval, AuthorRetrieval, and ScopusSearch classes from the pybliometrics.scopus module. These classes can be used to search Scopus by DOI, author ID, or search string, respectively. We’ll also import pandas and matplotlib to facilitate parsing and visualization of the results. AbstractRetrieval takes a DOI as an argument and returns an object with multiple attributes that hold information about the document. For example, the title attribute gives the title of the article.

from pybliometrics.scopus import AbstractRetrieval
from pybliometrics.scopus import AuthorRetrieval
from pybliometrics.scopus import ScopusSearch
import pandas as pd
import matplotlib.pyplot as plt
ab = AbstractRetrieval("10.1016/j.softx.2019.100263")
ab.title
'pybliometrics: Scriptable bibliometrics using a Python interface to Scopus'
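Beyond title, it is often handy to pull several attributes at once. The sketch below collects a few into a dictionary. The summarize helper is hypothetical, and the code runs on a stand-in object with placeholder values so that it works without an API key. The attribute names used (title, publicationName, coverDate, citedby_count) are ones that AbstractRetrieval objects expose, but getattr with a default keeps the helper safe if one of them is absent.

```python
from types import SimpleNamespace


def summarize(doc, fields=("title", "publicationName", "coverDate", "citedby_count")):
    """Collect selected attributes of a document object into a dict.

    Attributes that the object does not have are reported as None.
    """
    return {name: getattr(doc, name, None) for name in fields}


# Stand-in for an AbstractRetrieval result; all values are placeholders.
stand_in = SimpleNamespace(
    title="Example title",
    publicationName="Example Journal",
    coverDate="2000-01-01",
)

summary = summarize(stand_in)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With a configured API key, the same helper could be called on the ab object created above, e.g. summarize(ab).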

Similarly, AuthorRetrieval can be used to search Scopus with a Scopus author ID as its argument. The returned object has a get_documents method that returns a list in which each element is one publication.

author_search = AuthorRetrieval("24398410800")
my_pubs = author_search.get_documents()
# Print the cover date and title of the first three publications.
for pub in my_pubs[:3]:
    print(pub.coverDate, pub.title)
2021-12-01 Interplay of substrate polymorphism and conformational plasticity of Plasmodium tyrosyl-tRNA synthetase
2021-07-15 Safranal inhibits NLRP3 inflammasome activation by preventing ASC oligomerization
2020-11-01 In silico assessment of natural products and approved drugs as potential inhibitory scaffolds targeting aminoacyl-tRNA synthetases from Plasmodium
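A list like my_pubs lends itself to simple summaries, such as the number of publications per year. Here is a minimal sketch using only the standard library. Doc is a stand-in record with just the two fields printed above (real records carry many more), publications_per_year is a hypothetical helper, and the sample holds the three entries from the output above so the code runs without an API key.

```python
from collections import Counter, namedtuple

# Stand-in for the records returned by get_documents().
Doc = namedtuple("Doc", ["coverDate", "title"])


def publications_per_year(docs):
    """Count documents per year, taking the year from coverDate (YYYY-MM-DD)."""
    counts = Counter(int(doc.coverDate[:4]) for doc in docs if doc.coverDate)
    return dict(sorted(counts.items()))


sample = [
    Doc("2021-12-01", "Interplay of substrate polymorphism and conformational plasticity of Plasmodium tyrosyl-tRNA synthetase"),
    Doc("2021-07-15", "Safranal inhibits NLRP3 inflammasome activation by preventing ASC oligomerization"),
    Doc("2020-11-01", "In silico assessment of natural products and approved drugs as potential inhibitory scaffolds targeting aminoacyl-tRNA synthetases from Plasmodium"),
]

per_year = publications_per_year(sample)
print(per_year)  # {2020: 1, 2021: 2}
```

Passing my_pubs instead of sample should give the per-year counts for the full publication list, which can then be handed to pandas or matplotlib for plotting.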
