In 2018, I conducted a study to identify the core topics in 98 full-text library and information science Electronic Theses and Dissertations (ETDs) from the period 2013-2017, using Topic-Modeling-Toolkit.
The dashboard visualizes the topic modeling results of the above study. It was prepared using Tableau on 25 December 2020 and updated on 13 June 2021. The data used to prepare the dashboard is available for download and can be searched by publication year, topic, and title keywords.
The major limitations of the Shodhganga repository that hold back text mining or any other kind of analysis are:
- No API
- Individual Chapter PDFs for each ETD
- No Abstract for ETDs in the metadata
- No option to download the metadata
These issues are the main reason I cannot update the dashboard frequently: it takes months just to download the data! And they do not even include the other uncertainties in the metadata. There are many instances where the “year” or “advisor” is missing from an ETD's metadata. Some ETDs are submitted as Word files, others as PDFs. I ran into all of these issues when I analysed this repository, and they are covered in my small study. Anyone who wants to do an analysis has to go through each ETD manually, which is not feasible for analyses that depend on a large corpus of data. Unless INFLIBNET fixes all of the above issues, the user experience of Shodhganga will never be competitive with databases like ProQuest.
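To make the missing-metadata problem concrete, here is a minimal sketch of how such gaps could be counted once records have been collected by hand. It is not the method used in the study. The field names (`title`, `year`, `advisor`), the helper functions, and the sample records are all assumptions made for illustration, since Shodhganga offers no metadata export to work from.

```python
from collections import Counter

# Hypothetical field names; the real metadata would have to be
# gathered manually because the repository has no export option.
REQUIRED_FIELDS = ("title", "year", "advisor")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in one record."""
    return [f for f in required if not str(record.get(f) or "").strip()]


def audit(records, required=REQUIRED_FIELDS):
    """Count, per field, how many records are missing that field."""
    counts = Counter()
    for record in records:
        counts.update(missing_fields(record, required))
    return {f: counts.get(f, 0) for f in required}


# Made-up sample records, for illustration only.
sample = [
    {"title": "Thesis A", "year": "2015", "advisor": "Advisor X"},
    {"title": "Thesis B", "year": "", "advisor": "Advisor Y"},
    {"title": "Thesis C", "year": "2017"},
]

report = audit(sample)
print(report)  # {'title': 0, 'year': 1, 'advisor': 1}
```

A count like this only shows how incomplete the collected metadata is. The missing values themselves would still have to be filled in by opening each ETD.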
PS: This study was conducted in 2018, and these issues have still not been fixed.