Summary The aim of this article is to analyze the web positioning factors that can influence the ordering of results by relevance in Google Scholar and to evaluate the importance of received citations in this ordering process. A reverse engineering methodology was applied, in which the Google Scholar ranking was compared with another ranking based only on the number of citations received by documents.
This investigation employed four types of searches without keywords: by publication, year, author, and “cited by”. The results were consistent across the four samples, with correlation coefficients between the two rankings exceeding 0.9. The present study demonstrates more clearly than previous research that citations are the most relevant off-page feature in the ranking of search results on Google Scholar. The other features have minimal influence. This information provides a solid basis for the academic search engine optimization (ASEO) discipline. We also developed a new analysis procedure for isolating off-page features that might be of practical use in forthcoming investigations.
Keywords ASEO; SEO; Reverse engineering; Citations; Google Scholar; Indicators; Rankings; Algorithms; Academic search engines.
1. Introduction
Search engine optimisation (SEO) is the process employed to optimise websites and their content to place them in favourable positions in search engine results (Enge; Spencer; Stricchiola, 2015). SEO is also a well-established profession within the new industry of digital communication, as shown by the existence of a wide range of monographs, professional publications and academic work. Its purpose is to highlight and strengthen the quality of documents to increase their visibility to the algorithms that establish the ranking positions in search engines, particularly Google. This goal must be achieved without falsifying the characteristics of documents, i.e., without employing fraudulent means.
Google Search results pages are ordered by relevance (Google, 2017). According to Google, this relevance criterion is calculated based on more than 200 features. Google does not specify these features or their specific weights; it merely discloses partial and general information, including that the quality of the content and backlinks are the two predominant factors (Ratcliff, 2016; Schwartz, 2016). The reason provided by Google for this lack of transparency is the fight against spam (Beel; Gipp, 2010). If all of the details of the ranking factors were made available, spammers could more easily place low-quality documents in favourable positions. Nevertheless, this black box policy works to the detriment of SEO professionals who conduct their activities ethically and whose work is hindered by a lack of reliable information.
Some SEO companies (Gielen; Rosen, 2016; Localseoguide, 2016; MOZ, 2015; Searchmetrics, 2016) conduct reverse engineering research to measure the impact of the factors involved in Google’s positioning process. In this research, many searches have been analysed to identify positioning factors based on the characteristics of pages placed in the first positions. Due to the great number of factors involved in the process of positioning, it is extremely difficult to establish the factors that are truly relevant and the extent to which they influence the final positioning of documents. In addition, Google’s positioning process is highly dynamic, with the algorithm undergoing dozens of changes per year (MOZ, 2017).
In recent years, SEO has been applied to academic search engines. This new practice is known as Academic SEO (ASEO) (Beel; Gipp, 2009b, 2010; Codina, 2016; Martín-Martín et al., 2016a; Muñoz-Martín, 2015). Scholars are placing increasing emphasis on enhancing the visibility of their articles in academic search engines. Articles appearing in the leading positions gain visibility, which increases their probability of being read and cited and, as a consequence, makes them more likely to improve the personal h-indices of their authors (Farhadi et al., 2013).
In many cases, the same optimisation procedures used successfully on Google Search are being applied to Google Scholar. However, Google Scholar has its own algorithm. Few studies have addressed the specific ordering factors employed by Google Scholar; among them are Beel and Gipp (2009b; 2009c; 2010); Beel, Gipp and Wilde (2010); Martín-Martín et al. (2014; 2017); and Orduña-Malea et al. (2016).
The purpose of the present study was to analyse the features of the documents that can influence relevance rankings in Google Scholar. We are particularly interested in the citations received by documents. We aimed to assess the influence of the number of citations received in the ranking algorithm. The number of times that a document is cited is a key feature for determining the specificity of the Google Scholar ranking process.
We believe that the influence of citations is much greater than authors and publishers might assume. For example, the instructions that academic journals give to authors provide guidelines regarding how to improve ranking positions in Google Scholar (Elsevier, 2012; Wiley, 2015; Emerald Publishing Limited, 2017). In these guides, the citations received are either not mentioned or not given the importance that they deserve.
This article reports the findings of a reverse engineering study that used a new method of analysis. This method allows us to isolate some factors of the positioning algorithm, specifically those depending on elements external to the ranked pages. In this manner, we could focus the study on a small set of factors with greater control. Our hypothesis is that if we compare a ranking based only on the number of citations received with the standard Google Scholar ranking, in searches in which only external factors participate, then we can identify the weight of citations within this set of external factors. If the two compared rankings are similar, then citations carry significant weight.
This new methodology is possible because of Google Scholar’s advanced search form, which allows users to restrict the search fields to the author, year and source. Only external factors participate in these types of searches in which there are no keywords. In this way, the results obtained herein are far more reliable than those of previous studies using reverse engineering on Google Search without this control of variables.
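The comparison between the two rankings described above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the citation counts are invented for illustration, and the choice of Spearman's rank correlation is an assumption, since this section does not state which correlation coefficient was used.

```python
from statistics import mean


def average_ranks(values, descending=False):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman_vs_citations(citations_in_scholar_order):
    """Spearman correlation between the search engine order and a citations-only order.

    citations_in_scholar_order[i] is the citation count of the document shown
    at position i + 1 in the results list (hypothetical input format).
    """
    n = len(citations_in_scholar_order)
    scholar_rank = list(range(1, n + 1))
    citation_rank = average_ranks(citations_in_scholar_order, descending=True)
    return pearson(scholar_rank, citation_rank)


# Invented example: only the documents at positions 2 and 3 are swapped
# relative to a pure citation ordering.
rho = spearman_vs_citations([120, 95, 97, 40, 12])
print(round(rho, 3))  # 0.9
```

A coefficient close to 1 means that the results order is almost the same as the order obtained from citation counts alone, which is the pattern the hypothesis predicts when citations dominate the external factors.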
2. Related works
Google Scholar has become an alternative to classic scientific citation indexing services, such as Web of Science (WoS) or Scopus. The positions of these commercial indexing services in the market could be jeopardised if Google Scholar offers a free product of similar quality. For this reason, Google Scholar has been analysed using several approaches:
- Comparative linear or coverage analysis, aiming to establish its quality and utility (Giustini; Boulos, 2013; Walters, 2008; De-Winter; Zadpoor; Dodou, 2014; Harzing, 2013; 2014; De-Groote; Raszewski, 2012; Orduña-Malea et al., 2014; 2015; Pedersen; Arendt, 2014; Jamali; Nabavi, 2015);
- Assessment of the impact of the authors, their citations or h-indices (Van-Aalst, 2010; Jacsó, 2008a; 2008b; 2009; 2012; Martín-Martín et al., 2014; 2017; Farhadi et al., 2013); and
- Assessment of the utility of Google Scholar for bibliometric studies regarding the quality of the scientific activity (Aguillo, 2012; Jacsó, 2009; Torres-Salinas; Ruiz-Pérez; Delgado-López-Cózar, 2009; Beel; Gipp, 2010; Delgado-López-Cózar et al., 2012; 2014; Martín-Martín et al., 2016b).
Limited research regarding the process of information retrieval and search effectiveness has, however, been conducted (Jamali; Asadi, 2010; Walters, 2008). Few works about the intervening factors in ranking algorithms according to relevance have been published (Beel; Gipp, 2009a; 2009b; 2009c; Beel; Gipp; Wilde, 2010).
Unlike the process of positioning in Google Search, that used in Google Scholar has aroused little scientific interest, which is somewhat unexpected considering that it influences the articles that are read. It is widely acknowledged that the first items appearing on a search result list receive more attention from users than subsequent items do (Marcos; González-Caro, 2010). A better position in the ranking implies better chances of being found and read.
Some conclusions can be drawn from the existing works regarding relevance rankings in Google Scholar:
- The keywords used in the search must appear in the document’s title to enable favourable positioning of the document (Beel; Gipp, 2009a);
- The frequency of keywords in the text of the document does not appear to be a determining factor in establishing its ranking order (Beel; Gipp, 2009a);
- Recent articles are more highly ranked than older articles (Beel; Gipp, 2009a) to compensate for the Matthew effect (Merton, 1968): articles with many citations tend to be ranked first; therefore, these articles have more readers and more citations and consequently consolidate their positions at the top (Martín-Martín et al., 2016b); and
- The number of citations received is a determining factor in establishing the ranking order by relevance (Beel; Gipp, 2009c; Martín-Martín et al., 2014).
The latter conclusion is particularly relevant to the present study. However, these investigations have some limitations. In Beel and Gipp (2009c), all SEO features were analysed together; therefore, the variables related to on-page features were not blocked, and the results are not sufficiently clear.
In Martín-Martín et al. (2014), only searches by year were used. The central aim of the present research was to corroborate this conclusion by applying a methodology that establishes stricter control over variables. This methodology allowed us to obtain an accurate insight into the relevance of received citations in relation to all external features of the ranking algorithm in Google Scholar.
Access and download
Rovira, Cristòfol; Guerrero-Solé, Frederic; Codina, Lluís (2018). “Received citations as a main SEO factor of Google Scholar results ranking”. El profesional de la información, v. 27, n. 3, pp. 559-569.
References
Aguillo, I.F. (2012). “Is google scholar useful for bibliometrics? A webometric analysis”. Scientometrics, v. 91, n. 2, pp. 343-351. https://doi.org/10.1007/s11192-011-0582-8
Beel, J.; Gipp, B. (2009a). “Google scholar’s ranking algorithm: an introductory overview”. In: Proceedings of the 12th international conference on scientometrics and informetrics, ISSI’09, pp. 230-241.
Beel, J.; Gipp, B. (2009b). “Google scholar’s ranking algorithm: the impact of articles’ age (an empirical study)”. In: Sixth international conference on information technology: new generations, ITNG’09, pp. 160-164.
Beel, J.; Gipp, B. (2009c). “Google scholar’s ranking algorithm: the impact of citation counts (an empirical study)”. In: Third international conference on research challenges in information science, RCIS 2009, pp. 439-446.
Beel, J.; Gipp, B. (2010). “Academic search engine spam and google scholar’s resilience against it”. The Journal of Electronic Publishing, v. 13, n. 3, pp. 1-28. https://doi.org/10.3998/3336451.0013.305
Beel, J.; Gipp, B.; Wilde, E. (2010). “Academic search engine optimization (ASEO) optimizing scholarly literature for google scholar & co”. Journal of Scholarly Publishing, v. 41, n. 2, pp. 176-190. https://doi.org/10.3138/jsp.41.2.176
Codina, L. (2016). SEO académico: definición, componentes y guía de herramientas. https://www.lluiscodina.com/seo-academico-guia/
de Groote, S.L.; Raszewski, R. (2012). “Coverage of google scholar, scopus, and web of science: a case study of the h-index in nursing”. Nursing Outlook, v. 60, n. 6, pp. 391-400. https://doi.org/10.1016/j.outlook.2012.04.007
de Winter, J.C.F.; Zadpoor, A.A.; Dodou, D. (2014). “The expansion of google scholar versus web of science: a longitudinal study”. Scientometrics, v. 98, n. 2, pp. 1547-1565. https://doi.org/10.1007/s11192-013-1089-2
Elsevier (2012). “Get found: optimize your research articles for search engines”. Elsevier Connect. https://www.elsevier.com/connect/get-found-optimize-your-research-articles-for-search-engines
Enge, E.; Spencer, S.; Stricchiola, J. (2015). The art of SEO: mastering search engine optimization. Sebastopol CA: O’Reilly Media, ISBN: 9781491903643 https://books.google.co.in/books?id=hg5iCgAAQBAJ
Farhadi, H.; Salehi, H.; Yunus, M.M.; Aghaei Chadegani, A.; Farhadi, M.; Fooladi, M.; Ale Ebrahim, N. (2013). “Does it matter which citation tool is used to compare the h-index of a group of highly cited researchers?”. Australian Journal of Basic and Applied Sciences, v. 7, n. 4, pp. 198-202. https://ssrn.com/abstract=2259614
Gielen, M.; Rosen, J. (2016). “Reverse engineering the youtube”. Tubefilter.com. http://www.tubefilter.com/2016/06/23/reverse-engineering-youtube-algorithm/
Giustini, D.; Boulos, M.N.K. (2013). “Google Scholar is not enough to be used alone for systematic”. Online Journal of Public Health Informatics, v. 5, n. 2, pp. 1-9. https://doi.org/10.5210/ojphi.v5i2.4623
Google. About google scholar. http://scholar.google.com/intl/en/scholar/about.html
Google. How google search works. Learn how google discovers, crawls, and serves web pages, search console help. https://support.google.com/webmasters/answer/70897?hl=en
Harzing, A.W. (2011). The publish or perish book: your guide to effective and responsible citation analysis. Melbourne, Australia: Tarma Software Research Pty Ltd, ISBN: 978 1 60752 120 4 https://EconPapers.repec.org/RePEc:spr:scient:v:88:y:2011:i:1:d:10.1007_s11192-011-0388-8
Harzing, A.W. (2013). “A preliminary test of google scholar as a source for citation data: a longitudinal study of nobel prize winners”. Scientometrics, v. 94, n. 3, pp. 1057-1075. https://doi.org/10.1007/s11192-012-0777-7
Harzing, A.W. (2014). “A longitudinal study of google scholar coverage between 2012 and 2013”. Scientometrics, v. 98, n. 1, pp. 565-575. https://doi.org/10.1007/s11192-013-0975-y
Jacsó, P. (2008a). “Testing the calculation of a realistic h-index in google scholar, scopus, and web of science for FW Lancaster”. Library Trends, v. 56, n. 4, pp. 784-815. https://doi.org/10.1353/lib.0.0011
Jacsó, P. (2008b). “The pros and cons of computing the h-index using google scholar”. Online Information Review, v. 32, n. 3, pp. 437-452. https://doi.org/10.1108/14684520810889718
Jacsó, P. (2009). “Calculating the h-index and other bibliometric and scientometric indicators from google scholar with the publish or perish software”. Online Information Review, v. 33, n. 6, pp. 1189-1200. https://doi.org/10.1108/14684520911011070
Jacsó, P. (2012). “Using Google scholar for journal impact factors and the h-index in nationwide publishing assessments in academia–siren songs and air-raid sirens”. Online Information Review, v. 36, n. 3, pp. 462-478. https://doi.org/10.1108/14684521211241503
Jamali, H.R.; Asadi, S. (2010). “Google and the scholar: the role of Google in scientists’ information-seeking behaviour”. Online Information Review, v. 34, n. 2, pp. 282-294. https://doi.org/10.1108/14684521011036990
Jamali, H.R.; Nabavi, M. (2015). “Open access and sources of full-text articles in Google Scholar in different subject fields”. Scientometrics, v. 105, n. 3, pp. 1635-1651. https://doi.org/10.1007/s11192-015-1642-2
Lemon, J. (2006). “Plotrix: a package in the red light district of R”. R-News, v. 6, n. 4, pp. 8-12.
Localseoguide (2016). Local SEO ranking factors study 2016, localseoguide. http://www.localseoguide.com/guides/2016-local-seo-ranking-factors/
Delgado-López-Cózar, E.; Robinson-García, N.; Torres-Salinas, D. (2012). Manipular google scholar citations y google scholar metrics: simple, sencillo y tentador, EC3 working papers. Granada: Universidad de Granada. http://hdl.handle.net/10481/20469
Delgado-López-Cózar, E.; Robinson-García, N.; Torres-Salinas, D. (2014). “The Google scholar experiment: how to index false papers and manipulate bibliometric indicators”. Journal of the Association for Information Science and Technology, v. 65, n. 3, pp. 446-454. https://doi.org/10.1002/asi.23056
Maciá, F. (2015). SEO: técnicas avanzadas. Barcelona: Anaya.
Marcos, M.-C.; González-Caro, C. (2010). “Comportamiento de los usuarios en la página de resultados de los buscadores. Un estudio basado en eye tracking”. El profesional de la Información, v. 19, n. 4, pp. 348-358. https://doi.org/10.3145/epi.2010.jul.03
Martín-Martín, A.; Ayllón, J.M.; Orduña-Malea, E.; Delgado-López-Cózar, E. (2016a). Google Scholar Metrics released: a matter of languages… and something else. Granada: Universidad de Granada. https://arxiv.org/abs/1607.06260v1
Martín-Martín, A.; Orduña-Malea, E.; Ayllón, J.M.; Delgado-López-Cózar, E. (2016b). “Back to the past: on the shoulders of an academic search engine giant”. Scientometrics, v. 107, n. 3, pp. 1477-1487. https://doi.org/10.1007/s11192-016-1917-2
Martín-Martín, A.; Orduña-Malea, E.; Ayllón, J.M.; Delgado-López-Cózar, E. (2014). Does Google Scholar contain all highly cited documents (1950-2013)? EC3 working papers. Granada: Universidad de Granada. https://arxiv.org/abs/1410.8464
Martín-Martín, A.; Orduña-Malea, E.; Harzing, A.W.; Delgado-López-Cózar, E. (2017). “Can we use Google Scholar to identify highly-cited documents?”. Journal of Informetrics, v. 11, n. 1, pp. 152-163. https://doi.org/10.1016/j.joi.2016.11.008
Mayr, P.; Walter, A.-K. (2007). “An exploratory study of google scholar”. Online Information Review, v. 31, n. 6, pp. 814-830. https://doi.org/10.1108/14684520710841784
Merton, R.K. (1968). “The Matthew effect in science: the reward and communication systems of science are considered”. Science, v. 159, n. 3810, pp. 56-63. https://doi.org/10.1126/science.159.3810.56
Moed, H.F.; Bar-Ilan, J.; Halevi, G. (2016). “A new methodology for comparing Google Scholar and Scopus”. Journal of Informetrics, v. 10, n. 2, pp. 533-551. https://doi.org/10.1016/j.joi.2016.04.017
MOZ (2015). Search engine ranking factors 2015, moz.com. https://moz.com/search-ranking-factors/correlations
MOZ (2017). Google algorithm change history, moz.com. https://moz.com/google-algorithm-change
Muñoz-Martín, B. (2015). “Incrementa el impacto de tus artículos y blogs: de la invisibilidad a la visibilidad”. Revista de la Sociedad Otorrinolaringológica de Castilla y León, Cantabria y La Rioja, v. 6, n. Suppl. 4, pp. 6-32. http://hdl.handle.net/10366/126907
Orduña-Malea, E.; Ayllón, J.M.; Martín-Martín, A.; Delgado-López-Cózar, E. (2014). About the size of Google Scholar: playing the numbers. EC3 working papers. Granada: Universidad de Granada.
Orduña-Malea, E.; Ayllón, J.M.; Martín-Martín, A.; Delgado-López-Cózar, E. (2015). “Methods for estimating the size of Google Scholar”. Scientometrics, v. 104, n. 3, pp. 931-949. https://doi.org/10.1007/s11192-015-1614-6
Orduña-Malea, E.; Martín-Martín, A.; Ayllón, J.; Delgado-López-Cózar, E. (2016). La revolución google scholar: destapando la caja de Pandora académica. Granada: Editorial Universidad de Granada. University of New England, ISBN: 9788433859419
Pedersen, L.A.; Arendt, J. (2014). “Decrease in free computer science papers found through google scholar”. Online Information Review, v. 38, n. 3, pp. 348-361. https://doi.org/10.1108/OIR-07-2013-0159
Ratcliff, C. (2016). WebPromo’s Q&A with google’s Andrey Lipattsev, search engine watch. https://searchenginewatch.com/2016/04/06/webpromos-qa-with-googles-andrey-lipattsev-transcript
Revelle, W. Psych: procedures for personality and psychological research. Northwestern University. https://CRAN.R-project.org/package=psych
Schwartz, B. (2016). Now we know: here are google’s top 3 search ranking factors, search engine land. http://searchengineland.com/now-know-googles-top-three-search-ranking-factors-245882
Searchmetrics (2016). Rebooting ranking factors. http://www.searchmetrics.com/knowledge-base/ranking-factors/
R Core Team. R: a language and environment for statistical computing, R foundation for statistical computing. https://www.R-project.org
Torres-Salinas, D.; Ruiz-Pérez, R.; Delgado-López-Cózar, E. (2009). “Google scholar como herramienta para la evaluación científica”. El Profesional de la Información, v. 18, n. 5, pp. 501-510. https://doi.org/10.3145/epi.2009.sep.03
van Aalst, J. (2010). “Using google scholar to estimate the impact of journal articles in education”. Educational Researcher, v. 39, n. 5, pp. 387-400. http://journals.sagepub.com/doi/abs/10.3102/0013189X10371120
van der Graaf, P. Reverse engineering search engine algorithms is getting, searchenginewatch. https://searchenginewatch.com/sew/how-to/2182553/reverse-engineering-search-engine-algorithms-getting-harder
Walters, W.H. (2008). “Google scholar search performance: Comparative recall and precision”. portal: Libraries and the Academy, v. 9, n. 1, pp. 5-24. https://doi.org/10.1353/pla.0.0034
Wiley (2015). Writing for SEO. https://authorservices.wiley.com/author-resources/Journal-Authors/Prepare/writing-for-seo.html