Estimation of Impact Factors

Scientific excellence is often boiled down to a few numbers. For academics it is publish or perish, and bibliographic analysis is an important factor in academic careers. A bibliographic analysis covers how many papers somebody published, where they published, and how many citations their publications received. One of the numbers meant to sum up the quality of publications is the impact factor, which classifies journals and is often taken as a proxy for the quality of individual publications in those journals. There are services that calculate these impact factors, most prominently ISI Web of Knowledge; however, access is limited (they are subscription based) and they only publish impact factors for journals that have existed for at least four years. Here I briefly discuss a method based on Google Scholar to estimate impact factors, and I use it to estimate an impact factor for the journal Frontiers in Systems Neuroscience.

What is the impact factor?

I am only paraphrasing slightly from the Wikipedia article on the impact factor:

In a given year, the impact factor of a journal is the average number of citations received per paper published in that journal during the two preceding years. [...] Papers published includes citable items, which are usually articles, reviews, proceedings, or notes; not editorials or Letters-to-the-Editor.
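As a minimal sketch of this definition in Python (the function name and the example counts are illustrative assumptions, not figures for any real journal):

```python
def impact_factor(citations_to_prior_two_years, citable_items_prior_two_years):
    """Impact factor for year Y.

    Numerator: citations received in year Y by items published in Y-1 and Y-2.
    Denominator: citable items (articles, reviews, ...) published in Y-1 and Y-2.
    """
    if citable_items_prior_two_years <= 0:
        raise ValueError("need at least one citable item")
    return citations_to_prior_two_years / citable_items_prior_two_years


# Hypothetical journal: 120 citable items in the two preceding years,
# cited 300 times during the year of interest.
example_if = impact_factor(300, 120)
print(example_if)
```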


There are alternative strategies for evaluating journal impact, such as the Eigenfactor, which is probably a better indicator of importance than the impact factor; however, the impact factor is commonly used and cited.

Average citations

I am trying to estimate the impact factor from Google Scholar, using the Publish or Perish software as a search front-end.

I searched for the journal Frontiers in Systems Neuroscience for the years 2009 to 2010. The results from Publish or Perish are below.

Papers: 110          Cites/paper: 4.82       h-index: 13       AWCR: 239.50
Citations: 530       Cites/author: 182.54    g-index: 16       AW-index: 15.48
Years: 3             Papers/author: 41.93    hc-index: 17      AWCRpA: 81.49
Cites/year: 176.67   Authors/paper: 3.25     hI-index: 4.02    e-index: 7.55
                                             hI,norm: 6        hm-index: 7.98

The number we are looking for is cites/paper: 4.82.
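As a quick sanity check, the per-paper and per-year averages can be recomputed from the raw counts in the Publish or Perish output. This is only a sketch; the variable names are my own.

```python
# Raw counts as reported by Publish or Perish for the search above.
papers = 110
citations = 530
years = 3

cites_per_paper = citations / papers
cites_per_year = citations / years

print(round(cites_per_paper, 2))  # 4.82
print(round(cites_per_year, 2))   # 176.67
```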

Discounting

The impact factor counts only citations received during the year after the publication period, whereas Google Scholar counts all citations received since publication. We should therefore discount the citations that fall outside that year. This is not easily possible in Google Scholar. Because citation patterns over time should be similar across journals within a scientific domain, I suggest discounting by a factor derived from other journals for which the impact factor is known. Citations probably follow a log curve over time, but a scalar discount factor could suffice for our purpose.

I will now calculate a discount factor based on impact and citation data for two journals, Neuron and PLOS Biology.

According to Google Scholar, papers published in Neuron during 2009-2010 received an average of 20.25 citations since publication. Neuron's impact factor according to ISI Web of Knowledge is 14.027. Therefore, the discount factor should be 14.027/20.25, roughly 0.69.

For PLOS Biology (impact factor 12.472) the average citations since publication for papers during the period 2009-2010 is 23.755. The discount factor should therefore be 12.472/23.755, roughly 0.52.

The higher discount factor for Neuron could mean that articles in PLOS Biology have a shorter half-life (i.e. Neuron articles get cited for longer periods of time).
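The two discount factors can be reproduced with a few lines of Python. This is a sketch using only the numbers quoted above; the dictionary layout and function name are my own choices.

```python
# Reported impact factor and Google Scholar cites/paper (2009-2010), from the text.
reference_journals = {
    "Neuron": {"impact_factor": 14.027, "scholar_cites_per_paper": 20.25},
    "PLOS Biology": {"impact_factor": 12.472, "scholar_cites_per_paper": 23.755},
}


def discount_factor(impact_factor, scholar_cites_per_paper):
    """Share of the Google Scholar average that the impact factor represents."""
    return impact_factor / scholar_cites_per_paper


discount_factors = {
    name: discount_factor(**values) for name, values in reference_journals.items()
}

for name, factor in discount_factors.items():
    print(f"{name}: {factor:.3f}")
```

Before rounding to two decimals, the factors come out at about 0.693 for Neuron and 0.525 for PLOS Biology.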

Estimated impact factors

For the journal Frontiers in Systems Neuroscience, discounted with the PLOS Biology factor, the estimated impact factor would be 4.82*0.52, roughly 2.51. With the Neuron discount factor, the estimated impact factor would be 4.82*0.69, roughly 3.33.
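The same arithmetic as a small Python sketch, using the discount factors rounded to two decimals as in the text (the variable names are illustrative):

```python
# Google Scholar cites/paper for Frontiers in Systems Neuroscience (2009-2010).
cites_per_paper = 4.82

# Discount factors, rounded to two decimals as quoted in the text.
discounts = {"PLOS Biology": 0.52, "Neuron": 0.69}

estimates = {
    reference: round(cites_per_paper * factor, 2)
    for reference, factor in discounts.items()
}
print(estimates)  # {'PLOS Biology': 2.51, 'Neuron': 3.33}
```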

I tried this out with other journals. For the Journal of Neuroscience, Publish or Perish's limit of 1000 papers was reached, so the estimate (11.04) is skewed by publications with higher impact that come first in the search results. Splitting the search with some arbitrary extra queries might help, but I am moving on to other journals. For PLOS Genetics I got "Cites/paper: 12.15", which would be 6.44 and 8.38 when discounted with the PLOS Biology and Neuron factors, respectively, while the 2010 impact factor is 9.543.
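Why the 1000-paper limit inflates the average can be illustrated with made-up numbers. The sketch below uses purely synthetic citation counts and assumes, as a simplification, that results arrive strictly most-cited first; real search rankings are not that clean.

```python
# Synthetic citation counts for 3000 hypothetical papers (not real data):
# paper i gets i % 40 citations, so every count from 0 to 39 occurs 75 times.
citation_counts = [i % 40 for i in range(3000)]

RESULT_LIMIT = 1000  # cap on returned results, as described in the text

true_mean = sum(citation_counts) / len(citation_counts)

# Simplifying assumption: results are sorted by citation count, highest first,
# and everything beyond the limit is silently dropped.
top_results = sorted(citation_counts, reverse=True)[:RESULT_LIMIT]
truncated_mean = sum(top_results) / len(top_results)

print(f"all papers: {true_mean:.2f} cites/paper")
print(f"first {RESULT_LIMIT} results only: {truncated_mean:.2f} cites/paper")
```

In this toy setup the truncated average is well above the true one, which is the direction of the skew described for the Journal of Neuroscience.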

The Journal of Computational Neuroscience reports an impact factor of 2.325 on its web page, while I get 4.43 cites/paper, which would be discounted to 2.35 using the PLOS Biology factor. Frontiers in Computational Neuroscience has an impact factor (as of 2010) of 2.586, and I find 3.13 cites/paper from Google Scholar; discounted, this would amount to 1.66 and 2.16, respectively.
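To see the comparison side by side, here is a sketch that applies both discount factors to the journals above and reports how far the closer estimate is from the published impact factor. It recomputes the factors from the unrounded reference data, so the second decimal can differ slightly from the rounded figures in the text; the helper name `compare` is my own.

```python
# Discount factors recomputed from the reference journals in the text.
PLOS_BIOLOGY = 12.472 / 23.755
NEURON = 14.027 / 20.25

# (journal, Google Scholar cites/paper, reported impact factor), all from the text.
journals = [
    ("PLOS Genetics", 12.15, 9.543),
    ("Journal of Computational Neuroscience", 4.43, 2.325),
    ("Frontiers in Computational Neuroscience", 3.13, 2.586),
]


def compare(cites_per_paper, reported):
    """Return low estimate, high estimate, and relative error of the closer one."""
    low = cites_per_paper * PLOS_BIOLOGY
    high = cites_per_paper * NEURON
    closest = min((low, high), key=lambda estimate: abs(estimate - reported))
    relative_error = (closest - reported) / reported
    return low, high, relative_error


rows = [(name, *compare(cites, reported), reported) for name, cites, reported in journals]

for name, low, high, relative_error, reported in rows:
    print(
        f"{name}: estimate {low:.2f} to {high:.2f}, "
        f"reported {reported}, closer estimate off by {relative_error:+.0%}"
    )
```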

So the estimate from Google Scholar is sometimes very crude, but it may be indicative for similar journals.

Conclusions

As indicated before, this estimate has to be taken with a grain of salt. Google Scholar results are ordered by PageRank, so you have to take care not to lose the less-cited papers in the analysis. Important in this context is that Frontiers is very well indexed (DOAJ, CrossRef, PubMed Central and PubMed, Google Scholar, SCOPUS), which means that no papers get lost; otherwise we might lose papers that are not indexed or not cited. This could mean that Google Scholar estimates for Frontiers journals are better than for journals that are not as well indexed.

Google Scholar takes into account a very broad spectrum of journals and many conferences. The ISI Web of Knowledge impact factor includes only citations from journals. It also excludes self-citations; however, self-citations (as I found in some study) do not co-vary (at least not significantly) with the number of citations of a paper, which means that self-citations should not distort the results, at least if you compare different results with each other.

Please leave a comment below for questions and suggestions.