The English Wikipedia reached the 3 million article mark today. This is undoubtedly a sign of its continued expansion and an opportunity to examine some numbers and talk about Wikipedia's future growth. In this post I look at how Wikipedia's growth has progressed.
First, watch Jimmy Wales, its co-founder, talk about the 3 million article milestone.
Recently it was discussed on Slashdot whether Wikipedia is approaching the limit of its growth. Slashdot members have been connected to the early growth of Wikipedia, and I think it's safe to say that many of them have contributed to it at least once. Many comments in the discussion complained about disruptive behavior by other Wikipedia contributors and the increasingly closed nature of its community.
Has Wikipedia reached its limits? Let's look at the numbers.
I took data from the Wikipedia pools, where people bet on when certain milestones will be reached, and plotted it. The abscissa gives the number of days since Wikipedia's inception (taken as January 16th, 2001), the ordinate the number of articles.
I admit that this is of course a very crude statistic consisting of only 5 points; however, it was fast to plot. I remember that back in 2005 or early 2006 there was a discussion on the Wikipedia statistics page about whether Wikipedia's growth was exponential, with more credibility given to the exponential growth hypothesis; however, most of the graphs there are outdated. To get more data I could have downloaded the whole Wikipedia history (several GB) or taken data from the milestone page, but I think the conclusion is warranted from this plot: the growth is definitely not exponential, at least since early 2006.
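One rough way to check the exponential hypothesis against the same five milestone points: under exponential growth the doubling time stays constant, so the doubling time implied by each pair of consecutive milestones should come out about the same. The following is only a minimal sketch under that assumption. The variable names (t, rate, doubling) are illustrative, and it uses the built-in datenum instead of the scripts listed at the end of this post.

```matlab
% Milestone dates (year, month, day) and article counts from Wikipedia:Pools.
dates = [2005 3 17; 2005 8 4; 2006 3 1; 2007 9 9; 2009 8 17];
articles = [500000 666666 1000000 2000000 3000000];

% Days since 16 January 2001 (the inception date assumed in this post).
% datenum counts calendar days, so leap years are handled.
t = (datenum(dates(:,1), dates(:,2), dates(:,3)) - datenum(2001, 1, 16))';

% Average number of new articles per day between consecutive milestones.
rate = diff(articles) ./ diff(t);

% Under exponential growth log(articles) is linear in time, so this
% slope (and the doubling time derived from it) would be constant.
logslope = diff(log(articles)) ./ diff(t);
doubling = log(2) ./ logslope;

% One line per interval, labelled by the day the milestone was reached.
fprintf('%6d days: %7.0f articles/day, doubling time %6.0f days\n', ...
        [t(2:end); rate; doubling]);
```

With these five points the implied doubling time gets longer from one interval to the next (from roughly 340 days at the start to roughly 1200 days for the last interval), which is what you would expect if the growth is slower than exponential. With so few points this is an illustration, not a fit.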
This probably means that there are bottlenecks to its growth. Growth itself, measured here as the accumulation of new articles, does not necessarily have to be seen as positive. Within the Wikipedia community there are essentially two factions when it comes to adding new articles.
Whether quality is actually improving is another question that I will not look at in this post.
MATLAB scripts used to calculate and plot:
statcalc.m
% data taken from http://en.wikipedia.org/wiki/Wikipedia:Pools
pools={'March 17, 2005','August 4, 2005','March 1, 2006','September 9, 2007','August 17, 2009'};
number_of_articles=[500000,666666,1000000,2000000,3000000];
% days since the start of the project for each milestone
days=zeros(1,numel(pools));
for i=1:numel(pools)
    days(i)=since_project(pools{i});
end
figure;
plot(days,number_of_articles,'*-');
ylabel('number of articles reached');
xlabel('days passed');
since_project.m
function d=since_project(datestring)
% calculates time passed (in days) since wikipedia exists
% uses time_passed function.
% wikipedia existed since 16 January 2001, see http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_records
d=time_passed('16 January 2001',datestring);
time_passed.m
function d=time_passed(date1str,date2str)
% calculate time passed between two dates in days
% dates can be in any format GNU date understands, e.g. '16 January 2001' or 'March 17, 2005'
% used for wikipedia statistics
% requires GNU date; asks for seconds since the epoch in UTC, so leap years are counted correctly
[a,s1]=unix(['date -u -d "' date1str '" +%s']);
[a,s2]=unix(['date -u -d "' date2str '" +%s']);
d=round((str2double(strtrim(s2))-str2double(strtrim(s1)))/86400);
Wikipedia is very useful to me; I often search it and get information from it. And best of all, it is free.