Ubuntu for Research

Every so often it's time to upgrade to a newer release of your operating system of choice. If you use Linux and work in research, this article might be just for you. Here are some instructions for installing Precise Pangolin (12.04), the Ubuntu long-term support (LTS) release, which has just come out and will be supported until 2017.

I wrote two earlier posts dedicated to installing software for research in Ubuntu. This is an update for Ubuntu Precise Pangolin (12.04), a long-term support release. Installation should be extremely easy and basically a one-line command. You just have to love package managers.

If you want to install from scratch, download the new release from the Ubuntu site. Otherwise, if you want to upgrade, press Alt-F2 (or open a terminal) and type
update-manager -d

Once you are done with the basic installation or upgrade, you are ready for more. This means more software repository sources, more software, more libraries. Below comes a basic list. Please feel free to suggest other programs in the comment section.


Connectivity

  • The ssh server.
  • Google Chromium (Chrome) - because it's very fast and now supports smart bookmarks (at least in the Linux versions)
  • Dropbox - great for synchronizing files across computers and between users. Works with different operating systems.

Programming

Statistical tools

Article writing and reference management

Illustration

Others

Get a new and comprehensive Ubuntu sources list from the Ubuntu sources list generator. Choose your country, your Ubuntu release (12.04), and all the software sources that you think could be useful. I chose, for example, the Canonical partner and restricted repositories and the Google Linux repository, among others. After pressing generate sources, you'll get a file that you should save as /etc/apt/sources.list (don't forget to back up the old one). On the same page, you'll find how to add the repository keys, which you should do before updating your sources (apt-get update).

For the Mendeley repository (you might want to check that 11.04 is still the latest release they officially provide packages for), run
sudo sh -c 'echo "deb http://www.mendeley.com/repositories/xUbuntu_11.04 /" >> /etc/apt/sources.list'

For the Adobe Reader repository, run
sudo apt-add-repository "deb http://archive.canonical.com/ $(lsb_release -sc) partner"

Download the latest package information:
sudo apt-get update

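I put all the software mentioned above and some more into a single command, so you have everything installed in one go. No need to search or to sit around and try to fix things. Note that the command calls aptitude inside its command substitutions, which the shell evaluates before apt-get starts, so install aptitude first (sudo apt-get install aptitude) if it is not already on your system.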

sudo apt-get install aptitude openssh-server build-essential gcc gcc-doc apt-file gsl-bin gsl-doc-pdf gsl-ref-html libgsl0-dev gsl-bin gsl-doc-pdf libgsl0-dbg libgsl0ldbl glibc-doc libblas-dev maxima maxima-share subversion subversion-tools git screen $(aptitude search R| grep -v ^i | awk '{print $2}' | grep ^r-) octave $(aptitude search texlive | grep -v ^i | awk '{print $2}') untex luatex perl fontforge context-nonfree context-doc-nonfree dvipng imagemagick graphviz gnuplot-x11 gnuplot-doc gnuplot libatlas3gf-base kdevelop kate kile vim-gtk vim vim-addon-manager vim-common vim-doc vim-latexsuite latex2html latex-beamer xpdf writer2latex jabref bibutils hevea hevea-doc wordnet cups-pdf djvulibre-bin djvulibre-plugin pdfedit inkscape scribus pdf2djvu pdf2svg pdftk python-gdbm ipython python3-dev python3-all python-scipy unrar tofrodos epiphany-browser epiphany-extensions scribes lyx claws-mail claws-mail-i18n claws-mail-doc claws-mail-tools libqt4-core libqt4-gui ubuntu-restricted-extras regionset soundconverter gxine libxine1-ffmpeg libstdc++5 libmms0 vim aptitude zim mendeleydesktop -y --force-yes icedtea-plugin sun-javadb-client gimp acroread wine colordiff moreutils

This will take a while...

For Google Chrome I posted instructions earlier. This time I followed other instructions, which are as follows:

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
sudo apt-get update
sudo apt-get install google-chrome-stable

Linux is an international effort and has been translated into many languages. Depending on the "locale" you choose, your week may start on Sunday, Saturday, or Monday. Say you are not a Christian or a Jew: you might want to change the first weekday in your calendar to Monday. Here's how.

What I had difficulty finding at first was how to add applications to the launcher sidebar. Press the Windows key and search for "Main Menu". You might have to install it first (you will get prompted). That's the application for editing the application menu that pops up when you press the Windows key, and from this menu you can drag and drop items to the launcher sidebar.

You might want to see my other articles for more tips, such as (for a short selection) smart bookmarks for faster web searches, how to synchronize web browser bookmarks on different workstations, how to personalize the vim editor, how to set up a revision control repository, and how to automatically synchronize data.
You can also see the UbuntuScience community page for some additional information.
Enjoy. Please leave a comment below for questions and suggestions.

Analyzing Chess Positions Using Crafty

[Image: Playing chess with the Xboard GUI. When you play or analyze games with crafty, you can visualize the board as well as crafty's output with Xboard. Via the Xboard site.]

If you want to raise your chess playing skills to a higher level, you need to analyze positions and chess games. Even on off-the-shelf computers, chess programs are stronger than most grandmasters, so what could be more useful in this respect than chess programs? One of the strongest chess programs is Crafty, which has been developed for many years by Robert Hyatt, who works as an associate professor at the University of Alabama at Birmingham (UAB). Let's see how to install and use crafty.

Are you interested in the world chess championship, which is currently going on in Moscow, or do you want to analyze past matches that you played on-line?

As I just stated, crafty is one of the best chess engines out there. Furthermore, it's free as in beer and as in speech. You can get it by following the instructions on its official page. On Debian-based Linux (such as Ubuntu) you can install it with
sudo apt-get install crafty

In Ubuntu, this will also install the Xboard interface, which I highly recommend using. For Windows, you'll find executables on the crafty homepage, and for the interface you will find ports on the Xboard site.

Xboard also works with other engines apart from crafty, such as dreamchess, fairymax, hoichess, sjeng, gnuchess, phalanx, fruit, glaurung, toga2, and stockfish. The last one is particularly strong, but it doesn't support endgame tablebases (EGTB). I'll get back to stockfish later.

I should note here that you should not use a chess engine to cheat in games you're currently playing. Whether using a chess program constitutes cheating is controversial, but on some on-line playing sites computer assistance is frowned upon or prohibited (compare gameknot's policy on using computer programs and the rules of the German Correspondence Chess Federation). You should only use chess programs where you don't get an unfair advantage. Post-match analysis, which gives clues about missed moves and helps you analyze games, is unproblematic. As an example, you might want to use crafty to annotate your game files, or you can put it in analyze mode while you step through the moves and look for sudden jumps in the evaluation. I also find it very interesting to compare well-known high-level matches and play less explored lines that seem appealing (referring to reference material should be unproblematic).
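As a rough illustration of what looking for sudden jumps means, here is a minimal Python sketch. It assumes you have already noted one engine evaluation per half-move (in pawns), for example by reading them off the analysis output. The function name find_eval_jumps, the threshold of 1.5 pawns, and the sample numbers are all made up for this example and are not part of crafty.

```python
def find_eval_jumps(evals, threshold=1.5):
    """Return (index, before, after) for each sudden swing in evaluation.

    evals: engine evaluations in pawns, one per half-move (hypothetical input).
    threshold: smallest absolute change that counts as a jump (assumed value).
    """
    jumps = []
    for i in range(1, len(evals)):
        # Compare each evaluation with the one before it.
        if abs(evals[i] - evals[i - 1]) >= threshold:
            jumps.append((i, evals[i - 1], evals[i]))
    return jumps


if __name__ == "__main__":
    # Made-up evaluations, for illustration only.
    sample = [0.2, 0.3, 0.1, -1.9, -2.0, -0.2]
    for index, before, after in find_eval_jumps(sample):
        print(f"half-move {index}: {before:+.2f} -> {after:+.2f}")
```

The half-moves it reports are the ones worth revisiting on the board, since a large swing usually points to a missed move by one side.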

Expert players not only have a highly developed sense of positions and tactical combinations, but also a broad repertoire of openings and a deep knowledge of endgames. To get this knowledge into crafty, you can supply it with an opening book and endgame databases, respectively. I found the on-line explanations about databases a bit sketchy, so here they come in more detail.

For the opening database you need to download a file with matches, so that crafty can build statistics from it for each position. This file can be a PGN file, or you can convert it to PGN (in Linux you have the convert-pgn utility). You can find the "enormous" opening book database (careful, big file) at this address, and others on the official crafty page. Apart from the files that you find with crafty, you can download other files, such as specialized PGN files for different openings, for example from here.

You need to compile the opening database first (see readme). In crafty, you type
book create filename 60

EGTBs you can find here or on other sites.

Crafty needs to find these files if you want to use them. On my Linux system, I put these databases in /opt/crafty. Crafty's configuration file is called .craftyrc in Linux (in your home directory) and crafty.rc in Windows (no idea where to locate it). My configuration file looks like the following:

mt=2
tbpath=/opt/crafty/
bookpath=/opt/crafty/
egtb !
hash=384M
hashp=128M
cache=32M
ponder=off
smpnice=1
log=off
show book
exit

If you want to dumb it down to beginner level, put the lines set ply 5 and ponder off.

If you want to use a graphical user interface (GUI), there's XBoard, which you can call like this:
xboard -fcp "crafty"

So much for crafty. However, crafty does not give you statistics about moves, which are especially useful for choosing between opening lines. Scid comes with this functionality.

You can create opening books for scid with polyglot. Scid uses a different book format (the same one that fruit and togaII use). For example:
polyglot make-book -pgn BDG2.PGN -bin poly.bin -max-ply 60

You then have to copy the generated file to /usr/share/scid/books/.

However, this book format does not include win/loss statistics. If you want those, you have to create a new database in scid and open a PGN file (for example enormous.pgn); then you can use the opening report (ctrl+shift+o). The opening report includes, among other things, a section "Moves and Themes" with frequencies, scores, etc. for different moves and positional themes.

You can use stockfish and other engines in scid via Tools -> Analysis Engine. Alternatively, for stockfish, use this command:
xboard -fcp stockfish -fUCI

Cheers! Please leave a comment below for questions and suggestions.

Estimation of Impact Factors

Scientific excellence is often boiled down to a few numbers. For academics, it is publish or perish, and bibliographic analysis is an important factor in academic careers. A bibliographic analysis includes how many papers somebody published, where they published, and how many citations their publications received. One of the numbers that sums up the quality of publications is the impact factor, which classifies journals and is often taken as a measure of the quality of individual publications in those journals. There are services that calculate these impact factors, most prominently ISI Web of Knowledge. However, they provide limited access (they are subscription based) and they only publish impact factors for journals that have existed for at least four years. Here I briefly discuss a method based on Google Scholar to estimate impact factors, and I use it to estimate an impact factor for the journal Frontiers in Systems Neuroscience.

What is the impact factor?

I am only paraphrasing slightly from the Wikipedia article on the impact factor:

In a given year, the impact factor of a journal is the average number of citations received per paper published in that journal during the two preceding years. [...] Papers published includes citable items, which are usually articles, reviews, proceedings, or notes; not editorials or Letters-to-the-Editor.


There are alternative strategies for evaluating journal impact, such as eigenfactors, which are probably a better indicator of importance than the impact factor. However, the impact factor is commonly used and cited.

Average citations

I am trying to estimate the impact factor from Google Scholar, using the Publish or Perish software as a search front-end.

Search for journal Frontiers in systems neuroscience between 2009 and 2010. Results from publish or perish below.

Papers: 110         Cites/paper: 4.82     h-index: 13     AWCR: 239.50
Citations: 530      Cites/author: 182.54  g-index: 16     AW-index: 15.48
Years: 3            Papers/author: 41.93  hc-index: 17    AWCRpA: 81.49
Cites/year: 176.67  Authors/paper: 3.25   hI-index: 4.02  e-index: 7.55
                                          hI,norm: 6      hm-index: 7.98

The number we are looking for is the cites/paper: 4.82.

Discounting

The impact factor counts only citations received during the year after the publication period, whereas the Google Scholar figure counts all citations received since publication. Therefore, we should discount the citations that fall outside that year. This is not easily possible in Google Scholar. Because citation patterns over time should be similar across journals within a scientific domain, I suggest discounting by a factor derived from other journals for which the impact factor is known. The citations probably follow a log-curve over time; however, a scalar discount factor could suffice for our purpose.

I will now calculate a discount factor based on impact and citation data for two journals, Neuron and PLOS Biology.

According to Google Scholar, papers published in Neuron during 2009-2010 have received an average of 20.25 citations since publication. Neuron's impact factor according to ISI Web of Knowledge is 14.027. Therefore, the discount factor should be 14.027/20.25, roughly 0.69.

For PLOS Biology (impact factor 12.472), the average number of citations since publication for papers from the period 2009-2010 is 23.755. The discount factor should therefore be 12.472/23.755, roughly 0.53.

The higher discount factor for Neuron could mean that articles in PLOS Biology have a shorter half-life (i.e. Neuron articles get cited for longer periods of time).

Estimated impact factors

For the journal Frontiers in Systems Neuroscience, discounted with the PLOS Biology factor, the estimated impact factor would be 4.82*0.53, roughly 2.55. With the Neuron discount factor, the estimated impact factor would be 3.33.
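The discounting arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation described above, using the numbers quoted in the text. The function names are made up for this example, and rounding the discount factor to two decimals before applying it is an assumption.

```python
def discount_factor(impact_factor, cites_per_paper):
    """Ratio of a known impact factor to the Google Scholar cites/paper."""
    return impact_factor / cites_per_paper


def estimate_impact_factor(cites_per_paper, factor):
    """Scale Google Scholar cites/paper by a scalar discount factor."""
    return cites_per_paper * factor


# Reference journals quoted in the text:
# (impact factor, Google Scholar cites/paper for 2009-2010).
REFERENCES = {
    "Neuron": (14.027, 20.25),
    "PLOS Biology": (12.472, 23.755),
}

if __name__ == "__main__":
    target_cites = 4.82  # Frontiers in Systems Neuroscience, cites/paper
    for journal, (impact, cites) in REFERENCES.items():
        factor = round(discount_factor(impact, cites), 2)  # assumed rounding
        estimate = estimate_impact_factor(target_cites, factor)
        print(f"{journal}: factor {factor:.2f}, estimate {estimate:.2f}")
```

To try another journal, replace target_cites with its cites/paper value from Publish or Perish.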

I tried this out with other journals. For the Journal of Neuroscience, Publish or Perish's limit of 1000 papers was reached, so the estimate (11.04) is skewed by publications with higher impact that come first in the search results. Introducing some arbitrary search queries might help, but I am moving on to other journals. For PLOS Genetics I got "Cites/paper: 12.15", which would be 6.44 and 8.38 discounted, respectively, while the impact factor for 2010 is 9.543.

The Journal of Computational Neuroscience reports an impact factor of 2.325 on its web page, while I get 4.43 cites/paper, which would be discounted to 2.35 using the PLOS Biology factor. Frontiers in Computational Neuroscience has an impact factor (as of 2010) of 2.586, and I find 3.13 cites/paper from Google Scholar; discounted, this would amount to 1.66 and 2.16, respectively.

So the estimate from google scholar is sometimes very crude, but maybe indicative for similar journals.

Conclusions

As indicated before, this estimation has to be taken with a grain of salt. Google Scholar results are ordered by pagerank, so you have to take care not to lose the less-cited papers in the analysis. Important in this context is that Frontiers is very well indexed (DOAJ, CrossRef, PubMed Central and PubMed, Google Scholar, SCOPUS), which means that no papers get lost; otherwise we might lose papers that are not indexed or not cited. This could mean that estimates from Google Scholar are better for Frontiers journals than for other journals that are not as well indexed.

Google Scholar takes into account a very broad spectrum of journals and many conferences. The ISI Web of Knowledge impact factor includes only citations from journals. It also excludes self-citations; however, self-citations (as I found in some study) do not co-vary (at least not significantly) with the number of citations of a paper, which means that self-citations do not distort the results, at least if you compare different results.

Please leave a comment below for questions and suggestions.

Handwashing Behavior - Or: Should I take the Peanuts?

[Image: particles on the skin. Minuscule particles between dermal ridges in the hand, hardly seen by the naked eye. Via Wikipedia.]

I don't think I am obsessed with personal hygiene, although I am averse to certain behaviors, such as when you pick your nose next to me and then flick your snot in my direction, or when you reach out to touch me after having touched dirty things on the street. What sometimes sets me off is seeing people exit the bathroom without washing their hands. I was also surprised, in toilets with shared faucets, to see this frequently with women (or should I rather say, not to see it). Now what about the peanuts in the bar? The guy who just grabbed 10 more peanuts than his hand could hold and let half of them fall back into the bowl, what did he touch before? Should you really eat any of the peanuts? How many people wash their hands anyway?

In 1847 Ignaz Semmelweis showed that hand washing in maternity wards significantly reduced the mortality rate of childbed fever, from around 10 percent to around 1 percent, although at the time he became rather unpopular for it. In 1876, Robert Koch demonstrated that anthrax was caused by the bacterium Bacillus anthracis and provided evidence for Pasteur's germ theory. Despite the implications for hygiene being so clear, it is again a case of theory against practice, as case studies show.

Results from studies on hand washing behavior vary, due in part to differences in experimental protocol. Amanda Stinson concluded in her article "Hand washing behavior of women in public bathrooms" that, due to increased self-awareness, subjects were more likely to wash their hands when someone else was present washing their hands. Stinson distinguished between three conditions in her study on the hand washing behavior of women:
  1. No observer is visible
  2. A person is talking on a cell phone next to the faucet
  3. A person is washing her hands

She found that overall only 40 percent of young women washed their hands. In the hand washing condition (3), the subjects were more likely to wash their hands (56 percent), while in the cell phone condition (2), subjects were less likely to wash their hands (27 percent). Stinson also found a strong and highly significant negative correlation between the time of night and whether or not the subject washed her hands.

In some studies it is not very clear whether the social factor mentioned above was taken into account, so it is difficult to compare data across different studies. It also becomes clear from another study (see below) that age and education could be correlated variables. I therefore mention only one more study to compare men's and women's hand washing behavior.

In the study "Gender and ethnic differences in hand hygiene practices among college students" by Anderson and colleagues, it is not completely clear how they observed people in restrooms. However, they make no mention of controlling the social variable, and I would speculate that the difference from the results above could be explained in terms of social pressure.

What they found is that men washed their hands in 38 percent of cases and women in 62 percent, which suggests that hand hygiene is better in females than in males. In the discussion, Anderson and colleagues reference earlier similar evidence. They argue that females' higher compliance could be associated with their tendency to practice socially acceptable behaviors. This, however, would also mean that having somebody watch you in the bathroom would have a stronger effect on women than on men, so the question of who is more hygienic, men or women, cannot be answered conclusively.

Interestingly, Anderson and colleagues found that the minority students exhibited better hand hygiene practices than the Caucasian students. Comparing other studies they find that hand washing behavior in this college student population was only slightly higher than in populations of middle school and high school students. As for adequacy of hygiene, they report that only a small proportion of those who washed their hands did so for 20 seconds.

To come back to the original question: should I take the peanuts? I think that's a question of priority: just how hungry are you?

Enjoy those peanuts. At least as long as you can. Please leave a comment below for questions and suggestions. If you liked this article you might also want to read about the speed of nail growth.