So now that I have more than a passing interest in chemistry (and
therefore
cheminformatics/chemi-informatics/chemoinformatics/whatever-you-call-it),
I'd like to see what the state of the art is for representing chemical
information and if there are any decent libraries for working with
these representations. At first glance, I'm in luck. There's CML, the
Chemical Markup Language, there's the Blue Obelisk set of projects for
open source/open data/open standards in chemistry, and there's the
CDK, the Chemistry Development Kit. This all sounds promising. Let's
dive in.
Working in reverse order, let's start with CDK. Google CDK and it
shows up as the first hit -- even before the Cyclin Dependent
Kinases. This is good. Now let's follow the link. Uh oh. It's a link
to sourceforge. That's alright, we'll click through and hope for the
best. Ah, not only is it on sf.net, but it's a wiki site:
http://apps.sourceforge.net/mediawiki/cdk/index.php?title=Main_Page.
Ugh. Alright, I'll try to overcome my biases and keep plugging
away. It's not that all wikis are bad, but rather that, IMO, they are
a poor substitute for a properly designed web site for a project. They
certainly have a place, but the idea that all web content gets
wiki-ized can lead to some rather difficult-to-follow web pages,
again, IMO. Wikipedia is a good counterexample to my
claim, but most other wiki sites don't have the complete, polished
feel of Wikipedia. In any event, let's keep plugging ahead with CDK.
Ah, here we go. Two publications in the peer-reviewed literature. This
should help give us an overview of what CDK has and where it is
going. One is in the Journal of Chemical Information and Modeling
(although it seems that when the article was published it was called
the Journal of Chemical Information and Computer Sciences). Alright,
sounds promising. Click through the DOI link, which takes us to a page
of the American Chemical Society, the world's largest scientific
society. Surely, this being a paper about an open-source toolkit and
ACS being a society for the betterment of society, this is going to be
an open-access journal, right? Or at least an open-access publication
in a mixed-access journal, right? Click on the link to get the
PDF... Get PDF -- WRONG! $30 for 48 hours of access. Damn. Ok, well,
let's get the other paper, there were two on the CDK wiki. The next
one is in something called Current Pharmaceutical Design. Uh oh. This
doesn't sound promising. And, sure enough: "The full text electronic
article is available for purchase. You will be able to download the
full text electronic article after payment. $55.10 plus tax." This
isn't getting us anywhere. At least CDK is open source. Let's go get
the source. Well, first let's browse the documentation anyway.
Click over to the documentation link on sourceforge:
http://apps.sourceforge.net/mediawiki/cdk/index.php?title=Documentation. How
is the documentation different than what I was looking at before, or
what's the difference between the documentation and the main page?
Who knows. In any case, this looks promising: "A great source of CDK
documentation or introductory reading is the CDK News, the quarterly
newsletter of the CDK team." Click through. Ok, there's a picture of
the (presumably) most recent issue, which is a link to the table of
contents and a note about getting the PDF: "The full issue can be
downloaded as PDF from http://sf.net/projects/cdk/". Hmm... Ok, click
through that... And we're back on sf.net. Oh wait, that's not a
link. Just some text. Cut the URL and paste into the nav bar in the
browser... Hmm. Now we're at another sf.net page. So far we've got the
"main" page, the "documentation" page and now the, presumably,
"project" page. But now that we're there, we see that there is no
mention of CDK News on this page. Damn. Alright, let's start clicking
and see what we find. Ok, under the "Download -- Browse All Packages"
link we get to a page that has CDK News on it. Maybe now we're
getting somewhere. Click through that and we have a nice list of the
various "Releases" of CDK News (this seems like an abuse of the
Release mechanism, if you ask me -- these aren't different versions of
the same thing, rather distinct issues, all of which should live on,
but, OK, I think I see what they did there). Let's start at the
beginning. Click on 1/1. Hmm... That just expands the HTML a bit to
show another link for cdknews1.1.pdf. Ok, click that. Ah yes, now I
remember why I hate sf.net. Do I get the PDF in my browser? NO! I get
a window with a whole bunch of orange reminding me the name of the web
site I now hate so much, some links to "share" the project (whatever
the hell that means!), for related stuff and for forums (yet another
set of pages to not get the info I want?), some google ads and some guy
who looks like he hasn't slept in a week carrying stacks of cash,
presumably, in an ad for nortel. Oh, and a link telling me who's
providing this oh-so-handy mirror. Oh, and I almost forgot, some nice
sponsor links! As opposed to the ads, I suppose. Where's my damn PDF?
Who knows. Ah, this is helpful. Please use this "direct link":
http://voxel.dl.sourceforge.net/sourceforge/cdk/cdknews1.1.pdf. Why
the hell didn't they just give me that link in the first place??? Ok,
finally got CDK News 1.1. I suppose a (direct) link off of the CDK
home page (or at least one of them) would have been too helpful for
irritating newbies like myself. Ok, time to go read the first CDK
News, since I can't get the peer-reviewed articles about the
project. More later.