Cyrus' New Completely Useless Blog

New Intel NUC Firmware Released General

A new version of the BIOS (version 0039) for the Intel Skylake NUCs (e.g. NUC6I5SYK) has been released.

It can be found at: https://downloadcenter.intel.com/download/25864/BIOS-Update-SYSKLi35-86A-

The reason I mention this is that prior to this release, I couldn't properly boot from the NVMe drive. Well, I could, but I had to use a boot loader on the SATA drive to then tell the OS to boot from the NVMe drive. Now I can boot from either disk. Yay! These new NUCs are quite nice.

ABCL Error Handling Lisp

There's probably a better way to do this, but I have been having a difficult time trying to, from the lisp side of things, track down the cause of errors signaled from java code.

It turns out that we can use lisp's normal error handling facilities to work with java errors. The following snippet triggers a java NullPointerException and if we just evaluate this in SLIME we don't actually see the java backtrace (or at least I don't see it -- of course it would be nice if there were a way to do so).

(handler-case
    ;; this will throw an NPE
    (java:jstatic-raw "getenv" "java.lang.System" nil)
  (error (e)
    ;; this prints the stack trace to the jvm's standard error, which
    ;; when running under slime, ends up in our *inferior-lisp* buffer.
    (print (#"printStackTrace" (java:java-exception-cause e)))))

But this isn't so great as the stack trace is printed to the inferior-lisp buffer. To see it in SLIME's output buffers, we can use ABCL's getMessage routine as follows:

(handler-case
    ;; this will throw an NPE
    (java:jstatic-raw "getenv" "java.lang.System" nil)
  (error (e)
    ;; this prints the exception type and the stack trace to SLIME's STDOUT
    (print (#"getMessage" e))))

Having this certainly makes it easier to find the source of errors in java code called from ABCL.
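One way to get the full java stack trace into the SLIME REPL, rather than the *inferior-lisp* buffer, is to print it into a java.io.StringWriter and hand the resulting string back to lisp. This is only a sketch: the helper names java-stack-trace-string and demo-stack-trace are made up for illustration, and the code assumes the condition is a java:java-exception, which is what ABCL signals for exceptions thrown by called java code. It provokes a NumberFormatException with Integer.parseInt so there is something to look at.

```lisp
(defun java-stack-trace-string (condition)
  "Return the stack trace of the java throwable wrapped by CONDITION as a lisp string.
Hypothetical helper; CONDITION is assumed to be a java:java-exception."
  (let* ((throwable (java:java-exception-cause condition))
         (string-writer (java:jnew "java.io.StringWriter"))
         (print-writer (java:jnew "java.io.PrintWriter" string-writer)))
    ;; Throwable.printStackTrace(PrintWriter) writes into our StringWriter
    ;; instead of the JVM's standard error stream.
    (java:jcall "printStackTrace" throwable print-writer)
    (java:jcall "flush" print-writer)
    (java:jcall "toString" string-writer)))

(defun demo-stack-trace ()
  "Trigger a java exception and return its stack trace as a string."
  (handler-case
      ;; Integer.parseInt throws a NumberFormatException for this input
      (java:jstatic "parseInt" "java.lang.Integer" "not-a-number")
    (java:java-exception (e)
      (java-stack-trace-string e))))

;; Writes to *standard-output*, which under SLIME is the REPL buffer.
(write-string (demo-stack-trace))
```

Since the result is an ordinary lisp string, it can also be logged, searched, or attached to a lisp condition of your own.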

CDK Debugging Computational Biology

[Well, this is more computational chemistry than computational biology, but I didn't want this to show up on planet.lisp.org, so I'm using this category]

If you want to turn on logging with CDK, the magic JVM incantation arguments are:

-Dcdk.debugging=true -Dcdk.debug.stdout=true

I'm not sure how to get logging for only a single class yet. Perhaps that will be next. In the meantime, at least basic logging works for me in Eclipse now.
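If CDK is being driven from ABCL rather than Eclipse, it can be handy to check that the JVM actually received those -D arguments. The following is a small sketch of such a check; the helper name jvm-property is made up, and whether CDK consults these properties only at startup is something I haven't verified.

```lisp
(defun jvm-property (name)
  "Return the JVM system property NAME as a lisp string.
Hypothetical helper; an unset property should come back as NIL."
  (java:jstatic "getProperty" "java.lang.System" name))

;; Print the two CDK logging properties as the running JVM sees them.
(dolist (name '("cdk.debugging" "cdk.debug.stdout"))
  (format t "~a = ~s~%" name (jvm-property name)))
```

If both lines show "true", the flags made it through to the JVM that ABCL is running in.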

Common Lisp and Java Lisp

Tales of Woe

So... in an attempt to use preexisting wheels, rather than reinvent my own at every turn, I've been trying to get a decent Common Lisp environment working with the CDK (Chemistry Development Kit). My abcl-cdk adventures actually went reasonably well and I was able, eventually, to get ABCL talking nicely to CDK. Of course I wanted more than just that, I wanted interoperability between the CDK and my half-round wheel, chemicl, a cheminformatics package I started writing in Common Lisp. This is where the train began to fall off the tracks.

ABCL and cxml-stp

A while back, in an earlier, aborted attempt to get some of my chem/bioinformatics (https://github.com/slyrus/cl-bio) stuff working with ABCL, I noticed that plexippus-xpath couldn't be loaded into ABCL. This was fixed, so I was encouraged that things might work with ABCL. (While I'm on a rant, the ABCL trac issue tracker is really slow...) However, cxml-stp seems to break ABCL.

Hopefully this is a fixable bug and some future version of ABCL will work with cxml-stp.

In the meantime...

SBCL and Java

So, I figured I'd try some other approaches to getting Java and a Common Lisp implementation to play nice. I know, you're thinking "why doesn't the dude just use clojure? After all, that's what clojure was designed for!" Well, that's a good question. I did use clojure for some earlier explorations with CDK and, while the java integration generally works well, I have a bunch of existing Common Lisp code I'd like to use and, at the time at least, it seemed like all of the clojure wrappers were thin wrappers around ugly Java libraries. I've grown to know and love many Common Lisp libraries, many of which are nicely available in QuickLisp, and I'd like to be able to use those (things like cxml-stp, plexippus-xpath, opticl, etc...).

Anyway, I tried to get some sort of SBCL Java interoperability working. Three possibilities appeared: 1) jfli, 2) foil and 3) cl+j. Turns out jfli is (was?) Rich Hickey's pre-clojure Java foreign language interface for Common Lisp. I'm guessing that the challenges in getting jfli to work with any reasonable Common Lisp implementation were part of the motivation behind clojure. In any event, it doesn't seem that jfli works under SBCL.

Next, I looked at foil, which appears to use sockets to communicate to another process running a JVM. This sounded suboptimal but, presumably, workable. Turns out foil looks like some sort of windows-only beast with a bunch of C# files. Not for me.

Finally, I looked at cl+j and it turns out there are some scary warning messages about how cl+j can't possibly work with SBCL's foreign threads handling mechanism. Bummer. This seems somewhat unreasonable on SBCL's part. Surely some amount of engineering should make it possible to have both a JVM and SBCL's runtime running in the same process. Unfortunately, I'm too out of practice with SBCL internals to give this much of a go at this point. Bummer again.

CCL and Java

Ok, next approach. How about cl+j and Clozure Common Lisp (CCL)? Seemed reasonable, but, unfortunately, hung just like SBCL did. Presumably this is more of a MacOS issue than a CCL issue, as cl+j is supposed to work with CCL, but maybe just on other non-mac platforms.

Now what?

So, it seems I'm stuck without a viable approach to using the common lisp libraries I want and the java libraries I want in the same process. Perhaps the ABCL bug will get fixed. Perhaps JVM integration would make a good summer project for the next SBCL Summer of Code.

More fun with CDK and ABCL Lisp

ticagrelor

The drug ticagrelor (marketed as Brilinta by AstraZeneca) is an inhibitor of platelet activation and aggregation that has been shown to reduce the frequency of cardiovascular events in patients with acute coronary syndrome.

The CHEBI page for ticagrelor tells us that the SMILES for ticagrelor is:

CCCSc1nc(N[C@@H]2C[C@H]2c2ccc(F)c(F)c2)c2nnn([C@@H]3C[C@H](OCCO)[C@@H](O)[C@H]3O)c2n1

So we can read that in as follows:

(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system 'abcl-cdk))

(cl:defpackage :ticagrelor
  (:use :common-lisp :abcl-cdk))

(cl:in-package :ticagrelor)

(defparameter *ticagrelor*
  (read-smiles-string
   "CCCSC1=NC2=C(C(=N1)N[C@@H]3C[C@H]3C4=CC(=C(C=C4)F)F)N=NN2[C@@H]5C[C@@H]([C@H]([C@H]5O)O)OCCO"))

And we can render a 2-d depiction as follows:

(mol-to-svg *ticagrelor* "ticagrelor.svg")
CL-USER> (in-package :ticagrelor)
#<PACKAGE TICAGRELOR>
TICAGRELOR> (mol-to-svg *ticagrelor* "ticagrelor.svg")
"ticagrelor.svg"

ticagrelor SVG

Notice the 6 nice chiral bonds. This is all well and good, but let's jazz things up a bit by rendering the molecule on a black background with white bonds:

(let ((*background-color* (java:jfield "java.awt.Color" "black"))
      (*default-bond-color* (java:jfield "java.awt.Color" "white")))
  (mol-to-svg *ticagrelor* "ticagrelor-inverted.svg"))

ticagrelor inverted SVG

There, now we have a nice pretty picture of ticagrelor. Thanks CDK!

ABCL-CDK update part 2 Lisp

An update on using the Chemistry Development Kit (CDK) with ABCL, Part 2

Rendering Stereochemical Molecules

You may recall that in my original blog post on using CDK with ABCL I had an example for reading a description of a molecule (a SMILES string) and rendering a picture of the 2-d structure of the molecule. Let's take another look at this process and see where things went awry and how they have gotten better.

The following line reads in a description of the amino acid valine and returns a new AtomContainer object:

(defparameter *valine* (abcl-cdk:parse-smiles-string "CC(C)[C](C(=O)O)N"))

Evaluating this gives:

CL-USER> (defparameter *valine* (abcl-cdk:parse-smiles-string "CC(C)[C](C(=O)O)N"))
*VALINE*
CL-USER> *valine*
#<org.openscience.cdk.AtomContainer AtomContainer(1954296239, #A:8, .... {50F523C0}>

We can write this molecule to an SVG file with the following:

(abcl-cdk:mol-to-svg *valine* "valine.svg")

valine SVG

So far so good. But the problem is that valine actually comes in two forms that are mirror images of each other. Think a left-handed version, l-valine, and a right-handed version, d-valine. The central carbon atom in valine has four neighbors, two carbons (which are functionally distinct as they themselves have distinct neighbors), a nitrogen, and a hydrogen. These four neighbors are arranged in a tetrahedral configuration and can be arranged in two distinct non-superimposable configurations, giving rise to a tetrahedral chiral center. A given chiral molecule and its mirror image are known as enantiomers.

Let's assume that we're really interested in the biologically important enantiomer, l-valine. Fortunately the SMILES spec has support for representing this information and we can write (and read) l-valine as:

(defparameter *l-valine* (abcl-cdk:parse-smiles-string "CC(C)[C@@H](C(=O)O)N"))

The problem with the 2012-era CDK was that it just ignored this information and, until recently, didn't draw the 2-d structure in such a way as to show the stereochemistry.

Luckily, recent changes to the CDK add support for precisely this.
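As an aside, the chirality marks are easy to spot in the SMILES text itself: a tetrahedral center is written as a bracket atom carrying @ or @@. Here is a rough sketch that counts such atoms in a string. It is purely textual, it is not a SMILES parser and doesn't touch CDK at all, and the name count-chiral-atoms is made up for illustration.

```lisp
(defun count-chiral-atoms (smiles)
  "Count bracket atoms in SMILES that carry an @ or @@ chirality mark.
Hypothetical helper; a plain text scan, not a real SMILES parser."
  (let ((count 0)
        (in-bracket nil)
        (marked nil))
    (loop for ch across smiles
          do (case ch
               ;; a bracket atom starts: forget any mark seen earlier
               (#\[ (setf in-bracket t
                          marked nil))
               ;; a bracket atom ends: count it if it carried a mark
               (#\] (when (and in-bracket marked)
                      (incf count))
                    (setf in-bracket nil))
               (#\@ (when in-bracket
                      (setf marked t)))))
    count))

(count-chiral-atoms "CC(C)[C](C(=O)O)N")      ; => 0, no stereo information
(count-chiral-atoms "CC(C)[C@@H](C(=O)O)N")   ; => 1, the central carbon of l-valine
```

A count like this is a cheap sanity check that the stereo information made it into the string before handing it to the parser.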

Reading and writing a chiral SMILES string

CL-USER> (defparameter *l-valine* (abcl-cdk:read-smiles-string "CC(C)[C@@H](C(=O)O)N"))
*L-VALINE*
CL-USER> *l-valine*
#<org.openscience.cdk.AtomContainer AtomContainer(31488044, #A:8, At.... {2B0F0F71}>
CL-USER> (abcl-cdk:write-chiral-smiles-string *l-valine*)
"CC(C)[C@@H](C(=O)O)N"

So we can now do a round-trip to and from a chiral smiles string with CDK without losing the stereochemistry information. Hooray for CDK 1.5.4!

Render a 2-d depiction of a chiral molecule to an SVG file:

(abcl-cdk:mol-to-svg *l-valine* "l-valine.svg")

l-valine SVG

Double hooray for CDK 1.5.4!

Just for good measure, let's render the other enantiomer of valine, d-valine:

CL-USER> (defparameter *d-valine* (abcl-cdk:read-smiles-string "CC(C)[C@H](C(=O)O)N"))
*D-VALINE*
CL-USER> (abcl-cdk:mol-to-svg *d-valine* "d-valine.svg")
"d-valine.svg"

d-valine SVG

Notice that the bond connecting the carbon in the middle of the molecule and the nitrogen is now a solid wedged bond (indicating that the bond is going up and that the nitrogen should be considered as being above the plane created by the carbon-carbon bonds).

Explicit configurations around double bonds

In addition to the tetrahedral chiral centers mentioned, another important class of stereochemistry is the configurations around double bonds. For a simple example, let's consider the molecule 2-butene, or as it is known by its IUPAC name, but-2-ene.

CL-USER> (defparameter *but-2-ene* (abcl-cdk:read-smiles-string "CC=CC"))
*BUT-2-ENE*
CL-USER> (abcl-cdk:mol-to-svg *but-2-ene* "but-2-ene.svg" :height 128 :width 128)
"but-2-ene.svg"

but-2-ene SVG

Notice that the two single bonds are shown as going in opposite directions from the atoms involved in the double bond in the middle. But this is really just an accident. We didn't explicitly specify the stereochemical configuration. The convention for describing configurations around double bonds is known as the E/Z notation. If we want to ensure that the two terminal carbons are on the same side of the double bond (represented by Z (short for zusammen, which supposedly means together in German)), we can read an appropriate so-called chiral SMILES string (I say so-called because we're actually describing the stereochemistry of explicit configuration around a double bond, not a chiral center, but the SMILES folks play fast and loose with the nomenclature):

CL-USER> (defparameter *z-but-2-ene* (abcl-cdk:read-smiles-string "[H]/C(C)=C(\\[H])C"))
*Z-BUT-2-ENE*
CL-USER> (abcl-cdk:mol-to-svg *z-but-2-ene* "z-but-2-ene.svg" :width 128 :height 128)
"z-but-2-ene.svg"

z-but-2-ene SVG

Now we see that the two terminal carbons are indeed on the same side of the double bond between the two internal carbons, and that when we draw an explicit configuration around a double bond the otherwise implicit hydrogens are shown in their proper position. Another hooray for CDK 1.5.4!

While we're at it, notice that we have explicitly provided width and height arguments to abcl-cdk:mol-to-svg in the previous two examples. The CDK 2-d rendering code requires some dimension arguments that seem to affect the size of things like bonds and atom symbols. It's not entirely clear how best to figure out which parameters should be used to display a given molecule at a given size, so we'll use some combination of (hopefully) lucky guesses and trial and error. 128x128 seems to look good for small molecules like the various flavors of butene.

Support for tetrahedral chiral centers and explicit stereochemical configuration around double bonds is a big win for CDK. Many thanks to John May and the rest of the CDK team for including this in the latest release. We'll look at some more complicated examples and additional features of abcl-cdk in the next installment.

CDK/ABCL Update Lisp

An update on using the Chemistry Development Kit (CDK) with ABCL

Last year I explored using the CDK with ABCL. It was nice to see that ABCL could call out to the CDK and that I could use a Common Lisp environment for dealing with various kinds of chemistry data, molecules, atoms, bonds, etc...

The seemingly straightforward use-case I had in mind was to be able to read and write descriptions of molecules and to render these as 2-d drawings in various ways. This sort of worked, but when I tried to work with more complex molecules, particularly molecules with explicit stereochemistry such as tetrahedral chiral centers or explicit configurations around double bonds, things broke down. I'm pleased to report that things have gotten much better in the past year or so!

First, the preliminaries. The canonical home for the cdk source code has for some time been somewhat difficult to track down, or, rather, I should say it's hard to know which particular version of the source code is the canonical version at any given time. But it does seem like https://github.com/cdk/cdk is the current canonical location. Unfortunately, the good folks at cloudera seem to have grabbed the top-ranking google spot for CDK with the Cloudera Development Kit. As awesome as the cloudera folks are, that's not what we're after. And the second hit on google is for Egon Willighagen's personal CDK repository, which is pretty damn close to the canonical repository these days, but I think https://github.com/cdk/cdk is actually the preferred place to grab the source at any given point in time.

Until quite recently, I needed a branch of CDK from John May that can be found at https://github.com/johnmay/cdk/tree/master+. But fortunately these changes were rolled into the recent CDK 1.5.4 and John May's blog post describes many of the changes that went into 1.5.4.

So, now we're good to go with either the 1.5.4 release or, at least for the moment, the current HEAD of the master branch which will presumably one day become CDK 1.5.5.

Getting started with CDK

git clone http://github.com/cdk/cdk.git
cd cdk

If we want to use version 1.5.4 we can either hunt it down from some maven repository, which I generally hate doing, or build our own:

git checkout cdk-1.5.4
ant dist-large

Note that we need to make sure that ant builds the dist-large target as we want all of the CDK files to be rolled into one jar. We could use the individual jars but that would be a lot more work.

Now that we have the jar, I'm going to hold my nose and suggest that we use maven for the installation of the jar and then rely on ABCL's ASDF extensions that interact with maven to access the required jar files. Certainly other approaches could work too, but this one seems simple enough. In order to install the CDK jar using maven we can do the following:

CDK_VERSION=1.5.4
CDK_BUILD_VERSION=1.5.4
mvn install:install-file -DgroupId=org.openscience.cdk -DartifactId=cdk \
    -Dversion=${CDK_VERSION} -Dpackaging=jar \
    -Dfile=dist/jar/cdk-${CDK_BUILD_VERSION}.jar

If we look at the abcl-cdk ASDF system definition we see:

(asdf:defsystem :abcl-cdk
  :name "abcl-cdk"
  :author "Cyrus Harmon"
  :serial t
  :default-component-class asdf:cl-source-file
  :components
  ((:mvn "org.freehep/freehep-graphics2d" :version "2.2.1")
   (:mvn "org.freehep/freehep-graphicsio-pdf" :version "2.2.1")
   (:mvn "org.freehep/freehep-graphicsio-svg" :version "2.2.1")
   (:mvn "org.openscience.cdk/cdk" :version "1.5.4")
   (:file "package")
   (:file "utilities")
   (:file "smiles")
   (:file "geometry")
   (:file "render")
   (:file "inchi")))

If we want to build from the current HEAD of the master branch and install this into maven we would do:

CDK_VERSION=1.5.5
CDK_BUILD_VERSION=1.5.5.git
mvn install:install-file -DgroupId=org.openscience.cdk -DartifactId=cdk-git \
    -Dversion=${CDK_VERSION} -Dpackaging=jar \
    -Dfile=dist/jar/cdk-${CDK_BUILD_VERSION}.jar

Note that we change the name of the artifact to cdk-git here. We do this because (recent versions of) ASDF only accepts dotted integers for versions, so we can't request :version "1.5.5-git". Therefore we change the name of the artifact and use cdk-git for development versions and cdk for release versions.
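To make the "dotted integers" restriction concrete, here is a small sketch of the kind of check involved. This is not ASDF's own code; the function name dotted-integer-version-p is made up, and it only illustrates which version strings have the acceptable shape.

```lisp
(defun dotted-integer-version-p (version)
  "True when VERSION consists of runs of digits separated by single dots.
Hypothetical helper illustrating the shape of an acceptable version string."
  (and (plusp (length version))
       (loop with start = 0
             for dot = (position #\. version :start start)
             for part = (subseq version start dot)
             ;; every dot-separated part must be a non-empty run of digits
             always (and (plusp (length part))
                         (every #'digit-char-p part))
             while dot
             do (setf start (1+ dot)))))

(dotted-integer-version-p "1.5.5")      ; => T
(dotted-integer-version-p "1.5.5-git")  ; => NIL
(dotted-integer-version-p "1.5.5.git")  ; => NIL
```

Both git-flavored spellings fail the check, which is why the development build gets its own artifact name and a plain version number.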

So now if we want to use the work-in-progress 1.5.5 git HEAD version we have to change the line in the ASDF system definition to:

   (:mvn "org.openscience.cdk/cdk-git" :version "1.5.5")

Both versions should suffice for the following examples. I'm going to assume we're using the 1.5.5-git version from here on out.

ABCL and abcl-cdk

So, of course we need ABCL, and we'll need abcl-cdk:

git clone https://github.com/slyrus/abcl-cdk

To load abcl-cdk, do:

(pushnew *default-pathname-defaults* asdf:*central-registry*)
(asdf:load-system 'abcl-cdk)

To load the examples do:

(asdf:load-system 'abcl-cdk-examples)

We'll walk through some examples in the next installment.

Seems Legit General

I'm not sure exactly how this ended up on the internet, but the address is correct and that looks like my messy scribbled handwriting. I have no idea where the rose came from. http://www.gdao.org/items/show/811999

More on ABCL and Maven and ABCL-CDK Lisp

So in the last installment, we saw a few problems with ABCL, maven and libraries to be supplied by maven. I've tracked a few of these things down, learned a few things, and released a trivial new library.

Maven 3.0.3 vs 3.0.4

It turns out I had maven 3.0.3 installed. I'm not sure where this came from. XCode perhaps? In any event, the ABCL maven stuff requires version 3.0.3 or later, so I was OK there, but it depends on some features that are only found in 3.0.4 (some HttpWagon or something or other).

Removing the 3.0.3 maven and installing homebrew's maven 3.0.4 fixes this problem. If there's an easy way to make the ABCL maven-embedder stuff work with 3.0.3 or 3.0.4, that would be nice.

Other Remote Repositories

I'm still relying on the freehep 2d graphics libraries and these aren't in maven central, but rather in the freehep maven repo. How can we tell the ABCL maven stuff to search this repository? There may be a way, but if so I haven't found it yet.

Using Sharpsign-quote

It turns out one can do (#"foo" ...) instead of (java:jcall "foo" ...), using the reader macro from ABCL's JSS contrib, so I've switched my code over to this style.
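Here are the two spellings side by side, as a minimal sketch using a plain java.util.ArrayList so that nothing beyond ABCL and its contribs is needed. The variable name *names* is made up, and the sketch assumes the JSS contrib can be loaded with (require :jss) after (require :abcl-contrib).

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; JSS provides the #"..." reader macro
  (require :abcl-contrib)
  (require :jss))

(defparameter *names* (java:jnew "java.util.ArrayList"))

;; Plain ABCL interop: method name first, then the instance, then arguments.
(java:jcall "add" *names* "caffeine")

;; The same kind of call spelled with the JSS reader macro.
(#"add" *names* "valine")

;; Both styles can be mixed freely on the same object.
(print (java:jcall "size" *names*))
(print (#"size" *names*))
```

The reader macro only saves a few characters per call, but in code that is mostly java method calls that adds up.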

ABCL-CDK

It's a fairly trivial package at this point, but I've released abcl-cdk which provides some examples of calling the CDK from ABCL.

Cheminformatics, Java and, of Course, Common Lisp Lisp

in which I attempt to write some Common Lisp code to be run in a Common Lisp environment that runs inside a virtual machine designed to support a C-like language that incorporated a few lispy features, so that I can use a library written in said C-like language with my Common Lisp code, or something like that.

Ok, it's time to see if I can get the CDK and ABCL playing nicely together.

CDK

The CDK (Chemistry Development Kit) is a java library for dealing with various types of chemistry data, elements, atoms, bonds, molecules, etc... and various computed or measured properties thereof. I should point out that the CDK isn't really just one library, but rather a family of various related libraries. We'll come back to building an appropriate version of CDK in a moment, but, for now, let's move on.

ABCL

ABCL is an implementation of the Common Lisp programming language that runs on the JVM. Besides running (in theory) on any platform that supports the JVM, ABCL provides for relatively smooth interoperability with other code (such as Java libraries) that run on the JVM.

Building CDK

First, we need the CDK. Some of the main things I want to do with the CDK are to instantiate a molecule from a SMILES string, get a 2D representation of the molecule, and compute various properties (molecular weight, charge, etc...) of the molecule. The only problem with that is the main CDK doesn't actually support 2D rendering. Before we get into how to get a CDK that does 2D rendering, I should take this opportunity to gripe about the various versions of the CDK for a moment.

Sourceforge's CDKs

One of the things that bothers me about sourceforge-hosted projects is that there are often too many "home pages" for a project. For the CDK we have two:

http://sourceforge.net/projects/cdk/

and

http://cdk.sf.net, which in turn redirects to:

http://sourceforge.net/apps/mediawiki/cdk/index.php?title=Main_Page

Ok, so there's a bunch of info on the sf.net page and the CDK Development with Git page is kind enough to point us over to github:

 $ git clone git://github.com/cdk/cdk.git

which, of course, implies that there is something of a home/project page over at https://github.com/cdk/cdk. And, sure enough, there is.

From there we can see that github's cdk/cdk project is actually a fork of Egon Willighagen's cdk git repository.

JChemPaint

Of course none of these (at least on first glance) contain the 2D rendering code we want. It turns out that's not part of the core CDK, but rather part of the JChemPaint code. The JChemPaint project is another effort, closely related to CDK, that has applets/applications for interactive 2D molecule editing, 2D structure rendering code, etc... So, on the JChemPaint page we see links to various downloads where we have CDK, JChemPaint, CDK-JChemPaint, etc...

Wait, what? CDK-JChemPaint? Hang on a second! We'll come back to that in a moment. First we see that the CDK code is moving ahead rapidly but that the JChemPaint is from September 2011 and the JChemPaint (development) code is from November 2010! Hmm...

So, near as I can tell, JChemPaint was a separate, but related-to-CDK project and at some point somebody cribbed some of the reusable bits from JChemPaint and put them into CDK-JChemPaint.

But then it seems like maintaining a separate CDK-JChemPaint seemed a bit silly and egonw (?) has been maintaining a branch of the CDK with some of the JChemPaint (or is it CDK-JChemPaint?) functionality incorporated: https://github.com/egonw/cdk/tree/13-unsorted-patches. This is what I originally used for the 2D rendering code. It turns out that there is a newer, better (?) version of the CDK with the appropriate JChemPaint bits added, the 381-14x-renderextra branch.

Back to building CDK...

First we get the code

git clone git://github.com/cdk/cdk.git

Then we need to fetch egonw's branches (I suppose we could have just cloned this first):

git remote add egonw git://github.com/egonw/cdk.git
git fetch egonw

And now let's checkout the branch we want:

git checkout 381-14x-renderextra

Ok, now we've got the code. We build it with ant:

ant

Assuming we have java properly setup, things should build fine. Now we have a brazillion jar files in cdk/dist/jar. Wait, that's not what we want. We want a single CDK jar that we can (presumably) point our CLASSPATH to, or at least do whatever the ABCL equivalent is. Turns out there's a "dist-large" target in the CDK build.xml file so we can build that with:

ant dist-large

Ok, now we have dist/jar/cdk-1.4.8.git.jar.

Installing CDK

So what are we supposed to do with that? Well, it appears that some folks in the Java world use this thing called maven for both remote and local package fetching/deployment/whatever-you-call-it-in-the-java-world.

So, assuming we have maven around, we can install a CDK which we can later, hopefully, use with ABCL with the following:

export CDK_VERSION=1.4.8-SNAPSHOT
export CDK_BUILD_VERSION=1.4.8.git
mvn install:install-file -DgroupId=org.openscience.cdk -DartifactId=cdk \
    -Dversion=${CDK_VERSION} -Dpackaging=jar \
    -Dfile=dist/jar/cdk-${CDK_BUILD_VERSION}.jar

Notice that we need two distinct version identifiers as maven wants nice clean version numbers (and doesn't really like the 1.4.8.git version) and most maven-ized projects seem to use the SNAPSHOT suffix for in-progress releases. On the other hand, the CDK build.props file sets the version to 1.4.8.git. We use the two identifiers here so that cdk-1.4.8.git.jar gits installed as org.openscience.cdk/cdk version 1.4.8-SNAPSHOT.

Fetching Java Dependency Libraries

At this point I should point out that I'm not exactly a big maven fan. It's no quicklisp. But there must be a reason why folks in the Java world use it. Let's see what it takes to download some more dependencies (presumably other jar files we're going to use later). So, we fire off some queries on our favorite search engine for, say, "maven fetch", and we see things like http://stackoverflow.com/questions/1895492/how-can-i-download-a-specific-maven-artifact-in-one-command-line and http://stackoverflow.com/questions/4568633/use-maven-just-to-fetch-some-library-jars. Oh, man. I just want to download some jars and now I'm being told to use Ivy (whatever that is) or some crazy maven plugin where all I need to do is edit my ~/.m2/settings.xml and a ~/.m2/plugin-registry.xml file? No thanks!

(Note: I think there's some built-in functionality in ABCL to handle this next task -- but I couldn't get it to work!)

Fortunately, the clojure folks, who occasionally drink a little too much Java toolchain (tooling?) Kool-aid for my taste, but at least have enough taste to want a lisp-ish language, have gotten here first and the standard tool for these kinds of jobs seems to be Phil Hagelberg's leiningen. I'm going to assume for the moment that you actually have leiningen lying around, or that you're smart enough to figure out some other way to get these dependencies installed if not.

So, to trick leiningen into doing some dirty work for us, we make a project.clj that looks as follows:

(defproject abcl-cdk-hacking "0.0.0"
  :description "Fake project for fetching abcl-cdk-hacking dependencies"
  :dependencies [[org.freehep/freehep-graphics2d "2.1.1"]
                 [org.freehep/freehep-graphicsio-pdf "2.1.1"]
                 [org.freehep/freehep-graphicsio-svg "2.1.1"]]
  :repositories {"freehep" "http://java.freehep.org/maven2"})

Once we have this we can do: lein deps

which will install the dependencies for us somewhere in ~/.m2 (let's forget about system-wide installs for the moment).

Using Java Dependency Libraries

Ok, we should be ready to figure out how to make ABCL talk to CDK now. First we just have to figure out how to make ABCL talk to CDK. Wait, wasn't that what I just said? Yes, but, how do we do it? Fortunately, the ABCL guys anticipated this problem and added what they call abcl-asdf. By doing a (require 'abcl-asdf) (oh wait, and a (require 'abcl-contrib) before that, I think), we can tell our ASDF system how to tell ABCL to tell the JVM where to find the jars we need put on the CLASSPATH, or something like that.

(eval-when (:compile-toplevel :load-toplevel :execute)
  (cl:require 'abcl-contrib)
  (cl:require 'abcl-asdf))
(asdf:defsystem :abcl-cdk-hacking
  :name "abcl-cdk-hacking"
  :author "Cyrus Harmon"
  :serial t
  :default-component-class asdf:cl-source-file
  :components
  ((:mvn "org.freehep/freehep-graphics2d" :version "2.1.1")
   (:mvn "org.freehep/freehep-graphicsio-pdf" :version "2.1.1")
   (:mvn "org.freehep/freehep-graphicsio-svg" :version "2.1.1")
   (:mvn "org.openscience.cdk/cdk" :version "1.4.8-SNAPSHOT")
   (:file "abcl-cdk-hacking")))

We can add :mvn components to our ASDF system and the abcl-asdf machinery will add the maven artifact (?) or jar file or whatever to the CLASSPATH, or at least somehow make it so the classes are available to the JVM.

Well, that's the theory anyway. In practice this doesn't work with a stock ABCL because of the following bug: http://trac.common-lisp.net/armedbear/ticket/204. Once this is fixed (via the patch attached to the bug report), and ABCL rebuilt, a simple:

(asdf:load-system 'abcl-cdk-hacking)

will load the dependencies into the JVM and we should be off and running, finally.
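A quick way to see whether the jars really made it onto the classpath is to ask the JVM for a class by name. The following is a sketch of such a check; the helper name java-class-available-p is made up, and it assumes that java:jclass signals a lisp error when the class cannot be found.

```lisp
(defun java-class-available-p (class-name)
  "True when the JVM can find a class called CLASS-NAME.
Hypothetical helper; relies on java:jclass signaling an error for unknown classes."
  (handler-case (and (java:jclass class-name) t)
    (error () nil)))

;; A class that ships with the JDK should always be found.
(java-class-available-p "java.util.Vector")

;; After loading the system, the CDK SMILES parser class should be found too.
(java-class-available-p "org.openscience.cdk.smiles.SmilesParser")
```

If the second call returns NIL after loading the system, the CDK jar did not get picked up and there is no point in going further.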

Calling Static Java Methods

Ok, now we need to do some Java interop stuff with CDK. First thing we want to do is call a static Java method.

We're going to need an instance of the org.openscience.cdk.DefaultChemObjectBuilder class. We can get this via the static getInstance method as follows:

(defparameter *dcob*
  (java:jcall
   (java:jmethod (java:jclass "org.openscience.cdk.DefaultChemObjectBuilder")
                 "getInstance")
   nil))

So, we have the java:jclass function to lookup a class, the java:jmethod function to lookup a method and the java:jcall function to invoke the method. So far so good.
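ABCL also has java:jstatic, which takes the method name and the class directly and saves the jclass/jmethod dance. Here is a sketch of both spellings using java.lang.Runtime.getRuntime, a static no-argument method that plays the same role for the JDK that getInstance plays for the CDK builder. The variable names are made up for illustration.

```lisp
;; Long form: look up the class, then the method, then call it with NIL
;; in the instance position because the method is static.
(defparameter *runtime-long-form*
  (java:jcall
   (java:jmethod (java:jclass "java.lang.Runtime") "getRuntime")
   nil))

;; Short form: java:jstatic takes the method name and the class name.
(defparameter *runtime*
  (java:jstatic "getRuntime" "java.lang.Runtime"))

;; Either object can then be used for ordinary instance method calls.
(print (java:jcall "availableProcessors" *runtime*))
```

The same shape should work for the CDK builder, i.e. (java:jstatic "getInstance" "org.openscience.cdk.DefaultChemObjectBuilder"), assuming the CDK jar is on the classpath.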

Creating Java Objects

Now we're going to need to create a Java object. Turns out we can do that with the java:jnew function:

(defparameter *smiles-parser*
  (java:jnew "org.openscience.cdk.smiles.SmilesParser" *dcob*))

This gives us a new instance of the org.openscience.cdk.smiles.SmilesParser class.

Calling Methods on Java Objects

Finally, we can call a java method with java:jcall, as we do with the parseSmiles method here:

(defparameter *caffeine*
  (java:jcall "parseSmiles" *smiles-parser* "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"))

Java Class Identifiers

Well, the clojure folks have figured out that some people, at least, hate typing long java class names all over the place, and the ABCL java interop stuff seems to require lots of typing of long java names. In a perhaps misguided attempt to relieve this burden and provide something more like clojure's syntax, I present the jimport macro:

(defmacro jimport (java-package class &optional package)
  ;; intern takes a string, so pass the symbol's name rather than the symbol
  `(defparameter ,(apply #'intern (symbol-name class)
                         (when package (list package)))
     (concatenate 'string (symbol-name (quote ,java-package))
                  "."
                  (symbol-name (quote ,class)))))

This macro allows one to do:

(jimport |org.openscience.cdk| |DefaultChemObjectBuilder|)

which then defines the value of the |DefaultChemObjectBuilder| symbol (in the current package, at least if not specified in the jimport call) to be "org.openscience.cdk.DefaultChemObjectBuilder", so now we can do:

(java:jcall
 (java:jmethod
  (java:jclass |DefaultChemObjectBuilder|) "getInstance")
 nil)

Not a huge win, but it does allow the compiler to ensure that we're seeing identified symbols, rather than just potentially random strings for Java classes.
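Since most code needs several classes from the same java package, a small wrapper that imports a batch of them at once can cut down on the repetition. This is a sketch only: jimport-classes is a made-up name, and the block repeats jimport (passing the symbol's name to intern) so that it stands alone. The classes imported here are JDK classes.

```lisp
(defmacro jimport (java-package class &optional package)
  `(defparameter ,(apply #'intern (symbol-name class)
                         (when package (list package)))
     (concatenate 'string (symbol-name (quote ,java-package))
                  "."
                  (symbol-name (quote ,class)))))

(defmacro jimport-classes (java-package &rest classes)
  "Hypothetical convenience wrapper: JIMPORT each of CLASSES from JAVA-PACKAGE."
  `(progn
     ,@(loop for class in classes
             collect `(jimport ,java-package ,class))))

(jimport-classes |java.util| |Vector| |ArrayList|)
(jimport-classes |java.awt| |Dimension| |Rectangle|)
;; Nested classes use the JVM's $ spelling.
(jimport |java.awt.geom| |Rectangle2D$Double|)

|Vector|              ; => "java.util.Vector"
|Rectangle2D$Double|  ; => "java.awt.geom.Rectangle2D$Double"
```

The CDK classes would be imported the same way, each from its own java package.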

Java List<Foo>'s

One of the CDK classes, org.openscience.cdk.renderer.AtomContainerRenderer, has a constructor that expects a List<IGenerator<IAtomContainer>> as one of its arguments. How do we invoke the constructor with one of those? Well, it turns out we can't just use a lisp list as the argument. We have to make a java List of some sort. It turns out there's some infrastructure provided by ABCL to help with this, although nothing I can find that does exactly what I need. The extensible-sequence stuff allows us to make a lisp sequence that is actually some sort of instance of the java.util.List interface. I use a java.util.Vector and provide a helper function called jlist as follows:

(defun jlist (&rest initial-contents)
  (sequence:make-sequence-like
   (java:jnew |Vector|) (length initial-contents)
   :initial-contents initial-contents))

So, now we've got a way to create java lists that we can pass on to the constructor.
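If the extensible-sequence machinery isn't available or misbehaves, the same thing can be done the boring way by creating the java.util.Vector and calling add for each element. This is a sketch of that alternative, not what abcl-cdk does; the name jlist-by-adding is made up.

```lisp
(defun jlist-by-adding (&rest initial-contents)
  "Return a java.util.Vector holding INITIAL-CONTENTS, in order.
Hypothetical alternative to jlist that only uses jnew and jcall."
  (let ((vector (java:jnew "java.util.Vector")))
    (dolist (item initial-contents vector)
      ;; Vector.add(Object) appends ITEM at the end
      (java:jcall "add" vector item))))

;; Build a three element java list and ask it for its size.
(java:jcall "size" (jlist-by-adding "a" "b" "c"))
```

Either way the result is a java.util.List, which is what the constructor wants to see.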

Getting a java stream from a lisp stream

One final bit of consternation, we'd like to be able to create streams using the sane lisp syntax like:

(with-open-file (out-stream pathname :direction :output
                                       :if-exists :supersede
                                       :element-type :default)
  ...)

but then use the corresponding streams where we need java streams. In particular the freehep SVG and PDF libraries want java streams for files. It turns out there's a function to get the java output stream associated with a lisp stream, getWrappedOutputStream. We use that to get the underlying java.io.OutputStream and we're good to go.

Now we can define our mol-to-svg function as follows:

(defun mol-to-svg (mol pathname)
  (with-open-file (out-stream pathname :direction :output
                                       :if-exists :supersede
                                       :element-type :default)
    (let*
        ((r (java:jnew |AtomContainerRenderer|
                       (jlist
                        (java:jnew |BasicAtomGenerator|)
                        (java:jnew |BasicBondGenerator|)
                        (java:jnew |BasicSceneGenerator|))
                       (java:jnew |AWTFontManager|)))
         (vg (java:jnew |SVGGraphics2D|
                        (java:jcall "getWrappedOutputStream" out-stream)
                        (java:jnew |Dimension| 320 320)))
         (adv (java:jnew |AWTDrawVisitor| vg)))
      (java:jcall "startExport" vg)
      (java:jcall "generateCoordinates"
                  (java:jnew |StructureDiagramGenerator| mol))
      (java:jcall "setup" r mol (java:jnew |Rectangle| 0 0 100 100))
      (java:jcall "paint" r mol adv
                  (java:jnew (java:jconstructor |Rectangle2D$Double| 4)
                             10 10 300 300)
                  java:+true+)
      (java:jcall "endExport" vg))))

Finally, we can render our molecule of choice, caffeine to an SVG file thusly:

(mol-to-svg *caffeine* "/tmp/caffeine.svg")

And we see:

caffeine SVG

Voila!

Next time hopefully we can explore integrating chemicl and CDK directly with ABCL, but I think that requires fixing an ABCL bug that prevents it from successfully compiling plexippus-xpath.
