
openNLP: getting started, code example

openNLP is an interesting tool for Natural Language Processing. It took me a while to get started today, so I am writing this down so that next time I (or anyone else) want to try it out, it takes less time.

Here we go:

1. Download and build:
This is the main website: http://opennlp.sourceforge.net/

You can either download the source package there and build the .jar file yourself (which requires setting your $JAVA_HOME environment variable to point to your JDK installation - see the note below), or download the pre-built .jar file directly from this link.

This step took me a while because I didn't know how to set $JAVA_HOME, and I didn't realize that there was already a pre-built .jar file to download.
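If you are not sure whether $JAVA_HOME is set, here is a quick way to check from Java itself (a small sketch of my own, not part of the openNLP distribution):

public class CheckJavaHome {
    public static void main(String[] args) {
        // Prints the JAVA_HOME environment variable (null if it is not set)
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
        // Prints the JRE that this program is actually running on
        System.out.println("java.home = " + System.getProperty("java.home"));
    }
}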


2. Some code


So now you want to start with some code. Here is a small sample that does Sentence Detection and Tokenization.

Note that you can either download the pre-trained models from the website above or train your own models if you have a training dataset.

In this example, I used two of the pre-trained openNLP models: EnglishSD.bin.gz (sentence detection) and EnglishTok.bin.gz (tokenization).

// Imports needed (openNLP 1.4.x):
//   opennlp.tools.lang.english.SentenceDetector
//   opennlp.tools.lang.english.Tokenizer

// This is the path to your model files
SentenceDetector sendet = new SentenceDetector("opennlp-tools-1.4.3/Models/EnglishSD.bin.gz");
Tokenizer tok = new Tokenizer("opennlp-tools-1.4.3/Models/EnglishTok.bin.gz");

// Sentence detection
String[] sens = sendet.sentDetect("This is sentence one. This is sentence two.");

for (int i = 0; i < sens.length; i++) {
    System.out.println("Sentence " + i + ": ");
    // Tokenize each detected sentence
    String[] tokens = tok.tokenize(sens[i]);
    for (int j = 0; j < tokens.length; j++)   // loop over the tokens, not the sentences
        System.out.print(tokens[j] + " - ");
    System.out.println();
}
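
If you end up with a newer Apache OpenNLP release instead of 1.4.3, keep in mind that the API changed: models are loaded through SentenceModel/TokenizerModel and used via SentenceDetectorME/TokenizerME. Here is a rough sketch assuming the 1.5+ API and the newer pre-trained English model files (en-sent.bin and en-token.bin) - adjust the paths and file names to whatever you actually downloaded:

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class NewApiSketch {
    public static void main(String[] args) throws Exception {
        // en-sent.bin and en-token.bin are the pre-trained English models
        // from the newer OpenNLP models download page
        try (InputStream sentIn = new FileInputStream("en-sent.bin");
             InputStream tokIn = new FileInputStream("en-token.bin")) {
            SentenceDetectorME sentenceDetector = new SentenceDetectorME(new SentenceModel(sentIn));
            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokIn));

            String[] sentences = sentenceDetector.sentDetect("This is sentence one. This is sentence two.");
            for (String sentence : sentences) {
                // Tokenize each detected sentence and print the tokens
                String[] tokens = tokenizer.tokenize(sentence);
                System.out.println(String.join(" - ", tokens));
            }
        }
    }
}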



3. Other notes
If you get an exception (e.g., a NoClassDefFoundError) when running the code above, it is most likely because you didn't put the .jar files from the /lib folder of the distribution (e.g., maxent.jar and trove.jar) on your classpath.

Good luck!
