
openNLP: getting started, code example

openNLP is an interesting tool for Natural Language Processing. It took me a while to get started today, so I'm writing the steps down so that next time I (or anyone else trying it out) can get going faster.

Here we go:

1. Download and build
This is the main website: http://opennlp.sourceforge.net/

You can either download the source package there and build the .jar file yourself (for which you need the $JAVA_HOME environment variable set - see below), or download the pre-built .jar file directly from this link.

This step took me a while because I didn't know how to set $JAVA_HOME, and I didn't realize at first that there was already a .jar file to download.
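
If you do build from source, setting $JAVA_HOME just means pointing it at your JDK installation. On Linux or macOS, that usually looks something like this (the path below is only an example; use the location of your own JDK):

export JAVA_HOME=/usr/lib/jvm/java-1.5.0   # example path - point this at your JDK
export PATH=$JAVA_HOME/bin:$PATH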


2. Some code


So now you want to start with some code. Here is some sample code for doing Sentence Detection and Tokenization.

Note that you can either download the pre-trained models from the website above or train your own models on your own data.

In this example, I used two openNLP models (EnglishSD.bin.gz and EnglishTok.bin.gz).

import java.io.IOException;
import opennlp.tools.lang.english.SentenceDetector;
import opennlp.tools.lang.english.Tokenizer;

public class OpenNlpDemo {
    public static void main(String[] args) throws IOException {
        // These are the paths to your model files
        SentenceDetector sendet = new SentenceDetector("opennlp-tools-1.4.3/Models/EnglishSD.bin.gz");
        Tokenizer tok = new Tokenizer("opennlp-tools-1.4.3/Models/EnglishTok.bin.gz");

        // Sentence detection: split the input text into sentences
        String[] sens = sendet.sentDetect("This is sentence one. This is sentence two.");

        for (int i = 0; i < sens.length; i++) {
            System.out.println("Sentence " + i + ": ");
            // Tokenization: split each sentence into tokens
            String[] tokens = tok.tokenize(sens[i]);
            for (int j = 0; j < tokens.length; j++) // note: tokens.length, not sens.length
                System.out.print(tokens[j] + " - ");
            System.out.println();
        }
    }
}
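
With the sample text above, the output should look roughly like this (the exact token boundaries depend on the model):

Sentence 0:
This - is - sentence - one - . -
Sentence 1:
This - is - sentence - two - . -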



3. Other notes
If you get an exception when running the above code, it's probably because you didn't put the .jar files from the /lib folder of the distribution (e.g., maxent.jar and trove.jar) on your classpath.
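
For reference, compiling and running might look something like this (a sketch - the exact jar names depend on your download, and OpenNlpDemo is the class from the example above):

javac -cp opennlp-tools-1.4.3.jar:lib/maxent.jar:lib/trove.jar OpenNlpDemo.java
java -cp .:opennlp-tools-1.4.3.jar:lib/maxent.jar:lib/trove.jar OpenNlpDemo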

Good luck!
