Posts

Showing posts from May, 2010

OpenOffice: Chart X Axis Format Scale

After doing the experiments, I drew graphs using OpenOffice. First, I used the chart type "Column and Line", but then I couldn't format the chart's X axis (i.e., I couldn't untick "Automatic" to change the scale's Minimum, Maximum, etc.). Finally, I figured out that I should have used the chart type "XY (Scatter)", since the other chart types treat the X axis values as text.

SAXParser: too many exceptions for invalid XML character..

I'm working on my Similarity Search project, in which I have to implement the Tree Edit Distance and the Traversal String Edit Distance. The trees are all represented in XML format, and I'm using SAXParser to parse those XML files in Java. I've used it many times before, but I still don't quite like it. So my first step is to create a valid XML database. However, getting XML "valid" enough to be parsed by SAXParser (strictly speaking, well-formed) is complicated!! Here is what I get again and again:

File Read Error: org.xml.sax.SAXParseException : The content of elements must consist of well-formed character data or markup.
org.xml.sax.SAXParseException: The content of elements must consist of well-formed character data or markup.

The reasons can be different, like:
- Tag names cannot start with a digit (e.g., <1> is an invalid tag)
- Tag names cannot contain some symbols, like {, ?, etc. ("_", "-" and "." are fine, as long as "-" or "." is not the first character)
- Tag names cannot be empty

However, in my database, all of the tags are numbers.. To make it a val
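
The naming rules listed above can be checked quickly with a small test harness. The following is only a minimal sketch: the class name WellFormedCheck, the helper isWellFormed, and the sample tag names (including the "n" prefix in <n1/>) are made up for illustration and are not taken from the original project. It relies on the fact that DefaultHandler rethrows fatal parse errors as a SAXParseException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Hypothetical helper: true if the SAX parser accepts the document,
    // false if it reports a parse error (SAXParseException extends SAXException).
    static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            // DefaultHandler ignores all content; its fatalError() throws.
            parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O problem", e);
        }
    }

    public static void main(String[] args) {
        // A tag name that starts with a digit is rejected.
        System.out.println(isWellFormed("<tree><1/></tree>"));    // false
        // The same number behind a letter prefix (an assumed naming scheme) is accepted.
        System.out.println(isWellFormed("<tree><n1/></tree>"));   // true
        // "{" is not allowed anywhere in a tag name.
        System.out.println(isWellFormed("<tree><a{b/></tree>"));  // false
    }
}
```

Running a handful of candidate tag names through a check like this is faster than regenerating the whole database and waiting for the parser to fail.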

Blogger: format your code (java, python, etc.) in your blog

I used prettyPrint to auto-format the code on my Blogger blog. You don't need to download it; just link to it as follows:

Step 1: Go to Customize -> Edit HTML and paste the following code inside the <head> </head> tag:
<link href='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css' rel='stylesheet' type='text/css'/>
<script src='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js' type='text/javascript'></script>

Step 2: Change the <body> tag to <body onload='prettyPrint()'>

Step 3: How to use: put your code inside the following tag:
<pre class="prettyprint">
<!-- your code here -->
</pre>

Note that you can also modify the css file, upload your own copy, and link to that address instead to change how your code is formatted.

Linux command: keep on running a process after exiting from a shell prompt

I logged in to a server remotely via ssh, and I wanted to keep my program running even after turning off my computer (i.e., after exiting from the shell prompt that my program is running on).

nohup [command] > [file_output] &
(ignoring input and redirecting stderr to stdout)

For example, I want to continue running my java process:
nohup java -jar cluster.jar > output.txt &

- To keep following the output file as it is updated: tail -f output.txt

Note:
- If you want to kill this process, use: kill -9 $pid
- Thanks, BM :-)

openNLP: getting started, code example

openNLP is an interesting tool for Natural Language Processing. Today, it took me a while to get started, so I'm writing the steps down so that next time I (or anyone else who wants to try it out) can get going faster. Here we go:

1. Download and build: This is the main website: http://opennlp.sourceforge.net/ You can either download the package there and build the .jar file yourself (for which you have to set your $JAVA_HOME environment variable - see below), or you can directly download the .jar file from this link. This step took me a while since I didn't know how to set my $JAVA_HOME and hadn't found out that there's already a .jar file to download.

2. Some code: So now, you want to start with some code. Here is some sample code for doing Sentence Detection and Tokenization. Note that you can either download the models from the previous website or train them on your own dataset. In this example, I used 2 models of openNLP (EnglishSD.bin.gz and EnglishTok.bin.gz).

//This is

Mac: environment variable JAVA_HOME set

I was building some Java tools (OpenNLP), which required me to set the JAVA_HOME variable on my MacBook. First, I tried "which java" and it led me to "/usr/bin/java", which is not a direct link (?!). After a while, I found something like "/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home", but it didn't work either. So finally:

export JAVA_HOME=/Library/Java/Home

:-)

Python: numpy read array from file

Bravo numpy! Ok, now I have one more reason to use numpy instead of lists in Python. From a CSV file, you can read straight into a numpy array:

* CSV file format: (test.csv)
A B C
1 2 3
4 5 6

* Python code:
>>> import numpy
>>> numpy.loadtxt('test.csv', delimiter=' ', dtype=float, skiprows=1)
[[ 1. 2. 3.]
 [ 4. 5. 6.]]

Here we go :-)

Java: Read all .txt files in a directory

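List all .txt files in a directory (args[0])

import java.io.File;

File inputDir = new File(args[0]);
// list everything in the directory; listFiles() returns null if args[0] is not a directory
File[] children = inputDir.listFiles();
if (children != null) {
    for (int i = 0; i < children.length; ++i) {
        // skip anything that is not a .txt file
        if (!children[i].getName().endsWith(".txt")) continue;
        System.out.println("Reading file: " + children[i].getPath());
    }
}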
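
The snippet above only prints the file names. As a hedged sketch of how the same loop could go on to actually read each file, here is a variant built on java.nio.file instead of java.io.File. The class name TxtFiles and the helper listTxtFiles are made up for this example, and it assumes the files are UTF-8 text, which is what Files.readAllLines expects by default.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TxtFiles {

    // Hypothetical helper: collects the regular files in dir whose names match *.txt.
    static List<Path> listTxtFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    result.add(entry);
                }
            }
        }
        // Directory order is unspecified, so sort for a stable result.
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java TxtFiles <directory>");
            return;
        }
        for (Path file : listTxtFiles(Paths.get(args[0]))) {
            System.out.println("Reading file: " + file);
            // Reads the whole file into memory, assuming UTF-8 content.
            List<String> lines = Files.readAllLines(file);
            System.out.println("  " + lines.size() + " lines");
        }
    }
}
```

Unlike File.listFiles(), Files.newDirectoryStream throws an exception when the path is not a directory, so no null check is needed.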