
SAXParser: too many exceptions for invalid XML characters

I'm working on my Similarity Search project, in which I have to implement the Tree Edit Distance and Traversal String Edit Distance.

Trees are all represented in XML format and I'm using SAXParser to parse those XML files in Java. I've used it many times before, but I still don't quite like it.

So my first step is to create a valid XML database. However, producing XML that SAXParser considers valid turns out to be complicated!

Here is what I get again and again:

File Read Error: org.xml.sax.SAXParseException: The content of elements must consist of well-formed character data or markup.

The reasons can vary, for example (a minimal reproduction follows this list):
- Tag names cannot start with a digit (e.g., <1> is an invalid tag)
- Tag names cannot contain certain symbols, such as { or ? ("_" and "-" are fine inside a name)
- Tags cannot be empty
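
For reference, here is a minimal, self-contained sketch that reproduces the exception with a numeric tag name (the class name and the inline XML string are just for illustration, not from my project):

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class InvalidTagDemo {
    public static void main(String[] args) throws Exception {
        // Element names that start with a digit are not well-formed XML
        String badXml = "<1><2>leaf</2></1>";
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new ByteArrayInputStream(badXml.getBytes(StandardCharsets.UTF_8)),
                         new DefaultHandler());
        } catch (SAXParseException e) {
            // The parser rejects the document before delivering any content
            System.out.println("Parse failed: " + e.getMessage());
        }
    }
}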

However, in my database, all of the tags are numbers.
To make the files valid XML, I have defined a mapping from each digit to a letter of the alphabet via its ASCII code: 0 is mapped to a, 1 to b, 2 to c, and so on, using the following code:


// Maps each digit of instr to a lowercase letter: '0' -> 'a', '1' -> 'b', ...
public String toXMLTagString(String instr) {
    StringBuilder outstr = new StringBuilder();
    for (int i = 0; i < instr.length(); i++)
        outstr.append((char) (Character.getNumericValue(instr.charAt(i)) + 97));
    return outstr.toString();
}



So with instr = "01234", outstr = "abcde".
Note: "a" corresponds to 97 in the ASCII code.

