I'm working on my Similarity Search project, in which I have to implement the Tree Edit Distance and Traversal String Edit Distance.
Trees are all represented in XML format, and I'm using SAXParser to parse those XML files in Java. I've used it many times before, but I still don't quite like it.
So my first step is to create a valid XML database. However, producing XML that SAXParser accepts as "valid" (strictly speaking, well-formed) is complicated!
Here is what I get again and again:
File Read Error: org.xml.sax.SAXParseException: The content of elements must consist of well-formed character data or markup.
org.xml.sax.SAXParseException: The content of elements must consist of well-formed character data or markup.
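The reasons can vary, for example:
- Tag names cannot start with a digit (e.g., <1> is an invalid tag), although digits are allowed after the first character
- Tag names cannot contain symbols such as { or ?; "_" is fine anywhere, while "-" and "." are allowed but not as the first character
- Tag names cannot be empty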
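A quick way to see which names the parser accepts is to feed it one-element documents and check whether it throws. The sketch below is only an illustration: the class name TagNameCheck and the helper parsesAsTag are made up for this example, and it assumes the default, non-namespace-aware SAXParserFactory that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original project.
final class TagNameCheck {

    // Returns true if the one-element document "<name/>" parses without error.
    static boolean parsesAsTag(String name) {
        String xml = "<" + name + "/>";
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler rethrows fatal errors, so a malformed document
            // surfaces here as a SAXException (usually a SAXParseException).
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not a naming problem: the parser could not be set up or read.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"1", "a1", "_x", "a-b", "-a", "a{b", ""};
        for (String name : candidates) {
            System.out.println("<" + name + "/> accepted: " + parsesAsTag(name));
        }
    }
}
```

Running it on a handful of candidate names before generating the whole database makes it easier to tell a naming problem from some other well-formedness error.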
However, in my database, all of the tags are numbers.
To make it a valid XML file, I have defined a mapping from each digit to a letter of the alphabet using its ASCII code. For example, 0 is mapped to a, 1 to b, 2 to c, etc., using the following code:
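public String toXMLTagString(String instr) {
    // Map each digit to a letter: '0' -> 'a', '1' -> 'b', ..., '9' -> 'j'.
    StringBuilder outstr = new StringBuilder(instr.length());
    for (int i = 0; i < instr.length(); i++) {
        // getNumericValue gives the digit's value (0..9); 97 is the ASCII code of 'a'.
        outstr.append((char) (Character.getNumericValue(instr.charAt(i)) + 97));
    }
    return outstr.toString();
}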
So with instr = "01234", outstr = "abcde"
Note: "a" corresponds to 97 in the ASCII code.
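Since the tree algorithms eventually need the original numeric labels back, the mapping has to be reversible. Here is a minimal sketch of both directions; the class name TagCodec and the method fromXMLTagString are hypothetical additions for illustration, and the sketch assumes every tag consists only of the digits 0 to 9 (it throws on anything else instead of producing an odd character).

```java
// Hypothetical round-trip helper, not part of the original project.
final class TagCodec {

    // Encode a string of digits as letters: '0' -> 'a', ..., '9' -> 'j'.
    static String toXMLTagString(String digits) {
        StringBuilder out = new StringBuilder(digits.length());
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Not a digit: " + c);
            }
            out.append((char) ('a' + (c - '0')));
        }
        return out.toString();
    }

    // Decode letters back to digits: 'a' -> '0', ..., 'j' -> '9'.
    static String fromXMLTagString(String letters) {
        StringBuilder out = new StringBuilder(letters.length());
        for (int i = 0; i < letters.length(); i++) {
            char c = letters.charAt(i);
            if (c < 'a' || c > 'j') {
                throw new IllegalArgumentException("Not an encoded digit: " + c);
            }
            out.append((char) ('0' + (c - 'a')));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = toXMLTagString("01234");
        System.out.println(encoded);                   // abcde
        System.out.println(fromXMLTagString(encoded)); // 01234
    }
}
```

Because each digit maps to exactly one letter, decoding a parsed tag name always recovers the original number, including leading zeros.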