R E P O R T
"Parsing XML with DOM"
Having already worked with XML several times, I thought it would be better
to include some of the real projects I did during my assistantship at TECFA.
The most important cases I present here are:
- Formation Continue: the DTD and the XML file were already complete, but I parsed the file with Java & Xerces for the needs of this exercise.
- Project staf18 (promotion Fanny): real work that I did entirely myself at TECFA, and that will change and be improved by the end of June, as the project evolves with time.
I am presenting the DTDs of Formation Continue and project staf18.
Actually, for staf18 I made many DTDs that together form a complete one
and that can also be combined in different ways.
I made some small changes to the DTD of Formation Continue, such as adding
IDs for the courses and some other minor ones.
For Formation Continue, I give here the original file that has been
filled in by the users (with my help for some of them). For staf18, I use the XML
files from the students of staf-f.
For Formation Continue, I parse the XML file and create a table of contents
for all the existing courses and their modules.
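
As a rough illustration of how such a table of contents can be built from a DOM tree, here is a minimal sketch. It goes through the standard JAXP interface (which Xerces implements) rather than the Xerces-specific parser class, and the element and attribute names ("course", "module", "id", "title") are assumptions made up for the example; the real DTD of Formation Continue may use different ones.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ContentsTable {

    // Builds a plain-text table of contents: one line per course,
    // followed by one indented line per module of that course.
    // The names "course", "module", "id" and "title" are hypothetical.
    public static String build(InputStream xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(xml);
        StringBuilder out = new StringBuilder();

        NodeList courses = doc.getElementsByTagName("course");
        for (int i = 0; i < courses.getLength(); i++) {
            Element course = (Element) courses.item(i);
            out.append(course.getAttribute("id")).append(" ")
               .append(course.getAttribute("title")).append("\n");

            // Only the modules that are descendants of this course.
            NodeList modules = course.getElementsByTagName("module");
            for (int j = 0; j < modules.getLength(); j++) {
                Element module = (Element) modules.item(j);
                out.append("    ").append(module.getAttribute("title")).append("\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up document, only to show the call.
        String sample = "<courses>"
            + "<course id=\"c1\" title=\"XML basics\">"
            + "<module title=\"DTD\"/><module title=\"DOM\"/>"
            + "</course></courses>";
        System.out.print(build(
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))));
    }
}
```

Running main on the embedded sample prints the course line followed by its two indented modules.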
For the staf18 project, I parse 2 files from each student's directory and
extract useful information such as:
file info, group info (names, URLs, etc.), titles, etc., which
helps the professor monitor the project.
Technical details - problems
As far as the DTDs are concerned, I found the design quite easy.
Of course, there was always the precious guidance of DKS, who with his
experience pointed out the mistakes I was making, mostly concerning
the semantic aspect.
For filling in the XML file, I used the xml-mode of Emacs, which works
quite well, although not perfectly :)
For parsing the XML files I used the DOM parser of Xerces. I know that
it is generally slower than SAX (and uses more memory, since it builds the
whole document tree), but not having much time to invest in this, I
decided to do it that way, which seemed easier to me for the time being, and
to leave SAX for the near future. Of course, for "catching" the exceptions
I used the SAX exceptions, which are more precise in most cases.
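
The combination described above (DOM for the tree, SAX exceptions for the errors) can be sketched roughly as follows. This is only an illustration under assumptions: it uses the standard JAXP classes instead of the Xerces-specific ones, and the class and method names (ValidityCheck, check, describe) are invented for the example. The idea is that a SAXParseException carries the line number, which is what makes the message precise enough to send back to a student.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class ValidityCheck {

    // Parses with DTD validation switched on and returns one message
    // per problem found; an empty list means the file is valid.
    public static List<String> check(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            // Validity errors are recoverable: record them and keep parsing.
            public void error(SAXParseException e) { problems.add(describe(e)); }
            // Well-formedness errors are fatal: stop parsing.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        try {
            builder.parse(xml);
        } catch (SAXParseException e) {
            problems.add(describe(e));
        }
        return problems;
    }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        // Made-up document whose content does not match its own DTD.
        String bad = "<!DOCTYPE info [<!ELEMENT info (title)>"
            + "<!ELEMENT title (#PCDATA)>]>\n<info><oops/></info>";
        for (String p : check(
                new ByteArrayInputStream(bad.getBytes(StandardCharsets.UTF_8)))) {
            System.out.println(p);
        }
    }
}
```

The exact wording of the messages depends on the parser, so the sketch only relies on the line number and on whether the list is empty.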
The things that I am proud of (or ashamed of :)
- DTD inclusions:
I have split the DTD into several ones to manage the project by phases,
so each DTD is made up of several included DTDs.
- exception handling:
by catching the exceptions with SAX for the staf18 project,
I can notify the students that their file is not valid.
- educational tools:
both projects are used for courses at TECFA.
- DOM parsing:
it took me some time to understand how it works, though,
and I crashed the TECFA2 server several times :)
- going through many directories:
In order to parse the info.xml & specification.xml files from each
student directory, I used the File class and its methods.
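
To give an idea of the directory walk mentioned in the last point, here is a small sketch using only java.io.File. It assumes one sub-directory per student directly under a common root, which may not match the real layout on the server; the class and method names (StudentFiles, collect) are invented for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StudentFiles {

    // Returns the info.xml and specification.xml files found directly
    // inside each sub-directory of root (one sub-directory per student
    // is an assumption of this sketch).
    public static List<File> collect(File root) {
        List<File> found = new ArrayList<>();
        File[] dirs = root.listFiles();
        if (dirs == null) {
            return found; // root is missing or is not a directory
        }
        Arrays.sort(dirs); // listFiles() gives no ordering guarantee
        for (File dir : dirs) {
            if (!dir.isDirectory()) {
                continue;
            }
            for (String name : new String[] {"info.xml", "specification.xml"}) {
                File candidate = new File(dir, name);
                if (candidate.isFile()) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : collect(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

Each file returned this way can then be handed to the DOM parser; a student directory that lacks one of the two files is simply skipped for that file.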
References - Bibliography - Sites