Slicing XML Documents

 

 

 

Abstract

 

 

 
Program slicing is a well-known technique for extracting the program statements that (potentially) affect the values computed at some point of interest. In this work, we introduce a novel slicing method for XML documents. Essentially, given an XML document (which is valid w.r.t. some DTD), we produce a new XML document (a slice) that contains the information in the original XML document that is relevant to some slicing criterion. Furthermore, we also output a new DTD such that the computed slice is valid w.r.t. this DTD. A prototype of the XML slicer has been implemented.
 
 

Program Slicing

 
 

One of the most widely used program transformations is program slicing. It is a decomposition technique that extracts those program statements that (potentially) affect the values computed at some point of interest. Program slicing was originally introduced by Weiser and now has many applications, such as debugging, code understanding, and program specialization.

Some useful links to program slicing works:

Original Definition
M.D. Weiser. Program Slicing. IEEE Transactions on Software Engineering, 10(4):352-357, 1984.

Debugging
Y.A. Liu and S.D. Stoller. Eliminating Dead Code on Recursive Data. Science of Computer Programming, 47:221–242, 2003.

Partial Evaluation
G. Vidal. Forward Slicing of Multi-Paradigm Declarative Programs Based on Partial Evaluation. In Logic-based Program Synthesis and Transformation, LOPSTR’02, pp. 219-237. Springer LNCS 2664, 2003.

Program Specialization
T. Reps and T. Turnidge. Program Specialization via Program Slicing. In Partial Evaluation. Dagstuhl Castle, Germany, pages 409–429. Springer LNCS 1110, 1996.

Survey
F. Tip. A Survey of Program Slicing Techniques. Journal of Programming Languages, 3:121-189, 1995.

 
 

XML

 
 

XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996. XML documents are made up of basic units called "elements", which are arranged according to a set of restrictions given in a separate specification called a Document Type Definition (DTD).
An XML document is "well-formed" if it conforms to the standard XML syntax rules described in the XML specification. A "valid" XML document is a well-formed XML document that also conforms to the rules of a DTD.

More information about XML can be found in:

The Extensible Markup Language (XML)
 
Slicing XML Documents
 

A DTD/XML slicer can bring many benefits for the manipulation and maintenance of webpages.

For instance, the following webpage has been automatically generated from the XML file PersonalInfo.xml (valid with respect to PersonalInfo.dtd) by using the PersonalInfo.xsl XSLT file:


Web Page generated from the XML document

If we slice the DTD and XML documents with respect to some criterion, the associated web page is automatically sliced as well.

For instance,
If we slice the web page to keep only the information related to the courses in which professors teach subjects, we obtain the files: PersonalInfo2_CourseSlice.xml, PersonalInfo_CourseSlice.dtd (see the first Web Page in the figure below).
If we slice the web page to keep only the information related to Professor Ryan Gibson, we obtain the files: PersonalInfo2_ForwardSlice.xml, PersonalInfo_ForwardSlice.dtd (see the second Web Page in the figure below).
If we slice the web page to keep only the information related to projects, we obtain the files: PersonalInfo2_ProjectSlice.xml, PersonalInfo_ProjectSlice.dtd (see the third Web Page in the figure below).

Web Page slices

Of course, slices can themselves be sliced again in order to obtain more specialized information.

Results

XML slices are well-formed

DTD slices are well-formed

XML slices are valid with respect to DTD slices

Sliced webpages can be reduced by more than 70%:

  1. Information is found and accessed quickly.
  2. Less scrolling is needed.
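
The validity result listed above (XML slices are valid with respect to DTD slices) can be illustrated with a minimal sketch. It is not the slicer's actual algorithm, which this page does not describe. It assumes a deliberately simplified DTD model in which each element name maps to the names of the elements allowed as its children, ignoring order, cardinality, attributes and text content. The names `DTD`, `sliceDTD`, `conforms`, `fullDTD` and `sliceDoc`, as well as the tag names, are hypothetical.

```haskell
import Data.List (nub)

type Name      = String
type Value     = String
type Attribute = (Name, Value)

data Element = Elem Name [Attribute] [Content]
  deriving (Eq, Show)

data Content
  = CElem Element
  | CText String
  deriving (Eq, Show)

-- | Simplified DTD (an assumption of this sketch): for each declared
-- element, the names of the elements allowed as its children.
type DTD = [(Name, [Name])]

-- | All element names occurring in a document, without duplicates.
elementNames :: Element -> [Name]
elementNames (Elem n _ cs) = nub (n : concat [elementNames c | CElem c <- cs])

-- | Restrict a DTD to the element names that occur in a sliced document.
sliceDTD :: Element -> DTD -> DTD
sliceDTD slice dtd =
  [ (n, filter (`elem` used) allowed) | (n, allowed) <- dtd, n `elem` used ]
  where
    used = elementNames slice

-- | Check a document against the simplified DTD: every element must be
-- declared, and every child element must be allowed by its parent.
conforms :: DTD -> Element -> Bool
conforms dtd (Elem n _ cs) =
  case lookup n dtd of
    Nothing      -> False
    Just allowed -> all (childOk allowed) [c | CElem c <- cs]
  where
    childOk allowed c@(Elem m _ _) = m `elem` allowed && conforms dtd c

-- | Hypothetical DTD and sliced document, for illustration only.
fullDTD :: DTD
fullDTD =
  [ ("staff",     ["professor", "secretary"])
  , ("professor", ["course", "project"])
  , ("course",    [])
  , ("project",   [])
  , ("secretary", [])
  ]

sliceDoc :: Element
sliceDoc =
  Elem "staff" []
    [ CElem (Elem "professor" [("name", "Ryan Gibson")]
        [ CElem (Elem "course" [] [CText "Logic"]) ]) ]
```

Here `conforms (sliceDTD sliceDoc fullDTD) sliceDoc` evaluates to `True`. A real DTD content model can also require children that the slice has removed, which is one reason a new DTD has to be produced; this simplified model does not capture that case.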

Main Applications

Personalization: Webpages can be automatically personalized by selecting the slicing criterion.

Filtering: Irrelevant information in a webpage is automatically discarded.

Example:

Webpage of Tesco at www.tesco.com

Slice of the previous webpage

This webpage has been sliced w.r.t. the slicing criterion "mobile phones". After the slicing process, text, banners, and images that were not semantically related to mobile phones (in the XML document) have been deleted.

Future Work

Amorphous Slicing: Our slicer keeps the original tree structure of DTD/XML documents after the transformation. We plan to provide new, more aggressive approaches that change the original structure during the slicing process.

XSLT Slicing: Currently, we are able to slice DTD/XML documents. Slicing XSLT documents as well would give us much more control over the final generated webpage. By slicing XSLT documents we would be able to manipulate both structure and behavior.
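
To make the contrast drawn under Amorphous Slicing concrete, the following is a speculative sketch of one possible structure-changing slice. It is not a planned design taken from this page. It assumes the simplified document model shown in the Implementation section, with `Name` and `Value` as plain strings, and the function names `matches`, `flatSlice` and `hasName`, as well as the tag names in `example`, are hypothetical. Instead of keeping the ancestors of the selected elements, it attaches every selected element directly to the root.

```haskell
type Name      = String
type Value     = String
type Attribute = (Name, Value)

data Element = Elem Name [Attribute] [Content]
  deriving (Eq, Show)

data Content
  = CElem Element
  | CText String
  deriving (Eq, Show)

-- | Collect, in document order, every element that satisfies the
-- predicate. The search does not descend into an element that matches.
matches :: (Element -> Bool) -> Element -> [Element]
matches crit e@(Elem _ _ cs)
  | crit e    = [e]
  | otherwise = concat [matches crit c | CElem c <- cs]

-- | Structure-changing slice: keep the root and place all matching
-- elements directly under it, dropping the intermediate ancestors.
flatSlice :: (Element -> Bool) -> Element -> Element
flatSlice crit root@(Elem n attrs _)
  | crit root = root
  | otherwise = Elem n attrs (map CElem (matches crit root))

-- | Example predicate: select elements by tag name.
hasName :: Name -> Element -> Bool
hasName n (Elem m _ _) = n == m

-- | Hypothetical document, for illustration only.
example :: Element
example =
  Elem "staff" []
    [ CElem (Elem "professor" [("name", "Ryan Gibson")]
        [ CElem (Elem "course"  [] [CText "Logic"])
        , CElem (Elem "project" [] [CText "Slicing"]) ])
    , CElem (Elem "secretary" [] [CText "J. Doe"]) ]
```

With this sketch, `flatSlice (hasName "course") example` yields a `staff` element whose only child is the `course` element. Because the tree shape changes, such a result would generally need a DTD that is restructured accordingly, not one obtained by deletion alone.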

 
Implementation
 

Here you can download the files that make up the implementation of the DTD/XML slicer. We have implemented our prototype in Haskell, which gives us a formal basis and offers many advantages for the manipulation of XML documents, such as the HaXml library.
HaXml allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:

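type Name = String              -- assumed here: names are plain strings
type Value = String             -- assumed here: attribute values are plain strings
data Element = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)  -- a type synonym, since `data` cannot alias a tuple
data Content = CElem Element    -- a nested element
             | CText String     -- character data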
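
As an illustration of how a slice could be computed over the data structures above, here is a minimal sketch. It is not the XMLSlicer implementation, whose algorithm this page does not describe. It assumes that `Name` and `Value` are plain strings and that the slicing criterion is given as a predicate on elements. The names `Criterion`, `sliceElement`, `hasName` and `example`, as well as the tag names, are hypothetical. The sketch keeps every element that satisfies the criterion together with its subtree, and keeps each ancestor of such an element with only the children that lead to a match, so the original tree structure is preserved.

```haskell
type Name      = String
type Value     = String
type Attribute = (Name, Value)

data Element = Elem Name [Attribute] [Content]
  deriving (Eq, Show)

data Content
  = CElem Element
  | CText String
  deriving (Eq, Show)

-- | Hypothetical slicing criterion: a predicate marking the elements
-- of interest.
type Criterion = Element -> Bool

-- | Slice a document tree with respect to a criterion.
-- A matching element is kept whole. A non-matching element is kept only
-- if some descendant matches, and then only with the child elements that
-- lead to a match (its text children are dropped). Returns Nothing when
-- nothing in the tree matches.
sliceElement :: Criterion -> Element -> Maybe Element
sliceElement crit e@(Elem name attrs children)
  | crit e    = Just e
  | null kept = Nothing
  | otherwise = Just (Elem name attrs kept)
  where
    kept = [CElem e' | CElem c <- children, Just e' <- [sliceElement crit c]]

-- | Example criterion: select elements by tag name.
hasName :: Name -> Criterion
hasName n (Elem m _ _) = n == m

-- | Hypothetical document, for illustration only.
example :: Element
example =
  Elem "staff" []
    [ CElem (Elem "professor" [("name", "Ryan Gibson")]
        [ CElem (Elem "course"  [] [CText "Logic"])
        , CElem (Elem "project" [] [CText "Slicing"]) ])
    , CElem (Elem "secretary" [] [CText "J. Doe"]) ]
```

For example, `sliceElement (hasName "course") example` keeps the `staff` root, the `professor` element and its `course` child, and discards the `project` and `secretary` elements. Since the result has the same type as the input, it can be sliced again with another criterion.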


_______________Haskell Libraries_______________
HaXml library Download

 

_________________Current Version_________________

XMLSlicer Download

 

__________________Example Files__________________

PersonalInfo.xml Download
PersonalInfo.dtd Download
PersonalInfo.xsl Download
PersonalInfo_CourseSlice.xml Download
PersonalInfo_CourseSlice.dtd Download
PersonalInfo_ProjectSlice.xml Download
PersonalInfo_ProjectSlice.dtd Download
PersonalInfo_ForwardSlice.xml Download
PersonalInfo_ForwardSlice.dtd Download