Multistructured documents: from modelling to multidimensional analyses
Karim Djemal, Chantal Soulé-Dupuy and Nathalie Vallès-Parlangeau
Abstract
With the emergence of digitization, decision makers are increasingly concerned with accessing and exploiting document content. They need applications and methodologies to extract knowledge (measures, indicators, …) from digital libraries. In this paper we present a multidimensional analysis methodology that gives business decision makers a detailed view of documents within larger collections. This methodology is based on document structures: structural elements and content are transposed into an analysis subject (fact) and analysis axes (dimensions). However, one document can have multiple descriptions, and therefore several fragmentations and several structures, according to several contexts.
Taking this document multistructurality into account in the multidimensional analysis process provides two advantages: (1) it widens and enhances the analysis possibilities, and (2) it makes it possible to refine the results. The specific objective of this paper is to present a multidimensional analysis of multistructured documents. Relying on the “MVDM” model, five types of analysis are possible: by generic structure, by generic view, by generic node, by specific structure and by specific view. This methodology is validated through a prototype that ensures the integration of documents and the multidimensional analysis of their contents.
Keywords
multidimensional analysis, datamart, multistructured documents, decision support system, document modelling
Introduction
Nowadays, knowledge is the main success factor of any business. Business decision makers must retrieve measures and indicators from heterogeneous resources (internet, libraries, etc.). Knowledge is not only factual information, as in a classical database; it is often encapsulated in documents. We can easily exploit the metadata linked to a document and the descriptions of the document, but not its content. Considering that documents are a very important knowledge source, decision makers must have powerful methodologies and tools to extract this knowledge.
Decision systems enable these decision makers to synthesize information to aid them in their task. Most of the available documents are digital documents of very different natures: meeting reports, invoices, newspapers, etc. Moreover, the same document can have multiple descriptions, hence many fragmentations and consequently several structures according to various contexts. Taking this multistructurality into account has a twofold interest. First, it enhances and widens the possibilities of document exploitation by using information from different structures. Second, it refines information retrieval: combining several structures provides a more accurate localization of the relevant document fragments. Approaches dealing with multistructurality management exploit these benefits only for document querying.
In this paper, we look at decision-making systems. To our knowledge, related works do not consider document multistructurality. They exploit only a single structure, generally the one given by the first document description grammar (i.e. the logical structure). As an example, one existing approach defines a generic structure which gathers structurally similar documents. This generic structure facilitates access to the contents of the attached documents in order to support multidimensional analysis. With the same goal, i.e. access to the content of documents having similar structures, [Golfarelli et al., 2001] and [Pokorný, 2001] propose an approach based on the DTD of XML documents; [Vrdoljak et al., 2006] use the XML schema of these documents.
We propose in this paper an approach of multidimensional analysis that exploits the document multistructurality.
We first introduce some related works, and then we present the “MVDM” model that we propose to represent multistructured documents. This model is characterized by two levels: a specific one and a generic one. The specific level represents the specific characteristics of one multistructured document. The generic level represents a document class that gathers similar multistructured documents. Relying on the generic level, we develop an approach of multidimensional analysis involving several structures of the same document. Finally, we illustrate the use of this approach through an example in the last section.
Related Works
There have been some recent attempts to deal with the problem of document multistructurality from a computer science perspective. To address this issue, it may be interesting to manage the various problems related to the representation, the storage, the reconstruction and the management of concurrent structures of documents.
Regarding the problems of representation and storage, two kinds of approaches have been proposed:
(i) The first kind of approach is based on syntactic solutions. The main challenge of these approaches is to represent all the structures in the same document. This document must be, on the one hand, well formed and, on the other hand, XML compatible. The formal framework imposed by these syntactic approaches structures the documents in a precise, concise and unambiguous way. At the same time, these approaches raise two problems: a legibility problem for users and a compiler problem when processing these documents. Two categories of works can be distinguished: those which suggest extending the syntax of XML or SGML [LeMaitre, 2006] in order to preserve a certain compatibility (CONCUR [Fernandez et al., 2007], XCONCUR [Hilbert et al., 2005] and TEI [Sperberg-McQueen & Burnard, 2001]) and those which propose new syntaxes (LMNL [Tennison & Piez, 2002], MECS [Huitfeldt, 1998] and TexMECS [Huitfeldt & Sperberg-McQueen, 2001]).
(ii) The second kind of approach concerns propositions based on models. Exhaustiveness and clarity of representation are the main advantages of these approaches. Indeed, modelling the various concepts and fragments of a document offers a more precise and exhaustive vision of its composition (nodes and structures). Moreover, through this modelling, it is possible to implement tools that exploit documents without using any parser of multistructured documents (which is difficult to develop). A first category of solutions, as in [Navarro, 1995] and [Chatti et al., 2004], recommends the use of a given structure as a pivot structure. The various document structures are attached to this pivot structure. A second category of solutions consists of an independent modelling of each structure, as in [Mechkour, 1995], [Bruno & Murisasco, 2006], [Sperberg-McQueen & Huitfeldt, 2000] and [LeMaitre, 2006]. All of these models are based on a tree-like or a graph-like representation of the structures. They provide a concise and accurate vision of the node composition of each structure. Modelling the structure composition is therefore a good way to enhance and facilitate document exploitation, particularly in comparison with approaches that have to use parsers during the exploitation stage.
To our knowledge, most of the time, document exploitation is limited to querying. Given the complexity of some document representation models, querying multiple structures requires special treatments, which are developed either within an extension of XQuery or within a new query language. [Bruno & Murisasco, 2006] propose to add functions and operators to the existing document query languages XQuery and XPath. They choose to extend the semantics of XQuery filters to query the concurrent structures of the same document. MECS and TexMECS documents are managed through the GODDAG model [Sperberg-McQueen & Huitfeldt, 2000]. This model is based on directed acyclic graphs. In order to manage overlapping between the various concurrent structures, the GODDAG model shares the common nodes of these different structures. Thus, each node can have multiple parents. To generate the structures based on the GODDAG model, [Dekhtyar & Iacob, 2005] have developed a compiler that translates multistructured documents into a DXD (Distributed XML Document: a set of XML documents that share the same root and the same content). To query GODDAG, [LeMaitre, 2006] proposes to extend the XDM model [Fernandez et al., 2007], which cannot handle non-tree structures, with “delay nodes”. A delay node is the virtual representation of a sub-tree of a father node through an XQuery expression.
Multistructured Document Model
Our work naturally falls into this second category of approaches dedicated to multistructured document management. We formalize a model based on a “logical” fragmentation technique that describes separately the various entities of a document (elements, element attributes, metadata and metadata attributes) and the relationships between these entities. This fragmentation is called “logical” because the document content is not really split up and scattered. It is stored as a data block and referenced by different logical nodes, and therefore by the different structures. Thus, a node and its content are associated by an index that determines the beginning and end of each content fragment. This indexing leads to the sharing of common contents that belong to several structures. This optimization has a twofold interest: on the one hand, it avoids duplicated storage, and on the other hand, it takes into account the overlapping between nodes of concurrent structures.
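As an illustration of this logical fragmentation, the following minimal Python sketch stores the content once and lets nodes of two structures reference it through begin/end indexes. The class `LogicalNode`, its fields and the sample text are hypothetical names chosen for the example; they are not taken from the MVDM implementation.

```python
from dataclasses import dataclass

# Hypothetical content block: the document content is stored only once.
CONTENT = "Good evening. Today we talk about sport and then about politics."


@dataclass(frozen=True)
class LogicalNode:
    """A node that references a fragment of the shared content block."""
    name: str
    view: str   # structure (view) the node belongs to
    begin: int  # index of the first character of the fragment
    end: int    # index one past the last character of the fragment

    def content(self, block: str) -> str:
        # The fragment is read from the block; nothing is copied into the node.
        return block[self.begin:self.end]

    def overlaps(self, other: "LogicalNode") -> bool:
        # Two fragments overlap when their index ranges intersect.
        return self.begin < other.end and other.begin < self.end


# One node of a "Speaker" structure and two nodes of a "Topic" structure.
turn = LogicalNode("Turn", "Speaker", 0, len(CONTENT))
sport = LogicalNode("Segment", "Topic",
                    CONTENT.index("Today"), CONTENT.index(" and"))
politics = LogicalNode("Segment", "Topic",
                       CONTENT.index("then"), len(CONTENT))
```

Because every node only holds indexes, the same characters can belong to `turn` and to `sport` without being stored twice, and `overlaps` is enough to detect that the two structures share content.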
A multistructured document is represented by a specific structure. This specific structure includes all the nodes (fragments) of the document. Each document structure is encapsulated within a specific view. All the views of the same document are aggregated to compose the specific structure of this document by respecting the sharing of common nodes.
The main originality of our model lies in the definition of a generic level. This level acts on the one hand as a document definition grammar and on the other hand as a document classifier. Indeed, through this model, documents having similar structures can be gathered together into document classes. Thus, each generic structure (resp. view) represents a collection of specific structures (resp. views). An example of two generic views is illustrated in Figure 5.
The “MVDM” model (Multi View Document Model) (see Figure 1) [Djemal et al., 2008] includes two levels:
The specific level is described through the following metaclasses: “SpeStr”, “SpeNode”, “SpeRelation” and “SpeView”. It also includes two additional metaclasses specific to a single document: the metaclass “Document” represents a specific document and the metaclass “Declaration” ensures the preservation of document characteristics such as its version number. At the specific level, we also detail the type of each specific node in order to define its specific features. So one metaclass is designed for each type.
The generic level is described through the following metaclasses: “GenStr”, “GenNode”, “GenRelation” and “GenView”. The generic structure “GenStr” represents the global structure of a document collection. It is defined through a set of generic nodes “GenNode”. The generic relations “GenRelation” characterize the links that join two generic nodes according to a particular view. The generic view “GenView” references a sub-structure of the generic structure. Each sub-structure refers to the representation of a particular document class.
The relationship between the two levels (specific and generic) is ensured by a special link that we define as “compliance link”. UML does not have this kind of link; therefore, we opted for a new stereotype: “” (see Figure 1).
Interest of the Compliance Link. The relationship between the specific level and the generic level can be described as an inheritance because the specific classes are on one hand fully consistent with the generic classes, and on the other hand, they can be enriched by specific information. In this case these classes are subclasses, but not instances of the mother class.
Characteristics of the Compliance Link. In addition to the generalization/specialization aspect, the compliance link as defined should have the following characteristics:
- a child metaclass is an instance of the mother metaclass. This instance inherits all the characteristics of the derived metaclass,
- homomorphism: this link provides a homomorphism between the specific and generic levels. Indeed, this relationship ensures that every specific fragment is attached to a generic fragment,
- classification: each generic fragment includes a set of specific fragments. This makes it possible to create fragment classes that facilitate access to a specific fragment.
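The homomorphism and classification characteristics can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `GenNode` and `SpeNode` borrow the metaclass names of the model, but the fields `conforms_to` and `document` and the helper functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GenNode:
    """Generic node: describes a class of fragments."""
    name: str


@dataclass(frozen=True)
class SpeNode:
    """Specific node: a fragment of one document."""
    name: str
    document: str
    conforms_to: GenNode  # hypothetical field standing for the compliance link


def unattached(spe_nodes, gen_nodes):
    """Homomorphism check: list specific nodes whose generic node is unknown."""
    return [n for n in spe_nodes if n.conforms_to not in gen_nodes]


def classify(spe_nodes):
    """Classification: group specific nodes under their generic node."""
    classes = defaultdict(list)
    for node in spe_nodes:
        classes[node.conforms_to].append(node)
    return dict(classes)
```

With such a grouping, a generic node acts as an access point: looking it up in the result of `classify` returns every specific fragment, across documents, that complies with it.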
Multidimensional Analysis
We describe in this section the multidimensional analysis technique that we want to apply to factual information included in a document repository organized according to the “MVDM” model. This multidimensional analysis technique consists of structuring data according to several analysis axes that can represent different concepts. According to the proposed model, these data can be derived from specific elements, attributes of specific elements, specific metadata or attributes of specific metadata.
To analyze the content of a document repository, we must group the data by analysis component within one or several datamarts. A datamart is an extract of information organized adequately for decision-making purposes. Therefore, the first step consists in building and instantiating these datamarts. The extracted data are generally suited to a particular use.
In our work, we adopted multidimensional tables to visualize the content of the generated datamarts [Gyssens & Lakshmanan, 1997], since the tabular form is the simplest and most intuitive representation for the user. The approach we propose to render document analyses from a document repository is based on a multidimensional representation that uses the generic structures and generic views represented within the MVDM model. These generic elements act as indexes to access specific elements. They provide several access points to document content. This approach is illustrated in Figure 2. The process is composed of three steps:
Building of datamart schemas: this step needs a user intervention to identify the subject (fact) and the axes (dimensions) of analysis,
Datamart generation: during this step, the datamart must be generated automatically and transparently for the user. It is thus necessary to access the document repository in order to retrieve the content values and to instantiate the datamarts,
Visualization of multidimensional tables: once the datamart is built, this step displays its content automatically through multidimensional tables.
Process of datamart schema building
The first step of a multidimensional analysis process consists in generating, from the document repository, the target datamart schema.
The datamart schema building (cf. Figure 3) is composed of four steps: 1. Selection of the analysis type, 2. Selection of the analysis components (Fact/Dimensions), 3. Screening and 4. Schema visualization.
The first step should allow the user to select an analysis type. Considering the MVDM model, five types of analysis are possible:
- analysis by generic structure: this analysis applies to a set of documents attached to the same generic structure. In this case, the analysis components are selected regardless of the associated views,
- analysis by generic view: this analysis follows the same principle as the previous one. However, the analysis focuses on a single view aggregated to a generic structure of a set of documents,
- analysis by generic nodes: the first two proposals may be restrictive in some cases. In fact, a generic node can be shared by several generic structures. This proposal therefore consists in analysing the documents by generic nodes, which can belong to several generic structures,
- analysis by specific structure: the fourth proposal consists in analysing the contents of one and only one document on the basis of its specific structure. It is thus necessary to refer to its generic structure to determine its document schema,
- analysis by specific view: the last proposal consists in analysing the contents of a document by focusing on only one specific view.
These different types of analysis allow the user to focus on one or more structures, on a defined field or even on a single document, according to his/her analysis needs.
During the second step, the user must select the analysis components, that is to say the fact (analysis subject) and the dimensions (analysis axes). The user must also order the dimensions and choose the aggregation function (Count, Sum, Maximum, Minimum, Average, Content) for the measure of the fact (analysis indicator). In the case of generic nodes, component selection is done through component lists, because this analysis type requires the use of several hierarchies. For the other types of analysis, component selection is made from the generic structure or the generic view previously chosen by the user;
The third step is “screening”. It should allow the user to select specific values to refine his/her analysis. We distinguish two types of screening:
For a dimension, the user must choose, from among all the values, those he/she wants to integrate into the datamart,
For a fact, we provide two filter types. When the fact value is numeric, we propose a filter that selects criteria using the conventional arithmetic comparison operators (<, >, =, <>, <=, >=). For text values, we propose to use screening techniques in order to select keywords that can be connected by logical operators (“+”: and, “-”: not, “|”: or);
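The two fact filters can be sketched as follows in Python. The comparison operators are those listed above; the way keywords are combined is an assumption made for the example, since the text does not fix a syntax: a term is a hypothetical `(operator, keyword)` pair, all “+” keywords must be present, no “-” keyword may be present, and at least one “|” keyword must be present when any is given.

```python
import operator

# Arithmetic comparison operators offered for numeric fact values.
COMPARATORS = {
    "<": operator.lt, ">": operator.gt, "=": operator.eq,
    "<>": operator.ne, "<=": operator.le, ">=": operator.ge,
}


def numeric_filter(value, op, threshold):
    """Return True when `value op threshold` holds."""
    return COMPARATORS[op](value, threshold)


def keyword_filter(text, terms):
    """Screen a text value with (operator, keyword) pairs.

    Assumed semantics: "+" keyword required, "-" keyword forbidden,
    "|" keywords form an alternative (at least one must occur).
    """
    words = set(text.lower().split())
    required = [k.lower() for op, k in terms if op == "+"]
    excluded = [k.lower() for op, k in terms if op == "-"]
    optional = [k.lower() for op, k in terms if op == "|"]
    if any(k not in words for k in required):
        return False
    if any(k in words for k in excluded):
        return False
    return not optional or any(k in words for k in optional)
```

For instance, `keyword_filter("sport news from paris", [("+", "sport"), ("-", "politics")])` keeps the value, while the same terms reject a text that also mentions politics.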
The last step is “visualization”. It allows the user to display the document datamart schema through a graphical representation that illustrates the analysis choices, before generating the datamarts that are the basis of the multidimensional tables.
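The outcome of these four steps can be pictured as a small schema object. The sketch below is only an illustration: the class and field names are hypothetical, and the limit of one to three dimensions is an assumption read from the query forms and the three-dimensional tables used in this paper.

```python
from dataclasses import dataclass
from enum import Enum


class AnalysisType(Enum):
    """The five analysis types offered by the approach."""
    GENERIC_STRUCTURE = "generic structure"
    GENERIC_VIEW = "generic view"
    GENERIC_NODE = "generic node"
    SPECIFIC_STRUCTURE = "specific structure"
    SPECIFIC_VIEW = "specific view"


AGGREGATES = ("Count", "Sum", "Maximum", "Minimum", "Average", "Content")


@dataclass
class DatamartSchema:
    analysis_type: AnalysisType
    source: str        # selected structure, view or document (hypothetical field)
    fact: str          # node chosen as analysis subject
    aggregate: str     # aggregation function applied to the fact measure
    dimensions: list   # ordered list of nodes chosen as analysis axes

    def validate(self):
        """Return a list of problems; an empty list means the schema is usable."""
        errors = []
        if self.aggregate not in AGGREGATES:
            errors.append("unknown aggregation function: " + self.aggregate)
        if not 1 <= len(self.dimensions) <= 3:  # assumption: one to three axes
            errors.append("between one and three dimensions are expected")
        if len(set(self.dimensions)) != len(self.dimensions):
            errors.append("a node is used twice as a dimension")
        return errors
```

A schema for the audio example of this paper would then read `DatamartSchema(AnalysisType.GENERIC_STRUCTURE, "audio sequence", "Speech", "Count", ["NameS", "NameT", "Sequence"])`.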
Process of Document Datamart Generation
This phase consists in generating the datamart automatically by retrieving information from the document repository. This generation is carried out in two steps (see Figure 4): 1. View generation for each analysis component (either fact or dimension) and 2. Joining and gathering of the different views generated in the first step.
View Generation for each Analysis Component
For each dimension, we must generate one view. The views that we handle in this section are the views used in databases and not the views presented in the MVDM model. A database view can be regarded as a virtual table defined by a query [Gupta & Mumick, 1995].
The name of the created view has the following form: “Dim_n”, where “n” represents the dimension number. This view also includes a block of attributes “Anc_x” and a field “Node” (see Query 1). The number of attributes “Anc_x” depends on the number of dimensions that will be used to perform the multidimensional analysis.
For the node chosen as fact, the system must generate a view called “Fact”. This view includes a block of attributes “Anc_x” and a field “Node”.
Other constraints must be added to the generated views:
- if the analysis type is “generic view” (resp. “generic structure”), all specific nodes must belong to the specific views (resp. specific structures) of the documents related to the selected generic view (resp. generic structure);
- if the analysis type is “specific view” (resp. “specific structure”), all the specific nodes must belong to the specific view (resp. specific structure) of the document selected by the user;
- if the user requires specific values for a dimension or a fact (screening operation), conditions should be added to take into account these values (and only these).
In the case of a generic node analysis, the system must first determine the generic structures that contain all the nodes chosen as analysis components. Then a new view is generated for each of these generic structures, following the approach detailed above. This process ends with a union of all the generated views.
The generic form of a dimension view is shown in Query 1. Some instantiation examples of this generic form are presented in the Example section (see Query 4, Query 5, Query 6 and Query 7).
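The selection of candidate structures for a generic node analysis amounts to a subset test. The following Python sketch assumes, for illustration only, that each generic structure is available as a set of generic node names; the function names are hypothetical, and the union is assumed to drop duplicate rows as an SQL UNION does.

```python
def structures_containing(chosen_nodes, structures):
    """Return the names of the generic structures that contain every chosen node.

    `structures` maps a generic structure name to the set of its node names.
    """
    chosen = set(chosen_nodes)
    return sorted(name for name, nodes in structures.items() if chosen <= nodes)


def union_rows(views):
    """Union the rows of the per-structure views, dropping duplicates
    while keeping the order of first appearance."""
    seen = set()
    result = []
    for rows in views:
        for row in rows:
            if row not in seen:
                seen.add(row)
                result.append(row)
    return result
```

Only the structures returned by `structures_containing` need a view of their own; the rows of these views are then merged by `union_rows`.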
Query 1
CREATE VIEW Dim_i (Doc, Anc_1, {Anc_2, Anc_3}, nœud) AS
Joining and gathering of generated views
This step consists in joining and gathering all the views generated in the previous step. Thus, a new view is generated by joining the attributes “Anc_x” from all views created in the previous step. At this level, only specific nodes are handled.
This new view (see Query 2) will have the following form:
Query 2
CREATE VIEW Jointure (nœud_d1, {nœud_d2, nœud_d3}, nœud_f) AS
An instantiation example of this generic form is presented in the Example section (see Query 8).
The definition of several structures on the same content induces content overlapping. First, we look over the list of generic nodes whose content may overlap. This list, “OverlGenNode”, is constructed during the generic structure integration. For each line of the view “Joint”, if nodes occur in “OverlGenNode”, we perform the overlapping computation in order to determine the common content. The process is applied to the nodes and all their descendants. Only element and metadata nodes are concerned by the overlapping process. If attribute nodes have been chosen, the process is applied to their father elements or metadata.
The resulting table “JointT” contains the node content. Depending on the node nature, the stored content can be the node content itself (element nodes) or content indexes (metadata nodes).
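Since each fragment is delimited by a begin index and an end index, the common content of several overlapping nodes can be obtained by intersecting their ranges. The sketch below is a minimal illustration of that idea, not the prototype's code; the function name and the sample boundaries are hypothetical.

```python
def common_fragment(fragments):
    """Intersect (begin, end) ranges; return None when they share no content."""
    if not fragments:
        raise ValueError("at least one fragment is required")
    begin = max(b for b, _ in fragments)  # latest start
    end = min(e for _, e in fragments)    # earliest end
    return (begin, end) if begin < end else None


# Hypothetical boundaries, in seconds, for four nodes defined on the same audio.
speech = (89, 123)
topic = (60, 99)
speaker = (80, 130)
sequence = (0, 300)
shared = common_fragment([speech, topic, speaker, sequence])
```

With these sample values the speech fragment is narrowed to the part that also lies inside the topic, speaker and sequence fragments.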
Once the table “JointT” is created, it is necessary to perform a gathering in order to apply the aggregate function chosen by the user while taking into account the screening function imposed on the fact. This new view (see Query 3) generates the content of the datamart as follows:
Query 3
CREATE VIEW Vue (j.nœud_d1, {j.nœud_d2, j.nœud_d3}, j.nœud_f) AS
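The gathering step can be tried out with an in-memory SQLite database from Python. This is a sketch under stated assumptions and not the prototype's Oracle query: the column names `node_d1`, `node_d2`, `node_d3` and `node_f`, the numeric fact and the optional `min_fact` screening threshold are all hypothetical.

```python
import sqlite3

# Mapping from the aggregation names offered to the user to SQL functions.
SQL_AGGREGATES = {"Count": "COUNT", "Sum": "SUM", "Maximum": "MAX",
                  "Minimum": "MIN", "Average": "AVG"}


def build_datamart(rows, aggregate, min_fact=None):
    """Group the rows of a JointT-like table by its three dimensions.

    `rows` holds (d1, d2, d3, fact) tuples; `min_fact` is an optional
    screening threshold applied to the fact before aggregation.
    """
    func = SQL_AGGREGATES[aggregate]  # whitelist lookup, no user text in the SQL
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE JointT "
                     "(node_d1 TEXT, node_d2 TEXT, node_d3 TEXT, node_f REAL)")
        conn.executemany("INSERT INTO JointT VALUES (?, ?, ?, ?)", rows)
        sql = ("SELECT node_d1, node_d2, node_d3, " + func + "(node_f) "
               "FROM JointT "
               "WHERE (? IS NULL OR node_f >= ?) "
               "GROUP BY node_d1, node_d2, node_d3 "
               "ORDER BY node_d1, node_d2, node_d3")
        return conn.execute(sql, (min_fact, min_fact)).fetchall()
    finally:
        conn.close()
```

Each returned tuple holds one combination of dimension values followed by the aggregated measure, which is the shape needed to fill the multidimensional tables.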
Visualization of multidimensional tables
Once the last view is created, its content is displayed through several tables that are simple to handle and to interpret. These tables make it easier to appreciate the content of document datamarts. They organize data by classifying them according to the dimensions previously chosen by the user. Thus, the columns represent the first dimension, the rows represent the second dimension and the planes represent the third dimension. The fact measure values are shown inside the tables, at the intersections of the different dimension values.
Since each plane of the multidimensional table corresponds to a single value of the third dimension, the transition from the last view generated by the system to a multidimensional table is achieved by generating a view for each value of the third dimension. Each new view contains three columns: (1) the first dimension, (2) the second dimension and (3) the fact.
From each of these views, the system must:
- retrieve all possible values of the first dimension; these values will be displayed in the columns of the appropriate plane;
- retrieve all possible values of the second dimension; these values will be displayed in the rows of the appropriate plane;
- return for each pair (column i and row j) the corresponding measure from the third column of the view (the fact). This measure will be displayed in the appropriate cell (intersection between i and j).
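The three operations above can be sketched as a small pivot function in Python. The function name, the tuple layout `(d1, d2, d3, measure)` and the use of `None` for empty cells are assumptions made for this illustration.

```python
def to_planes(rows):
    """Turn (d1, d2, d3, measure) rows into one table per value of d3.

    Returns a dict mapping each third-dimension value to a tuple
    (columns, row_labels, grid), where grid[j][i] is the measure for
    row j and column i, or None when no measure exists.
    """
    rows = list(rows)
    planes = {}
    for d3 in sorted({r[2] for r in rows}):
        # View restricted to one value of the third dimension.
        view = [(d1, d2, m) for d1, d2, third, m in rows if third == d3]
        columns = sorted({d1 for d1, _, _ in view})     # first dimension
        row_labels = sorted({d2 for _, d2, _ in view})  # second dimension
        cells = {(d1, d2): m for d1, d2, m in view}
        grid = [[cells.get((c, r)) for c in columns] for r in row_labels]
        planes[d3] = (columns, row_labels, grid)
    return planes
```

Each entry of the result corresponds to one plane of the multidimensional table and can be rendered directly as a two-dimensional grid.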
Example
Model Instantiation
To illustrate our process of multidimensional analysis of multistructured documents, we propose an example of analysis by generic structure. We consider the generic structure “audio sequence” (see Figure 5). This generic structure is based on two generic views: “Speaker” and “Topic”. These two views represent two different structurings of the same audio sequence: the first view represents a segmentation in speaker segments and the second view represents a segmentation in topic segments. From this generic structure, we apply our multidimensional analysis process.

Figure 5: Example of generic structure based on two generic views.
Multidimensional Analysis
The given example retrieves the speech segment expressed by each speaker, topic and sequence. From the generic structure “audio sequence” we choose three dimensions: “NameS” (to represent the name of speaker), “NameT” (to represent the name of topic) and “Sequence” (to represent the sequence); and one fact: “Speech” (to represent the speech segment).
The first step is to generate four views corresponding to the three dimensions and the fact, as detailed in the section on view generation for each analysis component.
For the first dimension “NameS”, the system must generate the following view:
Query 4
CREATE VIEW Dim_1 ("Doc", "Anc_1", "Anc_2", "Anc_3", "NameS") AS
For the second dimension “NameT”, the system must generate the following view:
Query 5
CREATE VIEW Dim_2 ("Doc", "Anc_1", "Anc_2", "Anc_3", "NameT") AS
For the third dimension “Sequence”, the system must generate the following view:
Query 6
CREATE VIEW Dim_3 ("Doc", "Anc_1", "Anc_2", "Anc_3", "Sequence") AS
For the fact “Speech”, the system must generate the following view:
Query 7
CREATE VIEW Fact ("Doc", "Anc_1", "Anc_2", "Anc_3", "Speech") AS
Once these four views have been created, the system generates a new view “Joint” (see Query 8), as described in the section on joining the generated views.
Query 8
CREATE VIEW Joint ("NameS", "NameT", "Sequence", "Speech") AS
Once this view has been created, the system carries out the overlapping computation. The nodes “Topic” and “Speaker” are marked in the list “OverlGenNode”. The overlapping computation then concerns these nodes and their father nodes. In our example, the nodes “Sequence” and “Speech” are concerned by the overlapping computation. The nodes “NameS” and “NameT” are attributes; they are taken into account in the overlapping computation through their father nodes. Therefore, the final fact value is the result of the content intersection of the four nodes “Topic”, “Speaker”, “Sequence” and “Speech”. In Figure 6, we present the computation carried out on the first row of the “Joint” view. The content is an audio sequence; it is expressed in seconds through begin and end marks.

Figure 6: Management of overlapping.
After managing the overlapping, a new table “JointT” is created. A gathering operation is performed on this table in order to apply the aggregate function. Thus, a new view that represents the datamart content is established. From this view, a multidimensional table is generated as shown in Figure 7, following the process described in the section on the visualization of multidimensional tables.

Figure 7: Multidimensional table.
Validation
To validate our proposals, we have developed the prototype MDOCREP (Multistructured DOCument REPository). This prototype is dedicated to multistructured document integration and analysis. Our experimental base contains a set of audio documents extracted from radio broadcasts of RFI (Radio France Internationale) and of RM (Radio Maroc). This collection was annotated in the framework of the Raives project [Parlangeau-Vallès et al., 2003]. The generic structure shown in Figure 5 is a simplified generic structure of the Raives collection. Currently, the annotations of this corpus are presented as textual transcriptions; we therefore consider these annotations as textual documents. In this section, we present an example of analysis that can be applied to the Raives collection.
If we want to know the speech transcription expressed by each speaker, each topic and each language, we should carry out the following steps:
Selection of analysis type
The first step consists in selecting the type of analysis (in our example, by “generic structure”). The system then displays the list of all the existing structures in the document repository. Among these structures, the user must choose the generic structure “RadioEmissions”. Once the structure has been chosen, the system displays it as a tree, as shown in Figure 8.
Selection of analysis components
In this step, the user must select and define the analysis components, i.e. specify the fact (subject of analysis) and the dimensions. These roles are assigned through contextual menus (cf. Figure 8). The user must point to the desired fragment and set his/her choice (fact or dimension) with a right click. He/she must do the same with the attributes, namely the order for dimensions and the formula for the fact (Count, Sum, Maximum, Minimum, Average, etc.). The analysis components can be selected from views. In our prototype, each view is presented as a tree within a specific tab.
In our example, the first dimension is “Topic” (see Figure 9), the second is “Language” (see Figure 10) and the third is “Name” (name of speaker, see Figure 11). The measure of the fact is the transcription content “Trans” (see Figure 9).
Figure 12 shows the result in the form of multidimensional tables.

Figure 8: Selection of the node “Topic” as first dimension.

Figure 9: Analysis components selected from the “Topic” view.

Figure 10: Analysis components selected from the “Language” view.

Figure 11: Analysis components selected from the “Speaker” view.
Conclusion
The approach we have chosen to perform multidimensional analysis of document contents is based on document structures. From these structures, nodes and content are transposed into the analysis subject (fact) and analysis axes (dimensions). The selected nodes can belong to a single document structure or to several structures of the same document. Taking document multistructurality into account gives analysts more accurate results. This precision is ensured through the addition of new analysis parameters and the management of overlaps between nodes defined on the same content. For example, if we consider only a single structure in the example presented in Figure 5, we select fewer nodes and therefore fewer analysis components. In this case, the analysis cannot integrate components that take into account topics and speakers at the same time. The overlapping management adjusts the result according to the contents of two nodes that overlap. In the example shown in Figure 6, we show how the measure of the first fact passes from (89_123) to (89_99). Thus, the analyst obtains a better localization of the fragments corresponding to the chosen analysis components.
Generally, decision-making systems that integrate document information offer more data to the analyst. According to [Tseng & Chou, 2006], only 20% of data are used by OLAP techniques. These 20% represent transactional data. The 80% of remaining data are encapsulated in the documents.
Multidimensional analysis certainly requires complex treatment. Nevertheless, in comparison with an information retrieval system (IRS), it has some advantages, especially as regards document exploitation. In an IRS, the number of documents retrieved may be too high, so finding the relevant ones (those that contain a piece of the information searched for) is time-consuming. Furthermore, the results generated by an IRS concern full documents and not specific document passages (which may be a drawback for long documents).
The validation of our proposal is ensured by the implementation of a prototype (MDOCREP). This prototype is based, on the one hand, on the DBMS “Oracle 10g2” to store the structures and contents of documents according to the MVDM model, and on the other hand, on a “Java 1.5” client interface that provides a graphical tool facilitating the selection of analysis components, the automatic generation of queries and the visualization of results.
As future work, we plan to apply this approach of multidimensional analysis to the management of document versions. Indeed, we plan to extend the MVDM model to take into account the concept of version as a new kind of view.
Bibliography
Bruno E., Murisasco E., MSXD: a formal model for concurrent structures defined over the same textual data, DEXA 2006 (LNCS), 2006, p. 172-181
Chatti N., Calabretto S., Pinon J-M, Vers un environnement de gestion de documents à structures multiples, Base de Données Avancées, BDA’2004, Montpellier, octobre 2004
Dekhtyar A., Iacob E., A framework for management of concurrent XML markup, Data and Knowledge Engineering, 2005, p. 185-208
Djemal K., Mbarki M., Vallés-Parlangeau N., Une approche multi-vues pour la gestion des documents multistructurés, Document numérique, Hermès, Numéro spécial Entreposage de documents et données semi-structurées, 2007, vol. 10, N. 2, p 37-61
Djemal K., Soulé-Dupuy C., Vallés-Parlangeau N., Formal Modeling of Multistructured Documents, International Conference on Research Challenge in Information Science (RCIS 2008), Marrakech, Morocco, 03/06/2008-06/06/2008, IEEE, p 227-236
Fernandez M., Malhotra A., Marsh J., Nagy M., Walsh N., XQuery 1.0 and XPath 2.0 Data Model (XDM), W3C Candidate Recommendation, 2007
Golfarelli M., Rizzi S., Vrdoljak B., Data Warehouse Design from XML Sources, Fourth ACM International Workshop on Data Warehousing and OLAP, November 9, 2001, Atlanta, Georgia, USA, p 40-47
Gyssens M., Lakshmanan L.V.S., A Foundation for Multi-dimensional Database, International Conference on Very Large DataBases (VLDB’97), Athens, Greece, August 1997, p. 106-115
Hilbert M., Schonefeld O., Witt A., Making CONCUR work, in Extreme Markup Languages, Montreal, 2005
Huitfeldt C., MECS - A Multi-Element Code System, Working Papers from the Wittgenstein Archives at the University of Bergen, Version 3, October 1998
Huitfeldt C., Sperberg-McQueen C. M., TexMECS: An experimental markup meta-language for complex documents, Rev., 17 February 2001
Khrouf K., Soulé-Dupuy C., A Textual Warehouse Approach: a Web Data Repository, Chapter VII, Intelligent Agents for Data Mining and Information Retrieval, Idea Group Publishing, p. 101-124, 2004
LeMaitre J., Representing multistructured XML documents by means of delay nodes, Proceedings of the 2006 ACM Symposium on Document Engineering, Amsterdam, The Netherlands, October 2006.
Mechkour M., A multifacet formal image model for information retrieval, MIRO final workshop, Glasgow, UK, 1995, p. 18-20.
Navarro G., A language for queries on structure and contents of textual databases, PhD thesis, University of Chile, 1995
Pokorný J., Modelling Stars Using XML, Fourth ACM International Workshop on Data Warehousing and OLAP, November 9, 2001, Atlanta, Georgia, USA, p 24-31
Sperberg-McQueen C. M., Huitfeldt C., GODDAG: A Data Structure for Overlapping Hierarchies, DDEP/PODDP, 2000, p 139-160
Sperberg-McQueen C. M., Burnard L., Guidelines for Electronic Text Encoding and Interchange, Chicago and Oxford, TEI P4, 2001
Tennison J., Piez W., The Layered Markup and Annotation Language (LMNL), Proceedings of Extreme Markup Languages Conference, Montreal, Quebec, Canada, 2002.
Tseng F. S. C., Chou A. Y. H., The concept of document warehousing for multi-dimensional modeling of textual-based business intelligence, Decision Support Systems 42, no. 2 (2006): 727-744.
Vrdoljak B., Banek M., Skocir Z., Integrating XML Sources into a Data Warehouse, DEECS 2006, San Francisco, CA, USA, p 133-142






