Preprint / Version 1

AUTOMATED VERBALIZATION OF CONCEPTUAL DATA MODELS IN BAHASA MELAYU AND MANDARIN

Authors

  • LIM SHIN HUEI INTI

Keywords:

Automated Verbalization, Bahasa Melayu, Mandarin, Conceptual Data

Abstract

This thesis is concerned with automatically verbalizing conceptual data models expressed in the graphical notation of Object-Role Modeling (ORM) as readable text in two Asian languages: Bahasa Melayu (BM), which is the national language of Malaysia, Brunei and Indonesia and one of the four official languages of Singapore; and Mandarin (the most widely used Chinese language).

In developing information systems, the most critical aspect is to specify a conceptual model that correctly and completely captures the semantics of the relevant business domain. This model can then be used to drive the later phases of the information systems engineering process. Industrial practice has shown that the most reliable way to validate the model with business domain experts is to communicate the model clearly in their native language.

ORM is an approach to conceptual data modeling specifically designed to optimize communication between data modelers and business users. In Malaysia, the most common non-English languages spoken are Bahasa Melayu and Mandarin. The main objectives of this research are to develop procedures for automating the verbalization of ORM data models (including constraints and business rules) in Bahasa Melayu and Mandarin, and to implement the approach in a software tool that allows the models to be entered and verbalized in these languages.

After conducting a literature review to identify existing relevant research and to ensure that our work provides a novel contribution, we specify the theoretical underpinnings by developing the metamodels and information structures needed to capture the logical and linguistic forms underlying the verbalizations, together with the transformations from ORM structures to the native languages.

We provide one full and one partial implementation of the approach by coding the structures and transformations in appropriate computing languages. Our full implementation is an independently developed prototype, mainly programmed in C#. Our partial implementation is an extension to the public domain version of the NORMA software tool for ORM, mainly coded in XML and XSLT. As the internal verbalization framework in NORMA will soon be replaced by another tree-based verbalization framework, we limited our Mandarin and Bahasa Melayu extension of NORMA's current verbalization framework to just a few ORM constraints.

Some of our implementation aspects (e.g. vocal rendition of our verbalizations in Mandarin) also make use of other existing public domain tools.

Evaluation of the end product involved a number of native speakers of Bahasa Melayu and Mandarin, who assessed the quality of the generated verbalizations in terms of clarity, unambiguity and precision.
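
To make the idea of constraint verbalization concrete, the following is a minimal sketch of template-based verbalization of a single ORM uniqueness constraint in English, Bahasa Melayu and Mandarin. It is not the thesis's implementation (which is in C# and XSLT). The `FactType` class, the `TEMPLATES` table, the `verbalize_uniqueness` function, the example fact type and the Bahasa Melayu and Mandarin wordings are all hypothetical and chosen only for illustration; the patterns actually used in the thesis may differ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FactType:
    """A binary ORM fact type with per-language wording (hypothetical structure)."""
    subject: Dict[str, str]    # language code -> name of the first object type
    predicate: Dict[str, str]  # language code -> predicate text
    obj: Dict[str, str]        # language code -> name of the second object type


# Illustrative templates for a uniqueness constraint on the first role.
# These wordings are assumptions, not the patterns defined in the thesis.
# In the Mandarin template the quantifier phrase precedes the predicate,
# so each language needs its own slot order rather than word-for-word
# substitution into the English pattern.
TEMPLATES = {
    "en": "Each {subject} {predicate} at most one {object}.",
    "ms": "Setiap {subject} {predicate} paling banyak satu {object}.",
    "zh": "每个{subject}最多{predicate}一个{object}。",
}


def verbalize_uniqueness(fact: FactType, lang: str) -> str:
    """Fill the language-specific template with the fact type's wording."""
    template = TEMPLATES[lang]
    return template.format(
        subject=fact.subject[lang],
        predicate=fact.predicate[lang],
        object=fact.obj[lang],
    )


# Hypothetical example fact type: Employee works in Department.
WORKS_IN = FactType(
    subject={"en": "Employee", "ms": "Pekerja", "zh": "员工"},
    predicate={"en": "works in", "ms": "bekerja di", "zh": "就职于"},
    obj={"en": "Department", "ms": "Jabatan", "zh": "部门"},
)

if __name__ == "__main__":
    for lang in ("en", "ms", "zh"):
        print(verbalize_uniqueness(WORKS_IN, lang))
```

Keeping one template per language, rather than translating the English output, lets each language place quantifiers and predicates where they naturally belong. A fuller treatment would also need to handle language-specific details such as Mandarin classifiers, which this sketch fixes to a single form.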

Posted

2022-04-07