Attribute lattice: a graph-based conceptual modeling grammar for heterogeneous data

Asgari, Mojtaba (2020) Attribute lattice: a graph-based conceptual modeling grammar for heterogeneous data. Doctoral (PhD) thesis, Memorial University of Newfoundland.

PDF (English) - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

One key characteristic of big data is variety. With massive and growing amounts of data existing in independent and heterogeneous (structured and unstructured) sources, assigning consistent and interoperable data semantics, which is essential for meaningful use of data, is an increasingly important challenge. I argue that conceptual models, in contrast to their traditional role in information system development, can be used to represent data semantics as perceived by the user of the data. In this thesis, I use principles from philosophical ontology, human cognition (i.e., classification theory), and graph theory to offer a theory-based conceptual modeling grammar for this purpose. This grammar reflects data from the perspective of its users and is independent of the data source schema. I formally define the concept of attribute lattice as a graph-based, schema-free conceptual modeling grammar that represents attributes of instances in the domain of interest and the precedence relations among them. Each node in an attribute lattice represents an attribute, that is, a true statement (predicate) about some instances in the domain. Each directed arc represents a precedence relation indicating that possessing one attribute implies possessing another attribute. Based on the premise that inherent classification is a barrier that hinders semantic interoperation of heterogeneous data sources, a human-cognition-based conceptual modeling grammar is introduced as an effective way to resolve semantic heterogeneity. This grammar represents the precedence relationships among attributes as perceived by the human user and provides a mechanism to infer classes based on the pattern of precedences. Hence, a key contribution of the attribute lattice is semantic relativism: the classification in this grammar relies on the pattern of precedence relationships among attributes rather than on fixed classes.
This modeling grammar uses the immediate and semantic neighbourhoods of an attribute to designate the attribute as a category, a class, or a property, and to specify the expansion of the attribute, that is, the attributes that are semantically equal to it. The introduced conceptual modeling grammar is implemented as an artifact to store and manage attribute lattices, to represent them graphically, and to integrate lattices from various heterogeneous sources. With the ever-increasing amount of unstructured data (mostly text data) from sources such as social media, integrating text data with other data sources has gained considerable attention. This massive amount of data, however, makes finding the data relevant to a topic of interest a new challenge. I argue that the attribute lattice provides a robust semantic foundation for addressing this information retrieval challenge in unstructured data sources. Hence, a topic modeling approach based on the attribute lattice is proposed for Twitter. This topic model conceptualizes the topic structure of tweets related to the domain of interest and enhances information retrieval by improving the semantic interpretability of hashtags.
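
As a rough illustration of the structure described in the abstract, the sketch below models attributes as nodes and precedence relations as directed arcs, and follows the arcs to list the attributes that a given attribute implies. It is not the thesis artifact: the class name `AttributeLattice`, its methods, and the example attributes are all hypothetical, and the sketch does not attempt the neighbourhood-based designation of categories, classes, and properties.

```python
from collections import defaultdict


class AttributeLattice:
    """Toy directed graph (hypothetical, not the thesis implementation).

    Nodes are attributes; an arc a -> b records the precedence relation
    "possessing a implies possessing b".
    """

    def __init__(self):
        # Maps each attribute to the attributes it directly implies.
        self._implies = defaultdict(set)

    def add_precedence(self, attribute, implied):
        """Record that possessing `attribute` implies possessing `implied`."""
        self._implies[attribute].add(implied)
        # Touch the target so it is registered as a node even without arcs.
        self._implies[implied]

    def implied_by(self, attribute):
        """Return all attributes reachable from `attribute` via one or more arcs."""
        seen = set()
        stack = [attribute]
        while stack:
            current = stack.pop()
            for nxt in self._implies.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def closure(self, attributes):
        """Return the given attributes plus everything they imply."""
        result = set(attributes)
        for attribute in attributes:
            result |= self.implied_by(attribute)
        return result


# Illustrative attributes only; they do not come from the thesis.
lattice = AttributeLattice()
lattice.add_precedence("is a sparrow", "is a bird")
lattice.add_precedence("is a bird", "is an animal")
lattice.add_precedence("is a bird", "has feathers")

implied = lattice.implied_by("is a sparrow")
observed = lattice.closure({"is a bird"})
```

Here `implied` holds the three attributes that follow from "is a sparrow", and `observed` shows how an instance recorded with a single attribute can be enriched with the attributes that precede from it, without reference to any source schema.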

Item Type: Thesis (Doctoral (PhD))
URI: http://research.library.mun.ca/id/eprint/14448
Item ID: 14448
Additional Information: Includes bibliographical references (pages 163-173).
Keywords: Attribute lattice, Conceptual modeling grammar, Semantic data integration, Attribute-lattice-based topic modeling, Twitter content analysis
Department(s): Business Administration, Faculty of
Date: May 2020
Date Type: Submission
Digital Object Identifier (DOI): https://doi.org/10.48336/kbz2-sv47
Library of Congress Subject Heading: Conceptual structures (Information theory); Semantic integration (Computer systems); Business--Data processing.
