Labeling large scale social media data using budget-driven One-class SVM classification

Yuan, Hao (2016) Labeling large scale social media data using budget-driven One-class SVM classification. Masters thesis, Memorial University of Newfoundland.

PDF (English) - Accepted Version
Available under License - The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


Abstract

Social media classification problems have drawn increasing attention in the past few years. With the rapid development of the Internet and the popularity of computers, an astronomical amount of information now exists on social networks (social media platforms). The resulting datasets are generally large scale and often corrupted by noise, and noise in the training set strongly affects the performance of supervised learning (classification) techniques. This thesis presents a budget-driven One-class SVM approach suited to large scale social media data classification. Our approach builds on an existing online One-class SVM learning algorithm, referred to as STOCS (Self-Tuning One-Class SVM). To justify this choice, we first analyze the noise resilience of STOCS using synthetic data; the experiments suggest that STOCS is more robust against label noise than several other existing approaches. Next, to handle big data classification of social media data, we introduce several budget-driven features that allow the algorithm to be trained within limited time and memory. In addition, the resulting algorithm can easily adapt to changes in dynamic data at minimal computational cost. Compared with two state-of-the-art approaches, LIBLINEAR and kNN, our approach is shown to be competitive while requiring less memory and time.
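The abstract does not spell out how STOCS or the budget-driven features work. As a rough illustration of the general idea only, here is a minimal, hypothetical Python sketch: an online one-class kernel learner trained by stochastic gradient descent on the loss max(0, rho - f(x)) - nu * rho, with a hard cap ("budget") on the number of stored support vectors, plus a hypothetical one-model-per-label wrapper that turns one-class models into a classifier. The class names and parameters (`budget`, `nu`, `eta`, `lam`, `gamma`) are assumptions for illustration; this is not STOCS and not the method of the thesis.

```python
import math
import random


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class BudgetOneClass:
    """Hypothetical budget-limited online one-class learner (illustration only, not STOCS).

    f(x) = sum_i alpha_i * k(sv_i, x); x is treated as "in class" when f(x) >= rho.
    Trained by SGD on max(0, rho - f(x)) - nu * rho plus an L2 penalty (lam / 2) * ||f||^2.
    """

    def __init__(self, budget=30, nu=0.1, eta=0.1, lam=0.01, gamma=0.5):
        self.budget = budget  # memory cap: maximum number of stored support vectors
        self.nu, self.eta, self.lam, self.gamma = nu, eta, lam, gamma
        self.svs, self.alphas, self.rho = [], [], 0.0

    def score(self, x):
        return sum(a * rbf(sv, x, self.gamma) for sv, a in zip(self.svs, self.alphas))

    def partial_fit(self, x):
        f = self.score(x)
        # The L2 penalty shrinks every existing coefficient by the same factor.
        self.alphas = [a * (1.0 - self.eta * self.lam) for a in self.alphas]
        if f < self.rho:
            # Margin violation: store x as a support vector and lower the threshold.
            self.svs.append(tuple(x))
            self.alphas.append(self.eta)
            self.rho -= self.eta * (1.0 - self.nu)
        else:
            # x is already inside the region: tighten the threshold slightly.
            self.rho += self.eta * self.nu
        if len(self.svs) > self.budget:
            # Budget maintenance: drop the smallest coefficient. All coefficients start
            # at eta and shrink at the same rate, so this removes the oldest vector.
            i = min(range(len(self.alphas)), key=lambda j: self.alphas[j])
            del self.svs[i]
            del self.alphas[i]

    def decision(self, x):
        return self.score(x) - self.rho


class OneClassPerLabel:
    """Hypothetical wrapper: one budgeted one-class model per label, argmax at test time."""

    def __init__(self, **params):
        self.params, self.models = params, {}

    def partial_fit(self, x, label):
        self.models.setdefault(label, BudgetOneClass(**self.params)).partial_fit(x)

    def predict(self, x):
        return max(self.models, key=lambda c: self.models[c].decision(x))


# Usage on a synthetic stream: two Gaussian clusters, one per label (made-up data).
random.seed(0)
clf = OneClassPerLabel(budget=30, nu=0.1, eta=0.1, lam=0.01, gamma=0.5)
centers = {"a": (0.0, 0.0), "b": (4.0, 4.0)}
for _ in range(600):
    label = random.choice("ab")
    cx, cy = centers[label]
    clf.partial_fit((random.gauss(cx, 1.0), random.gauss(cy, 1.0)), label)
print(clf.predict((0.0, 0.0)), clf.predict((4.0, 4.0)))
```

The budget keeps both memory and per-prediction cost bounded by `budget` kernel evaluations per label regardless of stream length, and because training is a single pass of `partial_fit` calls, new data can be absorbed incrementally instead of retraining from scratch.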

Item Type: Thesis (Masters)
URI: http://research.library.mun.ca/id/eprint/11897
Item ID: 11897
Additional Information: Includes bibliographical references (pages 86-94).
Keywords: One-Class SVM, Label Noise, Social Media Classification, Budget-Driven Classification, Online Learning
Department(s): Science, Faculty of > Computer Science
Date: May 2016
Date Type: Submission
Library of Congress Subject Heading: Online social networks--Data processing; Classification rule mining; Big data; Computer algorithms
