Digital library of construction informatics and information technology in civil engineering and construction
 
Paper: w78-2010-25
Paper title: Unstructured Construction Document Classification Model Through Latent Semantic Analysis (LSA)
Authors: Tarek Mahfouz, Amr Kandil
Summary: The dynamic nature of the construction industry and the increasing sophistication and complexity of construction projects mandate extensive coordination between different parties and produce massive amounts of documents in diverse formats. To provide a robust document classification methodology for the construction industry, the current research develops an automated classifier model through Latent Semantic Analysis (LSA). The analyses and models developed in this paper focus on two groups of construction documents. The first consists of documents with high variation in words, such as transmittals, correspondence, and meeting minutes. The second comprises documents with low word variation, such as construction claims and legal documents. The adopted research methodology (1) investigated LSA algorithms; (2) developed reduced feature spaces; (3) developed two C++ algorithms that process unstructured construction documents into a format readable by the LSA algorithms; (4) developed LSA automated classification models; and (5) tested and validated the developed models. The developed models attained higher classification accuracy, and better precision and recall, than previous research reported in the literature. Overall accuracies of 89% and 92% were attained for the first and second groups of documents, respectively. The main findings of this paper represent a step in a line of research that targets developing a coherent and integrated methodology for Knowledge Management (KM) and construction decision support through Machine Learning (ML) techniques. It is conjectured that this research stream would help relieve the negative consequences associated with lengthy tasks related to analyzing textual documents in the construction industry.
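Illustrative note: The summary names LSA, reduced feature spaces, and classification as the core steps but does not describe the implementation. The sketch below is not the authors' model or their C++ preprocessing; it is a minimal Python illustration of the general technique, assuming NumPy is available. The toy documents, the labels `claim` and `minutes`, the choice of `k=2`, and the nearest-centroid decision rule with cosine similarity are all hypothetical choices made for this example.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Return (sorted vocabulary, term-by-document count matrix)."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[index[word], j] += 1.0
    return vocab, matrix

def fit_lsa(matrix, k):
    """Truncated SVD: keep the k left singular vectors with the largest
    singular values. They span the reduced (latent) feature space."""
    u, _singular_values, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def classify(text, vocab, basis, centroids):
    """Fold a new document into the latent space and return the label
    of the most similar class centroid (an assumed decision rule)."""
    index = {word: i for i, word in enumerate(vocab)}
    vector = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in index:  # words unseen in training are ignored
            vector[index[word]] += 1.0
    latent = basis.T @ vector
    return max(centroids, key=lambda label: cosine(latent, centroids[label]))

# Hypothetical toy corpus: two documents per class.
train_docs = [
    "claim delay damages contract",
    "claim damages contract breach",
    "meeting minutes agenda attendees",
    "meeting agenda schedule attendees",
]
train_labels = ["claim", "claim", "minutes", "minutes"]

vocab, term_doc = build_term_document_matrix(train_docs)
basis = fit_lsa(term_doc, k=2)
latent_docs = basis.T @ term_doc  # shape: (k, number of documents)

# One centroid per class: the mean latent vector of its training documents.
centroids = {}
for label in sorted(set(train_labels)):
    columns = [j for j, l in enumerate(train_labels) if l == label]
    centroids[label] = latent_docs[:, columns].mean(axis=1)

print(classify("contract breach claim", vocab, basis, centroids))
print(classify("agenda for the site meeting", vocab, basis, centroids))
```

In practice the count matrix would usually be weighted (for example with TF-IDF) and `k` tuned on held-out documents; both are left out here to keep the sketch short.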
Type: normal paper
Year of publication: 2010
Keywords: Knowledge Management, Latent Semantic Analysis, Machine Learning, Document Classification
Series: w78:2010
ISSN: 2706-6568
Download paper: /pdfs/w78-2010-25.pdf
Citation: Tarek Mahfouz, Amr Kandil (2010). Unstructured Construction Document Classification Model Through Latent Semantic Analysis (LSA). CIB W78 2010 - Applications of IT in the AEC Industry (ISSN: 2706-6568), http://itc.scix.net/paper/w78-2010-25