N-GRAM ANALYSIS IN THE ENGINEERING DOMAIN
Year: 2011
Editor: Culley, S.J.; Hicks, B.J.; McAloone, T.C.; Howard, T.J. & Chen, W.
Author: Leary, Martin; Pearson, Geoff; Mazur, Maciej; Burvill, Colin Reginald; Subic, Aleksandar
Series: ICED
Section: Design Information and Knowledge Management
Page(s): 414-423
Abstract
New technologies have enabled the digitization and linguistic analysis of a vast number of books published throughout history. These technologies have enabled a step-change in the opportunity to understand the interests of authors and, by doing so, to provide insight into the aspirations of society throughout published human history. Such analysis offers unprecedented opportunities; however, there are numerous analysis pitfalls due to fundamental technology limitations and misunderstanding of the analysis outcomes. This work defines the technologies that have enabled this opportunity and, in doing so, identifies potential risks of erroneous outcomes. A broad-scope analysis of the engineering design domain is presented for the first time.
Keywords: OPTICAL CHARACTER RECOGNITION; OCR; N-GRAM; NGRAM; LINGUISTIC ANALYSIS