N-GRAM ANALYSIS IN THE ENGINEERING DOMAIN

DS 68-6: Proceedings of the 18th International Conference on Engineering Design (ICED 11), Impacting Society through Engineering Design, Vol. 6: Design Information and Knowledge, Lyngby/Copenhagen, Denmark, 15.-19.08.2011

Year: 2011
Editor: Culley, S.J.; Hicks, B.J.; McAloone, T.C.; Howard, T.J. & Chen, W.
Author: Leary, Martin; Pearson, Geoff; Mazur, Maciej; Burvill, Colin Reginald; Subic, Aleksandar
Series: ICED
Section: Design Information and Knowledge Management
Page(s): 414-423

Abstract

New technologies have enabled the digitization and linguistic analysis of a vast number of books published throughout history. These technologies have brought a step change in the opportunities to understand the interests of authors and, by doing so, to gain insight into the aspirations of society throughout published human history. Such analysis provides an unprecedented opportunity; however, there are numerous analytical pitfalls arising from fundamental technology limitations and from misunderstanding of the analysis outcomes. This work defines the technologies that have enabled this opportunity and, in doing so, identifies potential risks of erroneous outcomes. A broad-scope analysis of the engineering design domain is presented for the first time.

Keywords: OPTICAL CHARACTER RECOGNITION; OCR; N-GRAM; NGRAM; LINGUISTIC ANALYSIS
