Generating and Controlling an Interlinking Network of Technical Terms to Enhance Data Utilization

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2018, v.35 no.1, pp.157-182
https://doi.org/10.3743/KOSIM.2018.35.1.157

Abstract

As data management and processing techniques have advanced rapidly in the era of big data, many companies and researchers have become interested in long tail data that were previously ignored. This study proposes methods for generating and controlling a network of technical terms, based on text mining techniques, to enhance the utilization of data that fall in the long tail of the distribution. In particular, an edit distance technique from text mining provides an efficient way to automatically create an interlinking network of technical terms in the scholarly field. We also used a linked open data (LOD) system to gather experimental data, and we propose effective methods for using LOD data together with an algorithm for recognizing term patterns. Finally, a performance evaluation of the network of technical terms showed that the proposed methods were useful for increasing the rate of data utilization.
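
The abstract does not give implementation details, so the following is only a minimal sketch of how an edit distance measure might be used to link variant spellings of technical terms. It uses the standard Levenshtein distance; the function names `edit_distance` and `link_terms`, the threshold `max_distance=2`, the lowercasing step, and the sample terms are all assumptions made for illustration, not the authors' method or data.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def link_terms(terms, max_distance=2):
    """Build an undirected adjacency map that links every pair of terms
    whose lowercased forms are within max_distance edits (assumed threshold)."""
    network = {term: set() for term in terms}
    for left, right in combinations(terms, 2):
        if edit_distance(left.lower(), right.lower()) <= max_distance:
            network[left].add(right)
            network[right].add(left)
    return network


# Hypothetical sample terms, chosen only to illustrate the idea.
terms = [
    "X-ray diffraction",
    "Xray diffraction",
    "x-ray diffractions",
    "liquid chromatography",
]
network = link_terms(terms)
for term, neighbours in network.items():
    print(term, "->", sorted(neighbours))
```

The pairwise comparison is quadratic in the number of terms, which is acceptable for a small illustration; a large vocabulary would likely need blocking or indexing before the distance is computed.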

Keywords
long tail theory, linked open data, language resources, DBLP, edit distance algorithm
