Leveraging Publicly Available Information to Analyze Information Operations
DOI: https://doi.org/10.37266/ISER.2021v9i2.pp142-148
Keywords: Information Operations, Publicly Available Information, Natural Language Processing, Web Scraping
Abstract
Traditionally, a significant part of assessing information operations (IO) has relied on subject matter experts’ time-intensive study of publicly available information (PAI). Now that the Internet makes massive amounts of PAI available, analysts face the challenge of using it effectively to draw meaningful conclusions. This paper presents an automated method for collecting and analyzing large amounts of PAI from China that could better inform assessments of IO campaigns. We implement a multi-model system that acquires data via web scraping and analyzes it using natural language processing (NLP) techniques, with a focus on topic modeling and sentiment analysis. In a case study on China’s current relationship with Taiwan, we compare the results to validated research by a subject matter expert; the comparison shows that our methodology is valuable for drawing general conclusions and pinpointing important dialogue within a massive amount of PAI.
References
Barde, B., & Bainwad, A. (2017). An Overview of Topic Modeling and Tools. 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), (pp. 745-750). Madurai, India.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
eBay, Inc. v. Bidder's Edge, Inc., C-99-21200RMW (US District Court for the Northern District of California May 24, 2000).
Global Times. (2021, February 5). PLA expels trespassing US warship from Xisha Islands. Retrieved from https://www.globaltimes.cn/page/202102/1215073.shtml
Gupta, R., Besacier, L., Dymetman, M., & Galle, M. (2019). Character-based NMT with Transformer. arXiv:1911.04997.
Holm, R. R. (2017, March). Natural Language Processing of Online Propaganda as a Means of Passively Monitoring an Adversarial Ideology [Master's thesis, Naval Postgraduate School]. Retrieved from https://apps.dtic.mil/sti/pdfs/AD1045878.pdf
Hutchins, J. W., & Somers, H. L. (1992). An Introduction to Machine Translation. London: Academic Press.
Information Operations. (2012). In Joint Publication 3-13 (p. 87). Washington, D.C.
Jones, T., & Doane, W. (2019). textmineR. Retrieved from https://www.rtextminer.com/
Mastro, O. S. (2021). The Precarious State of Cross-Strait Deterrence. Statement before the U.S.-China Economic and Security Review Commission on “Deterring PRC Aggression Toward Taiwan.”
Mohammad, S. M., & Turney, P. D. (2010). Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion Lexicon. Los Angeles: Association for Computational Linguistics.
Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a Word-Emotion Association Lexicon. arXiv:1308.6297.
Muthukadan, B. (2018). Selenium with Python. Retrieved from https://selenium-python.readthedocs.io/
Rees, B. (2018). Dismantling Contemporary Military Thinking and Reconstructing Patterns of Information: Thinking Deeper About Future War and Warfighting. Small Wars Journal. Retrieved from smallwarsjournal.com/jrnl/art/dismantling-contemporary-military-thinking-and-reconstructing-patterns-information
Richardson, L. (2020). Beautiful Soup Documentation. Retrieved from https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Rinker, T. (2019). sentimentr. Retrieved from https://github.com/trinker/sentimentr
Shumei, L., & Lin, W. (2021, February 18). Taiwan island's intensive military exercises a political show to cover its weakness: analysts. Retrieved from Global Times: https://www.globaltimes.cn/page/202102/1215898.shtml
Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces (pp. 63-70). Baltimore: Association for Computational Linguistics.
Smith, S. T., Kao, E. K., Mackin, E. D., Shah, D. C., Simek, O., & Rubin, D. B. (2021). Automatic detection of influential actors in disinformation networks. Proceedings of the National Academy of Sciences, 118, e2011216118. DOI: 10.1073/pnas.2011216118.
Xuanzun, L. (2020, December 22). PLA expels US warship trespassing South China Sea. Retrieved from Global Times: https://www.globaltimes.cn/page/202012/1210657.shtml
Xuanzun, L. (2021, January 27). Taiwan's display of new missile 'wrongly boosts courage of secessionists'. Retrieved from Global Times: https://www.globaltimes.cn/page/202101/1214177.shtml
Yin, J., & Wang, J. (2014). A Dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.