Geospatial Big Data Analytics for Quality Control of Surveys
Main Article Content
Geospatial big data analytics allows survey quality control analysts to draw important conclusions about survey data quality that otherwise would take excessive time and resources. In this work, we explored two algorithmic methods that can help ensure reliability of survey interviews by detecting geospatial outliers. Focusing on geospatial data collected from surveys, we implemented outlier detection techniques with two different distance metrics to identify statistical anomalies in real-world datasets that may have qualitative interpretations. We found that one algorithm, which considers the local distribution of points in a dataset, identifies a different set of outliers when compared to another method, which considers the global distribution of points. Since there was a small overlap (10-19%) of flagged points between the two algorithms implemented, it may be helpful for analysts to focus on the fewer “outlier” points that are flagged by both methods rather than all the “outlier” points that are flagged by each algorithm. Finally, analysts should consider the computational costs, as the algorithms differ significantly.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
The copyediting stage is intended to improve the flow, clarity, grammar, wording, and formatting of the article. It represents the last chance for the author to make any substantial changes to the text because the next stage is restricted to typos and formatting corrections. The file to be copyedited is in Word or .rtf format and therefore can easily be edited as a word processing document. The set of instructions displayed here proposes two approaches to copyediting. One is based on Microsoft Word's Track Changes feature and requires that the copy editor, editor, and author have access to this program. A second system, which is software independent, has been borrowed, with permission, from the Harvard Educational Review. The journal editor is in a position to modify these instructions, so suggestions can be made to improve the process for this journal.
Ben-Gal, I. (2005). Outlier Detection. In O. Maimon, & L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA, 2005; 131-146. Retrieved from https://doi.org/10.1007/0-387-25465-X_7.
Brassil, C. (2021a). Quality Control Overview. Presentation on 21 August 2021 from D3 Systems.
Brassil, C. (2021b). GPS Training III – Basic Analysis. Presentation on 21 August 2021 from D3 Systems.
Breunig, M., Kriegel, H. P., Ng, R., and Sander, J. (2000). LOF: Identifying Density-Based Local Outliers. Proceedings of the ACM SIGMOD 2000 International Conference on Management of Data, 2000; 1-12..
Singh, A., and Lalitha, S. (2018). A novel spatial outlier detection technique. Communications in Statistics - Theory and Methods, 2018; 47, 247-257.