GEOG 323 Reflection 4: Spatial Twitter Analysis

For our next unit, we will be working on a reproduction of Wang, Ye, and Tsou's (2016) paper "Spatial, temporal, and content analysis of Twitter for wildfire hazards." Our first tasks were to read the paper, paying special attention to the analytical techniques the authors used and how they conveyed and visualized their results, and to analyze the reproducibility and replicability of the study.

Wang et al. divided their methodology into three main parts: kernel density estimation (KDE) to analyze the spatial distribution of tweets about wildfires, with dual KDE to normalize the number of tweets by population; text mining to pick out common topics in these tweets; and social network analysis to "detect the opinion leaders in wildfire hazards" (p. 527). The authors present their findings on the temporal distribution of wildfire-related tweets and tweets referring to Bernardo and San Marcos in bar charts (Figures 1-3, pp. 529-30). The KDE technique produced a map of "fire" and "wildfire" tweet clusters in the study region (Figure 3, p. 531). Dual KDE produced heat maps representing the density of tweets relative to population in five classes from "very low" to "very high" (Figures 4 and 5, pp. 532 and 533). The authors give the results of their text mining in a term frequency bar chart (Figure 7, p. 535) and a table of term clusters (Table 3, p. 535). Finally, the results of the social network analysis appear in graphs of the cumulative indegree and outdegree distributions of the retweet network (Figure 8, p. 535, and Figure 9, p. 536, respectively).
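
To make the dual KDE idea concrete for myself, here is a minimal sketch in plain Python. It assumes that dual KDE means dividing a tweet density surface by a population density surface, both built with a Gaussian kernel. The function names, the bandwidth, and the toy coordinates are my own illustrative choices, not details taken from Wang et al., whose exact kernel and settings I cannot confirm.

```python
import math

def gaussian_kde_2d(points, weights, grid, bandwidth):
    """Weighted Gaussian kernel density evaluated at each grid location."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = sum(weights)
    densities = []
    for gx, gy in grid:
        acc = 0.0
        for (px, py), w in zip(points, weights):
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            acc += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * acc / total)
    return densities

def dual_kde(tweet_pts, pop_pts, pop_counts, grid, bandwidth, eps=1e-12):
    """Ratio of tweet density to population density (assumed form of dual KDE)."""
    tweet_density = gaussian_kde_2d(tweet_pts, [1.0] * len(tweet_pts), grid, bandwidth)
    pop_density = gaussian_kde_2d(pop_pts, pop_counts, grid, bandwidth)
    # eps guards against division by zero where no population is nearby
    return [t / (p + eps) for t, p in zip(tweet_density, pop_density)]

# Toy data (hypothetical): two equally populated places, but most tweets near the first
tweet_pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
pop_pts = [(0.0, 0.0), (5.0, 5.0)]
pop_counts = [1000.0, 1000.0]
grid = [(0.0, 0.0), (5.0, 5.0)]

ratios = dual_kde(tweet_pts, pop_pts, pop_counts, grid, bandwidth=1.0)
print(ratios)
```

Because both toy places have the same population, the ratio comes out higher at the first grid location, where the tweets cluster. A plain tweet-count map would not separate "many tweets" from "many people," which is the reason for normalizing.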

I consider this paper replicable. The National Academies of Sciences, Engineering, and Medicine define replicability as "obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data" (National Academies of Sciences, Engineering, and Medicine 2019, p. 46). Although the authors do not share their code, they give a thorough phase-by-phase description of their methods and explain the rationale behind many of their decisions. Other researchers could therefore adapt these methods with relative ease to other natural disasters in other places, potentially even using other social media platforms' APIs, to determine whether the patterns identified in this study hold in other situations or with other datasets. By the same token, this thorough documentation makes a successful reproduction of the paper a possibility. According to the National Academies, reproducibility means "obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis" (National Academies of Sciences, Engineering, and Medicine 2019, p. 46). Using the Twitter API, other researchers could plausibly follow Wang et al.'s steps to gather and process wildfire-related tweets from the same date range and the same locations as the original researchers. However, the type of Twitter API query Wang et al. conducted is limited to 1% of all tweets (Wang et al. 2016, p. 537), so the reproducing researchers might end up with a different sample of tweets, and a different 1% sample could yield different results. As the authors point out, "[T]he 1% sample limitation may lead to question that whether the sampled data are a valid representation of the overall wildfire Twitter activities" [sic] (p. 537). Moreover, without access to Wang et al.'s code, it is difficult to tell exactly how the authors performed each computation; any reproduction's code would probably differ from the original authors' code in at least some small ways. Thus, while the basic information necessary for a reproduction is present, it is difficult to know whether a reproduction would truly succeed.
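
The sampling concern can be illustrated with a small simulation. This sketch assumes, purely for illustration, that each query returns an independent, uniformly random 1% of all tweets; I do not know that the Twitter API actually samples this way. The tweet IDs are synthetic stand-ins and `one_percent_sample` is a hypothetical helper.

```python
import random

def one_percent_sample(tweet_ids, seed):
    """Draw a uniform random 1% sample (assumed sampling model, not the API's)."""
    rng = random.Random(seed)
    k = max(1, len(tweet_ids) // 100)
    return set(rng.sample(tweet_ids, k))

# Synthetic stand-ins for the full set of tweets in the study period
all_ids = list(range(100_000))

original = one_percent_sample(all_ids, seed=1)      # the original researchers' draw
reproduction = one_percent_sample(all_ids, seed=2)  # a later, independent draw

# Share of the original sample that also appears in the reproduction's sample
overlap = len(original & reproduction) / len(original)
print(f"share of tweets in both samples: {overlap:.3f}")
```

Under this assumed model, two independent draws would be expected to share only about 1% of their tweets, so a reproduction would be working from almost entirely different data even if it followed every documented step.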

References:

National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and Replicability in Science. https://doi.org/10.17226/25303

Wang, Z., Ye, X., & Tsou, M.-H. (2016). Spatial, temporal, and content analysis of Twitter for wildfire hazards. Natural Hazards, 83(1), 523–540. https://doi.org/10.1007/s11069-016-2329-6

