Five Steps To Using Geospatial Information Systems for HIV Research
Step 1: Developing a Geospatial Research Question.
GIS is suited to addressing a wide range of research questions (1). For example, with respect to antiretroviral therapy (ART), studies have examined whether ART uptake and adherence varied across North and South America (2), whether neighborhood factors affected CD4 cell count among women living with HIV (3), and whether structural barriers inhibited access to care among people living with HIV (PLWH) (4). Although GIS is not appropriate for every question, for both practical and theoretical reasons, the core requirement of any GIS research question is that it include some factor that varies spatially. By leveraging GIS, researchers can conduct analyses that clarify the environmental factors that shape health. This is especially useful when detailed information cannot be obtained at the individual level, as is the case with many large HIV cohorts composed of surveillance data. Linking data through geospatial analyses can approximate the environmental influences to which individuals are exposed. This offers a theoretically informed approach to understanding the decisions and behaviors of individuals and groups, one supported by decades of psychological and sociological theory (5,6).
Step 2: Collecting Exposure and Outcome Data.
Once a research question has been articulated, appropriate data must be identified and secured for the study. As in traditional epidemiological studies, both exposure and outcome data are required. Exposure data should characterize the spatial variation in an explanatory factor. Municipal census data are among the most common sources of exposure data. However, exposure data can also be collected from participants in the form of neighborhood assessments (3) or drawn from open-source data catalogs, in which the data have already been collected for some other purpose (7). In particular, economic, administrative, and other data collected by governmental and non-governmental agencies can be adapted for use in HIV research, widening the scope of potential explanatory factors that may meaningfully affect the environments being assessed. Outcome variables can likewise be collected from both primary and secondary sources, provided that the data are spatially representative and/or unbiased. In practice, surveillance data and data from large cohorts serve as the primary sources of outcome variables.
Step 3: Geocoding.
In addition to exposure and outcome data, GIS studies require a third data type, geospatial data, which are needed to link exposure variables (e.g., median household income, median age, racial composition) to individual-level outcome data (e.g., surveillance data, survey responses). For example, Wilson et al. used reported address data (i.e., ZIP codes) and census population estimates for each ZIP code to classify participant residence as rural (<10,000), peri-urban (10,000–100,000), or urban (>100,000). Similarly, Ohl et al. used ZIP codes and Rural-Urban Commuting Area (RUCA) codes to create Urban-Metropolitan-Small Town categories. Although simple dichotomies (e.g., urban vs. rural) are sometimes used, studies can also draw on more complex geographical frameworks, usually built from postal code and address data aggregated by census unit (e.g., blocks, tracts), geopolitical unit (e.g., state, province, city, ZIP Code Tabulation Area [ZCTA]), or clinical catchment area (e.g., health authority, hospital catchment area).
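The population-based classification described above can be sketched in a few lines of Python. This is only an illustration of the linkage logic, not the procedure used in any cited study: the ZIP codes, population figures, and participant records are invented, the function names are hypothetical, and treating exactly 10,000 and exactly 100,000 residents as peri-urban is an assumption about how the category boundaries are handled.

```python
def classify_residence(population: int) -> str:
    """Classify an area by population size: rural (<10,000),
    peri-urban (10,000 to 100,000), or urban (>100,000)."""
    if population < 10_000:
        return "rural"
    if population <= 100_000:
        return "peri-urban"
    return "urban"


# Hypothetical census lookup: ZIP code -> population estimate (invented values).
zip_population = {"00001": 4_200, "00002": 56_000, "00003": 310_000}

# Hypothetical participant records carrying a reported ZIP code.
participants = [
    {"id": "P1", "zip": "00001"},
    {"id": "P2", "zip": "00003"},
    {"id": "P3", "zip": "99999"},  # ZIP code absent from the census lookup
]


def link_participants(participants, zip_population):
    """Attach a residence category to each participant via the ZIP code.
    Participants whose ZIP code cannot be matched get None, not a guess."""
    linked = []
    for person in participants:
        population = zip_population.get(person["zip"])
        category = None if population is None else classify_residence(population)
        linked.append({**person, "residence": category})
    return linked


linked = link_participants(participants, zip_population)
for row in linked:
    print(row["id"], row["zip"], row["residence"])
```

Leaving unmatched records as missing, rather than assigning a default category, keeps linkage failures visible so they can be reported alongside the results.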
GIS linkages are also used to aggregate participant characteristics by geographic area and to determine correlations between spatial characteristics. This is done when access to person-level data is limited or when neighborhood-level conditions might serve as a better measure for the outcomes of interest (8–11). For example, Das et al. and others have summed and averaged individual viral load data at the “neighborhood” level because this may operationalize the relevant risk better than other measures can. Though imperfect, this approach allows public health leaders to respond to areas with high community viral loads, which present a higher risk for residents of those areas. By matching these aggregated data to census data, Das et al. were also able to show that African American and low-income communities had higher average viral loads, a trend that has been reported repeatedly (11–14).
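As a minimal sketch of this kind of aggregation, the Python below sums and averages individual viral loads within each area. The neighborhood labels and viral load values are invented for illustration, and the function name is hypothetical; published community viral load measures may apply further rules (for example, how undetectable results are handled) that are not shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical person-level records: (neighborhood label, viral load in copies/mL).
records = [
    ("A", 200.0),
    ("A", 50_000.0),
    ("B", 40.0),
    ("B", 60.0),
    ("C", 120_000.0),
]


def community_viral_load(records):
    """Group viral loads by neighborhood and return the count, total,
    and mean for each area."""
    by_area = defaultdict(list)
    for area, viral_load in records:
        by_area[area].append(viral_load)
    return {
        area: {"n": len(loads), "total": sum(loads), "mean": mean(loads)}
        for area, loads in by_area.items()
    }


summary = community_viral_load(records)
for area, stats in sorted(summary.items()):
    print(area, stats["n"], stats["total"], stats["mean"])
```

Reporting the count next to each mean makes it easy to see which area-level averages rest on very few people, as area C does in this toy example.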
Step 4: Statistical Analysis.
While aggregation and linkage of data allow researchers to identify correlations between risk and outcome factors, a number of spatial analytic techniques are also available. Table 1 provides a summary of these statistical approaches. The empirical foundations of many GIS studies remain rooted in traditional statistical approaches, in which variables are extracted from GIS and used as explanatory variables in conventional analyses. These include univariable statistics (e.g., chi-square tests, Wilcoxon rank sum tests, Kruskal-Wallis tests, Fisher’s exact tests, Student’s t-tests), multivariable regression (e.g., logistic, linear, and Poisson), survival analysis (e.g., Kaplan-Meier survival analysis, Weibull survival modelling, Cox proportional hazards modelling), and analysis of variance (ANOVA). This makes GIS accessible to most trained epidemiologists and statisticians (15). It should be noted, however, that when the spatial dependence of observations is not accounted for, standard errors may be underestimated, resulting in an inflated type I error rate.
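One common way to check for spatial dependence before relying on a traditional model is Moran's I (19), which is chosen here as an illustration and is not a method prescribed by the text above. The sketch below computes the statistic from scratch for a toy set of areas arranged in a line. The values, the binary adjacency weights, and the function names are all assumptions made for the example, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I for area-level values and a square spatial
    weights matrix, where weights[i][j] > 0 marks areas i and j as neighbors."""
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    total_weight = sum(sum(row) for row in weights)
    sum_squares = sum(d * d for d in deviations)
    if total_weight == 0 or sum_squares == 0:
        raise ValueError("Moran's I is undefined without neighbors or variation")
    # Weighted sum of cross-products of deviations over all ordered pairs.
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * (cross_products / sum_squares)


def chain_adjacency(n):
    """Binary weights for n areas in a row; each area neighbors the next."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


weights = chain_adjacency(4)
smooth_gradient = [1.0, 2.0, 3.0, 4.0]  # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]      # dissimilar values sit next to each other

print(morans_i(smooth_gradient, weights))  # positive: spatial clustering
print(morans_i(alternating, weights))      # negative: spatial dispersion
```

A value near zero suggests little spatial autocorrelation under the chosen weights, while a clearly positive value is a signal that the independence assumption of a traditional test may not hold.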
Step 5: Reporting Results.
Commercial (e.g., ArcGIS) and open-source (e.g., QGIS and R) GIS software also allow researchers to generate maps that can inform our understanding of the geography of health outcomes. These maps can help communicate important health messages or phenomena that might otherwise go unnoticed or are not easily articulated in words (2). Online communities and resources, such as those available through github.com or stackexchange.com, can be used to learn these tools and to guide GIS decisions and analyses. In general, however, GIS reporting relies heavily on a researcher’s knowledge of the study population, geography, and demography. Based on this knowledge, geographic information can be conveyed using dot maps, choropleths, kernel density maps, and cartographic maps. Mapping health data introduces a new level of ethical concern (16), especially regarding participant privacy. Consistent with the caution generally employed when working with PLWH and municipal surveillance data, most studies are extremely careful to protect patient privacy. Generally, authors do so by avoiding dot maps, opting instead for choropleths, kernel density maps, or other aggregated mapping techniques. When dot maps were used, they either represented providers and services rather than PLWH or were sufficiently zoomed out to make it difficult to ascertain the true location of participants. In cases where the population characteristics of specific geospatial units might allow participants to be identified by deduction, some authors omitted these units from analyses, analogous to removing an outlier in traditional statistical studies. Alternatively, many geospatial data need not be mapped at all and can instead be categorized by region and displayed using traditional charts (e.g., bar charts, histograms, scatterplots).
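The practice of omitting geospatial units in which participants might be identifiable can be sketched as a simple suppression step applied before any map or chart is drawn. The example below is an assumption-laden illustration: the threshold of five participants, the unit labels, the counts, and the function name are all hypothetical, and the appropriate rule for a given study depends on its data-sharing agreements and ethics requirements.

```python
MIN_CELL_SIZE = 5  # assumed threshold; not taken from any cited study


def suppress_small_cells(counts, min_cell_size=MIN_CELL_SIZE):
    """Split area-level counts into values safe to display and a list of
    suppressed units whose counts fall below the threshold."""
    displayed = {}
    suppressed = []
    for unit, count in counts.items():
        if count < min_cell_size:
            suppressed.append(unit)
        else:
            displayed[unit] = count
    return displayed, sorted(suppressed)


# Hypothetical participant counts per census tract.
tract_counts = {"tract_01": 42, "tract_02": 3, "tract_03": 17, "tract_04": 1}

displayed, suppressed = suppress_small_cells(tract_counts)
print(displayed)
print(suppressed)
```

Returning the list of suppressed units, rather than silently dropping them, lets the number and location of omitted areas be reported in the methods.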
When reporting the results of geospatial analyses, additional care must be taken to ensure that study methods are reproducible. Unfortunately, to our knowledge, no reporting guidelines comparable to PRISMA (17) or STROBE (18) have yet been developed or widely adopted for GIS health research. In the absence of such guidelines, researchers should provide an overview of the entire region included in their analysis and specify which characteristics of specific sub-geographies might relate to the exposure and outcome variables. Only with detailed background information on the study setting can the lessons of one study be correctly applied to other settings, as the conditions present in one region may not exist elsewhere. Careful reporting of explanatory, outcome, and geospatial data sources is necessary to convey fully what was done and how rigorous the findings are. Further, because GIS statistical approaches are not as widely understood as traditional ones, detailed information regarding these procedures should be provided in-text. Regarding maps, special care should be taken to report how they were constructed: What do key features represent? Where do the data originate? How were color and shading choices justified? To which settings, or under what conditions, can findings be extrapolated, if at all? In sum, geospatial analyses may require additional and more detailed reporting than other epidemiological studies.
1. Kandwal R, Garg PK, Garg RD. Health GIS and HIV/AIDS studies: Perspective and retrospective. J Biomed Inform. 2009 Aug;42(4):748–55.
2. Althoff KN, Rebeiro PF, Hanna DB, Padgett D, Horberg MA, Grinsztejn B, et al. A picture is worth a thousand words: maps of HIV indicators to inform research, programs, and policy from NA-ACCORD and CCASAnet clinical cohorts. J Int AIDS Soc. 2016;19(1):20707.
3. Burke-Miller JK, Weber K, Cohn SE, Hershow RC, Sha BE, French AL, et al. Neighborhood community characteristics associated with HIV disease outcomes in a cohort of urban women living with HIV. AIDS Care. 2016 Oct 2;28(10):1274–9.
4. Akullian AN, Mukose A, Levine GA, Babigumira JB. People living with HIV travel farther to access healthcare: a population-based geographic analysis from rural Uganda. J Int AIDS Soc [Internet]. 2016 Feb 10 [cited 2016 Oct 17];19(1). Available from: http://www.jiasociety.org/index.php/jias/article/view/20171
5. Sack RD. The Power of Place and Space. Geogr Rev. 1993;83(3):326–9.
6. Latkin CA, Curry AD, Hua W, Davey MA. Direct and Indirect Associations of Neighborhood Disorder with Drug Use and High-Risk Sexual Partners. Am J Prev Med. 2007 Jun;32(6 Suppl):S234–41.
7. Cooke GS, Tanser FC, Bärnighausen TW, Newell M-L. Population uptake of antiretroviral treatment through primary care in rural South Africa. BMC Public Health. 2010;10:585.
8. Rose G. Sick Individuals and Sick Populations. Int J Epidemiol. 1985 Mar 1;14(1):32–8.
9. Das M, Chu PL, Santos G-M, Scheer S, Vittinghoff E, McFarland W, et al. Decreases in community viral load are accompanied by reductions in new HIV infections in San Francisco. PloS One. 2010;5(6):e11068.
10. Castel AD, Befus M, Willis S, Griffin A, West T, Hader S, et al. Use of the community viral load as a population-based biomarker of HIV burden. AIDS Lond Engl. 2012 Jan 28;26(3):345–53.
11. Sayles JN, Rurangirwa J, Kim M, Kinsler J, Oruga R, Janson M. Operationalizing treatment as prevention in Los Angeles County: antiretroviral therapy use and factors associated with unsuppressed viral load in the Ryan White system of care. AIDS Patient Care STDs. 2012 Aug;26(8):463–70.
12. Arnold M, Hsu L, Pipkin S, McFarland W, Rutherford GW. Race, place and AIDS: The role of socioeconomic context on racial disparities in treatment and survival in San Francisco. Soc Sci Med 1982. 2009 Jul;69(1):121–8.
13. Hanna DB, Buchacz K, Gebo KA, Hessol NA, Horberg MA, Jacobson LP, et al. Trends and Disparities in Antiretroviral Therapy Initiation and Virologic Suppression Among Newly Treatment-Eligible HIV-Infected Individuals in North America, 2001–2009. Clin Infect Dis. 2013 Apr 15;56(8):1174–82.
14. Shacham E, Lian M, Önen N, Donovan M, Overton E. Are neighborhood conditions associated with HIV management? HIV Med. 2013 Nov 1;14(10):624–32.
15. Gesler W. The uses of spatial analysis in medical geography: a review. Soc Sci Med 1982. 1986;23(10):963–73.
16. Gagnon M, Guta A. Mapping HIV community viral load: space, power and the government of bodies. Crit Public Health. 2012 Dec;22(4):471–83.
17. Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4:1.
18. Elm E von, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. Strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007 Oct 18;335(7624):806–8.
19. Moran PAP. Notes on Continuous Stochastic Phenomena. Biometrika. 1950;37(1/2):17–23.
20. Geary RC. The Contiguity Ratio and Statistical Mapping. Inc Stat. 1954;5(3):115–46.
21. Cuzick J, Edwards R. Spatial Clustering for Inhomogeneous Populations. J R Stat Soc Ser B Methodol. 1990;52(1):73–104.
22. Ripley BD. The Second-Order Analysis of Stationary Point Processes. J Appl Probab. 1976;13(2):255–66.
23. Kulldorff M. A spatial scan statistic. Commun Stat - Theory Methods. 1997 Jan 1;26(6):1481–96.
24. Knox EG, Bartlett MS. The Detection of Space-Time Interactions. J R Stat Soc Ser C Appl Stat. 1964;13(1):25–30.
25. Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Res. 1967 Feb;27(2):209–20.
26. Jacquez GM. A k nearest neighbour test for space-time interaction. Stat Med. 1996 Sep 15;15(17–18):1935–49.
27. Anselin L. Local Indicators of Spatial Association—LISA. Geogr Anal. 1995 Apr 1;27(2):93–115.
28. Getis A, Ord JK. The Analysis of Spatial Association by Use of Distance Statistics. Geogr Anal. 1992 Jul 1;24(3):189–206.
29. Apparicio P, Abdelmajid M, Riva M, Shearmur R. Comparing alternative approaches to measuring the geographical accessibility of urban health services: Distance types and aggregation-error issues. Int J Health Geogr. 2008;7:7.
30. Delamater PL, Messina JP, Shortridge AM, Grady SC. Measuring geographic access to health care: raster and network-based methods. Int J Health Geogr. 2012;11:15.
31. Shepard D. A two-dimensional interpolation function for irregularly-spaced data. In ACM Press; 1968 [cited 2017 Mar 21]. p. 517–24. Available from: http://portal.acm.org/citation.cfm?doid=800186.810616
32. Oliver MA, Webster R. Kriging: a method of interpolation for geographical information systems. Int J Geogr Inf Syst. 1990 Jul 1;4(3):313–32.