Real-Time Migration Tracking to Puerto Rico after Natural Hazard Events

Alejandro Arrieta
Florida International University

Shu-Ching Chen
Florida International University

Juan Pablo Sarmiento
Florida International University

Richard Olson
Florida International University

Publication Date: 2021

Executive Summary


Populations frequently move across national borders in response to natural hazard events. This is particularly relevant to the Caribbean, where almost 4% of the total population has migrated within the region because of the effects of hurricanes and subsequent flooding. For Puerto Rico, migration driven by natural hazard events is doubly important because the island is not only a “receiver” of immigrants, or host community, but also a “sender,” or source, of migrants. In its “sending” role, for example, an estimated 3% to 6% of Puerto Ricans left for the mainland United States in the aftermath of Hurricane Maria in 2017. In its “receiving” role, an estimated 66,000 people from other Caribbean islands (primarily the Dominican Republic, Cuba, and Haiti) migrated to Puerto Rico. These migrants have sought shelter, either permanent or temporary, and pose important public health challenges for Puerto Rico. The unpredictable number of immigrants arriving in Puerto Rico and the uncertainty about their health conditions upon arrival can easily overload the capacity and jeopardize the effectiveness of the island’s public and private health organizations. One underlying problem is the lack of real-time post-disaster data to track international migration in the Caribbean. Improving migration data would help health organizations make better decisions as they prepare for and respond to natural hazard events.

Research Design

In this study we developed an innovative data collection and estimation method to track international migration flows using Internet-derived “big data” and tested the method’s feasibility and validity. Our method builds on recent evidence that immigrating populations remain culturally connected to their country of origin, mostly through Internet activities. Google searches made in the host area for information or news about the source country can therefore be used to track migration. We developed a three-step approach. First, we identified a set of search keywords specific to the source country affected by a natural hazard event. Second, we obtained weekly search trends for all source-country-specific keywords that had been searched in the host geographic area. Finally, we used a Dynamic Factor Model (Stock & Watson, 2011) to estimate a common factor that explains variations in all search trends. In this study we limited our scope to using this dynamic factor as a proxy for estimating migration flows from the Dominican Republic to Puerto Rico.

To test the feasibility of our approach and validate our method, we compared our estimated migration flows to annual data collected by the U.S. Census Bureau for the years 2015 to 2018 (U.S. Census Bureau, 2019). The two series showed a 95.5% correlation. We also checked for abrupt changes in our estimated weekly migration flows factor after a well-documented disaster event: Hurricane Matthew and the subsequent flooding that hit the Dominican Republic between October and November 2016. Our preliminary results showed a peak in migration flows from the Dominican Republic to Puerto Rico four months after Hurricane Matthew. The factor also showed a sharp fall in migration between September and November 2017, when Hurricane Maria devastated Puerto Rico.


Our model offers the possibility of tracking immigration flows to Puerto Rico in real time. The tool has the potential to help local health departments and private organizations understand migrant inflows and better allocate public health resources to protect both vulnerable migrants and Puerto Rican residents.

Introduction and Literature Review

Weather-related and geophysical disasters accounted for the displacement of more than 30 million people worldwide in 2020 (IMDC, 2021). The health and social consequences of such population displacements are considerable (Thomas & Thomas, 2004). Poor health conditions at the point of origin, during transit, and at the destination all make immigrants more vulnerable and represent a major overall social determinant of health (Davies et al., 2009). Immigration is associated with worsening chronic physical health (Mensah et al., 2005), infectious disease problems (Thomas & Thomas, 2004), and mental health issues (Siriwardhana & Stewart, 2013). Many studies have documented the risks, traumas, and problems that migrants face at all stages of their journeys including, but not limited to, human rights violations, torture, linguistic and cultural prejudices, racial discrimination, labor exploitation, high-risk environmental exposure in home and work settings, as well as a lack of food, water, and medication (Wickramage et al., 2018). Additionally, healthcare systems are often unprepared to provide adequate access to treatments, prescriptions, and procedures, which are usually interrupted or discontinued when an immigrant is in transit (Baum et al., 2019). Government labor and immigration policies combined with cultural attitudes and prejudice add more stress to migrants’ health conditions (Castañeda et al., 2015). Overall, the combination of these problems has long-lasting effects on migrant health (Greenough et al., 2008; Jhung et al., 2007; Mills et al., 2013; Self-Brown et al., 2014) and is a major concern from a public health perspective.

The public health consequences of migration are particularly relevant for the Caribbean, where intra-regional migration accounts for nearly 60% of total migration (United Nations, 2019) and disasters are a major cause of displacement (Beaton et al., 2017). A person living on a Caribbean island is three times more likely to be displaced by a disaster than someone living elsewhere (Ginnetti et al., 2015). The impact of Hurricane Maria in Puerto Rico is a good example of disaster-induced migration. Around 100,000 to 200,000 Puerto Ricans (3% to 6% of the total population) left the island for the mainland U.S. immediately after Maria (Alexander et al., 2019; Meléndez & Hinojosa, 2017; Rayer, 2018; Santos-Lozada, 2018).

The data that track international mass migrations are muddled (Dijstelbloem, 2017) and flawed (Nature, 2017). Census data, the most accurate source for tracking immigration, are usually outdated and cover only a small number of countries (Laczko, 2016). Official data not only provide a delayed picture of population inflows but also fail to capture undocumented immigration. It is estimated that 75,000 to 100,000 undocumented Dominicans live permanently in Puerto Rico (Ferguson, 2003), and that nearly 2,000 undocumented Haitians enter Puerto Rico every year (Ross, 2014). A large number of undocumented migrants from the Dominican Republic and Haiti try to first reach Puerto Rico or the Bahamas before traveling on to the United States (Agency, 2017). Technologies for timely monitoring of human mobility (automated border controls, satellite and drone imagery, web- and phone-based registries) are also inaccurate and highly controversial (Dijstelbloem, 2017; Laczko, 2016; Verhulst & Young, 2019).

Internet-derived data provide an opportunity to track migration in real time. Our literature review uncovered two different approaches. The first comprises studies that extract geolocation traces from social media profiles and logins (Hawelka et al., 2014; Messias et al., 2016; Rodriguez et al., 2014; State et al., 2013; Zagheni et al., 2014; Zagheni et al., 2017), or from telecommunication applications (Kikas et al., 2015). The second comprises studies that infer intention to migrate from search queries (Connor, 2017; Lin et al., 2019; Pulse, 2014; Vicéns-Feliberty & Ricketts, 2016). While both approaches have shown strong correlations with actual migration flows (Righi, 2019), they could be biased because the samples they use are small and limited to a non-random sample of migrants with active user profiles on the selected Internet platform.

In this study we propose a novel approach building on new evidence that immigrants maintain their cultural identity and remain connected to their countries of origin for several years (Abramitzky et al., 2016; Bates & Komito, 2012; Dekker & Engbersen, 2014; Diminescu, 2008). Our approach is to identify search terms that are unique to a source country and then obtain data on how often they are used in a host geographic area. The use of similar search terms among co-nationals living in different countries implies that immigrants are connected to each other. To our knowledge, our method is the first to use this commonality to identify migration flows. In a different approach based on the same implications of connected immigrants, Vaca-Ruiz et al. (2013) used Yahoo! Meme, a popular social media site in Brazil, to measure the interaction between profiles with IP addresses in different regions of the country. While this approach is more precise for measuring individual connections because it draws on private data (social media profiles with IP identification), it is biased toward a specific social media platform, and it provides a picture of the existing migrant population rather than of dynamic migration flows.

We applied our approach to immigration from the Dominican Republic to Puerto Rico after Hurricane Matthew and subsequent flooding in 2016. Hurricane Matthew made landfall as a Category 4 hurricane on Hispaniola on October 4, 2016 (Masoero, 2016), bringing heavy rains to 11 provinces of the Dominican Republic and displacing more than 18,000 people. The rainfall continued for several weeks, causing even more flooding in 19 provinces and displacing more than 20,000 people between November 7-8, 2016 (Davies, 2016). Whether these flood events in the Dominican Republic triggered migration to Puerto Rico is unknown, but such migration is expected given the geographic proximity and historical patterns of migration. The Dominican Republic has been the largest source of immigrants to Puerto Rico, accounting for almost 20% of total immigrants, followed by Cuba (5%) and Colombia (2%) (United Nations, 2019).


Our study answered two research questions:

  1. Is it feasible to use Internet data to measure migration flows to Puerto Rico from other Caribbean countries?
  2. Is the migration flow variable a valid real-time measure of present migration? In particular, are the modelled estimates of immigration to Puerto Rico from the Dominican Republic after Hurricane Matthew in 2016 close to the observed number of immigrants recorded in census data?

To answer the first question, we used a novel data extraction method to generate a migration flows variable from countries with historically documented immigration to Puerto Rico. Then, for the second question, we tested for significant changes in our predicted migration flows variable after Hurricane Matthew and subsequent flooding in the Dominican Republic in 2016.

Data Extraction

We extracted data in three steps; each step is described in detail below. In Step 1, we extracted a large number of terms (N) from each source country using web news data scraping and tweets published by mainstream media of source countries. We compared terms across all source countries and selected a smaller number of terms (S) that are specific to each source country (i.e., terms that do not appear in other countries). In Step 2, we obtained weekly trends of search volumes for each of the S country-specific keywords in the host geographic region (Puerto Rico). In Step 3, we estimated a Dynamic Factor Model (Stock & Watson, 2011) to obtain a single factor that is common to all country-specific search trends.

Step 1: Country-specific Keywords

The first step of data processing was divided into three stages: initialization, model analysis, and result evaluation. Figure 1 shows the complete data processing workflow. Each stage is described in the following sections.

Figure 1. Overall Data Processing Flow Chart

Initialization. This stage included data collection and data preprocessing. We used two data collection strategies: web news data scraping and tweet scraping from the mainstream media of various countries in the Caribbean region. Web scraping is the process of gathering information from the Internet. We manually collected and sorted news websites of various countries, dividing news into several categories such as politics, business, arts and entertainment, and sports. A group of three graduate students fluent in Spanish, Creole, and English identified and classified news websites after receiving basic instruction in a common classification process. We found several limitations with this data collection strategy. Most news media websites displayed only breaking news, which limited the collection of historical news data through the web crawler. Also, some websites had anti-scraping mechanisms, and small changes in their code stopped the scraping service. As a result, we used tweet scraping as our main data collection strategy. We used Twint (GitHub, 2021), a Twitter scraping tool written in Python, to retrieve the tweets of the official accounts of mainstream media in various countries of the Caribbean region. Twint provides a language parameter that we used to obtain French tweets from Haiti, Guadeloupe, and Martinique, and Spanish tweets from the Dominican Republic, Colombia, Cuba, Panama, and Venezuela. The time interval was set as “2014 to 2021” in all queries.

After collection, we preprocessed the data. Tweets are short messages, restricted to 280 characters in length, that may include special characters such as emoticons, targets (“@” symbol to refer to other users), hashtags (“#” symbol to mark topics), and other user information (Wisdom & Gupta, 2016). The Twitter data contain a number of fields that characterize a Twitter user’s post. Out of 36 fields in a tweet, we selected the following features: ID (unique number to identify a tweet), date, name, tweet, and hashtags. The content of the tweets was processed with spaCy, a free open-source library for Natural Language Processing in Python (ExposionAI GmbH, 2021). We used spaCy’s Named Entity Recognition (NER) for Spanish with the es_core_news_sm model. NER is the task of recognizing and demarcating the segments of a document that are part of a name. At the heart of the NER component is a two-step process. The first step is to detect a named “entity,” and the second step is to categorize the entity (Johansen, 2019). Step one involves detecting a word or string of words that form an entity. Each word that makes up the entity represents a “token.” For instance, “The Great Lakes” is a string of three tokens that represents one entity. Inside-outside-beginning tagging is a common way of indicating where entities begin and end. Once we know the position of each entity, we can use the underscore (_) symbol to join its tokens and keep them together as a whole. We used four categories of names: locations (LOC), miscellaneous (MISC), organizations (ORG), and persons (PER) (Johansen, 2019). Table 1 shows examples of tweets.

Table 1. Entity of Tweets

| Tweet Content | Entity Category | Entity Term | Entity Index [Beginning, End] |
| Saab advisers paid jet for meeting in Cabo Verde† | (Saab, PER), (Cabo Verde, LOC) | Saab, Cabo Verde | [2, 3], [8, 10] |
| Francisco Cervelli suffered another concussion† | (Francisco Cervelli, PER) | Francisco, Cervelli | [0, 2] |
| Ronaldinho will know if he is released after half a year held in Paraguay† | (Ronaldinho, PER), (Pa, MISC) | Ronaldinho, Pa | [0, 1], [11, 12] |
| 148 people were arrested in Paris for riots after the Champions League final† | (Paris, ORG) | Paris | [5, 6] |
| A vision: Teresa de la Parra* | (Teresa de la Parra, PER) | Teresa de la Parra | [3, 7] |

†Tweet in English translated from Spanish. Categories: PER=Person name; ORG=Organization name, LOC=Place name; MISC=Miscellaneous.
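
The underscore-merging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function name `merge_entities` is hypothetical, the token list is an assumed tokenization of the last example in Table 1, and the end index is treated as exclusive, which is consistent with the [Beginning, End] values reported in the table.

```python
from typing import List, Tuple


def merge_entities(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Join the tokens of each entity span with underscores.

    `spans` holds (beginning, end) token indices with an exclusive end,
    mirroring the [Beginning, End] column of Table 1.
    """
    starts = {begin: end for begin, end in spans}
    merged, i = [], 0
    while i < len(tokens):
        if i in starts:
            # Collapse the whole entity into a single token.
            end = starts[i]
            merged.append("_".join(tokens[i:end]))
            i = end
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Assumed tokenization of "A vision: Teresa de la Parra" (entity at [3, 7]).
tokens = ["A", "vision", ":", "Teresa", "de", "la", "Parra"]
merged = merge_entities(tokens, [(3, 7)])
print(merged)  # ['A', 'vision', ':', 'Teresa_de_la_Parra']
```

Merging before tokenization and stop-word removal is what keeps “de” and “la” inside the person name instead of discarding them.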

We then used tokenization to split sentences. Tokenization is one of the most basic steps in text analysis. Its purpose is to split the text into smaller units called tokens, usually words or phrases. Words that belong to the same entity, connected through the underscore symbol as described in the previous paragraph, were not split. Because string comparison is case sensitive, “The” would be considered a different token than “the.” Hence, converting all tokens to lowercase was a necessary preprocessing step. We also removed punctuation marks, web links, numbers, and stop words. Stop words are commonly occurring words that do not add meaning to a phrase (for example “the”, “and”, “an”, “a”, etc.). We used NLTK (NLTK Project, 2021), a platform for building Python programs that process human language data, which also provides stop-word lists for Spanish and French. Figure 2 represents our data preprocessing.

Figure 2. Data Preprocessing Flow Chart
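
As a rough sketch of the preprocessing described above, the following standard-library Python function lowercases the text and removes links, punctuation, numbers, and stop words while leaving underscore-joined entities intact. The tiny English stop-word set and the function name `preprocess` are assumptions made for illustration; the study used the NLTK stop-word lists for Spanish and French.

```python
import re

# Tiny illustrative stop-word set (assumption); the study used NLTK's lists.
STOP_WORDS = {"the", "and", "an", "a", "in", "of", "for"}

URL_RE = re.compile(r"https?://\S+")
# \w keeps letters, digits, and the underscore, so merged entities survive.
NON_WORD_RE = re.compile(r"[^\w\s]")


def preprocess(text: str) -> list:
    """Return lowercase tokens without links, punctuation, numbers, or stop words."""
    text = URL_RE.sub(" ", text)       # drop web links first
    text = NON_WORD_RE.sub(" ", text)  # drop punctuation, "@" and "#"
    tokens = text.lower().split()
    return [t for t in tokens if not t.isdigit() and t not in STOP_WORDS]


example = "A vision: Teresa_de_la_Parra and the 148 people in Paris https://example.com/x"
cleaned = preprocess(example)
print(cleaned)  # ['vision', 'teresa_de_la_parra', 'people', 'paris']
```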

Model Analysis. We started by identifying keywords (words or phrases that best represent the content of a document) according to their part of speech (POS). A POS is a category of words that have similar grammatical properties and explains how a word is used in a sentence. Keywords were automatically identified using the Stanford Log-linear Part-Of-Speech Tagger (The Stanford Natural Language Processing Group, 2020; Toutanova et al., 2003), software implemented in Java that reads text in a given language and assigns a part of speech, such as noun, verb, or adjective, to each word. Table 2 shows examples from tweets and the possible POS for each of the 5Ws (who, where, when, what, why) (Zhang et al., 2006). These nouns or noun phrases may consist of more than one word, so it was necessary to identify them as single keywords. For example, the person-name entity “Teresa de la Parra” was kept as the keyword “Teresa_de_la_Parra”. Note that tokenization and stop-word removal were performed after NER, which avoided removing the stop words “de” and “la” and splitting the keyword into “Teresa” and “Parra”.

Table 2. Possible Part of Speech (POS) of 5Ws

| 5W | Possible POS | Example |
| Who | Person name | Teresa De la Parra |
| Where | Organization name/Place name | Cabo Verde |
| When | Temporal noun | Anniversary of the birth of Simón Bolívar |
| What | Basic noun, Noun phrase | Everything is fiction |
| Why | Noun phrase | Refused to attend class |

After keyword identification we used the Term Frequency (TF) and Inverse Document Frequency (IDF) algorithms to weigh a keyword (x) in the content of a document and assign its importance based on the number of times it appears in the document. Term Frequency was used to measure how many times keyword x was present in a document. Since every document was different in length, a keyword could appear more often in long documents than in shorter ones (Hakim et al., 2014). Thus, the TF was divided by the document length as a way of normalization:

$$\mathrm{TF}(x) = \frac{\text{Number of times term } x \text{ appears in a document}}{\text{Total number of terms in the document}}$$

IDF measures how important keyword x is based on the formula:

$$\mathrm{IDF}(x) = \log_e\left(\frac{\text{Total number of documents}}{\text{Number of documents with term } x \text{ in it}}\right)$$
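
The two formulas can be checked on a toy corpus. The sketch below is a direct standard-library transcription of the TF and IDF definitions given here; the three token lists stand in for preprocessed tweets and are invented for illustration only.

```python
import math


def tf(term, doc):
    """Term frequency: occurrences of `term` divided by document length."""
    return doc.count(term) / len(doc)


def idf(term, docs):
    """Inverse document frequency with a natural logarithm.

    Only call this for terms that occur in at least one document.
    """
    n_with_term = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_with_term)


def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


# Invented toy documents (each one stands in for a preprocessed tweet).
docs = [
    ["hurricane", "matthew", "flood", "flood"],
    ["flood", "baseball"],
    ["election", "baseball"],
]

print(tf("flood", docs[0]))                 # 0.5
print(round(idf("flood", docs), 4))         # 0.4055
print(tf_idf("hurricane", docs[0], docs) > tf_idf("flood", docs[0], docs))  # True
```

In this toy corpus “hurricane” outranks “flood” in the first document even though “flood” occurs twice, because “flood” also appears in another document and is therefore less distinctive.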

We calculated the TF-IDF values for all keywords in tweets posted between June 2016 and December 2016, a period that includes Hurricane Matthew and subsequent flooding in the Dominican Republic. The top three keywords with the highest TF-IDF values were selected for each tweet in the Dominican Republic and other source countries. Duplicate keywords within each country were eliminated keeping only N distinct keywords for each source country. Then all distinct keywords were compared across all countries to identify those that are unique or specific to the source country. As illustrated in Table 3, some keywords appeared in multiple countries, but others were unique to a source country. Out of the N distinct keywords we selected keywords that were specific to the source country and represented them as S. It was expected that these S country-specific terms represented the cultural identity of residents from the source country, and therefore it was highly likely that individuals searching for those S terms in Puerto Rico were immigrants from the source country.

Table 3. Selected Keywords and the Countries in Which They Appear

| Entities | Colombia | Cuba | Dominican Republic | Panama | Venezuela |
| boxer† | 1 | 0 | 1 | 1 | 1 |
| maggie_smith | 0 | 0 | 0 | 1 | 0 |
| goallllll_of_hungary† | 0 | 0 | 0 | 1 | 0 |
| san_josé_de_apartadó | 1 | 0 | 0 | 0 | 0 |
| hassan_rouhani | 1 | 0 | 0 | 0 | 0 |
| rogers_cup_of_toronto† | 0 | 0 | 0 | 1 | 0 |
| big_league_game† | 0 | 0 | 0 | 0 | 1 |
| great_national_day_of_reforestation† | 0 | 0 | 0 | 1 | 0 |
†Entity in English translated from Spanish.
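
The selection of country-specific keywords amounts to removing duplicates within each country and then keeping only the keywords that occur in exactly one country. A minimal sketch follows, using a few entries from Table 3 as sample input; the function name `country_specific` is hypothetical and the repeated “boxer” in the Colombia list is added only to show deduplication.

```python
from collections import Counter


def country_specific(keywords_by_country):
    """Map each country to the keywords that appear in no other country."""
    # Step 1: drop duplicates within each country (the N distinct keywords).
    distinct = {c: set(kws) for c, kws in keywords_by_country.items()}
    # Step 2: count in how many countries each keyword appears.
    counts = Counter(kw for kws in distinct.values() for kw in kws)
    # Step 3: keep keywords seen in exactly one country (the S keywords).
    return {c: {kw for kw in kws if counts[kw] == 1} for c, kws in distinct.items()}


sample = {
    "Colombia": ["boxer", "san_josé_de_apartadó", "hassan_rouhani", "boxer"],
    "Dominican Republic": ["boxer"],
    "Panama": ["boxer", "maggie_smith"],
    "Venezuela": ["boxer", "big_league_game"],
}

specific = country_specific(sample)
print(specific["Panama"])              # {'maggie_smith'}
print(specific["Dominican Republic"])  # set()
```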

Step 2: Keyword Trends

In the second step we obtained search volumes in Puerto Rico for each of the S keywords from the Dominican Republic and other comparison source countries. We used Google Trends, an application that analyzes search term trends over time for different geographic regions. Google Trends shows real-time and weekly historical search trends for every city in the world. Data were collected from the Google Trends Application Programming Interface (API) using Pytrends (Hogue & Dewilde, 2020), a Python application for Google Trends that returns search scores over time for all S keywords ($X_t$, of dimension $S \times 1$). The search score for keyword x is a normalized search volume index that ranges from 0 (low-volume searches) to 100 (highest-volume searches). We obtained a large set of country-specific scores $X_t$ for searches in Puerto Rico, where t represented weeks from June 2016 to June 2018.
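
To make the 0 to 100 scale concrete, the sketch below rescales a weekly series so that its busiest week equals 100. This is only an assumed simplification of a peak-relative index: Google does not publish raw search volumes, its actual normalization is not reproduced here, and the input numbers and the function name `to_search_scores` are hypothetical.

```python
def to_search_scores(volumes):
    """Rescale a weekly series to a 0-100 index relative to its own peak."""
    peak = max(volumes)
    if peak == 0:
        # No searches at all in the window: every week scores 0.
        return [0 for _ in volumes]
    return [round(100 * v / peak) for v in volumes]


# Hypothetical weekly volumes for one keyword searched in the host region.
weekly_volumes = [12, 30, 60, 45, 0]
scores = to_search_scores(weekly_volumes)
print(scores)  # [20, 50, 100, 75, 0]
```

Because every keyword is scaled against its own peak, the scores describe timing rather than absolute popularity, which is one reason a common factor across many keywords is more informative than any single series.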

Step 3: Migration Flows Variable

Time variations in each keyword from the set $X_t$ can be explained by factors specific to each search term (fads, individual trends, etc.) and by factors common to all terms. We were interested in a common dynamic factor $f_t$ that explains variations in all keywords at time $t$. We expected the factor $f_t$ to aggregate search interest across a broad scope of topics related to the source country and, consequently, to signal changes in migration flows at time $t$. We applied a Dynamic Factor Model (DFM) (Stock & Watson, 2011), a time-series econometric technique that allowed us to (1) obtain the unobserved common factor ($f_t$), and (2) reduce the large dimension of $X_t$ from $S$ to one common factor. In equations, we assumed that the unobserved factor $f_t$ followed an autoregressive time series process, written here in first-order form:

$$f_t = \theta f_{t-1} + \eta_t \qquad (1)$$

where $f_t$ and $\eta_t$ were $1 \times 1$, and $\theta$ characterized the auto-regressive process of the factor. Each of the keywords in the set $X_t$ was driven by the common factor $f_t$ based on the following process:

$$X_t = \beta f_t + e_t \qquad (2)$$

where $X_t$ and $e_t$ were $S \times 1$, and $\beta$ was the factor loading, or contribution of the unobserved common factor to each observed individual keyword. We estimated equations (1) and (2) using a generalized principal component estimator in Stata version 16. The estimation was performed only with keywords whose search score time series were stationary based on the Augmented Dickey-Fuller test.
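
The idea of collapsing many keyword series into one common factor can be illustrated with a plain principal-component calculation. The sketch below is an assumption-laden stand-in, not the Stata estimator used in the study: it ignores the factor dynamics, extracts the first principal component of the standardized series by power iteration, and runs on simulated data whose loadings, noise level, and autoregressive coefficient (0.8) are all invented.

```python
import math
import random


def standardize(series):
    """Rescale a series to mean 0 and standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]


def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def first_common_factor(series_by_keyword, iterations=200):
    """First principal component of S keyword series (static approximation)."""
    z = [standardize(s) for s in series_by_keyword]
    n_kw, n_weeks = len(z), len(z[0])
    # S x S correlation matrix of the standardized series.
    corr = [[sum(z[i][t] * z[j][t] for t in range(n_weeks)) / n_weeks
             for j in range(n_kw)] for i in range(n_kw)]
    # Power iteration converges to the leading eigenvector (the weights).
    v = [1.0 / math.sqrt(n_kw)] * n_kw
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n_kw)) for i in range(n_kw)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    factor = [sum(v[i] * z[i][t] for i in range(n_kw)) for t in range(n_weeks)]
    return factor, v


# Simulated data (all values invented): one AR(1) factor drives four keywords.
random.seed(0)
true_f = [0.0]
for _ in range(103):
    true_f.append(0.8 * true_f[-1] + random.gauss(0, 1))
loadings = [1.0, 0.7, 1.3, 0.9]
X = [[b * f + random.gauss(0, 0.3) for f in true_f] for b in loadings]

f_hat, weights = first_common_factor(X)
print(pearson(f_hat, true_f) > 0.9)  # True
```

On data generated this way the extracted component tracks the simulated factor closely; with real search scores the quality of the approximation would depend on how much of the variation is truly common to all keywords.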

Predictive Capability

We took two approaches to answer our second research question. First, we compared our dynamic factor of Dominican Republic immigration to Puerto Rico with annual immigration flows obtained from American Community Survey (ACS) data. Several researchers have used ACS data, which provide the most accurate assessment of immigration patterns, to test for differences in predicted and observed immigration flows (Cirillo & Gallegati, 2012). We used Puerto Rico ACS Public Use Microdata Sample files as source data for annual flows of immigrants from the Dominican Republic to Puerto Rico, based on the place of birth (Dominican Republic) and year of entry (within the last 12 months) questions of the 2014 to 2018 surveys. Then we compared the annual number of immigrants from the ACS data to the annual average of our weekly migration flows factor. Our second approach observed weekly migration flows during and after Hurricane Matthew and the subsequent flooding that hit the Dominican Republic between October 6 and November 8, 2016 (Davies, 2016; Masoero, 2016).
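
The first validation approach reduces to two operations: averaging the weekly factor within each year and correlating those averages with the annual counts. A minimal sketch under stated assumptions follows; every number in it is hypothetical and chosen only to make the arithmetic visible, and neither the ACS figures nor the estimated factor is reproduced.

```python
import math
from collections import defaultdict


def annual_average(weekly):
    """Average (year, value) observations within each year."""
    buckets = defaultdict(list)
    for year, value in weekly:
        buckets[year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in buckets.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly factor values tagged with their calendar year.
weekly_factor = [(2015, 0.2), (2015, 0.4), (2016, 0.5), (2016, 0.7),
                 (2017, 0.1), (2017, -0.1), (2018, -0.4), (2018, -0.2)]
# Hypothetical annual immigrant counts standing in for the survey data.
annual_counts = {2015: 1000, 2016: 1200, 2017: 600, 2018: 300}

averages = annual_average(weekly_factor)
years = sorted(annual_counts)
r = pearson([averages[y] for y in years], [annual_counts[y] for y in years])
print(round(r, 3))  # 0.993
```

With only four annual observations, a high correlation is easy to obtain, so a result like this is better read as a consistency check than as a formal test.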

Ethics Statement

Our research used public anonymized data from Twitter and Google Trends, and public de-identified national data (ACS). Our research methods did not involve direct interaction with individuals or communities, and our data cannot be used to identify individuals. Consequently, our research does not pose risks to human subjects, and it was exempt from IRB approval.

Migration is a core public health ethics issue (Wild & Dawson, 2018), and we recognize that a tool to track migration poses an important moral dilemma. We believe the harm that could result from misusing this tool is outweighed by the higher public health costs of inaction. The potential for harm is limited because our migration flows factor does not provide enough geographic detail to track migration in places smaller than a metropolitan statistical area. We also use aggregate data covering many people, which prevents them from being tracked on an individual basis. A tool to track migration can be used to bring social justice and health equity to immigrant communities. This is especially important if access to the tool is provided not only to governments but also to civil society organizations that support immigrants’ health and wellbeing.


Results

Table 4 presents the preliminary results of our data extraction process, based only on tweet data given the limitations found with web news data. It presents the number of tweets extracted for the Dominican Republic and other source countries, the keywords obtained from those tweets (total number, distinct or unduplicated, and country-specific), and the number of search score time series associated with each country-specific keyword. For this preliminary report only a subset of search score time series was obtained (column F of Table 4) because Google Trends restricts extraction to 100 keywords per day.

Table 4. Number of Keywords in Different Source Countries

| Countries | (A) Number of Tweets (2004-2021) | (B) Number of Tweets (Jun 2016–Dec 2016) | (C) Number of Keywords (Jun 2016–Dec 2016) | (D) Number of Distinct Keywords (Columns D/C %) (Jun 2016–Dec 2016) | (E) Number of Country-Specific Keywords (Columns E/D %) (Jun 2016–Dec 2016) | (F) Number of Keywords Searched in Google Trends | (G) Number of Keywords Found in Google Trends (Columns G/F %) |
| Dominican Republic | 980053 | 82882 | 248646 | 53486 (22) | 29195 | 467 | 50 (10.7) |
| Colombia | 1846951 | 137115 | 411345 | 60890 (15) | 39493 | 333 | 50 (15) |
| Cuba | 223677 | 11234 | 33702 | 10077 (30) | 0 | - | - |
| Panama | 1698924 | 175442 | 526326 | 79603 (15) | 49615 | 343 | 50 (14.6) |
| Venezuela | 10690172 | 92197 | 276591 | 26516 (10) | 16566 | 342 | 50 (14.6) |
| Haiti | - | 6861 | 120583 | 12277 (60) | 10238 | 3610 | 107 (3.0) |
| Martinique | - | 7576 | 22728 | 11392 (50) | 9351 | 2360 | 206 (8.7) |
| Guadeloupe | - | 340 | 1020 | 750 (73) | 450 | 450 | 123 (27.3) |
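
The percentages in parentheses in Table 4 are simple ratios of neighboring columns. As a worked check, the snippet below recomputes two of them for four of the Spanish-speaking countries using the counts reported in the table; the helper names are hypothetical.

```python
# Counts copied from Table 4: (keywords, distinct keywords, searched, found).
rows = {
    "Dominican Republic": (248646, 53486, 467, 50),
    "Colombia": (411345, 60890, 333, 50),
    "Panama": (526326, 79603, 343, 50),
    "Venezuela": (276591, 26516, 342, 50),
}


def distinct_pct(country):
    """Distinct keywords as a share of all keywords, rounded to a whole percent."""
    keywords, distinct, _, _ = rows[country]
    return round(100 * distinct / keywords)


def found_pct(country):
    """Keywords found in Google Trends as a share of keywords searched."""
    _, _, searched, found = rows[country]
    return round(100 * found / searched, 1)


for country in rows:
    print(country, distinct_pct(country), found_pct(country))
# Dominican Republic 22 10.7
# Colombia 15 15.0
# Panama 15 14.6
# Venezuela 10 14.6
```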

Figure 3 presents our dynamic migration flows factor for Dominican Republic immigrants to Puerto Rico. It also presents the annual average of our weekly series for the years 2015–2018 so that it can be compared with the census-based annual migration flows data. The figure shows a clear correlation (95.5%) between our predicted migration flows and the census data. It also suggests that the large reduction in immigration observed in Puerto Rico occurred mostly between September and November 2017, when Hurricane Maria devastated Puerto Rico.

Figure 3. Dynamic Factor for Dominican Republic Immigrants to Puerto Rico

Figure 4 highlights the period between Hurricane Matthew hitting the Dominican Republic on October 6, 2016, and the end of the subsequent floods, which affected the country until November 8, 2016. The figure shows the weekly behavior of our migration flows factor during and after these disasters. It shows a significant increase between March and April 2017, four months after the disasters.

Figure 4. Migration Flows Factor to Puerto Rico and Disaster Events in Dominican Republic
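
The lag between a disaster and the subsequent peak in the weekly factor can be measured with simple date arithmetic. The sketch below assumes a weekly series of (date, value) pairs; the values are hypothetical and are shaped to peak in mid-March 2017 purely for illustration, while the event end date of November 8, 2016 comes from the text.

```python
from datetime import date, timedelta


def peak_after(event_end, weekly):
    """Return the (date, value) pair with the highest value after the event."""
    after = [(d, v) for d, v in weekly if d > event_end]
    return max(after, key=lambda dv: dv[1])


def months_between(start, end):
    """Whole calendar months between two dates, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Hypothetical weekly series: 40 Sundays starting October 2, 2016,
# with a single peak at week 24.
weeks = [date(2016, 10, 2) + timedelta(weeks=i) for i in range(40)]
weekly = [(d, -abs(i - 24)) for i, d in enumerate(weeks)]

event_end = date(2016, 11, 8)  # end of the flooding, as reported in the text
peak_date, peak_value = peak_after(event_end, weekly)
lag = months_between(event_end, peak_date)
print(peak_date, lag)  # 2017-03-19 4
```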


This study presents preliminary results of a novel approach that uses Internet search queries to “nowcast” immigration flows from a specific source country to a host region. We applied our approach to the Caribbean region, exploring immigration from the Dominican Republic to Puerto Rico from June 2016 to June 2018 and the impact of Hurricane Matthew and subsequent floods on the Dominican Republic during October and November 2016.

Key findings

First, we demonstrated the feasibility of our three-step approach to data extraction from Internet search data. For the first step, we showed that extracting country-specific keywords from the Twitter accounts of online news organizations provides a rich resource of current and historical information from migration source countries. For the Dominican Republic we were able to obtain a total of 29,195 keywords specific to this country. For the second step, we showed that Google Trends is a useful tool for obtaining data on search trends. For the third step, we showed the feasibility of using a Dynamic Factor Model that reduces all country-specific keyword trends to a single dynamic factor common to all keywords. We called this new variable the “migration flows factor,” which could be correlated with weekly flows of Dominican immigrants to Puerto Rico.

Implications for Public Health Practice

We are optimistic about the potential to use our migration flows factor model to monitor weekly immigration to Puerto Rico in real time. We recognize that the Puerto Rican Department of Health could be the primary user of our tool to support its public health mission. The tool can be used to reduce the risk of imported infectious diseases and to complement the Department of Health emergency preparedness system. However, from a human-based perspective of public health, we believe that this tool would achieve a larger impact if it were used by civil society organizations such as the Centro de la Mujer Dominicana or Centros Sor Isolina Ferré.

To achieve our goal, a Spanish-language website will be developed to host this tool after we have completed a robust validation of the model. We will seek collaborations with universities in Puerto Rico, including an outreach component that facilitates equal access for small and large civil society organizations. By providing equal access to the entire public health system, we can maximize the benefits of using this tool to consider the multiple dimensions that affect immigrants’ wellbeing.


Limitations

Our study has important limitations. First, our approach relies on Internet access and use. While Internet access is widespread, it is limited among vulnerable populations, including immigrants. Because our approach relies on search volume, the migration of small populations with limited Internet access will not be adequately captured in our model. Our approach also relies on Internet searches in Google. While Google is the most widely used search engine in Latin America and the Caribbean, it could be subject to restrictions in countries with authoritarian regimes. In Cuba, for example, some keywords and terms are banned, which biases our estimates for this country. It is important to highlight that our study does not rely on Twitter usage. Our methodology uses Twitter to extract keywords and terms published by news organizations and does not rely on the volume of tweets, interactions with tweets, or any other form of Twitter use.

A second limitation concerns the use of Google Trends. We found that it is useful for obtaining search trends, but its coverage is limited. In a sample of country-specific keywords, only 10%–16% were available in Google Trends. This is still acceptable given the large number of country-specific keywords obtained from the first step of our methodology, and it reflects the low search volume of most keywords in the host country. However, Google Trends truncates the search scores of keywords with low search volume, which limits the use of this tool for small regions. Because of this limitation, we were able to apply our approach to the whole territory of Puerto Rico but not to small areas (municipalities/barrios or districts).

A third limitation relates to our strategy for testing the validity of our model. We used census data to compare and validate our results. However, we recognize that census data do not fully capture transitory and undocumented migration. Partially observed migration flows make validation challenging. For this reason, in addition to using census data, we also examined whether our migration flow factor was consistent with known hazard events (Hurricanes Matthew and Maria).
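
The two validation checks described above can be illustrated with a short sketch. This is an illustrative example under stated assumptions rather than the study's code: `pearson` computes a plain correlation between an annually aggregated factor and census-based counts, and `event_shift` compares the average factor level in a window after a hazard event with the same-length window before it. The function names, the window length, and all numbers are hypothetical placeholders, not results.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def event_shift(factor: Sequence[float], event_index: int, window: int) -> float:
    """Mean factor level in the `window` periods from the event onward,
    minus the mean level in the `window` periods before it."""
    before = factor[max(0, event_index - window):event_index]
    after = factor[event_index:event_index + window]
    if not before or not after:
        raise ValueError("event window falls outside the series")
    return sum(after) / len(after) - sum(before) / len(before)


# Hypothetical placeholder values; these are not data from the study.
annual_factor = [0.2, 0.1, 0.5, 0.9]      # factor averaged by year
census_counts = [900, 1000, 1400, 2100]   # census-based annual estimates
weekly_factor = [0.1, 0.0, 0.2, 0.1, 0.9, 1.1, 1.0, 0.8]

print(pearson(annual_factor, census_counts))
print(event_shift(weekly_factor, event_index=4, window=4))
```

A positive correlation and a positive post-event shift would be consistent with the factor tracking migration, but neither check can confirm it, because census data only partially observe the underlying flows.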

Future Research Directions

Future research will focus on refining our methodology and improving our validation strategy. We can overcome some of the methodological limitations by expanding the number of country-specific keywords and by using a wider variety of Twitter accounts from news media and influencers, particularly from French-speaking countries in the Caribbean.

To improve our validation strategy, we will expand the number of source countries and the time period over which we estimate migration flows from 2013 to 2021. We also plan to test for changes in migration flows associated with multiple natural hazard events.


  1. Stock, J. H., & Watson, M. (2011). Dynamic factor models. Oxford Handbooks Online. 

  2. U.S. Census Bureau. (2019). 2015-2018 American Community Survey 1-year Public Use Microdata Samples. Retrieved from: 

  3. IDMC. (2021). Global Report on Internal Displacement. 

  4. Thomas, S. L., & Thomas, S. D. (2004). Displacement and health. British medical bulletin, 69(1), 115-127. 

  5. Davies, A. A., Basten, A., & Frattini, C. (2009). Migration: a social determinant of the health of migrants. Eurohealth, 16(1), 10-12. 

  6. Mensah, G. A., Mokdad, A. H., Posner, S. F., Reed, E., Simoes, E. J., Engelgau, M. M., & Vulnerable Populations in Natural Disasters Working Group. (2005). When chronic conditions become acute: prevention and control of chronic diseases and adverse health outcomes during natural disasters. Preventing chronic disease, 2(Spec No). 

  7. Siriwardhana, C., & Stewart, R. (2013). Forced migration and mental health: prolonged internal displacement, return migration and resilience. International Health, 5(1), 19-23. 

  8. Wickramage, K., Gostin, L. O., Friedman, E., Prakongsai, P., Suphanchaimat, R., Hui, C., Duigan, P., Barragan, E., & Harper, D. R. (2018, Jun). Missing: Where Are the Migrants in Pandemic Influenza Preparedness Plans? Health Hum Rights, 20(1), 251-258. 

  9. Baum, A., Barnett, M. L., Wisnivesky, J., & Schwartz, M. D. (2019, Nov 1). Association Between a Temporary Reduction in Access to Health Care and Long-term Changes in Hypertension Control Among Veterans After a Natural Disaster. JAMA Netw Open, 2(11), e1915111. 

  10. Castañeda, H., Holmes, S. M., Madrigal, D. S., Young, M.-E. D., Beyeler, N., & Quesada, J. (2015). Immigration as a social determinant of health. Annual Review of Public Health, 36, 375-392. 

  11. Greenough, P. G., Lappi, M. D., Hsu, E. B., Fink, S., Hsieh, Y.-H., Vu, A., Heaton, C., & Kirsch, T. D. (2008). Burden of disease and health status among Hurricane Katrina–displaced persons in shelters: a population-based cluster sample. Annals of emergency medicine, 51(4), 426-432. 

  12. Jhung, M. A., Shehab, N., Rohr-Allegrini, C., Pollock, D. A., Sanchez, R., Guerra, F., & Jernigan, D. B. (2007). Chronic disease and disasters: medication demands of Hurricane Katrina evacuees. American Journal of Preventive Medicine, 33(3), 207-210. 

  13. Mills, J., Burton, N., Schmidt, N., Salinas, O., Hembling, J., Aran, A., Shedlin, M., & Kissinger, P. (2013, Jun). Sex and drug risk behavior pre- and post-emigration among Latino migrant men in post-Hurricane Katrina New Orleans. J Immigr Minor Health, 15(3), 606-613. 

  14. Self-Brown, S., Lai, B. S., Harbin, S., & Kelley, M. L. (2014, Dec). Maternal posttraumatic stress disorder symptom trajectories following Hurricane Katrina: An initial examination of the impact of maternal trajectories on the well-being of disaster-exposed youth. Int J Public Health, 59(6), 957-965. 

  15. United Nations. (2019). International Migrant Stock 2019. 

  16. Beaton, M. K., Cerovic, M. S., Galdamez, M., Hadzi-Vaskov, M., Loyola, F., Koczan, Z., Lissovolik, M. B., Martijn, M. J. K., & Ustyugova, M. Y. (2017). Migration and remittances in Latin America and the Caribbean: engines of growth and macroeconomic stabilizers? International Monetary Fund. 

  17. Ginnetti, J., Lavell, C., & Franck, T. (2015). Disaster-related Displacement Risk: Measuring the Risk and Addressing its Drivers (Disasters Climate Change and Displacement: Evidence for Action). Internal Displacement Monitoring Centre, Norwegian Refugee Council. 

  18. Alexander, M., Zagheni, E., & Polimis, K. (2019). The impact of Hurricane Maria on out-migration from Puerto Rico: Evidence from Facebook data. 

  19. Meléndez, E., & Hinojosa, J. (2017). Estimates of post-hurricane Maria exodus from Puerto Rico. Centro Voices.

  20. Rayer, S. (2018). Estimating the migration of Puerto Ricans to Florida using flight passenger data. Retrieved 3/7/2021, from 

  21. Santos-Lozada, A. R. (2018). Estimates of excess passenger traffic in Puerto Rico following Hurricane María. 

  22. Dijstelbloem, H. (2017). Migration tracking is a mess. Nature, 543(7643), 32-34. 

  23. Nature. (2017, Mar 1). Data on movements of refugees and migrants are flawed. Nature, 543(7643), 5-6. 

  24. Laczko, F. (2016). Improving Data on International Migration and Development: Towards a global action plan. Improving Data on International Migration-towards Agenda, 2030, 1-12. 

  25. Ferguson, J. (2003). Migration in the Caribbean: Haiti, the Dominican Republic and Beyond. Minority Rights Group International. 

  26. Ross, C. (2014). Haitian Illegal Immigration Through Puerto Rico is Skyrocketing. Yahoo! Finance. Retrieved September 24, 2020 from 

  27. The UN Migration Agency. (2017). Migration In The Caribbean: Current Trends, Opportunities And Challenges. International Organization for Migration - Regional Office for Central America, North America and the Caribbean. 

  28. Verhulst, S. G., & Young, A. (2019). The potential and practice of data collaboratives for migration. In Guide to Mobile Data Analytics in Refugee Scenarios (pp. 465-476). Springer. 

  29. Hawelka, B., Sitko, I., Beinat, E., Sobolevsky, S., Kazakopoulos, P., & Ratti, C. (2014). Geo-located Twitter as proxy for global mobility patterns. Cartography and Geographic Information Science, 41(3), 260-271. 

  30. Messias, J., Benevenuto, F., Weber, I., & Zagheni, E. (2016). From migration corridors to clusters: The value of Google+ data for migration studies. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 

  31. Rodriguez, M., Helbing, D., & Zagheni, E. (2014). Migration of professionals to the U.S. International Conference on Social Informatics. 

  32. State, B., Weber, I., & Zagheni, E. (2013). Studying inter-national mobility through IP geolocation. Proceedings of the sixth ACM international conference on Web search and data mining. 

  33. Zagheni, E., Garimella, V. R. K., Weber, I., & State, B. (2014). Inferring international and internal migration patterns from Twitter data. Proceedings of the 23rd International Conference on World Wide Web. 

  34. Zagheni, E., Weber, I., & Gummadi, K. (2017). Leveraging facebook's advertising platform to monitor stocks of migrants. Population and Development Review, 721-734. 

  35. Kikas, R., Dumas, M., & Saabas, A. (2015). Explaining international migration in the Skype network: The role of social network features. Proceedings of the 1st ACM Workshop on Social Media World Sensors. 

  36. Connor, P. (2017). The Digital Footprint of Europe's Refugees: Online Searches in 2015 and 2016 Open Window Into Path, Timing of Migrant Flows from Middle East to Europe. Pew Research Center. 

  37. Lin, A. Y., Cranshaw, J., & Counts, S. (2019). Forecasting US Domestic Migration Using Internet Search Queries. The World Wide Web Conference. 

  38. UN Global Pulse. (2014). Estimating migration flows using online search data. Global Pulse Project Series. Retrieved 12/12/2020, from 

  39. Vicéns-Feliberty, M. A., & Ricketts, C. F. (2016). An analysis of Puerto Rican interest to migrate to the United States using Google Trends. The Journal of Developing Areas, 50(2), 411-430. 

  40. Righi, A. (2019). Assessing migration through social media: a review. Mathematical Population Studies, 26(2), 80-91. 

  41. Abramitzky, R., Boustan, L. P., & Eriksson, K. (2016). Cultural assimilation during the age of mass migration (0898-2937). 

  42. Bates, J., & Komito, L. (2012). Migration, community and social media. Transnationalism in the global city, 6, 97-112. 

  43. Dekker, R., & Engbersen, G. (2014). How social media transform migrant networks and facilitate migration. Global Networks, 14(4), 401-418. 

  44. Diminescu, D. (2008). The connected migrant: an epistemological manifesto. Social Science Information, 47(4), 565-579. 

  45. Vaca-Ruiz, C., Quercia, D., Aiello, L. M., & Fraternali, P. (2013). Tracking human migration from online attention. International Workshop on Citizen in Sensor Networks. 

  46. Masoero, A. (2016). Hurricane Matthew Causes Deaths in Haiti and Dominican Republic. Retrieved November 7th, 2020 from 

  47. Davies, R. (2016). Dominican Republic – Over 20,000 People Displaced by Floods. Retrieved November 7, 2020 from 

  48. GitHub. (2021). Twint Project. Retrieved from 

  49. Wisdom, V., & Gupta, R. (2016). An introduction to twitter data analysis in python. Artigence Inc. 

  50. ExplosionAI GmbH. (2021). spaCy: Industrial-Strength Natural Language Processing. Retrieved from 

  51. Johansen, B. (2019). Named-entity recognition for Norwegian. Proceedings of the 22nd Nordic Conference on Computational Linguistics. 

  52. NLTK Project. (2021). Natural Language Toolkit. Retrieved from 

  53. The Stanford Natural Language Processing Group. (2020). Stanford Log-linear Part-Of-Speech Tagger. Retrieved from 

  54. Toutanova, K., Klein, D., Manning, C. D., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. Paper presented at the Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 

  55. Zhang, K., Xu, H., Tang, J., & Li, J. (2006). Keyword Extraction Using Support Vector Machine. International Conference on Web-Age Information Management. 

  56. Hakim, A. A., Erwin, A., Eng, K. I., Galinium, M., & Muliady, W. (2014). Automated document classification for news article in Bahasa Indonesia based on term frequency inverse document frequency (TF-IDF) approach. 2014 6th international conference on information technology and electrical engineering (ICITEE). 

  57. Hogue, J., & Dewilde, B. (2020). pytrends: Pseudo API for Google Trends. Retrieved from 

  58. Cirillo, P., & Gallegati, M. (2012). The empirical validation of an agent-based model. Eastern Economic Journal, 38(4), 525-547. 

  59. Wild, V., & Dawson, A. (2018). Migration: a core public health ethics issue. Public health, 158, 66-70. 

Suggested Citation:

Arrieta, A., Chen, S., Sarmiento, J., & Olson, R. (2021). Real-Time Migration Tracking to Puerto Rico after Natural Hazard Events (Natural Hazards Center Public Health Disaster Research Report Series, Report 4). Natural Hazards Center, University of Colorado Boulder.