Google’s search history data can be used to target ads, personalize websites and now, according to UCLA researchers, track the spread of sexually transmitted diseases.
Sean Young, an associate professor at UCLA and executive director of the University of California Institute for Prediction Technology, collaborated with researchers from the Centers for Disease Control and Prevention to develop a new way to monitor the spread of syphilis using search data from Google Trends.
The prevalence of syphilis, a sexually transmitted disease that has been linked to an increased risk of contracting HIV, rose 18 percent between 2015 and 2016, a CDC spokesperson said.
The researchers used machine learning, in which algorithms allow computers to learn from data automatically, to find keywords in past Google searches that might indicate risky sexual behaviors, such as “STD help,” “how to find sex right now” and “sex without a condom.” Young said this study was based on the prediction that people at risk for syphilis would turn to Google for health- and risk-related answers.
Using the locations of the searches and complex computer algorithms, the researchers were able to predict the locations of syphilis outbreaks with a high level of accuracy when compared to CDC data from that year.
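The researchers' actual model is not described here, but the general idea of learning a relationship between search activity and reported cases can be illustrated with a much simpler stand-in. The sketch below fits an ordinary least-squares line to a handful of made-up numbers; the variable names, the choice of linear regression and every value in it are assumptions for illustration, not the study's method or data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up toy numbers, NOT real Google Trends or CDC figures.
# search_volume: relative interest in a risk-related keyword, one value per region
# reported_cases: syphilis cases per 100,000 people in the same regions
search_volume = [10.0, 20.0, 30.0, 40.0]
reported_cases = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(search_volume, reported_cases)


def predict_cases(volume):
    """Estimate the case rate for a region from its search volume."""
    return slope * volume + intercept


# Estimate for a hypothetical region whose search volume is 50.
estimate = predict_cases(50.0)
print(round(estimate, 2))
```

A real system would presumably combine many keywords, validate its estimates against held-out CDC counts and account for population differences between regions, none of which this toy example attempts.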
The CDC currently gathers its data on syphilis from doctors’ reports of individual cases and from large government surveys. Young said both of these methods are time-consuming and expensive.
“Going around and getting people’s surveys, aggregating the data and then analyzing the data could take years,” he said.
Because of incomplete diagnoses and reporting, the number of reported cases of syphilis is lower than the actual number of syphilis cases in the United States, a CDC spokesperson said.
By contrast, Google’s data is free and provided in real time, which allows researchers to gather information about syphilis more efficiently, Young said. The CDC primarily focuses on monitoring present cases of syphilis, but with advanced machine learning, it could also focus on predicting future cases, he added.
Young has used a similar analysis of big data to monitor and predict other sexually transmitted diseases, like HIV, as well as other public health trends, such as opioid addiction.
Young said it may take the CDC a couple of years to start using the technology because the prediction model still needs to be improved to ensure higher accuracy.
“The potential of this technology is really exciting, but like any other model, it makes mistakes,” he said.
However, countries that don’t have an organization like the CDC can immediately start using this technology to track the spread of sexually transmitted diseases, he said.
“In countries that have no way of knowing how many cases (of disease) are occurring, this technology could be a first step in monitoring disease,” Young said.