Cedars, April 2019

by Breanna Beers and Kassie Kirsch

There’s a saying in the technology industry: if you’re not paying for the product, you are the product. Data collection is the process companies use to track and record your online footprint, and data mining is the practice of examining large databases of that collection in order to generate new information.

The internet is the economic “land of the free,” or so the hundreds of thousands of companies offering digital services at no direct charge to the consumer would have the world believe. These companies offer a wide range of utilities, from organizing an inbox to summoning transportation, and in the digital age, most consumers expect these services for free.

For such companies to exist, they must find a way to monetize their services without charging users directly. Increasingly, many corporations are turning to user data as a source of income. This data can be auctioned to companies gathering information about their customer base, sold to social scientists in chunks for research, or put into an advertising algorithm to increase ad relevance and click-through rates.

As a result, nearly every service you use and website you visit collects some amount of data about who you are and why you are there. This happens with or without user knowledge, through cookies and browsing history and details you sign away in the privacy policy they know you didn’t read.

The information users deliberately enter when signing up for a new account, such as name, phone number, and email address, is extremely valuable to many corporations. The data on any person is staggering: name, birth date, sex, address, location history, workplace, school, friends, family members, hobbies, shopping habits, political leanings, media preferences, browsing records and even medical history. Facebook alone is often a focal point where much of this information gathers.
Think of the implications if this information were to be combined with the data from Google, or Amazon, or Uber, or the small, innocuous web pages you bounce between dozens, maybe hundreds of times daily. Today, tracking is the assumption, not the exception.

These companies aren’t collecting data without a purpose. They’re doing it because of real economic incentives. If you want to continue to use Gmail, Google Drive and YouTube for free, you have to accept that Alphabet (Google’s parent company) has to pay the bills. Clearly, looking at the success of these companies, this model works.

So where does that data go? Most of the time it feeds advertising algorithms to serve up user-specific product recommendations that are more likely to result in a purchase.

As users are now inundated with hundreds or even thousands of ads per day, click-through rates are becoming an increasingly vital statistic. Market researchers have been strategizing for decades about the best techniques to capture your time and attention. In recent years, big data has become one of their most valuable tools.

In some ways, this is no different than a sales associate recommending a product to a customer who walks in the store. However, others argue that a difference in quantity is a difference in kind. The amount of data available to advertisers is far greater than what a sales associate can tell just by looking at someone and could lead to exploitative or discriminatory marketing techniques.

The controversy is heightened because the vast majority of online ads run through Google’s AdSense platform, meaning that targeted ad data gets consolidated in the hands of one company. This is an advantage because it means information doesn’t have to be distributed among all the companies that may want to advertise to a person. However, it also means a lot of data is clustered in one place, which may be a security risk.
In some cases, data may pass from the collecting company to a third party. These databases are myriad in function: Some are used for scientific analysis, others for market research, and sometimes, bad actors may use them for malicious purposes. Fortunately, however, most companies have incentives to keep user data to themselves. It is, after all, their biggest advantage.

“We have pipes of information that are constantly streaming past us, and whoever is going to learn how to take advantage of that is going to have the opportunity to do some things that others are going to miss out on,” said physics professor Dr. Steve Gollmer, who leads data science seminars at Cedarville in an effort to get more students interested in data mining, the process of discovering patterns in large data sets.

In many cases, companies can use the data they collect to make their product better. Google Maps, for instance, uses location data from its active users to determine real-time traffic flow. Captchas, those little pictures you have to select to prove you’re not a robot, train artificial intelligence programs to identify objects in pictures. Been identifying a lot of pictures to do with roads and cars lately? Think about why. (Hint: A lot of technology companies are interested in autonomous vehicles.)

Data mining can be used not only to improve a product, but to improve the world. As more data is available and algorithms improve at sorting through it, data science has become a significant resource for the scientific community in particular.

According to Gollmer, computer algorithms of the past mined through data to model how human experts made decisions. For instance, to predict the weather, programmers interviewed respected meteorologists on what factors may indicate a coming rainstorm. That information was then built into the algorithm, telling the computer to look at the same factors to make a prediction.
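The rule-based approach described above can be pictured with a short Python sketch. It is only an illustration: the factor names (humidity, falling pressure), the thresholds and the `Observation` type are invented for this example and are not taken from Gollmer or from any real forecasting system.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One weather reading. Fields are hypothetical, chosen for illustration."""
    humidity: float       # relative humidity, in percent
    pressure_drop: float  # fall in barometric pressure over three hours, in hPa


def expert_rule(obs: Observation) -> bool:
    """Predict rain using factors a human expert named in advance.

    The thresholds are assumed values standing in for what a
    meteorologist might have said in an interview.
    """
    return obs.humidity >= 80.0 and obs.pressure_drop >= 2.0


if __name__ == "__main__":
    humid_and_falling = Observation(humidity=90.0, pressure_drop=3.0)
    humid_but_steady = Observation(humidity=90.0, pressure_drop=0.5)
    print(expert_rule(humid_and_falling))
    print(expert_rule(humid_but_steady))
```

The program only ever looks at the factors a person wrote into it, so any situation those factors fail to cover is invisible to it.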
However, while useful for simple predictions under constrained conditions, these “expert systems” struggled to correctly identify outliers to the patterns they’d been prescribed. Whenever a situation didn’t fit the model, the system failed.

When the amount of data available exploded because of the rise of the internet, programmers experimented with a different technique. Instead of inputting the knowledge of experts, analysts let the algorithms search through the data to find associations on their own. For instance, a program could examine historical weather data and atmospheric conditions to make a prediction. The programmer who wrote the algorithm may not know what factors the computer looks at to make that decision, but in many cases, the results can be even more accurate than those of human experts.

This development took the vast quantities of data available and made them legible. Some associations were obvious. Others were new and previously unpredicted, discovered only by a computer powerful enough to comb through terabytes of seemingly random strings of information that are the product of millions of human choices. These algorithms are the translators that make the collected data useful to humans looking to predict behavior, recommend products, describe correlations, or even diagnose diseases.

Data mining could also potentially prevent misdiagnoses by doctors that result in improper drug administration. One 2017 study compiled data from patients using nonsteroidal anti-inflammatory drugs (NSAIDs) and gastroprotective agents, which require a particular clinical guideline for co-prescription. According to the study,

OFF CAMPUS
Yes, Big Brother Has Your Information
The diamonds and the rough of data mining
