CITP Reading Group: Works in Progress (WiP) – Ryan Amos – Consumer Protection on the Web with Longitudinal Web Crawls and Analysis

Wednesday, February 23, 2022
11:00 am - 12:00 pm


306 Sherrerd Hall

The World Wide Web has brought with it new consumer protection hazards, such as deceptive reviews and online tracking. While many academics have studied consumer protection on the web at specific points in time, we approach this problem from a longitudinal perspective, exploring how consumers’ rights to privacy and to be informed have been impacted by the web. Our work highlights the key role that longitudinal analysis and longitudinal data collection (data collected over repeated, time-spaced passes) play in the study of consumer protection issues.

We investigate consumer protection issues on the web through longitudinal studies in two landscapes: website privacy policies and reviews on Yelp. In both studies, we build large-scale datasets through automated, repeated visits to the websites of interest. In our study of privacy policies, we aggregate the Internet Archive’s crawls to perform longitudinal collection, and in our online reviews study, we crawl the data ourselves. We collected 1M privacy policies spanning 22 years and 12.5M reviews over 11 months.

We used our data to study the evolution of privacy policies, raising concerns about the rights to privacy and information. We find gaps in the disclosure of privacy-related practices. We show that readability has declined over the long term, with policies doubling in length and becoming more complex. We show disparities between website-reported and independently observed tracking. In our study of online reviews, we raise concerns about the right to be informed. We present the first study of “reclassification,” wherein a platform changes its filtering decision for a review. We find that reviews routinely move between Yelp’s two main classifier classes (“Recommended” and “Not Recommended”), with up to five reclassifications of a single review. We identify demographic disparities in review prevalence and filtering decisions.

By showing phenomena that cannot be studied without longitudinal data collection and analysis, we emphasize the importance of longitudinal study for consumer protection issues online. Our software and data releases help lay the groundwork for further work on these issues, easing the path for future researchers.

This reading group is open to any Princeton affiliate, including faculty, staff and undergraduate and graduate students. To receive the Zoom link to join the WiP Seminar please contact Ben Kaiser at