T&S Certificate – IT Track

Certificate Student Independent Work

2023 Certificate Graduates Independent Work

Christien Ayers, MUS
Title of Project: The Effect of the Internet on Music Consumption Patterns

The Internet, more than 30 years after its public release, is generally understood to have played a significant role in changing the way global society now organizes itself. Thinking about the ways in which people today communicate, learn, spend their time, relate, and consume, the Internet has proved pervasive in each instance. My independent work analyzes how society’s relationship to music consumption has been impacted by the Internet. Through a survey of a variety of sociological literature diving into older music consumption patterns, I account for the ways in which society consumed music before the Internet was used as a tool for consuming and sharing music. Then, I consider Napster and its effect on the music industry in order to understand the way in which the Internet radically altered consumers’ relationship to music. From there, I use Napster as a case study to draw conclusions on consumers’ relationship with music in the 21st century vis à vis the Internet, providing insight into patterns that have remained the same and those that have changed.

Chloe Chen, COS 
Title of Project: Increasingly Ambiguous Privacy Policies: Concerns and Regulations

Privacy policies are a critical and ever-evolving tool for understanding how companies collect, use, and store data. However, while many regulations exist in various jurisdictions for ensuring privacy policies contain necessary content, there is virtually no regulation regarding the nature of the language that is used in privacy policies. In this paper, I examine previous research on the negative effect that ambiguous language in privacy policies can have on user understanding of privacy risk. I then analyze the ambiguity of language in existing privacy policies using Python, and find that privacy policies have evolved to include proportionally more ambiguous language over time. Finally, I assess the indirect impact that the General Data Protection Regulation (GDPR) may have had on the ambiguity of privacy policy language, and explore other ways that the language of privacy policies can be quantified and regulated.
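
The abstract does not say how ambiguity was measured, so the following is only a minimal sketch of one possible approach: computing the share of tokens in a policy that belong to a list of vague terms. The term list `AMBIGUOUS_TERMS`, the function name `ambiguity_ratio`, and both sample sentences are hypothetical illustrations and are not taken from the project.

```python
import re

# Hypothetical lexicon of vague terms; the project's actual word list is not given.
AMBIGUOUS_TERMS = {"may", "might", "some", "certain", "generally", "sometimes", "appropriate"}


def ambiguity_ratio(policy_text: str) -> float:
    """Return the fraction of word tokens that appear in AMBIGUOUS_TERMS."""
    tokens = re.findall(r"[a-z']+", policy_text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in AMBIGUOUS_TERMS)
    return hits / len(tokens)


# Invented example sentences standing in for an older and a newer policy.
older = "We collect your email address and store it for one year."
newer = "We may collect certain information and might share some of it as appropriate."

print(f"older policy: {ambiguity_ratio(older):.2f}")
print(f"newer policy: {ambiguity_ratio(newer):.2f}")
```

Comparing this ratio across policy versions from different years would give one rough, proportional measure of how ambiguity changes over time.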

Elizabeth Dorman, COS
Title of Project: Dream Garden: Exploring Location-Based, Collaboratively-Created Augmented Reality Spaces

Despite the potential for connecting strangers in the digital realm, current research has not explored location-based augmented reality experiences that enable strangers to connect by building artifacts collaboratively. In my independent work I created Dream Garden, an augmented reality (AR) application that lets people place 3D flowers into the physical world to build a collaborative location-based garden. Anyone with the app can access and see the flowers previously planted by strangers, as well as plant their own flowers to grow the garden. I evaluated this app with 10 participants, 5 of whom visited the digital garden more than once, to assess their sense of connection to the other participants as each participant added one digital flower to the garden. I found that participants were joyful about building a shared space, were excited about the dynamic nature of the garden, felt a connection to the physical location of the digital garden, and expressed a sense of belonging to a community of strangers but not necessarily an emotional connection. Significantly, this research gives us insights into how we can use augmented reality as a tool to bring people together in real life and foster a sense of connection, creating technologies that bring us together instead of driving us apart.

Jaelin Haynes, COS
Title of Project: The Black Purposes of Web3: A Technical and Societal Analysis

Web3 is a system of applications built on the blockchain, an immutable, distributed ledger of blocks linked together with cryptographic hashes. This emerging technology is being promoted by some as a solution to the centralization and user privacy issues of the current web. Given the pattern of technologies becoming mainstream and having negative effects, especially on marginalized groups, it is important to consider the potential societal implications of this new technology before it is widely adopted. With background on the previous iterations of the web, Web3’s value proposition, and its technical details, I focused on African Americans in particular, exploring the history of Black Americans and technology and drawing on that history to investigate the potential uses and impacts of Web3. Specifically, I found themes of the digital divide, surveillance and privacy, coded racial bias, and technological creativity, and completed a data analysis and visualization of African American sentiment towards new technologies using Python libraries. The data analysis showed that Black participants indicated generally negative attitudes towards new technologies compared to their White counterparts. Even while being informed about scientific and technological innovations, African Americans showed unfavorable attitudes, potentially contradicting the idea that lower levels of use and access are results of less education. Based on the persistence of past themes, Web3 may not be the ultimate solution to the issues of Web 2.0 as some may claim, especially for African Americans, but there are still some valuable and beneficial principles in its development.
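
The ledger structure described at the start of this abstract (blocks linked together with cryptographic hashes) can be illustrated with a short sketch. This is a toy model only, assuming SHA-256 and a JSON encoding of each block; the function names and the sample entries are hypothetical, and real blockchains add consensus, signatures, and much more.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full contents, including its link to the previous block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that stores the hash of the block before it."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev_hash})


def is_valid(chain: list) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
for entry in ["genesis", "alice pays bob", "bob pays carol"]:
    append_block(chain, entry)

print(is_valid(chain))  # chain is intact

# Tampering with an earlier block breaks the link stored in the next block.
chain[1]["data"] = "alice pays mallory"
print(is_valid(chain))  # tampering is detected
```

Because each block commits to the hash of the one before it, changing any past entry invalidates every later link, which is the sense in which such a ledger is called immutable.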

Rahul Jain, ORFE
Title of Project: An Examination of the Effect of Mobile Fintech Adoption on Microfinance Institutions in India

Microfinance Institutions (MFIs), in India especially, have matured significantly over the past decade and a half, with increased government support, buy-in from private institutions, and the adoption of data analysis and mobile financial technologies. In this independent work, I begin by discussing the history and regulation of MFIs in India, dividing that history into four phases: origination, maturation, buy-in from private investors, and adoption of technologies. Following this overview, I perform exploratory data analysis for select MFI operating and loan repayment metrics, finding a consistent increase in loans per loan officer but no large change in operating expense / loan portfolio and loan repayment metrics (loan loss rate and 30-day and 90-day portfolio at risk) over time. With an understanding of the data distributions, I regress those metrics on rural wireless teledensity (as a proxy for mobile phone and internet access) and test the coefficients for statistical significance to see the effect of mobile fintech adoption. I conclude that operations have improved: the regression on loans per loan officer yields a statistically significant coefficient of around 4 (p-value of 0.0015) and an R-squared value of 0.74. Regarding loan repayment metrics, I could not reject the null hypothesis as no significant linear relationship was found, likely due to too many confounding factors. Lastly, I use the overview of MFIs to date in conjunction with the quantitative analysis to provide recommendations for further MFI development, reiterating the value of strong relationships with government and partner organizations, adoption of novel technologies, internal governance, and quantification of performance metrics.
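
As a rough illustration of the kind of regression described above, the sketch below fits a one-variable least-squares line and reports the slope, R-squared, and the t-statistic of the slope. The data points are invented for illustration and are not the project's data; the function name `simple_ols` is hypothetical. A p-value would follow from comparing the t-statistic against a t distribution with n - 2 degrees of freedom, which is left to a statistics library.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot

    # Standard error and t-statistic of the slope (n - 2 degrees of freedom).
    std_err = math.sqrt(ss_res / (n - 2) / sxx)
    t_stat = slope / std_err
    return slope, intercept, r_squared, t_stat


# Invented values: rural wireless teledensity (%) and loans per loan officer.
teledensity = [10, 20, 30, 40, 50, 60]
loans_per_officer = [250, 300, 320, 390, 410, 470]

slope, intercept, r_squared, t_stat = simple_ols(teledensity, loans_per_officer)
print(f"slope={slope:.2f}, R^2={r_squared:.2f}, t={t_stat:.2f}")
```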

Rohan Jinturkar, COS (completed IW as a junior spring 2022)
Title of Project: Investigating Racial Bias Trends in the Text of US Legal Opinions

There are many instances of racially biased outcomes in the American legal system. However, it is unclear if such bias also exists in the text of judicial opinions, and if it varies across time periods and regions. We approximate GloVe word embeddings for legal opinions at the federal and state level from 1860 to 2019. We find evidence of racial bias across nearly all regions and time periods, as traditionally Black names are more closely associated with negative/unpleasant terms whereas traditionally White names are more closely associated with positive/pleasant terms. We do not find evidence that older opinions exhibit more bias, or that opinions from Northeastern states show greater change in racial bias over time compared to Southern states. These results counter the principle of impartiality in legal settings and demonstrate the need for further research into institutionalized racial bias. Lastly, we survey approaches for reducing bias across the legal system.
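
The abstract does not specify the exact bias metric, so the following is only a sketch of one common way to quantify "more closely associated with" in an embedding space: the mean cosine similarity of a word to a set of pleasant terms minus its mean similarity to a set of unpleasant terms. All vectors here are made-up two-dimensional stand-ins (real GloVe vectors have many more dimensions), and the names `association`, `name_a`, and `name_b` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, pleasant, unpleasant):
    """Positive values lean toward the pleasant set, negative toward the unpleasant set."""
    mean_pleasant = sum(cosine(word_vec, p) for p in pleasant) / len(pleasant)
    mean_unpleasant = sum(cosine(word_vec, u) for u in unpleasant) / len(unpleasant)
    return mean_pleasant - mean_unpleasant


# Toy embeddings, invented for illustration only.
pleasant_vecs = [[1.0, 0.1], [0.9, 0.2]]
unpleasant_vecs = [[0.1, 1.0], [0.2, 0.9]]
name_a = [0.8, 0.3]
name_b = [0.3, 0.8]

score_a = association(name_a, pleasant_vecs, unpleasant_vecs)
score_b = association(name_b, pleasant_vecs, unpleasant_vecs)
print(f"name_a: {score_a:+.3f}, name_b: {score_b:+.3f}")
```

Averaging such scores over lists of names, separately per region and time period, would give one way to compare association patterns across corpora.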

Hannah Kapoor, SPIA
Title of Project: 18 Words to Reform the Internet | An Evaluation of the Governance of Evolving Technologies: The Case of Section 230 of the Communications Decency Act

What can be learned from debates about reforms surrounding Section 230 of the Communications Decency Act (CDA) to inspire the future governance of evolving technologies that challenge fundamental rights? Passed in 1996, Section 230 grants internet service providers immunity from certain types of liability for user-generated content; the law has come under fire in the evolving communications landscape. Policy attempts to reform Section 230, therefore, represent a compelling case study to inform how policymakers craft legislation for technologies in an evolving landscape. Through a comparative analysis of stakeholder interests from the onset of the deployment of online content platforms to their more matured state, the research travels from 1959 to the present day, and reviews thousands of pages of historical accounts, Congressional testimony, and court briefs, to build comparative stakeholder analyses presenting the underlying interests, motivations, and arguments that have framed the Section 230 debate across time and sector.

Despite the widespread criticism of Section 230, the research finds that the core stakeholder motivations remain unchanged across time: to mitigate speech harms and uphold freedom of speech online. However, it is also observed, as technology has evolved, that stakeholders increasingly approach reform on the basis of profit, partisanship, and reactive responses to specific instances of social harm. The research reveals the perils of limited stakeholder engagement across sectors in the curation of legislation and the public pressure endured by policymakers that encourages them to react in response to specific harms. In a techno-moral context, policymakers are advised to consider lawmaking as an instrument of governance that promotes and protects foundational rights, such as freedom of speech. A “Precautionary Agile” approach to the governance of evolving technologies is recommended, coalescing “ex-ante” and “prohibitive” approaches of governance.

Henry Koffler, ORFE
Title of Project: A Pricing Analysis of European Cap & Trade Carbon Futures: Trading Strategies and Implications

The global response to changing climate conditions has overwhelmingly been in favor of strong regulation. However, the European Union, in its implementation of its Emissions Trading System, has demonstrated that there is an alternative in which economic progress is not only fundamentally compatible with environmental protection but necessarily requires it. In this cap & trade system, certain corporations are required by law to purchase rights to emit CO2 to offset their emissions. That said, many argue that financial speculation makes pricing nearly impossible for corporations that cannot opt out. As such, my paper seeks to evaluate the veracity of the claim that European Union carbon credit prices are significantly driven by market speculation and cannot be accurately predicted. To accomplish this goal, my independent work blends traditional financial modeling approaches, such as Least Absolute Shrinkage and Selection Operator regression and the division of historical prices into market regimes using a Gaussian Mixture Model, with a Long Short-Term Memory network constructed to accurately price carbon credits. After showing that carbon credit prices are in fact readily modeled with precision, my research concludes that carbon prices are indeed strongly correlated with a slew of world states (such as weather) as well as commodities (such as oil or coal) and that market speculation is not a significant factor. Having established a methodology to comprehensively price carbon credits, my independent work provides policy recommendations to increase adoption of Thailand’s new carbon credit market, especially amongst the rural farmers who will be its primary users. Additionally, my paper reflects on the ethical questions surrounding environmental investing and financial speculation. Principally, it considers why financial speculation is notably distinct from gambling and, in fact, provides strong benefits to the carbon credit market.

Colton Loftus, COS (completed IW as a junior spring 2022)
Title of Project: Analyzing Experiences with Speech Controlled Accessibility Software and Developing a Solution for the Linux Desktop

For individuals with disabilities affecting the use of their hands, typing and using a mouse can be not only inconvenient, but also painful. This problem is especially prevalent within the software ecosystem of the open source operating system Linux. While Windows and macOS both have proprietary disability software for controlling the computer through voice, Linux users do not have access to these same proprietary solutions. In my independent work, I developed a voice controlled accessibility program that can help solve this issue. My program can be used for a wide array of actions across the Linux desktop. It can control windows, press keys, dictate text, and much more. It can also be customized, or run user-provided scripts, to perform new behavior. While developing my program, I also wanted to better understand how to design and implement policies pertaining to accessibility software. To do this I held a series of software demos with my program. By the end of my research, I developed a series of key takeaways: (1) While accessibility designers don’t always have the resources to train their own models, they can nonetheless design applications with modular and customizable behavior. In my application, users can interact with the machine learning backend, switch models, and customize command names. All of these choices push back on the idea that the voice recognition backend should be a black box for users. (2) Graphical user interface (GUI) programs should limit mouse usage and prefer keyboard shortcuts when possible. Throughout my software demo, it was particularly difficult for users to mimic commands like dragging or dropping through just voice. However, keyboard shortcuts were quick and easy for both voice and hands to perform. (3) Workspace designers need to take into consideration those with alternative input methods. Throughout the development process, it was difficult to find public places quiet enough to use voice controlled software while not disturbing others. In addition to my open source code, these conclusions from my policy research will help future designers create more accessible workspaces, communities, and software applications.

Katie McLaughlin, COS
Title of Project: A Systems Approach to Mitigating Harms of Content Recommender Systems

Content recommender systems play a critical role in the dissemination and discussion of information. Therefore, it is crucial that there are structures and processes to support the development of safe, trustworthy recommender systems. To responsibly deploy these models, Machine Learning (ML) practitioners must go out of their way to navigate a complex ecosystem of regulation, organizational structures, and resources. In this paper, I take an approach rooted in systems theory to detail a holistic overview of the content recommendation ecosystem and the challenges that arise from its structure. Through a survey and semi-structured interviews, my research highlights how these issues stem from the absence of a shared language around harms. Furthermore, practitioners lack a framework to evaluate the ethical and societal implications of their recommender systems. I detail how the current approach to identifying and mitigating harms does not adequately prevent these harms. I also identify challenges that arise from the current organizational arrangements of knowledge and expertise. Based on these findings, I recommend how we can use oversight mechanisms to establish a standard for harm mitigation approaches. I further suggest we modify the existing ecosystem to distribute responsibility and equip practitioners to confidently deploy safe content recommender systems.

William Olson, COS
Title of Project: Faces at Face Value: An Analysis of Face Recognition Technology Policy and Performance

Facial recognition is one of the most developed and widely used applications of machine learning and artificial intelligence in 2023. The development of the technology and the ubiquity of high-quality video recording devices like traffic cameras, surveillance cameras, and police body cameras enable the permeation of this technology throughout all spheres of life. Such technology invites the fear of constant surveillance and a decline in individual privacy, particularly in public areas. This study aimed to comprehensively gauge the current policies surrounding the use of face recognition in the United States. Specifically, I embarked on a two-part policy analysis. The first component examined the use of face recognition by law enforcement at the federal, state, and local levels. I found impactful regulation at the state level, the most effective of which I deemed to be moratoriums on the technology for this use case until further improvements are made to its accuracy. The second component examined the regulations around consumer data privacy which govern the collection, use, and distribution of our face images by public and private entities. I found the most impactful regulation at the state level, and propose that all regulation should be modeled on the Illinois Biometric Information Privacy Act; this legislation grants consumers the ability to seek monetary compensation when their face images are misappropriated. Beyond the policy analysis, the second part of this project approached the problem of age-invariant cross-demographic face recognition; that is, matching current photos against outdated ones and examining accuracy rates across demographic groups. I assembled a novel dataset of 167k face images and tested using Amazon Rekognition, a leading commercial provider of face recognition. I found that this system performs worse on minority groups, in particular, darker-skinned individuals of both sexes. I concluded that further development is necessary for both public policy and technological implementation of face recognition.

Hien Pham, COS (completed IW as a junior spring 2022)
Title of Project: Community Mesh Networks: A Local Solution to the Digital Divide

In a post-Covid-19 world, U.S. states face an unprecedented opportunity to secure funding for broadband infrastructure development. While the national conversation is focused on geographic scale and speed, the long-term resilience and community aspect of broadband should not be overlooked. Community mesh networks (CMNs) are community-owned and operated computer networks that provide affordable or free Internet access to local residents. A community-owned and operated network infrastructure not only helps tackle the broadband issue, but also provides essential development in digital literacy, civic engagement, and emergency preparedness for the communities it serves. My paper explores how communities have deployed mesh networks and explains why community networks are a critical piece of the solution to the digital divide puzzle and should be an essential element in states’ broadband development plans. To do so, I focused on two CMNs at different development stages in the northeastern US. I traveled to NYC and Philadelphia to visit public sites in the networks and conducted interviews with volunteer network operators to learn about their experiences. I concluded with specific actions that policymakers can take to empower CMNs and the communities they serve, informed by the visits and interviews. Some proposed actions to support CMNs at various stages include making public backhaul accessible to CMNs through an application process, implementing application pipelines for funding and partnerships between public departments and CMNs, and connecting CMNs to public and community organizations that can help provide volunteers or technical consulting.

Richard Qui, ECO
Title of Project: Airbnb’s Alarming Aftermath: An Analysis of Airbnb’s Effect on San Francisco’s Rent and Housing Market

San Francisco is facing a rent and housing crisis due to a lack of housing units available for its residents. The entry of Airbnb, a hospitality technology company, into the city has threatened to exacerbate this crisis, diverting long-term rental and housing units to short-term rentals and possibly leading to increases in rent and housing prices in one of the most expensive cities in the United States. In particular, San Francisco Ordinance 218-14 legalized short-term rentals in the city of San Francisco, allowing many owners to rent out their primary units to tourists and short-term residents under numerous restrictions. This paper details how SF Ordinance 218-14 affected San Francisco’s rent and house prices, along with the overall trend of Airbnb’s effect on the city. Furthermore, this paper experiments with a form of “regulation” by determining whether limiting Airbnb rentals to entire room/house units or private/shared room units is a viable strategy to further reduce Airbnb’s impact while still allowing Airbnb to run its operations and benefit tourism in the city. I conclude that Airbnb has a statistically significant effect in increasing rent and house prices in San Francisco in both the short term and the long term, and I recommend allowing only private/shared room units as a possible strategy to regulate Airbnb’s impact on San Francisco. This paper also summarizes the societal impact that a sharing-economy technology company like Airbnb has on rent and housing units, and discusses how government regulators should work directly with technology companies to enact such laws and minimize the companies’ impact on society, since the government itself may not be powerful enough today to enforce its own rules, as the many flaws in the enactment of Ordinance 218-14 suggest.

Katelyn Rodrigues, COS
Title of Project: The Last Thing We Forget: Applying Natural Language Processing to Decode Memories Evoked by Modern Music

With the pinnacle of the digital era upon us, the widespread accessibility of streaming platforms has disrupted the music industry. Now, more than ever, music has the potential to fully embrace its role in uniting all listeners in experiencing an emotionally transformative force regardless of their geographical location, demographic background, language, or culture. Through the lens of online music streaming platform comment forums, my independent work analyzes the potential for modern music genres to evoke memories and shared experiences. Inspired by a research study conducted in the Princeton University Music Cognition Lab where participants recorded their music-evoked autobiographical memories (MEAMs), my research specifically explored the research goal in the context of YouTube music video comment threads. After rigorous processing and sanitizing of this data, a series of Natural Language Processing (NLP) techniques surfaced thematic elements within listeners’ comments on modern music. The analysis began with an introductory TF-IDF analysis and then moved to more complex techniques like PCA dimension reduction, LDA topic analysis, and visualizing cosine similarity between the YouTube comments and their corresponding song lyrics; the results yielded fascinating themes with shared memories at both the song and genre levels. While applying NLP techniques to original data consisting of unconstrained, freely available responses is relatively uncharted territory in the digital music space, the findings were definitive and provide a basis for further analysis of this multi-dimensional data that can aid music video creators in developing engaging content that can unite listeners globally.
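
Two of the techniques named above, TF-IDF weighting and cosine similarity between comments and lyrics, can be sketched in a few lines. This is an illustrative toy only: the lyric and comment strings are invented, whitespace tokenization is assumed, documents are assumed non-empty, and the smoothed IDF formula is one common variant, not necessarily the one the project used.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)  # smoothed IDF
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented lyric line and comments, for illustration only.
lyrics = "summer nights driving home with you"
comment_memory = "this song reminds me of summer nights in high school"
comment_other = "first comment nice video"

lyric_vec, memory_vec, other_vec = tf_idf_vectors([lyrics, comment_memory, comment_other])
sim_memory = cosine(lyric_vec, memory_vec)
sim_other = cosine(lyric_vec, other_vec)
print(f"memory comment: {sim_memory:.3f}, other comment: {sim_other:.3f}")
```

A comment that echoes the lyric's vocabulary scores higher than one that shares no terms with it, which is the signal a similarity visualization would surface.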

Iroha Shirai, COS (completed IW as a junior spring 2022)
Title of Project: Analyzing Gender Biases of STEM-Related Keywords Within United States and Japanese Twitter Posts

Though there has been increasing discussion surrounding the low representation of females in STEM (Science, Technology, Engineering, and Mathematics) fields and thus more movements to increase female representation in STEM fields, the number of females in STEM fields remains relatively low. Furthermore, there continues to be a difference in these representations among different countries. In my work, I focus specifically on the United States and Japan, looking at how male- and female-related keywords are used in context with STEM-related keywords within US and Japanese Twitter posts. I collected and created my own datasets and trained Gensim’s Word2Vec models to create word embeddings. I then calculated cosine similarities between gender-related and STEM-related words in the word embeddings. The analyzed results showed that the calculated difference between male- and female-related average cosine similarities was greater in the US than in Japan for the 2019 – 2020 Twitter posts. In contrast, this difference was greater in Japan than in the US for the 2009 – 2010 Twitter posts. The results also suggested that the difference of these calculated differences between the US and Japan was larger in 2019 – 2020 than in 2009 – 2010.

Niva Sivakumar, COS (completed IW as a junior spring 2022)
Title of Project: Understanding and Detecting Hateful Users on Twitter via Graph Theory and Machine Learning

From July to December 2020, 3.8 million tweets were removed by Twitter for falling under the category of hate speech, and 1.1 million users were flagged and suspended as hateful accounts. Recent work has typically depended on the tweets themselves, often using linguistic contexts and vocabulary. However, there’s often a need to go beyond pure textual classification, potentially integrating features about users and social groups in addition to the content of the tweets. Using a publicly available dataset of Twitter users, this paper seeks to answer three questions: (1) What insight does the distribution of hateful users in the top 100/200 influential users of the retweet graph of M. Ribeiro et al. provide? (2) How likely are hateful users to retweet other hateful users, and normal users to retweet other normal users? (3) Can we train a preliminary model to identify hateful users based on a limited number of numerical features, none of which has to do with the actual content of their tweets? Using the PageRank algorithm, we find that hateful users seem to be just as influential as normal ones. Using the reciprocity of vertex-induced subgraphs, we find that hateful users are almost twice as likely to retweet each other as normal users. After training a neural network on the annotated users over 10 numerical features, we were able to reach 92.3% accuracy in classifying a user as “hateful” or “normal” without looking at their tweets at all. Our findings indicate that user-based classification of hateful speech on social media is effective and could strengthen or corroborate text-based classification.
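
To show how PageRank can rank users by influence in a retweet graph, here is a minimal power-iteration sketch. The four-node graph is invented, the edge direction (an edge from u to v meaning u retweeted v) is an assumption made for this example, and the damping factor of 0.85 is the conventional default rather than a value reported by the project.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on {node: [nodes it links to]}.

    Every link target must also appear as a key in `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling_mass = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling_mass / n
            for node in nodes
        }
        for node in nodes:
            for target in graph[node]:
                new_rank[target] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


# Hypothetical retweet graph: "a", "b", and "d" all retweet "c"; "c" retweets "a".
retweets = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks = pagerank(retweets)
for user, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{user}: {score:.3f}")
```

Taking the top 100 or 200 users by this score and counting how many carry a "hateful" annotation is one way to examine the distribution the first research question asks about.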

Anna Sivaraj, COS (completed IW as a junior spring 2022)
Title of Project: Content Warnings on Social Media: An Evaluation of Instagram’s Sensitive Content Screen

As social media platforms become more ingrained in society, it is critical to mitigate the potentially negative impacts of sensitive content on users’ mental health. In order to help users avoid offensive or disturbing content, Instagram places a content warning, the Sensitive Content Screen (SCS), over posts with sensitive content. However, since its introduction in 2017, the SCS has continued to appear on Instagram without significant changes to the original design and function. This independent work investigates the effectiveness of the SCS. By conducting a historical case study of health warning labels on cigarette packages, I derive lessons and extend them to propose improvements to the SCS. Furthermore, I survey college students to gather their impressions about the current and the proposed SCS. Drawing from the case study and survey results, I suggest recommendations for how Meta Platforms, Instagram’s parent company, might improve the existing SCS and better protect vulnerable populations. In summary, I recommend that Meta Platforms (a) strengthen and modify the existing SCS, (b) add text to the warning message with the categories of sensitive content that the post may fall into, and (c) implement a larger and more comprehensive strategy aimed at protecting vulnerable users from the harmful impact of exposure to potentially sensitive content on Instagram.

Morgan Teman, COS
Title of Project: You Can’t Hack Democracy: Preventing Foreign Election Interference Using the 2017 French Presidential Election as a Case Study

The 2017 French Presidential election provides a fascinating case of attempted foreign influence through a disinformation and hacking campaign that ultimately did not discourage the public from electing the targeted candidate. No singular entity is entirely responsible for thwarting the attack; a combination of circumstances and efforts coincided to delegitimize what became known as #MacronLeaks, when 15 gigabytes of data were stolen from Emmanuel Macron’s campaign team’s servers and shared on the Internet. My independent work analyzes the effects of multiple involved parties’ defensive actions–including cyber-blurring, multi-level and encrypted communication, cybersecurity training, public transparency, and fact-checking, among others–and recommends a joint strategy to prevent such an event from recurring, including an offensive cybersecurity push by the government, secure servers for campaigns, and proactive bot investigations by social networks.

2022 Certificate Graduates Independent Work

Jeremy Bernius, SPI
Title of Project: Algorithmic (In)Justice: The Bias and Unfairness of Risk Assessment Instruments Used in Sentencing

In the age of Big Data, all areas of life rely on data analytics, algorithms, and artificial intelligence to make essential decisions and to facilitate normal operations; the criminal justice system is no different in this regard. Algorithmic risk assessment instruments (RAIs) inform nearly every stage of the criminal justice process, such as pretrial detention, corrections, and probation, and are increasingly used in sentencing. These instruments predict offenders’ risk of recidivism by feeding their demographic and behavioral data into a statistical model trained on samples from historical populations. In the sentencing context, this risk prediction then influences a judge’s decision on the type, length, and severity of the offender’s treatment.

My independent work adopts an interdisciplinary approach to examine the racial disparities and unfairness in the design and application of algorithmic risk assessment instruments used in sentencing. Specifically, I complete a cost-benefit analysis on their implementation, explore their accuracy rates and different errors, and critique the type of data they use. First, I draw on a case study of the Commonwealth of Virginia’s proprietary tool called the Nonviolent Risk Assessment (NVRA). Because the tool has been in use for over two decades, state and independent researchers have extensively studied Virginia’s experiment. In particular, I discover that while the NVRA can accurately predict low-risk offenders to be diverted to non-carceral sentences, it suffers from a lack of alternative sentencing options, judicial resistance to its use, and growing racial disparities in sentencing decisions. Second, I dive into the ethical and mathematical debate spurred by ProPublica’s report on the racial bias of COMPAS, a leading RAI used in several jurisdictions. After reviewing relevant literature, I argue that model error parity, the state of members across racial groups receiving equal proportions of false negatives and false positives, is a salient measure of algorithmic fairness. Lastly, I rely on case law and legal theory to illustrate that the inclusion of certain demographic characteristics in risk predictions, such as an offender’s race or socioeconomic status, likely violates constitutional protections whereas data on an offender’s behavior, such as their criminal history, is permissible. To close, I find that RAIs, as they are designed and implemented, foster racial bias and unfairness, and I outline several implications my results have for future policies on the use of algorithmic risk assessment instruments in sentencing.
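
Model error parity, as defined above, can be checked by comparing false positive and false negative rates across groups. The sketch below does this for two invented groups; the records, the group labels `group_x` and `group_y`, and the function name `error_rates` are hypothetical and do not come from COMPAS, the NVRA, or the project's data.

```python
def error_rates(records):
    """Return (false positive rate, false negative rate).

    `records` is a list of (predicted_high_risk, reoffended) boolean pairs.
    """
    false_pos = sum(1 for pred, actual in records if pred and not actual)
    false_neg = sum(1 for pred, actual in records if not pred and actual)
    negatives = sum(1 for _, actual in records if not actual)
    positives = sum(1 for _, actual in records if actual)
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


# Invented predictions and outcomes for two hypothetical groups.
predictions_by_group = {
    "group_x": [(True, False), (True, True), (False, False), (False, True)],
    "group_y": [(False, False), (True, True), (False, False), (False, True)],
}

rates = {group: error_rates(records) for group, records in predictions_by_group.items()}
for group, (fpr, fnr) in rates.items():
    print(f"{group}: FPR={fpr:.2f}, FNR={fnr:.2f}")

# Error parity holds only if both rates match across groups.
fpr_gap = abs(rates["group_x"][0] - rates["group_y"][0])
fnr_gap = abs(rates["group_x"][1] - rates["group_y"][1])
print(f"FPR gap={fpr_gap:.2f}, FNR gap={fnr_gap:.2f}")
```

In this toy data the false negative rates match but the false positive rates do not, so error parity fails even though one of the two error types is balanced.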

Marina Beshai, COS (completed IW as a junior 2021)
Title of Project: Political Movements in the Age of Social Media: An Analysis of Twitter’s Role in the Egyptian Crisis

Governments worldwide and the media often blame social media companies, rather than the individuals involved, for civil unrest. Claiming that social media is a threat to democracy, governments heavily moderate platforms and suppress activists. Drawing on more than six million #Egypt tweets published during the 2011 Egyptian crisis, this study explores the relationship between the in-person demonstrations and the online Twitter movement to observe how these components complemented and influenced one another. The rise and fall of #Mubarak on Twitter, followed by the rise of #noscaf (No Supreme Council of the Armed Forces), shows how the topics trending online mirrored the grievances of protesters, who were controlling the narrative to a certain extent. The sheer number of country hashtags associated with #Egypt (154 of the 195 present-day countries), never mind their use, implies a connected, worldwide community. Natural Language Processing (NLP) showed that English-language users were consistently more negative in their tweets than their Arabic-language counterparts; not once did Arabic-language users express a more negative outlook. Despite the large gap between the two groups, the correlation coefficient between the Arabic and English scores was 0.46, indicating a moderate positive linear relationship between the general moods of the two groups. A holistic analysis of tweets during the internet blackout in Egypt showed that many users around this time were increasingly concerned for the safety of protesters. On January 28, 2011, the first day of the blackout, the number of tweets rose by 82,020, roughly 1% of all tweets including #Egypt in 2011. Topic modeling showed that on seven of the ten days of the blackout, ‘freeEgypt’ or ‘freedom’ was among the most frequently used words, aptly describing users’ general attitude.
In all, the results suggest a give-and-take relationship whereby users inside the country greatly influence the platform at the start of demonstrations and, in turn, receive support and aid from users outside the country later on.

Justin Curl, COS
Title of Project: Please Pay Attention: Using YouTube’s ad algorithm to analyze the presentation of unwanted information

How do you get people to pay attention to or process unwanted information? Our research studies the effect of user behavior (how often a user skips ads) on the amount, length, and type of ads a user sees on YouTube. In our experiment, we reverse engineered aspects of YouTube’s ad algorithm using bots built with Selenium in Python to simulate three types of user behavior: positive towards ads, in which users never skip ads; neutral towards ads, in which users skip ads 50% of the time; and negative towards ads, in which users always skip ads. Overall, we found no meaningful relationship between how often users skip ads and the number of ads they see, but users who skip ads more often are shown shorter ads that are less frequently skippable. These findings have interesting policy implications for organizations trying to convey important, though unwanted, information: make viewing your messages mandatory and keep them short.

Audrey Laude, COS
Title of Project: Performance Decay in Machine Learning: Temporal Effects across Policing, Epidemic, and Financial Forecasting

As time passes, data used to train a machine learning model often becomes “outdated,” which in turn hinders model performance, most often measured via prediction accuracy. This concept is observed and described tangentially with various terms, such as model decay and concept drift, among others; however, none of these captures the whole picture of temporal performance decay across fields. Hence, the goal of this project is to perform an exploratory analysis of performance on several datasets in different fields: the Stanford Open Policing Project (SOPP; annual scale), FluSight (weekly scale), and Dow Jones performance (daily scale). For SOPP, logistic regression models were trained on Washington state data from 2009-2011 and tested on each year from 2012 to 2016. Although accuracy and AUC appeared to decay as years passed in the SOPP data, this seemed to be intrinsic to the data. For FluSight, models’ abilities to forecast 1, 2, 3, and 4 weeks into the future, as well as the nature of the decay across this time period, were analyzed, showing a negative relationship between performance and time and suggesting that an exponential model may describe this best. Lastly, two different models (a neural net and a random walk model) were used to make time-series forecasts on the Dow Jones daily performance to see how error changes the further away one is from the last known stock price. Although complex modeling techniques did not outperform the base model, they still displayed performance decay consistent with the theoretical rate of error. Thus, this project provides initial insights into the nature of performance decay across fields and asks questions that lay the groundwork for future research directions in the area.

Yu Jeong Lee, ECO
Title of Project: “Belong Anywhere”? A Dynamic Model of Airbnb Expansion and the Affordable Housing Crisis in New York City

Since its inception as an air mattress rental service in 2008, Airbnb has redefined our understanding of hospitality by opening residential homes to tourists. As of 2019, Airbnb offers over 7 million listings around the world, empowering tourists to “Belong Anywhere,” per the company’s motto. However, housing advocates argue that by crowding the residential housing market with seasonal vacation rentals catering to tourists, the sharing economy giant is making it increasingly difficult for long-term residents to find anywhere to belong. According to the New York Housing Authority, competition between vacation rentals and residential units is driving the City’s shortage of affordable housing to a “crisis point,” with Airbnbs comprising up to 20% of the rental inventory in certain neighborhoods.

My research evaluates the effectiveness of New York City’s 2016 Multiple Dwelling Law banning the advertisement of entire-property rentals for less than thirty days on online marketplaces, including Airbnb. Specifically, I evaluate the effectiveness of the policy intervention in the context of Airbnb’s introduction in 2015 of the Smart Pricing feature, a dynamic pricing algorithm that adapts listing prices to maximize host revenues. I find that though Smart Pricing increased median host revenues by $218 on average throughout the five boroughs, it did not pose a significant counter-incentive to the rental regulation policy. In fact, through a difference-in-differences study comparing short-term rentals in New York City and neighboring Jersey City, NJ, I find that the 2016 policy decreased the share of illegal (short-term, entire-property) Airbnb listings in New York by 2.9%, equivalent to 26% of the rental inventory growth in New York between 2011 and 2017. This finding suggests the advertisement ban was effective in curbing short-term rentals that crowd out much-needed housing supply, though it is unclear whether these Airbnb units reverted to housing units or longer-term vacation rentals. Nonetheless, by incorporating Airbnb’s Smart Pricing feature to evaluate the incentives surrounding policy compliance and enforceability, I expand upon existing literature on online marketplace dynamics, platform design, and their implications for policy-making.
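The logic of a difference-in-differences comparison can be sketched in its simplest two-group, two-period form. This is an illustration only, not the model estimated in the project: the function name and all numbers below are made up, and a real study would typically use a regression with controls and standard errors.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences estimate.

    Each argument is a list of outcome values (for example, the share of
    short-term entire-property listings) observed for one group in one period.
    The estimate is the change in the treated group minus the change in the
    control group, which nets out trends common to both groups.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical shares, invented for this example; not the project's data.
effect = diff_in_diff(
    treated_pre=[0.50, 0.52],   # treated city before the policy
    treated_post=[0.47, 0.49],  # treated city after the policy
    control_pre=[0.40, 0.42],   # comparison city before
    control_post=[0.40, 0.42],  # comparison city after
)
print(round(effect, 4))
```

The estimate is only credible if the two groups would have followed parallel trends absent the policy, which is why the choice of a neighboring comparison city matters.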

Yana Mihova, SPI (completed IW as a junior 2021)
Title of Project: Bill Gates, Drinking Bleach and 5G Radiation: The Role of Right-Wing Media in Spreading Coronavirus Misinformation

From the initial reporting of the COVID-19 virus in early 2020, misinformation fueled the pandemic by spreading doubts about its authenticity. Because the pandemic was so new, there was a gap in research on how the type of media an individual consumes affects belief in misinformation about the pandemic. Interested in the ways that media consumption can shape societal perspectives on a particular topic, I investigated the relationship between the spread of COVID-19 misinformation on media platforms and its consequences in society. I examined this relationship by comparing the type of news source an individual consumed with their likelihood of endorsing COVID-19 misinformation, as measured by belief in COVID-19 conspiracy theories and distrust of public health officials. My analysis found a statistically significant positive relationship between consuming only right-leaning media and the tendency to endorse COVID-19 misinformation. Taken in context with previous research indicating that right-leaning media reported significantly more COVID-19 misinformation than moderate and left-leaning media, my findings indicate a correlation between the reporting of false information and the likelihood of endorsing COVID-19 misinformation. This study brings to light the dangers of reporting that is not grounded in fact and its detrimental effects on societal outcomes.

Lindsey Moore, COS
Title of Project: Top Attrition Factors To Be Addressed in Female Engineering Interventions: A Comparative Study of Current and Switched Out Female BSE Students at Princeton University

How do we fix the leaky pipeline of female engineering students at the college level? My research studies which attrition factors engineering interventions must address in order to lower the female engineering attrition rate at Princeton University. I conducted a survey of current BSE students and students who chose to switch out of BSE to determine consistent characteristics of persistent engineering students at Princeton University. This survey identified nine consistent characteristics of persistent female engineering students. The top three attrition factors related to these characteristics were students’ pre-collegiate academics, self-efficacy/self-confidence, and social support. These factors should be addressed in female engineering interventions to help lower the attrition rate. I also recommend some possible interventions that could address these factors, such as offering Calculus I to students before their first year and encouraging more female BSE students to take the EGR sequence of introductory classes.

Sara Sacks, SPI
Title of Project: Infertility is Not Just a Rich, White Woman’s Problem: Addressing Disparities in Access to Assisted Reproductive Technologies

My independent work considers disparities in infertility and the way they correspond to broader disparities in health. The research offers a synthesis of scholarly literature about health disparities along the axes of socioeconomic status (SES), race and ethnicity, and region. I argue that the trends in health disparities pertaining to fertility care correspond to those found in the general health disparity literature along these specific dimensions. The research further analyzes recent policies proposed in Congress to alleviate these disparities in access to assisted reproductive technologies (ART), specifically the policy dubbed “IVF for All” and the Infertility Awareness Act.

Tara Shawa, SOC
Title of Project: Technological Gender Socialization: Examining Gender Representation and Reinforcement through the YouTube Kids Recommendation System

The focus of my independent work lies at the intersection of childhood gender socialization, media effects, and digital technology. The research question is twofold: first, how is gender represented in content on the YouTube Kids platform? And second, to what extent does the recommendation system reinforce these gendered representations? To address these questions, I begin with a review of literature in the key fields that intersect on this topic. I start with gender theory and childhood socialization theory, then media effects and digital media, and lastly the platform through which they converge: YouTube Kids. Grounded in this cumulative knowledge, I conduct a mixed-method analysis, comprising qualitative case studies and quantitative content analysis, to examine representations of gender and their potential exacerbation throughout various types of content. I find that gendered representations on YouTube Kids reflect normative constructions of gender, and that there is potential for the platform’s recommendation system to suggest increasingly gendered content to users.

Rachel Sylwester, COS
Title of Project: Are We Fair Yet? A Practical Analysis of Bias and Barriers to Fairness in Mortgage Lending

My independent work investigates observed bias in the U.S. mortgage lending system, where two underwriting algorithms are used to make approval decisions on 90% of mortgage applications. Recent work has exposed disparate rates of approval between racial groups by these supposedly “colorblind” underwriting algorithms. Using real loan application data, this research explores the effectiveness of an algorithmic fairness intervention in mitigating the system’s disparate impact on minorities. I implement the disparate impact remover, a preprocessing method that edits features in the data to improve group fairness while preserving relative rank within groups. The results suggest that an algorithmic intervention, such as the disparate impact remover, is effective at reducing the disparate denial rate of Black applicants without significant effects on accuracy or total cost. If implemented, such an intervention could affect millions of applicants each year due to the industry-wide use of automated underwriting systems. However, this paper finds that the effectiveness of such an algorithmic intervention is substantially limited by structural and legal barriers. Thus, my research results in recommendations not only for technical steps to mitigate automated underwriting system bias but also for regulatory and legal changes to enable and support such measures.

Henry Vecchione, COS (completed IW as a junior 2021)
Title of Project: Pan-app-ticon: What to Do About Ring’s Partnerships with Police Departments

I take issue with how Ring, the Internet-connected security camera company owned by Amazon, has pursued mutually beneficial partnerships with local police departments. The partnerships incentivize police to distribute Ring devices in their communities and grant the police access to a “Law Enforcement Portal” that enables them to select an area on a map, specify up to a 12-hour window of time, and send requests for footage from those hours to Ring owners in that area. I argue that cost and inefficiency are a significant barrier to surveillance creep and that this interface reduces that cost too much. I support this argument with two Supreme Court cases, U.S. v. Knotts (1983) and U.S. v. Jones (2012), the comparison of which illustrates how technological advancements can fundamentally change one’s expectation of privacy and the invasiveness of criminal investigation. I then examine the ACLU’s Community Control Over Police Surveillance (CCOPS) model bill and real legislation based on it, which require public approval for new surveillance technologies. I find that much of this legislation doesn’t adequately protect against connected surveillance devices like Ring because they are not a “new technology” but rather an old technology that is harmful in how it is used and in the efficiency it creates. This allows Ring to bypass approval. I then propose changes that Ring can make to its products and changes that legislatures can make to their bills to minimize harm. I suggest that Ring could use image recognition to blur faces in video sent to police, only removing the blur on order from a judge, or it could change the law enforcement interface to prohibit bulk requests or require more information. Legislatures should also alter how “new technology” is defined, requiring reapproval if a technology increases surveillance efficiency to a meaningful degree even if it resembles an approved technology on the surface.

Jacqueline Xu, ORF
Title of Project: Promoting Sustainable Habits: A Network Analysis of Hard-to-Maintain Behaviors 

People’s decisions, actions, and opinions are manifestations of not only their personal values, but their social environments as well. Extensive research by sociologists and mathematicians in the past century has culminated in a network diffusion model that explains how individuals weigh the utility of available options based on their personal preferences and the perceived decisions of their local network. But while this model is a good simplification for consumer behaviors (one-off decisions that have immediate consequences), it does not adequately represent the diffusion of hard-to-maintain (HTM) behaviors, which include healthy habits like exercising regularly and environmental behaviors like becoming vegetarian. Unlike consumer behaviors, these decisions require persistent cues for upkeep and statistically exhibit slower rates of adoption. As such, they cannot be directly understood through existing diffusion models. This paper modifies previous studies to analyze the spread of habitual behaviors by incorporating a parameter for behavioral loss. Compared to simple behavioral diffusion, the results indicate that final adoption levels are generally similar while adoption rates are much slower. These findings suggest that low adoption levels at one point in time do not necessarily predict low levels at a future date.
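One way to picture a diffusion model with a behavioral-loss parameter is the toy simulation below. It is a sketch under stated assumptions rather than the model used in the paper: it assumes a simple threshold rule for adoption, a fixed per-step probability that an adopter lapses, and synchronous updates; the function name, parameters, and the six-node ring network are all hypothetical.

```python
import random


def simulate_diffusion(neighbors, seeds, threshold, loss_prob, steps, rng=None):
    """Threshold diffusion with behavioral loss (illustrative only).

    neighbors: dict mapping each node to a list of its neighbors.
    seeds: nodes that have adopted the behavior at time 0.
    threshold: a non-adopter adopts when at least this share of its
        neighbors adopted in the previous step.
    loss_prob: probability that a current adopter drops the behavior
        in a given step (the assumed "behavioral loss" parameter).
    Returns the number of adopters at each step, starting with time 0.
    """
    rng = rng or random.Random(0)
    adopted = set(seeds)
    history = [len(adopted)]
    for _ in range(steps):
        next_adopted = set()
        for node, nbrs in neighbors.items():
            if node in adopted:
                # Existing adopters keep the habit unless they lapse.
                if rng.random() >= loss_prob:
                    next_adopted.add(node)
            else:
                share = sum(n in adopted for n in nbrs) / len(nbrs) if nbrs else 0.0
                if share >= threshold:
                    next_adopted.add(node)
        adopted = next_adopted
        history.append(len(adopted))
    return history


# Hypothetical six-node ring network: each node is linked to its two neighbors.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
no_loss = simulate_diffusion(ring, seeds={0}, threshold=0.5, loss_prob=0.0, steps=3)
with_loss = simulate_diffusion(ring, seeds={0}, threshold=0.5, loss_prob=0.2, steps=3)
print(no_loss, with_loss)
```

Setting `loss_prob` to zero recovers plain threshold diffusion, so the two runs can be compared directly to see how lapses change the adoption curve on a given network.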

Melody Zheng, COS (completed IW as a junior 2021)
Title of Project: Analyzing the Digital Divide: A Quantitative and Qualitative Study of Six United States Cities

As society grows increasingly dependent on information and communication technologies (ICTs), it becomes crucial to address the digital divide still present in many communities. In my work, I focused on identifying a digital access policy that the city of Oakland, California should adopt. To do so, I compared the initiatives of three cities in the same population range that have been “successful,” meaning that the rate of Internet and computer access for historically underserved populations increased from 2015 to 2019, with the initiatives of three cities that have not been successful. Using data from the U.S. Census Bureau’s American Community Survey, I tracked the rates of Internet and computer access for five different demographics over the five-year period and chose the three cities with the best average rate and the three with the lowest average rate. I then analyzed qualitative data to identify whether the selected cities focused on Internet access, computer access, and/or digital literacy training in their digital access initiatives, although two of the three unsuccessful cities had little to no such information publicly available. When comparing the three successful cities to the remaining unsuccessful city, Oakland, I found that while the four cities generally addressed all three aspects, the successful cities had a greater focus on community resources. Therefore, I argue that Oakland should invest more resources in digital literacy programs and publicly available ICTs, especially since they were not able to offer entirely free home Internet plans and digital devices. Community resources and technology classes would be more accessible to a greater number of households and hopefully lead to improved financial situations as well.

Previous Certificate Graduates’ Independent Work