T&S Certificate – IT Track

Certificate Student Independent Work

2024 Certificate Graduates Independent Work

Ajjarapu, Nikhil, COS
Title of Project: Rescript: An LLM-powered Application to Track Congressional Policy Meetings

Bendarkawi, Jad, ECE
Title of Project: The Swarm Garden: Human-Swarm Interaction for Self-Adaptive Art and Architecture

Why do our technologies instill so much fear in us? Why do our lives today feel so alienating when our technologies are supposed to improve our well-being? What if machines were more like animals and plants? Questions like these motivate the development of The Swarm Garden: An Interactive Architecture Exhibit, an opportunity for humans and robots to create unique, holistic experiences with technology for beauty and human well-being. At the intersection of swarm intelligence, architectural design, art-making, and dance, The Swarm Garden is an experimental, nature-inspired interactive architecture exhibit in which 36 robotic flower modules bloom in response to human presence and, through self-organization, can exhibit complex long-range and real-time responses. Each module exploits the bistability of confinement, that is, the ability of a sheet to buckle into flower-like patterns when pulled through a ring. Through direct interaction with the flower modules and a wearable device that captures dance gestures and movement, visitors can discover emergent behaviors in the swarm by manipulating the propagation of blooming patterns, LED light directions, and LED colors through various interaction modalities. After developing and deploying the hardware and software for this human-swarm system, we evaluated its interaction modalities in a public exhibition and found an overwhelmingly positive audience response, demonstrating the success of this technology in delivering beautiful and holistic experiences through human-swarm interaction. An evaluation of the system by a professional dancer likewise showed it to be successful for human-swarm collaborative dance performance and improvisation, serving as a novel method for producing emergent, synergistic, and beautiful improvisational and choreographic outcomes.
We envision futures where dancers, artists, and performers can use architectural swarms like The Swarm Garden as extensions of their artistic work and employ swarm intelligence to create embodied experiences with technology. Overall, the field of human-swarm interaction provides the technological groundwork for reimagining our relationship with technology, and The Swarm Garden serves as a beacon for speculating about a joyous future of coexistence between humans, machines, and nature through artistic and architectural robotic swarm applications.

Bhakta, Kareena, SPI (completed IW as a junior 2023)
Title of Project: When Sharing Is Not Caring: An Examination of UNHCR’s Data Sharing Policies in Bangladesh and Kenya

The implementation of biometric technology during refugee registration has the potential to speed up the process, improving access to aid and reducing fraud, since it can identify individuals based on unique characteristics such as iris scans or fingerprints. At the same time, this technology raises the question of how organizations such as the United Nations High Commissioner for Refugees (UNHCR) should manage such sensitive data, given that these populations are often fleeing conflict, extreme living conditions, and more. I examine how UNHCR can build trust with refugee populations through its data protection policies and how to strengthen the mechanisms through which these populations can hold UNHCR accountable for its management of biometric data. I analyze case studies of data sharing in Bangladesh and Kenya across four major categories: (1) policies governing data, (2) consent and information sharing, (3) accountability and redress, and (4) responding to failures and adaptations. Based on UNHCR’s published policies and first-hand accounts recorded by other scholars, I find that few due-diligence actions (such as data sharing agreements and impact assessments) are required before data is shared, that there is miscommunication with refugees about the management of their data, and that mechanisms for complaints or redress are limited. To conclude, I propose seven recommendations for UNHCR, grouped into three categories: (1) increasing refugee empowerment, (2) strengthening management of data, and (3) decreasing reliance on biometric data.

Castleman, Jane, COS (completed IW as a junior 2023)
Title of Project: Your Answers Are Protected By Law: Evaluating Protections against Reconstruction and Re-identification Attacks on the U.S. Census

In 2021, the U.S. Census Bureau published a report revealing that the privacy protections it used for the 2010 Decennial Census data were insufficient, stating that the results “were alarming” and “provided conclusive evidence” that stronger privacy protections were necessary. My project aimed to build a framework for evaluating the impact of reconstruction and re-identification attacks on private databases. It connects the characteristics of these attacks to the legal definition of privacy in Title 13 of the U.S. Code, which states that the Census Bureau cannot publish data from which an individual can be identified. By evaluating the distances between the reconstructed and target databases, and by determining whether the rates of reconstruction and re-identification are significant, policymakers and researchers can better understand whether these attacks constitute a violation of individual privacy. For the Census Bureau’s attack, I argue that the rate of reconstructed individuals was significant: for blocks of size 1-9, 10-49, and 50-99, there exist population uniques that can be exactly reconstructed, which violates Title 13. Additionally, whether a rate of re-identification is significant can be determined by setting bounds on the re-identification rate and on the accuracy of re-identifications. Given that there were 178 million re-identifications in the worst-case attack, I argue that both the rate and the accuracy of re-identifications exceed these bounds. Overall, this framework begins to bridge the gap between mathematical formalizations of privacy and legal definitions by providing a method to evaluate reconstruction and re-identification attacks. It is also intended to help policymakers understand when and why to apply increased privacy protections to their data.

Chen, Alina, COS
Title of Project: Virtual Ad-demic: Examining Skew in Facebook’s Ad Delivery for Vaccine Information

Social media platforms, such as Facebook, with its three billion global users, play a critical role in public health communication. Vaccine hesitancy, identified by the World Health Organization as a top global health threat, is fueled by misinformation across these networks. As the world grappled with the rapid spread of COVID-19 in 2020, discussions around vaccination, vaccine mandates, and public health measures intensified both online and offline, heightening the need to understand vaccine discourse on social media. My independent work focused on Facebook’s vaccine-related advertisements from 2018 to 2024, using data from the platform’s Ad Library. I studied 31,362 ads from 1,297 advertisers, analyzing narrative content, advertising strategies, and delivery patterns, with the aim of determining how these elements influence public perceptions of vaccination, assessing how effectively ads are distributed across demographics, and identifying potential biases in Facebook’s ad delivery algorithms. My findings show significant misalignments between actual ad distribution and intended demographic targets, indicating possible biases that could affect public health communication and underscoring the need for transparency in social media algorithms to ensure equitable information distribution.

Dong, Stephen, COS (completed IW as a junior 2023)
Title of Project: SamplAR – Augmented Reality for Family-Style Restaurant Ordering

While there has been research on the commercial benefits of augmented reality (AR) in restaurants, there is limited research exploring the social potential of AR in this setting, particularly in restaurants that serve food family-style. This is especially important because restaurant ordering is inherently social, and many family-style restaurants are Asian, which can bring difficulties such as language barriers and unfamiliarity with specific foods. My hypothesis was that AR could help solve these issues while making family-style ordering more collaborative, social, and fun. In my independent work, I built a collaborative ordering platform for family-style restaurants called SamplAR and evaluated the application with 5 dyads to study its impact and to uncover insights about the dynamics underlying ordering at family-style restaurants. My hope is that this research can help us build digital technology for restaurants that encourages socialization.

Esparraguera, Liam, COS (completed IW as a junior 2023)
Title of Project: a11ystudy: An Explorable History of Web Accessibility

Despite the growing prevalence of web-based technologies, the study and development of accessible online interfaces has remained a challenge and a site of ongoing progress since the dawn of the Internet-connected age. Statistical reports on the current state of web accessibility have been published, yet few detailed reports span the history of the web, even though such a record could serve as a crucial resource in the development of accessible interfaces. In this project, I develop a novel approach to documenting the evolution of digital accessibility by integrating web content archives with software for the programmatic evaluation of web page accessibility. The final result is a two-tool system for collecting and exploring time-series data on web content accessibility: a11ystudy, a command-line interface for evaluating archived web pages, and a11ystudy-web, a companion web application that allows users to visualize exported data and explore trends in web accessibility. These tools are used together to generate and visualize a sample dataset that documents the conformance of the top 100 webpages to the Web Content Accessibility Guidelines from 2012 to the present. With this project, I build the foundation for a toolset that enables individuals, regardless of technical expertise, to interrogate the past and present state of web accessibility in order to pursue a more equitable future for digital technologies.

Frascella, Anthony (Ted), SPI
Title of Project: Under the Electric Eye: An Analysis and Assessment of Risk Factors For Abuse of State Surveillance

Surveillance technology is an ever-growing presence in the world, and individuals living under all regime types must reckon with the resulting risks to personal privacy and political rights. My independent work explores the impact of surveillance technology on society, focusing on its dual role in enhancing public safety and posing risks to privacy and freedom. It examines how different regime types deploy surveillance systems and the consequent effects on their populations. The study uses a mixed-methods approach to analyze the correlations and causal relationships between factors such as access to technology, political rights, and security conditions, and their impact on privacy. This research reveals that the integration of surveillance into daily online activities raises significant privacy concerns, highlighting the societal challenge of balancing security with individual rights. The work underscores the importance of robust legal frameworks and external oversight mechanisms in preventing surveillance abuses and protecting democratic freedoms. It calls for international cooperation and stricter legal guidelines to safeguard privacy and human rights in the digital age, stressing the societal implications of technological misuse and the necessity of policy that upholds human rights.

Gil, Irene, COS
Title of Project: Virtual Museum Hub: A New Lens to Breaking Down the Antiquated Structures of the Arts Industry to Reinvigorate a Social Community

In recent years, the growth of the museum industry in the US has been slow. This runs counter to the age-old goal of most museums: to nurture interest and education in the arts. That goal has been hindered by the barriers many visitors face, whether financial, logistical, or simply a lack of exposure, and many of these barriers stem from the antiquated structures that tie down the museum industry. As part of the quantitative framework of my research, I analyzed data on the top 20 museums in the US to gather insights on museum operations and to ask where the barriers that need to be broken down come from. I also analyzed the Google Arts & Culture feature using the Google Trends tool, studying how effective the feature is and what impact this technology has on society. Drawing on my IW project, a digital museum reservation platform called the Virtual Museum Hub, I argue that technology can be used from a different angle to further arts education and to turn museums into a way to cultivate social engagement.

Grover, Ananya, COS
Title of Project: Navigating the News: A Pro-Social LLM Chatbot for Enhancing Viewpoint Diversity in Online News Consumption

In response to concerns about media polarization and harmful machine-generated content, my independent work explores the development and evaluation of PRISM, a socially beneficial chatbot designed to help individuals quickly access alternative perspectives on news found on the internet and social media platforms. I compare three Large Language Model (LLM)-based approaches, namely Zero-Shot Learning (ZSL), Chain of Thought (CoT), and Multiagent Group Chat (MGC), on the task of writing comparative summaries of up to three news articles, finding that the Chain of Thought method performs best. This independent work presents both a tool and a set of findings that reinforce the need for it, highlighting the potential of LLM-based tools to facilitate critical engagement with news media and to broaden users’ access to diverse viewpoints, promoting healthier media diets and discourse.

Knoll, Theo, COS (completed IW as a junior 2023)
Title of Project: ARctic Escape: Promoting Social Connection, Teamwork, and Collaboration Using a Co-Located Augmented Reality Escape Room

Escape rooms are interactive games in which players collaborate to discover information about their environment to accomplish a shared goal. While physical escape rooms provide groups with fun, social experiences, they require a gameplay venue, props, and a game master, all of which make them harder to access. Existing augmented reality (AR) escape rooms demonstrate that AR can make escape room experiences easier to access, but many AR escape rooms are single-player and therefore fail to maintain the social and collaborative elements of their physical counterparts. I created ARctic Escape, a two-person, co-located AR escape room designed to promote social connection, collaboration, and communication. I evaluated ARctic Escape by conducting semi-structured interviews with four dyads to explore the sociological implications of AR technology as applied to escape rooms and to learn about participants’ interpersonal dynamics and experiences during gameplay. I found that participants considered the experience fun and collaborative, and that it promoted discussion and inspired new social dynamics.

Lee, Alison (Alice), COS (completed IW as a junior 2023)
Title of Project: Examining Gender Differences in Investor Questions Towards Entrepreneurs: A Shark Tank Case Study

Venture capital allows startups to scale and succeed quickly, yet women have historically been excluded from most of this funding: venture capital funding for all-women teams has hovered around the 2% mark for over a decade. We explore the potential explanation that investors ask entrepreneurs different questions on the basis of gender, perhaps subconsciously revealing a cognitive bias that assumes women are more likely to fail. This is demonstrated by more frequent use of ‘potential’ words like “aspire” and “hope” when talking to men and more frequent use of ‘prevention’ words like “risk” and “careful” when conversing with women. Using a newly created Shark Tank dataset, we apply natural language processing to build classifiers that take in investor questions and predict the gender of the entrepreneur. We find that the classifiers are not meaningfully more accurate. However, we find some qualitative evidence of investors asking entrepreneurs different questions.

Liu, Zi Han, SOC
Title of Project: The Datafication of Listening: How “Spotify Artists” and “Spotify Wrapped” Shape Value in the Streaming Music Economy

Streaming has completely transformed the recorded music economy. Building on the research of my senior thesis, which examined how tastemakers and consumers shape value in the streaming music economy, this independent work focused on the sociotechnical changes that have emerged since the rise of Spotify. Specifically, I conducted a comparative content analysis of Spotify’s artist-facing interface (i.e., “Spotify for Artists” or “Spotify Artists”) and its consumer-facing interface (i.e., “Spotify Wrapped”) to show how streaming platforms have refashioned the music listening experience through behavioral data. In turn, this reorganized music navigation around curated social identities rather than sound. For “Spotify Artists,” this is achieved by segmenting audiences based on the listener’s relational bond with their favorite artists or songs. Conversely, for “Spotify Wrapped,” consumers build intimate algorithmic identities by engaging with Spotify’s curated listening profiles.

Maynard, Lauren, COS (completed IW as a junior 2023)
Title of Project: Nonprofits Unlock the Decentralized Landscape: Addressing the Fears, Uncertainties, and Doubts of Leveraging Blockchain Technology in Civil Society

As the open-source software (OSS) movement demonstrated, many sectors actively shaped and influenced the development and use of that technology. Fears, uncertainties, and doubts initially left nonprofits hesitant to adopt OSS. However, as the technology matured and nonprofit organizations saw the cost savings, increased efficiency, scalability, and better performance that OSS offered, they began to recognize its competitive edge and other advantages and increasingly adopted it. Unfortunately, by then, many of the organizations that had hesitated had missed the opportunity to shape and modify the earlier stages of OSS. To better understand how civil society can participate in the decentralized future of the web, TechSoup Global conducted qualitative research sponsored by Filecoin, a decentralized storage network. I was recruited to identify a robust use case for decentralized technology that would prevent nonprofits from falling further behind in the digital divide. I conducted a mixed-methods analysis that included qualitative interviews, surveys, and a literature review, and used it to develop user personas in Figma, a design and prototyping tool. This research framed the landscape analysis and revealed best practices for nonprofit adoption of decentralized technology. My independent work identified IT professionals as the subject matter experts leading the way toward acceptance of decentralized technology, as they increasingly recognize its advantages and can implement them if adequately supported. Therefore, the objective of this strategy was to inform stakeholders about the benefits of decentralized technology and to streamline adoption efforts by surfacing the challenges IT professionals face when they advocate for its use.
We hope that, with TechSoup’s knowledge and resources, nonprofits will be empowered to leverage and shape the new decentralized space to meet their organizational needs, making the digital future as secure, transparent, and open as possible.

Parikh, Yash, COS
Title of Project: Longitudinal Web Privacy Monitoring: Toward a Regulatory Tool

Recent years have brought unprecedented amounts of regulation on third-party data collection. More state-level privacy laws have been passed in the past three years than ever before, and private companies are separately choosing to restrict third-party data collection. Regulators will play a large role in determining what changes should occur and how effective those changes are. To empirically measure the impact of regulatory changes and to enforce their policies, regulators need longitudinal web privacy monitoring tools. Current tools are too complex and/or cannot collect data about privacy violations over time. To address this gap, I conducted user interviews to determine the features regulators need in their web privacy monitoring tools, and I created Longitudinal, Automated Monitoring of Privacy (LAMP), a proof-of-concept automated web scraper for longitudinal web privacy monitoring, to meet these needs.

Rubenstein, Alison, ORF
Title of Project: Sparking Interest: Investigating Drivers of Public Interest Through an Analysis of Google Trends for Extreme Weather Events

Understanding the drivers of public interest can reveal solutions to collective action problems such as climate change and enable widespread behavioral change. Since search engines like Google are easily accessible and frequently used, online search behavior data can serve as an insightful metric for gauging public interest at a given place and time. In this independent work, I used Google Trends data for climate change and natural disasters to understand behavioral patterns related to these topics and to investigate the advantages and disadvantages of using online search behavior data in research. Strong relationships were observed between searches for climate change and searches for heatwaves, as well as between climate change searches and the volume of climate-related news publications. Additionally, the results suggest that extreme weather events, like hurricanes, capture widespread search attention at the time they occur, which has important implications for information messaging campaigns. Cross-regional analysis found more similar behavioral trends between locations that are closer together, have similar weather experiences, and have similar political affiliations and education levels. Through this research, I assessed multiple statistical methods for characterizing and interpreting online search data and demonstrated the importance of contextualizing search data when modeling and analyzing trends. Overall, this research demonstrated that online search behavior data can provide valuable insights into public interest, but that considering the many possible meanings of user data is essential when matching user search data to scientific data.

Song, Emmy, COS (completed IW as a junior 2023)
Title of Project: Cracking the Bamboo Ceiling: Predictive Factors for Asian American Promotion in the Workplace

Under the model minority myth, Asian Americans are stereotyped as high-achieving, well-educated members of society who are able to find well-paying jobs. They make up only 6.2% of the United States population but are well over-represented in the workplace, composing 13% of working professionals. However, Asian Americans are less often promoted to the upper echelons of leadership, where White Americans dominate. While White Americans make up 69% of the U.S. workforce, they compose 85% of executives, senior officers, and other higher-level managers. Asian Americans, on the other hand, make up 13% of the workforce but hold only 6% of top positions. I study why Asian Americans are not promoted at the same rate as White Americans, drawing on employee responses to surveys about their work experiences. Using a random forest model to identify the most relevant factors, I determine that demographic attributes such as race and gender, as well as the level of career support and training, strongly influence Asian American promotion outcomes. In addition, the indirect values of grit and job support were more important for promotion in younger age groups, while ambition and monetary motivation came to the forefront for older ones. Finally, my intersectional analysis of race and gender confirms previous hypotheses that being male and more experienced confers an advantage in promotion, regardless of race.

Vuono, Ryan, SPI
Title of Project: Towards Best-in-Class Biometric Data Protection for Refugees: A Comparative Review of UNHCR, Oxfam, and ICRC Policies

Biometrics have rapidly increased in popularity as a method for verifying the identities of refugees, among both governments and humanitarian organizations. With over 100 million individuals falling under its mandate, UNHCR has an immense responsibility to ensure that their personal data is protected, especially in the case of biometrics, which are permanently and irreversibly connected to one’s identity. To ensure that this highly sensitive form of data is used exclusively to achieve its primary mission to “safeguard the rights and well-being of refugees,” UNHCR must have best-in-class protection policies. In my paper, I compare UNHCR’s data protection policies with those of Oxfam and ICRC, two other organizations in the humanitarian space. I analyze the text of each organization’s policy documents and compare them across four main criteria: scope and specificity, data collection and data storage, third-party sharing, and flexibility. This analysis revealed that UNHCR could do more to improve the specificity of its policy language, the stringency of its requirements, and the security of the data it holds, especially with regard to biometrics. Taking inspiration from the strongest aspects of the Oxfam and ICRC policies, I provide five major recommendations to help UNHCR better achieve its mission of safeguarding the rights and well-being of refugees: (1) creating clearer delineations between biometrics and other kinds of identifying data; (2) conducting a comprehensive assessment of the risks of collecting, processing, storing, and sharing biometrics, and sharing that information with possible data subjects; (3) strengthening point-of-storage protections; (4) developing a stricter data-sharing framework to ensure that biometrics are shared only when absolutely necessary and in very specific ways; and (5) implementing an annual or biannual review process so that its policies keep up with the rapid development of biometrics.
UNHCR has clear opportunities for improvement in order to ensure the sustained protection of refugees in an increasingly technological world.

Waseem, Shanzey, COS (completed IW as a junior 2023)
Title of Project: Video Games: There’s No Time for Violence

Psychological research describes how violent video games cause real-life violence, particularly among youth. While a vast literature investigates policies to curb the specific harms that violent gaming causes this protected population, I use guardians’ perceptions to demonstrate the demand for more stringent content regulation and for the introduction of time regulation. I designed a detailed survey to support an in-depth analysis of guardians’ perceptions. Although psychological research shows that perception is not reality, comparing generalized perceptions with more specific observations and reporting significance levels for the conclusions, the results showed significant demand for industry-level content and time moderation policies; however, they also revealed guardians’ lack of awareness of video gaming policy, the relevant literature, and, hence, results-based regulation. As such, the policy implications point toward changes that can be incorporated on the developer and education ends.

Wilks, Torre, SOC
Title of Project: Pretty Hurts: The Intersectionality of Race, Weight and Socioeconomic Status on Algorithmic Bias

This project uses qualitative interviews to describe how content creators perceive algorithms as contributors to the mistreatment of people of color on social media. The participants in my study, all with more than 20,000 followers, believe that societal factors like race, weight, and socioeconomic status create algorithmic bias on social media platforms. This bias makes it difficult for creators of color to achieve virality and adequate compensation on social media. Biased algorithms shadowban and moderate Black content creators more harshly than white creators, which makes it harder for people of color to be discovered on apps and causes them to be paid less. Moreover, the AI technology that brands use to determine which creators to sponsor is influenced by racial stereotypes that discourage partnerships with people from low socioeconomic backgrounds. As social media platforms have grown, content creators’ influence has extended beyond the realm of entertainment and now shapes our economic and political decision-making. Therefore, if algorithmic bias prevents Black creators from achieving the same power and privilege on these platforms, social media companies should be responsible for making their technology more transparent and equitable.

Woo, Melissa, ORF
Title of Project: From Black Box to Glass Box: The Impact of Data Complexity on Machine Learning Explainability

My research focuses on enhancing transparency and trustworthiness in credit scoring for lending by evaluating post-hoc feature attribution methods, which explain model decisions by quantifying the contribution of each input feature to the output. By analyzing performance across various data complexity contexts, such as feature correlation and target expression, I identify the best-performing methods, finding that SHAP-based methods, and On-Manifold SHAP in particular, are highly effective at explaining model decisions for data with linear target expressions and high feature correlation. These insights help improve model interpretability and decision-making in credit lending, addressing concerns about fairness and accountability in complex machine learning models.

Yang, Katherine (Kathy), COS
Title of Project: Proving Causality in Disparate Impact Housing Discrimination Cases After Inclusive Communities

In the United States, a long history of discrimination has resulted in persistent demographic disparities in housing. The Fair Housing Act was intended to lessen the extent of segregation and provide legal recourse to victims of housing discrimination. Under the act, “disparate impact” claims provide a way to challenge insidious policies that are facially neutral but contribute significantly to disparities. In the past two decades, these claims have grown harder for plaintiffs to prove. Notably, new causation standards have prevented plaintiffs who rely on traditional statistical methods from moving past the initial “prima facie” stage in court. In parallel with this increased burden, a “causal revolution” has swept the scientific world. My goal was to investigate whether Judea Pearl’s structural causal models, drawn from theoretical computer science, could help plaintiffs better prove causation in disparate impact housing discrimination cases. To that end, I conducted a case study analysis of the seminal Inclusive Communities v. Texas Department of Housing & Community Affairs case, tracing the evolving causation arguments through multiple iterations and constructing a proof-of-concept causal diagram for the facts of the case. I conclude that structural causal models show promise for application in disparate impact housing claims given their flexibility, robustness, and accessibility. In addition, the cross-pollination of computer science and law in this project unearthed fundamental philosophical discrepancies between scientific and legal definitions of causality that remain to be resolved through future work.

Zhang, Jasmine, COS
Title of Project: Click for a Cure: Analyzing Healthcare Advertisements on Facebook and the Role of the Delivery Algorithm

Advertisement delivery algorithms play a uniquely impactful role in our online experiences, determining what topics we see and what perspectives we hear. Because ad delivery algorithms are powered by machine learning, their outcomes may be biased or skewed by gender, race, or age due to biases in the training data. In my research, I extend this inquiry to the domain of healthcare. My objective is to develop a holistic understanding of Meta’s healthcare advertisement space and to study whether healthcare ad delivery may be discriminatory. I collect existing Meta advertisements using the most popular keywords in the current healthcare discussion. I conduct textual analysis on the ads for each keyword, examining popularity, complexity, formality, and point of view. I then conduct a thematic analysis focused on the topic of climate anxiety. By coding and identifying themes across a random subset of climate anxiety ads and newspaper articles, I find that the perspectives and themes reflected in the advertisement space parallel the discussion in news media. I further examine how the delivery algorithm influences the delivery of such ads across demographic categories by running my own climate anxiety ads alongside ads for nature, ChatGPT, and social media influencers. Comparing the demographic distributions reached by the different ads and finding that they are not substantially different, I conclude that the ad delivery algorithm does not treat climate anxiety ads differently from ads in the other categories I study. Rather, the algorithm delivers all the ads predominantly to older male audiences, suggesting that advertisers aiming to reach gender-balanced or younger audiences should use explicit targeting features that indicate these goals.

2023 Certificate Graduates Independent Work

Ayers, Christien, MUS
Title of Project: The Effect of the Internet on Music Consumption Patterns

The Internet, more than 30 years after its public release, is generally understood to have played a significant role in changing the way global society organizes itself. In the ways people today communicate, learn, spend their time, relate, and consume, the Internet has proved pervasive. My independent work analyzes how society’s relationship to music consumption has been shaped by the Internet. Through a survey of sociological literature examining older music consumption patterns, I account for the ways in which society consumed music before the Internet became a tool for consuming and sharing music. Then, I consider Napster and its effect on the music industry to understand how the Internet radically altered consumers’ relationship to music. From there, I use Napster as a case study to draw conclusions about consumers’ relationship with music in the 21st century vis-à-vis the Internet, providing insight into patterns that have remained the same and those that have changed.

Chloe Chen, COS 
Title of Project: Increasingly Ambiguous Privacy Policies: Concerns and Regulations

Privacy policies are a critical and ever-evolving tool for understanding how companies collect, use, and store data. However, while many regulations exist in various jurisdictions for ensuring privacy policies contain necessary content, there is virtually no regulation regarding the nature of the language that is used in privacy policies. In this paper, I examine previous research on the negative effect that ambiguous language in privacy policies can have on user understanding of privacy risk. I then analyze the ambiguity of language in existing privacy policies using Python, and find that privacy policies have evolved to include proportionally more ambiguous language over time. Finally, I assess the indirect impact that the General Data Protection Regulation (GDPR) may have had on the ambiguity of privacy policy language, and explore other ways that the language of privacy policies can be quantified and regulated.

Elizabeth Dorman, COS
Title of Project: Dream Garden: Exploring Location-Based, Collaboratively-Created Augmented Reality Spaces

Despite the potential for connecting strangers in the digital realm, current research has not explored location-based augmented reality experiences that enable strangers to connect by building artifacts collaboratively. In my independent work, I created Dream Garden, an augmented reality (AR) application that lets people place 3D flowers into the physical world to build a collaborative, location-based garden. Anyone with the app can see the flowers previously planted by strangers, as well as plant their own flowers to grow the garden. I evaluated the app with 10 participants, 5 of whom visited the digital garden more than once, to assess their sense of connection to the other participants as each participant added one digital flower to the garden. I found that participants were joyful about building a shared space, were excited about the dynamic nature of the garden, felt a connection to the physical location of the digital garden, and expressed a sense of belonging to a community of strangers, though not necessarily an emotional connection. This research offers insights into how augmented reality can be used to bring people together in real life and foster a sense of connection, creating technologies that bring us together instead of driving us apart.

Jaelin Haynes, COS
Title of Project: The Black Purposes of Web3: A Technical and Societal Analysis

Web3 is a system of applications built on the blockchain, an immutable, distributed ledger of blocks linked together with cryptographic hashes. This emerging technology is being promoted by some as a solution to the centralization and user privacy issues of the current web. Given the pattern of technologies becoming mainstream and then having negative effects, especially on marginalized groups, it is important to consider the potential societal implications of this new technology before it is widely adopted. After providing background on the previous iterations of the web, Web3’s value proposition, and its technical details, I focused on African Americans in particular, exploring the history of Black Americans and technology and drawing on that history to investigate the potential uses and impacts of Web3. Specifically, I identified themes of the digital divide, surveillance and privacy, coded racial bias, and technological creativity, and completed a data analysis and visualization of African American sentiment towards new technologies using Python libraries. The data analysis showed that Black participants indicated generally more negative attitudes towards new technologies than their White counterparts. Even when informed about scientific and technological innovations, African Americans showed unfavorable attitudes, potentially contradicting the idea that lower levels of use and access result from less education. Based on the persistence of past themes, Web3 may not be the ultimate solution to the issues of Web 2.0 that some claim, especially for African Americans, but there are still some valuable and beneficial principles in its development.
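The phrase “a ledger of blocks linked together with cryptographic hashes” can be illustrated with a minimal Python sketch. It is hypothetical and not part of the project: each block stores the SHA-256 digest of its predecessor, so altering any earlier block breaks every later link. The names `Block`, `build_chain`, and `is_valid`, and the all-zeros placeholder for the first block’s predecessor, are assumptions made for illustration. Distribution, consensus, and mining are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str  # digest of the previous block (the "link")

    def digest(self) -> str:
        # Hash a canonical JSON encoding of the block's fields with SHA-256.
        payload = json.dumps(
            {"index": self.index, "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: list[str]) -> list[Block]:
    chain = []
    prev = "0" * 64  # placeholder predecessor for the first block (an assumption)
    for i, record in enumerate(records):
        block = Block(i, record, prev)
        chain.append(block)
        prev = block.digest()
    return chain


def is_valid(chain: list[Block]) -> bool:
    # Every block must point at the current digest of the block before it.
    return all(
        block.prev_hash == prev_block.digest()
        for prev_block, block in zip(chain, chain[1:])
    )


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dan 1"])
print(is_valid(chain))          # True
chain[1].data = "bob pays carol 200"
print(is_valid(chain))          # False: block 2 no longer matches block 1's digest
```

The sketch only captures tamper evidence. The “distributed” and “immutable” properties in practice come from many parties holding copies and agreeing on which chain is authoritative.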

Rahul Jain, ORFE
Title of Project: An Examination of the Effect of Mobile Fintech Adoption on Microfinance Institutions in India

Microfinance institutions (MFIs), in India especially, have matured significantly over the past decade and a half, with increased government support, buy-in from private institutions, and the adoption of data analysis and mobile financial technologies. In this independent work, I begin by discussing the history and regulation of MFIs in India, breaking it into four phases: origination, maturation, buy-in from private investors, and adoption of technologies. Following this overview, I perform exploratory data analysis on select MFI operating and loan repayment metrics, finding a consistent increase in loans per loan officer but no large change over time in operating expense / loan portfolio or in loan repayment metrics (loan loss rate and 30-day and 90-day portfolio at risk). With an understanding of the data distributions, I regress each of these metrics on rural wireless teledensity (as a proxy for mobile phone and internet access) and test the coefficients for statistical significance to estimate the effect of mobile fintech adoption. I conclude that operations have improved: the regression on loans per loan officer yields a statistically significant coefficient of around 4 (p-value of 0.0015) and an R-squared value of 0.74. For the loan repayment metrics, I could not reject the null hypothesis, as no significant linear relationship was found, likely due to too many confounding factors. Lastly, I use the overview of MFIs to date in conjunction with the quantitative analysis to provide recommendations for further MFI development, reiterating the value of strong relationships with government and partner organizations, adoption of novel technologies, internal governance, and quantification of performance metrics.

Rohan Jinturkar, COS (completed IW as a junior spring 2022)
Title of Project: Investigating Racial Bias Trends in the Text of US Legal Opinions

There are many instances of racially biased outcomes in the American legal system. However, it is unclear whether such bias also exists in the text of judges’ opinions, and whether it varies across time periods and regions. We approximate GloVe word embeddings for legal opinions at the federal and state level from 1860 to 2019. We find evidence of racial bias across nearly all regions and time periods, as traditionally Black names are more closely associated with negative/unpleasant terms, whereas traditionally White names are more closely associated with positive/pleasant terms. We do not find evidence that older opinions exhibit more bias, or that opinions from Northeastern states show greater change in racial bias over time than those from Southern states. These results run counter to the principle of impartiality in legal settings and demonstrate the need for further research into institutionalized racial bias. Lastly, we survey approaches for reducing bias across the legal system.

Hannah Kapoor, SPIA
Title of Project: 18 Words to Reform the Internet | An Evaluation of the Governance of Evolving Technologies: The Case of Section 230 of the Communications Decency Act

What can be learned from debates about reforms surrounding Section 230 of the Communications Decency Act (CDA) to inspire the future governance of evolving technologies that challenge fundamental rights? Passed in 1996, Section 230 grants internet service providers immunity from certain types of liability for user-generated content; the law has come under fire in the evolving communications landscape. Policy attempts to reform Section 230 therefore represent a compelling case study to inform how policymakers craft legislation for technologies in an evolving landscape. Through a comparative analysis of stakeholder interests from the onset of the deployment of online content platforms to their more mature state, the research travels from 1959 to the present day, reviewing thousands of pages of historical accounts, Congressional testimony, and court briefs to build comparative stakeholder analyses presenting the underlying interests, motivations, and arguments that have framed the Section 230 debate across time and sector.

Despite the widespread criticism of Section 230, the research finds that the core stakeholder motivations have remained unchanged over time: to mitigate speech harms and uphold freedom of speech online. However, as technology has evolved, stakeholders have increasingly approached reform on the basis of profit, partisanship, and reactive responses to specific instances of social harm. The research reveals the perils of limited cross-sector stakeholder engagement in crafting legislation, and of the public pressure on policymakers that encourages them to react to specific harms. In a techno-moral context, policymakers are advised to consider lawmaking as an instrument of governance that promotes and protects foundational rights, such as freedom of speech. A “Precautionary Agile” approach to the governance of evolving technologies is recommended, combining “ex-ante” and “prohibitive” approaches to governance.

Henry Koffler, ORFE
Title of Project: A Pricing Analysis of European Cap & Trade Carbon Futures: Trading Strategies and Implications

The global response to changing climate conditions has overwhelmingly favored strong regulation. However, the European Union, through its Emissions Trading System, has demonstrated an alternative in which economic progress is not only fundamentally compatible with environmental protection but necessarily commands it. In this cap & trade system, certain corporations are required by law to purchase rights to emit CO2 to cover their emissions. That said, many argue that financial speculation makes pricing nearly impossible for corporations that are unable to opt out. My paper therefore evaluates the claim that European Union carbon credit prices are significantly driven by market speculation and cannot be accurately predicted. To accomplish this goal, my independent work blends traditional financial modeling approaches, such as Least Absolute Shrinkage and Selection Operator (LASSO) regression and the division of historical prices into market regimes using a Gaussian Mixture Model, with a Long Short-Term Memory (LSTM) network that prices carbon credits. After showing that carbon credit prices can in fact be modeled with precision, my research concludes that carbon prices are strongly correlated with a range of world states (such as weather) and commodities (such as oil or coal), and that market speculation is not a significant factor. Having established a methodology to comprehensively price carbon credits, my independent work provides policy recommendations to increase adoption of Thailand’s new carbon credit market, especially among the rural farmers who will be its primary users. Additionally, my paper reflects on the ethical questions surrounding environmental investing and financial speculation, principally why financial speculation is distinct from gambling and, in fact, provides strong benefits to the carbon credit market.

Colton Loftus, COS (completed IW as a junior spring 2022)
Title of Project: Analyzing Experiences with Speech Controlled Accessibility Software and Developing a Solution for the Linux Desktop

For individuals with disabilities affecting the use of their hands, typing and using a mouse can be not only inconvenient but also painful. This problem is especially acute within the software ecosystem of the open source operating system Linux. While Windows and macOS both include proprietary accessibility software for controlling the computer by voice, Linux users do not have access to these proprietary solutions. In my independent work, I developed a voice-controlled accessibility program to help address this gap. My program can perform a wide array of actions across the Linux desktop: it can control windows, press keys, dictate text, and much more. It can also be customized or run user scripts to perform new behavior. While developing my program, I also wanted to better understand how to design and implement policies pertaining to accessibility software, so I held a series of software demos with my program. By the end of my research, I had developed several key takeaways: (1) While accessibility designers don’t always have the resources to train their own models, they can nonetheless design applications with modular and customizable behavior. In my application, users can interact with the machine learning backend, switch models, and customize command names. All of these choices push back on the idea that the voice recognition backend should be a black box for users. (2) Graphical user interface (GUI) programs should limit mouse usage and prefer keyboard shortcuts when possible. Throughout my software demos, it was particularly difficult for users to perform actions like dragging and dropping through voice alone, whereas keyboard shortcuts were quick and easy to perform by both voice and hand. (3) Workspace designers need to take into consideration those who use alternative input methods. Throughout the development process, it was difficult to find public places quiet enough to use voice-controlled software without disturbing others. In addition to my open source code, these conclusions from my policy research will help future designers create more accessible workspaces, communities, and software applications.

Katie McLaughlin, COS
Title of Project: A Systems Approach to Mitigating Harms of Content Recommender Systems

Content recommender systems play a critical role in the dissemination and discussion of information. It is therefore crucial that structures and processes exist to support the development of safe, trustworthy recommender systems. To responsibly deploy these models, Machine Learning (ML) practitioners must go out of their way to navigate a complex ecosystem of regulation, organizational structures, and resources. In this paper, I take an approach rooted in systems theory to present a holistic overview of the content recommendation ecosystem and the challenges that arise from its structure. Through a survey and semi-structured interviews, my research highlights how these issues stem from the absence of a shared language around harms. Furthermore, practitioners lack a framework to evaluate the ethical and societal implications of their recommender systems. I detail how the current approach to identifying and mitigating harms does not adequately prevent these harms. I also identify challenges that arise from the current organizational arrangements of knowledge and expertise. Based on these findings, I recommend ways to use oversight mechanisms to establish a standard for harm mitigation approaches. I further suggest modifying the existing ecosystem to distribute responsibility and equip practitioners to confidently deploy safe content recommender systems.

William Olson, COS
Title of Project: Faces at Face Value: An Analysis of Face Recognition Technology Policy and Performance

Facial recognition is one of the most developed and widely used applications of machine learning in 2023. The maturity of the technology and the ubiquity of high-quality video recording devices like traffic cameras, surveillance cameras, and police body cameras enable its permeation throughout all spheres of life. Such technology invites fears of constant surveillance and declining individual privacy, particularly in public areas. This study aimed to comprehensively gauge the current policies surrounding the use of face recognition in the United States. Specifically, I embarked on a two-part policy analysis. The first part examined the use of face recognition by law enforcement at the federal, state, and local levels. I found impactful regulation at the state level, the most effective of which I deemed to be moratoriums on the technology for this use case until its accuracy improves. The second part examined the consumer data privacy regulations that govern the collection, use, and distribution of our face images by public and private entities. Here too, I found the most impactful regulation at the state level, and I propose that all regulation be modeled on the Illinois Biometric Information Privacy Act, which grants consumers the ability to seek monetary compensation when their face images are misappropriated. The technical component of this project approached the problem of age-invariant cross-demographic face recognition: matching current photos against outdated ones and examining accuracy rates across demographic groups. I assembled a novel dataset of 167k face images and tested it using Amazon Rekognition, a leading commercial face recognition service. I found that this system performs worse on minority groups, in particular darker-skinned individuals of both sexes. I concluded that further development is necessary in both the public policy and the technological implementation of face recognition.

Hien Pham, COS (completed IW as a junior spring 2022)
Title of Project: Community Mesh Networks: A Local Solution to the Digital Divide

In a post-Covid-19 world, U.S. states face an unprecedented opportunity to secure funding for broadband infrastructure development. While the national conversation is focused on geographic scale and speed, the long-term resilience and community aspects of broadband should not be overlooked. Community mesh networks (CMNs) are community-owned and operated computer networks that provide affordable or free Internet access to local residents. A community-owned and operated network infrastructure not only helps tackle the broadband issue but also supports development in digital literacy, civic engagement, and emergency preparedness for the communities it serves. My paper explores how communities have deployed mesh networks and explains why community networks are a critical piece of the digital divide puzzle and should be an essential element in states’ broadband development plans. To do so, I focused on two CMNs at different stages of development in the northeastern US. I traveled to NYC and Philadelphia to visit public sites in the networks and conducted interviews with volunteer network operators to learn about their experiences. Informed by these visits and interviews, I concluded with specific actions that policymakers can take to empower CMNs and the communities they serve. Proposed actions to support CMNs at various stages include making public backhaul accessible to CMNs through an application process, implementing application pipelines for funding and partnerships between public departments and CMNs, and connecting CMNs to public and community organizations that can help provide volunteers or technical consulting.

Richard Qui, ECO
Title of Project: Airbnb’s Alarming Aftermath: An Analysis of Airbnb’s Effect on San Francisco’s Rent and Housing Market

San Francisco is facing a rent and housing crisis due to a shortage of housing units for its residents. The entry of Airbnb, a hospitality technology company, into the city has threatened to exacerbate this crisis by diverting long-term rental and housing units to short-term rentals, possibly increasing rent and housing prices in one of the most expensive cities in the United States. In particular, San Francisco Ordinance 218-14 legalized short-term rentals in the city, allowing many owners to rent out their primary units to tourists and short-term residents under numerous restrictions. This paper details how SF Ordinance 218-14 affected San Francisco’s rent and house prices, along with the overall trend of Airbnb’s effect on the city. Furthermore, this paper experiments with a form of “regulation” by determining whether limiting Airbnb rentals to either entire room/house units or private/shared room units is a viable strategy to further reduce Airbnb’s impact while still allowing Airbnb to operate and benefit tourism in the city. I conclude that Airbnb has a statistically significant effect in increasing rent and house prices in San Francisco in both the short and long term, and I recommend allowing only private/shared room units as a possible strategy to regulate Airbnb’s impact on San Francisco. This paper also summarizes the impact of a sharing-economy technology company like Airbnb on rent and housing, and argues that government should work directly with technology companies to enact these laws and minimize their impact on society, since the government alone may not be powerful enough today to enforce its own rules, as the many flaws in the enactment of Ordinance 218-14 showed.

Katelyn Rodrigues, COS
Title of Project: The Last Thing We Forget: Applying Natural Language Processing to Decode Memories Evoked by Modern Music

With the pinnacle of the digital era upon us, the widespread accessibility of streaming platforms has disrupted the music industry. Now more than ever, music has the potential to fully embrace its role as an emotionally transformative force that unites listeners regardless of their geographical location, demographic background, language, or culture. Through the lens of comment forums on online music streaming platforms, my independent work analyzes the potential for modern music genres to evoke memories and shared experiences. Inspired by a research study conducted in the Princeton University Music Cognition Lab in which participants recorded their music-evoked autobiographical memories (MEAMs), my research explored this question in the context of YouTube music video comment threads. After rigorously processing and sanitizing the comment data, I applied a series of Natural Language Processing (NLP) techniques to surface thematic elements within listeners’ comments on modern music. Beginning with an introductory TF-IDF analysis and then moving to more complex techniques such as PCA dimensionality reduction, LDA topic analysis, and visualizing the cosine similarity between YouTube comments and their corresponding song lyrics, the analysis yielded fascinating themes with shared memories at both the song and genre levels. While applying NLP techniques to original data consisting of unconstrained, freely available responses is relatively uncharted territory in the digital music space, the findings were definitive and provide a basis for further analysis of this multi-dimensional data, which can aid music video creators in developing engaging content that unites listeners globally.
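The TF-IDF and cosine-similarity steps mentioned above can be sketched in a few lines of dependency-free Python. This is a hypothetical illustration, not the project’s pipeline. It assumes a naive letters-only tokenizer and the common smoothed IDF, log((1 + N) / (1 + df)) + 1. The lyric and comment strings are invented toy examples, not data from the study.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on any non-letter character (deliberately simple).
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        vec = {}
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)  # term frequency within this document
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF (assumed variant)
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy usage with invented strings: score each comment against the lyrics.
lyrics = "we danced all summer under city lights"
comments = [
    "this song takes me back to that summer we danced all night",
    "great production on the bass line",
]
vectors = tfidf_vectors([lyrics] + comments)
for comment, vec in zip(comments, vectors[1:]):
    print(f"{cosine(vectors[0], vec):.3f}  {comment}")
```

The first comment shares several terms with the lyrics, so its score is positive. The second shares none, so it scores 0.0. A library such as scikit-learn provides tuned equivalents of both steps.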

Iroha Shirai, COS (completed IW as a junior spring 2022)
Title of Project: Analyzing Gender Biases of STEM-Related Keywords Within United States and Japanese Twitter Posts

Though there has been increasing discussion surrounding the low representation of women in STEM (Science, Technology, Engineering, and Mathematics) fields, and thus more movements to increase their representation, the number of women in STEM fields remains relatively low. Furthermore, this representation continues to differ among countries. In my work, I focus specifically on the United States and Japan, looking at how male- and female-related keywords are used in context with STEM-related keywords within US and Japanese Twitter posts. I collected and created my own datasets and trained Gensim’s Word2Vec models to create word embeddings. I then calculated cosine similarities between gender-related and STEM-related words in the word embeddings. The results showed that the difference between the male- and female-related average cosine similarities was greater in the US than in Japan for the 2019–2020 Twitter posts. In contrast, this difference was greater in Japan than in the US for the 2009–2010 Twitter posts. The results also suggested that the gap between the US and Japan in these differences was larger in 2019–2020 than in 2009–2010.
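The measure described above can be sketched in Python: the average male-to-STEM cosine similarity minus the average female-to-STEM similarity. This is a hypothetical sketch, not the project’s code. It assumes the embeddings are given as a mapping from word to vector. The function names and the tiny 2-D toy vectors are invented for illustration. A Gensim `KeyedVectors` object (for example `model.wv`) should also work as the mapping, since it supports `word in wv` and `wv[word]`.

```python
import math


def cos_sim(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_similarity(emb, group, targets) -> float:
    """Average cosine similarity over all (group word, target word) pairs in the vocabulary."""
    pairs = [(g, t) for g in group for t in targets if g in emb and t in emb]
    if not pairs:
        raise ValueError("no word pairs found in the embedding vocabulary")
    return sum(cos_sim(emb[g], emb[t]) for g, t in pairs) / len(pairs)


def gender_gap(emb, male, female, stem) -> float:
    """Male-related minus female-related average similarity to STEM words."""
    return mean_similarity(emb, male, stem) - mean_similarity(emb, female, stem)


# Toy 2-D "embeddings" (invented): "he" points the same way as "science", "she" is orthogonal.
toy = {"he": [1.0, 0.0], "she": [0.0, 1.0], "science": [1.0, 0.0]}
print(gender_gap(toy, ["he"], ["she"], ["science"]))  # 1.0
```

Under this framing, the cross-country comparison is a difference of gaps, `gender_gap(emb_us, ...) - gender_gap(emb_jp, ...)`, computed separately for each time period. The names `emb_us` and `emb_jp` are hypothetical.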

Niva Sivakumar, COS (completed IW as a junior spring 2022)
Title of Project: Understanding and Detecting Hateful Users on Twitter via Graph Theory and Machine Learning

From July to December 2020, 3.8 million tweets were removed by Twitter for falling under the category of hate speech, and 1.1 million users were flagged and suspended as hateful accounts. Recent work has typically relied on the tweets themselves, often using linguistic context and vocabulary. However, there is often a need to go beyond pure textual classification, potentially integrating features about users and social groups in addition to the content of the tweets. Using a publicly available dataset of Twitter users, this paper seeks to answer three questions: (1) What insight does the distribution of hateful users among the top 100/200 most influential users in the retweet graph of M. Ribeiro et al. provide? (2) How likely are hateful users to retweet other hateful users, and normal users to retweet other normal users? (3) Can we train a preliminary model to identify hateful users based on a limited number of numerical features, none of which relate to the actual content of their tweets? Using the PageRank algorithm, we find that hateful users seem to be just as influential as normal ones. Measuring the reciprocity of vertex-induced subgraphs, we find that hateful users are almost twice as likely to retweet each other as normal users are. After training a neural network on the annotated users over 10 numerical features, we reached 92.3% accuracy in classifying a user as “hateful” or “normal” without looking at their tweets at all. Our findings indicate that user-based classification of hateful speech on social media is effective and could strengthen or corroborate text-based classification.
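“Reciprocity of vertex-induced subgraphs” can be made concrete with a short Python sketch. Restrict the directed retweet graph to one group of users, such as all users labeled hateful. Then count the fraction of remaining edges whose reverse edge is also present. This is a hypothetical sketch, not the paper’s code. It assumes the graph is given as a list of (retweeter, retweeted) pairs, ignores self-loops, and uses an invented function name. With NetworkX, `nx.reciprocity(G.subgraph(nodes))` should give the same value on a graph without self-loops.

```python
def induced_reciprocity(edges, nodes) -> float:
    """Fraction of directed edges inside `nodes` whose reverse edge also exists.

    edges: iterable of (source, target) pairs, e.g. "u retweeted v".
    nodes: the vertex set that induces the subgraph (e.g. all hateful users).
    Self-loops are ignored; an empty subgraph returns 0.0.
    """
    node_set = set(nodes)
    # Keep only edges with both endpoints in the group (the vertex-induced subgraph).
    sub = {(u, v) for u, v in edges if u in node_set and v in node_set and u != v}
    if not sub:
        return 0.0
    reciprocated = sum(1 for u, v in sub if (v, u) in sub)
    return reciprocated / len(sub)


# Toy retweet graph (invented): h1 <-> h2 reciprocate, n1 -> n2 does not.
edges = [("h1", "h2"), ("h2", "h1"), ("n1", "n2"), ("h1", "n1")]
print(induced_reciprocity(edges, {"h1", "h2"}))  # 1.0
print(induced_reciprocity(edges, {"n1", "n2"}))  # 0.0
```

Comparing this value for the hateful-user subgraph against the normal-user subgraph gives a ratio like the “almost twice as likely” comparison reported above.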

Anna Sivaraj, COS (completed IW as a junior spring 2022)
Title of Project: Content Warnings on Social Media: An Evaluation of Instagram’s Sensitive Content Screen

As social media platforms become more ingrained in society, it is critical to mitigate the potentially negative impacts of sensitive content on users’ mental health. To help users avoid offensive or disturbing content, Instagram places a content warning, the Sensitive Content Screen (SCS), over posts with sensitive content. However, since its introduction in 2017, the SCS has remained on Instagram without significant changes to its original design and function. This independent work investigates the effectiveness of the SCS. Through a historical case study of health warning labels on cigarette packages, I derive lessons and extend them to propose improvements to the SCS. Furthermore, I survey college students to gather their impressions of the current and the proposed SCS. Drawing on the case study and survey results, I recommend ways that Meta Platforms, Instagram’s parent company, might improve the existing SCS and better protect vulnerable populations. In summary, I recommend that Meta Platforms (a) strengthen and modify the existing SCS, (b) add text to the warning message listing the categories of sensitive content that the post may fall into, and (c) implement a larger and more comprehensive strategy aimed at protecting vulnerable users from the harmful impact of exposure to potentially sensitive content on Instagram.

Morgan Teman, COS
Title of Project: You Can’t Hack Democracy: Preventing Foreign Election Interference Using the 2017 French Presidential Election as a Case Study

The 2017 French Presidential election provides a fascinating case of attempted foreign influence through a disinformation and hacking campaign that ultimately did not discourage the public from electing the targeted candidate. No single entity was entirely responsible for thwarting the attack; a combination of circumstances and efforts coincided to delegitimize what became known as #MacronLeaks, when 15 gigabytes of data were stolen from the servers of Emmanuel Macron’s campaign team and shared on the Internet. My independent work analyzes the effects of the defensive actions taken by multiple involved parties (including cyber-blurring, multi-level and encrypted communication, cybersecurity training, public transparency, and fact-checking, among others) and recommends a joint strategy to prevent such an event from recurring, including an offensive cybersecurity push by the government, secure servers for campaigns, and proactive bot investigations by social networks.

2022 Certificate Graduates Independent Work

Jeremy Bernius, SPI
Title of Project: Algorithmic (In)Justice: The Bias and Unfairness of Risk Assessment Instruments Used in Sentencing

In the age of Big Data, all areas of life rely on data analytics, algorithms, and artificial intelligence to make essential decisions and to facilitate normal operations; the criminal justice system is no different. Algorithmic risk assessment instruments (RAIs) inform nearly every stage of the criminal justice process, such as pretrial detention, corrections, and probation, and are increasingly used in sentencing. These instruments predict an offender’s risk of recidivism by feeding their demographic and behavioral data into a statistical model trained on samples from historical populations. In the sentencing context, this risk prediction then influences a judge’s decision on the type, length, and severity of the offender’s treatment.

My independent work adopts an interdisciplinary approach to examine the racial disparities and unfairness in the design and application of algorithmic risk assessment instruments used in sentencing. Specifically, I complete a cost-benefit analysis of their implementation, explore their accuracy rates and different types of errors, and critique the type of data they use. First, I draw on a case study of the Commonwealth of Virginia’s proprietary tool, the Nonviolent Risk Assessment (NVRA). Having been in use for over two decades, Virginia’s experiment has been extensively studied by state and independent researchers. In particular, I find that while the NVRA can accurately identify low-risk offenders to be diverted to non-carceral sentences, it suffers from a lack of alternative sentencing options, judicial resistance to its use, and growing racial disparities in sentencing decisions. Second, I examine the ethical and mathematical debate spurred by ProPublica’s report on the racial bias of COMPAS, a leading RAI used in several jurisdictions. After reviewing the relevant literature, I argue that model error parity, the condition in which members of different racial groups receive equal proportions of false negatives and false positives, is a salient measure of algorithmic fairness. Lastly, I rely on case law and legal theory to illustrate that the inclusion of certain demographic characteristics in risk predictions, such as an offender’s race or socioeconomic status, likely violates constitutional protections, whereas data on an offender’s behavior, such as their criminal history, is permissible. To close, I find that RAIs, as they are currently designed and implemented, foster racial bias and unfairness, and I outline several implications of my results for future policies on the use of algorithmic risk assessment instruments in sentencing.
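As a rough illustration of the error parity criterion described above, the following Python sketch computes false positive and false negative rates for each group from binary outcomes and predictions, then reports the largest between-group gap in each. The function names, the 0/1 encoding (1 = reoffended, or labeled high risk), and the toy data are illustrative assumptions, not taken from this work or from COMPAS.

```python
from collections import defaultdict

def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Encoding assumption: 1 = reoffended (y_true) or labeled high risk (y_pred).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for actual, predicted, group in zip(y_true, y_pred, groups):
        c = counts[group]
        if actual == 1:
            c["tp" if predicted == 1 else "fn"] += 1
        else:
            c["fp" if predicted == 1 else "tn"] += 1
    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["tp"] + c["fn"]  # people who did reoffend
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates

def error_parity_gaps(rates):
    """Largest between-group difference in FPR and in FNR (0 means parity)."""
    fprs = [fpr for fpr, _ in rates.values()]
    fnrs = [fnr for _, fnr in rates.values()]
    return max(fprs) - min(fprs), max(fnrs) - min(fnrs)

# Toy data: both groups have 75% accuracy, but their errors differ in kind.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = error_rates_by_group(y_true, y_pred, groups)
print(rates)                     # per-group (FPR, FNR)
print(error_parity_gaps(rates))  # (FPR gap, FNR gap)
```

In this toy example, group A's only error is a false positive and group B's only error is a false negative, so equal overall accuracy can coexist with a lack of error parity.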

Marina Beshai, COS (completed IW as a junior 2021)
Title of Project: Political Movements in the Age of Social Media: An Analysis of Twitter’s Role in the Egyptian Crisis

Governments worldwide and the media often blame social media companies, rather than the individuals involved, for civil unrest. Claiming that social media is a threat to democracy, governments heavily moderate platforms and suppress activists. Drawing on more than six million #Egypt tweets published during the 2011 Egyptian crisis, this study explores the relationship between the in-person demonstrations and the online Twitter movement to observe how these components complemented and influenced one another. The rise and fall of #Mubarak on Twitter, followed by the rise of #noscaf (No Supreme Council of the Armed Forces), shows how the grievances of protesters mirrored the topics trending online; to a certain extent, protesters were controlling the narrative. The sheer number of associated country hashtags (154 of the 195 present-day countries were associated with #Egypt), not to mention how often they were used, implies a connected, worldwide community. Natural Language Processing (NLP) showed that English speakers were consistently more negative in their tweets than their Arabic-speaking counterparts; not once did Arabic users express a more negative outlook than English users. Despite the large gap between the two groups, the correlation coefficient between the Arabic and English scores was 0.46, indicating a moderate positive linear relationship between the general moods of the two groups. A holistic analysis of tweets during the internet blackout in Egypt showed that many users around this time were increasingly concerned for the safety of protesters. On January 28, 2011, the first day of the blackout, tweet volume increased by 82,020 tweets, which comprise 1% of all 2011 tweets including #Egypt. Topic modeling showed that on seven of the ten days of the blackout, ‘freeEgypt’ or ‘freedom’ was among the most frequently used words, aptly describing users’ general attitude.
In all, the results suggest a give-and-take relationship whereby users inside the country greatly influence the platform at the start of demonstrations and, in turn, receive support and aid from users outside of the country later on.

Justin Curl, COS
Title of Project: Please Pay Attention: Using YouTube’s ad algorithm to analyze the presentation of unwanted information

How do you get people to pay attention to or process unwanted information? Our research studies the effect of user behavior (how often a user skips ads) on the number, length, and type of ads a user sees on YouTube. In our experiment, we reverse-engineered aspects of YouTube’s ad algorithm using bots built with Selenium in Python to simulate three types of user behavior: positive towards ads, in which users never skip ads; neutral towards ads, in which users skip ads 50% of the time; and negative towards ads, in which users always skip ads. Overall, while there does not seem to be a meaningful relationship between how often users skip ads and the number of ads they see, we found that users who skip ads more often are shown shorter ads that are less frequently skippable. These findings have interesting policy implications for organizations trying to convey important, though unwanted, information: make viewing your messages mandatory and keep them short.

Audrey Laude, COS
Title of Project: Performance Decay in Machine Learning: Temporal Effects across Policing, Epidemic, and Financial Forecasting

As time passes, data used to train a machine learning model often becomes “outdated,” which in turn hinders model performance, most often measured via prediction accuracy. This concept is observed and described tangentially under various terms, such as model decay and concept drift; however, none of these fully captures temporal performance decay across fields. Hence, the goal of this project is to perform an exploratory analysis of performance decay on several datasets from different fields: the Stanford Open Policing Project (SOPP; annual scale), FluSight (weekly scale), and Dow Jones performance (daily scale). For SOPP, logistic regression models were trained on Washington state data from 2009 to 2011 and tested on each year from 2012 to 2016. Although accuracy and AUC appeared to decay as years passed, this decay seemed to be intrinsic to the data. For FluSight, models’ abilities to forecast 1, 2, 3, and 4 weeks into the future, as well as the nature of the decay across this period, were analyzed, showing a negative relationship between performance and time and suggesting that an exponential model may best describe the decay. Lastly, two different models (a neural net and a random walk model) were used to make time-series forecasts of Dow Jones daily performance to see how error changes the further one is from the last known stock price. Although complex modeling techniques did not outperform the base model, the models still displayed performance decay consistent with the theoretical rate of error. Thus, this project provides initial insights into the nature of performance decay across fields and raises questions that lay the groundwork for future research in the area.
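One common way to state a theoretical error rate for the stock-price setting, assuming the price follows a Gaussian random walk with daily shock standard deviation sigma, is that the naive forecast (the last known price) has an h-day-ahead error with standard deviation sigma times the square root of h. The short simulation below checks that relationship empirically. The function names and parameters are hypothetical, and this is not the project's forecasting code.

```python
import math
import random

def naive_forecast_rmse(horizons, sigma=1.0, n_paths=20000, seed=0):
    """Empirical RMSE of the 'last known price' forecast on simulated random walks.

    Each path starts at the last known price (normalized to 0) and adds
    independent Gaussian shocks with standard deviation sigma each day, so
    the forecast error at horizon h is simply the path's value after h steps.
    """
    rng = random.Random(seed)
    wanted = set(horizons)
    sq_err = {h: 0.0 for h in horizons}
    for _ in range(n_paths):
        price = 0.0
        for step in range(1, max(horizons) + 1):
            price += rng.gauss(0.0, sigma)
            if step in wanted:
                sq_err[step] += price ** 2
    return {h: math.sqrt(sq_err[h] / n_paths) for h in horizons}

def theoretical_rmse(h, sigma=1.0):
    """For a Gaussian random walk, the h-step error std is sigma * sqrt(h)."""
    return sigma * math.sqrt(h)

for h, rmse in naive_forecast_rmse([1, 4, 9, 16]).items():
    print(f"h={h:2d}  simulated={rmse:.3f}  theoretical={theoretical_rmse(h):.3f}")
```

Error that grows with the square root of the horizon is one concrete form of "performance decay": forecasts four days out are, on average, about twice as far off as forecasts one day out.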

Yu Jeong Lee, ECO
Title of Project: “Belong Anywhere”? A Dynamic Model of Airbnb Expansion and the Affordable Housing Crisis in New York City

Since its inception as an air mattress rental service in 2008, Airbnb has redefined our understanding of hospitality by opening residential homes to tourists. As of 2019, Airbnb offers over 7 million listings around the world, empowering tourists to “Belong Anywhere,” per the company’s motto. However, by crowding the residential housing market with seasonal vacation rentals catering to tourists, housing advocates argue, the sharing economy giant is making it increasingly difficult for long-term residents to find anywhere to belong. According to the New York Housing Authority, competition between vacation rentals and residential units is driving the City’s shortage of affordable housing to a “crisis point,” with Airbnbs comprising up to 20% of the rental inventory in certain neighborhoods.

My research evaluates the effectiveness of New York City’s 2016 Multiple Dwelling Law banning the advertisement of entire-property rentals for less than thirty days on online marketplaces, including Airbnb. Specifically, I evaluate the effectiveness of the policy intervention in the context of Airbnb’s introduction of the Smart Pricing feature in 2015, a dynamic pricing algorithm that adapts listing prices to maximize host revenues. I find that although Smart Pricing increased median host revenues by $218 on average across the five boroughs, it did not pose a significant counter-incentive to the rental regulation policy. In fact, through a difference-in-differences study comparing short-term rentals in New York City and neighboring Jersey City, NJ, I find that the 2016 policy decreased the share of illegal (short-term, entire-property) Airbnb listings in New York by 2.9%, a decrease equivalent to 26% of the rental inventory growth in New York between 2011 and 2017. This finding suggests the advertisement ban was effective in curbing short-term rentals that crowd out much-needed housing supply, though it is unclear whether these Airbnb units reverted to housing units or became longer-term vacation rentals. Nonetheless, by incorporating Airbnb’s Smart Pricing feature to evaluate the incentives surrounding policy compliance and enforceability, I expand upon existing literature on online marketplace dynamics, platform design, and their implications for policy-making.
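The difference-in-differences logic can be sketched in a few lines of Python. The change in the treated city's outcome is compared with the change in the control city's outcome over the same period, so trends shared by both cities cancel out. The function name and the numbers below are hypothetical listing shares, not the study's data, and the estimate is only meaningful under the usual parallel-trends assumption (that New York and Jersey City would have moved together in the absence of the policy).

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences estimate from group observations."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical shares of short-term, entire-property listings (not real data).
nyc_pre, nyc_post = [0.30, 0.32], [0.26, 0.28]
jc_pre, jc_post = [0.20, 0.22], [0.21, 0.23]
effect = diff_in_diff(nyc_pre, nyc_post, jc_pre, jc_post)
# A negative estimate means the treated share fell relative to the control city.
print(f"Estimated policy effect: {effect:+.3f}")
```

In practice the same estimate is usually obtained from a regression with treatment, period, and interaction terms, which also yields standard errors; the arithmetic above shows only the core comparison.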

Yana Mihova, SPI (completed IW as a junior 2021)
Title of Project: Bill Gates, Drinking Bleach and 5G Radiation: The Role of Right-Wing Media in Spreading Coronavirus Misinformation

From the initial reporting of the COVID-19 virus in early 2020, misinformation fueled the pandemic by spreading doubts about its authenticity. Due to the novelty of the pandemic, there was a gap in research on how the type of media an individual consumes affects their belief in misinformation about the pandemic. Interested in the ways that media consumption can shape societal perspectives on a particular topic, I investigated the relationship between the spread of COVID-19 misinformation on media platforms and the resulting consequences of this misinformation in society. I examined this relationship by observing the type of news source an individual consumed and their likelihood to endorse COVID-19 misinformation, as measured by belief in COVID-19 conspiracy theories and distrust in public health officials. My analysis found a statistically significant positive relationship between reporting consumption of only right-leaning media and the tendency to endorse COVID-19 misinformation. Taken together with previous research indicating that right-leaning media reported significantly more COVID-19 misinformation than moderate and left-leaning media, my findings indicate a correlation between the reporting of false information and the likelihood of endorsing COVID-19 misinformation. This study brings to light the dangers of fact-less reporting and its detrimental effects on societal outcomes.

Lindsey Moore, COS
Title of Project: Top Attrition Factors To Be Addressed in Female Engineering Interventions: A Comparative Study of Current and Switched Out Female BSE Students at Princeton University

How do we fix the leaky pipeline of female engineering students at the college level? My research studies which attrition factors engineering interventions must address in order to lower the female engineering attrition rate at Princeton University. I conducted a survey of current BSE students and of students who chose to switch out of BSE to determine consistent characteristics of persistent engineering students at Princeton University. The survey identified nine consistent characteristics of persistent female engineering students. The top three attrition factors related to these characteristics were students’ pre-collegiate academics, self-efficacy/self-confidence, and social support. These factors should be addressed in female engineering interventions to help lower the attrition rate. I also recommend possible interventions that could address these factors, such as offering Calculus I to students before their first year and encouraging more female BSE students to take the EGR sequence of introductory classes.

Sara Sacks, SPI
Title of Project: Infertility is Not Just a Rich, White Woman’s Problem: Addressing Disparities in Access to Assisted Reproductive Technologies

My independent work considers disparities in infertility and how they correspond to broader disparities in health. The research offers a synthesis of scholarly literature on health disparities along the axes of socioeconomic status (SES), race and ethnicity, and region. I argue that the trends in health disparities pertaining to fertility care correspond to those found in the general health disparity literature along these dimensions. The research further analyzes recent policies proposed in Congress to alleviate disparities in access to assisted reproductive technologies (ART), specifically the policy dubbed “IVF for All” and the Infertility Awareness Act.

Tara Shawa, SOC
Title of Project: Technological Gender Socialization: Examining Gender Representation and Reinforcement through the YouTube Kids Recommendation System

My independent work research lies at the intersection of childhood gender socialization, media effects, and digital technology. The research question is twofold: first, how is gender represented in content on the YouTube Kids platform? Second, to what extent does the recommendation system reinforce these gendered representations? To address these questions, I begin with a review of literature in the key fields that intersect on this topic: first gender theory and childhood socialization theory, then media effects and digital media, and lastly the platform through which they converge, YouTube Kids. Grounded in this cumulative knowledge, I conduct a mixed-methods analysis, composed of qualitative case studies and quantitative content analysis, to examine representations of gender and their potential exacerbation across various types of content. I find that gendered representations on YouTube Kids reflect normative constructions of gender, and that the platform’s recommendation system has the potential to suggest increasingly gendered content to users.

Rachel Sylwester, COS
Title of Project: Are We Fair Yet? A Practical Analysis of Bias and Barriers to Fairness in Mortgage Lending

My independent work investigates observed bias in the U.S. mortgage lending system, where two underwriting algorithms are used to make approval decisions on 90% of mortgage applications. Recent work has exposed disparate rates of approval between racial groups by these supposedly “colorblind” underwriting algorithms. Using real loan application data, this research explores the effectiveness of an algorithmic fairness intervention in mitigating the system’s disparate impact on minorities. I implement disparate impact remover, a preprocessing method that edits features in the data to improve group fairness while preserving relative rank within groups. The results suggest that an algorithmic intervention such as disparate impact remover is effective at reducing the disparate denial rate of Black applicants without significant effects on accuracy or total cost. If implemented, such an intervention could affect millions of applicants each year due to the industry-wide use of automated underwriting systems. However, this paper finds that the effectiveness of an algorithmic intervention, such as the one explored here, is substantially limited by structural and legal barriers. Thus, my research results in recommendations not only for technical steps to mitigate automated underwriting system bias but also for regulatory and legal changes to enable and support such measures.
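The core idea of disparate impact remover can be sketched for a single numeric feature. Each value is mapped to its quantile within its own group, then replaced by a common target value at that quantile (here, the median across groups of each group's value at that quantile), optionally blended with the original value by a repair level between 0 and 1. This is a simplified, hypothetical implementation for illustration only. It is not the code used in this work, and library implementations (for example, the one in IBM's AIF360 toolkit) may differ in details such as quantile interpolation.

```python
from bisect import bisect_left
from statistics import median

def group_quantile(sorted_vals, q):
    """Nearest-rank empirical quantile of an already-sorted list, q in [0, 1)."""
    idx = min(int(q * len(sorted_vals)), len(sorted_vals) - 1)
    return sorted_vals[idx]

def disparate_impact_repair(values, groups, repair_level=1.0):
    """Repair one feature so its distribution is (partially) equalized across groups.

    Within each group, a value's quantile never decreases as the value increases,
    so the relative rank of individuals inside a group is preserved.
    """
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    sorted_by_group = {g: sorted(vs) for g, vs in by_group.items()}

    repaired = []
    for v, g in zip(values, groups):
        own = sorted_by_group[g]
        q = bisect_left(own, v) / len(own)  # share of own group strictly below v
        # Common target: median of every group's value at the same quantile.
        target = median(group_quantile(s, q) for s in sorted_by_group.values())
        repaired.append((1 - repair_level) * v + repair_level * target)
    return repaired

# Hypothetical feature values for two groups with very different distributions.
vals = [1, 2, 3, 4, 10, 20, 30, 40]
grps = ["A"] * 4 + ["B"] * 4
print(disparate_impact_repair(vals, grps))       # full repair: both groups match
print(disparate_impact_repair(vals, grps, 0.5))  # partial repair
```

The repair level captures the trade-off the abstract alludes to: higher values make the feature less informative about group membership, while lower values keep more of the original signal.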

Henry Vecchione, COS (completed IW as a junior 2021)
Title of Project: Pan-app-ticon: What to Do About Ring’s Partnerships with Police Departments

I take issue with how Ring, the Internet-connected security camera company owned by Amazon, has pursued mutually beneficial partnerships with local police departments. The partnerships incentivize police to distribute Ring devices in their communities and grant the police access to a “Law Enforcement Portal” that enables them to select an area on a map, specify a time window of up to 12 hours, and send requests for footage from those hours to Ring owners in that area. I argue that cost and inefficiency are a significant barrier to surveillance creep and that this interface reduces that cost too much. I support this argument with two Supreme Court cases, U.S. v. Knotts (1983) and U.S. v. Jones (2012), the comparison of which illustrates how technological advancements can fundamentally change one’s expectation of privacy and the invasiveness of criminal investigation. I then examine the ACLU’s Community Control Over Police Surveillance (CCOPS) model bill and real legislation based on it, which require public approval for new surveillance technologies. I find that much of this legislation does not adequately protect against connected surveillance devices like Ring because they are not a “new technology” but rather an old technology made harmful by how it is used and the efficiency it creates. This allows Ring to bypass approval. I then propose changes that Ring can make to its products and changes that legislatures can make to their bills to minimize harm. I suggest that Ring could use image recognition to blur faces in video sent to police, removing the blur only on order from a judge, or it could change the law enforcement interface to prohibit bulk requests or require more information. Legislatures should also alter how “new technology” is defined, requiring reapproval if a technology increases surveillance efficiency by a meaningful degree even if it resembles an approved technology on the surface.

Jacqueline Xu, ORF
Title of Project: Promoting Sustainable Habits: A Network Analysis of Hard-to-Maintain Behaviors 

People’s decisions, actions, and opinions are manifestations not only of their personal values but also of their social environments. Extensive research by sociologists and mathematicians over the past century has culminated in a network diffusion model that explains how individuals weigh the utility of available options based on their personal preferences and the perceived decisions of their local network. But while this model is a good simplification for consumer behaviors (one-off decisions that have immediate consequences), it does not adequately represent the diffusion of hard-to-maintain (HTM) behaviors, which include healthy habits like exercising regularly and environmental behaviors like becoming vegetarian. Unlike consumer behaviors, these decisions require persistent cues for upkeep and statistically exhibit slower rates of adoption. As such, they cannot be directly understood through existing diffusion models. This paper modifies the models of previous studies to analyze the spread of habitual behaviors by incorporating a parameter for behavioral loss. Compared to simple behavioral diffusion, the results indicate that final adoption levels are generally similar while adoption rates are much slower. These findings suggest that low adoption levels at one point in time do not necessarily predict low levels at a future date.
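A minimal sketch of network diffusion with a behavioral-loss parameter follows, assuming a simple utility rule: a non-adopter adopts when their private preference plus a weight (beta) times the share of adopting neighbors is positive, and each adopter independently lapses with probability loss_rate per step. The function, parameter names, update rule, and ring network are hypothetical illustrations, not the model used in this paper.

```python
import random

def simulate_htm_diffusion(neighbors, preference, beta, loss_rate, steps, seed=0):
    """Return the fraction of adopters after each step.

    neighbors[i]: list of node i's neighbors (undirected network)
    preference[i]: node i's private utility for adopting (may be negative)
    beta: weight on the share of neighbors who currently adopt
    loss_rate: hypothetical per-step probability that an adopter lapses
    """
    rng = random.Random(seed)
    n = len(neighbors)
    adopted = [False] * n
    history = []
    for _ in range(steps):
        next_state = adopted[:]  # synchronous update from the previous state
        for i in range(n):
            nbrs = neighbors[i]
            share = sum(adopted[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
            if adopted[i]:
                # Behavioral loss: the habit lapses with probability loss_rate.
                if rng.random() < loss_rate:
                    next_state[i] = False
            elif preference[i] + beta * share > 0:
                # Adopt when the utility of adopting beats the utility (0) of not adopting.
                next_state[i] = True
        adopted = next_state
        history.append(sum(adopted) / n)
    return history

# Example: a 20-node ring with one enthusiastic seed (hypothetical parameters).
n = 20
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
prefs = [0.5] + [-0.3] * (n - 1)
no_loss = simulate_htm_diffusion(ring, prefs, beta=1.0, loss_rate=0.0, steps=15)
with_loss = simulate_htm_diffusion(ring, prefs, beta=1.0, loss_rate=0.2, steps=15)
print(no_loss[-1], with_loss[-1])
```

Under this particular rule, the adopter set with loss can never exceed the no-loss adopter set at the same step, because lapsed individuals must be re-triggered by their neighbors. Loss therefore slows the spread; how far final levels fall depends on the parameters chosen.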

Melody Zheng, COS (completed IW as a junior 2021)
Title of Project: Analyzing the Digital Divide: A Quantitative and Qualitative Study of Six United States Cities

As society grows increasingly dependent on information and communication technologies (ICTs), it becomes crucial to address the digital divide still present in many communities. In my work, I focused on identifying a digital access policy that the city of Oakland, California should adopt. To do so, I compared the initiatives of three cities in the same population range that have been “successful,” in that the rate of Internet and computer access for historically underserved populations increased from 2015 to 2019, with the initiatives of three cities that have not been successful. Using data from the U.S. Census Bureau’s American Community Survey, I tracked the rates of Internet and computer access for five different demographics over the five-year period and chose the three cities with the highest average rate and the three with the lowest average rate. I then analyzed qualitative data to identify whether the selected cities focused on Internet access, computer access, and/or digital literacy training in their digital access initiatives, although two of the three unsuccessful cities had little to no such information publicly available. When comparing the three successful cities to the remaining unsuccessful city, Oakland, I found that while all four cities generally addressed all three aspects, the successful cities placed a greater focus on community resources. Therefore, I argue that Oakland should invest more resources in digital literacy programs and publicly available ICTs, especially since the city was not able to offer entirely free home Internet plans and digital devices. Community resources and technology classes would be accessible to a greater number of households and could lead to improved financial situations as well.

Previous Certificate Graduates’ Independent Work