T&S Certificate – IT Track

Certificate Student Independent Work

2023 Certificate Student Independent Work

Rohan Jinturkar, COS (completed IW as a junior 2022)
Title of Project: Investigating Racial Bias Trends in the Text of US Legal Opinions

There are many instances of racially biased outcomes in the American legal system. However, it is unclear if such bias also exists in the text of judicial opinions, and if it varies across time periods and regions. We approximate GloVe word embeddings for legal opinions at the federal and state level from 1860 to 2019. We find evidence of racial bias across nearly all regions and time periods, as traditionally Black names are more closely associated with negative/unpleasant terms whereas traditionally White names are more closely associated with positive/pleasant terms. We do not find evidence that older opinions exhibit more bias, or that opinions from Northeastern states show greater change in racial bias over time compared to Southern states. These results counter the principle of impartiality in legal settings and demonstrate the need for further research into institutionalized racial bias. Lastly, we survey approaches for reducing bias across the legal system.

Colton Loftus, COS (completed IW as a junior 2022)
Title of Project: Analyzing Experiences with Speech Controlled Accessibility Software and Developing a Solution for the Linux Desktop

For individuals with disabilities affecting the use of their hands, typing and using a mouse can be not only inconvenient, but also painful. This problem is especially prevalent within the software ecosystem of the open source operating system, Linux. While Windows and macOS both have proprietary disability software for controlling the computer through voice, Linux users do not have access to these same proprietary solutions. In my independent work, I developed a voice-controlled accessibility program that can help solve this issue. My program can be used for a wide array of actions across the Linux desktop. It can control windows, press keys, dictate text, and much more. It can also be customized, or run user-supplied scripts, to perform new behavior. While developing my program, I also wanted to better understand how to design and implement policies pertaining to accessibility software. To do this I held a series of software demos with my program. By the end of my research, I developed a series of key takeaways: (1) While accessibility designers don’t always have the resources to train their own models, they can nonetheless design applications with modular and customizable behavior. In my application, users can interact with the machine learning backend, switch models, and customize command names. All of these choices push back on the idea that the voice recognition backend should be a black box for users. (2) Graphical user interface (GUI) programs should limit mouse usage and prefer keyboard shortcuts when possible. Throughout my software demo, it was particularly difficult for users to mimic commands like dragging or dropping through just voice. However, keyboard shortcuts were quick and easy for both voice and hands to perform. (3) Workspace designers need to take into consideration those with alternative input methods. Throughout the development process, it was difficult to find public places quiet enough to use voice-controlled software while not disturbing others. In addition to my open source code, these conclusions from my policy research will help future designers create more accessible workspaces, communities, and software applications.

Hien Pham, COS (completed IW as a junior 2022)
Title of Project: Community Mesh Networks: A Local Solution to the Digital Divide

In a post-Covid-19 world, U.S. states face an unprecedented opportunity to secure funding for broadband infrastructure development. While the national conversation is focused on geographic scale and speed, the long-term resilience and community aspect of broadband should not be overlooked. Community mesh networks (CMNs) are community-owned and operated computer networks that provide affordable or free Internet access to local residents. A community-owned and operated network infrastructure not only helps tackle the broadband issue, but also provides essential development in digital literacy, civic engagement, and emergency preparedness for communities it serves. My paper explores how communities have deployed mesh networks and explains why community networks are a critical piece of the solution to the digital divide puzzle and should be an essential element in states’ broadband development plans. To do so, I focused on two CMNs at different development stages in the US north-east region. I traveled to NYC and Philadelphia to visit public sites in the networks and conducted interviews with volunteer network operators to learn about their experiences. I concluded with specific actions that policymakers can take to empower CMNs and communities they serve, informed by the visits and interviews. Some proposed actions to support CMNs at various stages include making public backhaul accessible to CMNs through an application process, implementing application pipelines for funding and partnerships between public departments and CMNs, and connecting CMNs to public and community organizations that can help provide volunteers or technical consulting.

Iroha Shirai, COS (completed IW as a junior 2022)
Title of Project: Analyzing Gender Biases of STEM-Related Keywords Within United States and Japanese Twitter Posts

Though there has been increasing discussion surrounding the low representation of females in STEM (Science, Technology, Engineering, and Mathematics) fields and thus more movements to increase female representation in STEM fields, the number of females in STEM fields remains relatively low. Furthermore, there continues to be a difference in these representations among different countries. In my work, I focus specifically on the United States and Japan, looking at how male- and female-related keywords are used in context with STEM-related keywords within US and Japanese Twitter posts. I collected and created my own datasets and trained Gensim’s Word2Vec models to create word embeddings. Then, cosine similarities are calculated between gender-related and STEM-related words in the word embeddings. The analyzed results showed that the calculated difference between male- and female-related average cosine similarities was greater in the US than in Japan for the 2019 – 2020 Twitter posts. In contrast, this difference was greater in Japan than in the US for the 2009 – 2010 Twitter posts. The results also suggested that the difference of these calculated differences between the US and Japan was larger in 2019 – 2020 than in 2009 – 2010.
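The comparison described above (average cosine similarity between gender-related and STEM-related words, then the male-minus-female difference) can be illustrated with a minimal Python sketch. The toy vectors, word lists, and function names below are invented for illustration and are not taken from the project; in practice the vectors would come from the trained word embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_similarity(group_words, stem_words, vectors):
    """Average cosine similarity over every (group word, STEM word) pair."""
    sims = [cosine_similarity(vectors[g], vectors[s])
            for g in group_words for s in stem_words]
    return sum(sims) / len(sims)

# Hypothetical 3-dimensional embeddings (assumption: real ones have far more dimensions).
toy_vectors = {
    "he": [1.0, 0.0, 0.0],
    "she": [0.0, 1.0, 0.0],
    "engineer": [1.0, 1.0, 0.0],
    "science": [1.0, 0.0, 1.0],
}

stem_words = ["engineer", "science"]
male_mean = mean_similarity(["he"], stem_words, toy_vectors)
female_mean = mean_similarity(["she"], stem_words, toy_vectors)

# A positive gap means the male-related words sit closer to the STEM words.
gap = male_mean - female_mean
print(f"male={male_mean:.3f} female={female_mean:.3f} gap={gap:.3f}")
```

Computing this gap once per country and time period would give the quantities that the abstract compares.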

Niva Sivakumar, COS (completed IW as a junior 2022)
Title of Project: Understanding and Detecting Hateful Users on Twitter via Graph Theory and Machine Learning

From July to December 2020, 3.8 million tweets were removed by Twitter for falling under the category of hate speech, and 1.1 million users were flagged and suspended as hateful accounts. Recent work has typically depended on the tweets themselves, often using linguistic contexts and vocabulary. However, there’s often a need to go beyond pure textual classification, potentially integrating features about users and social groups in addition to the content of the tweets. Using a publicly available dataset of Twitter users, this paper seeks to answer three questions: (1) What insight does the distribution of hateful users in the top 100/200 influential users of the retweet graph of M. Ribeiro et al. provide? (2) How likely are hateful users to retweet other hateful users, and normal users to retweet other normal users? (3) Can we train a preliminary model to identify hateful users based on a limited number of numerical features, none of those having to do with the actual content of their tweets? Using the PageRank algorithm, we find that hateful users seem to be just as influential as normal ones. Using the reciprocity of vertex-induced subgraphs, we find that hateful users are almost twice as likely to retweet each other as normal users. After training a neural network on the annotated users over 10 numerical features, we were able to reach 92.3% accuracy in classifying a user as “hateful” or “normal” without looking at their tweets at all. Our findings indicate that user-based classification of hateful speech on social media is effective and could strengthen or corroborate text-based classification.
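As a rough illustration of the reciprocity measurement mentioned above, the sketch below restricts a directed retweet graph to one group of users (a vertex-induced subgraph) and reports the fraction of edges whose reverse edge also exists. This is one common definition of reciprocity and is assumed here; the abstract does not state the exact formula used. The toy edge list and user labels are invented.

```python
def induced_subgraph(edges, nodes):
    """Keep only the directed edges whose endpoints are both in `nodes`."""
    nodes = set(nodes)
    return {(u, v) for (u, v) in edges if u in nodes and v in nodes}

def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present."""
    edges = set(edges)
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)

# Hypothetical retweet graph: (u, v) means "u retweeted v".
retweets = {
    ("a", "b"), ("b", "a"),   # mutual retweets inside the first group
    ("a", "c"),               # edge crossing between the groups
    ("c", "d"), ("d", "c"),   # mutual retweets inside the second group
    ("d", "e"),               # one-way retweet inside the second group
}
hateful_users = {"a", "b"}        # labels invented for this example
normal_users = {"c", "d", "e"}

hateful_reciprocity = reciprocity(induced_subgraph(retweets, hateful_users))
normal_reciprocity = reciprocity(induced_subgraph(retweets, normal_users))
print(hateful_reciprocity, normal_reciprocity)
```

Comparing the two values indicates which group retweets its own members more mutually; the numbers from this toy graph say nothing about the real dataset.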

Anna Sivaraj, COS (completed IW as a junior spring 2022)
Title of Project: Content Warnings on Social Media: An Evaluation of Instagram’s Sensitive Content Screen

As social media platforms become more ingrained in society, it is critical to mitigate the potentially negative impacts of sensitive content on users’ mental health. In order to help users avoid offensive or disturbing content, Instagram places a content warning, the Sensitive Content Screen (SCS), over posts with sensitive content. However, since its introduction in 2017, the SCS continues to appear on Instagram without significant changes to the original design and function. This independent work investigates the effectiveness of the SCS. By conducting a historical case study of health warning labels on cigarette packages, I derive lessons from health warning labels on cigarette packages and extend them to propose improvements to the SCS. Furthermore, I survey college students to gather their impressions about the current and the proposed SCS. Drawing from the case study and survey results, I suggest recommendations for how Meta Platforms, Instagram’s parent company, might improve the existing SCS and better protect vulnerable populations. In summary, I recommend that Meta Platforms (a) strengthen and modify the existing SCS, (b) add text to the warning message with the categories of sensitive content that the post may fall into, and (c) implement a larger and more comprehensive strategy aimed at protecting vulnerable users from the harmful impact of exposure to potentially sensitive content on Instagram.

2022 Certificate Student Independent Work

Jeremy Bernius, SPI
Title of Project: Algorithmic (In)Justice: The Bias and Unfairness of Risk Assessment Instruments Used in Sentencing

In the age of Big Data, all areas of life rely on data analytics, algorithms, and artificial intelligence to make essential decisions and to facilitate normal operations; the criminal justice system is no different in this regard. Algorithmic risk assessment instruments (RAIs) inform nearly every stage of the criminal justice process, such as pretrial detention, corrections, and probation, and are increasingly used in sentencing. These instruments predict offenders’ risk of recidivism after inputting their demographic and behavioral data in a statistical model trained on samples from historical populations. In the sentencing context, this risk prediction then influences a judge’s decision on the type, length, and severity of the offender’s treatment.

My independent work research adopts an interdisciplinary approach to research the racial disparities and unfairness in the design and application of algorithmic risk assessment instruments used in sentencing. Specifically, I complete a cost-benefit analysis on their implementation, explore their accuracy rates and different errors, and critique the type of data they use. First, I draw on a case study of the Commonwealth of Virginia’s proprietary tool called the Nonviolent Risk Assessment (NVRA). With over two decades of use, state and independent researchers have extensively studied Virginia’s experiment. In particular, I discover that while the NVRA can accurately predict low-risk offenders to be diverted to non-carceral sentences, it suffers from a lack of alternative sentencing options, judicial resistance to its use, and growing racial disparities in sentencing decisions. Second, I dive into the ethical and mathematical debate spurred by ProPublica’s report on the racial bias of COMPAS, a leading RAI used in several jurisdictions. After reviewing relevant literature, I argue that model error parity, the state of members across racial groups receiving equal proportions of false negatives and false positives, is a salient measure of algorithmic fairness. Lastly, I rely on case law and legal theory to illustrate that the inclusion of certain demographic characteristics in risk predictions, such as an offender’s race or socioeconomic status, likely violates constitutional protections whereas data on an offender’s behavior, such as their criminal history, is permissible. To close, I find that RAIs, as they are designed and implemented, foster racial bias and unfairness, and I outline several implications my results have for future policies on the use of algorithmic risk assessment instruments in sentencing.
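Model error parity, as defined above, can be checked by computing the false positive rate and false negative rate separately for each group and comparing them. The following Python sketch does this on invented records; the group labels, outcomes, and function names are assumptions for illustration and do not come from COMPAS, the NVRA, or the project itself.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

def rates_by_group(records):
    """records: iterable of (group, actual outcome, predicted high risk)."""
    grouped = {}
    for group, actual, predicted in records:
        truths, preds = grouped.setdefault(group, ([], []))
        truths.append(actual)
        preds.append(predicted)
    return {g: error_rates(t, p) for g, (t, p) in grouped.items()}

# Hypothetical data: 1 = reoffended / predicted high risk, 0 = otherwise.
records = [
    ("group_a", 0, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 1, 1),
    ("group_b", 0, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 1, 1),
]

rates = rates_by_group(records)
for group, (fpr, fnr) in sorted(rates.items()):
    print(f"{group}: FPR={fpr:.2f} FNR={fnr:.2f}")
```

In this toy data both groups have the same overall accuracy (three of four correct), yet the errors fall differently: one group receives more false positives and the other more false negatives. Error parity is meant to surface exactly that kind of gap.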

Marina Beshai, COS (completed IW as a junior 2021)
Title of Project: Political Movements in the Age of Social Media: An Analysis of Twitter’s Role in the Egyptian Crisis

Governments worldwide and the media often blame social media companies for civil unrest rather than the associated individuals. Claiming social media to be a threat against democracy, governments heavily moderate platforms and suppress activists. Drawing on more than six million #Egypt tweets published during the 2011 Egyptian crisis, this study explores the relationship between the in-person demonstrations and the online Twitter movement to observe how these components complemented and influenced one another. The rise and fall of #Mubarak (on Twitter) with the subsequent rise of #noscaf (No Supreme Council of the Armed Forces) shows how the grievances of protesters mirrored the topics trending online. They were controlling the narrative to a certain extent. And the sheer number of associated country hashtags (154 out of the 195 present-day countries were associated with #Egypt), never mind the use, implies a connected, worldwide community. Natural Language Processing (NLP) showed that English speakers were consistently more negative in their tweets than their Arabic counterparts. Not once did Arabic users express a more negative outlook than their English counterparts. Despite the large gap between the two groups, the correlation coefficient between the Arabic and English scores was 0.46, so there was a strong linear relationship between the general moods of the two parties. And a holistic analysis of tweets during the internet blackout in Egypt showed that many users around this time were increasingly concerned for the safety of protesters. On January 28, 2011, the first day of the internet blackout in Egypt, the frequency of tweets greatly increased by 82,020 tweets, which comprise 1% of the total tweets including #Egypt in 2011. Topic modeling showed that on seven out of the ten days of the internet blackout in Egypt, ‘freeEgypt’ or ‘freedom’ was one of the most frequently used words, which aptly describes users’ general attitude.
In all, results suggested that there is a give and take relationship whereby users inside the country greatly influence the platform at the start of demonstrations, and in turn, receive support and aid from users outside of the country later on.

Justin Curl, COS
Title of Project: Please Pay Attention: Using YouTube’s ad algorithm to analyze the presentation of unwanted information

How do you get people to pay attention to or process unwanted information? Our research studies the effect of user behavior — how often a user skips ads — on the amount, length, and type of ads a user sees on YouTube. In our experiment, we reverse engineered aspects of YouTube’s ad algorithm using bots built with Selenium in Python to simulate three types of user behavior: positive towards ads, in which users never skip ads; neutral towards ads, in which users skip ads 50% of the time; and negative towards ads, in which users always skip ads. Overall, we found that while there does not seem to be a meaningful relationship between how often users skip ads and the number of ads users see, the users who skip ads more often are shown shorter ads that are less frequently skippable. These findings have interesting policy implications for organizations trying to convey important, though unwanted information: make viewing your messages mandatory and keep them short.

Audrey Laude, COS
Title of Project: Performance Decay in Machine Learning: Temporal Effects across Policing, Epidemic, and Financial Forecasting

As time passes, data used to train a machine learning model often becomes “outdated” which in turn hinders model performance, most often measured via prediction accuracy. This concept is observed and described tangentially with various terms, such as model decay and concept drift, among others; however, none of these completely capture the whole picture of temporal performance decay across fields. Hence, the goal of this project is to perform an exploratory analysis of performance on several datasets in different fields: the Stanford Open Policing project (annual scale), FluSight (weekly scale), and Dow Jones performance (daily scale). For the Stanford Open Policing project, logistic regression models were trained on Washington state data from 2009-2011 and tested on each year between 2012-2016. Whereas there appeared to be decay in the accuracy and AUC as years passed in the SOPP data, this seemed to be intrinsic to the data. For FluSight, models’ abilities to forecast 1, 2, 3, and 4 weeks into the future as well as the nature of the decay across this time period were analyzed, showing a negative relationship between performance and time and suggesting that an exponential model may describe this best. Lastly, two different models (a neural net and random walk model) were used to make time-series forecasts on the Dow Jones daily performance to see how error changes the further away one is from the last known stock price. Although complex modeling techniques did not outperform the base model, these still displayed performance decay consistent with the theoretical rate of error. Thus, this project provides initial insights with respect to the nature of performance decay across fields and asks questions that lay the groundwork for future research directions in the area.

Yu Jeong Lee, ECO
Title of Project: “Belong Anywhere”? A Dynamic Model of Airbnb Expansion and the Affordable Housing Crisis in New York City

Since its inception as an air mattress rental service in 2008, Airbnb has redefined our understanding of hospitality by opening residential homes to tourists. As of 2019, Airbnb offers over 7 million listings around the world, empowering tourists to “Belong Anywhere,” per the company’s motto. However, by crowding the residential housing market with seasonal vacation rentals catering to tourists, housing advocates argue the sharing economy giant is making it increasingly difficult for long-term residents to find anywhere to belong. According to the New York Housing Authority, competition between vacation rentals and residential units is driving the City’s shortage of affordable housing to a “crisis point,” with Airbnbs comprising up to 20% of the rental inventory in certain neighborhoods.

My research evaluates the effectiveness of New York City’s 2016 Multiple Dwelling Law banning the advertisement of entire-property rentals for less than thirty days on online marketplaces, including Airbnb. Specifically, I evaluate the effectiveness of the policy intervention in the context of Airbnb’s introduction of the Smart Pricing feature in 2015, a dynamic pricing algorithm that adapts listing prices to maximize host revenues. I find that though Smart Pricing increased median host revenues by $218 on average throughout the five boroughs, it didn’t pose a significant counter incentive to the rental regulation policy. In fact, through a difference-in-differences study comparing short-term rentals in New York City and neighboring Jersey City, NJ, I find that the 2016 policy decreased the share of illegal (short-term, entire-property) Airbnb listings in New York by 2.9%, or a decrease equivalent to 26% of the rental inventory growth in New York between 2011 and 2017. This finding suggests the advertisement ban was effective in curbing short-term rentals that crowd out much-needed housing supply, though it is unclear whether these Airbnb units were reverted to housing units or longer-term vacation rentals. Nonetheless, by incorporating Airbnb’s Smart Pricing feature to evaluate the incentives surrounding policy compliance and enforceability, I expand upon existing literature on online marketplace dynamics, platform design, and its implications for policy-making.
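The core arithmetic behind a difference-in-differences comparison like the one above is short enough to sketch in Python. The estimate is the change in the treated city minus the change in the comparison city over the same period. All numbers and names below are invented for illustration; they are not the study's data, and a full analysis would typically use a regression with controls rather than four group means.

```python
def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change

# Hypothetical shares (in percent) of short-term, entire-property listings.
nyc_before, nyc_after = 10.0, 8.0        # treated: subject to the 2016 ban
jersey_before, jersey_after = 9.0, 9.5   # control: no ban

effect = difference_in_differences(nyc_before, nyc_after, jersey_before, jersey_after)
print(f"Estimated policy effect: {effect:+.1f} percentage points")
```

Subtracting the comparison city's change removes trends common to both cities, which is why the method rests on the assumption that the two would have moved in parallel without the policy.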

Yana Mihova, SPI (completed IW as a junior 2021)
Title of Project: Bill Gates, Drinking Bleach and 5G Radiation: The Role of Right-Wing Media in Spreading Coronavirus Misinformation

From initial reporting of the COVID-19 virus in early 2020, misinformation fueled the pandemic by spreading doubts about its authenticity. Due to the novelty of the pandemic, there was a gap in research on the effects of type of media consumption and its impact on believing misinformation about the pandemic. Since I was interested in investigating the ways that media consumption can impact societal perspectives on a particular topic, I decided to investigate the relationship between the spread of COVID-19 misinformation on media platforms and the outstanding consequences in society of this misinformation. I looked at this relationship by observing the type of news source an individual consumed and their likelihood to endorse COVID-19 misinformation, as measured by belief of COVID-19 conspiracy theories and distrust in public health officials. My analysis found that a statistically significant positive relationship existed between individuals who reported consuming only right-leaning media and their tendency to endorse COVID-19 misinformation. When taken into context with previous research indicating right-leaning media reported significantly more COVID-19 misinformation than moderate and left-leaning media, my findings indicate a correlation between reporting of false information and likelihood to endorse COVID-19 misinformation. This study brings to light the dangers of fact-less reporting and how it can have detrimental effects on societal outcomes.

Lindsey Moore, COS
Title of Project: Top Attrition Factors To Be Addressed in Female Engineering Interventions:  A Comparative Study of Current and Switched Out Female BSE Students at Princeton University

How do we fix the leaky pipeline of female engineering students at the college level? My research studies which attrition factors engineering interventions must address in order to lower the female engineering attrition rate at Princeton University. I conducted a survey of current BSE students and students who chose to switch out of BSE to determine consistent characteristics of persistent engineering students at Princeton University. From this survey, nine consistent characteristics of persistent female engineering students were found. The top three attrition factors related to these characteristics were students’ pre-collegiate academics, self-efficacy/self-confidence, and social support. These factors should be addressed in female engineering interventions to help lower the attrition rate. I also recommended some possible interventions that could address these factors, such as offering Calculus I to students before their first year and encouraging more female BSE students to take the EGR sequence of introductory classes.

Sara Sacks, SPI
Title of Project: Infertility is Not Just a Rich, White Woman’s Problem: Addressing Disparities in Access to Assisted Reproductive Technologies

My independent work considers disparities in infertility and the way they correspond to the broader disparities in health. The research offers a synthesis of scholarly literature about health disparities along the axes of socioeconomic status (SES), race and ethnicity, and region. I argue that the trends in health disparities pertaining to fertility care correspond to those found in the general health disparity literature around these specific dimensions. The thesis further analyzes the recent policies proposed in Congress to alleviate these disparities in access to ART, specifically the policy dubbed “IVF for All” and the Infertility Awareness Act.

Tara Shawa, SOC
Title of Project: Technological Gender Socialization: Examining Gender Representation and Reinforcement through the YouTube Kids Recommendation System

The focus of my independent work research lies in the intersection between childhood gender socialization, media effects, and digital technology. The research question is twofold: first, how is gender represented in content on the YouTube Kids platform? And second, to what extent does the recommendation system reinforce these gendered representations? To address these questions, I begin with a review of literature in the key fields that intersect on this topic. I start with gender theory and childhood socialization theory, then media effects and digital media, and lastly the platform through which they convene: YouTube Kids. Grounded in this cumulative knowledge, I conduct a mixed-method analysis, comprised of qualitative case studies and quantitative content analysis, to examine representations of gender and their potential exacerbation throughout various types of content. I find that gendered representations on YouTube Kids reflect normative constructions of gender, and that there is potential for the platform’s recommendation system to suggest increasingly gendered content to users.

Rachel Sylwester, COS
Title of Project: Are We Fair Yet? A Practical Analysis of Bias and Barriers to Fairness in Mortgage Lending

This thesis investigates observed bias in the U.S. mortgage lending system, where two underwriting algorithms are used to make approval decisions on 90% of mortgage applications. Recent work has exposed disparate rates of approval between racial groups by these supposedly “colorblind” underwriting algorithms.  Using real loan application data, this research explores the effectiveness of an algorithmic fairness intervention in mitigating the system’s disparate impact on minorities. I implement disparate impact remover, a preprocessing method, which edits features in the data to improve group fairness while preserving relative rank within groups. The results suggest that an algorithmic intervention, such as disparate impact remover, is effective at reducing the disparate denial rate of Black applicants without significant effects on accuracy or total cost. If implemented, such an intervention could affect millions of applicants each year due to the industry-wide use of automated underwriting systems. However, this paper finds that the effectiveness of an algorithmic intervention, such as the one explored in this paper, is substantially limited by structural and legal barriers. Thus, my research results in recommendations for technical steps to mitigate automated underwriting system bias but also regulatory and legal changes to enable and support such measures.
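One simple way to quantify the disparate approval rates discussed above is the disparate impact ratio: the approval rate of one group divided by that of a reference group. The Python sketch below computes that ratio on invented decisions. It illustrates the metric only and is not an implementation of the disparate impact remover itself; the group names and data are hypothetical.

```python
def approval_rate(decisions):
    """Share of applications approved (1 = approved, 0 = denied)."""
    return sum(decisions) / len(decisions)

def disparate_impact_ratio(group_decisions, reference_decisions):
    """Approval rate of a group relative to a reference group."""
    reference_rate = approval_rate(reference_decisions)
    if reference_rate == 0:
        raise ValueError("reference group has no approvals; ratio is undefined")
    return approval_rate(group_decisions) / reference_rate

# Hypothetical underwriting outcomes for two applicant groups.
group_a = [1, 0, 1, 0]       # 2 of 4 approved
group_b = [1, 1, 1, 0, 1]    # 4 of 5 approved (reference group)

ratio = disparate_impact_ratio(group_a, group_b)
print(f"Disparate impact ratio: {ratio:.3f}")
```

A ratio of 1 would mean equal approval rates; values below 1 mean the first group is approved less often than the reference group. Comparing the ratio before and after a fairness intervention is one way to judge whether the gap narrowed.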

Henry Vecchione, COS (completed IW as a junior 2021)
Title of Project: Pan-app-ticon: What to Do About Ring’s Partnerships with Police Departments

I take issue with how Ring, the Internet-connected security camera company owned by Amazon, has pursued mutually beneficial partnerships with local police departments. The partnerships incentivize police to distribute Ring devices in their communities and grant the police access to a “Law Enforcement Portal” that enables them to select an area on a map, specify up to a 12-hour window of time, and send requests for footage from those hours to Ring owners in that area. I argue that cost and inefficiency are a significant barrier to surveillance creep and that this interface reduces that cost too much. I support this argument with two Supreme Court cases, U.S. v. Knotts (1983) and U.S. v. Jones (2012), the comparison of which illustrates how technological advancements can fundamentally change one’s expectation of privacy and the invasiveness of criminal investigation. I then examine the ACLU’s Community Control Over Police Surveillance (CCOPS) model bill and real legislation based on it, which require public approval for new surveillance technologies. I find that much of it doesn’t adequately protect against connected surveillance devices like Ring because they are not a “new technology”, but rather an old technology that is harmful in how it’s used and in the efficiency it creates. This allows Ring to bypass approval. I then propose changes that Ring can make to their products and changes that legislatures can make to their bills to minimize harm. I suggest that Ring could use image recognition to blur faces in video sent to police, only removing the blur on order from a judge, or it could change the law enforcement interface to prohibit bulk requests or require more information. Legislatures should also alter how “new technology” is defined, requiring reapproval if a technology increases surveillance efficiency to a meaningful degree even if it resembles an approved technology on the surface.

Jacqueline Xu, ORF
Title of Project: Promoting Sustainable Habits: A Network Analysis of Hard-to-Maintain Behaviors 

People’s decisions, actions, and opinions are manifestations of not only their personal values, but their social environments as well. Extensive research by sociologists and mathematicians in the past century has culminated in a network diffusion model that explains how individuals weigh the utility of available options based on their personal preferences and the perceived decisions of their local network. But while this model is a good simplification for consumer behaviors (one-off decisions that have immediate consequences), it does not adequately represent the diffusion of hard-to-maintain (HTM) behaviors, which include healthy habits like exercising regularly and environmental behaviors like becoming vegetarian. Unlike consumer behaviors, these decisions require persistent cues for upkeep and statistically exhibit slower rates of adoption. As such, they cannot be directly understood through existing diffusion models. This paper modifies previous studies to analyze the spread of habitual behaviors by incorporating a parameter for behavioral loss. Compared to simple behavioral diffusion, the results indicate that final adoption levels are generally similar while adoption rates are much slower. These findings suggest that low adoption levels at one point in time do not necessarily predict low levels at a future date.

Melody Zheng, COS (completed IW as a junior 2021)
Title of Project: Analyzing the Digital Divide: A Quantitative and Qualitative Study of Six United States Cities

As society grows increasingly dependent on information and communication technologies (ICTs), it becomes crucial to address the digital divide still present in many communities. In my work, I focused on identifying a digital access policy that the city of Oakland, California should adopt. To do so, I compared the initiatives of three cities in the same population range that have been “successful,” such that the rate of Internet and computer access for historically underserved populations has increased from 2015 to 2019, with the initiatives of three cities that have not been successful. Using data from the U.S. Census Bureau’s American Community Survey, I tracked the rates of Internet and computer access for five different demographics over the five-year period and chose the three cities with the best average rate and the three with the lowest average rate. I then analyzed qualitative data to identify whether the selected cities focused on Internet access, computer access, and/or digital literacy training in their digital access initiatives, although two of the three unsuccessful cities had little to no such information publicly available. When comparing the three successful cities to the remaining unsuccessful city, Oakland, I found that while the four cities generally addressed all three aspects, the successful cities had a greater focus on community resources. Therefore, I argue that Oakland should invest more resources in digital literacy programs and publicly available ICTs, especially since the city was not able to offer entirely free home Internet plans and digital devices. Community resources and technology classes would be more accessible to a greater number of households and hopefully lead to improved financial situations as well.

2021 Certificate Students Independent Work

Yaw Asante, COS
Title of Project: Evaluating and Contextualizing Network-Based Analysis of Drug Response in Cancer Dependency Genes

Computationally assessing which genes cancer needs to propagate itself is a much-researched topic at the juncture of computational and medical science. To contribute to this area, I sought to build a software tool capable of assessing cancer dependency by extending the foundation of a tool called NetMix, which solves a related problem. Additionally, I sought to examine the broader context in which tools like these may be applied in clinical medicine. For my first contribution, I designed a NetMix-based software process called CADEGA and compared its performance to that of a peer algorithm called NETPHIX. This work demonstrated CADEGA’s limited performance overall, though with a potential for finding functional correlations which differed from those of NETPHIX. For my second contribution, I conducted an overview of the real-world context in which methods like CADEGA and NETPHIX would apply to the field of data-enabled healthcare. This analysis demonstrated the expansive efforts being made or planned in technical infrastructure as well as the blind spots present in existing laws surrounding genetic data and in the equitable development of these resources for rural facilities.

Bevin Benson, COS
Title of Project: Restricted Content: A Technical Guide to Internet Censorship in the Age of Social Media

The growth of social media platforms poses new challenges to governments seeking to control information online. Historically, governments have relied on a toolkit of technical methods to censor content on the web, such as IP blocking, DNS tampering, and deep-packet filtering. These methods are ineffective at blocking specific content on social media platforms. As a result, many governments have turned to sending “content removal requests” to these platforms as a means of restricting material that they consider objectionable. My independent work outlines the technical methods of Internet censorship, focusing on how governments can block content using IP/port blocking, DNS tampering, and deep-packet filtering, and examines the relationship between governments and three major Internet platforms – Facebook, Twitter, and Google – vis-à-vis content removal requests. Using Python, it conducts an exploratory analysis of the datasets in the transparency reports released by the three platforms to uncover what data the platforms release to the public. It finds that the platforms, particularly Facebook, lack transparency about the requests they receive and their guidelines for content removal. Twitter releases the greatest amount of data on content removal requests, including links to the content in question, yet this data is difficult to access and poorly organized. Additionally, it examines trends in the number of content removal requests submitted by a subset of 13 countries chosen based on geographic diversity, size, Internet freedom, and the number of content removal requests submitted. Specifically, it finds that there is no significant correlation between Internet freedom and the number of content removal requests, but that Turkey and Russia send the greatest number of content removal requests to Internet platforms.

Justin Chang, COS
Title of Project: The Role of International Consensus in Cyber Attribution

With so many people relying on the critical infrastructures and data housed in cyberspace, cyber attacks have the potential to harm extremely large numbers of civilians. Yet international regulations on these attacks remain largely nonexistent, as there exists no binding agreement on what states can or cannot do in cyberspace. My independent work explores the role that international consensus can have in cyber attribution, a necessity for maintaining a secure cyberspace. By looking at examples of past attacks, I present the inherent limitations of technical attribution tools and techniques, arguing that international collaboration can improve the speed and efficiency of attribution. In response to the difficulties in achieving such a consensus, I argue for the creation of an international body tasked with attributing cyber attacks, as such a body can still improve the process of cyber attribution without the support of all major cyber actors.

Edward Elson, CLA
Title of Project: The Idea of Progress in Antiquity

My independent work investigates whether (and how) an idea of technological progress might have been understood by Mediterranean societies in early and late antiquity. Some scholars have posited that an idea of progress simply did not exist in the ancient world, and that the institutional capabilities of Ancient Greek and early Roman societies were perceived by their people to be static, neither developing nor accelerating over time. My independent work refutes that argument, drawing from the “Golden Age” theory of Hesiod, the lesser-known personal accounts of Xenophanes (whose allusions to a collective cultural and intellectual evolution quite clearly demonstrate that an idea of progress was – at the very least – in his own mind), the philosophical works of Plato, tragic excerpts from Sophocles, and finally, a poem of human history provided by the Roman Lucretius. My analysis consists of a series of close readings of these texts, supplemented by the existent but scant philological scholarship on the subject, and ultimately makes clear that an idea of progress certainly did exist in antique thought, but not in the way that it might exist today. Technological and institutional achievement were thought, I argue, neither to better nor to worsen the overall conditions of the ancient human experience, but to complicate it exponentially into the future. With added achievement came, according to the ancients, an added depth of problems, ambitions, interests, and values, many of which were thought to conflict with each other. I draw these ideas from the fragments of Xenophanes and demonstrate how they echoed through Plato’s Laws, Sophocles’ “Ode to Man,” and Lucretius’ On Nature.

Isabella Faccone, ORFE
Title of Project: Tools to Understand 2016 Voter Influence Tactics in Comparison with the 2020 Election: Applications of Network Topology, Information Cascades and Rumor Recurrence

Social media platforms have a fundamental impact on how society receives and exchanges information around political events, specifically elections. These social media networks amplified misinformation during both the 2016 election and the 2020 election despite new control mechanisms. This research determines the key frameworks for understanding the network climate that enabled such amplification and misinformation, relying on veracity, amplification and recurrence to draw distinctions between the 2016 and 2020 elections. In order to evaluate these criteria, I constructed a 2016 Twitter dataset based on previous research, and I found a 2020 election dataset for Twitter that was updated weekly with keywords, trends, politicians, and new trackers for the duration of the 2020 election period. I used these datasets to evaluate cascades, the main trends on the network and the sheer mass of activity around politicians and rumors. Critically, this research demonstrates that both the role of individuals in information cascades and the features of the rumors that propagate pervasively have a large impact on the likelihood that a rumor will recur in a given network. This research shows that false rumors propagate faster and recur more often than true rumors in both the 2016 and 2020 elections, but draws a distinction between unilateral and interactive information dissemination models to demonstrate the differing effect that amplifiers have on propagation and recurrence. For rumors that disseminate via a unidirectional traditional news outlet shared via links on Twitter, the effect of a high number of verified users is limited. However, for rumors that propagate as retweets and quoted replies, which have a multi-directional and interactive model, the effect of a high number of verified users participating in the cascade was very pronounced.
Thus the properties of the rumor, its veracity and the specific subset of the population through which the rumor passes each have an effect on that rumor’s overall impact and exposure to a given network of users. This research’s findings are critical to the future of social media platforms as they grapple with the persistence of misinformation amidst a highly volatile and nuanced digital politics arena.

Kevin Feng, COS
Title of Project: Lowering the Barrier for Web Advertisement Research at Scale

Web advertisements are essential to the day-to-day operation of the internet, providing a key channel of revenue to websites that offer content at little to no cost. However, they are also common sources of deception, scams, and privacy violations. Given their significance, ads are of interest to many different groups of experts, including web researchers, communications scholars, and regulators, but their fleeting nature makes them difficult to study systematically and at scale. This independent work presents AdOculos, a technical system comprising a search interface powered by automated visual analysis tools and a continuously updated, large-scale archive of ads crawled from thousands of popular websites. By using the system to uncover novel research questions, dimensions of analysis, and policy recommendations, I demonstrate how AdOculos and its underlying tools enable expanded possibilities in ad research.

Grace Hong, ECO
Title of Project: The Effect of Google Fiber’s Entry on Student Educational Outcomes in Kansas City

In 2010, the private tech company Google disrupted the broadband market by partnering with individual cities to offer high-speed fiber Internet through Google Fiber. In my research, I study the impact of Google Fiber’s installation in Kansas City in 2011 on student educational outcomes in Missouri’s public schools through two studies: an intra-city study of Kansas City and an inter-city study between Kansas City and St. Louis in pre- and post-Fiber periods. The intra-city study used a fixed effects regression model and highlighted mixed effects of Fiber on education. However, the inter-city study, using a difference-in-differences regression, showed that post-Fiber Kansas City experienced lower percentages of students scoring in the worst category (Below Basic) and greater percentages of students scoring in higher categories (Proficient). As a result, this study illustrates that Fiber’s entry may be correlated with higher test performance, especially for those who were performing in the lowest categories to start with, and it provides a stronger case for continuing to close the digital divide across the United States.

Gabrielle Jabre, POL
Title of Project: Social Media as a Narrative Battlefield: An Investigation into the 2019 Lebanese Protests

In non-democracies, civil society and the regime battle to dominate the narrative on social media. During the 2019 Lebanese Protests, social media became a place for narrative warfare between civil society and the regime. My research asks: did social media play a sectarian-reducing or sectarian-enhancing role, and what were its effects on mass mobilization? My research design is twofold: qualitative interviews and a quantitative analysis of a portion of Twitter data. I conducted 13 Zoom interviews with social media activists, physical activists, journalists, politicians and independent media center directors. The interviewees were asked questions on both the sectarian-reducing and sectarian-enhancing role of social media. Furthermore, during these interviews, seven broad categories were discussed: social media activism, the importance of social media for the protestors, social media as a narrative battlefield, online information corruption and its impact, whether the regime fought back online, social media’s overall effect on the protests, and freedom of speech. Overall, participants argued that social media played a sectarian-reducing role, disseminated a civic narrative among a global Lebanese network, and facilitated collective and connective action. Furthermore, all interviewees argued that online information corruption was propagated along sectarian narratives that discredited the protests, but that it was not a compelling enough reason for the deterrence of mass mobilization. Instead, most interviewees argued that the reduction in collective action was due to violence, the COVID-19 national lockdown and economic barriers. To further investigate the relationship between sectarianism and online information corruption, I substantiated the interview results with a quantitative analysis. My logistic regression model indicated a statistically significant positive relationship, as judged by its p-value, between sectarianism and a false narrative online.
Therefore, the data analysis corroborated the interviewees’ insights that false online narratives were sectarian. My results highlight that a civic narrative dominated social media and played a constructive role in greater collective and connective Lebanese action against the regime. On the other hand, the results also show that online information corruption was used as a sectarian narrative tool to discredit the protests, but this was not enough to deter collective action. Thus, social media’s democratic nature benefitted a civic narrative, but also served regime manipulation.

Watson Jia, COS
Title of Project: Consistency and Distributed Gateways in IoT Environments

Distributed systems have become ubiquitous in our modern computing world, with applications ranging from telecommunications to computer networks. The Internet of Things (IoT) has integrated technology with many physical objects in our everyday lives, with applications ranging from smart home technologies to medical applications. My independent work attempts to combine two increasingly important fields in modern computing – distributed systems and the IoT – and investigates applications of distributed systems in IoT environments by leveraging multiple IoT gateways as a distributed system. This project explores fault tolerance and data consistency, which have large implications for reliability and scalability in applications that rely on both distributed systems and the IoT. This could especially impact industrial systems and infrastructural applications. I aimed to modify multiple Mozilla WebThings smart home gateways to act as a distributed system, implement a fault tolerance scheme, and identify consistency issues in smart home IoT devices within this system. Quality of service metrics of the distributed system in the form of latency measures show consistent, reasonable delays between gateways, with no large deviations from the mean. A fault tolerance scheme, in which one gateway takes over the IoT devices of another gateway that had gone offline, was able to add all devices from the offline gateway to the new gateway, and the new gateway was able to control half of the devices added. Consistency issues caused by network connectivity problems and event reorderings were identified and possible solutions were found.

Christy Lee, COS
Title of Project: When a Virus Goes Viral: A Study on the Efficacy of Using Twitter Analysis to Forecast COVID-19 Cases

In a short period of time, COVID-19 has completely transformed the landscape of global health, economics, and society. Given the enormity of this impact, it has become crucial to more effectively prepare for and act against COVID-19; improving our ability to forecast case counts is one method of doing so. My independent work discusses a forecasting model which aims to quantify an aspect of social response in order to build a more well-rounded predictor of case trends. Because COVID-19 is spread primarily through person-to-person contact, shifts in social response to the virus can affect social behavior, and thus subsequent case numbers. By analyzing Twitter data for sentiment and frequency, the model takes into account one measure of social attitudes and behaviors towards COVID-19. This data is considered in conjunction with reported COVID-19 case data and state demographic information, inputted into a feedforward neural network model for regression, and ultimately used to forecast positive cases 3, 7, and 14 days into the future.

Austin Mejia, IND
Title of Project: Lucky Break: Regulating Loot Boxes in Video Games

Over the past four years, loot boxes have skyrocketed in popularity. These virtual crates of in-game items have become a mainstay of the video game industry, generating over $30 billion in 2019. However, with their meteoric rise come concerns over their impact on gamers, as a growing body of evidence suggests that loot boxes are addictive. Though other nations like China and Hungary have already introduced legislation to regulate loot boxes, the U.S. has yet to establish a policy response, with no viable regulations foreseeable within the next year. My research seeks to propose a compelling and comprehensive policy response to loot boxes. Whereas many proposals solely focus on combating potential addiction, this recommendation additionally examines the structure of loot boxes and how they are embedded with “dark patterns,” or designs intended to trick players into spending more money. Ultimately, I recommend new regulations that create a stricter digital marketplace, requiring developers to disclose the odds of their loot boxes and implementing strict currency expectations. This recommendation aims to lay a flexible foundation upon which future regulation can build as our understanding of loot boxes continues to progress.

Sean-Wyn Ng, COS
Title of Project: Pose2Pose2: Pose Selection and Transfer for Full-Body Character Animation

To convert a video of a real-life human subject into an animation, artists often watch an original performance video of the subject many times in order to determine which body poses they tend to hold. Artists must also choose optimal points of transition between body poses within the animation. In this project, I explored the design of animation systems that are less manually intensive, which could potentially make animation more accessible to the general public by lowering the time required as a barrier to entry. I created Pose2Pose2, a system inspired by Pose2Pose, a tool that automatically extracted and clustered two-dimensional upper-body pose data from a subject within a video, displaying them on a user interface in order of frequency of occurrence for more efficient visualization. However, Pose2Pose2 also has the ability to track full-body poses, as well as identify both two- and three-dimensional pose data within a video featuring a human subject. Pose2Pose2 also includes additional features within the user interface, such as grouping rotation-normalized three-dimensional poses together and marking poses that are visually similar to poses selected by the user. Users select poses from the interface and use them as references to draw cartoonized versions of the subject holding the selected poses. Pose2Pose2 uses the drawings to convert a new video featuring the same subject into an animation.

Vedika Patwari, COS
Title of Project: Evaluating the Impact of Data Localization on Technological Innovation in India

Cross-border data flows are playing an increasingly important role in supporting a globally digitized economy, and yet countries are attempting to regulate the flow of data through localization mandates. Using India as a case study, I examine the impact of localization on company operations and technological innovation. India’s dynamic policy environment, and its unique combination of a large digital economy and an emerging data center industry, offer insight into how localization impacts growth and innovation in countries with relatively lower levels of digital infrastructure. Given that emerging economies are also turning towards restricting the free flow of data, this is an important context within which to study data localization. Existing studies analyze the impact of localization at the national level, and there is a need to better understand how localization plays out at a company level. Thus, I conducted semi-structured interviews with executives at various financial technology companies in India to understand the impacts of localization. I find that there is a high level of compliance with the localization mandate across all company sizes. Additionally, companies with local operations are able to localize their data with greater ease when compared to companies with global operations. This varied impact of localization is not addressed in existing literature and may cause multinational companies to opt out of markets with localization restrictions. I also identify an over-reliance on data centers located in Mumbai; this geographic centralization of data is a key vulnerability in the Indian financial ecosystem. In order to mitigate some of the identified risks, I recommend public and private investments to increase the availability and geographic spread of India’s data center infrastructure. Further research with more companies and in different countries is necessary to build upon my findings and to better inform future policies on data localization.

Carlotta Platt, SPI
Title of Project: Containing the Contagion: Determinants of Government Response to the First Wave of the COVID-19 Pandemic in Europe

The COVID-19 Pandemic has spared no country in the world, causing almost three million deaths in its first year. Yet governments were unprepared for and responded very differently to iterations of the same virus. My research uses quantitative and qualitative analysis to investigate what factors determined national variation of first-wave policy response to the COVID-19 pandemic in European countries, and what led to response effectiveness. I hypothesize that four groups of factors (Governmental, Political, Societal and Economic) will be significant in explaining (1) national variations in response intensity, (2) national variations in response quality, and will interact to determine (3) response effectiveness. Overall, I find that these four groups of factors explain variation in response intensity and quality, and that they interact to determine an effective response. Among them, (1) decentralization with strong intergovernmental coordination and central guidance, with reliance on few but communicative scientific experts, (2) strong leadership with low pressure from the opposition, (3) high trust and low media misinformation, and (4) a strong economy that is able to quickly increase healthcare capacity, combine to determine an effective response: one which best couples intensity with quality to avoid high numbers of cases and deaths. On a societal level specifically, I use Instagram and Twitter analysis to study how high media coverage of the pandemic, when used to spread misinformation, related to a less effective response.

Harmit (Hari) Raval, COS
Title of Project: Security and reliability implications of imprecise programming language specifications: A case-study of GPU schedulers

Programming language specifications are the rules that govern how programs behave; developers use these specifications to reason about their program properties, including safety and security concerns. When these specifications are imprecise, programmers develop applications that can behave in surprising ways. It is possible for malicious actors to exploit these surprising behaviors, causing significant societal impact. One of the most widespread parallel computing devices is the graphics processing unit (GPU). While these devices were classically used for graphics computations, they are now able to handle more general-purpose compute applications. Rapid evolution, coupled with the increasing diversity of these devices, leads to underspecified programming languages. This situation is ripe for security vulnerabilities given how widespread GPUs are. My research focuses on an underspecified area of GPU specification, the scheduler. Through our work in creating a thorough GPU testing framework and automatically constructing hundreds of multi-threaded test cases, we discovered instances where the scheduler can lock up. When the GPU locks up, we found that many different behaviors can be exhibited, ranging from a simple graphics reboot to the machine freezing completely. The latter example provides a direct pathway for a security vulnerability. By embedding our litmus tests in a mobile device application, we demonstrate the ability for such an application to leverage its low-level system access and cause visual information leakage. Overall, our work identifies serious security concerns in modern GPU devices. These concerns have severe societal implications given the prevalence of GPUs in modern systems: most people interact with their most private information, for example through daily use of a smartphone, on devices that contain these powerful yet underspecified processors.

Lauren Tang, COS
Title of Project: Towards the Democratization of Finance in the Context of Stock Trading

There is a growing market of millennials becoming more interested in personal finance and stock trading, especially with the onset of the COVID-19 pandemic. Many stock market brokers are making changes to their platforms to capture the millennial market. Robinhood, a high-tech trading platform, strives to bring novice investors onto its platform, stating that its mission is to “democratize finance for all.” We need to consider this: What does democratizing finance truly mean and has this goal been reached? I argue that the democratization of finance requires two parts: access and education. People need to be able to access financial systems and have the financial literacy to understand how to skillfully navigate them. I analyze differences between Robinhood and older incumbent brokers such as Charles Schwab to determine whether access to markets has increased. Additionally, I explore what financial literacy tools exist for investors and what more can be done by brokers. Robinhood has made great strides towards increasing access through its introduction of commission-free trading to the industry, which has led other brokers to follow suit. Robinhood also utilizes gamification tactics and an emphasis on UI/UX. Examples of this can be seen in its sign-up and trading process when compared to older incumbents. Issues arise when inexperienced investors don’t understand certain dangers associated with trading, such as tax liabilities and the downsides of commission-free trading (specifically on Robinhood). Commission-free trading poses a financial harm to investors because Robinhood practices payment for order flow (PFOF), which can result in users on its platform receiving worse prices on trade execution. While other brokers also engage in this practice, they pass along PFOF benefits to their users, but it is unclear whether Robinhood does this.
Across brokers in general, tax liability is another issue for novice investors because they may not understand how stock trading is taxed based on the transaction and don’t know of tax offsetting practices. A clear example of these dangers arose in the GameStop short squeeze event in early 2021. Brokers provide access to markets but fail to provide tools for financial literacy. New investors seeking financial advice often independently look to other investors through Reddit forums, enabling a community of financial learning, but this channel of information is not always reliable. In order to continue making progress towards the democratization of finance for all, Robinhood and other brokers need to play an active role in educating their users by providing them with tools for learning and smarter investing.

Ethan Thai, ELE
Title of Project: Dr. AI: Adapting CNN Classification for the Technical and Social Challenges of Medical Diagnosis

Diagnosing medical images is a time-, cost-, and labor-intensive task traditionally only undertaken by an expert few. Fortunately, through the development of artificial intelligence (AI) and accessibility to medical datasets, convolutional neural networks (CNNs) have become increasingly suitable for learning to conduct computer-assisted diagnosis (CAD). However, learning for medical classification comes with the unique technical challenges of low data volume, class imbalance, inconsistent labeling, and having fine image details differentiate multiple diagnoses, as well as the social challenges of respecting patient data usage and combating algorithmic bias. In this independent work, I designed (with others on this research project) a training methodology specifically tailored to the medical domain by integrating transfer-learning, dataset cleaning, and synthetic data augmentation techniques. Through evaluation of color channel variations in images used to pre-train a model, implementation of an iterative dataset cleaning scheme, and use of DeepInversion to synthesize patient-decoupled training data, small but compounding improvements to classification performance are shown. Finally, through the gained experience of developing a CAD methodology and contextualization of medical AI research in prevalent social and legal discussions, a set of privacy- and bias-conscious design principles is introduced.

Ryan Yao, COS
Title of Project: Safeguarding Consumer Privacy: Analysis of Data Obfuscation Mechanisms to Prevent Ubiquitous Network Tracking

The rapid rise of modern consumer Internet platforms has largely been enabled by the development of lucrative targeted network advertising models. However, these model-based platforms have engaged in unprecedented user data collection and extensive network tracking, which collectively threaten individual consumer privacy. In the absence of sufficient general data privacy regulation, a new class of user-oriented data obfuscation privacy tools has quickly grown. Using two popular data obfuscation tools (TrackMeNot, a search obfuscation tool, and AdNauseam, an ad clickstream obfuscation tool) as a lens, my independent work examines the ways in which data obfuscation, the production and inclusion of fake data to mask real data, can be applied to anonymize user data, deny data collection, and fundamentally disrupt excessive and unsanctioned network tracking. Grounded in a review of recent literature, my research explores data obfuscation as a potential alternative to the otherwise narrow and exclusive focus on privacy regulation in previous work. Analysis of the ability of data obfuscation to prevent ubiquitous network tracking places it as a means of incentivizing the adoption of more responsible data collection practices and advertising models which respect existing and future privacy standards. Ultimately, I recommend new policy initiatives, including the implementation of regulatory protections for consumer data obfuscation tools, the prevention of exclusive platform self-regulation, and the creation of regulation which works in conjunction with data obfuscation. These recommendations aim to serve as a principled foundation for use of data obfuscation in safeguarding the future of consumer privacy.

Anika Yardi, ORFE
Title of Project: Using Monte Carlo Markov Chain Methods to Understand the Mathematics and Visualization of Gerrymandering in Politically Competitive Districts

Gerrymandering is a technique used to give an unfair advantage to any one political party through the process of manipulating district lines in order to dilute the voting power of an opposing political party. Known by their bizarre shapes, gerrymandered districts are thought to be easily recognizable. However, this is not always the case, and it can be incredibly difficult to tell whether victories in particular areas occur due to legislative wrongdoing or are a natural political outcome. This is where mathematics can help. In my research, I worked to form a framework for analysis of redistricting plans of Maryland, Pennsylvania, and North Carolina. Firstly, using the powerful technique of Monte Carlo Markov Chain Methods, I came to the conclusion that while all three of my selected states have elements of gerrymandering in their redistricting plans, North Carolina and Pennsylvania are extreme examples of the technique. Furthermore, I compared court-mandated redistricting plans for these two states and determined that the implemented remedial plans were not the fairest and least extreme option of the proposed plans. Finally, I tackled the problem of gerrymandering from a policy perspective and isolated effective elements to combat gerrymandering from bills being proposed in North Carolina and Pennsylvania, which include independent commissions, required mechanisms for public hearings, and criteria for fairness like compactness.

Noa Zarur, COS
Title of Project: Profit Maximization and Food Waste Reduction

Almost half of the food in the United States goes to waste. The current solutions bakeries and restaurants have for avoiding food waste are not sufficient. The goal of my project is to reduce food waste and maximize profit for bakeries algorithmically, focusing on bread as my primary commodity of study. A challenge with creating such an algorithm is that it relies on the number of customers that show up at a bakery and buy bread at each point in the day. My approach recursively calls a function that returns the optimal number of breads to make in each time slot. By recursively calling my core function, I run through every possible profitable outcome given all combinations of regular customers, leftover customers, and regular breads. To test my algorithm, I used average statistics for the number of time slots available per day to bake bread, the number of customers per day, and the cost of making each loaf of bread. My algorithm also allows users to customize it by inputting the maximum number of customers that typically come to their bakery, their price for regular bread, their price for leftover bread, and the number of leftover breads they start the day with. The results return the optimal number of regular breads to make, the expected profit from the regular bread and leftover bread, and the total profit for each time point. Next steps include adding features that make this algorithm easy for bakeries to use and ready for deployment. Further applications of the project might include expanding its usage to restaurants and pharmacies.

Previous Certificate Graduates’ Independent Work