2023 Certificate Graduates Independent Work

Christien Ayers, MUS
Title of Project: The Effect of the Internet on Music Consumption Patterns

The Internet, more than 30 years after its public release, is generally understood to have played a significant role in changing the way global society now organizes itself. In the ways people today communicate, learn, spend their time, relate, and consume, the Internet has proved pervasive. My independent work analyzes how society’s relationship to music consumption has been impacted by the Internet. Through a survey of sociological literature on older music consumption patterns, I account for the ways in which society consumed music before the Internet became a tool for consuming and sharing music. Then, I consider Napster and its effect on the music industry in order to understand how the Internet radically altered consumers’ relationship to music. From there, I use Napster as a case study to draw conclusions on consumers’ relationship with music in the 21st century vis-à-vis the Internet, providing insight into patterns that have remained the same and those that have changed.

Chloe Chen, COS 
Title of Project: Increasingly Ambiguous Privacy Policies: Concerns and Regulations

Privacy policies are a critical and ever-evolving tool for understanding how companies collect, use, and store data. However, while many regulations exist in various jurisdictions for ensuring privacy policies contain necessary content, there is virtually no regulation regarding the nature of the language that is used in privacy policies. In this paper, I examine previous research on the negative effect that ambiguous language in privacy policies can have on user understanding of privacy risk. I then analyze the ambiguity of language in existing privacy policies using Python, and find that privacy policies have evolved to include proportionally more ambiguous language over time. Finally, I assess the indirect impact that the General Data Protection Regulation (GDPR) may have had on the ambiguity of privacy policy language, and explore other ways that the language of privacy policies can be quantified and regulated.

Elizabeth Dorman, COS
Title of Project: Dream Garden: Exploring Location-Based, Collaboratively-Created Augmented Reality Spaces

Despite the potential for connecting strangers in the digital realm, current research has not explored location-based augmented reality experiences that enable strangers to connect by building artifacts collaboratively. In my independent work I created Dream Garden, an augmented reality (AR) application that lets people place 3D flowers into the physical world to build a collaborative location-based garden. Anyone with the app can see the flowers previously planted by strangers and plant their own flowers to grow the garden. I evaluated the app with 10 participants, 5 of whom visited the digital garden more than once, to assess their sense of connection to the other participants as each added one digital flower to the garden. I found that participants were joyful about building a shared space, were excited about the dynamic nature of the garden, felt a connection to the physical location of the digital garden, and expressed a sense of belonging to a community of strangers but not necessarily an emotional connection. Significantly, this research offers insights into how we can use augmented reality as a tool to bring people together in real life and foster a sense of connection, creating technologies that bring us together instead of driving us apart.

Jaelin Haynes, COS
Title of Project: The Black Purposes of Web3: A Technical and Societal Analysis

Web3 is a system of applications built on the blockchain, an immutable, distributed ledger of blocks linked together with cryptographic hashes. This emerging technology is being promoted by some as a solution to the centralization and user privacy issues of the current web. Given the pattern of technologies becoming mainstream and having negative effects, especially on marginalized groups, it is important to consider the potential societal implications of this new technology before it is widely adopted. With background on the previous iterations of the web, Web3’s value proposition, and its technical details, I focused on African Americans in particular, exploring the history of Black Americans and technology and drawing on that history to investigate the potential uses and impacts of Web3. Specifically, I found themes of the digital divide, surveillance and privacy, coded racial bias, and technological creativity, and completed a data analysis and visualization of African American sentiment towards new technologies using Python libraries. The data analysis showed that Black participants indicated generally negative attitudes towards new technologies compared to their White counterparts. Even while being informed about scientific and technological innovations, African Americans showed unfavorable attitudes, potentially contradicting the idea that lower levels of use and access are results of less education. Based on the persistence of past themes, Web3 may not be the ultimate solution to the issues of Web 2.0 as some may claim, especially for African Americans, but there are still some valuable and beneficial principles in its development.

Rahul Jain, ORFE
Title of Project: An Examination of the Effect of Mobile Fintech Adoption on Microfinance Institutions in India

Microfinance institutions (MFIs), in India especially, have matured significantly over the past decade and a half, with increased government support, buy-in from private institutions, and the adoption of data analysis and mobile financial technologies. In this independent work, I begin by discussing the history and regulation of MFIs in India, dividing it into four phases: origination, maturation, buy-in from private investors, and adoption of technologies. Following this overview, I perform exploratory data analysis on select MFI operating and loan repayment metrics, finding a consistent increase in loans per loan officer but no large change over time in the ratio of operating expense to loan portfolio or in loan repayment metrics (loan loss rate and 30-day and 90-day portfolio at risk). With an understanding of the data distributions, I regress each of those metrics on rural wireless teledensity (a proxy for mobile phone and internet access) and test the coefficients for statistical significance to estimate the effect of mobile fintech adoption. I conclude that operations have improved: the regression on loans per loan officer yields a statistically significant coefficient of around 4 (p-value of 0.0015) and an R-squared value of 0.74. For the loan repayment metrics, I could not reject the null hypothesis, as no significant linear relationship was found, likely because of too many confounding factors. Lastly, I use the overview of MFIs to date in conjunction with the quantitative analysis to provide recommendations for further MFI development, reiterating the value of strong relationships with government and partner organizations, adoption of novel technologies, internal governance, and quantification of performance metrics.
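
As a rough illustration of the kind of single-predictor regression described above, the following sketch fits an ordinary least squares line and reports the slope, intercept, and R-squared. It is not the author's code: the function name `simple_ols` and the toy numbers are invented for illustration, and the abstract does not say which software or model specification was used.

```python
def simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared). Hypothetical helper for illustration.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_resid / ss_total
    return slope, intercept, r_squared


# Toy data, invented for illustration only (not the MFI dataset)
teledensity = [1.0, 2.0, 3.0, 4.0]
loans_per_officer = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r_squared = simple_ols(teledensity, loans_per_officer)
```

A significance test on the slope would additionally need the coefficient's standard error and a t-distribution, which this sketch leaves out.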

Rohan Jinturkar, COS (completed IW as a junior spring 2022)
Title of Project: Investigating Racial Bias Trends in the Text of US Legal Opinions

There are many instances of racially biased outcomes in the American legal system. However, it is unclear if such bias also exists in the text of judge opinions, and if it varies across time periods and regions. We approximate GloVe word embeddings for legal opinions at the federal and state level from 1860 to 2019. We find evidence of racial bias across nearly all regions and time periods, as traditionally Black names are more closely associated with negative/unpleasant terms whereas traditionally White names are more closely associated with positive/pleasant terms. We do not find evidence that older opinions exhibit more bias, or that opinions from Northeastern states show greater change in racial bias over time compared to Southern states. These results counter the principle of impartiality in legal settings and demonstrate the need for further research into institutionalized racial bias. Lastly, we survey approaches for reducing bias across the legal system.

Hannah Kapoor, SPIA
Title of Project: 18 Words to Reform the Internet | An Evaluation of the Governance of Evolving Technologies: The Case of Section 230 of the Communications Decency Act

What can be learned from debates about reforms surrounding Section 230 of the Communications Decency Act (CDA) to inspire the future governance of evolving technologies that challenge fundamental rights? Passed in 1996, Section 230 grants internet service providers immunity from certain types of liability for user-generated content; the law has come under fire in the evolving communications landscape. Policy attempts to reform Section 230, therefore, represent a compelling case study to inform how policymakers craft legislation for technologies in an evolving landscape. Through a comparative analysis of stakeholder interests from the onset of the deployment of online content platforms to their more matured state, the research travels from 1959 to the present day, and reviews thousands of pages of historical accounts, Congressional testimony, and court briefs, to build comparative stakeholder analyses presenting the underlying interests, motivations, and arguments that have framed the Section 230 debate across time and sector.

Despite the widespread criticism of Section 230, the research finds that the core stakeholder motivations remain unchanged across time: to mitigate speech harms and uphold freedom of speech online. However, it is also observed, as technology has evolved, that stakeholders increasingly approach reform on the basis of profit, partisanship, and reactive responses to specific instances of social harm. The research reveals the perils of limited stakeholder engagement across sectors in the curation of legislation and the public pressure endured by policymakers that encourages them to react in response to specific harms. In a techno-moral context, policymakers are advised to consider lawmaking as an instrument of governance that promotes and protects foundational rights, such as freedom of speech. A “Precautionary Agile” approach to the governance of evolving technologies is recommended, coalescing “ex-ante” and “prohibitive” approaches of governance.

Henry Koffler, ORFE
Title of Project: A Pricing Analysis of European Cap & Trade Carbon Futures: Trading Strategies and Implications

The global response to changing climate conditions has overwhelmingly been in favor of strong regulation. However, the European Union, in implementing its Emissions Trading System, has demonstrated that there is an alternative in which economic progress is not only fundamentally compatible with, but necessarily commands, environmental protection. In this cap & trade system, certain corporations are required by law to purchase rights to emit CO2 to offset their emissions. That said, many argue that financial speculation makes pricing nigh impossible for corporations that are unable to opt out. As such, my paper seeks to evaluate the veracity of the claim that European Union carbon credit prices are significantly driven by market speculation and cannot be accurately predicted. To accomplish this goal, my independent work blends traditional financial modeling approaches, such as Least Absolute Shrinkage and Selection Operator (LASSO) regression, with the division of historical prices into market regimes using a Gaussian Mixture Model and the construction of a Long Short-Term Memory (LSTM) network to accurately price carbon credits. After showing that carbon credit prices can in fact be modeled with precision, my research concludes that carbon prices are indeed strongly correlated with a slew of world states (such as weather) as well as commodities (such as oil or coal) and that market speculation is not a significant factor. Having established a methodology to comprehensively price carbon credits, my independent work provides policy recommendations to increase adoption of Thailand’s new carbon credit market, especially amongst the rural farmers who will be its primary users. Additionally, my paper reflects on the ethical questions surrounding environmental investing and financial speculation, principally why financial speculation is notably distinct from gambling and, in fact, provides strong benefits to the carbon credit market.

Colton Loftus, COS (completed IW as a junior spring 2022)
Title of Project: Analyzing Experiences with Speech Controlled Accessibility Software and Developing a Solution for the Linux Desktop

For individuals with disabilities affecting the use of their hands, typing and using a mouse can be not only inconvenient, but also painful. This problem is especially prevalent within the software ecosystem of the open source operating system, Linux. While Windows and macOS both have proprietary disability software for controlling the computer through voice, Linux users do not have access to these same proprietary solutions. In my independent work, I developed a voice controlled accessibility program that can help solve this issue. My program can be used for a wide array of actions across the Linux desktop. It can control windows, press keys, dictate text, and much more. It can also be customized or run scripts from the user to perform new behavior. While developing my program, I also wanted to better understand how to design and implement policies pertaining to accessibility software. To do this I held a series of software demos with my program. By the end of my research, I developed a series of key takeaways: (1) While accessibility designers don’t always have the resources to train their own models, they can nonetheless design applications with modular and customizable behavior. In my application, users can interact with the machine learning backend, switch models, and customize command names. All of these choices push back on the idea that the voice recognition backend should be a black box for users. (2) Graphical user interface (GUI) programs should limit mouse usage and prefer keyboard shortcuts when possible. Throughout my software demo, it was particularly difficult for users to mimic commands like dragging or dropping through just voice. However, keyboard shortcuts were quick and easy for both voice and hands to perform. (3) Workspace designers need to take into consideration those with alternative input methods. Throughout the development process, it was difficult to find public places quiet enough to use voice controlled software while not disturbing others. In addition to my open source code, these conclusions from my policy research will help future designers create more accessible workspaces, communities, and software applications.

Katie McLaughlin, COS
Title of Project: A Systems Approach to Mitigating Harms of Content Recommender Systems

Content recommender systems play a critical role in the dissemination and discussion of information. Therefore, it is crucial that there are structures and processes to support the development of safe, trustworthy recommender systems. To responsibly deploy these models, Machine Learning (ML) practitioners must go out of their way to navigate a complex ecosystem of regulation, organizational structures, and resources. In this paper, I take an approach rooted in systems theory to detail a holistic overview of the content recommendation ecosystem and the challenges that arise from its structure. Through a survey and semi-structured interviews, my research highlights how these issues stem from the absence of a shared language around harms. Furthermore, practitioners lack a framework to evaluate the ethical and societal implications of their recommender systems. I detail how the current approach to identifying and mitigating harms does not adequately prevent these harms. I also identify challenges that arise from the current organizational arrangements of knowledge and expertise. Based on these findings, I recommend how we can use oversight mechanisms to establish a standard for harm mitigation approaches. I further suggest we modify the existing ecosystem to distribute responsibility and equip practitioners to confidently deploy safe content recommender systems.

William Olson, COS
Title of Project: Faces at Face Value: An Analysis of Face Recognition Technology Policy and Performance

Facial recognition is one of the best-developed and most widely used applications of machine learning and examples of artificial intelligence in 2023. The development of the technology and the ubiquity of high-quality video recording devices like traffic cameras, surveillance cameras, and police body cameras enable the permeation of this technology throughout all spheres of life. Such technology invites the fear of constant surveillance and the decline in individual privacy, particularly in public areas. This study aimed to comprehensively gauge the current policies surrounding the use of face recognition in the United States. Specifically, I embarked on a two-part policy analysis. The first component examined the use of face recognition by law enforcement at the federal, state, and local levels. I found impactful regulation at the state level, the most effective of which I deemed to be moratoriums on the technology for this use case until further improvements are made to its accuracy. The second component examined the regulations around consumer data privacy which govern the collection, use, and distribution of our face images by public and private entities. I found the most impactful regulation at the state level, and propose that all regulation should be modeled on the Illinois Biometric Information Privacy Act; this legislation grants consumers the ability to seek monetary compensation when their face images are misappropriated. The second part of this project approached the problem of age-invariant cross-demographic face recognition; that is, matching current photos against outdated ones and examining accuracy rates across demographic groups. I assembled a novel dataset of 167k face images and tested using Amazon Rekognition, a leading commercial provider of face recognition. I found that this system performs worse on minority groups, in particular, darker skinned individuals of both sexes. I concluded that further development is necessary for both public policy and technological implementation of face recognition.

Hien Pham, COS (completed IW as a junior spring 2022)
Title of Project: Community Mesh Networks: A Local Solution to the Digital Divide

In a post-Covid-19 world, U.S. states face an unprecedented opportunity to secure funding for broadband infrastructure development. While the national conversation is focused on geographic scale and speed, the long-term resilience and community aspect of broadband should not be overlooked. Community mesh networks (CMNs) are community-owned and operated computer networks that provide affordable or free Internet access to local residents. A community-owned and operated network infrastructure not only helps tackle the broadband issue, but also provides essential development in digital literacy, civic engagement, and emergency preparedness for the communities it serves. My paper explores how communities have deployed mesh networks and explains why community networks are a critical piece of the solution to the digital divide puzzle and should be an essential element in states’ broadband development plans. To do so, I focused on two CMNs at different development stages in the northeastern US. I traveled to NYC and Philadelphia to visit public sites in the networks and conducted interviews with volunteer network operators to learn about their experiences. I concluded with specific actions, informed by the visits and interviews, that policymakers can take to empower CMNs and the communities they serve. Some proposed actions to support CMNs at various stages include making public backhaul accessible to CMNs through an application process, implementing application pipelines for funding and partnerships between public departments and CMNs, and connecting CMNs to public and community organizations that can help provide volunteers or technical consulting.

Richard Qui, ECO
Title of Project: Airbnb’s Alarming Aftermath: An Analysis of Airbnb’s Effect on San Francisco’s Rent and Housing Market

San Francisco is facing a rent and housing crisis due to a lack of housing units available for its residents. The entry of Airbnb, a hospitality technology company, into the city has threatened to exacerbate this crisis, diverting long-term rental and housing units to short-term rentals and possibly increasing rent and housing prices in one of the most expensive cities in the United States. In particular, San Francisco Ordinance 218-14 legalized short-term rentals in the city, allowing many owners to rent out their primary units to tourists and short-term residents under numerous restrictions. This paper details how SF Ordinance 218-14 affected San Francisco’s rent and house prices, along with the overall trend of Airbnb’s effect on the city. Furthermore, this paper experiments with a form of “regulation” by determining whether limiting Airbnb rentals to entire room/house units or private/shared room units is a viable strategy to further reduce Airbnb’s impact while still allowing Airbnb to run its operations and benefit tourism in the city. I conclude that Airbnb has a statistically significant effect in increasing rent and house prices in San Francisco in both the short term and the long term, and I recommend allowing only private/shared room units as a possible strategy to regulate Airbnb’s impact on San Francisco. This paper also summarizes the impact of a technology company like Airbnb, in light of the sharing economy, on rent and housing units, and discusses how governments should work directly with technology companies to enact such regulations and minimize the companies’ impact on society, as the government itself may not be powerful enough today to enforce its own rules, given that Ordinance 218-14 had many flaws in its enactment.

Katelyn Rodrigues, COS
Title of Project: The Last Thing We Forget: Applying Natural Language Processing to Decode Memories Evoked by Modern Music

With the pinnacle of the digital era upon us, the widespread accessibility of streaming platforms has disrupted the music industry. Now, more than ever, music has the potential to fully embrace its role in uniting all listeners in experiencing an emotionally transformative force regardless of their geographical location, demographic background, language, or culture. Through the lens of online music streaming platform comment forums, my independent work analyzes the potential for modern music genres to evoke memories and shared experiences. Inspired by a research study conducted in the Princeton University Music Cognition Lab where participants recorded their music-evoked autobiographical memories (MEAMs), my research specifically explored the research goal in the context of YouTube music video comment threads. After rigorous processing and sanitizing of this data, a series of Natural Language Processing (NLP) techniques surfaced thematic elements within listeners’ comments on modern music. Beginning with an introductory TF-IDF analysis and then moving to more complex techniques like PCA dimension reduction, LDA topic analysis, and visualizing cosine similarity between the YouTube comments and their corresponding song lyrics, the analysis yielded fascinating themes with shared memories at both the song and genre levels. While applying NLP techniques to original data consisting of unconstrained, freely available responses is relatively uncharted territory in the digital music space, the findings were definitive and provide a basis for further analysis of this multi-dimensional data that can aid music video creators in developing engaging content that can unite listeners globally.

Iroha Shirai, COS (completed IW as a junior spring 2022)
Title of Project: Analyzing Gender Biases of STEM-Related Keywords Within United States and Japanese Twitter Posts

Though there has been increasing discussion surrounding the low representation of females in STEM (Science, Technology, Engineering, and Mathematics) fields and thus more movements to increase female representation in STEM fields, the number of females in STEM fields remains relatively low. Furthermore, there continues to be a difference in these representations among different countries. In my work, I focus specifically on the United States and Japan, looking at how male- and female-related keywords are used in context with STEM-related keywords within US and Japanese Twitter posts. I collected and created my own datasets and trained Gensim’s Word2Vec models to create word embeddings. Then, I calculated cosine similarities between gender-related and STEM-related words in the word embeddings. The results showed that the calculated difference between male- and female-related average cosine similarities was greater in the US than in Japan for the 2019 – 2020 Twitter posts. In contrast, this difference was greater in Japan than in the US for the 2009 – 2010 Twitter posts. The results also suggested that the difference of these calculated differences between the US and Japan was larger in 2019 – 2020 than in 2009 – 2010.
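
The comparison of average cosine similarities described above can be sketched as follows. This is a minimal illustration rather than the author's pipeline: the helper names (`cosine_similarity`, `mean_similarity`, `gender_gap`) and the two-dimensional toy vectors are assumptions made for the example, whereas the actual study used embeddings from trained Word2Vec models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(group_words, stem_words, vectors):
    """Average cosine similarity over every (group word, STEM word) pair."""
    pairs = [(g, s) for g in group_words for s in stem_words]
    total = sum(cosine_similarity(vectors[g], vectors[s]) for g, s in pairs)
    return total / len(pairs)


def gender_gap(male_words, female_words, stem_words, vectors):
    """Male-related average similarity minus female-related average similarity."""
    return (mean_similarity(male_words, stem_words, vectors)
            - mean_similarity(female_words, stem_words, vectors))


# Toy 2-D vectors, invented for illustration only (not trained embeddings)
toy_vectors = {
    "he": [1.0, 0.0],
    "she": [0.0, 1.0],
    "engineer": [1.0, 0.0],
}
gap = gender_gap(["he"], ["she"], ["engineer"], toy_vectors)
```

Under this sketch, a positive gap means the STEM-related words sit closer to the male-related words than to the female-related ones; comparing gaps across countries or periods amounts to repeating the calculation on each corpus's embeddings.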

Niva Sivakumar, COS (completed IW as a junior spring 2022)
Title of Project: Understanding and Detecting Hateful Users on Twitter via Graph Theory and Machine Learning

From July to December 2020, 3.8 million tweets were removed by Twitter for falling under the category of hate speech, and 1.1 million users were flagged and suspended as hateful accounts. Recent work has typically depended on the tweets themselves, often using linguistic contexts and vocabulary. However, there’s often a need to go beyond pure textual classification, potentially integrating features about users and social groups in addition to the content of the tweets. Using a publicly available dataset of Twitter users, this paper seeks to answer three questions: (1) What insight does the distribution of hateful users in the top 100/200 influential users of the retweet graph of M. Ribeiro et al. provide? (2) How likely are hateful users to retweet other hateful users, and normal users to retweet other normal users? (3) Can we train a preliminary model to identify hateful users based on a limited number of numerical features, none of which has to do with the actual content of their tweets? Using the PageRank algorithm, we find that hateful users seem to be just as influential as normal ones. Using the reciprocity of vertex-induced subgraphs, we find that hateful users are almost twice as likely to retweet each other as normal users. After training a neural network on the annotated users over 10 numerical features, we were able to reach 92.3% accuracy in classifying a user as “hateful” or “normal” without looking at their tweets at all. Our findings indicate that user-based classification of hateful speech on social media is effective and could strengthen or corroborate text-based classification.
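
The reciprocity measure mentioned above can be illustrated with a small sketch. It assumes one common definition, the fraction of directed edges whose reverse edge is also present; the abstract does not state which definition was used. The function names and the toy retweet graph are hypothetical.

```python
def induced_edges(edges, nodes):
    """Directed edges whose two endpoints both lie in `nodes` (self-loops dropped)."""
    keep = set(nodes)
    return {(u, v) for u, v in edges if u in keep and v in keep and u != v}


def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also an edge."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Toy retweet graph, invented for illustration: (u, v) means u retweeted v
retweets = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "e"), ("c", "d")]
hateful_users = {"a", "b", "c"}  # hypothetical labels
normal_users = {"d", "e"}        # hypothetical labels

hateful_reciprocity = reciprocity(induced_edges(retweets, hateful_users))
normal_reciprocity = reciprocity(induced_edges(retweets, normal_users))
```

Comparing the two values for the subgraphs induced by each user group gives a rough sense of how much more often one group retweets within itself in both directions.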

Anna Sivaraj, COS (completed IW as a junior spring 2022)
Title of Project: Content Warnings on Social Media: An Evaluation of Instagram’s Sensitive Content Screen

As social media platforms become more ingrained in society, it is critical to mitigate the potentially negative impacts of sensitive content on users’ mental health. In order to help users avoid offensive or disturbing content, Instagram places a content warning, the Sensitive Content Screen (SCS), over posts with sensitive content. However, since its introduction in 2017, the SCS has continued to appear on Instagram without significant changes to its original design and function. This independent work investigates the effectiveness of the SCS. Through a historical case study of health warning labels on cigarette packages, I derive lessons and extend them to propose improvements to the SCS. Furthermore, I survey college students to gather their impressions of the current and the proposed SCS. Drawing on the case study and survey results, I offer recommendations for how Meta Platforms, Instagram’s parent company, might improve the existing SCS and better protect vulnerable populations. In summary, I recommend that Meta Platforms (a) strengthen and modify the existing SCS, (b) add text to the warning message with the categories of sensitive content that the post may fall into, and (c) implement a larger and more comprehensive strategy aimed at protecting vulnerable users from the harmful impact of exposure to potentially sensitive content on Instagram.

Morgan Teman, COS
Title of Project: You Can’t Hack Democracy: Preventing Foreign Election Interference Using the 2017 French Presidential Election as a Case Study

The 2017 French Presidential election provides a fascinating case of attempted foreign influence through a disinformation and hacking campaign that ultimately did not discourage the public from electing the targeted candidate. No singular entity is entirely responsible for thwarting the attack; a combination of circumstances and efforts coincided to delegitimize what became known as #MacronLeaks, when 15 gigabytes of data were stolen from Emmanuel Macron’s campaign team’s servers and shared on the Internet. My independent work analyzes the effects of multiple involved parties’ defensive actions (including cyber-blurring, multi-level and encrypted communication, cybersecurity training, public transparency, and fact-checking, among others) and recommends a joint strategy to prevent such an event from recurring, including an offensive cybersecurity push by the government, secure servers for campaigns, and proactive bot investigations by social networks.

2022 Certificate Graduates Independent Work

Jeremy Bernius, SPI
Title of Project: Algorithmic (In)Justice: The Bias and Unfairness of Risk Assessment Instruments Used in Sentencing

In the age of Big Data, all areas of life rely on data analytics, algorithms, and artificial intelligence to make essential decisions and to facilitate normal operations; the criminal justice system is no different in this regard. Algorithmic risk assessment instruments (RAIs) inform nearly every stage of the criminal justice process, such as pretrial detention, corrections, and probation, and are increasingly used in sentencing. These instruments predict offenders’ risk of recidivism by feeding their demographic and behavioral data into a statistical model trained on samples from historical populations. In the sentencing context, this risk prediction then influences a judge’s decision on the type, length, and severity of the offender’s treatment.

My independent work adopts an interdisciplinary approach to examine the racial disparities and unfairness in the design and application of algorithmic risk assessment instruments used in sentencing. Specifically, I complete a cost-benefit analysis on their implementation, explore their accuracy rates and different errors, and critique the type of data they use. First, I draw on a case study of the Commonwealth of Virginia’s proprietary tool called the Nonviolent Risk Assessment (NVRA). With over two decades of use, Virginia’s experiment has been studied extensively by state and independent researchers. In particular, I discover that while the NVRA can accurately predict low-risk offenders to be diverted to non-carceral sentences, it suffers from a lack of alternative sentencing options, judicial resistance to its use, and growing racial disparities in sentencing decisions. Second, I dive into the ethical and mathematical debate spurred by ProPublica’s report on the racial bias of COMPAS, a leading RAI used in several jurisdictions. After reviewing relevant literature, I argue that model error parity, the state of members across racial groups receiving equal proportions of false negatives and false positives, is a salient measure of algorithmic fairness. Lastly, I rely on case law and legal theory to illustrate that the inclusion of certain demographic characteristics in risk predictions, such as an offender’s race or socioeconomic status, likely violates constitutional protections whereas data on an offender’s behavior, such as their criminal history, is permissible. To close, I find that RAIs, as they are designed and implemented, foster racial bias and unfairness, and I outline several implications my results have for future policies on the use of algorithmic risk assessment instruments in sentencing.
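
Model error parity, as defined above, can be checked by comparing false positive and false negative rates across groups. The sketch below is one possible way to compute those rates and is not drawn from the author's analysis: the function name `error_rates_by_group`, the record layout, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute (false positive rate, false negative rate) for each group.

    Each record is (group, predicted_high_risk, reoffended), with the last
    two fields as booleans. A rate is None when its denominator is zero.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, predicted_high_risk, reoffended in records:
        c = counts[group]
        if reoffended:
            c["pos"] += 1
            if not predicted_high_risk:
                c["fn"] += 1  # predicted low risk, but did reoffend
        else:
            c["neg"] += 1
            if predicted_high_risk:
                c["fp"] += 1  # predicted high risk, but did not reoffend
    rates = {}
    for group, c in counts.items():
        fpr = c["fp"] / c["neg"] if c["neg"] else None
        fnr = c["fn"] / c["pos"] if c["pos"] else None
        rates[group] = (fpr, fnr)
    return rates


# Toy records, invented for illustration only
toy_records = [
    ("A", True, False), ("A", False, False), ("A", True, True), ("A", False, True),
    ("B", False, False), ("B", False, False), ("B", True, True), ("B", True, True),
]
rates = error_rates_by_group(toy_records)
```

In this toy data the two groups have different error rates, so error parity does not hold; parity would require both rates to be equal, or within some chosen tolerance, across groups.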

Marina Beshai, COS (completed IW as a junior 2021)
Title of Project: Political Movements in the Age of Social Media: An Analysis of Twitter’s Role in the Egyptian Crisis

Governments worldwide and the media often blame social media companies for civil unrest rather than the associated individuals. Claiming social media to be a threat against democracy, governments heavily moderate platforms and suppress activists. Drawing on more than six million #Egypt tweets published during the 2011 Egyptian crisis, this study explores the relationship between the in-person demonstrations and the online Twitter movement to observe how these components complemented and influenced one another. The rise and fall of #Mubarak (on Twitter), followed by the rise of #noscaf (No Supreme Council of the Armed Forces), shows how the grievances of protesters mirrored the topics trending online. They were controlling the narrative to a certain extent. The sheer number of associated country hashtags (154 of the 195 present-day countries were associated with #Egypt), never mind their use, implies a connected, worldwide community. Natural Language Processing (NLP) showed that English speakers were consistently more negative in their tweets than their Arabic-speaking counterparts. Not once did Arabic users express a more negative outlook than their English counterparts. Despite the large gap between the two groups, the correlation coefficient between the Arabic and English scores was 0.46, indicating a moderate positive linear relationship between the general moods of the two groups. A holistic analysis of tweets during the internet blackout in Egypt showed that many users around this time were increasingly concerned for the safety of protesters. On January 28, 2011, the first day of the internet blackout in Egypt, the number of tweets increased by 82,020, which amounts to 1% of all tweets including #Egypt in 2011. Topic modeling showed that on seven of the ten days of the internet blackout in Egypt, ‘freeEgypt’ or ‘freedom’ was among the most frequently used words, aptly describing users’ general attitude.
In all, results suggested that there is a give-and-take relationship whereby users inside the country greatly influence the platform at the start of demonstrations, and in turn, receive support and aid from users outside of the country later on.

Justin Curl, COS
Title of Project: Please Pay Attention: Using YouTube’s ad algorithm to analyze the presentation of unwanted information

How do you get people to pay attention to or process unwanted information? Our research studies the effect of user behavior, specifically how often a user skips ads, on the amount, length, and type of ads a user sees on YouTube. In our experiment, we reverse engineered aspects of YouTube’s ad algorithm using bots built with Selenium in Python to simulate three types of user behavior: positive towards ads, in which users never skip ads; neutral towards ads, in which users skip ads 50% of the time; and negative towards ads, in which users always skip ads. Overall, we found that there does not seem to be a meaningful relationship between how often users skip ads and the number of ads they see, but that users who skip ads more often are shown shorter ads that are less frequently skippable. These findings have interesting policy implications for organizations trying to convey important, though unwanted, information: make viewing your messages mandatory and keep them short.

Audrey Laude, COS
Title of Project: Performance Decay in Machine Learning: Temporal Effects across Policing, Epidemic, and Financial Forecasting

As time passes, data used to train a machine learning model often becomes “outdated,” which in turn hinders model performance, most often measured via prediction accuracy. This phenomenon is described tangentially under various terms, such as model decay and concept drift, among others; however, none of these captures the whole picture of temporal performance decay across fields. Hence, the goal of this project is to perform an exploratory analysis of performance on several datasets in different fields: the Stanford Open Policing Project (SOPP, annual scale), FluSight (weekly scale), and Dow Jones performance (daily scale). For the SOPP, logistic regression models were trained on Washington state data from 2009 to 2011 and tested on each year from 2012 to 2016. Although accuracy and AUC appeared to decay as years passed in the SOPP data, this seemed to be intrinsic to the data. For FluSight, models’ abilities to forecast 1, 2, 3, and 4 weeks into the future, as well as the nature of the decay across this time period, were analyzed, showing a negative relationship between performance and time and suggesting that an exponential model may describe this best. Lastly, two different models (a neural net and a random walk model) were used to make time-series forecasts on the Dow Jones daily performance to see how error changes the further away one is from the last known stock price. Although complex modeling techniques did not outperform the base model, they still displayed performance decay consistent with the theoretical rate of error. Thus, this project provides initial insights into the nature of performance decay across fields and asks questions that lay the groundwork for future research directions in the area.

Yu Jeong Lee, ECO
Title of Project: “Belong Anywhere”? A Dynamic Model of Airbnb Expansion and the Affordable Housing Crisis in New York City

Since its inception as an air mattress rental service in 2008, Airbnb has redefined our understanding of hospitality by opening residential homes to tourists. As of 2019, Airbnb offers over 7 million listings around the world, empowering tourists to “Belong Anywhere,” per the company’s motto. However, housing advocates argue that by crowding the residential housing market with seasonal vacation rentals catering to tourists, the sharing economy giant is making it increasingly difficult for long-term residents to find anywhere to belong. According to the New York Housing Authority, competition between vacation rentals and residential units is driving the City’s shortage of affordable housing to a “crisis point,” with Airbnbs comprising up to 20% of the rental inventory in certain neighborhoods.

My research evaluates the effectiveness of New York City’s 2016 Multiple Dwelling Law banning the advertisement of entire-property rentals for less than thirty days on online marketplaces, including Airbnb. Specifically, I evaluate the effectiveness of the policy intervention in the context of Airbnb’s introduction of the Smart Pricing feature in 2015, a dynamic pricing algorithm that adapts listing prices to maximize host revenues. I find that though Smart Pricing increased median host revenues by $218 on average throughout the five boroughs, it did not pose a significant counter-incentive to the rental regulation policy. In fact, through a difference-in-differences study comparing short-term rentals in New York City and neighboring Jersey City, NJ, I find that the 2016 policy decreased the share of illegal (short-term, entire-property) Airbnb listings in New York by 2.9%, a decrease equivalent to 26% of the rental inventory growth in New York between 2011 and 2017. This finding suggests the advertisement ban was effective in curbing short-term rentals that crowd out much-needed housing supply, though it is unclear whether these Airbnb units were reverted to housing units or longer-term vacation rentals. Nonetheless, by incorporating Airbnb’s Smart Pricing feature to evaluate the incentives surrounding policy compliance and enforceability, I expand upon existing literature on online marketplace dynamics, platform design, and their implications for policy-making.
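
For readers unfamiliar with the difference-in-differences comparison mentioned above, the basic arithmetic can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name and all four input shares are hypothetical placeholders, not figures from the project, and the actual study presumably used a regression with controls rather than this four-number calculation.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: change in the treated group minus
    change in the control group over the same period."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical shares of short-term, entire-property listings (percent).
# "Treated" stands in for a city subject to the policy, "control" for a
# comparable neighboring city that is not.
effect = did_estimate(
    treated_pre=10.0, treated_post=9.0,
    control_pre=8.0, control_post=10.0,
)
print(effect)  # (9.0 - 10.0) - (10.0 - 8.0) = -3.0
```

The estimate attributes to the policy only the part of the treated city's change that is not mirrored in the control city, which rests on the assumption that the two would otherwise have followed parallel trends.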

Yana Mihova, SPI (completed IW as a junior 2021)
Title of Project: Bill Gates, Drinking Bleach and 5G Radiation: The Role of Right-Wing Media in Spreading Coronavirus Misinformation

From the initial reporting of the COVID-19 virus in early 2020, misinformation fueled the pandemic by spreading doubts about its authenticity. Due to the novelty of the pandemic, there was a gap in research on how the type of media consumed affects belief in misinformation about the pandemic. Since I was interested in investigating the ways that media consumption can impact societal perspectives on a particular topic, I decided to investigate the relationship between the spread of COVID-19 misinformation on media platforms and the consequences of this misinformation in society. I examined this relationship by observing the type of news source an individual consumed and their likelihood of endorsing COVID-19 misinformation, as measured by belief in COVID-19 conspiracy theories and distrust in public health officials. My analysis found a statistically significant positive relationship between reporting consumption of only right-leaning media and the tendency to endorse COVID-19 misinformation. Taken in context with previous research indicating that right-leaning media reported significantly more COVID-19 misinformation than moderate and left-leaning media, my findings indicate a correlation between the reporting of false information and the likelihood of endorsing COVID-19 misinformation. This study brings to light the dangers of fact-less reporting and how it can have detrimental effects on societal outcomes.

Lindsey Moore, COS
Title of Project: Top Attrition Factors To Be Addressed in Female Engineering Interventions: A Comparative Study of Current and Switched Out Female BSE Students at Princeton University

How do we fix the leaky pipeline of female engineering students at the college level? My research studies which attrition factors engineering interventions must address in order to lower the female engineering attrition rate at Princeton University. I conducted a survey of current BSE students and students who chose to switch out of BSE to determine consistent characteristics of persistent engineering students at Princeton University. From this survey, nine consistent characteristics of persistent female engineering students were found. The top three attrition factors related to these characteristics were students’ pre-collegiate academics, self-efficacy/self-confidence, and social support. These factors should be addressed in female engineering interventions to help lower the attrition rate. I also recommended some possible interventions that could address these factors, such as offering Calculus I to students before their first year and encouraging more female BSE students to take the EGR sequence of introductory classes.

Sara Sacks, SPI
Title of Project: Infertility is Not Just a Rich, White Woman’s Problem: Addressing Disparities in Access to Assisted Reproductive Technologies

My independent work considers disparities in infertility and the way they correspond to the broader disparities in health. The research offers a synthesis of scholarly literature about health disparities along the axes of socioeconomic status (SES), race and ethnicity, and region. I argue that the trends in health disparities pertaining to fertility care correspond to those found in the general health disparity literature around these specific dimensions. The research further analyzes the recent policies proposed in Congress to alleviate these disparities in access to ART, specifically the policy dubbed “IVF for All” and the Infertility Awareness Act.

Tara Shawa, SOC
Title of Project: Technological Gender Socialization: Examining Gender Representation and Reinforcement through the YouTube Kids Recommendation System

The focus of my independent work research lies in the intersection between childhood gender socialization, media effects, and digital technology. The research question is twofold: first, how is gender represented in content on the YouTube Kids platform? And second, to what extent does the recommendation system reinforce these gendered representations? To address these questions, I begin with a review of literature in the key fields that intersect on this topic. I start with gender theory and childhood socialization theory, then media effects and digital media, and lastly the platform through which they convene: YouTube Kids. Grounded in this cumulative knowledge, I conduct a mixed-method analysis, comprised of qualitative case studies and quantitative content analysis, to examine representations of gender and their potential exacerbation throughout various types of content. I find that gendered representations on YouTube Kids reflect normative constructions of gender, and that there is potential for the platform’s recommendation system to suggest increasingly gendered content to users.

Rachel Sylwester, COS
Title of Project: Are We Fair Yet? A Practical Analysis of Bias and Barriers to Fairness in Mortgage Lending

My independent work investigates observed bias in the U.S. mortgage lending system, where two underwriting algorithms are used to make approval decisions on 90% of mortgage applications. Recent work has exposed disparate rates of approval between racial groups by these supposedly “colorblind” underwriting algorithms. Using real loan application data, this research explores the effectiveness of an algorithmic fairness intervention in mitigating the system’s disparate impact on minorities. I implement the disparate impact remover, a preprocessing method that edits features in the data to improve group fairness while preserving relative rank within groups. The results suggest that an algorithmic intervention, such as the disparate impact remover, is effective at reducing the disparate denial rate of Black applicants without significant effects on accuracy or total cost. If implemented, such an intervention could affect millions of applicants each year due to the industry-wide use of automated underwriting systems. However, this paper finds that the effectiveness of an algorithmic intervention, such as the one explored here, is substantially limited by structural and legal barriers. Thus, my research results in recommendations not only for technical steps to mitigate automated underwriting system bias but also for regulatory and legal changes to enable and support such measures.

Henry Vecchione, COS (completed IW as a junior 2021)
Title of Project: Pan-app-ticon: What to Do About Ring’s Partnerships with Police Departments

I take issue with how Ring, the Internet-connected security camera company owned by Amazon, has pursued mutually beneficial partnerships with local police departments. The partnerships incentivize police to distribute Ring devices in their communities and grant the police access to a “Law Enforcement Portal” that enables them to select an area on a map, specify up to a 12-hour window of time, and send requests for footage from those hours to Ring owners in that area. I argue that cost and inefficiency are a significant barrier to surveillance creep and that this interface reduces that cost too much. I support this argument with two Supreme Court cases, U.S. v. Knotts (1983) and U.S. v. Jones (2012), the comparison of which illustrates how technological advancements can fundamentally change one’s expectation of privacy and the invasiveness of criminal investigation. I then examine the ACLU’s Community Control Over Police Surveillance (CCOPS) model bill and real legislation based on it, which require public approval for new surveillance technologies. I find that much of this legislation does not adequately protect against connected surveillance devices like Ring because they are not a “new technology” but rather an old technology that is harmful in how it is used and in the efficiency it creates. This allows Ring to bypass approval. I then propose changes that Ring can make to its products and changes that legislatures can make to their bills to minimize harm. I suggest that Ring could use image recognition to blur faces in video sent to police, only removing the blur on order from a judge, or it could change the law enforcement interface to prohibit bulk requests or require more information. Legislatures should also alter how “new technology” is defined, requiring reapproval if a technology increases surveillance efficiency to a meaningful degree, even if it resembles an approved technology on the surface.

Jacqueline Xu, ORF
Title of Project: Promoting Sustainable Habits: A Network Analysis of Hard-to-Maintain Behaviors 

People’s decisions, actions, and opinions are manifestations of not only their personal values but their social environments as well. Extensive research by sociologists and mathematicians in the past century has culminated in a network diffusion model that explains how individuals weigh the utility of available options based on their personal preferences and the perceived decisions of their local network. But while this model is a good simplification for consumer behaviors (one-off decisions that have immediate consequences), it does not adequately represent the diffusion of hard-to-maintain (HTM) behaviors, which include healthy habits like exercising regularly and environmental behaviors like becoming vegetarian. Unlike consumer behaviors, these behaviors require persistent cues for upkeep and statistically exhibit slower rates of adoption. As such, they cannot be directly understood through existing diffusion models. This paper modifies the models of previous studies to analyze the spread of habitual behaviors by incorporating a parameter for behavioral loss. Compared to simple behavioral diffusion, the results indicate that final adoption levels are generally similar while adoption rates are much slower. These findings suggest that low adoption levels at one point in time do not necessarily predict low levels at a future date.

Melody Zheng, COS (completed IW as a junior 2021)
Title of Project: Analyzing the Digital Divide: A Quantitative and Qualitative Study of Six United States Cities

As society grows increasingly dependent on information and communication technologies (ICTs), it becomes crucial to address the digital divide still present in many communities. In my work, I focused on identifying a digital access policy that the city of Oakland, California should adopt. To do so, I compared the initiatives of three cities in the same population range that have been “successful,” such that the rate of Internet and computer access for historically underserved populations has increased from 2015 to 2019, with the initiatives of three cities that have not been successful. Using data from the U.S. Census Bureau’s American Community Survey, I tracked the rates of Internet and computer access for five different demographics over the five-year period and chose the three cities with the best average rate and the three with the lowest average rate. I then analyzed qualitative data to identify whether the selected cities focused on Internet access, computer access, and/or digital literacy training in their digital access initiatives, although two of the three unsuccessful cities had little to no such information publicly available. When comparing the three successful cities to the remaining unsuccessful city, Oakland, I found that while the four cities generally addressed all three aspects, the successful cities had a greater focus on community resources. Therefore, I argue that Oakland should invest more resources in digital literacy programs and publicly available ICTs, especially since it was not able to offer entirely free home Internet plans and digital devices. Community resources and technology classes would be more accessible to a greater number of households and hopefully lead to improved financial situations as well.

2021 Certificate Graduates Independent Work

Yaw Asante, COS
Title of Project: Evaluating and Contextualizing Network-Based Analysis of Drug Response in Cancer Dependency Genes

Computationally assessing which genes cancer needs to propagate itself is a much-researched topic at the juncture of computational and medical science. To contribute to this area, I sought to build a software tool capable of assessing cancer dependency by extending a tool called NetMix, which solves a related problem. Additionally, I sought to examine the broader context in which tools like these may be applied in clinical medicine. For my first contribution, I designed a NetMix-based software process called CADEGA and compared its performance to that of a peer algorithm called NETPHIX. This work demonstrated CADEGA’s limited performance overall, though with a potential for finding functional correlations that differed from those of NETPHIX. For my second contribution, I conducted an overview of the real-world context in which methods like CADEGA and NETPHIX would apply to the field of data-enabled healthcare. This analysis demonstrated the expansive efforts being made or planned in technical infrastructure as well as the blind spots present in existing laws surrounding genetic data and in the equitable development of these resources for rural facilities.

Bevin Benson, COS
Title of Project: Restricted Content: A Technical Guide to Internet Censorship in the Age of Social Media

The growth of social media platforms poses new challenges to governments seeking to control information online. Historically, governments have relied on a toolkit of technical methods to censor content on the web, such as IP blocking, DNS tampering, and deep-packet filtering. These methods are ineffective at blocking specific content on social media platforms. As a result, many governments have turned to sending “content removal requests” to these platforms as a means of restricting material that they consider objectionable. My independent work outlines the technical methods of Internet censorship, focusing on how governments can block content using IP/port blocking, DNS tampering, and deep-packet filtering, and examines the relationship between governments and three major Internet platforms – Facebook, Twitter, and Google – vis-à-vis content removal requests. Using Python, it conducts an exploratory analysis of the datasets in the transparency reports released by the three platforms to uncover what data the platforms release to the public. It finds that the platforms, particularly Facebook, lack transparency about the requests they receive and their guidelines for content removal. Twitter releases the greatest amount of data on content removal requests, including links to the content in question, yet this data is difficult to access and poorly organized. Additionally, it examines trends in the number of content removal requests submitted by a subset of 13 countries selected based on geographic diversity, size, Internet freedom, and the number of content removal requests submitted. Specifically, it finds that there is no significant correlation between Internet freedom and the number of content removal requests, but that Turkey and Russia send the greatest number of content removal requests to Internet platforms.

Justin Chang, COS
Title of Project: The Role of International Consensus in Cyber Attribution

With so many people relying on the critical infrastructures and data housed in cyberspace, cyber attacks have the potential to harm extremely large numbers of civilians. Yet international regulations on these attacks remain largely nonexistent, as there exists no binding agreement on what states can or cannot do in cyberspace. My independent work explores the role that international consensus can have in cyber attribution, a necessity for maintaining a secure cyberspace. By looking at examples of past attacks, I present the inherent limitations of technical attribution tools and techniques, arguing that international collaboration can improve the speed and efficiency of attribution. In response to the difficulties in achieving such a consensus, I argue for the creation of an international body tasked with attributing cyber attacks, as such a body can still improve the process of cyber attribution, even without the support of all major cyber actors.

Edward Elson, CLA
Title of Project: The Idea of Progress in Antiquity

My independent work investigates whether or not (and how) an idea of technological progress might have been understood by Mediterranean societies in early and late antiquity. Some scholars have posited that an idea of progress simply did not exist in the ancient world, that the institutional capabilities of Ancient Greek and early Roman societies were perceived by their people to be static, neither developing nor accelerating over time. My independent work refutes that argument, drawing from the “Golden Age” theory of Hesiod, the lesser known personal accounts of Xenophanes (whose allusions to a collective cultural and intellectual evolution quite clearly demonstrate that an idea of progress was – at the very least – in his own mind), the philosophical works of Plato, tragic excerpts from Sophocles, and finally, a poem of human history provided by the Roman Lucretius. My analysis consists of a series of close readings of these texts, supplemented by the existing but scant philological scholarship on the subject, and ultimately makes clear that an idea of progress certainly did exist in antique thought, but not in the way that it might exist today. Technological and institutional achievements were thought, I argue, neither to better nor to worsen the overall conditions of the ancient human experience, but to complicate that experience exponentially into the future. With added achievement came, according to the ancients, an added depth of problems, ambitions, interests, and values, many of which were thought to conflict with each other. I draw these ideas from the fragments of Xenophanes and demonstrate how they echoed through Plato’s Laws, Sophocles’ “Ode to Man,” and Lucretius’ On Nature.

Isabella Faccone, ORFE
Title of Project: Tools to Understand 2016 Voter Influence Tactics in Comparison with the 2020 Election: Applications of Network Topology, Information Cascades and Rumor Recurrence

Social media platforms have a fundamental impact on how society receives and exchanges information around political events, specifically elections. These social media networks amplified misinformation during both the 2016 election and the 2020 election despite new control mechanisms. This research determines the key frameworks for understanding the network climate that enabled such amplification and misinformation, relying on veracity, amplification, and recurrence to draw distinctions between the 2016 and 2020 elections. To evaluate these criteria, I constructed a 2016 Twitter dataset based on previous research and used an existing 2020 election dataset for Twitter that was updated weekly with keywords, trends, politicians, and new trackers for the duration of the 2020 election period. I used these datasets to evaluate cascades, the main trends on the network, and the sheer mass of activity around politicians and rumors. Critically, this research demonstrates that both the role of individuals in information cascades and the features of the rumors that propagate pervasively have a large impact on the likelihood that a rumor will recur in a given network. This research shows that false rumors propagated faster and recurred more often than true rumors in both the 2016 and 2020 elections, but draws a distinction between unidirectional and interactive information dissemination models to demonstrate the differing effect that amplifiers have on propagation and recurrence. For rumors that disseminate via a unidirectional traditional news outlet shared via links on Twitter, the effect of a high number of verified users is limited. However, for rumors that propagate as retweets and quoted replies, which follow a multi-directional and interactive model, the effect of a high number of verified users participating in the cascade was very pronounced.
Thus the properties of the rumor, its veracity, and the specific subset of the population through which the rumor passes each have an effect on that rumor’s overall impact and exposure to a given network of users. This research’s findings are critical to the future of social media platforms as they grapple with the persistence of misinformation amidst a highly volatile and nuanced digital politics arena.

Kevin Feng, COS
Title of Project: Lowering the Barrier for Web Advertisement Research at Scale

Web advertisements are essential to the day-to-day operation of the internet, providing a key channel of revenue to websites that offer content at little to no cost. However, they are also common sources of deception, scams, and privacy violations. Given their significance, ads are of interest to many different groups of experts, including web researchers, communications scholars, and regulators, but their fleeting nature makes them difficult to study systematically and at scale. This independent work presents AdOculos, a technical system comprising a search interface powered by automated visual analysis tools and a continuously updated, large-scale archive of ads crawled from thousands of popular websites. By using the system to uncover novel research questions, dimensions of analysis, and policy recommendations, I demonstrate how AdOculos and its underlying tools enable expanded possibilities in ad research.

Grace Hong, ECO
Title of Project: The Effect of Google Fiber’s Entry on Student Educational Outcomes in Kansas City

In 2010, the private tech company Google disrupted the broadband market by partnering with individual cities to offer high-speed fiber Internet through Google Fiber. In my research, I study the impact of Google Fiber’s installation in Kansas City in 2011 on student educational outcomes in Missouri’s public schools through two studies: an intra-city study of Kansas City and an inter-city study between Kansas City and St. Louis in pre- and post-Fiber periods. The intra-city study used a fixed effects regression model and highlighted mixed effects of Fiber on education. However, the inter-city study, using a difference-in-differences regression, showed that post-Fiber Kansas City had lower percentages of students scoring in the worst category (Below Basic) and greater percentages of students scoring in higher categories (Proficient). As a result, this study illustrates that Fiber’s entry may be correlated with higher test performance, especially for those who were performing in the lowest categories to start with, and it provides a stronger case for continuing to close the digital divide across the United States.

Gabrielle Jabre, POL
Title of Project: Social Media as a Narrative Battlefield: An Investigation into the 2019 Lebanese Protests

In non-democracies, civil society and the regime battle to dominate the narrative on social media. During the 2019 Lebanese Protests, social media became a place for narrative warfare between civil society and the regime. My research questions: Did social media played a sectarian-reducing or sectarian-enhancing role, and what were its effects on mass mobilization? My research design is twofold: qualitative interviews and a quantitative analysis on a portion of Twitter data. I conducted 13 zoom interviews with: social media activists, physical activists, journalists, politicians and independent media center directors. The interviewees were asked questions on both the sectarian-reducing and sectarian-enhancing role of social media. Furthermore, during these interviews, seven broad categories were discussed: social media activism, the importance of social media for the protestors, social media as a narrative battlefield, online information corruption and its impact, whether the regime fought back online, social media’s overall effect on the protests, and freedom of speech. Overall, participants argued that social media played a sectarian-reducing role, disseminated a civic narrative among a global Lebanese network, and facilitated collective and connective action. Furthermore, all interviewees argued that online information corruption was propagated along sectarian narratives that discredited the protests, but it was not a compelling enough reason for the deterrence of mass mobilization. Instead, most interviewees argued that the reduction in collective action was due to violence, COVID-19 national lockdown and economic barriers. To further investigate the relationship between sectarianism and online information corruption, I substantiated the interview results with a quantitative analysis. My logistic regression model indicated a statistically significant positive relationship between sectarianism and a false narrative online through the metric of a p-value. 
Therefore, the data analysis corroborated the interviewees’ insights that false online narratives were sectarian. My results highlight that a civic narrative dominated social media and played a constructive role in greater collective and connective Lebanese action against the regime. On the other hand, the results also show that online information corruption was used as a sectarian narrative tool to discredit the protests, but this was not enough to deter collective action. Thus, social media’s democratic nature benefitted a civic narrative, but also served regime manipulation.

Watson Jia, COS
Title of Project: Consistency and Distributed Gateways in IoT Environments

Distributed systems have become ubiquitous in our modern computing world, with applications ranging from telecommunications to computer networks. The Internet of Things (IoT) has integrated technology with many physical objects in our everyday lives, with applications ranging from smart home technologies to medical applications. My independent work attempts to combine two increasingly important fields in modern computing – distributed systems and the IoT – and investigates applications of distributed systems in IoT environments by leveraging multiple IoT gateways as a distributed system. This project explores fault tolerance and data consistency, which have large implications for reliability and scalability in applications that rely on both distributed systems and the IoT. This could especially impact industrial systems and infrastructural applications. I aimed to modify multiple Mozilla WebThings smart home gateways to act as a distributed system, implement a fault tolerance scheme, and identify consistency issues in smart home IoT devices within this system. Quality of service metrics of the distributed system in the form of latency measures show consistent, reasonable delays between gateways, with no large deviations from the mean. A fault tolerance scheme, in which one gateway takes over the IoT devices of another gateway that had gone offline, was able to add all devices from the offline gateway to the new gateway, and the new gateway was able to control half of the devices added. Consistency issues caused by network connectivity problems and event reorderings were identified and possible solutions were found.

Christy Lee, COS
Title of Project:  When a Virus Goes Viral: A Study on the Efficacy of Using Twitter Analysis to Forecast COVID-19 Cases

In a short period of time, COVID-19 has completely transformed the landscape of global health, economics, and society. Given the enormity of this impact, it has become crucial to more effectively prepare for and act against COVID-19; improving our ability to forecast case counts is one method of doing so. My independent work discusses a forecasting model which aims to quantify an aspect of social response in order to build a more well-rounded predictor of case trends. Because COVID-19 is spread primarily through person-to-person contact, shifts in social response to the virus can affect social behavior, and thus subsequent case numbers. By analyzing Twitter data for sentiment and frequency, the model takes into account one measure of social attitudes and behaviors towards COVID-19. This data is considered in conjunction with reported COVID-19 case data and state demographic information, inputted into a feedforward neural network model for regression, and ultimately used to forecast positive cases 3, 7, and 14 days into the future.
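The forecasting setup described above can be illustrated with a minimal sketch of how daily signals might be paired with case counts 3, 7, and 14 days ahead before being passed to a regression model. This is not the author's pipeline: the function name `build_examples`, the 7-day input window, and the flat feature layout are assumptions made for illustration, and the state demographic features and the neural network itself are left out.

```python
from typing import Dict, List, Sequence, Tuple

HORIZONS = (3, 7, 14)  # days ahead, as named in the abstract


def build_examples(
    cases: Sequence[float],
    sentiment: Sequence[float],
    tweet_counts: Sequence[float],
    window: int = 7,  # assumed look-back length, not taken from the abstract
    horizons: Tuple[int, ...] = HORIZONS,
) -> List[Tuple[List[float], Dict[int, float]]]:
    """Pair a window of recent daily signals with the case count at each horizon."""
    if not (len(cases) == len(sentiment) == len(tweet_counts)):
        raise ValueError("all series must cover the same days")
    examples = []
    # `end` is the exclusive end of the input window; day `end - 1` is the
    # last observed day, so the furthest target is day `end - 1 + max(horizons)`.
    last_end = len(cases) - max(horizons)
    for end in range(window, last_end + 1):
        days = range(end - window, end)
        features = (
            [cases[d] for d in days]
            + [sentiment[d] for d in days]
            + [tweet_counts[d] for d in days]
        )
        targets = {h: cases[end - 1 + h] for h in horizons}
        examples.append((features, targets))
    return examples
```

Each returned pair holds one flat feature vector and a dictionary keyed by horizon, so a separate regressor (or one multi-output regressor) could be fitted per horizon.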

Austin Mejia, IND
Title of Project: Lucky Break: Regulating Loot Boxes in Video Games

Over the past four years, loot boxes have skyrocketed in popularity. These virtual crates of in-game items have become a mainstay of the video game industry, generating over $30 billion in 2019. However, with their meteoric rise come concerns over their impact on gamers, as a growing body of evidence suggests that loot boxes are addictive. Though other nations like China and Hungary have already introduced legislation to regulate loot boxes, the U.S. has yet to establish a policy response, with no viable regulations foreseeable within the next year. My research seeks to propose a compelling and comprehensive policy response to loot boxes. Whereas many proposals solely focus on combating potential addiction, this recommendation additionally examines the structure of loot boxes and how they are embedded with “dark patterns,” or designs intended to trick players into spending more money. Ultimately, I recommend new regulations that create a stricter digital marketplace, requiring developers to disclose the odds of their loot boxes and implementing strict currency expectations. This recommendation aims to lay a flexible foundation upon which future regulation can build as our understanding of loot boxes continues to progress.

Sean-Wyn Ng, COS
Title of Project: Pose2Pose2: Pose Selection and Transfer for Full-Body Character Animation

To convert a video of a real-life human subject into an animation, artists often watch an original performance video of the subject many times in order to determine which body poses they tend to hold. Artists must also choose optimal points of transition between body poses within the animation. In this project, I explored the design of animation systems that are less manually intensive, which could potentially make animation more accessible to the general public by lowering the time required to get started. I created Pose2Pose2, a system inspired by Pose2Pose, a tool that automatically extracted and clustered two-dimensional upper-body pose data from a subject within a video, displaying them on a user interface in order of frequency of occurrence for more efficient visualization. Unlike Pose2Pose, Pose2Pose2 can track full-body poses, as well as identify both two- and three-dimensional pose data within a video featuring a human subject. Pose2Pose2 also includes additional features within the user interface, such as grouping rotation-normalized three-dimensional poses together and marking poses that are visually similar to poses selected by the user. Users select poses from the interface and use them as reference to draw cartoonized versions of the subject holding the selected poses. Pose2Pose2 uses the drawings to convert a new video featuring the same subject into an animation.

Vedika Patwari, COS
Title of Project: Evaluating the Impact of Data Localization on Technological Innovation in India

Cross-border data flows are playing an increasingly important role in supporting a globally digitized economy and yet, countries are attempting to regulate the flow of data through localization mandates. Using India as a case study, I examine the impact of localization on company operations and technological innovation. India’s dynamic policy environment, and its unique combination of a large digital economy and an emerging data center industry, offer insight into how localization impacts growth and innovation in countries with relatively lower levels of digital infrastructure. Given that emerging economies are also turning towards restricting the free flow of data, this is an important context within which to study data localization. Existing studies analyze the impact of localization at the national level and there is a need to better understand how localization plays out at a company level. Thus, I conducted semi-structured interviews with executives at various financial technology companies in India to understand the impacts of localization. I find that there is a high level of compliance with the localization mandate across all company sizes. Additionally, companies with local operations are able to localize their data with greater ease when compared to companies with global operations. This varied impact of localization is not addressed in existing literature and may cause multinational companies to opt out of markets with localization restrictions. I also identify an over-reliance on data centers located in Mumbai; this geographic centralization of data is a key vulnerability in the Indian financial ecosystem. In order to mitigate some of the identified risks, I recommend public and private investments to increase the availability and geographic spread of India’s data center infrastructure. Further research with more companies and in different countries is necessary to build upon my findings and to better inform future policies on data localization.

Carlotta Platt, SPI
Title of Project: Containing the Contagion: Determinants of Government Response to the First Wave of the COVID-19 Pandemic in Europe

The COVID-19 Pandemic has spared no country in the world, causing almost three million deaths in its first year. Yet governments were unprepared for and responded very differently to iterations of the same virus. My research uses quantitative and qualitative analysis to investigate what factors determined national variation of first-wave policy response to the COVID-19 pandemic in European countries, and what led to response effectiveness. I hypothesize that four groups of factors (Governmental, Political, Societal and Economic) will be significant in explaining (1) national variations in response intensity, (2) national variations in response quality, and will interact to determine (3) response effectiveness. Overall, I find that these four groups of factors explain variation in response intensity and quality, and that they interact to determine an effective response. Among them, (1) decentralization with strong intergovernmental coordination and central guidance, with reliance on few but communicative scientific experts, (2) strong leadership with low pressure from the opposition, (3) high trust and low media misinformation, and (4) a strong economy that is able to quickly increase healthcare capacity, combine to determine an effective response: one which best couples intensity with quality to avoid high numbers of cases and deaths. On a societal level specifically, I use Instagram and Twitter analysis to study how high media coverage of the pandemic, when used to spread misinformation, related to a less effective response.

Harmit (Hari) Raval, COS
Title of Project: Security and reliability implications of imprecise programming language specifications: A case-study of GPU schedulers

Programming language specifications are the rules that govern how programs behave; developers use these specifications to reason about their program properties, including safety and security concerns. When these specifications are imprecise, programmers develop applications that can behave in surprising ways. It is possible for malicious actors to exploit these surprising behaviors, causing significant societal impact. One of the most widespread parallel computing devices is the graphics processing unit (GPU). While these devices were classically used for graphics computations, they are now able to handle more general-purpose compute applications. Rapid evolution, coupled with the increasing diversity of these devices, leads to underspecified programming languages. This situation is ripe for security vulnerabilities given how widespread GPUs are. My research focuses on an underspecified area of GPU specification, the scheduler. Through our work in creating a thorough GPU testing framework and automatically constructing hundreds of multi-threaded test cases, we discovered instances where the scheduler can lock up. When the GPU locks up, we found that many different behaviors can be exhibited, ranging from a simple graphics reboot to the machine freezing completely. The latter example provides a direct pathway for a security vulnerability. By embedding our litmus tests in a mobile device application, we demonstrate the ability for such an application to leverage its low-level system access and cause visual information leakage. Overall, our work identifies serious security concerns in modern GPU devices. These concerns have severe societal implications given the prevalence of GPUs in modern systems: most people interact with their most private information (for example, through daily smartphone use) on devices that contain these powerful, yet underspecified processors.

Lauren Tang, COS
Title of Project: Towards the Democratization of Finance in the Context of Stock Trading

There is a growing market of millennials becoming more interested in personal finance and stock trading, especially with the onset of the COVID-19 pandemic. Many stock market brokers are making changes to their platforms to capture the millennial market. Robinhood, a high-tech trading platform, strives to bring novice investors onto their platform, stating that their mission is to “democratize finance for all.” We need to consider this: What does democratizing finance truly mean and has this goal been reached? I argue that the democratization of finance requires two parts: access and education. People need to be able to access financial systems and have the financial literacy to understand how to skillfully navigate them. I analyze differences between Robinhood and older incumbent brokers such as Charles Schwab to determine whether access to markets has increased. Additionally, I explore what financial literacy tools exist for investors and what more brokers can do. Robinhood has made great strides towards increasing access through their introduction of commission-free trading to the industry, which has led other brokers to follow suit. Robinhood also utilizes gamification tactics and emphasizes UI/UX. Examples of this can be seen in their sign-up and trading process when compared to older incumbents. Issues arise when inexperienced investors don’t understand certain dangers associated with trading, such as tax liabilities and the downsides of commission-free trading (specifically on Robinhood). Commission-free trading poses a financial harm to investors because Robinhood practices payment for order flow (PFOF), which can result in users on their platform receiving worse prices on trade execution. While other brokers also engage in this practice, they pass along PFOF benefits to their users, but it is unclear whether Robinhood does this.
Across brokers in general, tax liability is another issue for novice investors because they may not understand how stock trading is taxed based on the transaction and don’t know of tax offsetting practices. A clear example of these dangers arose in the GameStop short squeeze event in early 2021. Brokers provide access to markets but fail to provide tools for financial literacy. New investors seeking financial advice often independently look to other investors through Reddit forums, enabling a community of financial learning, but this channel of information is not always reliable. In order to continue making progress towards the democratization of finance for all, Robinhood and other brokers need to play an active role in educating their users by providing them with tools for learning and smarter investing.

Ethan Thai, ELE
Title of Project: Dr. AI: Adapting CNN Classification for the Technical and Social Challenges of Medical Diagnosis

Diagnosing medical images is a time-, cost-, and labor-intensive task traditionally undertaken only by an expert few. Fortunately, through the development of artificial intelligence (AI) and increased access to medical datasets, convolutional neural networks (CNNs) have become increasingly suitable for learning to conduct computer-assisted diagnosis (CAD). However, learning for medical classification comes with the unique technical challenges of low data volume, class imbalance, inconsistent labeling, and having fine image details differentiate multiple diagnoses, as well as the social challenges of respecting patient data usage and combating algorithmic bias. In this independent work, I designed (with others on this research project) a training methodology specifically tailored to the medical domain by integrating transfer-learning, dataset cleaning, and synthetic data augmentation techniques. Through evaluation of color channel variations in images used to pre-train a model, implementation of an iterative dataset cleaning scheme, and use of DeepInversion to synthesize patient-decoupled training data, small but compounding improvements to classification performance are shown. Finally, through the gained experience of developing a CAD methodology and contextualization of medical AI research in prevalent social and legal discussions, a set of privacy- and bias-conscious design principles is introduced.

Ryan Yao, COS
Title of Project: Safeguarding Consumer Privacy: Analysis of Data Obfuscation Mechanisms to Prevent Ubiquitous Network Tracking

The rapid rise of modern consumer Internet platforms has largely been enabled by the development of lucrative targeted network advertising models. However, these model-based platforms have engaged in unprecedented user data collection and extensive network tracking, which collectively threaten individual consumer privacy. In the absence of sufficient general data privacy regulation, a new class of user-oriented data obfuscation privacy tools has quickly grown. Using two popular data obfuscation tools (TrackMeNot, a search obfuscation tool, and AdNauseam, an ad clickstream obfuscation tool) as a lens, my independent work examines the ways in which data obfuscation, the production and inclusion of fake data to mask real data, can be applied to anonymize user data, deny data collection, and fundamentally disrupt excessive and unsanctioned network tracking. Grounded in a review of recent literature, my research explores data obfuscation as a potential alternative to the narrow and exclusive focus on privacy regulation that has characterized previous work. Analysis of the ability of data obfuscation to prevent ubiquitous network tracking places it as a means of incentivizing the adoption of more responsible data collection practices and advertising models which respect existing and future privacy standards. Ultimately, I recommend new policy initiatives, including the implementation of regulatory protections for consumer data obfuscation tools, the prevention of exclusive platform self-regulation, and the creation of regulation which works in conjunction with data obfuscation. These recommendations aim to serve as a principled foundation for use of data obfuscation in safeguarding the future of consumer privacy.

Anika Yardi, ORFE
Title of Project: Using Monte Carlo Markov Chain Methods to Understand the Mathematics and Visualization of Gerrymandering in Politically Competitive Districts

Gerrymandering is a technique used to give an unfair advantage to any one political party through the process of manipulating district lines in order to dilute the voting power of an opposing political party. Known by their bizarre shapes, gerrymandered districts are thought to be easily recognizable. However, this is not always the case, and it can be incredibly difficult to tell whether victories in particular areas occur due to legislative wrongdoing or are a natural political outcome. This is where mathematics can help. In my research, I worked to form a framework for analysis of redistricting plans of Maryland, Pennsylvania, and North Carolina. Firstly, using the powerful technique of Monte Carlo Markov Chain Methods, I came to the conclusion that while all three of my selected states have elements of gerrymandering in their redistricting plans, North Carolina and Pennsylvania are extreme examples of the technique. Furthermore, I compared court-mandated redistricting plans for these two states and determined that the implemented remedial plans were not the fairest and least extreme option of the proposed plans. Finally, I tackled the problem of gerrymandering from a policy perspective and isolated effective elements to combat gerrymandering from bills being proposed in North Carolina and Pennsylvania, which include independent commissions, required mechanisms for public hearings, and criteria for fairness like compactness.
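The sampling idea behind this kind of analysis (usually called Markov chain Monte Carlo) can be sketched on a toy example. The code below is only an illustration under strong simplifying assumptions, not the method used in the project: precincts are assumed to lie in a single row so that districts are contiguous runs separated by cut points, the target distribution is assumed uniform over plans that respect district size limits, and every name (`seats_won`, `is_valid`, `sample_plans`) is hypothetical. Real redistricting work runs on precinct adjacency graphs with richer constraints.

```python
import random
from typing import List, Sequence, Tuple


def seats_won(cuts: Sequence[int], votes_a: Sequence[int], votes_b: Sequence[int]) -> int:
    """Count districts in which party A has strictly more votes than party B."""
    bounds = [0, *cuts, len(votes_a)]
    wins = 0
    for lo, hi in zip(bounds, bounds[1:]):
        if sum(votes_a[lo:hi]) > sum(votes_b[lo:hi]):
            wins += 1
    return wins


def is_valid(cuts: Sequence[int], n: int, min_size: int, max_size: int) -> bool:
    """A plan is valid when every district size lies within [min_size, max_size]."""
    bounds = [0, *cuts, n]
    return all(min_size <= hi - lo <= max_size for lo, hi in zip(bounds, bounds[1:]))


def sample_plans(
    start_cuts: Sequence[int],
    n: int,
    min_size: int,
    max_size: int,
    steps: int,
    rng: random.Random,
) -> List[Tuple[int, ...]]:
    """Random walk over plans: shift one cut point by one precinct per step.

    The proposal is symmetric and the assumed target is uniform over valid
    plans, so the Metropolis rule reduces to accepting every valid proposal
    and staying put otherwise. Requires at least one cut and min_size >= 1.
    """
    cuts = list(start_cuts)
    samples = []
    for _ in range(steps):
        proposal = list(cuts)
        i = rng.randrange(len(proposal))
        proposal[i] += rng.choice((-1, 1))
        if is_valid(proposal, n, min_size, max_size):
            cuts = proposal
        samples.append(tuple(cuts))
    return samples
```

Comparing `seats_won` for an enacted plan against its distribution over the sampled plans gives a rough sense of whether that plan is an outlier, which is the kind of evidence the abstract refers to when it calls some plans extreme.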

Noa Zarur, COS
Title of Project: Profit Maximization and Food Waste Reduction

Almost half of the food in the United States goes to waste. The current solutions bakeries and restaurants have for avoiding food waste are not sufficient. The goal of my project is to reduce food waste and maximize profit for bakeries algorithmically, focusing on bread as my primary commodity of study. A challenge with creating such an algorithm is that it relies on the number of customers that show up at a bakery and buy bread at each point in the day. My approach recursively calls a function that returns the optimal number of breads to make in each time slot. By recursively calling my core function, I run through every possible profitable outcome given all combinations of regular customers, leftover customers, and regular breads. To test my algorithm, I used average statistics for the number of time slots available per day to bake bread, the number of customers per day, and the cost of making each loaf of bread. My algorithm also allows users to customize it by inputting the maximum number of customers that typically come to their bakery, their price for regular bread, their price for leftover bread, and how many leftover breads they start the day with. The results return the optimal number of regular breads to make, the expected profit from the regular bread and leftover bread, and the total profit for each time point. Next steps include adding features to allow this algorithm to be easily used by bakeries and ready for deployment. Further applications of the project might include expanding its usage to restaurants and pharmacies.
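A recursive search of the kind described above might look roughly like the following sketch. It is not the project's algorithm, and several modelling choices are assumptions made only so the example is concrete: demand in each slot is assumed uniform between 0 and `max_customers`, customers who cannot get fresh bread are assumed to buy leftovers if any exist, unsold fresh bread becomes the next slot's leftovers, and older leftovers are discarded. The function name `plan_bakes` and its parameters are hypothetical.

```python
from functools import lru_cache
from typing import Tuple


def plan_bakes(
    slots: int,
    max_customers: int,
    price: float,
    leftover_price: float,
    cost: float,
    start_leftovers: int = 0,
) -> Tuple[float, int]:
    """Return (expected total profit, loaves to bake in the first slot)."""

    @lru_cache(maxsize=None)  # memoize repeated (slot, leftovers) states
    def best(slot: int, leftovers: int) -> Tuple[float, int]:
        if slot == slots:
            return 0.0, 0  # day is over, nothing more to earn
        best_profit, best_bake = float("-inf"), 0
        for bake in range(max_customers + 1):
            total = 0.0
            # Assumed demand model: each customer count is equally likely.
            for customers in range(max_customers + 1):
                fresh_sold = min(bake, customers)
                leftover_sold = min(leftovers, customers - fresh_sold)
                profit_now = (
                    fresh_sold * price
                    + leftover_sold * leftover_price
                    - bake * cost
                )
                # Unsold fresh loaves carry over as leftovers.
                future, _ = best(slot + 1, bake - fresh_sold)
                total += profit_now + future
            expected = total / (max_customers + 1)
            if expected > best_profit:
                best_profit, best_bake = expected, bake
        return best_profit, best_bake

    return best(0, start_leftovers)
```

Memoizing on the pair (slot, leftovers) keeps the recursion from re-exploring identical states, which matters because the brute-force enumeration otherwise grows exponentially with the number of time slots.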

2020 Certificate Graduates Independent Work

Neel Ajjarapu, ELE
Title of Project: Applications of Machine Learning to Causal Networks for Generating Attacks in Networked System

The advent of the Internet-of-Things (IoT) and next-generation networking has led to the development of networked systems that are increasingly interconnected, complex, and vulnerable. Many of these systems, including the in-vehicle network (IVN) of the connected vehicle, were designed for closed networks, and are inherently vulnerable in this new environment. Others, including software defined network (SDN) technologies, are still in their nascent phases, but will soon be widely deployed with the expansion of 5G. These new networked systems provide attackers a large attack surface, which conventional security methods cannot easily detect and mitigate. This thesis explores the application of a causal network and machine learning (ML) based methodology, which preemptively generates attacks in order to mitigate vulnerabilities in networked systems. This methodology extracts intelligence from known attacks, represents them as causal networks, and employs ML techniques to extrapolate new attack vectors and vulnerabilities within the system. We demonstrate the feasibility of this approach by applying it to the controller area network (CAN) protocol of the IVN, as well as applying a modified approach to the OpenFlow protocol of the SDN. Within the IVN, the methodology achieves an 87.22% reduction in the search space and generates six attacks that are novel to the ML model. We generate an additional seven attacks in the SDN. We then demonstrate the respective attacks on an emulated CAN bus and emulated OpenFlow-enabled SDN.

Jake Caddeau, PHI
Title of Project: Facebook, Knowledge and Sight: Mechanisms of Information in the Digital Age

This thesis is an investigation of moral and spiritual epistemology in the Information Age. I analyze two case studies, one involving social/political speech and one involving information about climate science. I investigated these two cases in order to develop a theory of sight on Facebook—how users of these networks are emotionally and spiritually oriented to see information presented on social media in a specific way. In developing this notion of sight, I grounded my theory in a relativist picture of knowledge, drawing on philosophers Bruno Latour and Michel Foucault. I argue that a theory of good sight which we should strive for involves honoring nuance, complexity and depth and trying to see with openness and empathy so that knowledge about subjects can resonate as complete wholes.

Simone Downs, WWS
Title of Project: Social Media Impacts on Female Judging & Representation in South Asia 

In South Asia, the courts are a powerful mediator and are often the strongest conferrer of rights to citizens. I am interested in how the courts include women in their decision-making process. Even if women are present on the court, do they truly have the power to influence positive outcomes for women? One factor that could impact how female justices are perceived and how much power they hold is social media. Social media has greater political power in South Asia than it does in the United States. It has been used administratively by politicians to make official announcements and it has been used more maliciously to spark real-world violence and hysteria. In this chapter of my thesis, I argue that social media is an important avenue of exploration in order to determine what obstacles female justices face outside of the courtroom. I find that most women face heavy criticism online, and I posit that this could potentially prevent them from acting as boldly in controversial cases. Social media in these negative instances lessens the substantive impact that female justices have on women’s rights. However, there is evidence that women can be helped through social media. If there is enough support for female justices online, this could present a powerful force for ensuring that female representation on the Supreme Courts of South Asia is both descriptive and substantive.

Andrew Griffin, COS
Title of Project: Analysis of Potential Regulatory Frameworks for Artificial Intelligence in the Healthcare Industry

Technological innovation has long come with the promise of improving the lives of humans by increasing efficiency, complementing human intellect, and raising our standard of living. Artificial intelligence, the most recent and promising technological innovation of our generation, is no exception. However, we have recently seen evidence of the potential for machine learning algorithms to perpetuate or even exacerbate discriminatory patterns that already exist in many applications. Discrimination in historical data used to train algorithms, the outcomes technologists choose to pursue, and the choice of variables to include in the model can all be potential sources of bias. This paper weighs the pros and cons of several different frameworks ranging from the very onerous FDA-style clearance for algorithms to hands-off regulation focused on discrimination law in the case of lawsuits. The more onerous the framework, the more complete the check can be on discrimination in an ML setting. However, that completeness comes at a cost to the goals of the individual companies as well as a heavy burden on resources needed in order to enforce the regulation. The less onerous the framework, the more instances of bias or discrimination can slip through the cracks, although the frameworks can be more realistically implemented from a cost standpoint. My solution puts forth a framework for regulating the healthcare industry that relies on clear, transparent expectations for data storage and algorithm design so that companies can understand what is expected of them and can focus on mitigating issues of bias/discrimination in their own algorithms. With clear expectations, companies can still pursue their own goals with respect to their machine learning algorithms, while responsibly storing data and design decisions so that they can analyze their own results in light of important considerations regarding discrimination.
With companies incentivized to check themselves, the regulatory body can efficiently and effectively examine companies on a periodic basis without putting a strain on the resources available to them.

Maia Hamin, COS
Title of Project: “don’t ignore this”: Automating the Collection and Analysis of Political Campaign Emails

The humble email remains a key way that political campaigns reach their supporters and incite them to donate, sign petitions, attend events, and get out to vote. Emails are, therefore, a rich source of data about the rhetoric, message, and priorities of different campaigns, but previous work has been narrow in scope due to the time cost of manually signing up to receive emails from political campaigns and analyzing hundreds or thousands of emails to generate insights. My junior paper was inspired by the realization that tools from computer science might help us close this methodological gap, giving us better insight into the political strategies of the moment and even helping us potentially identify overtly worrisome techniques and patterns of rhetoric in use by political campaigns today. I updated a web crawler to automatically sign up for emails, ultimately signing up for more than 1,400 different campaign mailing lists. I began the development of a pipeline for using NLP to detect the use of fear-based, “demagogue”-like rhetoric, as well as to analyze the issues about which each candidate spoke the most in their emails.

Hillenbrand, Julia, WWS
Title of Project: Examining Ohio Data Policy Responses to the Opioid Crisis in Support of Children and Families

I examined the secondary effects of the opioid crisis on Ohio child welfare and education systems. Research was a combination of literature review, policy analysis, and 12 interviews with state policymakers and other relevant professionals. I then examined the policies that have been enforced in Ohio to help combat these secondary effects. A critical finding was the lack of coordination and data sharing amongst federal, state, and local agencies meant to support children and families facing addiction. I dedicated a significant portion of my research to understanding how data policy is being leveraged to support those subject to the secondary effects of the opioid crisis, particularly children and families. I found that, in Ohio, two major barriers to data coordination arise: (1) the emphasis on local control in Ohio policymaking, and (2) the influence of federal data privacy policy. In interviews, the impact of poor data policy on the provision of coordinated care for children and families was an apparent strain on service providers, many of whom demonstrated a desire for more statewide case management systems than are currently available. Service providers such as Family Dependency Treatment Court and the Ohio MOMS program use case management systems that vary county to county and are not integrated with relevant data platforms, such as the Statewide Automated Child Welfare Information System. At the county level, workers are fearful to share information with other agencies due to federal 42 C.F.R. and HIPAA policy. As a result of decentralized data management, collaboration in Ohio between case workers, educators, rehabilitation centers, medical centers, and other relevant treatment providers is limited. These inefficiencies mean that children may not receive resources from legal, mental health, or other relevant providers when their lives are impacted by drugs.
Though local lawmakers are trying to reform current data practices through the creation of county-to-county data networks and federal policymakers have rolled back 42 C.F.R. for the reasons stated above, state policymakers should consider centralizing data networks throughout the state to better support children and families impacted by the opioid crisis.

Robert Liu, COS
Title of Project: Analyzing Platform Design for Interactive Narrative

Interactive narratives are digital experiences that allow readers to shape a dramatic storyline through their actions. There is a broad design space of platforms for interactive narratives that structure how writers design choice-based stories and how readers engage with those stories. Some new interactive fiction (IF) platforms leverage Web technologies to create more usable interfaces for narrative designers and readers, let authors write stories that use novel affordances, and even offer multiplayer capability. We analyze some representative examples in the narrative platform design space and the experiences of creating and reading on these platforms. We find that emerging platforms center around the form of “quests,” where dialogue between author and readers is central to creating the narrative through real-time reader choices.

Morlan Osgood, COS
Title of Project: Measuring the Impact of Social Media Features on Mental Health

Studies have shown a direct link between social media use and decreased mental health quality. It is implausible to stop a 40-billion-dollar, 3.2-billion-user industry. However, it is plausible to mitigate the most detrimental effects by identifying the features that most negatively impact mental health and proposing alternatives. This paper takes the first step in that process by evaluating the best way to use current scales that measure mental health effects to identify detrimental social media platform features. It found the Beck Depression Inventory-Second Edition (BDI-II) scale to be the best option for analyzing a feature’s impact on depression symptoms. The paper found the Fear of Missing Out scale (FoMOs) to be the most detailed way to analyze FoMO symptoms and recommends that it be used in longitudinal studies.

Matteo Russo, COS (presented independent work as a junior in spring 2019)
Title of Project: Robust OOD Detection in Secure Open-world Learning

Distributed systems have become ubiquitous in our modern computing world, with applications ranging from telecommunications to computer networks. The Internet of Things (IoT) has integrated technology with many physical objects in our everyday lives, with applications ranging from smart home technologies to medical applications. This paper attempts to combine two increasingly important fields in modern computing – distributed systems and the IoT – and investigates applications of distributed systems in IoT environments by leveraging multiple IoT gateways as a distributed system. This paper explores fault tolerance and data consistency, which have large implications for reliability and scalability in applications that rely on both distributed systems and the IoT. This could especially impact industrial systems and infrastructural applications. This paper aimed to modify multiple Mozilla WebThings smart home gateways to act as a distributed system, implement a fault tolerance scheme, and identify consistency issues in smart home IoT devices within this system. Quality of service metrics of the distributed system in the form of latency measures show consistent, reasonable delays between gateways, with no large deviations from the mean. A fault tolerance scheme, in which one gateway takes over the IoT devices of another gateway that had gone offline, was able to add all devices from the offline gateway to the new gateway, and the new gateway was able to control half of the devices added. Consistency issues caused by network connectivity problems and event reorderings were identified and possible solutions were proposed.

Meghan Slattery, ORFE
Title of Project: Spread of Extremist Ideologies Through Online Network Influence

The onset of the social media revolution has allowed unprecedented levels of global communication, enabling an increased spread of some of the darker ideologies affecting the world today. For my independent work, I analyzed data from the Profiles of Individual Radicalization in the United States (PIRUS) dataset put forth by the START Program at the University of Maryland. From the initial set of over 2,100 deidentified individuals, I tracked incidents over time, geographic region, type of extremism, and preferred social media platform. The highest number of recorded individuals was found in the U.S. South; throughout the country, Islamist ideologies were spread most commonly through social media; and the most common platform for radical social media use was Facebook. Using these and other findings, I identified the regions, extremist trends, and social media platforms most significant to the spread, and suggested basic policy recommendations for stopping the use of social media technology for extremist purposes.

2019 Certificate Graduates Independent Work

Casey Chow, COS
Title of Project: Opinion Authorship in the United States Supreme Court

In modern American jurisprudence, the decisions of the Supreme Court of the United States (SCOTUS) are treated as de facto law. However, little is officially disclosed about how justices develop their opinions; internal memos from the bench take decades after the respective justice’s death or resignation to be released. This paper applies the natural language processing technique of author verification (AV) to determine to what extent justices write the opinions they present to the Supreme Court.

Jeremy Colvin, WWS
Title of Project: Unchecked Ambiguity and the Globalization of User Privacy Controls Under the GDPR

Enforced since 2018, the European Union’s General Data Protection Regulation (GDPR) builds on past EU privacy directives and establishes a new era of data privacy emphasizing increased transparency, accountability, and user control with regard to the processing of personal data. This research project monitors the citing and application of Article 6(f) (processing under a ‘legitimate interest’) among 2,275,137 privacy policies pre-GDPR and 1,937,894 privacy policies post-GDPR, presenting a rise in usage from 104,143 policies (4.58%) pre-GDPR to 185,014 (9.55%) post-GDPR enforcement. The second section of analysis investigates the globalization of the regulation and the ability of controllers to fulfill specific articles of the GDPR relating to user controls. A second dataset evaluates 43 of the most popular U.S. Alexa-ranked websites on their ability to fulfill Articles 17 (Right to Erasure) and 20 (Right to Data Portability) when approached from two different IP addresses (U.S. and EU).
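The reported shares can be rechecked directly from the counts stated in the abstract. The following is a minimal Python sketch of that arithmetic; the variable and function names are illustrative and are not taken from the project itself.

```python
# Counts as stated in the abstract above.
pre_total, pre_cited = 2_275_137, 104_143
post_total, post_cited = 1_937_894, 185_014


def share(cited: int, total: int) -> float:
    """Return the percentage of policies that cite the provision."""
    return 100 * cited / total


pre_share = share(pre_cited, pre_total)
post_share = share(post_cited, post_total)

# Print both shares rounded to two decimal places.
print(f"pre-GDPR: {pre_share:.2f}%  post-GDPR: {post_share:.2f}%")
```

Because the two samples differ in size, comparing shares rather than raw counts is what makes the pre- and post-GDPR figures comparable.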

Jamie Cuffe, COS
Title of Project: Defining The Field of Venture Analytics: Applying Computational Neural Networks to Model Market Sentiment

This paper aims to define a framework for the field of venture analytics, the application of computer science, technology, and data science to the venture capital industry. The first step outlines the three key stages of the industry (sourcing, investing, and portfolio support), considering technology’s role in each. A four-pillar framework, involving team, product, traction, and market, is developed to quantitatively evaluate companies. The highest-leverage opportunity is in modelling a leading indicator for the market component. Applying a computational neural network to topically categorized features derived from news sources achieved 86.8% accuracy in its classification of successes and failures. The network was particularly effective in removing bad companies, with a true positive rate of 99.1% for failures. However, it struggled to get full coverage of the winners, with a true positive rate of 21.3% for successes. The results were evaluated at the feature level to uncover the intuition within what was previously a black-box approach. It was found that operations and industry news categories were significantly more predictive of startup success than financial news, despite the fact that funding news is widely used as a signal today. Finally, interviews with industry experts uncovered that these results are applicable to venture capital today. This paper outlined a roadmap for the creation of the venture analytics industry, which many are excited to build on to make the industry a reality today.
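Overall accuracy and per-class true positive rate measure different things, which is how a classifier can score well overall while still missing most successes. The Python sketch below illustrates the distinction; the counts are hypothetical placeholders chosen for illustration and are not the project's data.

```python
def true_positive_rate(correct: int, missed: int) -> float:
    """Share of a class's actual members that the classifier labeled correctly."""
    return correct / (correct + missed)


# Hypothetical counts for illustration only; NOT the project's data.
failures_correct, failures_missed = 90, 10     # companies that actually failed
successes_correct, successes_missed = 5, 15    # companies that actually succeeded

failure_tpr = true_positive_rate(failures_correct, failures_missed)
success_tpr = true_positive_rate(successes_correct, successes_missed)

# Accuracy pools both classes, so the larger class dominates the result.
total = failures_correct + failures_missed + successes_correct + successes_missed
accuracy = (failures_correct + successes_correct) / total
```

When failures greatly outnumber successes, as in this toy setup, accuracy tracks the failure class closely, so the per-class rates are the more informative figures.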

John Ennis, WWS
Title of Project: Coverage of Iraq in the Media: Bush’s Fait Accompli

The 2003 invasion of Iraq was primarily predicated on Saddam Hussein’s supposed ties to al-Qaeda and an estimation that the country was continuing the development of weapons of mass destruction (WMD). Both of these claims were later found to be false, and the Bush administration, as well as the intelligence community, bore significant amounts of blame for this notable mistake. Yet a third group, the news media, was also denounced by many policymakers, scholars, and general citizens for not digging deeper into the administration’s reasoning for going to war and the intelligence produced to support this decision. However, rather than placing blame on the whole news media, I find that the Bush administration actively engaged “willing mouthpieces,” or weak spots in the news media. While most print media reported fairly on the information that the Bush administration disseminated in support of the Iraq War, the administration used the reporting of Judith Miller, as well as the broad support of the television media, to promote their agenda. The role of the media, as well as misunderstandings surrounding the complexities of its involvement in promoting the White House’s agenda, may have negatively impacted American trust in the media, which has been reduced by almost half since 2003. The news media is both a technology in its own right, and is usually accessed through technology, and my work has shown that the different types of technology used to convey and access journalism at this time played a role in the information that society received regarding an invasion of Iraq predicated on false assumptions.

Michael J. Friedman, COS
Title of Project: Adversarial Design Patterns: Findings From a Crawl of 11K Shopping Websites

Adversarial Design Patterns (ADPs) are user interface design patterns that coerce, manipulate, or deceive users into making unintended and potentially harmful decisions. These patterns are particularly common on shopping websites, where they pressure users into making more purchases or disclosing more information than they would otherwise. We conducted a large-scale study, analyzing ~53K product pages from ~11K shopping websites to characterize and quantify the prevalence of ADPs. To do so, we created an interactive crawler that simulates an actual user browsing a shopping website, and then we used text clustering to analyze the resulting data set. As a result, we discovered 1,764 ADP instances, together representing 15 types and 7 categories. We examined the underlying influence of these ADPs, documenting the kind of impact they have on user decision-making. We also examined these ADPs for deceptive practices, and found 117 websites that display deceptive messages to their visitors. In addition, we uncovered 22 third-party entities that offer ADPs as a turnkey solution. Based on our findings, we make recommendations for stakeholders including researchers and regulators to study, mitigate, and minimize the use of these patterns.

Priya Ganatra, ENG
Title of Project: When you look in a mirror, you see yourself. When you look at a screen, is it really you?

What does it mean that technology now controls an aspect of our identity? Advertisement agencies and social media control individual identity because the identity is separated from the individual – it is on a platform that can be distorted. Companies profit off of the human connections that are made. Something that previously could not be measured by money, now is. Instagram feeds are full of idealized versions of humans. This narrative is not new. It has been rooted in culture for as long as time has existed: humans seek attention. This video presentation will explore these types of technologies and their impact on human identity.

Leora Huebner, COS
Title of Project: EyeBeat Revolution: Tackling Accessibility by Creating Sound and Music in Virtual Reality

Technology has an almost unlimited potential to help people in need, and to lower the entry barriers for individuals with disabilities to participate in all aspects of society. This project, EyeBeat Revolution, is a product designed to enable people with severe physical disabilities to play music. It is an accessible music instrumentation environment that uses the Fove 0 virtual reality headset to create a realistic, immersive experience of playing the drums controlled solely by eye tracking. This paper highlights current issues and developments in accessibility, music education and therapy, and gamified and accessible music. It details the process of designing and creating an accessible music-playing environment in virtual reality. It focuses primarily on the benefits of the product, from both cognitive and technical perspectives, and on the technology behind creating the music.

Jay Li, COS
Title of Project: Who Polices the Online Police? Measuring User Profile Bias in Social Media Content Moderation

Content moderation is the cornerstone of online discourse today. All content that users share stays online by following rules set forward by technology companies. These companies, such as Google, Twitter, Facebook, and Reddit, each have their own teams of content moderators that review text, photos, videos, and other forms of media on the platforms per these rules. However, these guidelines cannot cover everything; there exist grey areas where moderators refer to company policies only as guidelines, consider the context, and then apply their personal judgement. My research investigates whether people exhibit inherent bias in moderating flagged social media activities. I specifically focus on Facebook and the company’s Community Standards. I use two separate surveys of Princeton students to see how they rate sample posts against these Standards and to gather their demographic information. I have found that students of different races, genders, and ideological leanings evaluate online content differently when presented with the same Facebook post shared by different Facebook users. This difference varies by the user’s as well as the students’ own demographics. As content moderation is progressing towards a more automated process with machine learning and artificial intelligence, my results show that this bias has to be resolved before moderation is completely automated. The results can be applied to further developments in content moderation and can advance the current literature on free speech theory and technology policy that has arisen for online content providers today.

Grace Miles, COS (presented her independent work as a junior in spring 2018)
Title of Project: The Online World Versus the Physical World: An Analysis of College Students’ Social Networks

The purpose of the research is to understand how structurally similar, or isomorphic, Princeton students’ online social networks are in comparison to their physical social networks. This study surveys Princeton undergraduate students (n = 464) about the nature of their usage on three separate online social network platforms: Facebook, LinkedIn, and Instagram. Findings include online social network structures that are greatly inflated compared to their real-world counterparts and that include large numbers of stagnant ties. The implications of this study are increasingly relevant in defining the role of stagnant ties in our lives and the mental health and personal repercussions of these growing online networks.

Heather Milke, FRE
Title of Project: The Evolution of Emmanuel Macron’s Start-Up Rhetoric and Where It Is Leading France

This paper examines the development of start-ups and entrepreneurship in France immediately following the election of President Emmanuel Macron in May 2017. Building off of the momentum of growing entrepreneurship and technology innovation in France over the last ten years, Macron’s policies reflect a pro-business attitude and entrepreneurial spirit. Macron and his party, La République En Marche, developed a rhetoric of innovation and forward thinking. Part of this rhetoric draws parallels between France and the start-up scene in the United States, particularly Silicon Valley. It also establishes France’s national identity around entrepreneurship, calling for France to become a “start-up nation.” Tracing through Macron’s discourse from the launch of his party, La République En Marche! (or simply, En Marche!) in April 2016, to his campaign, his Presidential election in May 2017, and finally through his first year in office, it is clear how Macron’s rhetoric of innovation is a response to France’s economic and political situation before his rise to power, an imitation of the United States’ start-up culture, and an attempt to place himself in opposition to the President of the United States, Donald Trump. While the outcomes of Macron’s reform remain uncertain, this paper examines the effects of Macron’s rhetoric on France’s business and society one year into his presidency in order to better understand France’s status as a “start-up nation” moving forward.

Oluwapelumi Odimayo, COS
Title of Project: A Theoretical Framework to Analyze The Matchmaking Process of Early Stage Venture Capital Investment

The venture capital (VC) investment process has remained largely unchanged over the last several decades. Since the tech bubble burst in the early 2000s, we have seen a considerable increase in capital going into early-stage startup investment. We have also seen the number of exits (mergers, acquisitions, and IPOs) and funding rounds increase at a commensurate rate. However, a critical aspect of the venture capital funnel has seen a steady decline over the last few years. The cumulative number of early-stage startups receiving funding has decreased even though the amount of capital being put into these companies has risen steadily over the same period. This paper uses a theoretical framework grounded in behavioral economics and informed by anecdotal evidence from practicing venture capitalists to gain insight into the early stage investment process. We propose process interventions to the matchmaking process to alleviate inefficiencies uncovered during the research phase of the project. These interventions were designed with respect to key considerations put forth by practicing early stage venture capitalists. Finally, we highlight the real-world limitations of these process recommendations as well as possible avenues for future work.

Jake Reichel, COS
Title of Project: Understanding User Privacy and Social Media Usage in South Africa

Social media usage in the developing world is continuing to rise. However, research about the many associated privacy concerns has mostly been limited to studying social media use in more developed settings. In this study, I show how mobile social media users in South Africa are making use of the privacy settings and controls on social media platforms. I present findings from interviews of 52 current mobile social media users in South Africa, ranging from low-income users to upper middle class users. There were several themes that emerged. First, users’ primary privacy-related concern was surrounding who else could see their posts and messages as opposed to what data the platforms collect about them. Second, users displayed general knowledge gaps on both existing privacy settings on social media and data collection efforts by advertisers. Third, users’ considerations for their own physical safety often shaped their attitudes towards privacy online. Fourth, usage of privacy settings and conceptions of privacy are heavily swayed by offline social factors such as perceived intimacy of a platform and information sharing amongst friends. Based on these findings, I make recommendations for social media designers, companies, and regulators to ensure that user privacy is maintained on social media.

Cierra Robson, AAS
Title of Project: In the Eye of the Shareholder: Racialized Surveillance Capitalism in Oakland, California

Focusing on the construction of a state-sanctioned city-wide surveillance center in Oakland, California called the “Domain Awareness Center” (DAC), this work considers how economic, social and political priorities are used to justify enhanced surveillance of racialized populations. In tracing the public and private networks of capital and ideology that emerge from the DAC, I argue that the hyperlocal policing technologies of the city are intimately linked to the economic priorities of private companies and the foreign policy agenda of the United States. To explain this dynamic, I develop the concept of racialized surveillance capitalism to describe the ways in which governments mobilize private companies to surveil the most marginalized populations for the purposes of amassing economic and political capital. Such a collaboration is mechanized by systems like the DAC, deploying seemingly objective technologies and statistics which furtively reinforce existing social hierarchies in the name of efficiency and impartiality. In analyzing the dissonance between the official record (government reports, City Council meeting minutes, and Oakland city contracts and budgets) and the resident archive (Twitter feeds, op-eds, and public testimonies at town halls), I argue that the same surveillance center simultaneously protects some and terrorizes others. I dispel the myth that legal battles are the best solution to the potential problems emerging from state-sanctioned surveillance: the law, according to critical race theorists, has systematically excluded many and therefore cannot be the only means of liberation. Instead, I turn to those protesting against the DAC as exemplars of radical and imaginative solution-making.

2018 Certificate Graduates Independent Work

Adam Berman, COS (presented independent work as a junior)
Title of Project: A Computational Pathway for Identifying Metabolites Relevant to Cancer Development

This work describes a computational approach to identify “driver” metabolites and metabolic pathways that are most crucial to cancer development and growth based on pre-existing large-scale genomic databases of mutational and expressional cancer data. The implementation relies on the development of formulae to assign cancer associative scores to each of many biologically active metabolites from the publicly available cancer data. Thereafter, the metabolites can be ranked according to their scores, and the ranking can be cross-validated with a list of metabolites known to take part in cancerous metabolic pathways using machine learning techniques.

Given cancer’s known tendency to alter the body’s metabolism, we feel that this metabolite-first approach may bring to light previously overlooked metabolites that are crucial to cancer development.

Kelly Bojic, ECO
Title of Project: “How to Save a Life” (and Maybe Money): The Effects of Information Technology on Health Spending During Economic Downturns

My paper investigates the effects of information technology on health care spending per capita during economic downturns. Information technology has the potential to create substantial savings to health care spending, especially during economic downturns, when private health care spending and capital investments decline and federal spending for health care rises. I distinguish between supplier- and consumer-facing information technology by looking at health information technology systems on the supplier side and iPhones on the consumer side. Among health information technology systems, I focus on Electronic Medical Records systems and Clinical Decision Support systems. I find that health information technology systems are associated with a 14.5% rise in health care spending per capita, reflecting an increase in costs to medical facilities. However, the rise in health spending associated with health information technology use falls 2.41% for every percentage point increase in the unemployment rate. I do not find a significant relationship between consumer-facing information technology and health care spending.

Luisa Goytia, COS
Title of Project: Amazona – A framework for mobile context-aware personal security

Personal security is a global concern that ignores race, gender, age, and socioeconomic status. It can be gravely compromised without warning in a dangerous location as well as in a perceived well-protected neighborhood. Calling 911 is the default emergency protocol but this might not be possible or feasible during many emergencies. In the case where contact can be established, emergency responders have very limited information to formulate an effective rescue protocol.

This thesis introduces Amazona, a context-aware mobile framework that discreetly and rapidly distributes relevant system and user information upon activation in an emergency situation. The context-aware nature of the system allows the mobile device to gather information about its environment and to adapt its emergency protocol accordingly. Based on the selected protocol, Amazona sends an updated information package, containing location and video/images, among other data, to pre-selected emergency contacts upon consecutive presses of the device’s power button. The package does not just reflect a snapshot of the user’s condition at the time of the emergency but also includes data from the past, creating a detailed timeline of the user’s activity before the alarm is activated. The system continues sending these packages with updated information, if available, until the alarm is deactivated, without relying on any other input from the user.

This approach facilitates access to different emergency contacts, improves the likelihood of rescue and diminishes the risk a user has to take to reach out for help using traditional methods.

Sreela Kodali, ELE
Title of Project: Monitoring Mental Health with a Multimodal Sensor System and Low-power Specialized Hardware

This work presents a system that utilizes smartphone and wearable data to understand human behavior and facilitate mental health monitoring. Although mental disorders are exceedingly prevalent, their diagnostic methods are much more antiquated than their physical ailment counterparts. Mental health diagnosticians employ subjective surveys, professional observations, and patient recall to capture a patient’s changing behavior, but these approaches are severely limited. Mobile devices introduce a unique opportunity to quantify and unobtrusively record data on human behavior. Predictive classifiers and neural network (NN) models can interpret the data and yield meaningful behavior classifications. With the inclusion of specialized hardware, a secure and energy-efficient predictive system can be developed to identify worrying behavior and encourage users to seek medical help. In this work, multiple predictive models associated with worrisome mental health behaviors are developed with existing datasets, optimized for performance, and ported to hardware accelerators. Energy and power metrics for the models are estimated and used as design considerations for a realizable end-to-end system.

Marion Lewis, WWS
Title of Project: Understanding the Effects of High Jump:
A Study of One Chicago-Based Supplementary Academic Enrichment Program for Low-Income, High-Achieving Middle School Students on College Outcomes and Interest in STEM

This work analyzes the effects of High Jump. Located in urban Chicago, High Jump is a tuition-free supplementary academic enrichment program for highly motivated seventh and eighth grade students of limited financial means. Specifically, the objectives of this thesis are to assess the relationships between participation in High Jump and college outcomes, as well as participation in High Jump and STEM interest. I use an observational research design that compares students who applied and were selected for High Jump to those who applied and were not selected. With data collected from the National Student Clearinghouse on college enrollment, graduation, and declared major, I examine the association between High Jump and college outcomes. Using college major choice as a proxy for STEM aptitude and/or interest, my findings can then be applied more generally to draw conclusions about how well High Jump alumni will fare in the future digitized workforce and provide recommendations about how to modify education policy in developmental years so as to safeguard the human advantage that machines may begin to threaten. After running several OLS regressions of college graduation, type of college, and STEM major on High Jump completion status, my results indicate that High Jump completion is correlated with a 0.112 (significant at the 0.01 level) increase in college enrollment, a 0.229 (significant at the 0.01 level) increase in college graduation, a 0.169 (significant at the 0.01 level) increase in enrollment at a selective college, a 0.177 (significant at the 0.01 level) increase in enrollment at a four-year college, and a 0.211 (significant at the 0.01 level) increase in enrollment at a private college.
Furthermore, High Jump completion is correlated with a 0.068 (significant at the 0.01 level) decrease in enrollment at a two-year college, a 0.106 (significant at the 0.01 level) decrease in enrollment at a public college, and a 0.114 (significant at the 0.01 level) decrease in pursuing a STEM major. I conclude with policy implications related to universal college access and STEM education.
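To make the regression coefficients above easier to read, the following Python sketch shows an OLS fit of a 0/1 outcome on a 0/1 completion indicator. It assumes the simplest case of a single regressor with no controls, which may differ from the thesis's actual specification, and the data are a hypothetical toy sample rather than High Jump records. In this setup the slope equals the difference in outcome rates between the two groups.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Closed-form simple OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Hypothetical toy sample, for illustration only.
completed = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = completed the program
enrolled = [1, 1, 1, 0, 1, 0, 0, 0]    # 1 = enrolled in college

slope, intercept = ols_slope_intercept(completed, enrolled)

# Group means, for comparison with the fitted coefficients.
rate_completers = mean(e for c, e in zip(completed, enrolled) if c == 1)
rate_others = mean(e for c, e in zip(completed, enrolled) if c == 0)
```

Under this reading, a coefficient such as 0.112 would correspond to an 11.2 percentage point difference in the outcome rate, holding fixed whatever other variables the actual model includes.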

Sarah Muse, WWS
Title of Project: The Surprising Success of the European Union Member States’ Migration Policies: An Analysis of Greece, Italy, Spain and France’s Border Control

This paper attempts to discern how different Member States within the European Union were able to effectively restrict the number of illegal refugees entering their borders through the employment of various technologies and treaties. My paper specifically looks at Greece, Italy, Spain and France. Ultimately it demonstrates that the nations successfully secured their borders against illegal migrants by erecting physical barriers and by creating and employing surveillance technologies to track and detain refugees attempting to cross the sea from North Africa into the southern border of the EU. The paper focuses specifically on Spain’s creation of unique and effective technologies including SIVE, an Integrated External Vigilance System that is capable of not only tracking migrant traffic across the Mediterranean but also alerting authorities in both African and European nations about these crossings to help them detain the illegal refugees.

Julie Novick-Lederer, COS
Title of Project: Using Sentiment Analysis to Detect Gender Bias in Course Evaluations for Professors at Princeton University

This project assesses whether gender bias exists in Princeton University’s course evaluations using a combination of natural language processing techniques. Gender gaps and discriminatory practices in the workplace and in the hiring process are issues that have been frequently brought to the table, and the world of academia has not escaped scrutiny. We seek to understand whether male professors and female professors are being evaluated differently by students and analyze how the possible existence of gender bias may affect the demographic of both students and professors in a variety of fields and departments at Princeton. In order to conduct a meaningful study, we analyze the evaluations written anonymously by students from Princeton’s registrar for a chosen set of departments and courses. Using Princeton evaluations as the dataset provides a unique lens through which we can assess how ratings, sentiment, and language used differ between female and male professors. We discovered that while the overall sentiment and ratings of courses taught by female professors were not consistently more negative than those of their male colleagues, the language used in the evaluations is definitely gendered. Moreover, the discrepancy between the number of women and men in academia still remains a problem both at Princeton and at universities across the nation. This study incorporates the use of technology, specifically large data manipulation and sentiment analyses, to understand a very topical societal issue. We hope that this project can be extended further to see whether there is a connection between the fields where employees are evaluated and the number of women in that area of work. This project assesses gender gaps in Princeton’s faculty from an interesting angle and sheds light on the issue of the lack of women working in academia.

Luke Petruzzi, COS (presented independent work as a junior)
Title of Project: Virtual Reality for Nuclear Arms Control: Prototyping New Verification Processes

Nuclear disarmament has come a long way since the detonation of the first nuclear weapon. International treaties and efforts pursuing non-proliferation, bans on testing, and dismantlement of deployment vehicles including aircraft and missiles are ongoing. However, there are no treaties to dismantle nuclear warheads themselves. Convincing nuclear weapon states to come to a consensus on the scope of a treaty to dismantle warheads is extremely difficult. Currently, in-person walkthroughs of proposed treaties, inspections, and verification processes are performed by multiple nations to gauge treaty effectiveness and capacity for adoption. This process is very expensive and time consuming, and it introduces intrinsic security and safety risks. This project addresses these problems by employing a networked Virtual Reality application in which international parties can perform confirmation walkthroughs and rapidly prototype new verification processes for treaties. Our approach was tested with Princeton students and has already shown positive results that provide insights on a potential verification process for mapping the location of warheads within a facility. This project also created an open source template for multi-user virtual reality developed on the Unity engine: a resource that did not previously exist.

Maya Phillips, COS
Title of Project: A Metadata API for the Fragile Families and Child Wellbeing Study

The Fragile Families and Child Wellbeing Study is a longitudinal study of children from birth to age 15. The data from this study is currently being used by social scientists and data scientists to combine predictive modeling with other tools to predict key outcomes in the lives of young children. This data includes information about the children, their parents, their schools, and their larger environments. The recently collected year-15 data helps us to understand how these factors are predictors of success. Researchers at Princeton are now constructing better metadata to support the raw dataset of survey responses in order to make predictive modeling easier, and to make the models themselves more predictive. My project is an API that manipulates and partitions this metadata along lines that will allow researchers easier access to the information. In both an online version and an embedded code version, this interface will allow for the widest use of the metadata, which will eventually give us better insight into the predictive nature of the data.

Tyler Sullivan, ECO
Title of Project: Contrasting Technological Development’s Effect on Labor Share and Industrial Concentration in U.S. Manufacturing and Information Industries

The rise of the digital economy has had an undeniable impact on industrial dynamics in the United States. The productivity gains resulting from technological development are manifesting themselves differently throughout the economy, and this study specifically contrasts the U.S. manufacturing and information industries in order to forecast changing industry dynamics. Two central dynamics are studied: changing labor share and changing industrial concentration. While productivity gains derived from increased digitalization have been leading to generally falling employee labor share and increasing “superstar” firm concentration, this study shows that the results are heavily nuanced by industry. Using data from the U.S. Census Bureau and the U.S. Bureau of Labor Statistics, this study indicates a technological “rebalancing,” where older, less digitally-native industries like manufacturing are beginning to act more like newer, digitally-native industries in an effort to mature into the digital economy. This study finds a positive long-term effect on labor share, but a negative long-term effect on industrial competition. With the rise of advanced artificial intelligence and technological unemployment, monitoring changing productivity’s effect on key industry dynamics will continue to be a crucial study with significant economic and public policy impacts.

Rebecca Weng, SOC
Title of Project: Negotiating Twitter as a Space for Digital Activism: A Case Study of #MeToo

Currently we are seeing a variety of everyday interactions that used to be exclusively offline being forced online in one way or another—from professional correspondence to advertisements to teaching and learning. But there has been particular skepticism around online activism both in everyday and scholarly discussion. This study focused on the case of #MeToo on Twitter. From October 15, 2017 to March 31, 2018, I gathered more than 6,000,000 tweets that used the hashtag, as well as tweets that used related hashtags about women’s rights. I used these tweets to form a sampling frame, which I used to contact research subjects over Twitter. This study offers insights with regard to recruiting research subjects on sites like Twitter. Through 10 in-depth interviews, this study shows how Twitter users are mapping between the traditionally offline activity of protest and online hashtag activism. In this case, Hollywood, the 2016 Presidential election, and the utilization of social media by news outlets to spread information all play crucial roles.

Natalie Wertz, COS
Title of Project: LeadHERship: An Analysis of Gendered Portrayals in the Media

Over the past decade, people from all over the world have become increasingly reliant on online sources to read the news and keep up with current events. Although news articles are meant to be unbiased sources of information, they often contain the same problematic biases that humans exhibit in everyday life. As the popularity of online news continues to increase, it is necessary to uncover the gender biases within these articles in order to counteract inequalities present in society today. The aim of this study is to analyze the many axes in which gender bias is present within news articles in order to generate new ideas on how to mitigate, and eventually eliminate, such biases.

2017 Certificate Graduates Independent Work

Elinor (Nora) Buck, PSY
Title of Project: Books Judged by their Covers: Revealing the Hidden Gender Biases in Impressions of Competence

Many of the inequalities in the world today are the result of incorrect, biased perceptions based on appearance during a first impression. This study shows that certain faces are perceived as less competent than others because of their feminine properties. With a statistical face model (Oosterhof & Todorov, 2008; Todorov et al., 2013), we uncover this implicit bias that indicates women are perceived as less competent than men through trait judgments after exposure to faces manipulated on a scale of competence. Experiment 1 re-validates the Competence Model, an existing model of this social dimension (Todorov et al., 2013), as well as tests for the relationship between this model and trait judgments of attractiveness. This allows us to create a new model, a Difference Model, for Experiment 2, which looks to validate a [competence – attractiveness] model, as well as perform additional trait judgments of confidence and masculinity. Finally, Experiment 3 uses both the Competence and Difference Models for a gender classification test, providing strong evidence in favor of the hypothesis. The trait judgment of competence for both the Competence and Difference models demonstrated that the models were accurate in their representations, as the mean ratings increased steadily from the lowest to the highest SD. In addition, the gender classifications yielded results very similar to the competence judgments, with the proportion of faces rated as “Male” rising steadily from the lowest to the highest SD of competence. Most notably, the Difference Model, having removed the effects of attractiveness, shows an even stronger effect from each SD to the next. Therefore, we can conclude that perceptions of competence are highly correlated with gender bias against women – a troubling outcome for society.

Does technology aid this effect or detract from it? While the internet has served as an incubator and amplifier for many social phenomena, it may present an opportunity for women especially to be judged primarily on demonstrated competence instead of perceived competence.

Caroline Congdon, COS
Title of Project: Finding An Effective E-Waste Solution

As we move further and further into the technological age, production of electronic devices of varying purpose, size, and composition continues to expand rapidly. This growth has enabled numerous advancements in several areas, and this progress is a concept that has come to define the current era. Although growing use of technology has made an enormous positive impact, significant pitfalls have also emerged. One such hazard is the sizable increase in the production of electronic waste, also known as e-waste. Discarded televisions, personal computers, refrigerators, tablets, and mobile phones don’t follow a clear path to recycling, salvage, or disposal. Instead, they sit in storage or in landfills, often without protections to keep them from making a toxic environmental impact. Over three and a half million tons of e-waste was generated in 2012, almost double the amount from fifteen years prior. This number continues to grow at an alarming rate, while the recycling rate has stayed relatively low. A solution is necessary in order to combat the realities of this trend, which include dangerous environmental impacts, especially in developing nations. Such a solution will require logistic work, legal justification, and funding. This paper explores some of the proposed solutions, and synthesizes their best elements, via the lens of planned obsolescence.

Aleksandra Czulak, ECO
Title of Project: An Analysis of the Effects of School Closings and Openings on Crime in Chicago

In 2013, the Chicago Public School District closed almost 50 underutilized elementary schools and opened over 40 new schools of which over 30 were new charter schools that were mostly high schools. Two key issues continue to plague Chicago: access to high quality public schools and crime. These shocks to public schools are occurring in most major cities across the United States and it is essential for there to be more research on the effects of school changes on neighborhoods and communities. I analyze the effect of new school openings and school closures, beyond the classroom, on crime in Chicago community areas. The crime categories that I analyze are homicides, violent crime, index crime, and non-index crime and I use quarterly data on crime, changes in schools, and demographic data for Chicago from 2010-2016. Access to publicly available data by cities and the use of big data by cities and police departments offer new ways of looking at issues, like crime, which may result in best practices that may be adopted by other cities; however, it also requires a deeper understanding of the communities and the potential consequences of such initiatives.

Carole Touma, COS
Title of Project: The Racial Education Gap: A Look Beyond AP Exam Scores

This paper intends to provide an overview of the different ways AP exam data have been visualized during this semester’s independent work with respect to race, gender, particular AP exam, and year. The two main visualization methods implemented and discussed in the paper are line graphs plotting AP scores and participant volumes for various demographics over time, as well as an interactive map of the United States where race, gender, AP exam, state, and year can be modified.

Daphne Weinstein, COS
Title of Project: Creating a Decentralized Marketplace for Sensitive Data

Recent experience has taught us that centralized data storage is prone to attacks and surveillance. As more IoT devices come online, there will be consumer demand for a better model. We propose a system for decentralized storage and sale of user data. We build a decentralized data storage system using an existing blockchain and peer-to-peer file sharing system. This is the first system of its kind to take advantage of the affordances of smart contracts to sell data automatically upon conditions delineated by users. We propose the development of a rich vocabulary to describe and automatically ‘compile’ user preferences into smart contracts through which data items or streams of data can be automatically sold. In this way, our system empowers users to take control of their own data: selling it or storing it with exclusive access themselves.

Samantha Weissman, COS
Title of Project: You Are What You Eat: An Analysis of Manhattan Restaurant Health Inspections

Restaurant health inspection scores are an important factor in consumer decision making. My project analyzes Manhattan restaurant health inspection scores, as made publicly available through the New York City Open Data portal, with respect to income and demographic information in order to glean insights into the relationships between restaurants and their corresponding neighborhoods. The research project integrates income statistics and neighborhood composition to recognize and interpret trends in the restaurant inspection data. The project concludes that relevant trends do exist between restaurant health inspection scores and income and demographics; notably, a positive correlation was found between median household income and distributions of inspection scores, connections were made between concentrations of cuisine types and ethnic neighborhoods, and relationships were drawn between temporary restaurant closures and health inspection results.

Jeremy Zullow, WWS
Title of Project: Parking and Traffic in Urban Areas: Harnessing Technology to Drive Market-Based Solutions

Until recently, city governments relied on outdated policies that created surplus demand for parking spaces and, as a result, increased traffic. This was due primarily to the embedded belief that on-street parking should be kept free. However, solutions were also impeded by limited technological means to develop pricing and enforcement systems that could effectively manage parking availability. New technological systems facilitate real-time data curation and enable policymakers to change parking prices in response to consumer changes, and to vary prices in areas of high and low parking demand to increase revenue and reach an optimal level of demand, given the fixed supply of parking spaces. In the process, they are also changing motorists’ behaviors and reducing traffic congestion. Available case studies from Seattle, San Francisco, and Redwood City indicate that technology-driven parking policies can be substantially more effective at reducing parking oversaturation and traffic in large cities, while small cities may not need to adopt complex technology infrastructure to have a similar impact.

2016 Certificate Graduates Independent Work

Louis (Britt) Colcolough, ENG
Title of Project: Technology and Princeton: What the Education Giant Has Yet to Learn

American higher education is about to go through a period of radical change. With an explosion of educational innovations like massive open online courses (or MOOCs) and flipped classrooms, college is beginning to look more and more different. These new ways to learn coincide with a widespread public outcry for colleges to reform themselves in the face of rising student debt and seeming undergraduate incompetency. The result, as Andrew Delbanco puts it in his book College: What it Was, Is, and Should Be, is that it would be “foolish to doubt that higher education is on the verge of upheaval.”

My work focuses on how Princeton has reacted and continues to react in this climate, specifically in the humanities departments. As technological innovation marches forward, what has a Princeton liberal arts education done to adapt? What are the intersections between technology and the humanities here, and do the two even complement one another in a meaningful sense? My research suggests that Princeton has a lot to learn.

Benjamin Dobkin, SOC
Title of Project: To Catch a Redditor: Studying the Identity of Anonymous Users on Reddit, Online and Offline

Anonymous users of online networks deal with the conundrum of needing to create a new identity to participate as individuals in a network. An anonymous identity is paradoxical as the two terms directly oppose each other. However, users who desire to benefit in their network must create an identity to maximize their online presence and social capital. In the offline world, a separate social capital exists, as users shun their online identity to better achieve physical benefits. The social news site Reddit presents this difference between the online and offline, and allows us to view user histories, conduct in-depth interviews with users, and observe offline meetups. Although the research focuses on the offline and online realm separately, it emphasizes the connection and overlap as all users participate in both worlds. This work aims to analyze the backwards progression of online to offline interaction among Reddit users.

Caroline Haas, COS
Title of Project: Bidding for Healthcare: A Two-Sided Market Exploration

America’s healthcare system is broken. The cost of care is rising and consumers are faced with limited choice and minimal pricing data. In this study, I present the idea of a novel healthcare marketplace where physicians bid for patients’ business. The marketplace would provide pricing insights, quality metrics, and discovery opportunities to patients and doctors, thereby injecting competition into this broken healthcare system. To test the feasibility of such a marketplace, I composed market research surveys. Through the research firm Qualtrics, I administered the surveys to 50 recent (past 3 years) patients of Lasik eye surgery and 10 current physicians of Lasik. Lasik patients and physicians were chosen because the specialty is non-insured, the price is highly variable, and the procedure is relatively high-volume in the US. The results indicate that patients are eager for shoppable healthcare and like the proposed marketplace, but that physicians are more resistant to technology and to direct competition. For both patients and physicians, it is clear that for the proposed marketplace to achieve success, the design and marketing must be careful to cater to both consumers’ and physicians’ psychologies.

Jack Hudson, COS
Title of Project: TigerTreat: An Exploration of Technology and Generosity Integration and Student Venture Policy at Princeton University

Recently, many universities have been rigorously promoting innovation on their campuses. Although universities are building resources to enable innovation, many have policies that restrict student innovation in order to protect their non-profit and tax-exempt status. This paper explores a treat delivery service called TigerTreat, its integration with the Princeton University campus, and the university policy hurdles it encountered. By reviewing Princeton University’s policies towards revenue-generating student activities and comparing Princeton University’s policies to those of other universities, this paper proposes three recommendations for Princeton University policy changes: (1) create an official document detailing Student Agency regulations, (2) create a separate set of regulations for profit-generating activities, and (3) establish an objective administration approval and oversight structure.

Samuel Jordan, COS
Title of Project: Bad Choices made by Brilliant People: Explaining the lack of transport-level security on the modern Internet

The Internet, as it exists today, is one of the most successful tools for communication that humans have ever built. A recent major concern about the Internet, greatly fueled by Edward Snowden’s release of documents detailing the NSA’s broad and pervasive Internet eavesdropping capabilities, has been the absence of encryption. Despite some clear architectural advantages for this approach, there is no widely adopted protocol for network-level encryption. Drawing on a variety of sources, ranging from technical protocol specifications (RFCs) to academic journals and email archives, this work finds that the underlying causes for the lack of encryption are threefold: the technological momentum of a design focus on reliable rather than secure communication, hardware (and software) limitations, and externally imposed restrictions on communication between key designers. Analyzing a host of frequently ignored sources to examine the little-studied topic allows us to better understand how we arrived at the current state of network-level encryption, and create an explanatory framework for why things like the NSA surveillance revealed in the Snowden leaks are still possible on today’s Internet.

Rishi Kaneriya, COS
Title of Project: MyLight: The “One Stop Shop” for the Average Philanthropist

MyLight is a web application designed to serve as the “one stop shop” for the average philanthropist looking to donate to non-profit organizations online. It recommends charities to users based on their personal charitable interests, visualizes data about them in an easy-to-understand way that inspires action, integrates current events about charities in the news, and contains social functionality to help users stay connected to fellow donors.

This paper discusses related charity-finding products before detailing the ways in which MyLight differentiates itself as a truly integrated solution. It also discusses the technical implementation of the application, as well as ways in which it was evaluated, before culminating in a discussion of potential improvements that could refine MyLight’s ability to empower everyday philanthropists in the future.

Gabriela Leichnitz, COS
Title of Project: A Princeton On-Demand Transportation Platform and Its Implications on Policy, Safety, and Accessibility

Current On-Demand transportation networks, as implemented at Princeton University and at other schools across the country, are inherently outdated and inefficient, as they rely solely on phone calls for information sharing. This paper addresses the ways in which an On-Demand transportation platform can use technology to simplify the process and ease the exchange of information for all parties. Furthermore, it delves into the policy implications of such a platform, specifically as it relates to overall effectiveness, efficiency, and safety. It then details a prototype developed using MEAN.JS. This preliminary system consists of a web application, used on desktop for dispatchers but designed for mobile responsiveness for riders. The semester project produced a working prototype intended to address the most glaring policy and use problems, but future potential lies in its integration with Princeton’s network and a complete implementation with more complex features and the direct involvement of the bus driver.

Stephanie Marani, COS
Title of Project: InfraShare Mobile: Crowdsourcing Plant Health Using Near-Infrared Photography

As trees and other vegetation are crucial in regulating and maintaining our ecosystem, monitoring their health is an important task. This job often falls to those who work for environment-related institutions; however, this does not have to be the case. Many organizations have begun to use crowdsourcing and volunteer recruitment to help collect environmental data, which allows for ecosystem monitoring to be increasingly efficient. This thesis introduces InfraShare Mobile, an open source mobile application framework that provides a simple, easy process for plant health data to be obtained and analyzed. It allows anyone to use commercial-off-the-shelf devices, including webcams and digital cameras, to take near-infrared photos of vegetation and then analyze these photos on their mobile devices. These photos are then uploaded to a companion web application where they can be viewed along with their location and other information. After using three different cameras to evaluate InfraShare Mobile, it is clear that the application allows for detailed data on plant health to be collected on a wide-scale, low-cost basis, and that it will allow environmentalists to have access to more information about the current state of the natural environment.

Tess Marchant, COS
Title of Project: What Does Facebook Know: Behavioral Targeting for Personalized Ads

This project is aimed at highlighting the importance of increasing public awareness and education regarding web privacy. New forms of web tracking are explained and discussed, the specific information being collected by popular social networking sites is disclosed, and the methodology of a new web app that will inform and empower the public to change their browser preferences to reflect their personal beliefs is introduced. This application was inspired by previous research indicating a disconnect between fears people have regarding privacy online and the actions they take to ensure that their online behaviors are, in fact, private. Its ultimate goal is to help connect those fears and actions, and to continue the web privacy discussion in general.

Alec Jacob (A.J.) Ranzato, SOC
Title of Project: I-Robot to We-Robot: Exploring the Effects of Team Structure on Team Dynamics, Decision Making, and Performance, when Working with a Remote Robot

In this study I gave teams of five a team structure, condition or hierarchy, and tasked them with controlling a remote robot through an experimental space with the goal of maximizing exploration and exploitation. They were given 90 minutes, and within that 90 minutes had a series of (max) 10-minute windows to send a series of commands to their robot in bulk consisting of movement and pictures, which were used to help in following planning sessions. Ultimately, three groups emerged, tightly coupled hierarchical and consensus ones, and loosely coupled versions of both. Loosely coupled teams proved to be best suited for the task as they could maintain brief social order centered around their robot teammate where their team structure provided little. They could then neutralize the advantages of their given structure and adopt the advantages of the other.

Paarth Shah, WWS
Title of Project: Redefining the Smart City: An Analysis of the 100 Largest Cities in America

This paper studies the emergence of Smart Cities in the United States. These are not new constructions of cities but rather retrofitting of existing metropolitan areas. 89 of the 100 largest cities in America have used the terms “smart city”, “smart growth”, or “smart technology” in their government documentation. The motive of the paper is to both analyze and refine the definition of a smart city, given its vagueness in the literature. After an analysis of definitions proposed by various stakeholders, it is suggested that a smart city ought to have high levels of information and communication technology, social capital, infrastructure and education. I then use proxy variables for each of these metrics and regress these variables against well-being and satisfaction within a city for the 100 largest cities in America, since the underlying goal of a city is to ultimately increase the level of well-being and quality of life for its citizens. The paper concludes with 4 dyadic case studies which look at pairs of cities which are close to each other in geography and population but perform very differently on the Smart City Index, a weighted average for the 4 variables mentioned above. These dyadic pairs are analyzed to offer some insights into why certain cities have performed better on the Smart City Index relative to others, pointing out key investments made by particular cities in technology, education, infrastructure and social capital which have resulted in higher levels of well-being.

Edward S. Walker, Jr., ORFE
Title of Project: A Method of Pose Estimation Using April Tags for the Picking and Stowing Problems

This paper introduces a new approach to pose estimation for the picking and stowing problems. The approach uses April tags attached to an item to estimate its pose with a single image. In testing, the approach has a root mean square error of 4.5mm against a SIFT and RANSAC method on items used in the 2015 Amazon Picking Challenge. This approach could be valuable for placing items on and off warehouse shelves and for other applications.
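
As a rough illustration of the error metric mentioned above, the sketch below computes a root mean square error between two sets of position estimates. The abstract does not say exactly how the error was measured, so the function name `rmse_mm`, the use of 3D positions only (no orientation), and the sample coordinates are all assumptions made for illustration.

```python
import math

def rmse_mm(estimated, reference):
    """Root mean square of the Euclidean distances between paired 3D positions (mm)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty lists of equal length")
    # Squared distance between each estimated position and its reference position.
    squared = [
        sum((e - r) ** 2 for e, r in zip(est, ref))
        for est, ref in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical positions in millimetres, not data from the paper.
tag_estimates = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
baseline = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
print(rmse_mm(tag_estimates, baseline))
```

A full pose comparison would also need an orientation error term, which this sketch leaves out.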

2015 Certificate Graduates Independent Work

Gabriel Ambruso, COS
Title of Project: Finding Computer Time: Load-Balancing of Public Terminals

Ease of access to public terminals is a concern for the 77 million Americans that use them. These individuals rely on these terminals for the Internet connectivity and productivity tools they provide. Factors that hinder access to public terminals include heavy competition for their use and an inability to determine what time is best to attend a locale with such terminals.

This paper outlines a framework for load-balancing public terminals. This framework focuses on load-balancing terminals at a location by providing users with information on the expected number of available terminals at any given time and the peak usage hours on any given day. Using this information, users can align their trips with non-peak hours and lessen the amount of competition for public terminals by spreading out their visits. Once this framework is deployed at multiple locations with terminals in a single area, its data can be used to direct users to the location with the most available terminals and increase the efficiency with which these terminals are used.
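
One way to picture the information such a framework could give users is sketched below: average the number of free terminals observed at each hour, then report the busiest hours. This is a minimal sketch and not the paper's implementation; the function names, the observation format, and the sample log are hypothetical.

```python
from collections import defaultdict

def expected_availability(observations):
    """Average the number of free terminals observed at each hour of the day."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum of free terminals, sample count]
    for hour, free in observations:
        totals[hour][0] += free
        totals[hour][1] += 1
    return {hour: s / n for hour, (s, n) in totals.items()}

def peak_hours(expected, count=1):
    """Return the `count` hours with the fewest expected free terminals."""
    return sorted(expected, key=lambda h: (expected[h], h))[:count]

# Hypothetical log of (hour of day, free terminals) samples.
samples = [(9, 8), (9, 6), (12, 1), (12, 3), (15, 4), (15, 6)]
expected = expected_availability(samples)
print(expected)
print(peak_hours(expected))
```

With this kind of summary, a user could plan a visit for the hour with the highest expected availability rather than a peak hour.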

Green Choi, COS
Title of Project: An Automated Approach to Ad Tracker Detection and Classification

In this paper we propose an automated system for detecting, categorizing, and verifying ad trackers on the web. We base this system on the OpenWPM platform developed by Englehardt et al., which we leverage to create an “aggressive” attempt at maximizing tracker coverage while minimizing the potential negative impact on functionality. We explored the suitability of structural “A/B” DOM tree variations in fitting supervised learning models. In doing so, we observe potential advanced tracking methods like cookie syncing in the wild and attempt to explain the limitations of relying on patterns in DOM structural data in classification. Finally, we propose next steps towards the improvement of tracker detection and classification in hopes of overcoming the observed limitations of DOM structural features in generalizing to the diverse content found across the web.

Cara de Freitas Bart, COS
Title of Project: Safety is Our Priority: The Legal Issues with Autonomous Vehicles

Vehicles are now computers on wheels as increasing amounts of software assist drivers to safely navigate the roads. Autonomous vehicles, for which humans will not have to physically drive the cars, will be ready for production in the next few years. To safeguard humans’ safety on the roads, policymakers must write the necessary legislation to ensure the safe testing, development, and integration of autonomous vehicle technology before it is released to the public.

But is the United States ready for this leap in technology? This research focuses on the development of autonomous vehicles in the United States and the legal issues that must be addressed. Through a combination of legal documents, laws, academic research papers, and interviews with industry experts, an evaluation of autonomous vehicle technology and legislation that prioritizes road safety is presented. Politicians and government officials, who usually lack a strong technical background, are the intended audience because they must address the legislative challenges of autonomous vehicles.

Stephanie Goldberg, ELE
Title of Project: A Comprehensive Survey of the Security of the Internet of Things

With an estimated 50 billion smart devices in use across the world by 2020, there is a growing and imperative need for security across these devices. The Internet of Things (IoT) is a term referring to smart devices, or those atypical devices and household items that are connected to the Internet, which provides a global communication network. These devices range in use, size and power, but all ultimately provide a more technologically advanced and intuitive environment. These devices, composed of sensors, computing devices, controllers and actuators, take in information about their environment (e.g., a person, building or vehicle) and utilize their own communication network protocols to connect to the Internet, where they send data, interpret the meaning of the data and then actuate on the environment accordingly.

Unfortunately, these devices today are largely insecure and are quite vulnerable to attacks of all types. By identifying commonalities across all types of smart devices that make up the Internet of Things, this paper will provide a device framework for security that can be applied to make a secure “thing”. This paper will offer a comprehensive set of recommendations on security fixes and protocols that together will form a security backbone for IoT. Additionally, this article will explore policy concerns based around privacy and security issues exploited by these devices.

Elisse Hill, COS
Title of Project: Incentivizing IPv6 Deployment by Improving Transit Performance with the Teredo Protocol

IPv6 is the Internet addressing system that has been slated to replace the existing protocol, IPv4, since the inception of IPv6 in 1998. This replacement is necessary because the number of IPv4 addresses does not satisfy the current demand based on population and the fact that people have multiple Internet-connected devices. IPv6, on the other hand, offers roughly 7.9*10^28 times as many addresses as IPv4. Thus, IPv6 adoption is an important step in the future advancement of the Internet. IPv6 deployment has been slow for many reasons, but we must hasten it.
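
The address-space comparison can be checked with a few lines of arithmetic: IPv4 addresses are 32 bits long and IPv6 addresses are 128 bits long, so the ratio of the two address spaces is 2^96, or roughly 7.9*10^28. The Python sketch below is illustrative only; the variable names are ours and are not taken from the project.

```python
# Address widths in bits, as fixed by the IPv4 and IPv6 address formats.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS
ipv6_addresses = 2 ** IPV6_BITS

# How many times larger the IPv6 address space is than the IPv4 one.
ratio = ipv6_addresses // ipv4_addresses

print(f"IPv4 addresses:  {ipv4_addresses:,}")
print(f"IPv6/IPv4 ratio: {ratio:.2e}")
```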

One way to do this is to use the Teredo protocol, which encapsulates IPv6 packets in IPv4 packets. This is the primary method I used. However, it is important to note that adopting Teredo means the system still depends on IPv4, when it should use IPv6 exclusively. Therefore, there must be strong methods to encourage IPv6 adoption. One of the methods that I suggest in my paper is to use federal regulation in order to encourage the deployment of this technology and thus help our society solve some of its pending technological issues.

Judy Jansen, ENG
Title of Project: A Web of Non-Sense: Pale Fire as Precaution to Hypertext Literature

When Vannevar Bush envisioned his “memex” machine in the 1945 Atlantic Monthly article “As We May Think,” he anticipated many of the forthcoming changes for literature within the Digital Age. Bush imagined a device that would gather all of literature into a connected system akin to the associational networks of the mind. Although Bush only conjectured this concept, information technology scholar Ted Nelson began to build one of the first hypertext systems, called Project Xanadu, fifteen years later. This ambitious endeavor never achieved its aim of compiling all of literature, but the attempt at a physical manifestation of Bush’s ideas inspired the myriad of hypertext systems to come. After Bush’s publication, many computer scientists and writers alike conceptualized and criticized hypertext literature: readers could link any writing to any other, breaking down the traditional process of reading a linear narrative.

Vladimir Nabokov explored the idea of hypertext in his illustrious 1962 book Pale Fire. The fictional madman Charles Kinbote narrates the book’s foreword, commentary, and index, which all cite and refer back to a 999-line poem written by Kinbote’s neighbor, John Shade. The commentary surrounding the poem unravels into a convoluted story of its own. Critics have noted briefly the book’s foreshadowing of hypertext literature; Nelson even planned to use Pale Fire for his demo of Project Xanadu in 1969. However, this paper focuses on how Nabokov anticipates hypertext literature and what he believes this means for future literary agency. Through a complex cyclical and linear structure, distorted perspectives, and multivalent language, Nabokov’s Pale Fire warns readers of the danger of losing direction, authority, and clarity in an age of abundantly accessible writing. With the rise of hypertext, Nabokov reminds readers that we must slow down and close read in order to make sense of the truth embedded in webs of information.

Michael Katz, EAS
Title of Project: Bridging Zhongguancun and Silicon Valley: How the Chinese Government Is Constructing a Technology Ecosystem That Conforms to Western Standards of Innovation

China’s post-reform economic development, bolstered by rapid industrial growth, has allowed China to become the world’s largest economy. However, the threats of stagnant growth and the “middle-income trap” due to shifting labor trends have provoked action from the top levels of the state government. With the introduction of the 2006 Medium- to Long-Term Plan for the Development of Science and Technology (MLP), Party officials began to increase rhetoric surrounding an “innovative” China, implementing numerous policies to promote a vibrant innovation ecosystem. However, rather than embrace the qualities that have characterized the success of China’s recent tech giants like Alibaba and Baidu, the Chinese government is seeking to conform to a more Western standard of innovation. This paper presents a conception of innovation that differs from the traditional, Silicon-Valley-centric view, and critically questions the means by which the Chinese government is politically influencing the direction of Chinese technological development. By financially incentivizing patent generation, as well as selectively funding R&D-focused companies and university departments, Chinese leaders have emphasized a paradigm of “innovation” that is more recognizable for critics who might otherwise dismiss China’s past technological accomplishments. The paper uses Chinese primary sources to closely examine the common themes and discrepancies in Chinese rhetoric surrounding “innovation,” and looks to elucidate the preconceptions that we hold in assessing innovation.

Oscar Li, COS
Title of Project: RAPTor: Routing Attacks Against Privacy in Tor

Tor is an anonymity system that protects its millions of daily users from Internet surveillance. Its users include journalists, law enforcement, activists, businesses, and ordinary citizens concerned with online privacy. Nonetheless, Tor is not completely secure. If an autonomous system (AS) can observe traffic between the Tor client and guard relay and also between the exit relay and destination, the AS can correlate packet timings and sizes to deanonymize the Tor user. This renders Tor useless.
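
The correlation step described above can be illustrated with a toy example. The sketch below is not the RAPTor attack and is not taken from the project: it simply computes a Pearson correlation between per-interval byte counts observed at the two ends of a connection, using invented numbers and a hypothetical helper named `pearson`. A flow seen on the client side should correlate strongly with the matching flow on the destination side and weakly or negatively with an unrelated one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented byte counts (in KB) per time interval, for illustration only.
entry_side = [5, 0, 12, 3, 0, 9]       # client <-> guard relay
exit_matching = [4, 1, 11, 3, 0, 10]   # exit relay <-> destination, same user
exit_unrelated = [0, 7, 1, 6, 8, 0]    # exit relay <-> destination, other user

score_matching = pearson(entry_side, exit_matching)
score_unrelated = pearson(entry_side, exit_unrelated)
print(score_matching > score_unrelated)
```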

Prior research has investigated this threat but largely in the context of symmetric and static Internet paths. In reality, Internet paths are dynamic and asymmetric. Hence, we present RAPTor – a new set of attacks on Tor that leverage the dynamic and asymmetric nature of Internet paths to deanonymize even more Tor users than previously thought possible. We have built a Tor Path Simulation System that quantifies the impact of RAPTor on Tor security and a Traceroute Monitoring Framework that detects and analyzes RAPTor. On the whole, our work calls attention to the dangers of abstracting network routing in analyzing the security of anonymity systems.

Adam Suczewski, COS
Title of Project: Real-time, Multi-User Facial Detection with Applications

This is a three-part project with an emphasis on implementation and applications. The first part consists of extending the open-source CLMtrackr face detection library to support tracking of multiple users per image or video frame rather than a single user. The second part explores applications made possible by multi-user face detection, with an emphasis on facial recognition. The third part explores societal implications of new facial recognition technologies.

Raymond Zhong, COS
Title of Project: Analysis of the Bitcoin Blockchain

Bitcoin is a virtual currency maintained by a decentralized network of participants, who are able to broadcast cryptographically signed transactions in order to move balances between accounts. The full history of Bitcoin transactions is available to anyone connected to the network; this project involved implementing a set of analytical tools for efficiently indexing and querying up to the full set of transactions. A data store was developed using a key-value database and optimized to achieve significantly higher read/write performance than existing SQL-based blockchain databases. Indexing and query infrastructure were implemented, including functionality for traversing over the graph of transactions and aggregating and generating views of data. Finally, the different parts of this project were integrated in a single application that watches for and processes new blocks as they are broadcast, and serves generated statistics through a web interface.
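
As a rough illustration of key-value indexing and graph traversal over transactions, the sketch below uses plain Python dictionaries in place of a key-value database. It is not the project's data store: the transaction layout is deliberately simplified (real Bitcoin inputs reference a specific output of an earlier transaction, not just its ID), and all names and sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical, simplified transactions: each input names the txid it spends from.
transactions = [
    {"txid": "t1", "inputs": [], "outputs": [("alice", 50)]},
    {"txid": "t2", "inputs": ["t1"], "outputs": [("bob", 30), ("alice", 20)]},
    {"txid": "t3", "inputs": ["t2"], "outputs": [("carol", 30)]},
]

by_txid = {}                    # key: txid    -> value: transaction record
by_address = defaultdict(list)  # key: address -> value: txids paying that address

for tx in transactions:
    by_txid[tx["txid"]] = tx
    for address, _amount in tx["outputs"]:
        by_address[address].append(tx["txid"])


def ancestors(txid):
    """Return every txid reachable by following inputs backwards from `txid`."""
    seen = set()
    stack = list(by_txid[txid]["inputs"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(by_txid[current]["inputs"])
    return seen


print(by_address["alice"], sorted(ancestors("t3")))
```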

2014 Certificate Graduates Independent Work

Daniel Chyan, COS
Title of Project: Investigating Censorship through Detecting Modified Content

This paper details the creation of a tool to monitor and detect censorship among a large set of URLs and its application to a popular Chinese news site. The creation of this tool stemmed from an earlier effort to identify potentially censored keywords based on lexical relations. Development issues with the censored keyword identification system prompted a shift in strategy from a lexical approach to a crawling and monitoring approach. Results from the censorship detection tool have revealed some amount of content modification to the monitored URLs, and further exploration is necessary to realize the full potential of this tool. Future application of this tool can lead to better censored keyword detectors and provide, in a timely manner, stronger insight into topics being censored.
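
The crawling-and-monitoring idea can be sketched in a few lines: store a fingerprint of each page on every visit and flag URLs whose fingerprint changes between visits. This is not the author's tool. The function names, the in-memory store, and the sample snapshots are hypothetical, and a real monitor would fetch pages over the network and normalize dynamic content (ads, timestamps) before hashing.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a SHA-256 hex digest of the page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_modified(previous: dict, current_pages: dict) -> list:
    """Return URLs whose content hash differs from the stored one.

    `previous` maps URL -> fingerprint from the last crawl;
    `current_pages` maps URL -> page text from this crawl.
    URLs seen for the first time are recorded but not flagged.
    """
    modified = []
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if url in previous and previous[url] != digest:
            modified.append(url)
        previous[url] = digest  # update the store for the next crawl
    return modified


# Hypothetical snapshots of two pages taken on consecutive crawls.
store = {}
first = detect_modified(store, {"http://example.com/a": "story v1",
                                "http://example.com/b": "other"})
second = detect_modified(store, {"http://example.com/a": "story v2",
                                 "http://example.com/b": "other"})
print(first, second)
```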

Vladimir Costescu, COS
Title of Project: Interviewing with Glass: Investigating a Potential Application of Wearable Technology

In recent years, rapid technological advances have increasingly enabled the miniaturization of computing devices, leading to the proliferation of powerful smartphones, TV streaming dongles such as the Chromecast, and a new array of wearable computers embedded in objects such as watches and glasses. In this paper, I am studying the potential impact of Google Glass in the corporate world, specifically considering the usability of the device as an aid to human resources personnel in the process of conducting interviews with job applicants. To this end, I met with a number of employees at a software company that fulfills US government contracts and pitched the idea of a Glass app that would help streamline the interview process. In the course of discussing the potential functionality of such an app, I gained valuable feedback from key personnel inside the company, including tech leads, human resources personnel, and even the COO and CTO of the company about features they would like to see in an interview app and also about the usability of the device in general.

Owen Gaffney, POL
Title of Project: Uncharted Waters: Re-evaluating the Ethics of Extraterritorial Surveillance

In response to a recent movement – catalyzed by Edward Snowden’s NSA leaks – in support of an international right to privacy (and corresponding international laws), this thesis conducts a historical review of the circumstances, causes, and purposes around which the West formed its collective ethical framework in relation to the concept of “Just Intelligence.” After establishing this framework, a number of changes in the world, brought about by politics and technological advances, are reviewed. The framework is then reevaluated in the context of this changed world and the evolving nature of threats to national security and is shown to fall short in several areas. Ultimately, the NSA’s continuing surveillance of foreign citizens is supported under this new framework of “Just Intelligence.”

Lucas Ho, COS
Title of Project: Meaningful Use Attestation and Hospital Acquired Infections

Research has shown that hospital acquired infections (HAIs) cost our healthcare system $10 billion a year. Furthermore, up to half of these infections can be prevented. Motivated by these facts, recent literature has suggested that increasing electronic health record (EHR) usage can significantly reduce HAIs similar to how checklists improve safety and quality control. Small-scale pilot studies have confirmed this hypothesis, but are these isolated incidents or do they point towards a larger trend? This project seeks to analyze open government data, courtesy of Data.gov, on national EHR adoption (represented in this project as meaningful use attestation) and HAI rates. I will use a JavaScript data visualization library to create a state-by-state visualization of the current relationship between the two factors in order to seek an answer to the question posed above.

Sing Sing Ma, REL
Title of Project: 140 Character Limits: A Study on Change, Responses to Pope Francis, and the Impact of Digital Media

When the Vatican adopted Twitter as a communication method, the conflict between technology and tradition converged onto one social media account. Scholars predicted a decline of religious belief when the Internet allowed everyone a voice, undermining the authority of a pulpit. This paper investigates the question of change and the papacy, using the lenses of influence on media, religious participation, and authority. The primary focus is on the favorites, re-tweets, and mentions of Pope Francis and his tweets.

Carmina Mancenon, ORFE
Title of Project: The Startup Spring: Leveraging Public Policy to Increase Capital Pools for Technology Startups in Turkey and Jordan

Money is an indispensable component of bringing a vision to life in the entrepreneurship space. Indeed, 90% of startups fail primarily due to a lack of sufficient funding, according to the United States Small Business Administration. To this end, governments have the potential to influence the capital pool available to startups through financial policies such as tax incentives and grants. This paper proposes a framework for governments to understand the health of their country from an entrepreneurship perspective, specifically in the technology sector, and enact tailored policies to create an ecosystem conducive to innovation and creation substantiated by comparatively increased financial means. We apply this model to technology startups in Turkey and Jordan.

The methodology used to create this model involves regression and applied time series analyses to deduce the funding crunch area and financial policy priorities. This data is collected from publicly available investment tables on Crunchbase, press releases, and news articles, as well as results from surveys conducted by the Global Entrepreneurship Monitor. These are supplemented by qualitative data based on 30+ interviews conducted with both investors and entrepreneurs in Turkey and Jordan through collaboration with Endeavor Global. Ultimately, we present a systematic, ‘plug-and-chug’ framework for governments to customize in order to begin taking action.

Dillon Reisman, COS
Title of Project: Cookie Crumbs and Unwelcome Javascript: Evaluating the hidden privacy threats posed by the “mashed-up” web

Many modern websites are built on a “mash-up” of numerous web technologies and libraries. This, combined with the ubiquity of third-party web tracking, can open up a user to an increasingly large array of threats to her privacy from many angles. Our paper comprehensively evaluates how the structure of the web can enable new forms of privacy violation and measures the severity of these new threats.

In this paper, we first define a novel form of passive network surveillance we term “cookie linking.” Through this method an eavesdropper observing a user’s HTTP tracking cookies on a network can transitively link shared unique cookies to reconstruct that user’s web browsing history, even if the user’s IP address varies over time. Using simulated browsing profiles we find that for a typical user over 90% of web sites with embedded trackers are located in the large component of visited sites created through cookie linking. The privacy implications of cookie linking are made more acute by the prevalence of identity leakage. In a survey of top web sites we find that over half of those sites leak the identity of logged-in users to an eavesdropper in unencrypted traffic. The eavesdropper thus both identifies a user and uncovers a majority of her web history through passive means.
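
The transitive linking step can be pictured as finding connected components: two observed page visits belong together if they carry the same tracking cookie, and chains of shared cookies merge visits into one cluster. The sketch below uses a small union-find structure for this. It is an illustration under our own assumptions, not the paper's implementation; the function name, data layout, and sample cookies are hypothetical.

```python
from collections import defaultdict


def link_by_cookies(observations):
    """Group observed page visits that transitively share a tracking cookie.

    `observations` is a list of (visited_site, set_of_cookie_ids) pairs.
    Returns a list of sets of sites, one set per linked cluster.
    """
    parent = list(range(len(observations)))

    def find(i):
        # Path-halving find for the union-find structure.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}  # cookie id -> index of the first observation carrying it
    for idx, (_site, cookies) in enumerate(observations):
        for cookie in cookies:
            if cookie in first_seen:
                union(idx, first_seen[cookie])
            else:
                first_seen[cookie] = idx

    clusters = defaultdict(set)
    for idx, (site, _cookies) in enumerate(observations):
        clusters[find(idx)].add(site)
    return list(clusters.values())


# Hypothetical observations; the IP address is deliberately not used for linking.
observations = [
    ("news.example", {"trackerA=123"}),
    ("shop.example", {"trackerA=123", "trackerB=xyz"}),
    ("blog.example", {"trackerB=xyz"}),
    ("other.example", {"trackerC=999"}),
]
clusters = link_by_cookies(observations)
print(clusters)
```

Here the visits to news.example and blog.example share no cookie directly, yet they end up in the same cluster because both are linked through shop.example.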

Second, we evaluate how the third-party Javascript-handling practices of popular sites further expose users to potential privacy violations. We employ a man-in-the-middle attack to model what information malicious Javascript put in the place of approved third-party Javascript can exfiltrate to a malicious server. We find that third-party Javascript is very often permitted to execute in unsupervised environments, where it is free to collect everything from user cookies to keystrokes. Compromised third-party Javascript presents a significant privacy threat against users that many sites help enable.

We ultimately conclude that the most effective method of preventing the above privacy violations is blocking third parties on websites, often done via a browser plug-in. Such blocking may limit a site’s functionality, however, leaving users without a satisfactory option to protect themselves.

Anna Kornfeld Simpson, COS
Title of Project: History Independent File System on an Insecure Flash Device

Keeping data on a hard drive safe is of critical importance for consumers and advances in file system and computer security have struggled to keep pace with powerful adversarial capabilities. Solid state drives (SSDs) provide new challenges for disk security because of their wear levelling properties: the disk controller maps between physical and virtual memory blocks in order to keep the disk from being worn out too quickly, which means that the operating system cannot guarantee that a particular block is erased or overwritten on the disk. This thesis presents a method for securing file-system history from an adversary with forensic access to such a disk by extending previous work on secure deletion on SSDs.

As well as addressing the technical problems of encryption and systems-building, the design of this project and other security technologies must consider the adversarial scenarios where this technology may be used in order to ensure that the design captures the correct metaphors for secure use. Who are the potential users of the technology? What capabilities will their adversaries have? How will existing policy regimes and social norms affect the adoption of the technology? This talk will describe the technical insights of my thesis project and then focus on the choice of threat model and the impact of the above considerations on the design.

Rosemary Wang, ELE
Title of Project: A Study of Mobile Video Power Consumption over HetNets

Given that user consumption of data over mobile technologies and the number of applications requiring higher data rates are increasing, the next generation of mobile technology needs to handle demand for more reliable, higher quality data. In particular, the amount of traffic from video is a growing concern for wired and wireless traffic management. One way to distribute the traffic without compromising user experience would be to use heterogeneous networks (HetNets) to switch between technologies or utilize them simultaneously to improve the reliability, quality, and throughput of data. These multiple radio access technologies (multi-RATs) can be used to improve Quality of Experience (QoE) with video at the cost of increased power consumption for the user’s mobile device. This study analyzed video traffic at the packet level and its impact on device power consumption, determined differences between mobile technologies and wired technologies in both power consumption and packet interactions, and determined the factors that indicate the need to switch to a different technology. Furthermore, these findings apply to the existing policy surrounding net neutrality and the importance of reasonable network management. The usage of multi-RAT implementations raises questions regarding an individual network’s ability to handle video traffic, the increased convergence in technology today, and the differing net neutrality standards for wired and wireless technologies. The conclusions regarding packet-level interactions for video, one of the most bandwidth-heavy applications today, provide a framework for evaluating network neutrality in order to maintain user QoE.

Harvest Zhang, COS
Title of Project: Efficient Packet Traceback in Software-Defined Networks

This paper presents an efficient method for performing packet traceback in software-defined networks. While previous work explores tracing packets forward from their point of entry, the problem of packet traceback is to determine, given a packet that has arrived at a switch in the network, all possible paths it could have taken to get there from its point of ingress. Packet traceback is useful for tracing attacks, network debugging, monitoring performance, and so on; multiple autonomous systems may also collaborate to enable packet tracebacks across domains. Given a network policy consisting of functions that define how packets are handled at each switch, we compute a traceback policy that we use to reconstruct the flagged packet’s possible paths through the network. This traceback is performed entirely by the controller without incurring any overhead on the data plane, and no additional flow rules need to be installed at the switch level.
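
To make the traceback problem concrete, the sketch below enumerates every loop-free path a packet could have taken to reach a given switch, by inverting a forwarding relation and walking it backwards from the arrival switch. This is a simplification under our own assumptions and not the paper's algorithm: the network policy is reduced to a static map from each switch to its possible next hops, with no packet-header matching or rewriting, and all names are hypothetical. Like the method described above, it runs purely on a controller-side view of the network.

```python
def traceback_paths(forwarding, ingress_switches, arrival_switch):
    """Enumerate loop-free paths from any ingress switch to `arrival_switch`.

    `forwarding` maps each switch to the switches it may forward the packet to.
    """
    # Invert the forwarding relation: for each switch, who could have sent to it?
    predecessors = {}
    for src, next_hops in forwarding.items():
        for dst in next_hops:
            predecessors.setdefault(dst, []).append(src)

    paths = []

    def walk_back(switch, suffix):
        path = [switch] + suffix
        if switch in ingress_switches:
            paths.append(path)
        for prev in predecessors.get(switch, []):
            if prev not in path:  # skip forwarding loops
                walk_back(prev, path)

    walk_back(arrival_switch, [])
    return paths


# Hypothetical five-switch topology with two points of ingress.
forwarding = {"s1": ["s2", "s3"], "s2": ["s4"], "s3": ["s4"], "s5": ["s3"]}
paths = traceback_paths(forwarding, {"s1", "s5"}, "s4")
print(paths)
```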

2013 Certificate Graduates Independent Work

R. Auduong, ARC
Title of project: ALMOST HUMAN: Robots in Architecture and the Narrative of Control

In the 21st century, robots are increasingly capable and common in everyday life. As robotic technologies continue to develop, humans like to believe that they are in complete control of technology, but to what extent might robotic technologies exert an influence of their own?

The thesis seeks to explore how humans, robots, and architecture are influencing each other today. The approach for this thesis exploits the natural analogy between humans and robots: Essentially, both sense, “think,” and act, but the mechanisms used are very different. The technological, spatial, and visual consequences of these differences are considered as important indicators of how these three subjects interact today.

The scope of the project encompasses two radically different environments: industrial and domestic. Through case studies of non-humanoid and humanoid robots (Kiva Systems, Baxter, Roomba, and ASIMO), it is shown that human-robot-architecture interactions are very context specific. In industrial case studies, robots have a strong influence over architectural design and the role of the human worker; but in the domestic setting, robot designs are adapted to existing patterns of residential architecture and human behaviors. The interchange between robots, humans, and architecture is multidirectional and multimodal.

Daniel Feinberg, WWS
Title of project: International Regimes of the Internet and Aviation: Structure, Preferences, and Technology

The Internet Corporation for Assigned Names and Numbers (ICANN) has, since its inception, provided scholars with a compelling puzzle: how did a private corporation come into a position of authority over the Internet and what keeps it in control? To address these key questions about ICANN, this thesis seeks to create a cohesive model of technological regimes in order to understand ICANN’s current position as well as its prospects for change. To build such a model, this thesis looks at the case of international aviation in the post-World War II era, studying both the similarities and differences between the two cases. By combining these cases, a model of technological change in complex interdependence can be constructed, providing a theoretical framework that can be utilized to assess ICANN.

Michael Franklin, COS
Title of project: A Statistical Approach to the Detection of Behavioral Tracking on the Web

Online Behavioral Targeting is a controversial practice for which rigorous detection and analysis are challenging. The capacity to make strong claims about Behavioral Targeting in “the wild” would be valuable for policy makers. In this paper we present a conception of browser-server interactions and a novel statistical approach to detecting Behavioral Targeting that leverages this formulation. This approach allows us to make precise claims about Behavioral Targeting and achieve valuable automation of analysis.

Marianne Jullian, COS
Title of project: Visualizing Expression: A Visual Analysis of Literary Works and Nonliteral Copying in the Context of Copyright Infringement

In the domain of copyright law that deals with fictional works, issues of nonliteral copying have been quite contentious. The focus has been on how to protect the public domain against monopolies of ideas that serve as fodder for creative writings, while also providing adequate protection for authors’ expressions of ideas in order to incentivize future work. Several judges have developed tests that can be applied to fictional works; however, they are rather abstract and rely on the discretion of those involved in individual court cases.

With this in mind, I set out to develop an automated method that seeks to identify unique expressions of ideas in literary works. Drawing from discussions of nonliteral copying in the context of copyright infringement, expressions are hereafter defined as patterns composed of the following literary components: writing style, character development, plot themes, parallelism of incidents, and relationships between characters. The method I propose as a tool for detecting nonliteral copying is a data visualization. This method relies on computational linguistics and also on the power of data visualization to uncover otherwise obscured patterns of expression through the use of color, layers, and small multiples.

The efficacy of the linguistic analysis and data visualization is judged by its ability to accurately identify important characters, concepts, and plot developments on works in isolation. Additionally, the efficacy of the data visualization as a tool for identifying nonliteral copying is analyzed using works written by the same author and the comparison of its application to a work and its parody.

Emma Lawless, ANT
Title of project: Trusting Paper, Trusting People: The Role of Documentation for Trustworthy Conditions in Spacecraft Work

My project developed out of six weeks of qualitative fieldwork at two space science laboratories in Boulder, CO. It explores the crucial roles that regimes of documentation played in creating trustworthy working conditions for team members on several NASA missions working out of these labs. In working with technological tools from simple spreadsheet programs to more customized spacecraft visualization tools, my interlocutors employed a variety of low-tech, paper documentation practices which were instrumental in allowing the team members to achieve confidence in their working conditions and the products they were generating. Essentially, far from being empty bureaucratic requirements, paper documents functioned to infuse reliability into the work processes of my interlocutors, contributing to a sense of “trust-in-familiar-form” which characterized the work I observed.

Shreya Murthy, POL
Title of project: A Theory of Privacy

This paper presents a theoretical account of the right to privacy. It discusses the problems that are typically encountered when one attempts to define or defend privacy and explains the need for a conceptually distinct and clearly articulated concept of privacy. It then examines in detail the perspectives on privacy that have been offered by philosophers and legal scholars thus far and then presents a new conception of privacy. Informed by a thorough understanding of the problems of privacy and the shortcomings of the major perspectives, the theory of privacy presented in this paper provides a valuable grounding for both legal and technological approaches to privacy protection.

Eleanor (Nora) Taranto, HOS
Title of project: Too Fast, Too Soon? The Privacy Implications of Electronic-Medical-Record System Adoption

The privacy rights of medical patients are expansive, especially in the United States since the passage of HIPAA in 1996. Since then, medical institutions have also begun to implement electronic medical record (EMR) and electronic health record (EHR) systems at a fast rate. These systems provide some practical benefits for the medical community, but also raise serious privacy concerns—worries in particular about how well such systems protect against confidentiality breaches. The vast number of privacy breaches in these new EMR systems, even with protective mechanisms in place, leads me to make four recommendations that may be useful in preventing more data breaches: 1) strengthening of access control; 2) encryption of stored data as well as data in transit; 3) better use of data logs through the development of anomaly-detection algorithms; and 4) caution on the part of medical institutions and policymakers in adopting only those EMR/EHR systems with adequate protective mechanisms.

2012 Certificate Graduates Independent Work

Jasika Bawa, ELE
Title of project: TUBE – Time-dependent Usage-based Broadband price Engineering

TUBE (Time-dependent Usage-based Broadband price Engineering) is a system that aims to bridge the digital divide by computing and delivering pricing incentives for wireless usage. It is anticipated that this, in turn, will enable wireless providers to make wireless data available to a wider audience.

The notion behind delivering pricing incentives is that charging users different prices for Internet access at different times of the day will incentivize them to spread their demand for bandwidth across the day. This is also a viable way of maximizing the use of wireless spectrum capacity. With the high rate of penetration by smartphones, tablets and other Internet-capable mobile devices, wireless Internet usage has been increasing at an extremely fast pace, with more users consuming larger amounts of data. However, heavy usage is ordinarily concentrated during a few peak hours of the day, which forces ISPs to overprovision in order to handle such concentrated heavy usage. Thus, pricing by timing is advantageous not only for end users (particularly those affected by the digital divide) but also for wireless providers. Finally, although congestion pricing has been implemented, ISPs are increasingly finding that the traditional models are insufficient to meet the challenge of growing demand for bandwidth.

In the fall semester, I helped design and develop an Android application to enable users to control their budget for mobile broadband in an informed manner. This involved providing rich information regarding overall data use, app-specific data use, budget expenditure per day and the like, providing notifications regarding good and bad times to launch data-hungry applications (such as YouTube) and providing the user with a way to schedule applications.

This semester I am working on data analysis to help provide the TUBE team with information regarding user preferences. From a set of surveys filled out by Princeton students, I have had the chance to work on quantifying delay sensitivity to Internet applications. From a second data set of survey participants from India, I will have the opportunity to further quantify socio-economic demographics’ price and delay sensitivity to the same.

Andrew Bristow, SOC
Title of project: Cruelty in the Digital Age – Adolescents and Online Bullying

In the everyday social interactions of adolescents, a number of cruel behaviors associated with bullying have expanded online, sometimes shifting in the process. Existing research shows that the nature of online space allows bullying to be even more damaging than its offline counterpart. Online bullying can have many of the same negative consequences as the offline variant, if not more.

The current study provides a framework to study online bullying. In particular, this study presents a way to gather information on adolescents’ perspective on this phenomenon and to identify the components of their support network. This study seeks to understand parental involvement in online space and to contribute to a body of literature that claims online and offline spaces have become intertwined in such a way that distinctions between the “real” and “virtual” world are no longer appropriate. I apply this framework to newly collected data from middle school students in the greater Mercer County area. Results of this implementation are discussed at length herein, along with implications for parents, school personnel, and policymakers.

Rebecca Lee, WWS
Title of project: Contested Control: European Data Privacy Regulations and the Assertion of Jurisdiction over American Businesses

In the European Union, a comprehensive data privacy law called the Data Protection Directive governs the collection, use, storage, and dissemination of European personal data. Any data controller – regardless of its geographical location – that accesses and collects European personal data must comply with the Directive. However, the Directive was adopted in 1995 and has since become out of date. On January 25, 2012, the European Commission published its proposal for a new comprehensive data privacy law, the Data Protection Regulation.

This thesis examines the Data Protection Regulation and its potential effects on American businesses, consumers, and society. Specifically, I analyze the mechanisms through which European policymakers attempt to secure the compliance of American businesses and the substantive requirements for compliance. I illustrate how data privacy provisions such as the explicit consent requirements and the right to be forgotten conflict with American business practices, political values, and legal principles. I conclude by suggesting that American policymakers may want to take a more active role in influencing the final shape of the Data Protection Regulation.

Jay Parikh, WWS
Title of project: Evading Government Censorship: the Labor Movement’s Use of the Internet

In China, rapid Internet growth has given hope to a renewed civil society movement focused on improving human rights and labor conditions, addressing environmental concerns, and tackling a number of other issues. This hope, however, is tempered by the reality of comprehensive government censorship of information technology.

This paper seeks to clarify the aggregate effect of Internet censorship on the development of domestic civil society institutions in China by focusing on the labor movement and workers’ rights issues. The labor movement serves as an effective vehicle for examining the broader civic sector for two reasons. First, the government is particularly concerned with the effect of organized labor on political reform; therefore, censorship of these movements is pervasive. Second, the labor movement has been adept at harnessing technology, since technology is the only way it can effectively compete with the widespread communication networks possessed by the state and the marketplace.

This paper is organized into five parts. First, I present the theoretical argument underlying the power of the Internet in shaping civil society and the Internet’s rise in China. I then examine how the labor movement has used different aspects of information technology to advance its interests and goals. The next section evaluates how the Chinese state uses technical restrictions to monitor and censor civil society organizations (CSOs). Once this is established, I analyze the benefits of eliminating these restrictions for both Chinese CSOs and US companies. The paper closes with specific policy recommendations for the Congressional-Executive Commission on China on how the government can partner with technology companies to help labor CSOs.

2011 Certificate Graduates Independent Work

Jennifer King, COS
Title of project: Software Support for Software-Independent Auditing

(Published as Software Support for Software-Independent Auditing — Short Paper. Gabrielle A. Gianelli, Jennifer D. King, Edward W. Felten, and William P. Zeller. EVT/WOTE’09, Proceedings of the 2009 Conference on Electronic Voting Technology / Workshop on Trustworthy Elections)

Thomas Lowenthal, POL
Title of project: BitTorrent Research

Copyrighted material is often shared without the permission of the copyright holder. Peer-to-peer (P2P) systems — including BitTorrent — are a common vector for such sharing. Some copyright holders wish to detect such unauthorized sharing when it occurs, and to discourage it. Several companies offer services designed to detect this sort of unauthorized distribution. These companies typically use proprietary detection techniques, and often boast about the reliability of their particular methods. However, previous research has indicated that these services may not be as reliable as claimed.

We performed a study to investigate the accuracy of various techniques for identifying those who share copyrighted material via BitTorrent without the authorization of the copyright holder; specifically, we sought to establish an estimated lower bound on the false-positive rates of these techniques. We implemented a selection of detection and verification techniques, ran them against the live BitTorrent ecosystem, and compared the suspect lists they produced against a reliable control technique. This allowed us to estimate the rate at which each of these techniques turns up false positives.
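
The comparison of a suspect list against a control can be illustrated with a minimal Python sketch. This is only one simple reading of that comparison, not the study's actual method: it assumes the control technique's confirmations can be treated as ground truth, and the function name, peer identifiers, and data are all hypothetical.

```python
def unconfirmed_share(suspects, control_confirmed):
    """Fraction of peers flagged by a technique that the control did not confirm.

    Assumption: any flagged peer missing from the control's confirmed set
    is counted as a false positive for that technique.
    """
    suspects = set(suspects)
    if not suspects:
        return 0.0  # nothing was flagged, so nothing was falsely flagged
    unconfirmed = suspects - set(control_confirmed)
    return len(unconfirmed) / len(suspects)


# Hypothetical example data: peers flagged by one detection technique,
# and peers the control technique confirmed as actually sharing.
technique_suspects = {"peer-a", "peer-b", "peer-c", "peer-d"}
control_confirmed = {"peer-a", "peer-c", "peer-e"}

rate = unconfirmed_share(technique_suspects, control_confirmed)
print(f"Estimated false-positive share: {rate:.2f}")
```

Running the same function over each technique's suspect list would give one estimate per technique, which could then be compared side by side.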