
Big Data: Public Policy and the Exploding Digital Corpus

Tuesday, November 23, 2010
8:00 am


Friend Center Convocation Room
35 Olden Street
Princeton, NJ 08544 United States

The body of digital information held by various entities is both staggering and constantly expanding. Each day brings reports of newly digitized “dark” archives, enhanced digital tracing techniques, data privacy breaches, and aggregated data sets. At the same time, much historically important information goes unrecorded, at least in any usable or enduring digital form. How do we reconcile the many different constituencies, technologies, uses, and norms into sensible policy? This conference will gather leading experts from a variety of domains to discuss the challenges of “big data” and the attendant policy considerations.

Background Reading: The Promise and Peril of Big Data, The Aspen Institute

Video Recordings:

Panel 1
Panel 2
Panel 3

Registration and Breakfast (8:00 AM – 9:00 AM)

Keynote Speaker (9:00 AM – 10:00 AM)
David Weinberger, Author of Everything is Miscellaneous and the forthcoming Too Big to Know

Break (10:00 AM – 10:30 AM)

Panel 1: Ensuring Future Access to the History of the 21st Century (10:30 AM – 12:00 PM)

Chair: Jason R. Baron, Director of Litigation, National Archives and Records Administration


  • Victoria Stodden, Assistant Professor, Department of Statistics, Columbia University
  • Brewster Kahle, Digital Librarian and Founder of the Internet Archive
  • Richard J. Cox, Professor in Library and Information Science at the University of Pittsburgh, School of Information Sciences

Reading Materials: Information Inflation: Can the Legal System Adapt? by George L. Paul and Jason R. Baron

Lunch (12:00 PM – 1:30 PM)

Panel 2: Age Old Principles Meet New Technologies (1:30 PM – 3:00 PM)

Chair: Ronald J. Hedges, former United States Magistrate Judge in the District of New Jersey


  • Lucy Dalglish, Executive Director, Reporters Committee for Freedom of the Press
  • Julian Sanchez, Cato Institute
  • Anne Washington, PhD Candidate in Information Systems, School of Business, George Washington University

Reading Materials:

Break (3:00 PM – 3:30 PM)

Panel 3: A Thousand Points of Data (3:30 PM – 5:00 PM)

Chair: Paul Ohm, Associate Professor of Law at the University of Colorado Law School

Reading Materials:

David Weinberger

David holds a PhD in philosophy from the University of Toronto and is the author of several books, including “Everything is Miscellaneous” and the forthcoming “Too Big to Know.” David is also the co-director of the Harvard Library Innovation Lab at Harvard Law School and a Senior Researcher at Harvard’s Berkman Center for Internet and Society.

Jason R. Baron

Jason, a former Justice Department litigator, currently serves as Director of Litigation at the National Archives and Records Administration in Washington, D.C., and writes and speaks extensively on the subject of e-discovery and e-recordkeeping. Mr. Baron, who received his BA from Wesleyan University and his JD from the Boston University School of Law, currently serves as Co-Chair of The Sedona Conference Working Group on E-discovery, is an Adjunct Professor at the University of Maryland’s College of Information Studies, and was a founder of the TREC Legal Track, an ongoing international research project evaluating search methodologies.

Victoria Stodden

Victoria is an Assistant Professor of Statistics at Columbia University. She completed her PhD in statistics in 2006 and her law degree in 2007 at Stanford University. Her current research focuses on the reproducibility of computational results, the factors underlying code and data sharing among researchers, how pervasive and large-scale computation is changing our practice of the scientific method, and the role of legal framing for scientific openness and advancement.

Brewster Kahle

Brewster is a computer engineer, internet entrepreneur, activist, and digital librarian. Kahle graduated from MIT in 1982 with a BS degree in Computer Science & Engineering. He is the founder of the Internet Archive and the Open Content Alliance, a group of organizations committed to making a permanent, publicly accessible archive of digitized texts.

Richard J. Cox

Richard is a Professor in Library and Information Science at the University of Pittsburgh, School of Information Sciences. Cox has written extensively on archives and archivists, and his educational background includes a PhD from the School of Library and Information Science at the University of Pittsburgh, an MA in History from University of Maryland, and a BA in History from Towson State College.

Ronald Hedges

Ron is the principal of Ronald J. Hedges LLC. Ron serves as a special master, mediator and arbitrator and consults on electronic discovery and records management. He sat as a United States Magistrate Judge in the District of New Jersey from 1986 to 2007. Among other things, Ron is a member of the adjunct faculty of Georgetown University Law Center and Rutgers School of Law (Newark), where he teaches an introduction to electronic discovery and evidence, and of the advisory boards of Georgetown’s Advanced E-Discovery Institute and The Sedona Conference. He is also a Visiting Research Collaborator at the Center for Information Technology Policy at Princeton University. He holds a BA from the University of Maryland and a JD from the Georgetown University Law Center.

Lucy Dalglish

Lucy is the Executive Director of the Reporters Committee for Freedom of the Press, and was previously a media lawyer for almost five years in the trial department of the Minneapolis law firm of Dorsey & Whitney LLP. Dalglish earned a JD from Vanderbilt University Law School, a Master of Studies in Law from Yale Law School in 1988, and a BA in Journalism from the University of North Dakota in 1980.

Julian Sanchez

Julian is a Washington, D.C.-based writer and journalist who covers the intersection of privacy, technology, and politics (with occasional forays into pop culture and philosophy). He currently works as a Research Fellow at the libertarian Cato Institute and is a contributing editor for Reason magazine. He was previously the Washington Editor for the technology news site Ars Technica, and he studied philosophy and political science at New York University.

Anne L. Washington

Anne is currently a PhD candidate in Information Systems at the George Washington University School of Business. She earned a Master's in Library and Information Science from Rutgers University School of Communications, Information and Library Science and a BA from Brown University. From 2001 to 2009, she worked in the Congressional Research Service at the Library of Congress as an information technology librarian specializing in legislative systems.

Paul Ohm

Paul is an Associate Professor of Law at the University of Colorado Law School who writes in the areas of information privacy, computer crime law, intellectual property, and criminal procedure. Professor Ohm is leading efforts to build new interdisciplinary bridges between law and computer science. Prior to joining the University of Colorado, he worked for the U.S. Department of Justice’s Computer Crime and Intellectual Property Section as an Honors Program trial attorney. He holds a JD from UCLA School of Law, as well as a BS in Computer Science and a BA in Electrical Engineering from Yale University.

Jessica Staddon

Jessica is a research scientist at Google working on privacy. Before coming to Google she was an area manager at PARC and a research scientist at Bell Labs and RSA Labs. She holds a PhD in Mathematics and Computer Science and a BA in Applied Math, both from the University of California, Berkeley.

Thomas Lento

Thomas has a social science background and extensive experience with social network analysis, large scale distributed data processing and statistical modeling. He has published research on diffusion of information, communication patterns, and social influence in Facebook, Wikipedia, and other social media systems. As part of his work at Facebook, he builds models of user behavior in order to understand how users engage with the product and he applies the insights gained from his research to guide and evaluate product decisions. He holds a bachelor’s degree from Cornell.

Arvind Narayanan

Arvind is a post-doctoral researcher at Stanford University who completed his PhD at the University of Texas at Austin. His research is on the privacy and anonymity issues involved in publishing large-scale datasets about people.

Daniel Martin Katz

Daniel is a PhD candidate in Political Science and Public Policy at the University of Michigan. He obtained his JD from the University of Michigan Law School and an MPP from the Gerald R. Ford School of Public Policy. His research interests include network analysis and law, the complexity of the law, and the analysis of various constitutional, statutory and administrative lawmaking processes.