Talks and presentations

Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?

January 28, 2020

Paper presentation, ACM FAT* 2020, Barcelona, Spain

Many machine learning projects for new application areas involve teams of humans who label data for a particular purpose, from hired crowdworkers to the paper’s authors labeling the data themselves. Such a task is quite similar to (or a form of) structured content analysis, which is a longstanding methodology in the social sciences and humanities, with many established best practices. In this paper, we investigate to what extent a sample of machine learning application papers in social computing (specifically papers from arXiv and traditional publications performing an ML classification task on Twitter data) give specific details about whether such best practices were followed.

The Invisible Work of Maintaining & Sustaining Open-Source Software

July 10, 2019

Keynote, SciPy 2019, Austin, Texas

Opening keynote at SciPy 2019, in which I discuss a wide range of issues around the work of developing and maintaining open-source software, based on our team’s ongoing mixed-method research into this topic.

Ethics and Policy Implications of Big Data

February 15, 2019

Panelist, University of California, San Diego, San Diego, California

Panelist on the ‘Knowledge and Culture’ panel at this workshop on algorithms and big data, sponsored by a number of different departments across UCSD.

Documenting Data Science and Documentation in Data Science: an Ethnographic Exploration

January 24, 2019

Talk, eScience Institute, University of Washington, Seattle, Washington

In this talk, I discuss the central yet often overlooked role of documentation in data science, based on several recent and ongoing studies and projects about the role and importance of documentation in software packages, datasets, analysis code, research protocols, and research teams.

Cooking Data with Care: The Role of Contextual Inquiry in Large-Scale Quantitative Research

January 23, 2019

Talk, eScience Institute, University of Washington, Seattle, Washington

In this talk, I argue that there is often substantial qualitative contextual inquiry and expertise deployed in quantitative methods. Such insights are crucial to ‘cooking data with care,’ as Geoff Bowker advocated.

Qualitative and Quantitative Studies of Wikipedia (with Aaron Halfaker)

August 23, 2018

Keynote, ACM International Symposium on Open Collaboration (OpenSym), Paris, France

We reflect on a decade of studying Wikipedia using qualitative and quantitative methods.

Designing and Using Data Science Ethically

August 16, 2018

Panel, Machine Learning and User Experience San Francisco (MLUXSF), San Francisco, California

With the rise of machine learning and AI to address human-focused needs, how do we design and use data science ethically to empower and support people?

The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work

June 07, 2018

Talk, 2018 European Conference on Computer-Supported Cooperative Work, Nancy, France

Data analytics increasingly relies on open source software (OSS) libraries that extend scripting languages like Python and R. Software documentation for these libraries is crucial for people across all experience levels, but documentation work raises many challenges, particularly in open source communities. In this collaboration between ethnographers and data scientists, we discuss the types, roles, practices, and motivations around documentation in data analytics OSS libraries.

Knowing User Populations at Scale: From the Science of the State to Platform Governmentality

May 27, 2018

Talk, 2018 Annual Conference of the International Communication Association, Prague, Czech Republic

How can institutions that own and operate large-scale social media platforms come to know “their users” at scale? In this talk, I discuss ways of knowing user populations at scale, drawing on Foucault’s account of governmentality, particularly the role of statistics in the formation of the modern nation state.

The Human Contexts of Computation and Data: Infrastructures, Institutions, and Interpretations

May 09, 2018

Talk, University of California at San Diego, The Design Lab, San Diego, California

In this talk, I discuss the role of qualitative and ethnographic methods in relation to computer, information, and data science. These holistic, reflexive, and meta-level approaches to studying data and computation in context help us better understand how to both support and practice data analytics at various scales.

Key Values: What We Talk About When We Talk About ‘Open Science’

April 20, 2018

Keynote, Open Science Symposium, Department of Second Language Studies, University of Hawaiʻi at Mānoa, Mānoa, Hawaiʻi

Openness in science is hard to disagree with as an abstract principle, but what exactly do we mean when we call for science to be made open – or more open than before? In this talk, I introduce and unpack the many different goals, strategies, products, values, and assumptions of the broad open science movement.

Computational Ethnography and the Ethnography of Computation: The Case for Context

March 26, 2018

Talk, IT University of Copenhagen, ETHOSlab, Copenhagen, Denmark

Ethnography is traditionally a qualitative and inductive methodology that is now widely used to holistically investigate people’s lived experiences in and across cultures. In this talk, I define and discuss two ways of thinking about the role of ethnographic methods around computation, then discuss how my research relates to both.

The Human Contexts of Data: Infrastructures, Institutions, and Interpretations

March 22, 2018

Talk, University of Manchester, Data Science Institute, Manchester, United Kingdom

Publics: Witnessing and Measuring

March 16, 2018

Guest lecture, UC-Berkeley: Human Contexts and Ethics of Data course, Berkeley, California

A guest lecture for Cathryn Carson and Margo Boenig-Liptsin’s course on Human Contexts and Ethics of Data (HIST 182C, STS 100C), focusing on how various publics generate, analyze, and interpret data.

Computational Ethnography and the Ethnography of Computation: The Case for Context

February 26, 2018

Talk, College of Information Studies, University of Maryland at College Park, College Park, Maryland

Computational Ethnography and the Ethnography of Computation: The Case for Context

February 12, 2018

Talk, School of Information Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois

Computational Ethnography and the Ethnography of Computation: The Case for Context

January 11, 2018

Talk, School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina

The Humanity of Artificial Intelligence

November 01, 2017

Talk, Bay Area Science Festival, Albany, California

Today, “artificial intelligence” seems to be everywhere – in our phones, vacuums, hospitals, and inboxes – but it can be hard to separate science fiction from science fact. Many discussions about AI imagine a fully autonomous superintelligence that designs itself with little to no human intervention, making decisions in ways that humans cannot possibly understand. Yet the work of designing, developing, engineering, training, and testing such systems requires a massive amount of human labor, which is typically erased when such systems are released as products. In this talk, I give a human-centered, behind-the-scenes introduction to machine learning, illustrating the creative, interpretive, and often messy work humans do to make autonomous agents work. Understanding the humanity behind artificial intelligence is important if we want to think constructively about issues of bias, fairness, accountability, and transparency in AI.

“But it wouldn’t be an encyclopedia; it would be a wiki”: The changing imagined affordances of wikis, 1995–2002

October 19, 2017

Talk, 2017 Annual Meeting of the Association of Internet Researchers, Tartu, Estonia

This paper examines the early history of “anyone can edit” wiki software, originally developed in 1995, six years before Wikipedia’s origin. While today the idea of a wiki is associated with large-scale, massively distributed encyclopedic knowledge production, this was not always the case. Articles on pre-Wikipedia wikis were often closer to a Joycean stream of consciousness than to Wikipedia’s Britannica-inspired texts that speak in a single voice, and the underlying wiki platform lacked many of the affordances that are now taken for granted in wiki platforms. In fact, the creator of the first wiki advised Wikipedia’s co-founders that the goals of creating a general-purpose encyclopedia and a wiki were inherently contradictory.

Are the bots really fighting? Behind the scenes of a reproducible replication

October 10, 2017

Guest lecture, UC-Berkeley Department of Statistics: Reproducible and Collaborative Data Science, Berkeley, California

A guest lecture for Fernando Pérez’s STAT 159/259 course on Reproducible and Collaborative Data Science, in which I discuss issues of open science and reproducibility around our recent paper “Operationalizing conflict and cooperation between automated software agents in Wikipedia: A replication and expansion of ‘Even Good Bots Fight’.”

Computational Ethnography and the Ethnography of Computation

September 14, 2017

Talk, Berkeley Institute for Data Science, Berkeley, California

Ethnography is traditionally a qualitative and inductive methodology – with its origins in cultural anthropology – that is now widely used to holistically investigate people’s lived experiences in and across cultures. In this talk, I define and discuss two ways of thinking about the role of ethnographic methods around computation, then discuss how my research relates to both.

Autoethnographic Methods for Studying Data-Driven Knowledge Production

August 31, 2017

Talk, 2017 Annual Meeting of the Society for Social Studies of Science (4S), Boston, Massachusetts

An overview of how to study data science ethnographically by personally engaging in various practices of data science.

Jupyter and the Changing Rituals around Computation

August 25, 2017

Talk, JupyterCon, New York, New York

We (Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel) share ethnographic findings from observing and working with Jupyter notebooks, focusing on how people use Jupyter to create and deliver computational narratives in particular local contexts, like classrooms, hackathons, research collaborations, and more.

Demystifying Algorithmic Processes: The Case of Wikipedia

April 20, 2017

Panel, The 21st Annual BCLT/BTLJ Symposium, Berkeley, California

This talk is part of a panel session titled “Demystifying Algorithmic Processes: What is the role of algorithms in online platforms, what can they do and not do, and how should they be governed?”

“The Wisdom of Bots”: An ethnographic study of the delegation of governance work to information infrastructures in Wikipedia

September 02, 2016

Talk, Annual Meeting of the Society for Social Studies of Science (4S), Barcelona, Spain

Wikipedians rely on software agents to govern the ‘anyone can edit’ encyclopedia project, in the absence of more formal and traditional organizational structures. Lessons from Wikipedia’s bots speak to debates about how algorithms are being delegated governance work in sites of cultural production.

Community Sustainability in Wikipedia: A Review of Research and Initiatives

August 13, 2016

Talk, PyData SF, San Francisco, CA

Wikipedia relies on one of the world’s largest open collaboration communities. Since 2001, the community has grown substantially and faced many challenges. This presentation reviews research and initiatives around community sustainability in Wikipedia that are relevant for many open source projects, including issues of newcomer retention, governance, automated moderation, and marginalized groups.

Governing Open Source Projects at Scale: Lessons from Wikipedia’s Growing Pains

July 16, 2016

Talk, SciPy, Austin, Texas

Many open source, volunteer-driven projects begin with a small, tight-knit group of collaborators, but then grow far faster than anyone expects or plans for. I discuss cases of governance growing pains in Wikipedia, which have many lessons for running open source software projects.

Administrative Support Bots in Wikipedia: How Automation Can Transform the Affordances of Platforms and the Governance of Communities

June 14, 2016

Talk, Communicating with Machines workshop, Fukuoka, Japan

I discuss cases from a multi-year ethnographic study of automated software agents in Wikipedia, where ‘bots’ have fundamentally transformed the nature of the ‘anyone can edit’ encyclopedia project.

Drowning in Data: Industry and Academic Approaches to Mixed Methods in “Holistic” Big Data Studies

June 11, 2016

Panelist, Annual Meeting of the International Communication Association (ICA), Fukuoka, Japan

This panel discusses the potential and complications of mixed-methods research in big data studies, specifically in cases when population-level data is available.

Successor Systems: Lessons for Big Data From Feminist Epistemology and Activism

June 09, 2016

Talk, Big Data: Critiques and Alternatives workshop, Fukuoka, Japan

I examine four data-intensive activist projects as "successor systems," discussing the political and epistemological implications of using data to advance activist projects.

Algorithms as agents of gatekeeping, governance, and articulation work in Wikipedia

June 08, 2016

Talk, Algorithms, Automation, and Politics workshop, Fukuoka, Japan

I discuss how algorithmic systems are deployed to enforce particular behavioral and epistemological standards in Wikipedia, which can become a site for collective sensemaking among veteran Wikipedians.

Moderating harassment in Twitter with blockbots: a counterpublic and algorithmic strategy

April 16, 2016

Talk, Theorizing the Web, Astoria, New York

“What the hack?” Hacking culture and discourse in data science pedagogy (with Brittany Fiore-Gartland)

April 15, 2016

Talk, Theorizing the Web, Astoria, New York

Scraping Wikipedia Data

February 17, 2016

Talk, The Hacker Within, BIDS, Berkeley, CA

A tutorial (with Jupyter notebooks) about how to use APIs to query structured data from Wikipedia articles and the Wikidata project.

Why bots are my favorite contribution to Wikipedia

January 16, 2016

Talk, Wikipedia 15th Anniversary Birthday Bash, San Francisco, CA

A short talk to open up an event celebrating the 15th anniversary of Wikipedia. The prompt we were given was "Why [x] is my favorite contribution to Wikipedia."

The Bot Multiple: Unpacking the Materialities of Automated Software Agents

November 12, 2015

Talk, Annual Meeting of the Society for Social Studies of Science (4S), Denver, CO

I examine the roles that automated software agents (or bots) play in the governance and moderation of Wikipedia, Twitter, and reddit – three online platforms that differently uphold a related set of commitments to ‘open’ and ‘public’ online participation.

Crowdsourcing: Theoretical Considerations

November 06, 2015

Panelist, Crowdsourcing and the Academy Symposium, Berkeley, CA

A panel discussing how academics use crowdsourcing in research.

Bot-Based Collective Blocklists in Twitter: The Counterpublic Moderation of a Privately-Owned Networked Public Space

October 23, 2015

Talk, Annual Meeting of the Association of Internet Researchers (AoIR), Phoenix, AZ

This presentation introduces bot-based collective blocklists (or blockbots) in Twitter, which have been created to help various groups better moderate their own experiences on the site.

But it Wouldn’t Be an Encyclopedia; It Would Be a Wiki: Wikipedia and the Repurposing of WikiWikiWeb

May 25, 2015

Talk, Annual Meeting of the International Communication Association (ICA), San Juan, Puerto Rico

In this talk, I examine the early history of “anyone can edit” wiki software – originally developed in 1995, six years before Wikipedia’s origin – focusing on the ways in which this technological infrastructure has been repurposed across communities, domains, and scales.

Peer Production and Wikipedia

April 09, 2015

Guest lecture, Social Aspects of Information Systems course, Berkeley, CA

An overview of Wikipedia and other peer production platforms, discussing issues that link up to the theories discussed in the Social Aspects of Information Systems class.

Moderating Online Conversation Spaces

April 07, 2015

Guest lecture, Social Aspects of Information Systems course, Berkeley, CA

An overview of how various online platforms moderate content, discussing issues that link up to the theories discussed in the Social Aspects of Information Systems class.

Trace Ethnography Workshop

March 24, 2015

Workshop presentation, iSchools Conference, Newport Beach, CA

Situated knowledges and successor systems: developing CSCW systems to enact ideological critiques

March 15, 2015

Workshop presentation, CSCW Workshop on Feminism and Feminist Approaches in Social Computing, Vancouver, BC

Does Facebook Have Civil Servants? On Governmentality and Computational Social Science

March 15, 2015

Workshop presentation, CSCW Workshop on Ethics for Studying Sociotechnical Systems in a Big Data World, Vancouver, BC

Supporting Change from Outside Systems with Design and Data

December 09, 2014

Talk, Berkman Center for Internet and Society, Cambridge, MA

Defining, Designing, and Evaluating Civic Values in Human Computation and Collective Action Systems (with Nathan Matias)

November 02, 2014

Talk, Human Computation Conference (HCOMP), Citizen-X Workshop, Pittsburgh, PA

We review various crowdsourcing and collective action systems, identifying particular sets of civic values and assumptions.

Successor Systems: The Role of Reflexive Algorithms in Enacting Ideological Critique

October 21, 2014

Talk, Annual Meeting of the Association of Internet Researchers (AoIR), Daegu, South Korea

Successor Systems: The Role of Reflexive Algorithms in Enacting Ideological Critique

August 23, 2014

Talk, Annual Meeting of the Society for Social Studies of Science (4S), Buenos Aires, Argentina

‘Big Data is Bullshit’: Scoping the Next 5 Years of Digital Data Research

May 24, 2014

Panelist, Annual Meeting of the International Communication Association (ICA), Seattle, WA

Data-Driven Data Research Using Data and Databases: A Practical Critique of Methods and Approaches in “Big Data” Studies

May 23, 2014

Panelist, Annual Meeting of the International Communication Association (ICA), Seattle, WA

This panel focuses on the challenges faced by researchers conducting mixed-method research into online platforms, particularly where large amounts of data are widely available.

Successor Systems: The Role of Reflexive Algorithms in Enacting Ideological Critique

May 16, 2014

Talk, The Contours of Algorithmic Life, Davis, CA

Successor Systems: Enacting Ideological Critique Through the Development of Software

April 25, 2014

Talk, Theorizing the Web, Brooklyn, New York

Governing the Commons

April 10, 2014

Guest lecture, History of Information, Berkeley, CA

A lecture on the history of Wikipedia, in the broader context of the history of reference works.

Robotic Ethics and Opportunities

April 04, 2014

Panelist, Robots and New Media, Berkeley, CA

A panel discussing the ethical and political issues that are raised with autonomous robots and software bots.

Size Matters: How Big Data Changes Everything

November 25, 2013

Talk, Bangkok Scientifique, Bangkok, Thailand

A talk introducing various concepts around large-scale data analysis to a general audience, including spam detection and governmental surveillance.

Design by Bot: Power and Resistance in the Development of Automated Software Agents

October 23, 2013

Talk, Annual Meeting of the Association of Internet Researchers (AoIR), Denver, CO

Hadoop as Grounded Theory: Is an STS Approach to Big Data Possible?

October 09, 2013

Talk, Annual Meeting of the Society for Social Studies of Science (4S), San Diego, CA

When the Levee Breaks: Without Bots, What Happens to Wikipedia’s Quality Control Processes? (with Aaron Halfaker)

August 03, 2013

Conference proceedings talk, International Symposium on Wikis and Open Collaboration (WikiSym 2013), Hong Kong

This paper examines what happened when one of Wikipedia's counter-vandalism bots unexpectedly went offline.

Values Where? Interrogating Client-Side Scripting as a Design Process

March 01, 2013

Talk, Theorizing the Web, New York, NY

Community, Impact, and Credit: Where Do I Submit My Papers?

February 26, 2013

Panelist, ACM Conference on Computer-Supported Cooperative Work (CSCW), San Antonio, TX

Using Edit Sessions to Measure Participation in Wikipedia (with Aaron Halfaker)

February 23, 2013

Conference proceedings talk, Conference on Computer Supported Cooperative Work, San Antonio, TX

This paper establishes a quantitative metric for measuring editor activity through temporal edit sessions.

Actor-Network Theory

February 07, 2013

Guest lecture, Social Aspects of Information Systems course, Berkeley, CA

An introduction to Actor-Network Theory for students in the Master of Information Management and Systems (MIMS) program.

What Aren’t We Measuring? Methods for Quantifying Wiki-Work.

October 29, 2012

Panelist, International Symposium on Wikis and Open Collaboration (WikiSym 2012), Linz, Austria

Time to Degree: Examining the Experiences of Graduate Students in the Long-Term Ecological Research Network

October 17, 2012

Talk, Annual Meeting of the Society for Social Studies of Science (4S), Copenhagen, Denmark

Trace literacy: a framework for holistically conceptualizing newcomer socialization in socio-technical systems

October 12, 2012

Talk, Infosocial, Evanston, IL

Defense Mechanism or Socialization Tactic? Improving Wikipedia’s Notifications to Rejected Contributors

June 05, 2012

Conference proceedings talk, International Conference on Weblogs and Social Media (ICWSM), Dublin, Ireland

A descriptive study of Wikipedia's highly-automated socialization processes and an A/B test to improve templated messages to newcomers.

Hunting for Fail Whales: Lessons from Deviance and Failure in Social Computing

May 07, 2012

Panelist, Conference on Human Factors in Computing Systems (CHI), Austin, Texas

Black-boxing the user: internet protocol over xylophone players (IPoXP)

May 02, 2012

Conference proceedings talk, Conference on Human Factors in Computing Systems (CHI), Austin, Texas

We introduce IP over Xylophone Players (IPoXP), a novel Internet protocol between two computers using xylophone-based Arduino interfaces.

Improving Wikipedia’s Notifications to Rejected Contributors

March 31, 2012

Talk, GCOE International Symposium on Informatics Education, Kyoto, Japan

User-Generated Platforms in Wikipedian Governance

November 03, 2011

Talk, Annual Meeting of the Society for Social Studies of Science (4S), Cleveland, OH

‘The Internet is Here’: The Virtuality of ‘On-line’ Communities in Physical Spaces

November 02, 2011

Talk, Annual Meeting of the Society for Social Studies of Science (4S), Cleveland, OH

Participation in Wikipedia’s Article Deletion Processes (with Heather Ford)

October 05, 2011

Conference proceedings talk, International Symposium on Wikis and Open Collaboration, Mountain View, CA

This paper investigates Wikipedia's article deletion processes, finding that they are heavily populated by specialists.

Machine-Generated Content: Bots and the Governance of Wikipedia

March 04, 2011

Talk, Digital Media and Learning (DML), Long Beach, CA

Trace Ethnography: Following Coordination through Documentary Practices

January 03, 2011

Conference proceedings talk, Hawaii International Conference on System Sciences, Lihue, Hawaii

We detail the methodology of ‘trace ethnography’, which combines the richness of participant-observation with the wealth of data in logs so as to reconstruct patterns and practices of users in distributed sociotechnical systems.

Academic Researchers in Wikimedia Communities: Ethics, Methods, and Policies

July 10, 2010

Panelist, Wikimania 2010, Gdańsk, Poland

A panel intended to foster a dialog between academic researchers who study Wikimedia projects and the Wikimedia community.

Bot Politics: How is Automation Changing the Wikipedian Society?

March 26, 2010

Talk, Critical Point of View II: Wikipedia and the Politics of Open Knowledge, Amsterdam, the Netherlands

The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

February 25, 2010

Conference proceedings talk, Conference on Computer Supported Cooperative Work, Savannah, Georgia

This paper traces out a heterogeneous network of humans and non-humans involved in the identification and banning of a single vandal in Wikipedia.

A short paper showing the recent explosive growth of automated editors (or bots) in Wikipedia, which have taken on many new tasks in administrative spaces.

Do Xuan Long