The Case for WordPress Telemetry

WordPress prides itself on being an application built by the user for the user. The problem is that, given the popularity and reach of WordPress today, the distance between the WordPress 1% (or even 0.1%) and the average user has become so vast that we (the people who contribute to WordPress core) know almost nothing about the actual people who use WordPress or how they use the application. This will become more of an issue as the application evolves, and it is high time we do something about it.

Lack of data means we’re flying blind

During the development of WordPress 4.7, I was involved in several conversations centered on assumptions about how features are used. The general argument was that, based on the 80/20 rule, certain features should be added while others should be removed. I kept bringing up the well-known fact that we don’t have a clue which features 80%, or even 20%, of WordPress users actually use, so any claim based on the 80/20 rule is guesswork at best. In response, one developer told me, point blank, “we know what the user wants.” I don’t know about you, but in my book that is not the way to build an application for real people.

What we need is raw data based on actual use, and lots of it. What we need is telemetry. And there is ample industry precedent for collecting such telemetry, in online and offline applications and even in the WordPress ecosystem.

Here’s what I propose:

WordPress core should ship with an opt-in Telemetry feature that collects anonymized data on feature and functionality use.

This is in line with what major software providers do, and it is a feature most users will be familiar with.

The purpose of the Telemetry feature is to collect relevant data about how WordPress is used in the wild. This raises the questions “what is relevant data?”, “who decides what data is collected?”, and “who has access to the collected data?”

Here’s how I imagine it would work:

Implementation and activation

WordPress Telemetry ships as a core feature in new installs and as part of an update to existing installs. When Telemetry is first added to the site, the admin gets a prompt asking if they want to contribute anonymized use data to the WordPress project. The default setting is “No” and the admin can change this to “Yes”. After this initial prompt, the admin can change the Telemetry setting at any time.

In more detail:

  • The opt-in selector for the feature should be surfaced on first install, or when the site is updated to the first version of WordPress that contains the feature.
  • For new installs the opt-in question should appear on the 5-minute install page along with “Allow search engines to index the site” or similar.
  • For upgrades, the opt-in question should be revealed in a dedicated modal.
  • The feature should be disabled by default, so the admin must make an active choice to participate.
  • The feature should be controllable at any time through a dedicated section under Settings->General.
  • The best way to reassure users that this feature is not a Trojan horse may be to ship it as a plugin that auto-installs on opt-in and auto-uninstalls on opt-out.
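To make the opt-in rules above concrete, here is a minimal sketch of the setting's states, written in Python purely for illustration. The class and method names are hypothetical and not part of any WordPress API; the point is that an unanswered prompt and an explicit “No” both mean nothing is collected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TelemetrySettings:
    """Hypothetical per-site telemetry setting (not an actual WordPress API)."""

    # None means the admin has not answered yet; it is treated the same as "No".
    opted_in: Optional[bool] = None

    def needs_prompt(self) -> bool:
        # Show the install-page question or the upgrade modal until the admin answers.
        return self.opted_in is None

    def set_choice(self, participate: bool) -> None:
        # The admin can change the choice at any time, e.g. from Settings->General.
        self.opted_in = participate

    def may_collect(self) -> bool:
        # Collection requires an explicit "Yes"; unanswered or "No" both mean off.
        return self.opted_in is True
```

Keeping “not yet asked” as its own state is a design choice in this sketch: it lets the upgrade modal appear exactly once without ever treating silence as consent.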

Data collection

As a baseline, a core set of data should always be collected from participating sites, including but not limited to:

  • Number of themes and plugins installed
  • Frequency of use of specific views (Settings, Customizer, etc.)
  • Current version
  • Update status
  • Locale (generalized to country)
  • Language
  • etc.
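A baseline report built from the list above might look like the following sketch. The field names, the update-status encoding, and the way the locale is split are all assumptions for illustration; it assumes WordPress-style locale codes such as `pt_BR` or `de_DE_formal`, where the second segment is a two-letter country code. Only counts are sent, never theme or plugin names.

```python
from typing import Dict, List, Optional, Tuple


def split_locale(locale: str) -> Tuple[str, Optional[str]]:
    """Split a WordPress-style locale such as 'pt_BR' into ('pt', 'BR')."""
    parts = locale.split("_")
    language = parts[0].lower()
    # Keep only a two-letter country code; locales like 'ja' carry no country.
    country = parts[1].upper() if len(parts) > 1 and len(parts[1]) == 2 else None
    return language, country


def build_baseline_payload(
    themes: List[str],
    plugins: List[str],
    view_counts: Dict[str, int],
    wp_version: str,
    updates_pending: int,
    locale: str,
) -> Dict[str, object]:
    """Hypothetical baseline report; every field name here is illustrative."""
    language, country = split_locale(locale)
    return {
        # Counts only: theme and plugin names are never included.
        "theme_count": len(themes),
        "plugin_count": len(plugins),
        # How often each admin view was opened, e.g. {"settings": 12}.
        "view_counts": dict(view_counts),
        "wp_version": wp_version,
        "update_status": "up_to_date" if updates_pending == 0 else "updates_pending",
        "country": country,
        "language": language,
    }
```

For example, a site running the `pt_BR` locale would report language `pt` and country `BR`, which is as specific as the locale datapoint gets.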

In addition, it should be possible to push custom queries to participating sites to test for specific interactions, for example how many users click the Underline button in TinyMCE. I’m not sure exactly what the best approach is here, but this is one idea: the feature queries a centralized service on a weekly or monthly basis to get instructions on what data is currently being collected.
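The polling idea could be sketched roughly as follows. The centralized service, its response format, and the query IDs are all hypothetical, so the network call is left out and the fetched instructions are passed in as a plain list of IDs. The key property is that a site counts an interaction only while a matching query is active.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Set

# Weekly cadence; a monthly schedule would use roughly timedelta(days=30).
POLL_INTERVAL = timedelta(days=7)


class CustomQueryClient:
    """Hypothetical client for centrally defined telemetry queries."""

    def __init__(self) -> None:
        self.last_poll: Optional[datetime] = None
        self.active_queries: Set[str] = set()
        self.counts: Dict[str, int] = {}

    def should_poll(self, now: datetime) -> bool:
        # Poll on first run, then once per interval.
        return self.last_poll is None or now - self.last_poll >= POLL_INTERVAL

    def apply_instructions(self, query_ids: Iterable[str], now: datetime) -> None:
        # Replace the active set; counts for queries no longer requested are discarded.
        self.active_queries = set(query_ids)
        self.counts = {q: self.counts.get(q, 0) for q in self.active_queries}
        self.last_poll = now

    def record(self, event: str) -> None:
        # Events are counted only while a matching query is active.
        if event in self.active_queries:
            self.counts[event] += 1
```

With an active query such as `tinymce_underline_click` (a made-up ID), Underline clicks are counted while clicks on other buttons are ignored. A real design would also need to report the counts before retiring a query.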

Decisions about what data to collect should be made by a committee, based on current active tickets and features in development that require user data, as well as on longitudinal studies of user behavior.

Anonymity and transparency

A core requirement for the success of this feature is that data collection must be 100% anonymized. No data collected can be traced back to an individual user. Ideally the feature itself will be built in such a way that even accidental collection of personal data is impossible.
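One way to work toward making accidental collection impossible is an allowlist: rather than trying to strip out personal data, a report keeps only fields that are explicitly permitted and of the expected type. The sketch below assumes a hypothetical schema; the field names are illustrative only.

```python
from typing import Any, Dict, Mapping

# Hypothetical schema: the only fields a report may contain, and their types.
ALLOWED_FIELDS: Dict[str, type] = {
    "theme_count": int,
    "plugin_count": int,
    "wp_version": str,
    "country": str,
    "language": str,
}


def sanitize_report(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted fields with the expected type; drop everything else."""
    clean: Dict[str, Any] = {}
    for key, expected_type in ALLOWED_FIELDS.items():
        value = raw.get(key)
        # isinstance(True, int) is True in Python, so booleans are rejected explicitly.
        if isinstance(value, expected_type) and not isinstance(value, bool):
            clean[key] = value
    return clean
```

Fields such as an admin email or site URL never leave the site, because they are not on the list. String fields could still carry personal data if filled incorrectly, so an allowlist reduces the risk rather than eliminating it; checking values like `language` against a known set of codes would tighten it further.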

At any time, information about what data is being collected should be available to end users, both on a dedicated page on WordPress.org and through the setting in the admin.

All collected data should be made public, both to allow scrutiny and ensure transparency and so that others can put it to actual use.

Practical way forward

To prove the viability of this feature, I propose a slow, incremental deployment: start by collecting a few uncontroversial data points, such as the current language setting, the number of themes and plugins, and one UI interaction that needs testing. Once this MVP has proven effective, a larger-scale testing program can be shipped.

I’ve already created a ticket for this proposal on Trac and I’d love to hear your thoughts and ideas on the topic. To keep the conversation in one place, I ask that all comments be left on the Trac ticket. For this reason, I’ve disabled comments here.

Update February 3, 2017

Matt closed the ticket shortly after an article published on WP Tavern brought it to wider attention. I personally think the closure was premature, and I will continue pushing for the feature. In the meantime, I’ve reopened comments here so you can voice your opinion on the matter.

 

2 thoughts on “The Case for WordPress Telemetry”

  1. Personally, I believe that in an age where people are afraid even of their own ISPs, this is a subject that should be handled with caution. You will often see “All data collected is 100% anonymous and untraceable” (looking at you, Windows 10 and Microsoft), and at times this is simply not the case. That being said, I am not against user statistics, as I do believe they have some value. As a VPN user, the only time I have consented to telemetry was when I could preview the data actually being sent, so I knew it was anonymous. I also believe in disclosing what happens to such data: it should be laid out for both the developer and the consumer, so they are on the same page and know exactly what’s going on, without compromising the quality of the data. And if people still want to opt out, that’s totally fine and their choice.

    1. Yes, I agree. The private data related to the owner should not be touched at all. A privacy agreement is not just some text on paper; it should be abided by and should protect WordPress users against data hacking and misuse.
