The Case for Telemetry in WordPress

Update August 10, 2017: After publishing this article I reopened the Trac ticket, arguing that the reason for closing it was no longer valid in light of Gutenberg’s collection of telemetry. A few hours later it was announced that usage tracking would be removed from Gutenberg, and the ticket was closed again because the original argument for closing it is once again accurate.

This is an expanded version of a Twitter thread from earlier today.

#WordPress needs a core method for collecting quantitative user data through telemetry (aka “metrics”). I wrote a post about this on my blog back in December 2016 and filed a ticket on the WordPress development system Trac at the same time. That ticket was promptly closed by project lead Matt Mullenweg, who argued “it is off the table for 2017 as it is not within the three focus areas.”

Since then, Gutenberg – the proposed new editor feature for WordPress – has introduced an “opt-in usage tracking system” (telemetry) meaning the argument that telemetry is not within the three focus areas (of which Gutenberg is one) is no longer valid.

One of the biggest challenges WordPress faces is the lack of reliable data about global day-to-day use. Like most Open Source projects, WordPress has relied on community feedback as its primary data source, which is fine for a small project. The problem is that WordPress is a Very Big Project with global reach, and the majority of its users never interface with the community.

I like to say we, the people who talk about, provide feedback for, and design/develop WordPress are the 1%. I wrote about this on my blog in December 2015.

I now think the number is more like 0.1%.

Making decisions based on the traditional community feedback model is making decisions without knowing anything about the majority of users. Some will argue this is fine, that WordPress is developed by those who show up. That’s not a workable or responsible model for a project. It also goes against one of the core principles of WordPress itself, that “the core should provide features that 80% or more of end users will actually appreciate and use,” although the validity of the 80/20 rule was put into question by project lead Matt Mullenweg earlier this year (I wrote about this on my blog as well).

We, the people who build WordPress, have a duty of care to the people we build it for. And those people are not us.

“We can just do user testing,” you say? Sure. Let’s do proper qualitative user testing. That requires staffing, funding, and infrastructure. User testing for a project like WordPress is non-trivial. It requires professional analysis.

Testing one UX feature (like a view) would take 3 dedicated testers, at least 10 subjects, 3 weeks, and result in a >10 page report. User testing like this would be great, but it’s not something we are doing right now, and it’s not on the horizon either. Which brings me to telemetry and quantitative user metrics in WordPress.

Done responsibly and with care, telemetry can be a treasure trove of information for the evolution of WordPress. The key is collecting the RIGHT metrics using the right methodology. Telemetry data often ends up being a numbers soup because too much irrelevant data is collected. What I’m proposing is a lean and targeted approach to telemetry in WordPress:

  1. Telemetry should be an opt-in option that when activated installs a plugin. Admins should be informed about this option by the WordPress dashboard when WordPress is installed and reminded at regular intervals that the plugin is available and whether it’s activated.
  2. Anonymize all collected data at client level before submission.
  3. Collect only basic data at the core level of the plugin (WP/PHP/MySQL version, locale, language, etc.)
  4. Provide up-front info to end-users about what data is currently collected and what it’s being used for, with opt-out options for the plugin as a whole and for granular data collection.
  5. Allow for targeted data collection based on research needs.
  6. Store data on servers owned by the community (not corporate interests). Share data openly to ensure transparency.
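Points 1 and 4 of the list above (opt-in by default, with opt-out for the plugin as a whole and for individual kinds of data) can be sketched in a few lines. This is a minimal illustration, not a proposed implementation: `TelemetrySettings`, `build_payload`, the group names, and the sample values are all hypothetical, and an actual WordPress plugin would be written in PHP.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TelemetrySettings:
    # Hypothetical settings object; telemetry stays off unless the admin opts in.
    opted_in: bool = False
    # Data groups the admin has switched off individually (granular opt-out).
    disabled_groups: set = field(default_factory=set)


def build_payload(settings: TelemetrySettings, collected: dict) -> Optional[dict]:
    """Return the data that may be submitted, or None if nothing may be sent.

    `collected` maps a group name (e.g. "environment") to that group's data.
    """
    if not settings.opted_in:
        return None  # No consent, no submission.
    return {
        group: data
        for group, data in collected.items()
        if group not in settings.disabled_groups
    }


# Illustrative values only.
collected = {
    "environment": {"wp": "4.8.1", "php": "7.0", "mysql": "5.6"},
    "locale": {"locale": "nb_NO"},
}

print(build_payload(TelemetrySettings(), collected))  # default settings: nothing is sent
settings = TelemetrySettings(opted_in=True, disabled_groups={"locale"})
print(build_payload(settings, collected))  # only the "environment" group remains
```

The important design choice is that the consent check sits in front of everything else, so a site that never opts in never produces a payload at all.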

Telemetry implemented in this way would give WordPress the ability to inform decisions about current and future features. Some, notably project lead Matt Mullenweg, have said this is not necessary, that it won’t be useful. I disagree.

In my view, making decisions that impact millions of users without metrics to back them up is irresponsible and quite frankly foolish. We run the risk of doing a bear’s favor: something we think will help that actually hurts, all because we don’t have enough information.

There are plenty of arguments against telemetry: anonymity, security, oversight, Big Brother, competitive advantage, etc. If we do this right I am certain we can build a system that alleviates the concerns over anonymity, surveillance, etc. Couple that with up-front disclosure, transparency, and explanation of what data is collected and why and people will sign on.

As for the competitive advantage argument, that we don’t want to share data with our competition: in my opinion that runs counter to the Open Source idea. We can and should share this data with everyone, because it’ll make the web a better place for everyone. It has purpose beyond WordPress. Not collecting this data because we don’t want competitors to have it is like leaving a broken window unmended lest it be broken again.

In short, WordPress needs telemetry. There’s a ticket on Trac proposing this, and Gutenberg has a PR for telemetry.

Inexplicably, the Trac ticket is closed because “it is not within the three focus areas,” which thanks to Gutenberg is no longer the case.

What WordPress needs is an open debate on this topic. What are the arguments for and against? What can be gained and what is lost? Should we do this? And if so, how do we do it in an open, transparent, and responsible way that helps inform and elevate the conversation while looking after the interests of all WordPress users?

This discussion belongs in Trac in an open ticket. Closing it down before a proper discussion has been allowed is not the Open Source way.

As of this writing the WordPress telemetry ticket remains closed.

Cross-posted from the original at LinkedIn.


The Case for WordPress Telemetry

WordPress prides itself on being an application built by the user for the user. The problem is that, with the popularity and reach of WordPress today, the distance between the WordPress 1% (or even 0.1%) and the average user is becoming so vast that we (the people who contribute to WordPress core) know almost nothing about the actual people who use WordPress or how they use the application. This will become more of an issue as the application evolves, and it is high time we do something about it.

Lack of data means we’re flying blind

During the development of WordPress 4.7, I was involved in several conversations centered around assumed use of features. The general argument was that based on the 80/20 rule, certain features should be added while others should be removed. I kept bringing up the well-known fact that we don’t have a clue what features 80%, or even 20%, of WordPress users actually use, so any claim of validity in the 80/20 rule is guesswork at best. In response, one developer told me, point blank, “we know what the user wants.” I don’t know about you, but in my book that is not the way to build an application for real people.

What we need is raw data based on actual use, and lots of it. What we need is telemetry. And there is ample industry precedent for collecting such telemetry, in online and offline applications and even in the WordPress ecosystem.

Here’s what I propose:

WordPress core should ship with an opt-in Telemetry feature that collects anonymized data on feature and functionality use.

This is in line with what major software providers do, and it is a feature most users will be familiar with.

The purpose of the Telemetry feature is to collect relevant data about how WordPress is used in the wild. This raises the questions “what is relevant data?”, “who decides what data is collected?”, and “who has access to the collected data?”

Here’s how I imagine it would work:

Implementation and activation

WordPress Telemetry is shipped as a core feature in new installs and an update to existing installs. When Telemetry is first added to the site, the admin gets a prompt asking if they want to contribute anonymized use data to the WordPress project. The default setting is “No” and the admin can change this to “Yes”. Once activated, the Telemetry setting can be changed at any time by the admin.

In more detail:

  • The opt-in selector for the feature should be surfaced on first install or when the site is updated to the first version of WordPress containing the feature.
  • For new installs the opt-in question should appear on the 5-minute install page along with “Allow search engines to index the site” or similar.
  • For upgrades, the opt-in question should be revealed in a dedicated modal.
  • The feature should be disabled by default and the admin can make an active choice to participate.
  • The feature should be controllable at any time through a dedicated section under Settings->General.
  • It is possible the best way to make users feel this feature is not a Trojan horse is to ship it as a plugin that auto-installs on opt-in and auto-uninstalls on opt-out.

Data collection

As a benchmark, some core data should always be collected, including but not limited to:

  • Number of themes and plugins installed
  • Frequency of use of specific views (Settings, Customizer, etc)
  • Current version
  • Update status
  • Locale (generalized to country)
  • Language
  • etc
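To make the benchmark list concrete, here is a rough sketch of what one such record might look like. Everything in it is an assumption for illustration: the `BenchmarkRecord` fields, the `make_record` helper, and the shape of the `site` dictionary are hypothetical, and I am reading “generalized to country” as keeping only the region part of a locale code such as `nb_NO`.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BenchmarkRecord:
    wp_version: str
    update_available: bool
    theme_count: int   # a count only, never the theme names
    plugin_count: int  # a count only, never the plugin names
    country: str       # region part of the locale, e.g. "NO"
    language: str      # language part of the locale, e.g. "nb"


def split_locale(locale: str) -> tuple:
    """Split a locale code such as "nb_NO" into (language, country).

    Codes without a region (e.g. "ja") yield an empty country, and any
    variant suffix (e.g. "de_DE_formal") is dropped.
    """
    parts = locale.split("_")
    language = parts[0]
    country = parts[1] if len(parts) > 1 else ""
    return language, country


def make_record(site: dict) -> dict:
    """Reduce raw site information to the coarse benchmark fields."""
    language, country = split_locale(site["locale"])
    return asdict(BenchmarkRecord(
        wp_version=site["wp_version"],
        update_available=site["update_available"],
        theme_count=len(site["themes"]),
        plugin_count=len(site["plugins"]),
        country=country,
        language=language,
    ))


# Illustrative input only.
site = {
    "wp_version": "4.7",
    "update_available": True,
    "themes": ["theme-a", "theme-b"],
    "plugins": ["plugin-x", "plugin-y", "plugin-z"],
    "locale": "de_DE_formal",
}
record = make_record(site)
print(record)
```

Note that the record carries counts rather than names, so the list of installed themes and plugins never leaves the site.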

In addition, it should be possible to push custom queries to activated users to test for specific interactions, for example how many users click the Underline button in TinyMCE. I’m not sure exactly what the best approach here is, but this is one idea: The feature queries a centralized service on a weekly or monthly basis to get instructions on what type of data is currently being collected.
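The polling idea could look something like the sketch below. It is only one possible shape: `ProbeCounter`, the `fetch_instructions` callable (a stand-in for a request to the centralized service), the weekly interval, and the probe name `tinymce.underline` are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(days=7)  # assumed weekly cadence


class ProbeCounter:
    """Counts only the interactions named in the current instructions."""

    def __init__(self, fetch_instructions):
        # fetch_instructions is a hypothetical callable standing in for a
        # request to the centralized service; it returns a list of probe names.
        self._fetch = fetch_instructions
        self._active = set()
        self._last_poll = None
        self.counts = Counter()

    def refresh(self, now: datetime) -> bool:
        """Re-fetch instructions if the poll interval has elapsed."""
        if self._last_poll is not None and now - self._last_poll < POLL_INTERVAL:
            return False
        self._active = set(self._fetch())
        self._last_poll = now
        return True

    def record(self, interaction: str) -> None:
        """Count an interaction, ignoring anything not currently requested."""
        if interaction in self._active:
            self.counts[interaction] += 1


counter = ProbeCounter(lambda: ["tinymce.underline"])
start = datetime(2017, 1, 1)
counter.refresh(start)
counter.record("tinymce.underline")
counter.record("tinymce.bold")  # not requested, so it is ignored
print(dict(counter.counts))
```

Because the client only counts interactions that appear in the published instructions, the instruction list doubles as a public record of what is being measured at any given time.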

The decision on what data to collect should be made by committee based on current active tickets and features in development that require user data as well as longitudinal studies of user behavior.

Anonymity and transparency

A core requirement for the success of this feature is that data collection must be 100% anonymized. No data collected can be traced back to an individual user. Ideally the feature itself will be built in such a way that even accidental collection of personal data is impossible.
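One way to approach the “accidental collection is impossible” goal is a strict allowlist: anything not explicitly named and validated is dropped before it can be submitted. The sketch below assumes a hypothetical schema (`ALLOWED_FIELDS`) and helper (`sanitize`). It narrows what can leave the site, but it is not by itself a guarantee of anonymity.

```python
ALLOWED_FIELDS = {
    # field name -> check the value must pass (hypothetical schema)
    "wp_version": lambda v: isinstance(v, str) and len(v) <= 16,
    "plugin_count": lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0,
    "language": lambda v: isinstance(v, str) and v.isalpha() and len(v) <= 3,
}


def sanitize(raw: dict) -> dict:
    """Keep only allowlisted fields whose values pass their check."""
    return {
        name: value
        for name, value in raw.items()
        if name in ALLOWED_FIELDS and ALLOWED_FIELDS[name](value)
    }


# Illustrative input: the last two fields are personal data and must not pass.
raw = {
    "wp_version": "4.7",
    "plugin_count": 12,
    "language": "nb",
    "admin_email": "admin@example.com",
    "site_url": "https://example.com",
}
clean = sanitize(raw)
print(clean)
```

The point of the allowlist (as opposed to a blocklist of known personal fields) is that a new, unanticipated field is excluded by default rather than included by default.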

At any time, information about what data is being collected should be available to end-users both on a dedicated page on and through the setting in admin.

All data collected should be made public for scrutiny and use to ensure transparency and enable actual use.

Practical way forward

To prove the viability of this feature I propose a slow incremental deployment: Start with collection of certain uncontroversial datapoints like current language setting, number of themes and plugins, and one UI interaction that needs testing. Once this MVP has proven itself effective, a larger scale testing program can be shipped.

I’ve already created a ticket for this proposal on Trac and I’d love to hear your thoughts and ideas on the topic. To keep the conversation in one place I request that all comments be left on the Trac ticket. For this reason I’ve disabled comments here.

Update February 3, 2017

Matt closed the ticket shortly after an article published in WP Tavern brought it to the surface. I personally think that closure was premature and will continue pushing for the feature. In the meantime I’ve reopened comments here so you can voice your opinion on the matter.