WordPress prides itself on being an application built by the user for the user. The problem is that, given the popularity and reach of WordPress today, the distance between the WordPress 1% (or even 0.1%) and the average user has become so vast that we (the people who contribute to WordPress core) know almost nothing about the actual people who use WordPress or how they use the application. This will become more of an issue as the application evolves, and it is high time we did something about it.
Lack of data means we’re flying blind
During the development of WordPress 4.7, I was involved in several conversations centered around assumed use of features. The general argument was that, based on the 80/20 rule, certain features should be added while others should be removed. I kept bringing up the well-known fact that we don’t have a clue what features 80%, or even 20%, of WordPress users actually use, so any claim of validity in the 80/20 rule is guesswork at best. In response, one developer told me, point blank, “we know what the user wants.” I don’t know about you, but in my book that is not the way to build an application for real people.
What we need is raw data based on actual use, and lots of it. What we need is telemetry. And there is ample industry precedent for collecting such telemetry, in online and offline applications and even in the WordPress ecosystem.
Here’s what I propose:
WordPress core should ship with an opt-in Telemetry feature that collects anonymized data on feature and functionality use.
This is in line with what major software providers do, and it is a feature most users will be familiar with.
The purpose of the Telemetry feature is to collect relevant data about how WordPress is used in the wild. This raises the questions “what is relevant data?”, “who decides what data is collected?”, and “who has access to the collected data?”
Here’s how I imagine it would work:
Implementation and activation
WordPress Telemetry is shipped as a core feature in new installs and an update to existing installs. When Telemetry is first added to the site, the admin gets a prompt asking if they want to contribute anonymized use data to the WordPress project. The default setting is “No” and the admin can change this to “Yes”. Once activated, the Telemetry setting can be changed at any time by the admin.
In more detail:
- The opt-in selector for the feature should be surfaced on first install or when the site is updated to the first version of WordPress containing the feature.
- For new installs the opt-in question should appear on the 5-minute install page along with “Allow search engines to index the site” or similar.
- For upgrades, the opt-in question should be revealed in a dedicated modal.
- The feature should be disabled by default, so that participating requires an active choice by the admin.
- The feature should be controllable at any time through a dedicated section under Settings->General.
- It is possible the best way to make users feel this feature is not a Trojan horse is to ship it as a plugin that auto-installs on opt-in and auto-uninstalls on opt-out.
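To make the activation rules above concrete, here is a minimal sketch of the opt-in state in Python. It is only an illustration of the proposed behavior, not WordPress code: the `TelemetrySetting` class and its method names are hypothetical and do not correspond to any existing WordPress option or API.

```python
from dataclasses import dataclass


@dataclass
class TelemetrySetting:
    """Hypothetical per-site telemetry setting (illustrative names only)."""

    enabled: bool = False   # the feature is disabled by default
    prompted: bool = False  # has the admin already seen the opt-in question?

    def needs_prompt(self) -> bool:
        # Show the question once: on the install page for new sites,
        # or in a modal after the upgrade that introduces the feature.
        return not self.prompted

    def answer_prompt(self, opt_in: bool) -> None:
        # Record the admin's explicit answer to the opt-in question.
        self.prompted = True
        self.enabled = opt_in

    def set_enabled(self, enabled: bool) -> None:
        # The admin can change the choice at any time from the settings screen.
        self.enabled = enabled
```

The point of the sketch is that nothing is ever collected until `answer_prompt(True)` or `set_enabled(True)` has been called by the admin; there is no code path that turns the feature on by itself.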
As a baseline, some core data should always be collected from sites that have opted in, including but not limited to:
- Number of themes and plugins installed
- Frequency of use of specific views (Settings, Customizer, etc.)
- Current version
- Update status
- Locale (generalized to country)
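As a rough illustration of what such a baseline report might look like, the Python sketch below reduces a site's state to counts and coarse values only. The `BaselineReport` structure, its field names, and the `build_report` helper are assumptions made for this example; the proposal does not prescribe a format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineReport:
    """Hypothetical baseline payload; field names are illustrative."""

    theme_count: int
    plugin_count: int
    view_counts: dict       # view name -> number of visits
    wp_version: str
    update_available: bool
    country: str            # locale generalized to country


def country_from_locale(locale: str) -> str:
    """Reduce a locale such as 'nb_NO' to its country part ('NO')."""
    parts = locale.replace("-", "_").split("_")
    # Locales without a region (for example 'ja') carry no country.
    return parts[1].upper() if len(parts) > 1 and parts[1] else "unknown"


def build_report(themes, plugins, view_log, wp_version, update_available, locale):
    # Only counts are kept: theme names, plugin names, and the order
    # of visited views never leave the site in this sketch.
    return BaselineReport(
        theme_count=len(themes),
        plugin_count=len(plugins),
        view_counts=dict(Counter(view_log)),
        wp_version=wp_version,
        update_available=update_available,
        country=country_from_locale(locale),
    )
```

For example, `build_report(["twentyseventeen"], ["akismet", "jetpack"], ["settings", "customizer", "settings"], "4.7.2", False, "nb_NO")` would yield two plugins, one theme, two visits to Settings, one to the Customizer, and the country `NO`.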
In addition, it should be possible to push custom queries to opted-in users to test for specific interactions, for example how many users click the Underline button in TinyMCE. I’m not sure exactly what the best approach here is, but this is one idea: the feature queries a centralized service on a weekly or monthly basis to get instructions on what type of data is currently being collected.
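One way to picture the polling idea is the Python sketch below. It is a guess at the mechanics, not a design: the weekly `POLL_INTERVAL`, the `refresh_instructions` function, and the metric name used in the usage note are all hypothetical, and the call to the central service is passed in as a plain function so that no real endpoint is implied.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple

# Assumption for this sketch: instructions are refreshed weekly.
POLL_INTERVAL = timedelta(days=7)


def poll_due(last_poll: Optional[datetime], now: datetime,
             interval: timedelta = POLL_INTERVAL) -> bool:
    """True if the site has never polled or the interval has elapsed."""
    return last_poll is None or now - last_poll >= interval


def refresh_instructions(
    current: List[str],
    last_poll: Optional[datetime],
    now: datetime,
    fetch: Callable[[], List[str]],
) -> Tuple[List[str], Optional[datetime]]:
    """Return the active instruction list and the time of the last poll.

    `fetch` stands in for the request to the centralized service and
    returns the names of the interactions currently being studied.
    """
    if not poll_due(last_poll, now):
        # Too early to poll again: keep collecting what we already were.
        return current, last_poll
    return list(fetch()), now
```

A site would call `refresh_instructions` on a schedule, for instance with a `fetch` that returns `["tinymce_underline_click"]`, and then count only the interactions named in the returned list until the next poll.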
The decision on what data to collect should be made by a committee, based on current active tickets and features in development that require user data, as well as longitudinal studies of user behavior.
Anonymity and transparency
A core requirement for the success of this feature is that data collection must be 100% anonymized. No data collected can be traced back to an individual user. Ideally the feature itself will be built in such a way that even accidental collection of personal data is impossible.
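One possible way to approach "built so that accidental collection is hard" is a strict allowlist applied just before anything is sent. The Python sketch below shows the idea under stated assumptions: the field names in `ALLOWED_FIELDS`, the 16-character limit on strings, and the `sanitize` function are all invented for illustration. An allowlist like this narrows what can leak; on its own it does not prove that personal data can never be collected.

```python
# Hypothetical allowlist of fields a report may contain.
ALLOWED_FIELDS = frozenset({
    "theme_count",
    "plugin_count",
    "wp_version",
    "update_available",
    "country",
})

# Assumption: short strings only, so free-form text cannot slip through.
MAX_STRING_LENGTH = 16


def sanitize(payload: dict) -> dict:
    """Keep only allowlisted fields that hold simple, bounded values."""
    clean = {}
    for key, value in payload.items():
        if key not in ALLOWED_FIELDS:
            continue  # unknown fields are dropped, never forwarded
        if isinstance(value, (bool, int)):
            clean[key] = value
        elif isinstance(value, str) and len(value) <= MAX_STRING_LENGTH:
            clean[key] = value
        # Anything else (lists, dicts, long strings) is silently dropped.
    return clean
```

With this in place, a payload that accidentally includes something like an `admin_email` field would have that field removed before sending, because it is not on the list.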
At any time, information about what data is being collected should be available to end-users both on a dedicated page on WordPress.org and through the setting in admin.
All data collected should be made public, both to ensure transparency through scrutiny and to let anyone put the data to actual use.
Practical way forward
To prove the viability of this feature, I propose a slow, incremental deployment: start with collection of certain uncontroversial data points like current language setting, number of themes and plugins, and one UI interaction that needs testing. Once this MVP has proven itself effective, a larger-scale testing program can be shipped.
I’ve already created a ticket for this proposal on Trac and I’d love to hear your thoughts and ideas on the topic.
To keep the conversation in one place, I request that all comments be left on the Trac ticket. For this reason I’ve disabled comments here.
Update February 3, 2017
Matt closed the ticket shortly after an article published on WP Tavern brought it to the surface. I personally think that closure was premature and will continue pushing for the feature. In the meantime, I’ve reopened comments here so you can voice your opinion on the matter.