For four years now, Tapptic and M6 have been working together to deliver the best experience to 6play users. The product and development teams have always been fully committed to this mission, which explains the success of the most downloaded French app in the stores.

 

In January 2016, we launched the latest major release. We redeveloped the app from scratch, making the most of the new capabilities offered by the mobile development ecosystem. We were quite satisfied with the result and received good internal reviews but, despite that, the team was disappointed and frustrated because the store ratings weren’t good enough: no more than 3.4 on Android and 1.4 on iOS.

 

We knew that the app was quite stable and the product well designed. That conviction was based on the multiple test campaigns we had run on the app since the beginning of the year, yet despite all our efforts the ratings remained stubbornly low.

 

In September, we decided to change direction and built a quality evaluation framework around the app based on three principles:

 

  • Monitor – You cannot fix a bug you don’t know about.
  • Prioritise – Better quality metrics lead to better development decisions.
  • Engage – Happy users don’t usually rate your app unless you ask them to.

 

These principles should be followed in that order.

 

MONITORING – You cannot fix a bug you don’t know about

 

The first thing we did was figure out what was really going on in production, both from a technical point of view (QoS – Quality of Service) and from a user point of view (QoE – Quality of Experience). Obviously, we were missing something.

 

We decided to integrate Apteligent, an SDK that monitors crashes occurring in production, to get a better picture of the app’s real quality. This first step should be considered the bedrock of any mobile quality strategy. We often forget it, but a mobile application in production is a true black box. Intuition is fine, data is better. We therefore instrumented the application at a very low level so that we could collect more information about what was happening: in which context a bug occurred, what the network quality was, which actions the user had performed last, and so on.
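As a rough illustration of this kind of low-level tagging, here is a minimal Kotlin sketch of a context-collecting helper. `CrashContext` and its methods are hypothetical names, not Apteligent’s actual API; the point is simply to show breadcrumbs and environment metadata being accumulated so they can be attached to a crash report.

```kotlin
// Hypothetical wrapper around a crash-reporting SDK (not Apteligent's real API).
object CrashContext {

    private const val MAX_BREADCRUMBS = 50
    private val breadcrumbs = ArrayDeque<String>()
    private val metadata = mutableMapOf<String, String>()

    // Record the last user actions so a crash report can be replayed in context.
    fun leaveBreadcrumb(action: String) {
        if (breadcrumbs.size == MAX_BREADCRUMBS) breadcrumbs.removeFirst()
        breadcrumbs.addLast("${System.currentTimeMillis()}: $action")
    }

    // Attach environment information (network quality, current screen, etc.).
    fun setMetadata(key: String, value: String) {
        metadata[key] = value
    }

    // Called from the crash handler: bundle everything with the stack trace.
    fun snapshot(): Map<String, Any> = mapOf(
        "breadcrumbs" to breadcrumbs.toList(),
        "metadata" to metadata.toMap()
    )
}

// Usage while instrumenting the app (identifiers are illustrative):
// CrashContext.setMetadata("network", "3G")
// CrashContext.leaveBreadcrumb("open_program_page")
// CrashContext.leaveBreadcrumb("start_playback")
```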

 

As the app’s video player is both critical and complex, we also integrated Youbora, a dedicated solution to monitor it. Here too, we needed to know exactly which channels were facing issues, at what time, and for which video asset(s).
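For illustration, this is roughly the shape of the per-playback data we wanted to collect; the types below are ours, not Youbora’s actual data model.

```kotlin
import java.time.Instant

// Illustrative shape of a playback-monitoring event; field names are assumptions.
data class PlaybackEvent(
    val channel: String,          // e.g. which channel the stream belongs to
    val assetId: String,          // video asset identifier
    val timestamp: Instant,       // when the event occurred
    val type: Type,
    val detail: String? = null    // error code, bitrate, etc.
) {
    enum class Type { START, BUFFERING, ERROR, END }
}

// A hypothetical reporter interface the player would call into.
interface PlaybackMonitor {
    fun report(event: PlaybackEvent)
}
```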

 

Thanks to those tools, we managed to understand the conditions under which a particularly sneaky bug occurred. We had been tracking it for several weeks and knew it only happened under specific conditions, at high volume. By cross-checking information from the different tools, we determined that the player was crashing when the server failed to deliver the very first chunk of the video stream (a “black hole”). Fixing this bug reduced the number of crashes by up to 80%.
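The post does not detail the actual fix, but the defensive idea can be sketched as follows: detect the missing first chunk and surface a recoverable error instead of handing an empty stream to the player. All names here are hypothetical.

```kotlin
import java.io.InputStream

// Hypothetical guard around stream start-up: if the server never delivers the
// first chunk (the "black hole" case), raise a recoverable error instead of
// letting the player crash on an empty stream.
class FirstChunkException(message: String) : Exception(message)

fun readFirstChunk(stream: InputStream, chunkSize: Int = 16 * 1024): ByteArray {
    val buffer = ByteArray(chunkSize)
    val read = stream.read(buffer)
    if (read <= 0) {
        // Report to monitoring, then let the caller show an error screen or retry.
        throw FirstChunkException("Server returned no data for the first chunk")
    }
    return buffer.copyOf(read)
}
```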

 

From a user perspective, we ran beta releases to get feedback from premium users: more than 250 testers, recruited from our most active segment, tried the new features. Our goal was less to gather direct feedback than to monitor technical metrics and make sure we were not missing anything major. It also allowed us to test the application on exotic devices. We therefore used beta testing mainly as a reassurance stage.

 

Finally, we actively monitored the stores. We pushed every review with 3 stars or fewer to a dedicated Slack channel that we checked daily. The aim was to spot issues that the technical monitoring solutions did not catch: a user can have a bad experience even if the app doesn’t crash, such as hitting a dead-end path. For some reason, some of our users faced issues at app launch, yet the app didn’t crash, so we couldn’t see it in our technical dashboards. We identified the problem through the store reviews and asked those users for more details in order to fix it. Store monitoring is therefore complementary to technical monitoring.
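As an example of what such a pipeline can look like, here is a minimal Kotlin sketch that forwards low-rated reviews to a Slack incoming webhook. The review source and the webhook URL are placeholders; only the Slack webhook mechanism (a JSON POST) is real.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical review record pulled from a store-review feed (placeholder source).
data class StoreReview(val stars: Int, val author: String, val text: String)

// Post every review with 3 stars or fewer to a Slack incoming webhook.
fun forwardLowRatings(reviews: List<StoreReview>, webhookUrl: String) {
    val client = HttpClient.newHttpClient()
    reviews.filter { it.stars <= 3 }.forEach { review ->
        val safeText = review.text.replace("\"", "'")   // naive escaping for the sketch
        val payload = """{"text": "${review.stars}/5 by ${review.author}: $safeText"}"""
        val request = HttpRequest.newBuilder(URI.create(webhookUrl))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build()
        client.send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```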

 

At the end of this first step, we had plenty of information, but we still had to prioritise what to fix and when.

 

PRIORITISING – Better metrics lead to better development decisions

 

When we talk about quality in software development, it is not always easy to strike the right balance between fixing bugs and implementing new features.

 

To help us make the best decisions, we defined the metrics we would rely on. We wanted them to be easy to read and easy to measure. We had a clear dashboard with the metrics we wanted to improve, and we used it during our development rituals as a communication tool. Here are some of the metrics we were following: crash-free sessions (%), crash-free users (%), video failures (%), rating of the app, number of ratings, etc.

 

On top of these KPIs, we defined objectives and set up a quality contract: if a KPI fell below the quality standard, the development priority shifted to bug fixing until it recovered.
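One possible way to encode such a quality contract is sketched below; the thresholds are illustrative examples, not the actual 6play targets.

```kotlin
// Illustrative encoding of the quality contract: each KPI has a target, and any
// KPI below target flips the next sprint's priority to bug fixing.
// Targets here are examples, not the real 6play thresholds.
data class Kpi(val name: String, val current: Double, val target: Double) {
    val belowTarget: Boolean get() = current < target
}

enum class Priority { NEW_FEATURES, BUG_FIXING }

fun nextSprintPriority(kpis: List<Kpi>): Priority =
    if (kpis.any { it.belowTarget }) Priority.BUG_FIXING else Priority.NEW_FEATURES

fun main() {
    val dashboard = listOf(
        Kpi("crash-free sessions (%)", current = 99.1, target = 99.5),
        Kpi("crash-free users (%)",    current = 99.6, target = 99.0),
        Kpi("average store rating",    current = 4.2,  target = 4.0)
    )
    println(nextSprintPriority(dashboard)) // BUG_FIXING: crash-free sessions below target
}
```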

 

Between September and November, we released 4 new versions consisting mainly of bug fixes. This pace was quite challenging for the team but, in the end, absolutely rewarding. By fixing the most impactful bugs, the crash-free users rate went from 97.4% to 99.6%. From a business point of view, this is massive.

 

In terms of team organisation, the Quality Owner (QO) supported the Product Owner (PO) by helping define the development priorities. The PO focused on new features while the QO dealt with the bugs to fix.

 

ENGAGING – Happy users don’t usually rate apps

 

Once we were finally confident that the quality improvements were tangible, we implemented a feedback feature. The idea was simple: ask our users whether they were having a good experience with the app. If they were happy, we asked them to rate the app in the store; if not, to send us feedback so we could improve it.
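A minimal sketch of this two-branch flow might look like the following; the triggering rules and names are hypothetical, not the actual 6play implementation.

```kotlin
// Illustrative two-step feedback flow: ask whether the experience was good,
// then route happy users to the store and unhappy users to a feedback form.
interface FeedbackUi {
    fun askIfHappy(onAnswer: (Boolean) -> Unit)
    fun openStoreRatingPage()
    fun openFeedbackForm()
}

class FeedbackPrompt(private val ui: FeedbackUi) {
    fun maybeAsk(sessionCount: Int, hadCrashThisSession: Boolean) {
        // Example rules: only prompt engaged users, and never right after a bad session.
        if (sessionCount < 5 || hadCrashThisSession) return
        ui.askIfHappy { happy ->
            if (happy) ui.openStoreRatingPage() else ui.openFeedbackForm()
        }
    }
}
```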

 

We released the new feature in mid-December; the results were immediate:

  • A massive increase in the number of ratings: we doubled the ratings received on Android and multiplied them by 5 on iOS.
  • A significant rise in the average rating: within two weeks, we reached 4 stars on both iOS and Android, and the average rating has now settled at 4.2 stars.

 

The recovery was especially impressive on iOS, as we were coming from a much lower rating (1.4 out of 5). After four months of hard work, we finally achieved our goal.

 

Obviously, this is not the end of the road. The good rating is very satisfying, but we want to go further in understanding the user experience. To put it another way: what do those 4.2 stars mean, and how could we reach 5? We are therefore working on a form that will allow us to better qualify bad experiences on 6play. Are they due to geoblocking, advertising or poor video performance?

 

Another open thread is the integration of a CRM solution into the app. M6 wants its users to be able to contact them directly within the app if they face an issue. It is a complex topic that requires managing the support chain end-to-end, but our users deserve this level of ambition. We are ready for these new challenges.

 

Finally, an important side effect of this improvement is that it boosted all the teams working on the product. Working on a valued product with a lot of visibility is now a real source of pride. Having worked with them throughout these months, I know how committed and professional they are. Showcasing their work is also a satisfaction; they deserve it.

 

Feel free to comment and contribute to the discussion; we all know that stars matter, who can say otherwise 🙂