Story of Ready-to-drink Tea in Indonesia

Yogi Saputro
10 min read · Nov 9, 2019

Data Visualization from Registered Ready-to-drink Dataset in Indonesia

Photo by Crystal de Passillé-Chabot on Unsplash

There are plenty of tea drink products in Indonesia. They are as common as bottled water or coffee these days. Tea drinks can be found in small stores, supermarkets, malls, restaurants, at street vendors, and practically everywhere else.

Despite its popularity, data analysis of this product is quite rare, probably due to a lack of ready-to-use data. I found a reliable source from the Indonesian Food and Drug Administration (Bahasa: BPOM, Badan Pengawas Obat dan Makanan).

Link: http://cekbpom.pom.go.id/index.php/home/produk/r2isesetrjp2lvbjcb4617o7i0/13/row/5100/page/0/order/4/DESC/search/1/minuman%20teh

The data was extracted from an HTML table, converted to CSV, and then cleaned to make it usable for analysis.

You can find the complete code and dataset in my GitHub repository.
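As a rough illustration of the HTML-to-CSV step, here is a minimal sketch that uses only the Python standard library. It is not the script used for this article: the `TableParser` class, the `html_table_to_csv` helper, and the sample markup are all hypothetical, and the real BPOM page has more columns and messier markup.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace left over from the page layout
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical snippet, only for demonstration
sample_html = """
<table>
  <tr><th>nomor_registrasi</th><th>merk</th></tr>
  <tr><td>MD 000000000001</td><td> Contoh   Teh </td></tr>
</table>
"""
csv_text = html_table_to_csv(sample_html)
print(csv_text)
```

The resulting text can be written to a file and loaded with `pd.read_csv`; any further cleaning depends on the quirks of the actual page.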

#import libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import itertools
import warnings
warnings.filterwarnings("ignore")
import io
import base64
from matplotlib import rc,animation
from mpl_toolkits.mplot3d import Axes3D
import os
import re

This dataset is in Bahasa Indonesia. In short, this is what each column represents:

  • nomor_registrasi : ID of the registered product in the BPOM database. In this dataset, the uniqueness of this ID has been validated.
  • nama_produk : product name, basically explaining what the product is. In this dataset, it provides detailed information about the tea drink.
  • merk : brand of the product.
  • jenis_kemasan : type of package.
  • ukuran_kemasan : size(s) of package for every registered product. Please note that one product may have more than one package size. The sizes are enclosed in round brackets (‘()’) and separated by semicolons (‘;’).
  • produsen : name of the manufacturing company
  • kota_produsen : city where the manufacturing company's HQ (and probably its plant) is located
  • provinsi_produsen : province where the manufacturing company's HQ (and probably its plant) is located

dataset = pd.read_csv("dataset_bpom_tea.csv")
dataset.head(10)

Distribution of Tea Drink Packages

# Count products per package type; column names are set explicitly so this runs on old and new pandas
dataset_kemasan = dataset["jenis_kemasan"].value_counts().rename_axis("index").reset_index(name="jenis_kemasan")
# Keep only package types with at least 2 products
dataset_kemasan = dataset_kemasan[dataset_kemasan["jenis_kemasan"]>=2]
plt.figure(figsize=(10,8))
ax = sns.barplot(y=dataset_kemasan["index"][:15], x=dataset_kemasan["jenis_kemasan"][:15], palette="husl", linewidth=1, edgecolor="k")
sns.set_style("white")
sns.despine()
plt.xlabel("Number of product")
plt.ylabel("Package Type")
plt.grid(True)
plt.title("Distribution of Tea Drink Packages", color='b')
# Write the count next to each bar
for i,j in enumerate(dataset_kemasan["jenis_kemasan"][:15].astype(str)):
    ax.text(.7, i, j, fontsize=9, color="black")
plt.show()

Most tea drinks are packaged in plastic bottles (“Botol Plastik”) or plastic cups (“Gelas Plastik”). Other packages include aluminum cans, laminated cartons, glass bottles, Tetrapak, aluminum foil pouches, and PET plastic bottles. Some products are listed simply as plastic (“Plastik”), which leaves it unclear whether they come in a bottle or a cup. This is probably caused by a non-standard data entry process.

From this graph, one can conclude that tea drinks contribute heavily to plastic production. According to this source, the industry produces roughly 2 billion liters of tea drink annually. Assuming an average serving size of 350ml, the industry sends 5.7 billion cups/bottles into the environment annually.
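The estimate is a simple division, sketched below. Both inputs come from the paragraph above: the annual volume is the figure from the cited source, and the 350ml serving size is an assumption.

```python
annual_volume_liters = 2_000_000_000   # roughly 2 billion liters per year (cited source)
serving_size_liters = 0.350            # assumed average serving size of 350 ml

servings_per_year = annual_volume_liters / serving_size_liters
print(f"{servings_per_year / 1e9:.1f} billion cups/bottles per year")
```

A different assumed serving size changes the result proportionally: halving the serving size doubles the number of containers.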

Distribution of Tea Servings

Tea serving data can be obtained from the “ukuran_kemasan” column. Please note that each value in this column can contain several sizes. So, we need a list to capture all of those values before putting them into a DataFrame.

# Get all serving sizes with a regex search over every row
a = []
for index, item in dataset["ukuran_kemasan"].items():
    for s in re.findall(r'[0-9]+', item):
        a.append(int(s))
# Create a new pandas DataFrame and group the values into 4 serving-size bins
dataset_servings = pd.DataFrame({'value': a})
bins = pd.IntervalIndex.from_tuples([(0, 200), (200, 400), (400, 600), (600, 2000)])
dataset_servings = pd.cut(dataset_servings['value'], bins).value_counts()
ax = dataset_servings.plot.bar(color="maroon", alpha=0.7)
sns.set_style("white")
sns.despine()
plt.show()
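To make the extraction and binning logic easier to follow, here is a dependency-free sketch of the same idea on a few sample strings. The sample values are hypothetical and only mimic the bracket-and-semicolon format described earlier; the bins are the four right-closed intervals used above.

```python
import re
from bisect import bisect_left
from collections import Counter

# Hypothetical examples of the "ukuran_kemasan" format; real rows may differ
sample_sizes = ["(200 ml)", "(250 ml; 350 ml)", "(500 ml; 1000 ml)"]

# Upper edges of the four right-closed bins
edges = [200, 400, 600, 2000]
labels = ["(0, 200]", "(200, 400]", "(400, 600]", "(600, 2000]"]

counts = Counter()
for text in sample_sizes:
    for number in re.findall(r"[0-9]+", text):
        value = int(number)
        # Values outside (0, 2000] fall in no bin and are skipped
        if 0 < value <= edges[-1]:
            counts[labels[bisect_left(edges, value)]] += 1

for label in labels:
    print(label, counts[label])
```

Note that the regex picks up every run of digits, so a size written as "1.5 L" would be read as 1 and 5. Whether that matters depends on how sizes are written in the actual data.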

Manufacturers choose to keep their serving sizes fairly small. The majority of tea drinks are served either in small bottles (200ml to 400ml) or in cups (up to 200ml). Regular bottle (400ml to 600ml) products exist, although they are not really popular among manufacturers. Some large-serving products also exist in the market, but there are significantly fewer of them.

This finding can be explained from both the manufacturer and the consumer perspective. From the consumer perspective, it suggests that Indonesians prefer compact-sized tea drinks, probably because they are easier to carry and the serving is just enough to quench thirst. From the manufacturer perspective, small servings tend to be more profitable because they cost less to produce but sell at a higher price per milliliter. For example, if a 500ml tea sells for Rp5.000,00, then a 250ml tea usually sells for more than Rp2.500,00. Small bottle servings might be an equilibrium zone for tea drinks.

Distribution of Flavours

Tea drinks come in many flavour combinations. The base can be jasmine tea, black tea, white tea, oolong tea, and so on. Honey or milk can be added as well. Some even have fruity flavours like apple, orange, or blackcurrant, and some combine flavours, like honey jasmine tea or fruit black tea. To handle that technically, new columns are created. Each column is a binary flag for one flavour characteristic.

# Work on a copy so the original DataFrame stays untouched
dataset_flavour = dataset.copy()
# Flag each product by keywords (Indonesian or English) found in its name
dataset_flavour["is_jasmine"] = [1 if ('Melati' in x or 'Jasmine' in x) else 0 for x in dataset_flavour['nama_produk']]
dataset_flavour["is_black"] = [1 if ('Hitam' in x or 'Black ' in x) else 0 for x in dataset_flavour['nama_produk']]
dataset_flavour["is_green"] = [1 if ('Hijau' in x or 'Green ' in x) else 0 for x in dataset_flavour['nama_produk']]
dataset_flavour["is_white"] = [1 if ('Putih' in x or 'White ' in x) else 0 for x in dataset_flavour['nama_produk']]
dataset_flavour["is_honey"] = [1 if ('Madu' in x or 'Honey ' in x) else 0 for x in dataset_flavour['nama_produk']]
dataset_flavour["is_milk"] = [1 if ('Susu' in x or 'Milk ' in x) else 0 for x in dataset_flavour['nama_produk']]
dataset_flavour["is_flavoured"] = [1 if ('Rasa' in x) else 0 for x in dataset_flavour['nama_produk']]
dataset_flavour["is_oolong"] = [1 if ('Oolong' in x) else 0 for x in dataset_flavour['nama_produk']]
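The same flagging idea can also be driven by a keyword table, which keeps the list of flavours in one place. This is only a sketch: `FLAVOUR_KEYWORDS`, `flavour_flags`, and the product name are hypothetical, and unlike the code above it matches case-insensitively.

```python
# Hypothetical keyword table: flag name -> Indonesian and English keywords
FLAVOUR_KEYWORDS = {
    "is_jasmine": ("melati", "jasmine"),
    "is_green": ("hijau", "green"),
    "is_honey": ("madu", "honey"),
}


def flavour_flags(product_name):
    """Return a dict of 0/1 flags based on case-insensitive substring matches."""
    lowered = product_name.lower()
    return {
        flag: int(any(word in lowered for word in words))
        for flag, words in FLAVOUR_KEYWORDS.items()
    }


# Hypothetical product name, for illustration only
print(flavour_flags("Minuman Teh Melati Rasa Madu"))
```

Case-insensitive matching catches names typed in all caps, at the cost of more accidental matches inside longer words. Which trade-off is better depends on how consistent the product names are.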

Now that the necessary columns are ready, we can proceed to the visualization step. We will use pie charts to show the percentage of each tea flavour.

# Jasmine pie chart
labels = ['Jasmine Tea', 'Non-Jasmine Tea']
sizes = [dataset_flavour[dataset_flavour["is_jasmine"]==1].count()["is_jasmine"], dataset_flavour[dataset_flavour["is_jasmine"]==0].count()["is_jasmine"]]
fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, autopct='%1.1f%%', colors=['yellow', 'azure'])
ax.axis('equal') # Equal aspect ratio ensures the pie chart is circular
ax.set_title('Jasmine Tea Product')
plt.show()

Jasmine tea is a popular tea type in Indonesia. Its ingredients are primarily tea leaves and jasmine flowers, and it is characterized by its pleasant aroma.

Green tea is also quite popular. While most tea drinks in this dataset are sweet, green tea offers a bitterness that consumers find acceptable.

This graph shows an interesting insight. More than half of tea drinks in Indonesia are flavoured. People might expect a different sensation from a tea drink, whether it is a fruity, honey, or milk flavour.

Import vs Local Brands

Local and imported brands can be told apart by their registration numbers. In this case, only two permit types matter.

First, an MD-registered product indicates a local producer with its own brand.

Then, an ML-registered product indicates that the producer imports goods and then sells them in Indonesia. In short, a product with this label is an imported brand.

The “nomor_registrasi” column indicates whether a product is registered as a domestic product (MD) or an imported product (ML). Upon checking the values, those two are the only options that exist in the dataset. So, we can create a binary column “is_local” by matching a substring in each row of “nomor_registrasi”. If it contains “MD”, then “is_local” equals 1. Otherwise, the value is 0.

dataset_brand = dataset
dataset_brand["is_local"] = [1 if ('MD' in x) else 0 for x in dataset_brand['nomor_registrasi']]

Then, we can make a new pie chart from the “is_local” column.

Almost ten percent of registered tea drinks in Indonesia are imported. This doesn’t mean that imported tea drinks hold a 10% market share in Indonesia. It means that while foreign brands do exist in Indonesia, local brands still maintain a strong presence among consumers.
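The numbers behind such a pie chart are just the counts of the two flag values. Here is a dependency-free sketch of that step; the registration numbers are made up for illustration and do not come from the dataset.

```python
from collections import Counter

# Hypothetical registration numbers, for illustration only
sample_registrations = [
    "MD 000000000001",
    "MD 000000000002",
    "ML 000000000003",
    "MD 000000000004",
]

# Same rule as in the article: 1 if the number contains "MD", otherwise 0
is_local = [1 if "MD" in number else 0 for number in sample_registrations]
counts = Counter(is_local)

labels = ["Local (MD)", "Imported (ML)"]
sizes = [counts[1], counts[0]]
shares = [100 * size / len(is_local) for size in sizes]

for label, share in zip(labels, shares):
    print(f"{label}: {share:.1f}%")
```

With the real column, `labels` and `sizes` can be passed to `ax.pie` in the same way as in the jasmine chart.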

Manufacturer Location

Manufacturer locations might reveal some interesting patterns. In essence, it is better to produce goods near customers. Additionally, manufacturers choose strategic places where infrastructure is mature and distribution costs are low. My hypothesis is that most manufacturers are located around the Jakarta area, and their products are then transported elsewhere.

To test it, we need to code some more.

# Select manufacturers and their locations
dataset_loc = dataset[["kota_produsen","produsen"]].drop_duplicates().reset_index()
# Count manufacturers per city; column names are set explicitly so this runs on old and new pandas
loc_count = dataset_loc["kota_produsen"].value_counts().rename_axis("index").reset_index(name="kota_produsen")
print("Tea drink manufacturers are distributed in %d cities." % dataset_loc["kota_produsen"].unique().size)
# Create graph
plt.figure(figsize=(10,8))
ax = sns.barplot(y=loc_count["index"][:15], x=loc_count["kota_produsen"][:15], palette="husl", linewidth=1, edgecolor="k")
sns.set_style("white")
sns.despine()
plt.xlabel("Number of companies")
plt.ylabel("City")
plt.grid(True)
plt.title("Distribution of Producer Location (City)", color='b', fontsize=18)
# Write the count next to each bar
for i,j in enumerate(loc_count["kota_produsen"][:15].astype(str)):
    ax.text(.7, i, j, fontsize=9, color="black")
plt.show()

Tea drink manufacturers are distributed in 49 cities.

Tea drink manufacturers are distributed in 15 provinces.
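The province count follows the same pattern as the city count: drop duplicate producer and location pairs, then count per location. The sketch below shows that logic without pandas; the rows are hypothetical and the real data lives in the “produsen” and “provinsi_produsen” columns.

```python
from collections import Counter

# Hypothetical (producer, province) rows, one per registered product
rows = [
    ("PT Contoh Satu", "Jawa Barat"),
    ("PT Contoh Satu", "Jawa Barat"),   # same producer, another product
    ("PT Contoh Dua", "Jawa Barat"),
    ("PT Contoh Tiga", "Banten"),
]

# Drop duplicates so each producer is counted once per province
unique_pairs = set(rows)
province_count = Counter(province for _, province in unique_pairs)

print("Tea drink manufacturers are distributed in %d provinces." % len(province_count))
print(province_count.most_common())
```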

According to the graphs above, tea drink manufacturers are spread across 49 cities and 15 provinces. The majority of them are located on Java island. There are some reasons behind this:

  • More than half of the Indonesian population lives in Java
  • Infrastructure in Java is more developed, which makes it cheaper to distribute the products

List of Companies

dataset_corp = dataset["produsen"].unique()
print("Number of companies:", dataset_corp.size)
dataset_corp

From the code above, we find that 134 companies are registered as tea drink manufacturers or importers in Indonesia.

Company Product Lines

# One row per (company, brand) pair
dataset_cline = dataset[["produsen","merk"]].drop_duplicates()
# Count brands per company; column names are set explicitly so this runs on old and new pandas
cline_count = dataset_cline["produsen"].value_counts().rename_axis("index").reset_index(name="produsen")
# Create graph
plt.figure(figsize=(10,8))
ax = sns.barplot(y=cline_count["index"][:15], x=cline_count["produsen"][:15], palette="husl", linewidth=1, edgecolor="k")
plt.xlabel("Number of product line")
plt.ylabel("Company")
plt.grid(True)
plt.title("Companies with Most Product Lines", color='maroon', fontsize=18)
# Write the count next to each bar
for i,j in enumerate(cline_count["produsen"][:15].astype(str)):
    ax.text(0.55, i, j, fontsize=10, color="black")
plt.show()

PT Sinar Sosro has the most product lines of all tea drink manufacturers. That is not surprising, since its core competence is the tea industry and Sosro has been in the business for almost 50 years.

We know that Sosro is an outlier. But given that PT Sehat Sukses Sejahtera makes it into the graph with only 3 product lines, one might wonder: how many product lines does each manufacturer have? We can find out using the code below.

cline_dist = cline_count["produsen"]
plt.figure(figsize=(13,7))
ax = sns.distplot(cline_dist, kde=False, color="maroon")
sns.set_style("white")
sns.despine()
# Add title and axis names
plt.title('Distribution of Companies Based on Product Lines')
plt.xlabel('Number of product lines')
plt.ylabel('Number of companies')
plt.show()

Many tea drink manufacturers hold only one brand. This might indicate that most tea drink manufacturers have yet to penetrate the market; maybe they are relatively new to it. On the other hand, old players have the resources and experience to expand their product lines in order to gain greater market share.

Companies Producing Less Plastics

Once again, we’ll talk about plastics. Personally, I feel that plastics will be a dangerous problem for Indonesia in the coming decades. While other countries have taken action to limit plastic usage, Indonesia keeps using and wasting more plastic. Here, I’ll highlight companies that use less plastic, as appreciation for not adding more waste to mother earth.

Companies count as producing less plastic when they have no product line with plastic packaging. Plastic cups, plastic bottles, and PET packages are considered plastic packaging.

Basically, we want an anti-join (a set difference) operation. Here’s the approach.

1. Divide “jenis_kemasan” values into two arrays: non-plastics and plastics.

2. Create a dataframe with companies producing tea drinks in plastic packages.

3. Create a dataframe with companies producing tea drinks in non-plastic packages.

4. Remove the companies in the step 2 dataframe from the step 3 dataframe.

non_plastics = ["Kaleng", "Karton Laminat", "Botol Kaca", "Tetrapak", "Pouch Aluminium Foil"]
plastics = ["Gelas Plastik", "Botol Plastik", "Plastik", "Gelas Plastik PET", "Botol Plastik PET"]
dataset_plastics = dataset[dataset['jenis_kemasan'].isin(plastics)]
dataset_non_plastics = dataset[dataset['jenis_kemasan'].isin(non_plastics)]
corp_plastics = dataset_plastics["produsen"].drop_duplicates().values
# Keep only non-plastic rows whose producer never appears among the plastic users
corp_clean = dataset_non_plastics[~dataset_non_plastics["produsen"].isin(corp_plastics)]
corp_clean = corp_clean["produsen"].drop_duplicates().reset_index()
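The four steps boil down to a set difference, which may be easier to see without DataFrames. The sketch below uses made-up company names and the same package lists; it illustrates the logic and is not a replacement for the pandas code.

```python
# Hypothetical (producer, package type) rows, for illustration only
rows = [
    ("PT A", "Botol Plastik"),
    ("PT A", "Kaleng"),
    ("PT B", "Botol Kaca"),
    ("PT C", "Tetrapak"),
    ("PT C", "Gelas Plastik"),
    ("PT D", "Karton Laminat"),
]

# Step 1: the two groups of package types
non_plastics = {"Kaleng", "Karton Laminat", "Botol Kaca", "Tetrapak", "Pouch Aluminium Foil"}
plastics = {"Gelas Plastik", "Botol Plastik", "Plastik", "Gelas Plastik PET", "Botol Plastik PET"}

# Steps 2 and 3: companies seen with each kind of package
plastic_producers = {producer for producer, package in rows if package in plastics}
non_plastic_producers = {producer for producer, package in rows if package in non_plastics}

# Step 4: companies with non-plastic packages that never use plastic
plastic_free = non_plastic_producers - plastic_producers
print(sorted(plastic_free))
```

A company that uses both kinds of packaging, like the hypothetical PT A, is excluded, which matches the definition of having no product line with plastic packaging.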

Then we can make a pie chart with the relevant data.

We found 28 companies that don’t use any plastic packaging in their product lines. That equals 17.3% of all tea drink manufacturers. But it doesn’t mean that 17.3% of tea drinks on the market come in non-plastic packaging. Determining that requires production data (which is more difficult to obtain). Still, I appreciate these companies for choosing not to make plastic-packaged products.

From this finding, we see little sign of plastic pollution awareness among the companies. Plastic packages are used because they are cheap, durable, and easy to manufacture. Negative environmental effects do not seem to be considered. There is no enforcement or regulation in Indonesia yet.

Maybe additional regulation could help. The government could give incentives to companies that use no plastic packaging in their product lines. Another idea: companies producing more than a certain amount of plastic could be subject to a penalty.

Closing

Due to limited time, I can only tell a small part of the story of ready-to-drink tea in Indonesia. Still, we have some interesting key takeaways:

  1. The tea drink industry creates a crazy amount of plastic: 5.7 billion cups or bottles annually.
  2. There are 134 companies playing in the Indonesian ready-to-drink tea market, and almost 90% of registered products are local.
  3. The most popular tea serving size is between 200ml and 400ml.
  4. Many people like their tea flavoured, whether it is honey, fruit, milk, or anything else.
  5. Most manufacturers are located in Java, especially the Jakarta area.
  6. Most manufacturers have only one type of product to sell. The product diversification rate is quite low. Sosro is peerless with 16 products in its portfolio.
  7. 82.7% of manufacturers still use plastic packaging.

If you have questions, comments, or criticism, please let me know. I think I still have some stories in the backlog just from this dataset:

  • packages in different servings
  • servings according to manufacturers
  • flavours in different regions
  • distribution of fruity flavours

You can find the Jupyter Notebook file and dataset in my GitHub repository.

Yogi Saputro

Software developer with MBA degree, mentor, somewhat fatherly figure, data and business synergy enthusiast