Economy

How A Data Janitor Laboured For 30 Hours To Clean Nigeria’s 2020 Approved Budget

By Charles Mba

January 25, 2020

A national budget can be compared to what a man does when it’s his turn to shop for the family for the month. Usually, the woman knows every detail in her head: groceries, toiletries, deodorants, four kinds of pepper, five kinds of seasoning, dry beans, baked beans, green beans, fresh fruits, fruit juice and so on. A smart husband will take a jotter and write these items down against the estimated cost the wife gives him. By the time the list reaches the 55th item, a gentleman is likely to say, “Darling, I want to use the bathroom!” He clutches his groin with one hand and passes the paper to the wife with the other. He jets off from the room with his last plea: “Babe, please add all these up first, my calculator is on the shelf, I want to know if I have enough money to cover all those expenses.” Now assume the man had been writing the cost of each item in words or in Roman numerals.

That is what Nigeria’s national budget looks like. It’s handed down to you to calculate, but it’s presented in a way that makes it difficult even for your computer to readily pick out the figures for processing. Like that allegorical wife, one wonders why those who prepare Nigeria’s budgets behave that way. Like that bemused wife, one begins to suspect that the man who passed Nigeria’s budget over to us is actually running away from something.

Transparency! That’s what the government is running away from. Though the Nigerian Government and its ministries, departments and agencies (MDAs) feign transparency by occasionally publishing various data on their websites, most of the released data are published in Portable Document Format (PDF) instead of formats that are easy for computers to read, such as spreadsheet (Excel) or comma-separated values (CSV) formats.

Governance pro, a Canadian-based governance agency, asserts that transparency, as a characteristic of good governance, means that information should be provided in easily understandable forms and media. So, there is a big question mark over the transparency of most ministries, departments and agencies that put out their data, especially numerical data, only in PDF format.

The presentation of the 2020 approved budget of Nigeria in PDF format makes it unnecessarily difficult to assess and to draw useful insights from. To liberate the numerical data, the budget document first has to be converted to a machine-readable format such as a spreadsheet. This sets the figures free to be read by the computer. It allows the person viewing the budget to perform operations such as addition, subtraction, division, multiplication and averaging, and to cross-check the totals and subtotals given in the budget.
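To illustrate what cross-checking a subtotal looks like once the figures are machine-readable, here is a minimal Python sketch. The column names, ministry names and all figures are invented for illustration and are not taken from the 2020 budget; the point is only that a few lines of code can recompute each agency's subtotal and compare it with the figure printed in the document.

```python
import csv
import io

# Hypothetical sample data: names and figures are invented for illustration.
SAMPLE_CSV = """mda,line_item,amount
Ministry A,Personnel,1200
Ministry A,Overhead,300
Ministry A,Capital,500
Ministry B,Personnel,800
Ministry B,Capital,200
"""

# Hypothetical subtotals as they might be printed in a budget document.
STATED_SUBTOTALS = {"Ministry A": 2000, "Ministry B": 1100}


def compute_subtotals(csv_text):
    """Add up the line items for each MDA."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["mda"]] = totals.get(row["mda"], 0) + int(row["amount"])
    return totals


def find_mismatches(computed, stated):
    """Return {mda: (stated, computed)} wherever the two figures differ."""
    return {
        mda: (stated[mda], computed.get(mda, 0))
        for mda in stated
        if computed.get(mda, 0) != stated[mda]
    }


computed = compute_subtotals(SAMPLE_CSV)
mismatches = find_mismatches(computed, STATED_SUBTOTALS)
print(computed)
print(mismatches)
```

In this made-up sample, the recomputed subtotal for the second ministry differs from the stated one, which is exactly the kind of discrepancy that is tedious to spot by hand in a PDF.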

Making data available only in PDF reduces its potential value, as the majority of the citizenry will not be able to do anything useful with it. It also delays the process of cross-checking numerical facts, analysing estimated revenues and expenditures, and drawing insights.

One agency, however, ranks highest for keeping data in various forms, especially machine-readable formats such as Excel: the Bureau of Public Procurement.

So, Nigeria’s 2020 budget document, released in PDF form, is devalued for all purposes of openness, transparency, probity and accountability. Its PDF presentation creates the first barrier to turning its data into information. When vital information is blocked this way, insight into the government’s revenue and expenditure is impossible. This indirectly compels citizens and government alike to make poor decisions and causes the latter to repeatedly fashion unrealistic public policies.

Transforming data from one format to another requires not only computer software but also a lot of manual cleaning. This is because the data cleaner has to remove, restore, rearrange and tidy up the many data components that end up in the wrong places during the conversion from PDF to machine-readable formats. The manual cleaning process is thus similar to what a janitor does in removing dirt.
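As a rough illustration of the kind of tidying described above, the Python sketch below strips currency symbols, thousands separators and stray spaces from figures, and drops rows that carry no usable data. It is not the procedure the janitor used, which was manual work in Excel; the sample rows, the function names `clean_amount` and `clean_rows`, and the two-column layout are all assumptions made for the example.

```python
import re
from decimal import Decimal


def clean_amount(raw):
    """Turn a figure copied from a converted PDF into a Decimal, or None."""
    # Drop everything except digits, the decimal point and a minus sign.
    digits = re.sub(r"[^\d.\-]", "", raw)
    # Reject anything that is not a plain number after stripping.
    if not re.fullmatch(r"-?\d+(\.\d+)?", digits):
        return None
    return Decimal(digits)


def clean_rows(rows):
    """Keep only rows that have both a label and a usable amount."""
    cleaned = []
    for label, amount in rows:
        value = clean_amount(amount)
        if label.strip() and value is not None:
            cleaned.append((label.strip(), value))
    return cleaned


# Hypothetical rows as they might look after a PDF-to-spreadsheet conversion.
raw_rows = [
    ["Personnel cost", "₦ 1,234,567"],
    ["", ""],                  # blank row left over from the page layout
    ["Overhead cost ", "45 000"],
    ["Page 3 of 12", ""],      # page footer that leaked into the table
]

for label, value in clean_rows(raw_rows):
    print(label, value)
```

A script like this only handles the simplest cases; figures split across cells or attached to the wrong row would still need a human eye.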

It took a total of 30 hours to clean the entire approved 2020 budget of Nigeria. The approved budget was fragmented into 47 different MDA budgets, all stored as separate PDF files on the website of the Budget Office. The janitor’s work began by downloading each of the PDF files. A total of 44 PDF files, representing the 2020 budgets of 44 MDAs, were processed.

It is important to note here that the janitor did not process the budget documents of four MDAs because their descriptions referred to earlier years instead of 2020. Three of them, the Code of Conduct Bureau, the Auditor General of the Federation and the Federal Capital Territory Administration, were described as 2017 budgets. Besides these three, there was a Code of Conduct Tribunal budget labelled as the 2009 budget.

The PDF files were then converted to Microsoft Word using a Nitro PDF converter. All the data in Microsoft Word were then copied and pasted into spreadsheets (MS Excel) for further manual cleaning. An attempt to convert directly from PDF to Excel increased the cleaning time by 4 minutes and added a cleaning complexity that might not be easy for a new data cleaner to handle.

Data cleansing techniques that were carried out include;

By doing this, the approved budget data becomes a complete interactive tool for those who are affected by government policies to link expenditures by government agencies to the original budget appropriation. This engenders probity and accountability by enabling effective monitoring and assessment of government budget plans, processes and performance.

The converted and cleaned version can be downloaded for analysis via DATAPHYTE’s Open data portal.