Harleen Kaur: Publication | Information about COVID-19 in India

By Natasha Agarwal and Harleen Kaur.

The presence of timely and reliable data enables informed decision-making by government organisations and individuals. When a machine-readable dataset is released on a website, it is non-rival, and thus has characteristics of a public good. There is therefore a case for state financing or production of information. As Carl Malamud says, "Government information is a form of infrastructure, no less important to our modern life than our roads, electrical grid or water systems". Open Data Governance (ODG) refers to structured datasets produced by government institutions and released in a machine-readable format. These datasets contain information such as statistics, plans, maps, environmental data, spatial data, materials of agencies and ministries, parliamentary data, budgetary data, and laws.

Governments across the globe have been actively opening their data through national and regional data transparency portals, recognising the need to make data available to the public. The process is informed by ODG principles. There are three main reasons for opening government data: increasing transparency, releasing the social and commercial value of the data, and encouraging participatory governance (Attard et al. (2015)). As an example, the COVID-19 pandemic is best controlled through behavioural changes by each individual. To support such changes, governments need to open their data about the pandemic at an individual and community level.

The ODG principles defining best practices of data sharing include: i) identifying and publishing high-value datasets in a standardised format (such as a directory of medical professionals, tests conducted and their results, and information about surveillance), ii) adopting an open data scheme protocol to share data in human- and machine-readable, non-proprietary formats, including uniform resource identifiers and linked data to provide access, iii) removing barriers to data access such as requirements of establishing an account, of proving identity, or of paying for data access, and iv) making information available in perpetuity by not permanently deleting or changing data.

In this article, we examine the information systems on COVID-19 in India from the viewpoint of these issues in the design of a high performance statistical system.

Data.gov.in and its limitations

In India, an open data policy, the National Data Sharing and Accessibility Policy (NDSAP), was announced in 2012 to open government data to the public by following ODG principles.

The policy requires all ministries, departments, subordinate bodies, organisations, and autonomous bodies of the Indian Government to share all publicly generated non-sensitive data in both human-readable and machine-readable formats. The data is disseminated through a common government data platform deployed and managed by the National Informatics Centre (NIC), Ministry of Communications and Information Technology. The policy also mandates that datasets be periodically updated by government agencies, along with comprehensive meta-data that enables data discovery and access through departmental portals.

Furthermore, NDSAP requires the Department of Information Technology (DIT) to publish guidelines for its implementation. The implementation guidelines provide details of the data contribution process, including the roles and responsibilities of the data controller, the approval and publishing process for catalogs and resources, and the management of published datasets.

In compliance with NDSAP, India's national data transparency website, data.gov.in, was launched in 2012. Accordingly, data.gov.in provides a unified catalog of datasets allowing users to browse the dataset catalog, view the meta-data associated with each dataset, comment on and rank various datasets, download available datasets, submit suggestions and queries on the published dataset, and submit a request for those that are not available yet (Chattapadhyay (2013)).

Despite the comprehensiveness of the policy and the accompanying guidelines, agencies have responded predictably, i.e. they comply neither with NDSAP nor with the implementation guidelines. As a result, data.gov.in suffers from problems such as missing databases, duplicate datasets, lack of follow-up, and missing meta-data (Agarwal (2016) and Buteau et al. (2015)). Instruments termed 'policy documents' and 'guidelines', which are often used in India, are ineffective in that they do not constrain the executive. Hence, these documents amount to exhortations that have little impact on the incentives of officials, which favour greater opacity, reduced work, or gaining power through the control of data.

Ministry of Health and Family Welfare (MoHFW) and COVID-19 data

We examine the data in the public domain emanating from MoHFW during the ongoing COVID-19 pandemic. To understand the availability of resources for healthcare, we searched for a directory of healthcare providers (both institutions and individuals). The latest hospital directory available on data.gov.in was for 2016 and the latest data for the number of registered allopathic doctors and dental surgeons was available for the year 2013.

The MoHFW is disseminating limited data on the spread of COVID-19 through the data.gov.in portal. For example, as of 1st June 2020, the data reported under mygov.in (not in data.gov.in) contains information on three variables, namely (i) total number of persons infected with COVID-19; (ii) COVID-19 infected persons who have been cured/discharged/migrated; and (iii) COVID-19 infected persons who have died. The state-wise distribution of these three variables is available only for a given date "T = Today". This data cannot be downloaded, and its meta-data is not available. data.gov.in, on the other hand, releases only daily factsheets in pdf format summarising this data.

The dissemination of COVID-19-related data by the MoHFW has problems. It gathers detailed COVID-19-related data from the National Centre for Disease Control (NCDC) (surveillance data from the field) and Indian Council of Medical Research (ICMR) (data through the testing laboratory network), which is not reflected in data.gov.in.

The NCDC, under the Integrated Disease Surveillance Project (IDSP), consists of union, state, and district-level units responsible for the surveillance of infectious diseases in India. Although it releases weekly outbreak reports notifying the status of infectious diseases in India, the reports are available only on its website and are not integrated into data.gov.in. As of 8 June, 2020, the latest weekly outbreak report available under IDSP was the one dated 10th-16th February, 2020.

Similarly, ICMR, the designated body under the National Disaster Management Act to coordinate the testing strategy for COVID-19 has been releasing its data through its website and not through data.gov.in. Through its website, ICMR releases information on two parameters, the total number of samples tested for COVID-19 over time, and in the last 24 hours.

Therefore, data.gov.in is not being utilised by the union government agencies for releasing information. Individuals and researchers interested in government data on the pandemic have to access information held in different silos, to the extent that their skills and knowledge allow. Moreover, none of the information shared is available in a machine-readable or standardised format. This leads to a weak information base on COVID-19 for the public and for researchers, which hampers the decision making of individuals on the appropriate care that they should take, and hampers policymaking by government organisations for want of data and research.

Data disseminated by state governments

The union agencies are not the only government source on COVID-19 information. We now study the data dissemination protocols for COVID-19 as followed by the states.

We could not find state data on COVID-19 on the data.gov.in website. As a result, the following information was collected through individual COVID-19 portals set up by the states. Table 1 shows that there is heterogeneity in reporting across states. The information shared by the states is classified into three categories: "state-level", "district-level" and "individual-level".

Table 1: State-level reporting parameters for COVID-19 (As of 9 June, 2020)
	Parameters	Delhi	Kerala	Maharashtra	Gujarat	Karnataka	Madhya Pradesh
State-level data	Total COVID-19 confirmed cases	Y	Y	Y	Y	Y	Y
	Active cases	Y	Y	Y	Y	Y	Y
	Total COVID-19 tests conducted	N	Y	N	Y	Y	N
	Hospitalisation status of positive cases	Y	Y	N	N	Only ICU patients	N
	Isolated/quarantined patients	Y	Y	N	Y	Y	N
	Total recovered patients	Y	Y	Y	Y	Y	Y
	Total deaths	Y	Y	Y	Y	Y	Y
District-level data	Number of people under observation	N	Y	N	N	Y	N
	Number of quarantined/isolated people	N	Y	N	N	Y	N
Individual-level data	Age	N	Y	N	N	N	N
	Gender	N	N	N	N	Y	N
	Comorbidity	N	Y	N	N	N	N

Table 1, placed above, shows the data sharing protocol for COVID-19 in selected states. We may point out a few facts that influence the interpretation of this table:

Data as of 10th June, 2020. Sources: Delhi, Kerala, Maharashtra, Gujarat, Karnataka and Madhya Pradesh.
Maharashtra, Gujarat, and Karnataka share information about the same parameters at the state and district level. The district-level rows shown here cover only parameters reported in addition to this duplicated information.
Among the studied states, Gujarat and Delhi report the number of patients on ventilators at the state level. However, the information on available hospital beds and ventilators in Delhi is shared on a separate website, https://coronabeds.jantasamvad.org/.
District-level information in Kerala is available for patients hospitalised, symptomatic patients hospitalised, the chronology of positive cases, and hotspots. No other state releases data on these parameters.
Karnataka is the only state which shared anonymised patient data related to their travel history, district, and location of isolation. It also has a dedicated patient case number for individual patients for whom information is shared.
Madhya Pradesh had a dedicated website for individual-level data which was discontinued from 11th May 2020 onwards following the raising of privacy concerns over social media.

We find that in most states, the baseline data includes overall state data about testing rates, persons infected, deaths, and recoveries. However, some states provide additional information such as the number of COVID-19 tests conducted, the number of isolated/quarantined persons, the counts of patients on ventilators, and stable patients. While some states like Maharashtra report data at the district level along with the overall state data, others like Karnataka share information at the individual level. There is a high variation in the type of data shared by the states. For instance, at an individual level, Karnataka reports anonymised demographic details in addition to the baseline data. On the other hand, Madhya Pradesh used to share the names and addresses of suspected COVID-19 patients with the public while reporting individual-level data. Similarly, Kerala, Maharashtra, and Gujarat report their data at the district level. Kerala reports its surveillance data, which is not reported by Maharashtra and Gujarat. Some states provide daily reports in English, while others do not. For example, Gujarat provides daily reports only in Gujarati.

Most states disseminate data through their COVID-19 websites. However, some resort to reporting through social media. For example, the Maharashtra government website on COVID-19 does not provide information other than that reported in Table 1. However, the Maharashtra government has been releasing daily reports through Twitter that provide COVID-19-related information across age, gender, and comorbidities, amongst other variables. While Twitter can amplify the transmission of information in a public statistical system, it should not supplant the foundational systems. Data disseminated through a tweet cannot be traced to any government website. Besides, there is inconsistency in the reports shared by the Maharashtra government through Twitter. For example, the report dated 22nd April 2020 provides the district-wise distribution of COVID-19 cases in Maharashtra, which is not available in the report dated 1st April 2020. The data is a "delete-tweet" away from not being available.

There is also variation in the data sharing format. Most state governments provide data in human-readable formats like pdf. However, some state governments provide some data in machine-readable formats. For example, the district-wise data available on the Gujarat dashboard, which covers the total number of cases tested for COVID-19, positive cases, patients recovered, people under quarantine, and total deaths, can be exported to a csv document. Nevertheless, demographic details of COVID-19 patients and data on patients who are on ventilators or stable are only available in daily reports in pdf format.
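To illustrate what a csv export makes possible that a pdf report does not, here is a minimal Python sketch that loads a district-wise file of the kind described above and computes a derived figure. The column names, district labels, and numbers are hypothetical; they are not the actual schema or data of any state dashboard.

```python
import csv
import io

# Hypothetical district-wise export. Column names and figures are
# illustrative assumptions, not taken from any state dashboard.
SAMPLE_CSV = """district,tested,positive,recovered,quarantined,deaths
District A,1000,50,30,200,2
District B,400,10,8,60,0
"""

COUNT_COLUMNS = ("tested", "positive", "recovered", "quarantined", "deaths")


def load_district_rows(text):
    """Parse csv text into a list of dicts with integer counts."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"district": raw["district"]}
        for key in COUNT_COLUMNS:
            row[key] = int(raw[key])
        rows.append(row)
    return rows


def positivity_rate(row):
    """Share of tests that were positive, or None if nothing was tested."""
    if row["tested"] == 0:
        return None
    return row["positive"] / row["tested"]


rows = load_district_rows(SAMPLE_CSV)
for row in rows:
    print(row["district"], positivity_rate(row))
```

The same calculation on a pdf factsheet would first require extracting the table by hand or with error-prone scraping, which is the practical cost of human-readable-only release.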

We find that the states do not share their COVID-19 data through the data.gov.in framework. Users have to consult multiple information sources to access COVID-19 data. Within the framework of stand-alone websites providing information, there are two concerns. The first concern is the lack of standardised parameters for releasing information. For instance, few states share the hospitalisation status and the availability of beds, which would be useful for the general public in case of emergency. The second concern is the quality of data shared by the states. As discussed, most states share human-readable data and not machine-readable, downloadable data. Meta-data is not available for any state studied, making the data difficult to interpret. Moreover, the lack of data standardisation makes data non-interoperable. The state-level historical information is unavailable for most states. Therefore, not all data shared by the states is permanent.

Difficulties of COVID-19 data release seen elsewhere in the world

So far, we have documented variation in what data is being released, and how it is disseminated, in India. This is a global concern for COVID-19. We map the data reported by selected countries in Table 2 below. We find that countries are using two methods of data distribution: daily updates and dashboards. While daily updates are usually pdf documents, dashboards show the progress of COVID-19 over time. The type of information shared by countries can broadly be classified according to the level of data as "country-level" and "individual-level". Country-level data consists of aggregate information such as the total number of tests conducted, the total number of COVID-19 positive patients, and the number of patients hospitalised and deaths. Some countries also share aggregate surveillance data, which consists of information about individuals isolated, quarantined, and contact traced. At an individual level, we see a wide variation in the data shared by countries. While India does not provide individual-level data through its Ministry of Health, other countries share demographic information such as age, gender, race/ethnicity, and occupation.

Table 2: Country-level data parameters for COVID-19 (As of 10 May, 2020)
Country	Daily updates (DU) or Dashboard (DB)	Total number of tests conducted	Total number of COVID-19 +ve patients	Total number of patients hospitalised	Total number of deaths	Surveillance data	Individual-level data: Age	Individual-level data: Gender	Individual-level data: Race/Ethnicity	Individual-level data: Occupation
India	DU and DB	Y	Y	N	Y	N	N	N	N	N
USA	DU and DB	Y	Y	N	Y	Y	Y	N	Y	N
UK	DU and DB	Y	Y	N	Y	N	Y	Y	Y	Y
South Korea	DU and DB	Y	Y	N	Y	Y	Y	N	N	N
Singapore	DU and DB	Y	Y	Y	Y	Y	N	N	N	N
Canada	DB	Y	Y	Y	Y	Y	Y	Y	N	N
Australia	DU and DB	Y	Y	Y	Y	Y	Y	Y	N	N

It can be seen from the above table that most countries report testing data (information about the number of tests conducted), and the number of positive cases and deaths. At the national level, India reports only these minimum consistent variables. Some countries report more variables to the public. For instance, the US, South Korea, Singapore, Canada, and Australia report surveillance data in varying detail. A few countries like Canada share their database in a downloadable format. This includes information about quarantined and isolated individuals and details about contact tracing and source of infection. Singapore, Canada, and Australia also report data on the number of cases hospitalised. The UK has recently started reporting information about COVID-19 deaths, disaggregated into deaths inside and outside hospitals. Individual-level data such as age, gender, race/ethnicity, and occupation is visible in some countries, as is the case in some states (though not the union government) in India. The US releases data about age and race, while the UK releases information about age, gender, race, and occupation. South Korea releases age details only for severe cases, and Singapore releases individual-level data only in the event of the death of the individual. Canada releases data about age and pre-existing conditions of the individuals, and Australia releases information about age and gender.

Therefore, we find that data release for COVID-19 suffers from a lack of standardisation and interoperability globally. In India, data release by both the union and state governments has important deficiencies.

Implications for India

India's existing data infrastructure does not meet the demands of a public health emergency. The implications of this are multifaceted. For example, amid the COVID-19 pandemic, the government had to create a Covid19-warriors dashboard that provides information on doctors, nurses, ASHA workers, and others who could be deployed for immediate response. If data.gov.in had worked well, then the government would have had this information already.

Likewise, the problem of inaccurate databases and discrepancies in the reporting of COVID-19 infected persons could have been avoided. A working database infrastructure in data.gov.in would have avoided the need for ICMR to evolve its own data-dissemination method in the middle of the COVID-19 pandemic. Besides, the problem of collecting, processing, and releasing COVID-19 data alongside other databases would have been eased. For example, if the existing data infrastructure had spatial standards for data collection and reporting, such as district names with their respective codes, then it would be easier not only to collect the data but also to collate it with other datasets, enabling interoperability.
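The point about district codes can be made concrete with a small Python sketch. It joins two toy datasets, once on district names and once on district codes. The codes (`D001`, `D002`), the figures, and the `join_on` helper are hypothetical and exist only for illustration; the alternate spellings stand in for the kind of naming differences that arise when agencies record district names independently.

```python
# Hypothetical codes and figures, for illustration only. Each agency
# spells the district name in its own way but uses the same code.
covid_cases = [
    {"district_code": "D001", "district_name": "Bengaluru Urban", "positive": 120},
    {"district_code": "D002", "district_name": "Mysuru", "positive": 45},
]
hospital_beds = [
    {"district_code": "D001", "district_name": "Bangalore Urban", "beds": 900},
    {"district_code": "D002", "district_name": "Mysore", "beds": 300},
]


def join_on(key, left, right):
    """Inner-join two lists of dicts on a shared key.

    Fields from `left` take precedence when both rows carry the same field.
    """
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            joined.append({**match, **row})
    return joined


by_name = join_on("district_name", covid_cases, hospital_beds)
by_code = join_on("district_code", covid_cases, hospital_beds)
print(len(by_name), len(by_code))
```

In this toy example the join on names matches no rows, while the join on codes matches both districts. A shared code list removes the need for manual name reconciliation each time two datasets are combined.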

Conclusion

In the present article, we highlighted one element of the public health response: the release of COVID-19 data by Indian government authorities. We show that the statistical system for the dissemination of disease surveillance data in India is in need of reform.

The ODG platform in India, data.gov.in, can play an important role in strengthening India's public health data infrastructure. To realise the utility of public data, a data protocol framework with a legally enforceable mandate on the government is required, as is seen in countries like the US. The principles of standardisation, anonymisation, interoperability, meta-data release, and grievance redressal in the event of non-release should be part of this legal framework.

For the union government, a data.gov.in which utilises the sound principles of ODG release could become a better foundation for data release, and thus improve India's response to an epidemic. State and city governments could choose to use the services of data.gov.in or build their own systems. An indicative list of the essential components of such a portal (as seen in NDSAP and ODG principles) is provided below:

Standardising data release: Standardisation of reported variables such as reporting unit, disease data, language, individual, and community-level data is required. Elements that go towards this include geotagging and coding of hospitals/labs and the adoption of International Classification of Diseases (ICD) for diagnosis and treatment of diseases.
Ensuring privacy: Privacy is a fundamental right in India (Supreme Court of India (2017)). Despite this, states like Madhya Pradesh and Karnataka were seen to be disseminating personally identifiable information of suspected COVID-19 patients. The government would need to adopt various tools at its disposal to protect these rights at an individual and community level. These tools include tagging appropriate data, incorporating principles of Privacy by design (PBD), anonymising and utilising appropriate fiduciary principles (Cavoukian (2011) and Bailey and Goyal (2019)).
Interoperability: Facilitating systems interoperability by incorporating common formats, software standards, and semantic interoperability by incorporating e-governance standards so that the meaning of data is not lost across data silos is required (Wright et al. (2010)).
Adopting an open data scheme: Legislators need to create the frameworks through which the executive is required to release meta-data, and release data in a machine-readable format.
Setting up governance framework: Union, state, and city governments have legitimate authority on how they organise their work, but greater consistency and predictability for API-based access is desirable.
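As a rough illustration of how the standardisation and privacy components above could fit together, the following Python sketch converts a raw case record into a published record. All field names, the district code, and the case identifier are hypothetical. The ICD-10 code U07.1 (COVID-19, virus identified) is used as an example of a coded diagnosis. Dropping direct identifiers and banding ages is a simplification and does not by itself guarantee anonymity.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RawCase:
    """Record as collected in the field (hypothetical fields)."""
    name: str
    address: str
    age: int
    district_code: str
    reported_on: str  # ISO 8601 date


@dataclass
class PublishedCase:
    """Record as released to the public (hypothetical fields)."""
    case_id: str
    age_band: str
    district_code: str
    icd10_code: str
    reported_on: str


def age_band(age, width=10):
    """Replace an exact age with a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def to_published(raw, case_id):
    """Drop name and address, coarsen age, and attach coded fields."""
    return PublishedCase(
        case_id=case_id,
        age_band=age_band(raw.age),
        district_code=raw.district_code,
        icd10_code="U07.1",  # example coded diagnosis
        reported_on=raw.reported_on,
    )


raw = RawCase("Example Person", "Example Street", 47, "D001", "2020-06-09")
record = to_published(raw, "CASE-0001")
print(json.dumps(asdict(record)))
```

Because every published record carries the same coded fields, files from different states could be combined without manual reconciliation, while the name and address never leave the collecting agency.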

References

Attard et al. (2015): Judie Attard, Fabrizio Orlandi, Simon Scerri, and Sören Auer, A systematic review of open government data initiatives, Government Information Quarterly, 2015.

Chattapadhyay (2013): Sumandro Chattapadhyay, Towards an Expanded and Integrated Open Government Data Agenda for India, IDRC Digital Library, 2013.

Agarwal (2016): Natasha Agarwal, Open Government Data: An Answer to India's Growth Logjam, SSRN, 16 August, 2016.

Buteau et al. (2015): Sharon Buteau, Aurelie Larquemin, and Jyoti Prasad Mukhopadhyay, Open data and applied socio-economic research in India: An overview, IFMR Working Paper, 27 May, 2015.

Supreme Court of India (2017): Justice K.S. Puttaswamy v. Union of India, 2017 (10) SCC 1.

Cavoukian (2011): Ann Cavoukian, Privacy by design: The seven foundational principles, Information and Privacy Commissioner of Ontario, 2011.

Wright et al. (2010): Glover Wright, Pranesh Prakash, Sunil Abraham, and Nishant Shah, Open government data study: India, The Centre for Internet and Society, 2010.

Bailey and Goyal (2019): Rishab Bailey and Trishee Goyal, Fiduciary relationships as a means to protect privacy: Examining the use of the fiduciary concept in the Draft Personal Data Protection Bill, 2018, Data Governance Network, 2019.

Originally posted here.

Natasha Agarwal is an independent research economist. Harleen Kaur is a researcher at NIPFP. The authors are thankful to Ajay Shah and two anonymous referees for their valuable comments and inputs on the article.

Wednesday, 8 July 2020
