This blog is part of SIID in the Spotlight #SIIDspotlight
Chris Foster is a lecturer in ICT and innovation in the Information School. This work was funded by SIID, University of Sheffield, and the Global Development Institute, University of Manchester, with the support of the Centre for Internet and Society, India.
In 2015, many around the world celebrated the agreement of the Sustainable Development Goals (SDGs) and a new agenda for transformative development by 2030. But practitioners and policymakers were left scratching their heads over how they would monitor the 169 detailed targets and the ever more numerous indicators, never mind understand and achieve the goals themselves.
It is in this context that we’re seeing growing interest in using more data in development, notably large and complex “big data”, to help solve development problems. Indeed, the infrastructures now being built to support big data are likely to become central to how we make development decisions in the future.
How will such data infrastructures shape our thinking about development over the next decade? What types of limitations and biases might they embed? How should they best be designed and implemented? These are the questions we set out to explore in a recent project examining big data use in India.
In this work we focussed on three cases where big data was being used to support wider development rather than commercial goals: SmartBus, a big data urban transport system; SmartElec, a state initiative using big data to improve electricity systems; and SmartEdu, a state-IT partnership focussing on the education sector (project names have been changed).
Digging into big data
Digging into these cases, we found that each of these initiatives was connected to longer, often decades-old, histories of data collection and decision making. New data innovations were being introduced in an attempt to understand long-running development problems. For example, the main focus of SmartBus was on using vehicle tracking and big data innovations to improve the notoriously unreliable city bus services, while SmartEdu was looking to use machine learning to predict the risk of school dropout, with the goal of reducing this key problem in the state.
We found that big data innovation enabled better integration of rich information flows and led to more centralised decision making. In SmartElec, meter data that was previously collected manually was now collected and aggregated digitally (see images below). The supporting infrastructure allowed near real-time analysis of the status of the electricity network and more effective monitoring of failures and blackouts. A new central data centre played a growing role in processing and analysing this data. In SmartBus, new bus transportation data was aggregated and fed in real time to large screens in a “control centre”, where administrators monitored activity.
Digitalisation in SmartElec: meters such as those on the left supply real-time data on network usage. Even manually read meter data is now often transferred through automated reading devices (right) to be input into the system later.
Beyond day-to-day monitoring, we also saw signs that the new data was feeding into more strategic decisions. In the electricity sector, upgrades have long been plagued by poor and politicised decision making, but the state-wide data from SmartElec is now being used to inform upgrading decisions. Similarly, SmartEdu data has enabled state school administrators to analyse education data across the state, and has led to some valuable insights that could inform educational activity.
More conceptually, there is evidence that these initiatives are playing a role in supporting new forms of state commitment or citizen interaction. SmartBus has been associated with a “Smart City” initiative and a vision of citizens interacting with a set of efficient urban services. Indeed, SmartBus introduced a citizen mobile app for tracking bus routes, which has had over 50,000 downloads. In the SmartElec initiative, state political visions of “24/7 electricity” have in part emerged from the better data that allows improved management of the electricity system.
Whilst big data has led to these operational, strategic and visionary advances, the projects also raised a number of concerns. A key one was the quality of the data being used, which was often incomplete, short-term or skewed.
Most problematic was that data from marginal groups was difficult to obtain. In SmartElec, automated electricity data came mainly from cities, whereas rural data was still collected manually. In SmartEdu, the partner IT firm had to employ data entry teams to enter rural education records, and there were other processes of “data wrangling” before the machine learning system could be run.
These data limitations raise questions about how representative the data is of the population. If certain measures are skewed towards the more affluent, data from more marginal groups might then be seen as “nonconforming” or even deviant. Moreover, the way that data is selected, measured and transformed in such systems will be important in determining which processes are made visible by data and which remain in the shadows.
The Smart Cities Challenge: such visions can appear to be made viable by the growth of big data. In reality, however, big data projects often have a narrower focus.
There were also more general questions about the focus of big data projects. These projects were marketed and discussed in terms of lofty development goals, but in implementation they were often quite narrow. SmartBus, for all its talk of smart cities and citizens, was far more focussed on stamping out corruption amongst bus employees than on making the city’s public transport smart. So far, SmartEdu has centred mainly on regional administration; the school dropout prediction data has yet to be provided to those likely to have the strongest developmental impact, such as local teachers or head teachers.
Further, in all these projects there has been scant sharing of the new data produced. These projects have not been about the public shining a light on opaque mechanisms of decision making. In fact, with a growing number of public and private actors involved, mechanisms of decision making are becoming even less transparent.
Big data for development
It is often said that “data is the new oil”, a vast untapped resource whose value we are only now beginning to see. If data is the new oil, it offers huge potential gains but also carries huge risks. This work has begun to shine a light on those gains and risks in terms of development.
Big data projects are in their infancy in countries like India, but, as these cases show, they are becoming important in supporting decision making on key development issues: not only at an operational level, but also in strategic decision making and in supporting new visions of developmental partnerships between citizens, the private sector and the state.
However, these initiatives rarely follow the vision of big data driving transformative change. So far, they tend to use problematic data to enhance decision making. In implementation, they also tend to focus on quite narrow aspects of problems rather than on the bigger development problems, where they might have more impact.
We also need to make sure that big data doesn’t lead solely to technocratic solutions, or underplay the importance of integrating with a wider set of social and political activities for development. Data on school dropouts won’t help us understand all the social problems behind school absenteeism, and data showing electricity pilferage will have limited impact without tackling the complex local politics of electricity in rural and slum areas.