Participants in Ann Arbor drew attention to particular challenges around Big Data, and the workshop benefitted from input from another workshop that focused specifically on Data Ethics.
Among the issues raised were:
ICT4D/ICTD projects often create data traces and metadata (e.g. call records, photo metadata including location). Some of this data may be collected by default without the researchers or practitioners being aware of it. How should this data be handled and under which circumstances should it be a) collected, b) retained, c) analysed, d) combined with other (meta-)data?
Automated data logging is an important tool for tracking software bugs and errors and improving usability. Should we set limits on automated data logging in applications for ICT4D?
How do we protect data against misuse by other actors (e.g. oppressive regimes, commercial interests, fraudsters) taking advantage of data gathered for ICTD/ICT4D work?
What kind of ethical challenges arise with geo-located data? What demands should be set to ensure data sets are truly anonymized?
How can participants retain their right to informed consent when large amounts of complex data are being collected?
Big Data often develops its power when datasets are combined and linked. This means the data is no longer used for the original stated purpose that the participant gave their consent to. What form of consent should be gathered for secondary uses of data? How would this work in practice?
Some have argued that development research should be as non-intrusive as possible and data should only be collected on things relevant to the enquiry. Big Data works from a different logic – one of gathering a maximum of data first and then seeing patterns emerge. This means some less relevant/non-relevant data will be gathered and stored, and then possibly analysed later. How do we find the right balance between the call not to over-examine people and the need to remain open to unexpected findings based on novel Big Data combinations?