25 June 2019
SAGE Ocean has awarded a Concept Grant to Text Wash, a new software tool that anonymises personally identifiable text data, making it accessible to social scientists without compromising its usability for research.
When it comes to doing research with text data, many datasets are protected through ethics boards’ restrictions (eg interviews, crowdsourced texts) and wider data protection frameworks such as GDPR (eg police reports, patient files). As a result, such unique datasets are rarely shared, so that research using text data often focuses on readily available data at the expense of data that could help answer more pressing research questions.
Where such datasets are shared, current approaches to anonymising them render the texts unusable for follow-up research. Text Wash enables the anonymisation of text data without compromising its quality. It does this by using natural language processing and machine learning to identify and replace sensitive information while preserving the semantic and grammatical structure of the text. What counts as personally identifiable information is determined in close collaboration with data protection officers from the government and the police.
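To give a rough sense of what "identify and replace" can look like in practice, here is a minimal sketch in Python. It is not Text Wash's method: the tool is described as using natural language processing and machine learning, whereas this example substitutes two simple regular expressions as a stand-in. The names `PATTERNS` and `anonymise` are hypothetical and chosen for illustration only.

```python
import re

# Illustrative patterns only (an assumption for this sketch); real detection of
# personally identifiable information needs far broader coverage than this.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymise(text: str) -> str:
    """Replace each match with a typed, numbered placeholder such as [EMAIL_1].

    The same original value always maps to the same placeholder, so repeated
    references to one person or contact stay linked after replacement.
    """
    for label, pattern in PATTERNS.items():
        seen = {}  # original value -> placeholder, kept separately per label

        def substitute(match, label=label, seen=seen):
            value = match.group(0)
            if value not in seen:
                seen[value] = f"[{label}_{len(seen) + 1}]"
            return seen[value]

        text = pattern.sub(substitute, text)
    return text


sample = "Contact jane@example.org or call +44 20 7946 0958. Email jane@example.org again."
print(anonymise(sample))
```

Using typed placeholders rather than deleting the matches keeps each sentence grammatical and lets a reader see what kind of information was removed, which is the property the paragraph above describes as preserving structure.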
Text Wash is being developed by Bennett Kleinberg, Maximilian Mozes and Toby Davies from the Department of Security and Crime Science at University College London, UK. SAGE’s Concept Grant will enable the team to get the tool off the ground and promote ethical and intelligent data sharing practices. Text Wash will be available as an R package and as standalone software for non-technical users. For more information, contact: bennett.kleinberg@ucl.ac.uk
Katie Metzler, Associate Vice President of Product Innovation at SAGE, said, “It is our second year running the Concept Grant programme and, once again, we were overwhelmed by the number, variety and strength of the applications. We were particularly impressed by Text Wash and selected it as the winner based on the importance and prevalence of the challenge it addresses, and its potential for wide-ranging impact.
“Out of 47 applications received this year, 31% were either led by women or included women in their teams – up from 21% in 2018. As part of our commitment to encouraging diversity within computational social science, we would like to encourage more applications from women and diverse applicants in 2020.”
The Concept Grant programme is a key part of the SAGE Ocean initiative to enable social scientists to work with big data and new technology. The grants support product innovation within social research, funding early-stage software ideas that will help social researchers to engage with new computational methods and analyse data at scale.