AI Data Sharing Challenge | ISTF 2019

Ensuring privacy

AI techniques enable the derivation of important new insights from modern-day datasets. While sharing these data promises large benefits, it rarely happens today because of organizational and privacy concerns, for example in the case of medical data. How could we enable the sharing of data sets while preserving the privacy of sensitive data?

The last decade has seen significant advances in the capability of information processing systems to "understand" and make predictions from unstructured forms of data such as images, speech and text. For many tasks, these systems achieve super-human performance or enable entirely new sources of insight. This development is due to a combination of advanced self-learning algorithms, new and larger data sets, and extended computational capabilities.

However, many data sets today exist only in silos and are not publicly available. As a consequence, the insights and social benefit that could be derived from them remain concealed. An example is data from medical studies, which in many cases remain withheld from external analysts even when the research producing them was publicly funded. A major obstacle to data sharing is privacy concerns, especially since modern algorithms may enable the de-anonymization and re-identification of individuals even when explicit identifiers such as name and address have been removed.
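To make the re-identification risk concrete, here is a minimal, purely illustrative sketch (not part of the challenge text): even after names are stripped, quasi-identifiers such as postal code, birth year and sex can link an "anonymized" record back to a named person via a public register. All column names and values below are hypothetical.

```python
# Illustrative linkage-attack sketch; all data below is made up.
import pandas as pd

# "Anonymized" medical records: names removed, quasi-identifiers kept.
medical = pd.DataFrame({
    "zip":        ["8001", "8002", "8003"],
    "birth_year": [1950, 1962, 1974],
    "sex":        ["F", "M", "M"],
    "diagnosis":  ["diabetes", "asthma", "hypertension"],
})

# Public register containing names plus the same quasi-identifiers.
register = pd.DataFrame({
    "name":       ["A. Muster", "B. Beispiel", "C. Example"],
    "zip":        ["8001", "8002", "8003"],
    "birth_year": [1950, 1962, 1974],
    "sex":        ["F", "M", "M"],
})

# A simple join on the quasi-identifiers re-attaches names to diagnoses.
linked = medical.merge(register, on=["zip", "birth_year", "sex"])
print(linked[["name", "diagnosis"]])
```

When the combination of quasi-identifiers is unique, as in this toy example, removing names alone offers little protection.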

Given the enormous potential that modern AI techniques could unlock for society when applied to currently unshared data sets, an important task is to find technical and organizational structures that enable the sharing and analysis of such data sets while at the same time protecting the privacy of individuals.
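The challenge text does not prescribe a particular technique, but one frequently discussed direction is differential privacy: releasing only noisy aggregate statistics so that the contribution of any single individual cannot be reliably inferred. The sketch below shows the standard Laplace mechanism for a counting query; the data and parameters are hypothetical.

```python
# Minimal differential-privacy sketch (Laplace mechanism for a count query).
import numpy as np

rng = np.random.default_rng(0)

def dp_count(values, predicate, epsilon):
    """Return a differentially private count of records matching `predicate`.

    Adding or removing one record changes the true count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical sensitive attribute: ages of study participants.
ages = [34, 45, 52, 61, 29, 47, 58, 63, 71, 38]

# Shareable, noisy statistic: roughly how many participants are over 60.
print(dp_count(ages, lambda a: a > 60, epsilon=0.5))
```

Smaller values of epsilon add more noise and give stronger privacy at the cost of accuracy; choosing this trade-off, and combining such mechanisms with organizational safeguards, is exactly the kind of question the challenge raises.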


Challenger | Dr. Claus Horn, Cognitive Data Scientist, Digital and Smart Analytics Group (DSA), Swiss Re

Scientist specialized in converting cutting-edge AI research into profitable business applications, with 15 years of experience in developing, implementing and applying machine learning algorithms. Led research in particle physics at CERN and Stanford University. His PhD thesis was among the first applications of machine learning at the petabyte scale and inspired the first machine learning framework used for analyses at CERN. Initiated one of the first data science teams in Switzerland.