class: center, middle, inverse, title-slide # The Open Data Lifecycle ## How datasets are opened in the backroom of government administrations? ### Joël Gombin, Datactivist ### 2020-10-08 --- layout: true <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: #e95459; } </style> <div class='my-footer'><span>Sciences Po (ODUR) the Open Data Lifecycle</span> <center><div class=logo><img src='https://github.com/datactivist/slides_datactivist/raw/master/inst/rmarkdown/templates/xaringan/resources/img/fond_noir_monochrome.png' width='100px'></center></span></div> --- class: center, middle #### These slides online: http://datactivist.coop/sciencespo_odur/2020/2/ Source: https://github.com/datactivist/sciencespo_odur/2020/2/ Datactivist's work is freely reusable licenced under [Creative Commons 4.0 BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode.fr). <BR> <BR>  --- class: inverse, center, middle # The lifecycle of open data --- ### Open Data Pipeline: the main step in opening a dataset .reduite[.pull-left[  ]] .pull-right[ The Open Data Pipeline is a modelisation of the open data lifecycle in 7 steps based on Samuel Goeta's [research](https://pastel.archives-ouvertes.fr/tel-01458098). Each of the step can hold a major or a minor importance depending on the dataset and the organisational contexts. Sometimes, when opening a dataset, one needs to come back to an earlier step. **Objective**: understand how data is generaly opened by government administrations. ] --- ### The circulation of dada provokes frictions > Friction resists and impedes. At every interface between two surfaces, friction consumes energy, produces heat, and wears down moving parts. […] > Every movement of data across an interface comes at some cost in time, energy, and human attention. Every interface between groups and organizations, as well as between machines, represents a point of resistance where data can be garbled, misinterpreted, or lost. > In social systems, data friction consumes energy and produces turbulence and heat – that is, conflicts, disagreements, and inexact, unruly processes. Edwards, P. N. et al. (2011) [‘Science friction: Data, metadata, and collaboration’](https://journals.sagepub.com/doi/abs/10.1177/0306312711413314), _Social Studies of Science_, 41(5) --- ### Identification .pull-left[  ] .pull-right[ * Meet the data producers * Explore data released by other organisations * Map an organisation's data ([Data Galaxy](https://www.datagalaxy.com/produit/) for instance) * Document data ] --- ### Data inventory: a longterm process .pull-left[ [](https://www.data.gouv.fr/fr/datasets/recensement-indicatif-des-donnees-publiques-issues-des-services-publics-de-letat/) ] .pull-right[ * The utopy of an exhaustive catalog * An exploratory process * Link an organisation's process with the information systems ] --- ### Approve .pull-left[  ] .pull-right[ * Obtain approval from the hierarchy * Prioritize the opening of datasets * Check the judicial or technical datasets ] --- ### "Good organisational reasons" for not opening a dataset * Data hosted on information systems: explore databases, rebuild database schemas and extract the data * Check if the data can serve harmful users: foresee the risks of reusing data * Improve quality and intelligibility of data which were designed to circulate outside of the organisation * Data is too sensitive to be opened without a mandate from the elected officials or the hierarchy --- ### Edit .pull-left[  ] .pull-right[ _data editing designates the operations with which statisticians transform administrative data into statistical objects (Desrosières 2005)_ * Anonymise the data * Reduce sensitivy of data which could not be opened. * Improve intelligibility of data by explaining acronyms or technical terms * Improve the quality of the data ] --- ### Data quality sprint by la Fing More than 120 checkpoints to verify the quality of data (translated by GLM students!) [](https://infolabs.io/sprint-qualite) --- ### [Dataproofer](http://dataproofer.org/): a tool to automatically control the quality of data  --- ### [WTFCSV](https://databasic.io/en/wtfcsv/) to pre-visualise a dataset  --- ### [Workbench](https://workbenchdata.com) to transform data  --- ### Standardise .pull-left[  ] .pull-right[ * Convert data to an open and machine readable format * Adopt shared standards: GTFS, GBFS, DECP, IATI, OCDS… * Transform the data ] --- ### Transform data into CSV: much more than "save as" .center[.reduite[ [](http://fr.slideshare.net/christophelibertidf/bonnes-pratiquesexcel-cc27juin2013) ]] --- ### Transform data into CSV: much more than "save as" .center[.reduite[ [](http://fr.slideshare.net/christophelibertidf/bonnes-pratiquesexcel-cc27juin2013) ]] --- ### Transform data into CSV: much more than "save as" .center[.reduite[ [](http://fr.slideshare.net/christophelibertidf/bonnes-pratiquesexcel-cc27juin2013) ]] --- ### Transform data into CSV: much more than "save as" .center[.reduite[ [](http://fr.slideshare.net/christophelibertidf/bonnes-pratiquesexcel-cc27juin2013) ]] --- ### Transform data into CSV: much more than "save as" See also [this tutorial](https://infolabs.io/sites/default/files/files/comment_produire_un_fichier_csv_de_qualite_1.1.pdf) --- ### Publish .pull-left[  ] .pull-right[ * Import the data in the portal * Describe the data * Document the metadata ] --- ### [Datasheet for Datasets](https://teamopendata.org/t/traduction-et-adaptation-du-modele-de-description-des-donnees-datasheet-for-datasets/1400): a model to document a dataset .reduite.center[  ] --- ### Update .pull-left[  ] .pull-right[ * Update manually or automatically a dataset * Archive the data * Consider and implement user feedbacks ] --- ### Keys of success .pull-left[ - obtain high level support - create the organisational channels to open data - ease discoverability and reuse of data - interact with users] .pull-right[  ] --- ### 10 commandements of an open data projects  --- class: inverse, center, middle ### Let's have a final meeting time? .center[<img src="https://media.giphy.com/media/L2dgNOcnbj2QE/giphy.gif" height="400"/>] Contact : [joel@datactivist.coop](mailto:joel@datactivist.coop)