class: center, middle, inverse, title-slide # The Open Data Lifecycle ## How datasets are opened in the backroom of government administrations? ### Joël Gombin, Datactivist ### 2020-10-08 --- layout: true <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 4px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: #e95459; } </style> <div class='my-footer'><span>Sciences Po (ODUR) the Open Data Lifecycle</span> <center><div class=logo><img src='https://github.com/datactivist/slides_datactivist/raw/master/inst/rmarkdown/templates/xaringan/resources/img/fond_noir_monochrome.png' width='100px'></center></span></div> --- class: center, middle #### These slides online: http://datactivist.coop/sciencespo_odur/2020/2/ Source: https://github.com/datactivist/sciencespo_odur/2020/2/ Datactivist's work is freely reusable licenced under [Creative Commons 4.0 BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode.fr). <BR> <BR> ![](https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-sa.png) --- class: inverse, center, middle # The lifecycle of open data --- ### Open Data Pipeline: the main step in opening a dataset .reduite[.pull-left[ ![](pipeline.png) ]] .pull-right[ The Open Data Pipeline is a modelisation of the open data lifecycle in 7 steps based on Samuel Goeta's [research](https://pastel.archives-ouvertes.fr/tel-01458098). Each of the step can hold a major or a minor importance depending on the dataset and the organisational contexts. Sometimes, when opening a dataset, one needs to come back to an earlier step. **Objective**: understand how data is generaly opened by government administrations. ] --- ### The circulation of dada provokes frictions > Friction resists and impedes. At every interface between two surfaces, friction consumes energy, produces heat, and wears down moving parts. […] > Every movement of data across an interface comes at some cost in time, energy, and human attention. Every interface between groups and organizations, as well as between machines, represents a point of resistance where data can be garbled, misinterpreted, or lost. > In social systems, data friction consumes energy and produces turbulence and heat – that is, conflicts, disagreements, and inexact, unruly processes. Edwards, P. N. et al. (2011) [‘Science friction: Data, metadata, and collaboration’](https://journals.sagepub.com/doi/abs/10.1177/0306312711413314), _Social Studies of Science_, 41(5) --- ### Identification .pull-left[ ![](https://datactivist.coop/aaf/img/identifier.png) ] .pull-right[ * Meet the data producers * Explore data released by other organisations * Map an organisation's data ([Data Galaxy](https://www.datagalaxy.com/produit/) for instance) * Document data ] --- ### Data inventory: a longterm process .pull-left[ [![](https://datactivist.coop/aaf/img/iden_etalab.png)](https://www.data.gouv.fr/fr/datasets/recensement-indicatif-des-donnees-publiques-issues-des-services-publics-de-letat/) ] .pull-right[ * The utopy of an exhaustive catalog * An exploratory process * Link an organisation's process with the information systems ] --- ### Approve .pull-left[ ![](https://datactivist.coop/aaf/img/valider.png) ] .pull-right[ * Obtain approval from the hierarchy * Prioritize the opening of datasets * Check the judicial or technical datasets ] --- ### "Good organisational reasons" for not opening a dataset * Data hosted on information systems: explore databases, rebuild database schemas and extract the data * Check if the data can serve harmful users: foresee the risks of reusing data * Improve quality and intelligibility of data which were designed to circulate outside of the organisation * Data is too sensitive to be opened without a mandate from the elected officials or the hierarchy --- ### Edit .pull-left[ ![](https://datactivist.coop/aaf/img/editer.png) ] .pull-right[ _data editing designates the operations with which statisticians transform administrative data into statistical objects (Desrosières 2005)_ * Anonymise the data * Reduce sensitivy of data which could not be opened. * Improve intelligibility of data by explaining acronyms or technical terms * Improve the quality of the data ] --- ### Data quality sprint by la Fing More than 120 checkpoints to verify the quality of data (translated by GLM students!) [![](https://datactivist.coop/aaf/img/sprint.png)](https://infolabs.io/sprint-qualite) --- ### [Dataproofer](http://dataproofer.org/): a tool to automatically control the quality of data ![](https://datactivist.coop/aaf/img/dataproofer.png) --- ### [WTFCSV](https://databasic.io/en/wtfcsv/) to pre-visualise a dataset ![](https://datactivist.coop/aaf/img/wtfcsv.png) --- ### [Workbench](https://workbenchdata.com) to transform data ![](workbench.png) --- ### Standardise .pull-left[ ![](https://datactivist.coop/aaf/img/standardiser.png) ] .pull-right[ * Convert data to an open and machine readable format * Adopt shared standards: GTFS, GBFS, DECP, IATI, OCDS… * Transform the data ] --- ### Transform data into CSV: much more than "save as" .center[.reduite[ [![](https://datactivist.coop/aaf/img/excel1.png)](http://fr.slideshare.net/christophelibertidf/bonnes-pratiquesexcel-cc27juin2013) ]] --- ### Transform data into CSV: much more than "save as" .center[.reduite[ [![](https://datactivist.coop/aaf/img/excel2.png)](http://fr.slideshare.net/christophelibertidf/bonnes-pratiquesexcel-cc27juin2013) ]] --- ### Transform data into CSV: much more than "save as" .center[.reduite[ [![](https://datactivist.coop/aaf/img/excel3.png)](http://fr.slideshare.net/christophelibertidf/bonnes-pratiquesexcel-cc27juin2013) ]] --- ### Transform data into CSV: much more than "save as" .center[.reduite[ [![](https://datactivist.coop/aaf/img/excel4.png)](http://fr.slideshare.net/christophelibertidf/bonnes-pratiquesexcel-cc27juin2013) ]] --- ### Transform data into CSV: much more than "save as" See also [this tutorial](https://infolabs.io/sites/default/files/files/comment_produire_un_fichier_csv_de_qualite_1.1.pdf) --- ### Publish .pull-left[ ![](https://datactivist.coop/aaf/img/publier.png) ] .pull-right[ * Import the data in the portal * Describe the data * Document the metadata ] --- ### [Datasheet for Datasets](https://teamopendata.org/t/traduction-et-adaptation-du-modele-de-description-des-donnees-datasheet-for-datasets/1400): a model to document a dataset .reduite.center[ ![](./datasheet.png) ] --- ### Update .pull-left[ ![](https://datactivist.coop/aaf/img/update.png) ] .pull-right[ * Update manually or automatically a dataset * Archive the data * Consider and implement user feedbacks ] --- ### Keys of success .pull-left[ - obtain high level support - create the organisational channels to open data - ease discoverability and reuse of data - interact with users] .pull-right[ ![](https://datactivist.coop/recife/img/canvas.png) ] --- ### 10 commandements of an open data projects ![](https://blobscdn.gitbook.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-LRRjDnyXFOdz0BEMwzj%2F-LhkGXlPhynQ4paEwIHR%2F-LhkGcdcC2LLAaZp8efD%2FAffiche_opendata-01.png?alt=media&token=2337a1c4-e060-4b9f-82d7-de1e5859f781) --- class: inverse, center, middle ### Let's have a final meeting time? .center[<img src="https://media.giphy.com/media/L2dgNOcnbj2QE/giphy.gif" height="400"/>] Contact : [joel@datactivist.coop](mailto:joel@datactivist.coop)