Affinity mapping for data.tricoteuses.fr

dachary · Décembre 11, 2019, 7:47

In the context of the user research for data.tricoteuses.fr @eudmo2 @dachary and @eraviart used the transcript of the five intercept interview of developers working with data sets in a session of affinity mapping.

Methodology

A. Generating the sticky notes . In this step, team members write down ideas by reading the interview scripts on separate sticky notes. They should write down ideas exactly was found in the interview and refrain from interpreting any of them. They should not skip an idea they find unimportant. They should not ideas that came to mind but that are not in the interview. The sticky notes are not sorted, they are in chronological order, in a group that has the interview number to identify them.

A different color is used for each interview.
The About you part of the interview gives context but usually does not contain ideas

B. Organizing the notes in groups . After the the ideation session, the team has a workshop devoted to analyzing the notes by:

sorting them into categories: each participant takes the notes of someone else and group them into categories (as many as they want). Each category is identified by a sticky note.
prioritizing each note: when everybody is finished sorting, all participants get together and merge all categories. The categories of a participant is chosen first and they get to explain them to the other participants who debate on their relevance and suggest improvements. When this is done, another participant takes their categories and does the same, merging them with the previous session results. And so on.

About the affinity session

Goal: identifying emerging themes related to the usage or the making of a data set repository by developers

dachary · Décembre 11, 2019, 11:46

Here is the outcome. The categories are grouped in the same column if they are related.

Column 1

Temporalité

Il y a deux types de travail: import one shot, import régulier

Historisation

J’historise les données que je reçois (x2)
Pour savoir si la donnée change, je la charge et je la compare

Column 2

Sources

Je récupère les fichiers depuis un disque partagé
Je cherche des dépôts de données via des moteurs de recherche et pas des catalogues
Je cherche la source de la donnée la plus fraiche, upstream
Un fichier tableur généré manuellement c’est pas bon signe
La granularité de la donnée est un choix (exemple INSEE et 137,000 jeux de données)
Un jeu de données est un ensemble cohérent (x2)
Datasets of metadata

Faciliter la publication

Certains acteurs sont réticents à publier de la donnée

Column 3

Liens entre données

INSEE, le changement de code est une information qui n’est pas dans la documentation mais qui est utile (x32)
Socle commun des données locales

Geo Localisation

OpenStreetMap
J’ai besoin des contours géographiques des circonscriptions

Column 4

Documentation

Manque d’explications métier
Les données INSEE et leur caractéristiques devraient faire partie de la documentation
Quand je découvre un jeu de données, je lis sa description puis j’observe les données (x3)
Je lis la documentation quand je ne comprends pas les données
Il n’y a pas de release notes
La documentation est importante
Je déduis la documentation à partir du contenu des données

Compréhension

Difficulté de comprendre la logique de jeux de données faits par d’autres
La donnée demande souvent un niveau expert pour être comprise

Column 5

Visualisation

Données en graphe plutôt que linéaires
Affiche des graphiques
Rendre l’implicite explicite
UI for people without technical skills
User stories to determine what the UI should look like

Column 6

Protocoles et formats

Avoir des données utilisables dans un tableur (x2)
Conversion vers d’autres formats (x2)
Je récupère quotidiennement des CSV (x2)
Définition d’un jeu de données (vs base de données par exemple)
Involving the data producer when defining a standard

Column 7

API

Je fais des serveurs d’API à partir des données (x3)

Column 8

Recherche

Recherche dans les données avec un moteur comme elasticsearch

Column 9

Mode de travail

Essayer d’en vivre
Travail intermitent
Je travaille pour un collectif
C’est mon hobby

Communauté

Discussion avec d’autres devs pour comprendre la structure des données
Créer un écosystème

Column 10

Automatisation de la mise à jour d’un dépôt

Mise a disposition manuelle / automatique
Je travaille régulièrement à rendre l’import des données plus robuste
La mise à jour des données est quotidienne
Quand le format des données change, je corrige manuellement
Les méthodes de mise à jour dépendent des dépôts de données
Dataflow and process

Infrastructure

Nécessité d’une infrastructure de calcul

Data and software should be published together

Software projects are published on GitHub

Column 11

Nettoyage

Amélioration de données structurées (de la DILA)
Je fais des logiciels qui nettoient les données
Data validation (x2)
Je complète les données pour mes besoins, pas pour les autres
Working with format standards

Column 12

Licence

Smart data = données pas libre
Données ouverte: on peut les modifier, les utiliser, les redistribuer
Les données ouvertes c’est des données publiques

Dépôt libre

Il n’existe pas de dépôt de données uniquement libres (par exemple comme les distributions logicielles)

dachary · Décembre 11, 2019, 12:19

After discussion on the themes, the following were identified as the most important, in that order:

Documentation
Formats
Cleanup
Visualisation
Modification history