Mace Ojala is a student of Information Studies and Interactive Media at the School of Information Sciences, University of Tampere, Finland. He worked as an intern at the Ghent Center for Digital Humanities (GhentCDH) in June-August 2015. In this guest blog he explains how he developed a preliminary data model for the project database of women editors and their periodicals.

During the summer months of 2015, I worked with the WeChangEd research team to create the project’s data model in order to facilitate the project’s collection, communication, storage and analysis of data. As an outcome, I produced a draft data model and discussed it with the research team when the holiday season ended in August. The task was an exciting opportunity to get involved in a young and ambitious project, since the data model guides the subsequent work. Furthermore, there is no single correct way to formalize a complex object domain, only more or less feasible suggestions for data models.

Questions like these guided the design process (in no particular order):

  • What is the purpose and value of the research project?
  • What variables are expected to have explanatory power within the WeChangEd methodological framework?
  • What do the data sources look like?
  • What data exists, and what will be reasonably accessible for the research project?
  • What is typical data in the domain of periodical studies, and what are the members of WeChangEd accustomed to?
  • Which ‘things’ (domain objects) does the research project want to have a representation for?
  • Which variable values should or shouldn’t be discrete and do they cover the relevant aspects of the domain?
  • Which viewpoints should have priority in data collection and in the production of new knowledge?
  • What epistemological modes of knowing does the project reach for?
  • What data density frequency profiles can we expect?
  • Which tools will be used in collection, storage and analysis?
  • What are the simplest possible solutions?

I met with Marianne Van Remoortel and Jasper Schelstraete in early June. As a starting point for data modeling, I had the research project proposals and sample data. The former explain the motivation, expected types of outcomes, and planned methodologies of the research. The latter is an example data sheet, drawing relations primarily among editors, periodicals, countries and time, and secondarily among the above plus organizations, roles in the publication process, inter-personal and inter-periodical relations and other factors.

In addition to hermeneutics and interpretation of texts, the methodology of WeChangEd includes social network analysis (SNA) and spatiotemporal mapping. These impose some requirements on the data, e.g. unambiguous geolocation of objects for which location is a relevant attribute, and explicit definitions of what the various relations between entities mean.
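To make these two requirements concrete, here is a minimal Python sketch. It is not the WeChangEd model itself: the class names, the list of relation meanings, the choice of which entity kinds need a location, and all sample values (including the coordinates) are hypothetical placeholders chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Assumption: these entity kinds need a location for spatiotemporal mapping.
LOCATED_KINDS = {"periodical", "organization"}
# Assumption: a closed vocabulary of relation meanings (hypothetical values).
RELATION_MEANINGS = {"edited", "contributed_to", "member_of"}


@dataclass(frozen=True)
class Entity:
    id: str
    kind: str                      # e.g. "editor", "periodical"
    label: str
    lat: Optional[float] = None    # decimal degrees, if known
    lon: Optional[float] = None


@dataclass(frozen=True)
class Relation:
    source: str                    # id of the source entity
    target: str                    # id of the target entity
    meaning: str                   # explicit meaning of the tie
    start_year: Optional[int] = None
    end_year: Optional[int] = None


def missing_geolocation(entities):
    """Ids of entities that should have coordinates but lack them."""
    return [e.id for e in entities
            if e.kind in LOCATED_KINDS and (e.lat is None or e.lon is None)]


def unknown_meanings(relations):
    """Relations whose meaning is not part of the agreed vocabulary."""
    return [r for r in relations if r.meaning not in RELATION_MEANINGS]


def degree(relations):
    """Number of ties per entity id, a basic SNA measure."""
    counts = Counter()
    for r in relations:
        counts[r.source] += 1
        counts[r.target] += 1
    return counts


# Placeholder sample data, not taken from the project.
entities = [
    Entity("e1", "editor", "Editor A"),
    Entity("p1", "periodical", "Periodical X", lat=50.0, lon=4.0),
    Entity("p2", "periodical", "Periodical Y"),
]
relations = [
    Relation("e1", "p1", "edited", 1900, 1905),
    Relation("e1", "p2", "contributed_to"),
]

print(missing_geolocation(entities))
print(unknown_meanings(relations))
print(degree(relations))
```

Checks like these could be run on the sample data before any network analysis or mapping, so that gaps in location data or undefined relation types surface early.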

Most of the modeling involved Entity-Relationship (ER) diagrams, familiar from relational database design, and UML class diagrams, familiar from object-oriented programming. As tools, I used pen and paper, Gephi, nodegoat, Dia and LucidChart. Post-it notes were useful lo-fi tools for design meetings.

Post-it data model.

Having a professional background in librarianship, I looked at the data partly as a bibliographical issue, drawing further inspiration from prosopographical literature. A classic problem in library cataloguing (or ‘documentation’ in the legacy of Paul Otlet) is ontological: when we pick up a copy of, say, Ons syndicaat, a Belgian trade union journal edited by Stephanie Claes-Vetter in the early twentieth century, what is it that we have in our hand? How abstractly do we want to think about the thing we are holding? The four-level FRBR model tries to address this, but it would perhaps be far too cumbersome for WeChangEd. On the other hand, distinguishing between the work, the expression, the manifestation and the item grants us a lot of expressive power to tell apart original copies, digitized versions, translations and so forth. These distinctions will be relevant especially for the longer-term ambitions of WeChangEd. Simplicity is (arguably) a virtue, and assuming dense and consistent lower-level data (e.g. text articles), some higher abstraction layers (e.g. issues, volumes, periodicals) could partly be inferred or constructed in network analysis. Decisions like these are connected to workflows and other aspects of the research.
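The trade-off can be sketched in a few lines of Python. The first half shows the four FRBR levels as linked records, the second shows how volumes and issues might be inferred from article-level records alone. All class names, fields and sample values are hypothetical placeholders, and the sketch assumes that every article record reliably carries its periodical, volume and issue.

```python
from collections import defaultdict
from dataclasses import dataclass


# The four FRBR levels, each pointing one level up.
@dataclass(frozen=True)
class Work:
    title: str


@dataclass(frozen=True)
class Expression:
    work: Work
    language: str          # e.g. an original text or a translation


@dataclass(frozen=True)
class Manifestation:
    expression: Expression
    carrier: str           # e.g. "print" or "digital scan"


@dataclass(frozen=True)
class Item:
    manifestation: Manifestation
    holder: str            # who holds this particular copy


# The simpler alternative: record only articles, infer the rest.
@dataclass(frozen=True)
class Article:
    title: str
    periodical: str
    volume: int
    issue: int


def infer_layers(articles):
    """Group articles into periodical -> volume -> sorted issue numbers."""
    layers = defaultdict(lambda: defaultdict(set))
    for a in articles:
        layers[a.periodical][a.volume].add(a.issue)
    return {p: {v: sorted(issues) for v, issues in vols.items()}
            for p, vols in layers.items()}


# Placeholder sample data, not taken from the project.
copy_in_hand = Item(
    Manifestation(Expression(Work("Periodical X"), "original language"), "print"),
    holder="Library A",
)
articles = [
    Article("Article 1", "Periodical X", 1, 1),
    Article("Article 2", "Periodical X", 1, 2),
    Article("Article 3", "Periodical X", 1, 2),
    Article("Article 4", "Periodical X", 2, 1),
]

print(copy_in_hand.manifestation.expression.work.title)
print(infer_layers(articles))
```

The nested FRBR records can express a digitized copy or a translation as a separate branch under the same work, while the flat article records are cheaper to collect but can only reconstruct the layers that their fields happen to encode.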

Nodegoat implementation of the data model.

Although I created the model in such a way that it does not tie the research project to any particular tool, I built a functional implementation of the model with nodegoat (other researchers at the Ghent University Faculty of Arts and Philosophy are using it, providing a community of peers for WeChangEd). After the initial setup, the implementation was iteratively modified based on tests with the sample data and feedback collected from the research team. To complement the functional version of the model, I documented the model as a UML chart.

I expect the model to be reshaped as the research team familiarizes itself with it, and as the data collection and analysis work gets started with whichever tools are finally chosen.

This interdisciplinary task drew from cultural studies, information studies, computer science, Habermas, bibliography, network sciences, literary studies and social science. This was a really exciting task, in cooperation with an enthusiastic and skillful team with an interesting research project.

WeChangEd data model

Mace Ojala

(13/10/2015)