A few things have had me thinking recently about data management in archaeology.
You might have seen The Atlantic’s recent article on the digital collection, curation, and analysis of archaeological data. The article emphasizes the massive size of the datasets being collected, particularly with digital methods, and it highlights a few points that will be familiar to archaeologists: we work and think at various scales, many of us are invested in new technological approaches to data, and often whatever documentation we can produce and preserve is all that will remain when the original record is destroyed by the process of our research (or by war, terrorism, or climate change). The article cites projects whose data points apparently number in the billions thanks to digital techniques, but of course our datasets can become unwieldy even with traditional methods once you take into account decades of research at a site or investigate questions across broad geographic areas. The article speaks to both the research potential of massive datasets and the logistical challenges they can pose at all levels.
I thought of this article during a meeting of a class in “Responsible Conduct of Research and Scholarship.” The class is a new department requirement related to federal research funding, so the inclusion of data management is no surprise given the increasing attention paid to this component of NSF grant proposals. In the first session we touched on ways to plan for data management early in a research project, whether that means selecting stable file formats or anonymizing informant information (for those anthropologists who work with the living). This can be especially challenging for a graduate student; many of us are planning the first project that we will execute independently and carry from its earliest stages through to the end. What steps do we need to take to anticipate the management of data that we have yet to collect, and that will likely take a different form than we expect when we first formulate the project?
The third thing that brings me to this topic is my own research. Having finished my fieldwork in December, I am now committing most of my time to lab-based sorting and analysis, along with organizing field notes, photographs, and databases, while also writing proposals, revising my three-year life plan every other week, and otherwise trying to stay in touch with the big picture. Balancing these drastically different conceptual and practical scales makes it clear how much effort goes into managing all the details of a project and the data it generates, and how critical it is to do that well in order to transition smoothly to analyzing and synthesizing the results, and then to making them available in a form that could be useful to others.
If I were starting all over tomorrow, I can think of at least a few things I would do differently with regard to record keeping and planning for database management. I think some of the big challenges for graduate students directing research are accurately estimating the scale and volume of data that will result, and developing systems of organization that will continue to make sense if sampling strategies evolve over different phases of the project. Most of my work in this area has depended not on formal training but on observing the practices of other projects, remembering what was difficult when I’ve worked with other datasets, and spending hours fiddling with my tables in Access.
I have been thinking a lot about revision in writing lately, and there may be some relevant comparisons and contrasts between writing and building databases. A first draft of a written work very often needs to be “re-envisioned” to be improved, perhaps through reworking its structure and reconsidering what information it is meant to convey. Many writers benefit from the feedback of readers as they move through revisions of written work; is this true for “data work” too? I know that each time I have had reason to share some portion of my preliminary dissertation data, it has forced me to refine the organization a bit, to check that my coding and conventions are accessible to another person, and otherwise to revise my database. But the structure of a database can be difficult or impossible to change once a project is really underway, in part because strategies for data collection are usually conceived along with plans for data management.
As for making my data accessible after I finish my current work, I expect to include many appendices in my dissertation, but also to archive materials digitally with The Digital Archaeological Record (tDAR). I initially looked into tDAR’s terms and conditions simply to fulfill a requirement, but doing so prompted me to think about how archived data actually gets used. I haven’t personally undertaken any serious work with the archaeological data that has recently been archived digitally and made accessible (e.g., the Digital Index of North American Archaeology), although I have made use of other relevant types of data available online, such as NOAA’s coastal LiDAR. Finding ways to seek out and incorporate more of the data that is already available will be a future goal of mine.
Archaeologists are always thinking about long time scales, the durability of materials, and the transmission of knowledge. Even so, there can be a disconnect when it comes to maintaining our own records in a way that will be readily accessible and understandable for future researchers. Graduate students out there, is this something you’re being trained in before delving into your research? What experiences do you have working with more novel forms of data collection, management, or archiving? Looking beyond the data you collect yourself, what ways have you found to work with the data that’s already available in digital archives?
Resources and Links
- Archaeology’s Information Revolution (The Atlantic, March 3, 2016)
- tDAR (The Digital Archaeological Record)
- Ten Simple Rules for the Care and Feeding of Scientific Data (Goodman et al. 2014): a concise and sort of charming article that I came across while googling possible silly titles about caring for your data for this blog post.