Taking Care of Data

A few things have had me thinking recently about data management in archaeology.

You might have seen The Atlantic’s recent article on the digital collection, curation, and analysis of archaeological data. The article emphasizes the massive size of the datasets being collected, particularly with digital methods, and it highlights a few points that will be familiar to archaeologists: we work and think at various scales, many of us are invested in new technological approaches to data, and often whatever documentation we can produce and preserve is all that will remain when the original record is destroyed by the process of our research (or by war, terrorism, or climate change). The article cites projects whose data points apparently number in the billions because of digital techniques. Of course, our datasets can become unwieldy even with traditional methods once you take into account decades of research at a site or investigate questions across broad geographic areas. The article speaks to both the research potential of massive datasets and the logistical challenges they can pose at all levels.

I thought of this article during a meeting of a class in “Responsible Conduct of Research and Scholarship.” The class is a new department requirement related to federal research funding, so the inclusion of data management is no surprise given the increasing attention paid to this component of NSF grant proposals. In the first session we touched on ways to plan for data management early in a research project, whether that means selecting stable file formats or anonymizing informant information (for those anthropologists who work with the living). This can be especially challenging for a graduate student; many of us are planning the first project we will carry out independently, from its earliest stages through to the end. What steps do we need to take to anticipate the management of data that we have yet to collect, and that will likely take a different form than we expect when we first formulate the project?


Very thoroughly supervised excavation at Weeden Island, FL, Dec 2015

So the third thing that brings me to this topic is my own research. Having finished my fieldwork in December, I am now committing most of my time to lab-based sorting and analysis and to organizing field notes, photographs, and databases, while also writing proposals, revising my three-year life plan every other week, and otherwise trying to stay in touch with the big picture. Balancing these drastically different conceptual and practical scales makes it clear how much effort goes into managing all the details of a project and the data it generates. It also shows how critical it is to do that well in order to move smoothly into analyzing and synthesizing the results, and then into making them available in a form that could be useful to others.

If I were starting all over tomorrow, I can think of (at least) a few things I would do differently with regard to record keeping and planning for database management. I think some big challenges for graduate students directing research are accurately estimating the scale and volume of data that will result, and developing systems of organization that will continue to make sense if sampling strategies evolve over different phases of the project. Most of my work in this area has depended not on formal training but on observing the practices of other projects, remembering what was difficult when I worked with other datasets, and spending hours fiddling with my tables in Access.


Boxes of excavated material on their way to becoming data

I have been thinking a lot about revision in writing lately, and perhaps there are some relevant comparisons and contrasts between writing and building databases. A first draft of a written work very often needs to be “re-envisioned” to be improved, perhaps through reworking its structure and reconsidering what information it is meant to convey. Many writers benefit from the feedback of readers as they move through revisions of written work; is this true for “data work” too? I know that each time I have had reason to share some portion of my preliminary dissertation data, it has forced me to refine the organization a bit, to check that my coding and conventions are accessible to another person, and to otherwise revise my database. But the structure of a database can be difficult or impossible to change once a project is really underway, in part because strategies for data collection are usually conceived along with plans for data management.


Data collection teamwork at Weeden Island, FL, Dec 2015

As for making my data accessible after I finish my current work, I expect to include many appendices in my dissertation, but also to archive materials digitally with The Digital Archaeological Record (tDAR). I initially looked into tDAR’s terms and requirements only to fulfill an obligation, but doing so prompted me to think about how archived data really gets used. I haven’t personally undertaken any serious work with the archaeological data that has recently been archived digitally and made accessible (e.g., the Digital Index of North American Archaeology), although I have used other relevant types of data available online, like NOAA’s coastal LiDAR. Finding ways to seek out and incorporate more of these available datasets is a goal for my future work.

Archaeologists are always thinking about long time scales, the durability of materials, and the transmission of knowledge. Even so, there can be some disconnect when it comes to maintaining our own records in a way that will be readily accessible and understandable for future researchers. Graduate students out there, is this something you’re being trained in before delving into your research? What experiences do you have working with more novel forms of data collection, management, or archiving? Looking beyond the data you collect yourself, what ways have you found to work with the data that’s already available in digital archives?


One comment on “Taking Care of Data”

  1. dover1952 says:

    Hi Christina. I am sorry no one else has responded to this post since you put it up on SEAC Underground.

    I am not a graduate student in anthropology, but I did have one very frustrating data-related problem once upon a time in the environmental science field where I work. My field is quite analogous to archaeology because we generate large quantities of field and laboratory data in the course of our work.

    I do not recall all the fine details because this happened so very long ago. However, one of my past companies was writing a huge proposal for doing environmental cleanup work at a military base. Just in case you are not aware of this, private-sector companies devote as little money as possible to writing proposals because the time and energy it takes to write a proposal is considered a severe financial drag on company overhead. Therefore, if you have a specific task that would take you 60 hours to do on a normal paid workday charged to a client contract, the company will often demand that you do that same task (or something very close to it) in only 3 hours. Yikes!!!

    One day I was sitting in my office when a member of a proposal team walked in out of the clear blue and told me that they needed an estimate of the volume of contaminated soil (contaminated by various RCRA metals) that would need to be processed at their proposal site. They went on to say that they wanted me to derive that estimate from another contaminated site far away, with similar pollutant constituents, that had already been cleaned up. I was given a huge volume of data (both paper and electronic) to use in working up my estimate. On the surface and intuitively, it looked pretty straightforward and easy to do. However, when I started looking at the data structure closely, it soon became clear that this was not going to be an easy task at all. The people who had structured and organized the data for the older project had done so just to address their specific project problems. They never considered that another person years later would need to use their data in a different way for a very different purpose.

    I quickly figured out what needed to be done and how it could be done most efficiently to get the data into a form we could use for an estimate. Unfortunately, “most efficiently” with this contorted data turned into a slow-grinding, hand-calculator-based mathematical nightmare that took two weeks to finish, and that was going as fast as possible and sweating blood to get it done. I got the new data done in exactly the form we needed for the estimates in our proposal and sent it down to the technical proposal staff. They had no problems with it; otherwise some technical person would have come upstairs to my office to talk about it.

    One or two weeks later, the Proposal Coordinator (basically a “Glorified Secretary” with no technical background) stomped angry as Hell into my office and screamed out: “You took 80 hours to do this data workup!!!!???? You should have been able to do all this in just 3 hours. We write whole proposals in less than 80 hours!!!!” I calmly explained to her why it took that long, and this person just went: “Hmph!! We’ll see about this!!!” and stomped away, still angry as Hell. It really pissed me off because I had worked so extremely long and hard to convert all that data, and this person had no clue what it actually took to do a task like that. I got in no trouble whatsoever with technical management, because they evidently understood that the effort had been involved and lengthy, and that it was necessary.

    The Proposal Coordinator was laid off not too long after this incident because the company was having work acquisition problems, and I later left voluntarily to take a much better job at much higher pay. About two years later, a really close friend of mine at another company called me up one day: this former Proposal Coordinator had just applied for a Proposal Coordinator job at his company. He wanted to know if this person would be a good person to hire. I had to be honest with my friend, and I had to tell him the whole, truthful story (as I understood it) about the anger incident I had had with this person. I have no idea whether he ever hired this person or not; I never followed up.

    Main Point: Yes, absolutely. The structure and organization you leave your project data in at the end of your project can cause people who come after you to undergo tremendous personal suffering and extra expense if they need to use your data in a different way than you did. It was a wonder I did not have a heart attack trying to convert all of that data at breakneck speed. Our proposal staff had a wholly unrealistic and uninformed idea of how much time it would take to redo the data. People got angry because they had just ASSUMED the original data existed in a form that would be easy for them to use “as is.” One unemployed person may have failed to get a job because of their angry behavior and failure to understand the data conversion task that was needed to get the data into the new form required for the proposal. And we did need that data in converted form; no other form would have done the job for the proposal.
