Science & Research
December 2016 | Volume 22, Number 3
by CARLY CAMPBELL
Digital archives are an increasingly important historical aspect of data management and science. Researchers, and the agencies that support them, are experiencing an increased demand for archiving of both new and older data. Emphasis is being placed on publication of collected data alongside peer-reviewed reports. Traditionally, research findings have been distributed through presentations, trainings, and publications, in which data are available in analyzed and summary form. However, further management of raw data is now encouraged, and it is increasingly required for funding, for publication in a variety of journals, and to meet federal standards.
The Aldo Leopold Wilderness Research Institute (ALWRI) has, like many research organizations, been under pressure to manage its data in a more open access environment. In May 2013, Executive Order 13642, and the associated memorandum, mandated that US federal agencies “collect or create information in a way that supports downstream information processing … requirements include open formatting, usable metadata, data standards, and machine readability.” This directive came at the end of a long deliberation by the Office of Science and Technology Policy (Goben et al. 2013). It is just one piece in a larger conversation about the role of data and digitization in the modern world. In 2010, the National Science Foundation (NSF) began requiring all grant applicants to include data management plans (Hernandez et al. 2012). Early 2015 saw the latest in a series of meetings on the NSF Public Access Plan (Silverthorne 2015), as well as a new set of guidelines promoting public access for data related to all journal publications from the American Meteorological Society (Mayernik et al. 2015). Similar announcements have been made, or are under discussion, across many disciplines. For ALWRI, mounting requirements are only part of the drive to develop a comprehensive data archive.
A digital data archive is a resource. The more accessible data are, the more added value can be realized through current and future analysis. For wilderness social science, access to data is vital for understanding the changing dynamic between wilderness and wilderness visitors. Moreover, the methodologies used in wilderness data collection can serve as a model to researchers and organizations, drawing upon a database for precedent. An archive, and an internal catalog, is useful for any agency to understand its own progression through time. ALWRI, responsible for protecting nearly 50 years of wilderness science (1967 to 2016), will have only limited access to that history until the archives are complete.
Widespread Issues and Possible Solutions
The call for digital raw data is the result of a global debate about the role of scholarly communities in a digital-technology world. As various disciplines and communities explore methods of archiving long-term data sets, conflicts have emerged over the methodology, and legalities, of the archiving process. There are several main issues of concern: primarily, on whom does the responsibility fall to create the archive? How does a large online database preserve the context, and intention, of a study? Most importantly for individual organizations, what is the financial cost of putting together a database and an archiving team?
There are some implicit challenges to creating a data archive. An accessible database requires a searchable infrastructure. Commonly, data are structured and described through metadata – a data overview that gives information about the what, when, where, and why of the original set. Metadata creation also needs consistency and readability across changing formats, both digitally and in analog form. Who takes responsibility for sharing data and collecting information in a suitable format? The major candidates are the scientist and the archivist. One argument is that the scientist, who knows the study best, should be responsible for managing his or her own data (Wilson 2010). The opposing side argues that the data should be handed over to a trained “data archivist” who knows formatting standards. The most common resolution to these concerns comes from two places: the success of archiving teams composed of both scientists and trained historians, and the increasing encouragement for scientists to be trained to archive their data as part of the scientific process.
The question of responsibility is largely driven by concern for context. Although summarized publication data is no longer considered adequate for extending the use of research, providing raw data on its own removes the context in which it was created. Beyond the data itself, a traditional analog archive (i.e., hard copy files) generally includes all the components of a project, with researcher correspondence, study plans, measurement instruments, progress reports, budget sheets, and final publications. The danger of the digital archive and open-access data sets is the lack of meaning for sheets of numbers by themselves (Klein et al. 2014).
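The loss of context can be illustrated with a small sketch. It assumes a coding manual can be represented as a lookup from numeric codes to their meanings; the variable names, codes, and labels below are invented for illustration and are not taken from any ALWRI study.

```python
# Hypothetical coding manual: variable -> {numeric code -> meaning}.
# Variable names and codes are invented, not taken from any ALWRI study.
CODEBOOK = {
    "group_type": {1: "alone", 2: "family", 3: "friends"},
    "crowding": {1: "not at all crowded", 5: "extremely crowded"},
}


def decode_row(row, codebook):
    """Translate coded values to labels; flag anything the manual cannot explain."""
    decoded = {}
    for variable, code in row.items():
        labels = codebook.get(variable)
        if labels is None:
            # The variable itself is missing from the coding manual.
            decoded[variable] = f"{code} (no manual entry for variable)"
        else:
            # The variable is documented, but this code may not be.
            decoded[variable] = labels.get(code, f"{code} (undocumented code)")
    return decoded


# One raw survey record: a "sheet of numbers" with no meaning on its own.
raw_row = {"group_type": 2, "crowding": 5, "q17": 3}
print(decode_row(raw_row, CODEBOOK))
```

In this sketch, the value for the hypothetical variable q17 stays an unexplained number because no manual entry describes it, which is the situation a raw data set faces when it is archived apart from its study files.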
An Internet-based archive has the ability to bring together “large and dispersed collections of material” (Monks-Leeson 2011, p. 39). Data are organized outside of their original meaning, within pools of similar projects. How does one determine what is valuable? Economists writing for the Canadian Journal of Economics call for data to be coded and archived before submission, as part of the scientific process, especially as the logistics of the information are best understood by the person who gathered it (McCullough et al. 2008). However, the data have to be understandable, and most importantly, replicable, to have valid meaning: “The ability to replicate a study is typically the gold standard by which reliability of scientific claims are judged” (National Research Council 2002).
In the realm of wilderness social science, this is further complicated by the necessity of combining quantitative data with qualitative research. Many studies that have come out of ALWRI have been concerned not just with the state of the wilderness (such as visitor perceptions of wear and tear and number of wildlife encounters) but also with the experiences and opinions of the wilderness visitors. Studies have investigated concepts such as dimensions of experiences, threats to solitude, and interpersonal conflict. Archiving in any form encounters the problem of context, but qualitative work especially has “special relationships – between the researcher and his or her data, research participants, industry partners and research collaborators – [which] could not easily … be transferred into an archive” (Broom, Cheshire, and Emmison 2009, p. 116). For an organization such as ALWRI, neither summary publications nor raw data adequately reconcile this problem. The resulting argument is for the use of metadata in a digital format, and an internal catalog.
The financial investment into data archiving is a consideration of any institution and the managers making decisions. Scientists must be trained to archive, or archivists hired to revisit research. A database must be created, or found, and then maintained. Depending on the nature of the data, an enormous amount of time and effort may be required to scan or code documents. In 2013, more than US$30 million was paid toward the National Archives Record System through the Forest Service Greenbook assessment (USDA Forest Service 2016). Although cost varies significantly from project to project, an archive can be an expense.
However, development of searchable archived databases leads to data reuse, which can extend the value of research. A study done in 2011 sought to quantify the touted benefits of data sharing. It found that a project originally funded in 2007 by the National Science Foundation resulted in 16 direct paper publications. The authors searched PubMed for the data sets from that 2007 project, which had been submitted to the Gene Expression Omnibus repository, and found that the data contributed indirectly to more than 1,250 published articles (Piwowar, Vision, and Whitlock 2011). Managers of the soil archives at the Northern Great Plains Research Laboratory expressed a similar sentiment in that “there are numerous opportunities for research using the … soil archives; opportunities that on-site personnel realize will only be brought to fruition through collaborative efforts with other researchers” (Liebig, Wikenheiser, and Nichols 2008, p. 977). Cooperative teamwork within and among research organizations should be part scientific and part economic consideration. A digital database that extends the use of a funded study is a positive investment.
Wilderness Social Science Research
The Aldo Leopold Wilderness Research Institute is a federal research group dedicated to the improvement of stewardship in wilderness and similarly protected areas. This collaboration connects an interdisciplinary and interagency team of scientists and is the center for research on the role of wilderness in larger social and ecological systems, evaluation of monitoring and management tools, and research on public attitudes toward restoration and intervention.
The delivery of knowledge to wilderness managers and other scientists has always been a long-term goal of ALWRI, and in recent years, it has taken up the call for data dissemination by examining its own historic files. One additional pressure is the aging of the researchers originally connected to studies, who have moved on to other positions, projects, or retirement. This is a classic dilemma of archiving, in which the context and knowledge behind data are in danger of being lost with the researcher who produced them (Rausher et al. 2010). However, Alan Watson has been instrumental in bridging that gap.
Alan Watson has been an active staff researcher since before the Leopold Institute was officially founded out of a Forest Service Work Unit (in 1993). Before 1988, Watson was an academic researcher working on several Forest Service–sponsored wilderness research projects. He is one of the founding executive editors of the International Journal of Wilderness and has represented ALWRI in five Fulbright appointments (Finland, Russia [twice], Brazil, and the Republic of China) as well as on the Executive Committee of the World Wilderness Congress. His contribution to wilderness science, the wilderness stewardship community, and to ALWRI, is vast and irreplaceable. He has provided leadership in more than 40 research projects, and has collaborated with many academic scientists on projects, each resulting in publications and data sets of long-term value.
Equally important is Watson’s personal knowledge of each study: the reasons behind them and the linkages between them, the outcomes of each project, and the decision making involved. Expertise in various methodologies is important, as many of the studies overseen by ALWRI do not strictly involve quantitative, statistical results. Rather, there has been a dual approach of quantitative surveying and qualitative interview processes, which produce and inform complex decisions.
While most research reports include summaries of methods employed, often the descriptions in publications are insufficient to allow full replication or even full understanding of data transformations and coding (Corti 2012). Especially for US federal research agencies, the last decade has seen short funding for new opportunities. It is a critical time to gather and manage what has been produced up to this point, before it is lost between the digital process and physical realities of declining budgets and personnel.
Collaboration between scientists of various disciplines and experience is vital for a data archive that is as comprehensive as possible. Social science researchers such as Alan Watson transfer their knowledge of complex study methodology into context for data that would otherwise be lacking.
As research institutions put forward the effort to archive, the need for standard formatting and procedure arises. Different journals, libraries, and organizations have different standards; for the US government, common requirements have existed for some time.
In 1990, the Office of Management and Budget established the National Spatial Data Infrastructure Committee (NSDI) (Federal Geographic Data Committee 2015). The NSDI is a line of supervising committees that oversee nationwide interagency publishing of federal data. In 1995, the Federal Geographic Data Committee (FGDC) published a mandatory standard that dictates consistent formatting and terminology for metadata. ALWRI and the US Forest Service are both guided by the NSDI and the FGDC Standard throughout the archiving process.
ALWRI, the Rocky Mountain Research Station, and the US Forest Service utilize the software Metavist, an R&D program that assists in the creation of metadata. The result is “data about data”: “Metadata are used to answer such questions as what data were collected, how they were collected, why they were collected, how reliable they are, and what issues should be accounted for when working with them” (USDA Forest Service 2015). The metadata produced by ALWRI follow the Biological Data Profile developed by FGDC in 1998, as well as profile category standards such as ISO 19115 and the National Research and Development Taxonomy. These standards provide the system necessary for creating accessible and understandable data.
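As a rough illustration of what such a record has to answer, the sketch below checks a metadata record for the five questions named in the Forest Service description and serializes a complete record to XML. The field names are hypothetical and chosen only to mirror those questions; they are not the element names defined by the FGDC standard, and the sketch does not represent how Metavist works. The example values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names that mirror the questions metadata should answer;
# they are NOT the element names defined by the FGDC standard.
REQUIRED_FIELDS = ("what", "how", "why", "reliability", "caveats")


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def to_xml(record):
    """Serialize a complete record to a small XML string."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError("metadata incomplete, missing: " + ", ".join(gaps))
    root = ET.Element("metadata")
    for field in REQUIRED_FIELDS:
        ET.SubElement(root, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


# Invented example values, for illustration only.
example = {
    "what": "Visitor survey responses",
    "how": "Exit questionnaires handed out at trailheads",
    "why": "Establish a baseline of visitor attitudes",
    "reliability": "Response rate not recorded",
    "caveats": "",
}

print(missing_fields(example))
```

A record with a blank entry is reported as incomplete rather than silently archived, which reflects the point that a data set without its descriptive context is of limited use to later researchers.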
A dynamic team has been working with the ALWRI to create metadata for a digital archive. It is a slow process, as each study’s methods, coding, and survey data reflect a unique team of scientists, as well as the research interests of a particular time and place. While meticulously going through old folders and study files, the archive team found several projects that stood out as resources for researchers and the agency.
Case Study I
One complicated metadata record came from a 1991 study done in the Alpine Lakes Wilderness. Focused on themes of visitor solitude and encounter rates, it was teamed with a biophysical assessment of visitor impacts and attitudes toward restoration (Watson et al. 1998). It was a study that explored methods of observation and data gathering, the effort and range of which were not fully described in the final publication. The researchers used nine different methods of measurement, which produced nine different data sets and coding manuals. Data were gathered by different groups of people, including wilderness technicians. One process involved “trained observers” systematically observing selected groups, in which a researcher would hike at approximately the same speed as a visitor group, and then stop at destination points to test real-time observations of social conditions and travel patterns. Ranger observations and exit surveys were also employed to learn about travel patterns and social conditions encountered on wilderness visits.
The Alpine Lakes project was a test of methodologies. Its value extends beyond the results in publications or even the varying success of each data set. For managers, and scientists struggling through form approval or study design, a project such as Alpine Lakes provides detailed descriptions of several monitoring methods as well as comparative results for each method. Each study and data set that is archived should be regarded as a historic resource in this way.
Case Study II
Some studies have been replicated over time, relying on older data for trend analysis. Periodically, ALWRI has conducted or funded studies with at least a partial purpose of replicating comparable data from earlier studies. For instance, nine wilderness and wilderness study areas were studied very early by Lucas (Lucas 1980) specifically to establish baselines in visitor characteristics and attitudes at the following locations: the Desolation, Bob Marshall, Cabinet Mountains, Selway-Bitterroot, Mission Mountains, Great Bear, Scapegoat, Spanish Peaks, and the Jewel Basin Hiking Area. However, looking back on the studies that served as a baseline, it was discovered that there were missing and incomplete data sets. For example, the Selway-Bitterroot Wilderness baseline data set was not included in data files that had been maintained across computer systems.
While putting together the historic catalog, the archive team found the missing data in the form of original surveys piled in boxes in the closet. After sorting through the boxes, and attempting to match the 1971 coding manuals to the surveys, the discovery took a remarkable turn. Alongside the 45-year-old surveys that had once been stored away were questionnaires that had no associated publications or research summaries. The original researchers had done a special sample of a primitive area in Idaho called Salmon-River Breaks, which was not included in the baseline publication. In previous compilations of data sets from the time period, there was no record of these data existing. Additionally, the Salmon-River Breaks primitive area itself no longer exists, since the 1980 Central Idaho Wilderness Act combined several smaller areas into the Frank Church-River of No Return Wilderness (Wilderness.net 2015). The process of archiving has therefore restored the full set of baseline studies, revealed an entirely untapped historical data set, and rescued paper-based data at risk of being lost.
The case of the baseline studies seems particularly telling as they were intended for determining changes in wilderness use and users over time. For new analysis to be meaningful in the modern context, it is critical for researchers to have full access not just to the data from last year, or coming years, but also from 45 years ago.
Wilderness social science and conservation research involves careful observation of the natural world and the way humans interact with it. The value of research data only grows over time, when considering environmental impacts and shifting societal values. Effective wilderness stewardship demands an understanding of the consequences of management decisions. Every manager’s knowledge and skill base can be increased by access to an archive of studies dealing with wilderness science.
The Aldo Leopold Wilderness Research Institute as a unique organization is committed to the creation of a comprehensive archive and file catalog. For scientific research, the process requires scientists, archivists, and the support of management. It is a critical moment to bridge digital processes and ensure data are preserved and accessible. For any organization, archiving is both an investment and a resource. In the realm of wilderness science, access to raw data and metadata is vital for understanding the changing dynamic of wilderness, the environment, and people. The searchable database for wilderness research can be found at http://www.fs.usda.gov/rds/archive/.
CARLY CAMPBELL is a recent history graduate from the University of Montana, where she also worked for the Aldo Leopold Wilderness Research Institute on a team with Chris Armatas to diligently archive nearly 50 years of social science data sets. Their efforts earned them an award from the chief of the Forest Service for their contribution to application of wilderness research to applied stewardship problems. Carly’s minor is in studio arts; she has several ink prints in the university and Zootown Arts Center archives, as well as a print spread in the UM literary magazine, The OVAL; email: firstname.lastname@example.org.
Broom, A., L. Cheshire, and M. Emmison. 2009. Qualitative researchers’ understandings of their practice and the implications for data archiving and sharing. Sociology 43(6): 1163–1180. DOI: 10.1177/0038038509345704.
Corti, L. 2012. Recent developments in archiving social research. International Journal of Social Research Methodology 15(4): 281–290. DOI: 10.1080/13645579.2012.688310.
Federal Geographic Data Committee. 2015 (last updated April 20). Home [Homepage of FGDC] [Online]. Available at http://www.fgdc.gov/ (accessed April 23, 2015).
Goben, A., D. Salo, and C. Stewart. 2013. Federal research. College & Research Libraries News 74(8): 421–425.
Hernandez, R. R., M. S. Mayernik, and M. L. Murphy-Mariscal. 2012. Advanced technologies and data management practices in environmental science. BioScience 62: 1067–1076. DOI: 10.1525/bio.2012.62.12.8.
Klein, M., H. Van de Sompel, and R. Sanderson. 2014. Scholarly context not found: One in five articles suffers from reference rot. PLOS ONE 9(12): e115253. DOI: 10.1371/journal.pone.0115253.
Liebig, M. A., D. J. Wikenheiser, and K. A. Nichols. 2008. Opportunities to utilize the USDA-ARS Northern Great Plains Research Laboratory soil sample archive. Soil Science Society of America Journal 72(4): 975–977. DOI: 10.2136/ssaj2007.0324N.
Lucas, R. C. 1980. Use patterns and visitor characteristics, attitudes, and preferences in nine wilderness and other roadless areas. Research Paper INT-253. Ogden, UT: USDA Forest Service, Intermountain Forest and Range Experiment Station.
Mayernik, M. S., M. K. Ramamurthy, and R. M. Rauber. 2015. Data archiving and citation within AMS journals. Weather & Forecasting 30(2): 253–254. DOI: 10.1175/2015WAF2222.1.
McCullough, B. D., K. A. McGary, and T. D. Harrison. 2008. Do economics journal archives promote replicable research? The Canadian Journal of Economics 41(4): 1406–1420.
Monks-Leeson, E. 2011. Archives on the Internet: Representing contexts and provenance from repository to website. The American Archivist 74(1): 38–57.
National Research Council, Science, Technology, and Law Panel. 2002. Access to research data in the 21st century: An ongoing dialogue among interested parties. Washington, DC: National Academy Press.
Piwowar, H. A., T. Vision, and M. C. Whitlock. 2011. Data archiving is a good investment. Nature 473(7347): 285. DOI: 10.1038/473285a.
Piwowar, H. A., M. J. Becich, H. Bilofsky, and R. S. Crowley. 2008. Towards a data sharing culture: Recommendations for leadership from academic health centers. PLOS Medicine 5(9): e183. DOI: 10.1371/journal.pmed.0050183.t001.
Rausher, M. D., M. A. McPeek, and A. J. Moore. 2010. Data archiving. Evolution 64(3): 603–604. DOI: 10.1111/j.1558-5646.2009.00940.x.
Silverthorne, J. 2015. National Science Foundation Public Access Plan: Advisory Committee for Biological Sciences Meeting. April 22, 2015. Arlington, VA: National Science Foundation.
USDA Forest Service. 2015. Metadata Standards [Research Data Archive] [Online] Available at http://www.fs.usda.gov/rds/archive/Metadata/Standards (accessed April 24, 2015).
———. 2016. FY 2015 Budget Justification [Budget & Performance] [Online] Available at http://www.fs.fed.us/about-agency/budget-performance (accessed July 22, 2016).
Watson, A. E., R. Cronn, and N. A. Christensen. 1998. Monitoring inter-group encounters in wilderness. Research Paper RMRS-RP-14. Fort Collins, CO: US Department of Agriculture Forest Service, Rocky Mountain Research Station.
Wilderness.net. 2015. Frank Church-River of No Return Wilderness [Online] Available at http://www.wilderness.net/index.cfm?fuse=NWPS&sec=wildView&wname=Frank%20Church-River%20of%20No%20Return (accessed April 28, 2015).
Wilson, A. 2010. How much is enough: Metadata for preserving digital data. Journal of Library Metadata 10(2–3): 205–217. DOI: 10.1080/19386389.2010.506395.