The Big Data Lifecycle in Open Ecoinformatics: Curation, Analysis, and Sharing

Citation

Flathers, Edward. (2022-05). The Big Data Lifecycle in Open Ecoinformatics: Curation, Analysis, and Sharing. Theses and Dissertations Collection, University of Idaho Library Digital Collections. https://www.lib.uidaho.edu/digital/etd/items/flathers_idaho_0089e_12323.html

Title:
The Big Data Lifecycle in Open Ecoinformatics: Curation, Analysis, and Sharing
Author:
Flathers, Edward
ORCID:
0000-0002-2816-0781
Date:
2022-05
Keywords:
data repositories; data science; open science
Program:
Natural Resources
Subject Category:
Forestry; Geographic information science and geodesy; Remote sensing
Abstract:

Research data go through a cyclical process from the point of their conception during project planning, through experimental design and sample design, data collection, organization, analysis, storage, curation, and, ideally, re-use. Historically, not all steps in the lifecycle have been given the same level of attention. Much of the data researchers have collected have become “dark data,” often recorded on paper and, once the project has concluded, consigned to a file cabinet, never to be seen again. There is a reproducibility crisis in the sciences that is being slowly revealed to have quietly spread across many disciplines, casting doubt on the veracity of some published results. Even when methods are transparent and data published, we face challenges agreeing exactly what the rules are for sharing research data with each other. Chapter 1 provides an introduction and background information about Big Data, the data lifecycle, the FAIR Data Principles, and concepts surrounding open science. Together, these topics provide a foundation and motivation for the material in the remaining chapters. Chapter 2 applies the concept of service-oriented architecture from computer sciences to the task of designing an OAIS (Open Archival Information System) data repository. Such repositories are used to store, curate, and manage research data, and to provide visibility and access to research data that help to enable re-use. Chapter 3 provides an example of using the concepts of open science to produce research products using transparent methods that are clearly reproducible. While generating a model predicting levels of organic carbon found in soil in the Northwestern United States, the key to ensuring that results are reproducible is to publish all research data and computer code used in analysis and preparation of those results. Chapter 4 addresses the issue of how we express and agree upon common rules for data sharing. As data sharing becomes less personal, more distributed, and potentially more automated, we need formal ways of expressing sharing agreements. Furthermore, these agreements must be easily readable by both humans and machines to be effective. Chapter 5 provides some concluding remarks and considers the material of the earlier chapters in the context of contemporary challenges accompanying the era of Big Data.

Description:
doctoral, Ph.D., Natural Resources -- University of Idaho - College of Graduate Studies, 2022-05
Major Professor:
Gessler, Paul E
Committee:
Kenyon, Jeremy; Ma, Xiaogang; Sheneman, Lucas; Goebel, P. Charles
Defense Date:
2022-05
Identifier:
Flathers_idaho_0089E_12323
Type:
Text
Format Original:
PDF
Format:
application/pdf

Rights
Rights:
In Copyright - Educational Use Permitted. For more information, please contact University of Idaho Library Special Collections and Archives Department at libspec@uidaho.edu.
Standardized Rights:
http://rightsstatements.org/vocab/InC-EDU/1.0/