Data Management Planning Checklist
The following questions can help you develop a strong data management plan. Not every question will apply to your project.
Describing Your Expected Data:
- What datasets are expected to be produced by the project? Include raw data, processed data, and any anticipated derivative applications or models.
- What form(s) and format(s) will the data be in?
- What consistent naming conventions will be used for data files and folders?
- Have any of the data been obtained from other sources (i.e., third parties)? If so, have you cleared any copyright concerns about using and republishing these data?
- What specific tools or software are needed to view, process, or visualize the data? Is any proprietary software needed?
- Who owns the data?
- Who is responsible for managing the data?
- Who will have access to the data during the project?
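As one illustration of the file-naming question above, the sketch below builds and checks file names against a hypothetical convention, `<project>_<YYYYMMDD>_<description>_v<NN>.<ext>`. The pattern and the function names `make_data_filename` and `is_conforming` are assumptions for illustration, not a standard; substitute whatever convention your project adopts.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
# e.g. "soilsurvey_20240315_site-a-moisture_v02.csv"
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def make_data_filename(project: str, when: date, description: str,
                       version: int, ext: str) -> str:
    """Build a file name that follows the hypothetical convention above."""
    # Lowercase and replace spaces with hyphens so names sort and compare consistently.
    desc = description.strip().lower().replace(" ", "-")
    name = f"{project.lower()}_{when:%Y%m%d}_{desc}_v{version:02d}.{ext.lower()}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"cannot build a conforming name from inputs: {name!r}")
    return name


def is_conforming(filename: str) -> bool:
    """Return True if an existing file name follows the convention."""
    return NAME_PATTERN.match(filename) is not None


if __name__ == "__main__":
    print(make_data_filename("soilsurvey", date(2024, 3, 15), "Site A moisture", 2, "CSV"))
    print(is_conforming("Final data (2).xlsx"))
```

Running a check such as `is_conforming` periodically during the project makes it easier to catch names like `Final data (2).xlsx` before the data are archived.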
Enabling Discovery of Your Data:
- Who will be responsible for documenting the data (creating the metadata)?
- What standards will be used for the metadata (e.g., XML-based standards such as EML or Dublin Core)?
- Have you used any formal, specialized vocabularies, code lists, thesauri, or taxonomies (e.g., phylogenetic taxonomies, ISO topics)?
- Have you used any customized abbreviations or shorthand? Are they explained in full in the data documentation?
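If Dublin Core is the chosen metadata standard, a minimal record can be produced with Python's standard `xml.etree.ElementTree`. The sketch below uses the Dublin Core element set namespace (`http://purl.org/dc/elements/1.1/`) and its 15 element names; the `metadata` wrapper element, the function name `dublin_core_record`, and the example values are hypothetical, and a repository may require its own wrapper or a richer schema such as EML.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements defined by the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize {element: [values, ...]} as simple Dublin Core XML.

    Each element may repeat (e.g. several creators). "metadata" is a
    hypothetical wrapper element; repositories usually define their own.
    """
    root = ET.Element("metadata")
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example values.
    print(dublin_core_record({
        "title": ["Soil moisture at Site A, 2024"],
        "creator": ["Doe, Jane", "Roe, Richard"],
        "date": ["2024-03-15"],
        "format": ["text/csv"],
        "subject": ["soil moisture"],
    }))
```

Rejecting names outside the element set is a deliberate choice here: a typo such as "author" instead of "creator" fails loudly rather than producing a record that harvesters silently ignore.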
Enabling Long-term Storage of Your Data:
- Where will the data be stored during the project?
- What backup measures will be implemented?
- Where will the data be archived for long-term storage?
- Do you expect to alter or update the data after archiving, or will they be final once archived?
- How long should the data be stored by an archive or repository?
- How large is the dataset and, if relevant, what is its anticipated rate of growth (e.g., MB/year)?
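A rough storage estimate can be worked out from the current size and the expected growth rate. The sketch below assumes linear growth and a hypothetical policy of keeping three full copies (working copy plus backups); both assumptions, and the example figures, are placeholders to replace with your own.

```python
def projected_size_mb(current_mb: float, growth_mb_per_year: float, years: float) -> float:
    """Project dataset size assuming constant (linear) growth in MB/year."""
    if years < 0 or current_mb < 0:
        raise ValueError("size and years must be non-negative")
    return current_mb + growth_mb_per_year * years


def total_storage_mb(size_mb: float, copies: int = 3) -> float:
    """Storage needed if every copy (working + backups) is a full copy.

    The default of 3 copies is a hypothetical policy, not a requirement.
    """
    return size_mb * copies


if __name__ == "__main__":
    # Hypothetical figures: 500 MB today, growing by 1,200 MB per year.
    for year in range(6):
        size = projected_size_mb(500, 1200, year)
        print(f"year {year}: {size:,.0f} MB, {total_storage_mb(size):,.0f} MB with 3 copies")
```

With these figures, a 500 MB dataset growing by 1,200 MB/year reaches 6,500 MB after five years, or 19,500 MB across three copies. If growth is expected to accelerate, a linear projection will underestimate the need.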
Enabling Sharing of Your Data:
- How should the data be made accessible?
- Who are the potential audiences for the data?
- How could the data be reused and repurposed?
Copyright, Security, and Privacy Concerns:
- Does the dataset include any sensitive information subject to confidentiality concerns?
- If applicable, describe any special measures required by your funder, lab, or Institutional Review Board (IRB).
- Will the data be collected in the United States or in another country? If outside the United States, where?
- Do the data have unique identifiers?
Enabling Citation of Your Data:
- What publications, discoveries, or further datasets have resulted, or are expected to result, from the data?
- Who will receive credit for authoring the data? In what order, if any, should the authors be listed?
- What organizational name should be referenced in citing the data?