Important: The Data Stewardship Knowledgebase is under construction. Expect empty pages, warning signs, and hammers and nails left on the floor. It might also change drastically without notice.

The DSK aims to be a handbook of useful resources both for current Data Stewards who handle data and for future Data Stewards who are just approaching the subject. To this end, it has a few main goals:

  • Define what data stewardship is, and provide insight into what meaningful data stewardship should look like in different contexts, with particular emphasis on public research;
  • Aggregate in an orderly way the resources found scattered across the internet, as data management can be a diffuse topic touching many aspects in many different contexts;
  • Integrate information from other websites with additional context and, if needed, create new resources to fill gaps in publicly available knowledge;
  • Define lists of best practices and methods, and provide ways to find and define such methods, in a wide array of contexts;
  • Provide practical guides and how-tos for common or recurring problems in data stewardship and management in different contexts;
  • Promote principles of meaningful data stewardship in many research contexts, and provide teaching material that Data Stewards and other interested people can use to promote such principles to a wide audience;
  • Promote the critical evaluation of the philosophy of science, and of the way research groups and institutions do science, through the collection of useful resources and teaching materials.

The DSK is structured into four broad categories of interest: Open Science, Computer Science Toolbox, Policy and Legal Issues, and Stewarding the Data Lifetime. They are described below to give you a sense of the overarching structure of the DSK.

Open Science

The profession of Data Steward, and the concept of meaningful, useful data stewardship for the benefit of the community, are the culmination of years of Open Science philosophy. This section aims to explore the aspects of Open Science, particularly in the context of data management. It covers topics such as:

  • What Open Science is;
  • Why Open Science is the right direction for researchers and research institutions to take;
  • What could go wrong if Open Science is implemented badly;
  • What Data Stewards do in the context of Open Science;
  • How to efficiently teach Open Science concepts to others;
  • Why data and data stewardship matter so much for Open Science;
  • Why a third party (like a researcher) might be interested in implementing Open Science and Data Stewardship policies.

Computer Science Toolbox

Today, data is almost always manipulated digitally in some form. Even physical objects might be listed in a digital index, or scanned and digitized altogether. For this reason, a Data Steward needs some computer science knowledge and a toolbox of digital hammers and wrenches that are useful when dealing with digital data. This section covers topics such as:

  • What digital data is;
  • How digital data is encoded, transmitted and shared with others;
  • What formats are available to save data in;
  • What metadata is and which formats are available to represent it;
  • What data infrastructures are and how to manage them (as potential administrators);
  • Technologies to manipulate, reshape, fuse and split data;
  • Determination of costs related to data management (e.g. storage and computing power);
  • Knowledge of relevant tools that can be used to obtain, reshape, reuse, manipulate and share data throughout a research project.

Policy and Legal Issues

The administration of data, especially personal data, may be (or should be) subject to laws. This section aims to aggregate such concepts and make a Data Steward both aware of them and capable of dealing with them. It covers topics such as:

  • National and international privacy laws regarding personal data;
  • Legal issues when reusing others' code and data;
  • Ethical concerns of releasing, reusing and otherwise manipulating data;
  • Determining the ethical and legal risks related to handling specific types of data;
  • How to give recognition when reusing a piece of data produced by others;
  • Creation of effective Open Science policies and plans of action for groups and organizations;
  • Fulfilling Open Science/Data Stewardship requirements for funding bodies that require them (e.g. Data Management Plans, or DMPs);
  • The soft skills required for effective management and administration of an organization interested in implementing data stewardship practices.

Stewarding the Data Lifetime

The most expansive and heterogeneous section, "Stewarding the Data Lifetime" deals with the philosophical, practical and technical aspects of data stewardship, from the planning of data collection, to the manipulation of fresh data, to its eventual deletion or archival. This section is heavily context-specific: ideas that apply to data in the biological sciences might not be relevant to architectural studies, and vice versa. This section covers many topics; some examples include:

  • How to plan data collection, even at large scales and with many data collection partners;
  • Determining when, where and how to store newly created data;
  • Defining and measuring data quality for specific data types in specific contexts;
  • Designing and implementing data curation procedures, from collection to archival;
  • Solving the discard problem and defining methods and formats for the long-term to very-long-term preservation of archived data;
  • Determining the best methods of reusing published data to limit useless expenditures, with particular regard to ascertaining data quality and fitness for purpose.

Contributing

Thank you for wanting to contribute! Before contributing, please read the contributing guide in the GitHub repository of the project.

Once you are familiar with how to contribute, you can use the edit icon at the top right of each page to edit that page directly on GitHub and open a pull request with your change.

All contributions are treasured. You can find a list of all contributors on the contributors page. Thank you to all these wonderful people!