Core competences of Data Stewards
Data stewardship and the position of Data Steward (DS) are relatively recent (circa 2017). Therefore, the "core competences" of DSs, that is, what DSs do and what they know, are still being considered.
In this report, the FAIRsFAIR consortium has analysed job offerings and other similar resources and generated a competence framework for DSs.
The competences from that report are listed here, with some modifications.
Note: Further work on this page will link core competences to relevant pages in the Data Stewardship knowledgebase.
Data Management
"Data Management" is an umbrella term covering all aspects of working with data, similar to "data handling". Many of these concepts also fall under the broad term "🔰 data curation".
- Develop and implement strategies for:
- Data collection;
- Data storage;
- Data preservation;
- Ensuring data is compliant with FAIR principles.
- Create Data Management Plans and Data governance policies that are aligned with best practices in the field.
- Know and use relevant data and metadata types and formats, as well as use and develop common standards for data and metadata.
- Be familiar with, develop and use metadata management tools.
- Ensure recording of data provenance, including creation and manipulation, also through data publishing.
- Develop and implement strategies for long-term data archival, including:
- Data archival policies that comply with open science principles, open access policies and best practices for interoperability;
- Archival of metadata, with specific emphasis on data provenance;
- Policies for long term data accessibility and assurance of data integrity;
- Estimation of long-term data archival costs.
- Develop policies and methods to measure data quality and ensure compliance with community standards, also in coordination with data owners;
- Develop, implement and supervise policies on data protection, especially when sharing data, including:
- Compliance with data privacy laws such as the GDPR;
- Ethical issues;
- Legal issues, where necessary;
- Digital data security and integrity, covering protection against malicious data access, theft and tampering;
- Collaborate with other Data Stewards and manage a team of Data Stewards;
- Coordinate data-related activities among departments and with external collaborators, in accordance with local and foreign data policies;
- Define domain-specific data management requirements, and supervise their development, also in collaboration with other departments.
- Coordinate and supervise data acquisition.
- Develop policies for the implementation of open science principles, including FAIR data;
- Define, develop and supervise required infrastructure for data management and archival;
- Provide tools, guidance and training to other experts that deal with data (e.g. researchers).
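As an illustration of the provenance and integrity points above, the following is a minimal sketch in Python. It assumes a very simple, hypothetical record layout (`file`, `agent`, `activity`, `recorded_at`, `sha256`) rather than any established provenance standard, and the function names are invented for this example. It records who did what to a file together with a SHA-256 checksum, so that later tampering or corruption can be detected.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(path: Path, agent: str, activity: str) -> dict:
    """Describe who did what to a file and fix its checksum at that moment.

    The field names are assumptions made for this sketch, not a standard schema.
    """
    return {
        "file": path.name,
        "agent": agent,
        "activity": activity,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "sha256": sha256_of(path),
    }


def is_unchanged(path: Path, record: dict) -> bool:
    """Check that the file still matches the checksum stored in the record."""
    return sha256_of(path) == record["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "samples.csv"
        data_file.write_text("id,value\n1,3.2\n")
        record = provenance_record(data_file, agent="jane.doe", activity="created")
        # The record can be archived next to the data as a JSON sidecar file.
        print(json.dumps(record, indent=2))
        print("unchanged:", is_unchanged(data_file, record))
```

In practice, a community standard for provenance and an institutional repository would likely replace the ad hoc dictionary used here; the sketch only shows the underlying idea of pairing a provenance statement with a verifiable checksum.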
Data Engineering
"Data engineering" encompasses actual technologies that deal with data: collecting, analysing, transferring, storing and sharing it.
- Be familiar with modern computer science technologies, specifically to:
- Design and implement data analytics applications;
- Design and develop experiments, processes and infrastructure for data handling during the whole data lifecycle, including:
- Data collection;
- Data storage;
- Data cleaning (munging);
- Data analysis;
- Data visualization;
- Data archival;
- Develop and prototype specialised data handling procedures for specific needs.
- Develop and manage infrastructure for data handling and analysis, with emphasis on big data, data streaming and batch processing, while ensuring provenance and FAIRness.
- Develop, deploy and operate data infrastructure, including data storage, while following data management policies, with specific attention to the implementation of FAIR principles.
- Apply data security mechanisms throughout the data lifecycle, including designing and implementing data access policies for different stakeholders.
- Design, build and operate SQL and NoSQL databases, with particular attention to data models (structure), consistent metadata, data vocabularies and data accessibility.
- Develop and implement policies and methodologies for data reuse, interoperability and integration of local (i.e. of the organization) and external data.
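The database point above, with its attention to data models, consistent metadata and data vocabularies, can be sketched with Python's built-in `sqlite3` module. The table names, columns and licence codes below are assumptions chosen for illustration, not a recommended schema. The sketch shows one way a controlled vocabulary can be enforced by the database itself through a foreign key.

```python
import sqlite3

# Hypothetical data model: a controlled vocabulary of licences and a
# dataset catalogue whose "licence" column may only use those terms.
SCHEMA = """
CREATE TABLE licence (
    code TEXT PRIMARY KEY
);
CREATE TABLE dataset (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    creator TEXT NOT NULL,
    licence TEXT NOT NULL REFERENCES licence(code)
);
"""


def open_catalogue() -> sqlite3.Connection:
    """Create an in-memory catalogue with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is set.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO licence (code) VALUES (?)",
        [("CC-BY-4.0",), ("CC0-1.0",)],
    )
    conn.commit()
    return conn


def register_dataset(conn: sqlite3.Connection, title: str, creator: str, licence: str) -> int:
    """Insert one dataset description and return its identifier."""
    cursor = conn.execute(
        "INSERT INTO dataset (title, creator, licence) VALUES (?, ?, ?)",
        (title, creator, licence),
    )
    conn.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    conn = open_catalogue()
    register_dataset(conn, "Soil samples 2023", "Jane Doe", "CC-BY-4.0")
    try:
        register_dataset(conn, "Untitled", "John Doe", "unknown-licence")
    except sqlite3.IntegrityError as error:
        # The term is not in the vocabulary, so the database rejects the row.
        print("rejected:", error)
```

Keeping the vocabulary in its own table means new terms are added in one place, and inconsistent metadata is rejected at write time rather than discovered during later integration.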
Research methods and Project management
Data stewards need to work closely with researchers and other experts before, during and after research projects. It is therefore important to have competences in research management and, more broadly, project management. Some of these concepts might seem obvious and broad to people who have a research background, but this might not be the case for people from other backgrounds.
- Create new knowledge (i.e. concepts, understandings, relationships and capabilities) through the scientific method based on scientific facts and data;
- Discover new approaches to achieve research goals, also through the reuse of available (FAIR) data and software.
- Use available domain-related knowledge to generate novel sound hypotheses;
- Inspect and periodically audit the research process, with specific regard to quality (i.e. integrity, soundness and usefulness), openness and inclusivity.
- Design, develop and supervise data-driven projects, which include:
- Project planning;
- Experimental design, also in conjunction with experts in data science and data infrastructure, and with other data stewards;
- Data collection;
- Data handling.
Domain-specific competences
Each research domain works with wildly different data types, formats and sources. This means that each domain requires a different set of competences. This section tries to outline the contexts in which this domain-specific knowledge has to be taken into account.
- Use and adapt general Data Science methods to domain-specific issues, such as:
- Data types;
- Data presentations;
- Organizational roles and relations;
- Analyse, collect and assess data to achieve organizational goals, such as quality assurance of the organizational system;
- Identify and monitor performance indicators to identify and assess potential organizational challenges and needs. Specify data models, transparency policies and handling procedures for such performance indicators.
- Monitor and analyse indicators to identify current trends and potential future developments in local adoption of policies, methods, tools and other areas related to data management, FAIR implementation and open science. Ensure transparency of the process;
- Coordinate organization-level activities between different domains related to data management, provenance and analytics, with particular focus on data FAIRness throughout the data lifecycle.