important
The Data Stewardship Knowledgebase is under construction. Expect empty pages, warning signs, and hammers and nails left on the floor. It also might change drastically without notice.
The DSK aims to be a handbook of useful resources both for current Data Stewards who handle data and for prospective Data Stewards who are just approaching the subject. To this end, it has a few main goals:
- Define what data stewardship is, and provide insight on what meaningful data stewardship should look like in different contexts, with particular emphasis on public research;
- Aggregate in an orderly way the resources found scattered on the internet, as data management can be a diffuse topic touching many aspects in many different contexts;
- Integrate information from other websites with additional context and, if needed, create new resources to fill in gaps from publicly available knowledge.
- Define lists of best practices and methods, and provide ways to find and define such methods, in a wide array of contexts;
- Provide practical guides and how-tos for common or recurring problems in data stewardship and management in different contexts.
- Promote principles of meaningful data stewardship in many research contexts, and provide teaching material that Data Stewards and other interested people can use to promote such principles to a wide audience.
- Promote the critical evaluation of the philosophy of science and the method of doing science of research groups and institutions through the collection of useful resources and teaching materials.
The DSK is structured in four broad categories of interest: Open Science, Computer Science Toolbox, Policy and Legal Issues and Stewarding the Data Lifetime. They are described below, so that you may be aware of the overarching structure of the DSK.
Open Science
The profession of Data Steward, and the concept of meaningful, useful data stewardship for the benefit of the community, are the culmination of years of Open Science philosophy. This section aims to explore the aspects of Open Science, in particular in the context of data management. It covers topics such as:
- What Open Science is;
- Why Open Science is the right direction for researchers and research institutions to take;
- What could go wrong if Open Science is implemented badly;
- What Data Stewards do in the context of Open Science;
- How to efficiently teach Open Science concepts to others;
- Why data and data stewardship matter so much for Open Science;
- Why a third party (like a researcher) might be interested in implementing Open Science and Data Stewardship policies.
Computer Science Toolbox
In the modern day, data is almost always manipulated digitally in some form. Even physical objects might be listed in a digital index, or scanned and digitized altogether. For this reason, a Data Steward has to have some computer science knowledge and a toolbox of digital hammers and wrenches which are useful when dealing with digital data. This section covers topics such as:
- What digital data is;
- How digital data is encoded, transmitted and shared with others;
- What formats are available to save data in;
- What metadata is and which formats are available to represent it;
- What data infrastructures are and how to manage them (as potential administrators);
- Technologies to manipulate, reshape, fuse and split data;
- Determination of costs related to data management (e.g. storage and computing power);
- Knowledge of relevant tools that can be used to obtain, reshape, reuse, manipulate and share data throughout a research project.
Policy and Legal issues
The administration of data, especially personal data, may be subject (or should be subjected) to laws. This section aims to aggregate such concepts and make a data steward both aware of them and capable of dealing with them. It covers topics such as:
- National and International privacy laws regarding personal data;
- Legal issues when reusing others' code and data;
- Ethical concerns of releasing, reusing and otherwise manipulating data;
- Determining the ethical and legal risks related to handling specific types of data;
- How to give recognition when reusing a piece of data produced by others;
- Creation of effective Open Science policies and plans of action for groups and organizations;
- Fulfilling the Open Science/Data Stewardship requirements of funding bodies that have them (e.g. Data Management Plans, DMPs);
- The soft skills required for effective management and administration of an organization interested in implementing data stewardship practices;
Stewarding the data lifetime
The most expansive and heterogeneous section, "Stewarding the data lifetime" deals with the philosophical, practical and technical aspects of data stewardship, from the planning of data collection, to the manipulation of fresh data, to its potential deletion or archival. This section is heavily context-specific: ideas that might apply to data in the context of biological science might not be relevant to architectural studies, and vice versa. This section covers many topics, and some examples include:
- How to plan data collection, even at large scales and with many data collection partners;
- Determining when, where and how to store newly created data;
- Defining and measuring data quality for specific data types in specific contexts;
- Designing and implementing data curation procedures, from collection to archival;
- Solving the discard problem and defining methods and formats of long to very-long term preservation of archive data;
- Determining the best methods of reusing published data to limit useless expenditures, with particular regard to ascertaining data quality and usefulness for the purpose.
Contributing
Thank you for wanting to contribute! Before contributing, please read the contributing guide in the GitHub repository of the project.
After you are familiar with how to contribute, you can use the edit icon in the top-right of each page to edit that page directly on GitHub and open a pull request with your change.
All contributions are treasured. You can find a list of all contributors on the contributors page. Thank you to all these wonderful people!
Core competence of Data Stewards
Data stewardship and the position of Data Steward (DS) are relatively recent (circa 2017). Therefore, the "core competences" of DSs - what DSs do and what they know - are still being defined.
In this report, the FAIRsFAIR consortium analysed job offers and other similar resources to generate a competence framework for DSs.
These competences are reported here, with some modifications, from the above document.
note
Further work on this page will link core competences to relevant pages in the Data Stewardship knowledgebase.
Data Management
"Data Management" is an umbrella term covering all aspects of working with data, similar to "data handling". Many of these concepts also fall under the broad term "🔰 data curation".
- Develop and implement strategies for:
- Data collection;
- Data storage;
- Data preservation;
- Ensuring data is compliant with FAIR principles.
- Create Data Management Plans and Data governance policies, which are aligned with best practices in the field.
- Know and use relevant data and metadata types and formats, as well as use and develop common standards for data and metadata.
- Be familiar with, develop and use metadata management tools.
- Ensure recording of data provenance, including creation and manipulation, also through data publishing.
- Develop and implement strategies for long-term data archival, including:
- Data archival policies that comply with open science principles, open access policies and best practices for interoperability;
- Archival of metadata, with specific emphasis on data provenance;
- Policies for long term data accessibility and assurance of data integrity;
- Estimation of long-term data archival costs.
- Develop policies and methods to measure data quality and ensure compliance with community standards, also in coordination with data owners;
- Develop, implement and supervise policies on data protection, especially when sharing data, including:
- Compliance with data privacy laws such as the GDPR;
- Ethical issues;
- Address legal issues if necessary;
- Digital data security and integrity, with respect to malicious data access, theft and tampering;
- Collaborate with other Data Stewards and manage a team of Data Stewards;
- Coordinate data-related activities between departments and between departments and external collaborators in accordance with local and foreign data policies;
- Define domain-specific data management requirements, and supervise their development, also in collaboration with other departments.
- Coordinate and supervise data acquisition.
- Develop policies for the implementation of open science principles, including FAIR data;
- Define, develop and supervise required infrastructure for data management and archival;
- Provide tools, guidance and training to other experts that deal with data (e.g. researchers).
Data Engineering
"Data engineering" encompasses actual technologies that deal with data: collecting, analysing, transferring, storing and sharing it.
- Be familiar with modern computer science technologies, specifically to:
- Design and implement data analytics applications;
- Design and develop experiments, processes and infrastructure for data handling during the whole data lifecycle, including:
- Data collection;
- Data storage;
- Data cleaning (munging);
- Data analysis;
- Data visualization;
- Data archival;
- Develop and prototype specialised data handling procedures for specific needs.
- Develop and manage infrastructure for data handling and analysis, with emphasis on big data, data streaming and batch processing, while ensuring provenance and FAIRness.
- Develop, deploy and operate data infrastructure, including data storage, while following data management policies, with specific attention to the implementation of FAIR principles.
- Apply data security mechanisms throughout the data lifecycle, including designing and implementing data access policies for different stakeholders.
- Design, build and operate SQL and NoSQL databases, with particular attention to data models (structure), consistent metadata, data vocabularies and data accessibility.
- Develop and implement policies and methodologies for data reuse, interoperability and integration of local (i.e. of the organization) and external data.
Research methods and Project management
Data stewards need to work closely with researchers and other experts before, during and after research projects. It is therefore important to have competences in research management and, more broadly, project management. Some of these concepts might seem obvious and broad to people who have a research background, but this might not be the case for people from other backgrounds.
- Create new knowledge (i.e. concepts, understandings, relationships and capabilities) through the scientific method based on scientific facts and data;
- Discover new approaches to achieve research goals, also through the re-usage of available (FAIR) data and software.
- Use available domain-related knowledge to generate novel sound hypotheses;
- Inspect and periodically audit the research process, with specific regard to quality (i.e. integrity, soundness and usefulness), openness and inclusivity.
- Design, develop and supervise data-driven projects, which include:
- Project planning;
- Experimental design, also in conjunction with experts in data science and data infrastructure, and with other data stewards;
- Data collection;
- Data handling.
Domain-specific competences
Each research domain works with wildly different data types, formats and sources. This means that each domain requires a different set of competences. This section tries to outline in which contexts this domain-specific knowledge has to be taken into account.
- Use and adapt general Data Science methods to domain-specific issues, such as:
- Data types;
- Data presentations;
- Organizational roles and relations;
- Analyse, collect and assess data to achieve organizational goals, such as quality assurance of the organizational system;
- Identify and monitor performance indicators to identify and assess potential organizational challenges and needs. Specify data models, transparency policies and handling procedures for such performance indicators.
- Monitor and analyse indicators to identify current trends and potential future developments in local adoption of policies, methods, tools and other areas related to data management, FAIR implementation and open science. Ensure transparency of the process;
- Coordinate organization-level activities between different domains related to data management, provenance and analytics, with particular focus on data FAIRness throughout the data lifecycle.
Emoji Key
Many links are tagged with emojis. Here's what they mean:
- Type of content:
- ➰ > A link to another page in the DSK.
- 💬 > Opinion piece, presentation, blog post or other content by an individual or organization.
- 📰 > News article, editorial or piece by a journalist.
- 🏢 > Official communication from an organization, oftentimes an institutional (i.e. Government-backed) organization.
- 🧑‍⚖️ > Text of a law or other binding document currently in force in one or more countries. Add the ❌ emoji if the law is no longer in effect.
- 📑 > Published research article or review in a canonical peer-reviewed journal, or similar (e.g. ongoing open peer review).
- 📄 > Preprint in a preprint server.
- 📃 > Poster or other vignette.
- 📕 > Book or long-form report.
- 💁 > Presentation to a meeting, conference, etc.
- 📝 > Official agreement, treatise or manifesto of purpose with no legally binding effects published by an organization or group of organizations.
- 🔨 > Tool, practical resource, checklist or handbook.
- Format of content:
- The default format is a simple webpage (HTML), and has no associated emoji.
- 🔻 > PDF (.pdf).
- 🔸 > Presentation (e.g. .pptx).
- ▶️ > Video or other multimedia formats.
- Language:
- The default language is English, and has no associated emoji.
- 🇮🇹 > Italian.
- 🇫🇷 > French.
- 🇪🇸 > Spanish.
- Accessibility:
- The default accessibility is unrestricted (e.g. an Open Access paper), and has no associated emoji. Such resources should be freely perusable without any expense by the user (other than a computer, electricity and a web connection, obviously).
- 🔒 > This resource is paywalled, requires a login or is not publicly and freely available due to other reasons.
- 🔐 > This resource requires a login or registration in order to provide its services, but it is otherwise free to use or read.
- Content quality or usability:
- 🔰 > Easy to use, understand or in general a beginner-friendly resource.
- ⭐ > This resource is particularly important or fundamental for a topic.
- ❌ > Retracted, false or misleading information.
- Other:
- 🍪 > This website requires the usage of cookies.
- 📥 > This link immediately downloads a file to the user's computer.
- ⚫ > This link has been screened, but no other emoji tags apply.
Not all links are fully tagged. Please consider contributing if you find an error or an omission.
Open Science
The profession of Data Steward, and the concept of meaningful, useful data stewardship for the benefit of the community, are the culmination of years of Open Science philosophy. This section aims to explore the aspects of Open Science, in particular in the context of data management. It covers topics such as:
- What Open Science is;
- Why Open Science is the right direction for researchers and research institutions to take;
- What could go wrong if Open Science is implemented badly;
- What Data Stewards do in the context of Open Science;
- How to efficiently teach Open Science concepts to others;
- Why data and data stewardship matter so much for Open Science;
- Why a third party (like a researcher) might be interested in implementing Open Science and Data Stewardship policies.
What is Open Science?
definition
Open science is a set of principles and practices that aim to make scientific research from all fields accessible to everyone for the benefits of scientists and society as a whole. Open science is about making sure not only that scientific knowledge is accessible but also that the production of that knowledge itself is inclusive, equitable and sustainable.
- 🏢 ⭐ UNESCO definition of Open Science
- UNESCO 🔻 🏢 Recommendations for Open Science
- 🏢 🔻 📥 ⭐ Strategic Research and Innovation Agenda: Critical success factors for Open Science in Europe.
- See section 1.3 for the definition of Open Science and some historical facts.
- 🏢 🔻 📕 UNESCO - Open Science Outlook 1.
- This is a very long document (74 pages) on the status of Open Science in 2023, but it has a "Key Messages" section that summarizes it. These include the benefits of Open Science, how to achieve its goals, how it has grown and what it needs to grow further.
- 🏢 🔻 📥 Horizon Europe's application template with a section on Open Science practices
- Under the methodology section, the grant specifies that applicants should "Describe how appropriate open science practices are implemented as an integral part of the proposed methodology. Show how the choice of practices and their implementation are adapted to the nature of your work, in a way that will increase the chances of the project delivering on its objectives [e.g. 1 page]. If you believe that none of these practices are appropriate for your project, please provide a justification here."
- 💬 🇮🇹 Elena Giglia - Open Science è una necessità, non una noia burocratica
- An overview article about Open Science, scholarly publishing and the importance of making research accessible to everyone, also in light of the COVID-19 pandemic.
- 💬 💁 🔻 Dr. Jon Tennant - Open Science is just Good Science
- Tennant touches on what Open Science is, its benefits, and how to put it into practice.
History of Open Science
This section covers the history of Open Science, from its inception, to crucial events in its history, to the current day.
The Open Movement in Europe
Open Science has strong backing from the European Commission:
- 🏢 📕 European research area policy agenda Years 2022/2024
- Neelie Kroes, former European Commissioner for Competition and Digital Agenda - 💬 ▶️ Let's make science Open
Open Science and Covid-19
The Covid-19 pandemic has highlighted the importance of Open Science. This section includes resources that discuss how Open Science has helped in the fight against Covid-19 and how it went wrong in some cases.
- 🍪 📑 Open science approaches to COVID-19
- "In response [to COVID-19], researchers have adopted open science methods to begin to combat this disease via global collaborative efforts. We summarise here some of those initiatives, and have created an updateable list to which others may be added."
- 📑 OECD - Mobilizing Science in times of crisis
- What practices were used during the pandemic, how they worked, and the work needed in the future to make science more open and collaborative.
- 📕 🔻 The State of Open Data 2021
- "Open Data saves lives".
- A report on open data, including sections on the state of open data, its role in the life sciences, how open data can combat fraud, and how to engage researchers in its creation and use.
- 📰 Collaboration in the times of COVID
- 💬 All prints should be preprints
- An opinion piece on how the pandemic has shown the importance of preprints in scientific communication, as they immediately share knowledge with the community and the wider public. It also highlights some misconceptions about preprints.
- 💬 💁 ▶️ Robert Terry - Implications of the pandemic for publications
- 📰 Calling all coronavirus researchers: stay open
- A controversial editorial from Springer-Nature (which is a for-profit, closed publisher) that calls for researchers to share their data and findings openly during the pandemic.
- 💬 🍪 Don't lockdown research results
- 📰 STM makes open all coronavirus research for the duration of the outbreak.
- 💬 The purpose of publications in a pandemic and beyond
Open Science Organizations
This page collects some information about open science organizations together with a brief description, their motives and goals, and the services they offer.
Coalition S
cOAlition S is an organization built around "Plan S", a commitment to make all articles written on publicly-funded research Open Access, effective immediately. You can read more on the cOAlition S about page and on 📝 Plan S.
- The 🔻 🏢 Coalition S preamble is the founding document of the coalition, with all considerations made when creating it plus its goals.
COARA
COARA, the Coalition for Advancing Research Assessment, is an organization striving to reform the methods for research assessment in accordance with ➰ Open Science principles.
In particular, they aim to find methods to reward all types of research outputs, not only publications and patents.
COARA is a coordinated group effort divided in 📰 COARA National Chapters and 📰 COARA Working groups. The COARA Website is the access point of all resources for the COARA initiative.
- Stakeholders may sign the "⭐ 📝 COARA - Full text agreement" in order to participate. It defines concrete goals, such as the creation of action plans.
- What are Action Plans and published COARA action plans on Zenodo
- 📝 🔻 COARA - See Annex 4 of the full text for useful practical tools and options to consider when dealing with research assessment: Annex 4 of the full text agreement
- A list of all signatories of the agreement is available on the signatories page of the COARA website
- Origins of COARA:
- The COARA founding document by the European Commission: 🏢 📕 European Commission report - 2021 - Towards a reform of the research assessment system
- 🏢 📕 2021 - Outcomes of proceedings, on research evaluation;
- 🏢 📕 2018 - Commission Recommendations on access and preservation of scientific information
- 🏢 📕 2022 - Outcomes of proceedings, on research evaluation;
COARA and the force behind it have produced some changes:
- 📰 The ERC abandons the Impact Factor
- 📝 Making FAIReR assessments possible
- 🍪 📰 University of Utrecht rejects current university ranking standards
- 📰 Impact factor abandoned by Dutch university in hiring and promotion decisions
- 📰 Sorbonne ditches WOS
- 📰 DORA case studies on the implementation of alternative metrics
- 🔨 Reformscape, an overview of ongoing changes and policies in research evaluation. Works a bit like a search engine.
Alternative metric sources, detached from canonical publishers and publishing in general, are crucial for COARA. Here are a few tools and resources built for that purpose:
- 🔨 Leiden Open Ranking, for the scientific performance of universities, publication-centric.
- 🔨 OpenAlex, a publication index run by a non-profit organization. It is an open, transparent alternative to Elsevier's Scopus and Clarivate's Web of Science.
- 🔨 OpenCitations provides a public, free index of citations (which papers cite which) for bibliometric and research purposes. It is chiefly useful for other applications that query its underlying knowledge graph (e.g. through SPARQL).
- 💬 The benefits of Open science are not inevitable: monitoring its development should be value-led
- 📕 ⭐ 🔻 OPUS - Reforming research assessment, on alternative indicators and metrics of researcher performance
- ⚫ PathOS and the 🔨 PathOS indicator handbook
- 📕 🏢 🔻 European Commission - Indicator frameworks for fostering open knowledge practices in science and scholarship
Miscellaneous resources on the reform of research evaluation:
- ▶️ 💁 🇮🇹 Open Science Cafe' - Riforma della valutazione della ricerca
- 📄 What we talk about when we talk about research quality. A discussion on responsible research assessment and Open Science
- 📰 Revisiting the metric tide
- 🇮🇹 💬 Qualita' o formalita'?, an Italian document on the European research assessment reform.
Scientific Communication
This section deals with scientific communication. In particular, it focuses on the role of publishers, how the publishing industry has changed over the years, and what new opportunities are available for researchers in the modern era.
- 🍪 💬 How to reclaim ownership of scholarly publishing
- Source of the quote "“I chose to study science because I wanted to publish in Nature,” said no undergraduate student ever."
- A piece about how the current publishing system is not working for researchers.
- 📝 SIMBA - Value of global scientific publishing
- Scientific publishing is a 12 billion dollar industry (in 2022).
- 🍪 📑 Roosendaal H., Geurts P. - Forces and functions in scientific communication: an analysis of their interplay, CRISP 1997
- An overview of the history of scientific communication, what functions it serves, and future prospects of its form in the digital age.
- 💬 Jean-Claude Guedon - Scholarly Communication and Scholarly Publishing
- How scholarly communication and scholarly publishing have diverged over the years, how scholarly publishing in the digital age currently is and how it could become.
- 📃 101 innovations in scholarly publishing
- 🍪 💬 Open science needs no martyrs
- An interview with Toma Susi, of the University of Vienna, touching on the need for reform in the publishing industry and arguing that no one should pay for it in terms of career progression.
- 📑 🔻 Disrupting the subscription journal's business model for the necessary large-scale transformation to open access
- "This paper makes the strong, fact-based case for a large-scale transformation of the current corpus of scientific subscription journals to an open access business model."
- 📰 Costs, benefits of making all articles free to read, the stance of publishers.
- This is a controversial piece. On the one hand, it provides an overview of open access, Plan S, and cOAlition S, but on the other hand it shows some of the arguments against open access. Unsure of what it should be tagged with.
- 💬 ACS and author's right retention, 📰 ACS news on Green Open Access and 💬 COAR's response to the ACS news
- These articles discuss the American Chemical Society's new policy on Green Open Access, which requires authors to pay a fee to deposit their work in a repository.
- 💬 Jeff Pooley - Large Language Publishing, how LLMs are trained on journal data.
The case of Elsevier
These resources discuss in particular the publisher Elsevier, as a case study.
- 💬 Publisher control of all scholarly infrastructure
- How publishing groups have started to control all aspects of research output: from planning research questions, to literature review, to data collection, to peer review, to publication, to dissemination.
- 📑 Jefferson Pooley - Surveillance Publishing
- "This essay develops the idea of surveillance publishing, with special attention to the example of Elsevier. A scholarly publisher can be defined as a surveillance publisher if it derives a substantial proportion of its revenue from prediction products, fueled by data extracted from researcher behavior."
- Navigating Risk in vendor data privacy practices, an analysis of Elsevier's ScienceDirect
- 📝 SPARC's 2021 Update
- SPARC is "a non-profit advocacy organization that supports systems for research and education that are open by default and equitable by design." (https://sparcopen.org/who-we-are/). This document "[...] suggests organizational changes in academic institutions to both (1) manage increasing strategic and ethical challenges and (2) deploy hammers and analyze data to better understand the needs and protect the interests of individuals and communities."
- 📝 📥 🔻 Direct PDF Link
- 📰 💬 Sci-hub, Elsevier and Wiley declare war on research communities in India
Alternatives to traditional publishing
- 📑 Principles of the self-journal of Science: bringing ethics and freedom to scientific publishing
- This article discusses the principles of the self-journal of Science, a new publishing model that aims to bring ethics and freedom to scientific publishing.
- 💬 Ten ways to find open access articles and 💬 alternative ways to access journal articles
Open Access
This section includes resources specifically about Open Access.
- 🏢 Berlin declaration on Open Access
- The founding document of the Open Access movement, it delineates the requirement to move away from paywalled content in the era of the internet towards Open Access. It defines what Open Access is, and how to support the transition to the open paradigm.
- 🍪 ScienceOpen - Open Access Survey results
- A survey of 60 researchers about Open Access. The low number of respondents makes the results not very reliable.
- The sampling strategy is also not clear. This may have been a convenience sample of people who participated in a ScienceOpen event, making the results not generalizable.
- 📑 Shift academic culture through publication, an article discussing how exploitative publishers are a problem, especially because they discriminate against poorer researchers.
- European Commission - 🏢 🔻 Study of scientific publishing in Europe (2024), on the state of scientific publishing in Europe, including publishing costs.
- 🇫🇷 🏢 Barometer of Open Science, data on the progressive shift to open publishing practices in France.
- DOAJ - 🔨 Open Access Journal repository
- Open Science Cafè - 🇮🇹 💁 Attività europee per l'open access
Sherpa helps authors decide where to publish, including services that compile what their rights are after publication. See 🔨 About Sherpa for an overview:
- 🔨 Sherpa Romeo: what are the archiving policies of different journal publishers? An author can go here to learn how to open up their articles, even when publishing in a closed-access journal.
- 🔨 Sherpa Juliet: what are the publishing requirements of funding agencies? Authors can check the publishing requirements based on who funds their research.
- 🔨 Sherpa Fact: combining data from Romeo and Juliet, it shows if journals are compliant with best publishing practices.
Some universities provide open access publishing services. An example is 🇮🇹 Sirio, for the University of Turin.
So-called "hybrid journals" provide both open access and closed access articles. They are 🏢 generally regarded as bad for open access.
Preprints
A Preprint is an article ready to be sent for peer review. Such versions of the articles 📑 usually differ little from their peer-reviewed counterparts, and are therefore a valid open alternative to reading regular articles.
The coronavirus pandemic required immediate action. Preprints were essential for this, as they provided immediate knowledge to the public.
- 📑 Tracking changes between preprints and postprints during the coronavirus highlights how little changes between preprint and published article during peer review.
Talking points
This section includes resources that discuss the importance of Open Science to a wider audience, including anecdotes, examples, stories from researchers, comics, etc. They can be useful to introduce Open Science during talks, presentations and conferences.
- 🍪 📄 Valid reasons not to participate in open science practices
- There are no reasons. This is a joke paper.
- Dr. Glaucomflecken's videos on 🍪 ▶️ Academic Publishing and 🍪 ▶️ How to publish a manuscript
- These videos are a parody of the academic publishing system, highlighting some of its absurdities.
- 💬 Defence against the dark arts: a proposal for a new MSc course
Publishers can be very protective of the published data: it makes them a lot of money. See, for instance, the case of ResearchGate v publishers, ResearchGate bows to publishers and the ResearchGate announcement on the topic.
Reproducibility Crisis
The reproducibility crisis we are experiencing in many research areas has highlighted the importance of Open Science. This section includes resources that discuss the reproducibility crisis and how Open Science can help alleviate it.
Computer Science toolbox
In the modern day, data is almost always manipulated digitally in some form. Even physical objects might be listed in a digital index, or scanned and digitized altogether. For this reason, a Data Steward needs some computer science knowledge and a toolbox of digital hammers and wrenches that are useful when dealing with digital data. This section covers topics such as:
- What digital data is;
- How digital data is encoded, transmitted and shared with others;
- What formats are available to save data in;
- What metadata is and which formats are available to represent it;
- What data infrastructures are and how to manage them (as potential administrators);
- Technologies to manipulate, reshape, fuse and split data;
- Determination of costs related to data management (e.g. storage and computing power);
- Knowledge of relevant tools that can be used to obtain, reshape, reuse, manipulate and share data throughout a research project.
important
This section is heavily under construction.
Basics of computer science
- Files and filesystems
- Basics of the internet and shared computing
Programming languages
- What are programming languages?
- Python
Data Structures, serialization and storage
- Basic data structures and types
- Serialization and Deserialization
- Compression
AI and Machine Learning
- 🏢 📰 Use of generative AI in research guidelines by the European Research Area Forum
- 💬 Clearbox - AI Apocalypse, what to really worry about, garbage in, garbage out.
- 💬 Revelate - Bad data could ruin your AI dreams
Policy and legal issues
The administration of data, especially personal data, may be subject (or should be subjected) to laws. This section aims to aggregate such concepts and make a data steward both aware of them and capable of dealing with them. It covers topics such as:
- National and International privacy laws regarding personal data;
- Legal issues when reusing others' code and data;
- Ethical concerns of releasing, reusing and otherwise manipulating data;
- Determining the ethical and legal risks related to handling specific types of data;
- How to give recognition when reusing a piece of data produced by others;
- Creation of effective Open Science policies and plans of action for groups and organizations;
- Fulfilling Open Science/Data Stewardship requirements for funding bodies that require them (e.g. DMPs);
- The soft skills required for effective management and administration of an organization interested in implementing data stewardship practices.
Intellectual Property Rights
- 🏢 🔻 Congressional Research Service report on generative artificial intelligence and intellectual property rights.
- 💬 🇮🇹 Simone Aliprandi, Intelligenza artificiale e creazioni “sintetiche”: le intricate questioni di diritto d’autore
- 💬 🔒 🍪 Ben Lorica - The future of creativity
- 🏢 📰 Artificial Intelligence Act by the European Parliament
Stewarding the data lifetime
The most expansive and heterogeneous section, "Stewarding the data lifetime" deals with the philosophical, practical and technical aspects of data stewardship, from the planning of data collection, to the manipulation of fresh data, to its potential deletion or archival. This section is heavily context-specific: ideas that might apply to data in the context of biological science might not be relevant to architectural studies, and vice versa. This section covers many topics, and some examples include:
- How to plan data collection, even at large scales and with many data collection partners;
- Determining when, where and how to store newly created data;
- Defining and measuring data quality for specific data types in specific contexts;
- Designing and implementing data curation procedures, from collection to archival;
- Solving the discard problem and defining methods and formats for long- to very-long-term preservation of archived data;
- Determining the best methods of reusing published data to limit useless expenditures, with particular regard to ascertaining data quality and usefulness for the purpose.
Contributors to the Data Stewardship Knowledgebase
Meaningful contributors to the project will be listed here.
List of maintainers
This is a list of currently active maintainers for the Data Stewardship Knowledgebase, in no particular order. They are responsible for reviewing and merging pull requests, as well as generally maintaining the repository and administering the public spaces of the project:
- MrHedmad - E-mail luca.visentin(at)unito.it, Discord @MrHedmad.
All contributors
This is a list of all contributors to the project. Thanks to all these amazing people!