How to Make Science More Reliable: Tips for Managing Data

July 1, 2016

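Nearly anyone who has worked in research is familiar with the frustrating scenario: a postdoc leaves for another job, taking all kinds of valuable knowledge with them. It’s become loud and clear that results from many published scientific studies are unreliable. While ethical breaches such as fraud clearly contribute to the problem, so does a seemingly more benign and more common issue: poor data management. Darell Schmick, research librarian at the Eccles Health Sciences Library at the University of Utah, describes scenarios that can arise from poor data management, and ways to overcome them. Learn more at the upcoming Research Reproducibility Conference.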

Transcript

Interviewer: What's becoming loud and clear is that most scientific studies are, well, unreliable. But scientists can get help. We'll talk about that next on The Scope.

Announcer: Examining the latest research, and telling you about the latest breakthroughs. The Science and Research Show is on The Scope.

Interviewer: I'm talking with Darell Schmick, research librarian at the Eccles Health Sciences Library at the University of Utah. So there are a number of reasons for what some people are calling a research reproducibility crisis, including fraud. But even scientists with the best intentions are at risk for doing sloppy work, and there are a lot of reasons for that as well. One of the things you're interested in is data management. I really like this example of the postdoc leaving the lab.

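Schmick: So it's an age-old issue where you do a lot of work and it happens to be on your personal computer. It happens to be in a folder with poorly named files, and you produce a bunch of research on behalf of the institution, but then, once it's done, you obviously take your computer with you, right, and move on to the next position. This seemingly innocuous issue actually has a lot of implications, though. There are ownership questions about the data you produce on behalf of the university. Is it the postdoc's? Is it the university's?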

Interviewer: How does that lead to problems with research reproducibility?

Schmick: So if the postdoc takes all that data with him or her and it hasn't been saved into the department bio or anything like that, how can you ensure that you have records of all the work that postdoc has done? They could have taken just a little bit of it, they could have taken a substantial chunk of it. And it really leaves the PI, as well as the rest of the members of the lab, at a potentially significant disadvantage.

Interviewer: So a lot of knowledge could be lost?

Schmick: Absolutely.

Interviewer: So what are some ways to avoid that?

Schmick: We do teach a research administration training class that talks about just the basic fundamentals of just good data management, which involves things like where to properly back up your files and how often to do that. Myself and a couple of the librarians on campus have been working on a pilot for electronic lab notebook technology.

So if, for example, you happen to have the perennial issue of lab members recording data on their own personal computers because it's inconvenient to share it, this sort of technology allows a lab to share a collaborative notebook, something that has all those questions that we're talking about answered, such as how often is it backed up? Will it be backed up in a trusted source?

Interviewer: Right. Well, and not to mention that most lab notebooks that I've seen are kind of a disaster.

Schmick: Are you saying that not all scientists have amazing handwriting?

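Interviewer: Exactly. Right, we're all human. Even the way information is recorded varies from person to person. So the idea is that this would become more standardized?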

Schmick: You hit the nail on the head.

Interviewer: Are there any other ways to make data more accessible or more reliable?

Schmick: When we're talking about optimizing the mechanics of anything in the research process, we want to ensure that we're doing it in a way that is not only accessible by us because we can look at our notes, presumably, and be able to understand what we're saying, understand what we're recording, understand what we encountered in the process. But we also want to think about how the results that you're producing are going to be read by somebody else who's not in that same context. If you're doing an experiment, you're doing it for yourself, but you also ensure that you're doing it in a way that if somebody wants to reproduce that experiment, they can do so.

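Interviewer: You've talked about ways to keep information within a lab group, for example, a research team. What about sharing information more broadly with the scientific community?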

Schmick: Good question, Julie. And a lot of people think that all you are able to really produce is that end product, that finished article. We don't realize, a lot of times, when we're doing experiments, when we're producing this data, that this data could be useful data. It could be good information and good intel for another scientist that's stumbling across that same issue.

If you're embarking on answering a research question and you come across a dataset that has already sought to ask that question, and you find out that maybe those results weren't satisfactory enough to turn into a finished article, that could save you years that you would otherwise spend reinventing the wheel.

So another thing that we like to talk about is the idea of ensuring that researchers know that the data that they produce is of value and there are places that you can store that. One example that comes to mind is figshare. And figshare is a repository where you can actually assign a DOI to the data sets that you're uploading on there. Figshare are all about open science, so they say as long as you're making it public, I mean, you can upload it for free.

面试官: So how can sharing data with the scientific community help with research reproducibility?

Schmick: There's a lot of news as of late in the way of that openness toward science where folks on a peer review panel want to see the steps you were able to take in order to draw the conclusions that you were able to take or able to make. And if they're able to go ahead and see that data right there from the start, it answers all of those questions.

It's when that data is being withheld that it presents a larger problem, not only for you as author but with greater implications for science in general. When we start to withhold that data, when we start to conceal certain steps in that recipe toward what we ended up with as the final product, it leads down a slippery slope toward what is not open science, that closed scholarly environment. And open science, I think, is something worth fighting for.

Interviewer: Where can people go to learn more about best practices in data management?

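Schmick: There are a lot of places. If you're embarking on a data management plan there's a great tool called DMPTool. It's dmptool.org, and it takes you through every step of the process: "Let us walk you through the steps of the data management process." It does that by asking you, in a 20 questions format, "What's this data going to be for? Which agencies are going to see this data?" And at the end it gives you recommendations. If you're on the University of Utah campus, I'd encourage you to talk to me or one of our other fine staff at Eccles Health Sciences Library. I'd be happy to answer any questions about data management plans.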

Announcer: Interesting, informative, and all in the name of better health. This is The Scope Health Sciences Radio.