Bioinformatics Core Service
Data and Resources
- Web site containing details about the Bioinformatics Core service
Additional Info
Field | Value |
---|---|
Access Contact | bioinformatics@umich.edu |
Access mechanism | omitted |
Access protocol | All data sets are stored in separate project folders, which are shared across all Core personnel. This is a convention, and it can be adjusted for data with stricter security needs. Very occasionally, we need to do some analysis on a local workstation (e.g., a visualization). Project folders are internal to the Core; if we need to exchange data with the PI, we set up a separate collaboration space and move data in and out of it in a dedicated fashion. A collaboration space is a shared directory between the Core and the research project personnel, so collaboration spaces are a little more private. |
Attribution Citation | N/A |
Business processes involved | various research processes |
Collection mechanism(s) | Data comes in through the sequencing core or the PI; we analyze it, freeze the output, and then keep it for some length of time. Annotations are currently downloaded on an ad hoc basis; we are looking to move them onto a schedule. |
Content Exposure | Up until now, we have not captured data in a longitudinal database. It is left to the PIs to decide what to share and what not to share; the PIs wholly determine what to make available as appropriate, and we do not do any of this for them. Down the road, with the possibility of secondary use, we want to figure out how to pose those questions to the PI and how to make the case for secondary use. We could make that case by offering to host the physical infrastructure for the data, both for the PI and for NIH- (or other funder-) mandated publication purposes; that would be a win-win. |
Data Manager(s) | none at this time |
Data Provided By | BRCF |
Data Steward(s) | none at this time |
Data collection format(s) | almost entirely flat files, plus 1-2 databases (annotation sources that we import) |
Data element definitions | Candidate elements include annotations, experimental axes, and all associated detail. It would be good to set up a data dictionary for this information. There is also metadata describing how the machine was set up for each run. Experiments can be run on a whole-genome, exome, or panel basis. |
Data lineage / mapping from other sources | Three feeds: (1) the sequencing core (typically the DNA Core); (2) whatever sample metadata we receive from the PI; (3) reference data (genome sequence, annotation data). We could capture more lineage on the reference data, but we generally don't. |
Data profile | |
General guidelines for ability to access | omitted |
Grain of data collection | some combination of Gene, Locus (smaller), Sample (larger) |
Higher-level data models (logical and/or conceptual) | Derived data sets: the DNA sequencing core deals with primary data; we deal with secondary analyses (grooming and cleansing; fairly consistent) and tertiary analyses (sense-making; widely variable). |
How is this data used? | We combine DNA sequence data with reference data to identify anomalies and annotate their impact. "DNA" here is shorthand for five different types of experiments. |
Initial Creation Date | on a per-service basis |
Initial use cases/motivations for data collection | We create derived artifacts that assist researchers in understanding the DNA sequencing information for their biological samples. |
Last Modified Date | on a per-service basis |
Physical data infrastructure | omitted |
Physical data model, reverse-engineered | almost entirely in the file system; the data sets themselves are fairly siloed files, so we can change their structure easily, but there is no longitudinal analysis |
Primary Users / Customers | Research PIs |
Regulatory and other classifications | at this point genomic information is not classified as PHI and we try not to capture any PHI |
Retention schedule | We have a tentative data lifecycle model that would give guidance to stakeholders, but at this point it is just an idea. We have the infrastructure to implement it, including storage for about 1-2 years of data, but we don't really want to be in the business of serving as the long-term archive for investigators. One idea is to keep data for (tentatively) a year after we finish the analysis, then reach out to the investigators and tell them that if they want the data, they should take it now, since after a set time we will either archive or delete it. Certain investigators (~5% of the time) require some physical infrastructure: a pipeline that allows the data to be accessed and leveraged for some time after our work with it is done. This is not part of our core capability, but it is worth noting. |
Roles / persons involved in collection | BRCF |
Schedule for known changes to Access Conditions | none |
Standards utilized | none |
Storage Format | text, binary |
Subject Matter Expert(s) | the team of bioinformatics analysts |
Time Frame Data Collected For | on a per-service basis |
Update Schedule | on a per-service basis |