Bioinformatics Core Service

Data that comprises the bioinformatics core service

Data and Resources

Web site containing details about the Bioinformatics Core service

Additional Info

Field	Value
Access Contact	bioinformatics@umich.edu
Access mechanism	omitted
Access protocol	All data sets go into separate project folders that are, by convention, shared across all personnel in the Core; the convention has some flexibility for projects with stricter security needs. Very occasionally we need to run an analysis (e.g. a visualization) on a local workstation. Project folders are internal to the Core. If we need to exchange data with the PI, we set up a separate collaboration space, a directory shared between the Core and the research project personnel, and move data in and out of it in a dedicated fashion. Collaboration spaces are therefore somewhat more private.
Attribution Citation	N/A
Business processes involved	various research processes
Collection mechanism(s)	Data comes in through the sequencing core or the PI; we analyze it, freeze the output, and keep it for some length of time. Annotations are currently downloaded on an ad hoc basis; we are looking to move those to a scheduled basis.
Content Exposure	Until now we have not captured data in a longitudinal database. It is left to the PIs to decide what to share and what not to share; the PIs wholly determine what to make available, as appropriate, and we do not do any of this for them. Down the road, with the possibility of secondary use, we want to figure out how to pose those questions to the PI and how to make the case for secondary use. One way to make the case would be to offer to host the physical infrastructure for the data, both for the PI's own use and for their NIH-mandated (or similarly mandated) publication purposes; that would be a win-win.
Data Manager(s)	none at this time
Data Provided By	BRCF
Data Steward(s)	none at this time
Data collection format(s)	almost entirely flat files, 1-2 databases (annotation sources that we import)
Data element definitions	Candidates include annotations, axes of experiments, and all associated detail. It would be good to set up a Data Dictionary for this information. There is also metadata describing how the machine was set up for each run. Experiments are run on a whole-genome, exome, or panel basis.
Data lineage / mapping from other sources	Three feeds: the sequencing core (typically the DNA Core); whatever sample metadata we get from the PI; and reference data (genome sequence, annotation data). We could capture more lineage on the reference data, but we generally don't.
Data profile
General guidelines for ability to access	omitted
Grain of data collection	some combination of Gene, Locus (smaller), Sample (larger)
Higher-level data models (logical and/or conceptual)	Derived data sets: the DNA sequencing core deals with primary data; we deal with secondary analyses (grooming and cleansing, which are consistent) and tertiary analyses (sense-making, which is widely variable).
How is this data used?	We combine DNA sequence with reference data to identify anomalies and annotate their impact. "DNA" is shorthand for 5 different types of experiments.
Initial Creation Date	on a per-service basis
Initial use cases/motivations for data collection	We create derived artifacts that assist researchers in understanding the DNA sequencing information for their biological samples.
Last Modified Date	on a per-service basis
Physical data infrastructure	omitted
Physical data model, reverse-engineered	Almost entirely in the file system; the data sets themselves are fairly siloed files, so we can change structure easily, but there is no longitudinal analysis.
Primary Users / Customers	Research PIs
Regulatory and other classifications	at this point genomic information is not classified as PHI and we try not to capture any PHI
Retention schedule	We have a tentative data lifecycle model that would give guidance to the stakeholders, but at this point it is just an idea. We have the infrastructure to use it; we can store data for about 1-2 years, but we don't really want to be in the business of being the long-term archive for the investigators. One of our core ideas is to keep data for a year (?) after we finish the analysis, then reach out to the investigators and tell them that if they want it they should take it now, since after a certain time we will either archive or delete it. Certain investigators require some physical infrastructure: a pipeline that allows data to be accessed and leveraged for some time after we have done our work with it. This is not part of our core capability but is important to note nonetheless. It happens ~5% of the time.
Roles / persons involved in collection	BRCF
Schedule for known changes to Access Conditions	none
Standards utilized	none
Storage Format	text, binary
Subject Matter Expert(s)	the team of bioinformatic analysts
Time Frame Data Collected For	on a per-service basis
Update Schedule	on a per-service basis
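
The tentative lifecycle described under "Retention schedule" could be sketched roughly as follows. This is only an illustration, not the Core's actual process: the one-year retention window reflects the "a year (?)" idea in the record, while the 30-day grace period, the stage names, and the function name `lifecycle_stage` are all hypothetical.

```python
from datetime import date, timedelta

# Assumption: the record only says "a year (?)" after analysis is finished.
RETENTION_AFTER_ANALYSIS = timedelta(days=365)
# Hypothetical: the record says "after a certain time" without giving a value.
GRACE_AFTER_NOTICE = timedelta(days=30)


def lifecycle_stage(analysis_finished: date, today: date) -> str:
    """Return the illustrative lifecycle stage for one project folder."""
    notify_on = analysis_finished + RETENTION_AFTER_ANALYSIS
    dispose_on = notify_on + GRACE_AFTER_NOTICE
    if today < notify_on:
        # Still inside the retention window: keep the frozen output.
        return "retain"
    if today < dispose_on:
        # Window elapsed: ask the investigator to take a copy now.
        return "notify_investigator"
    # Grace period elapsed: the Core would archive or delete the data.
    return "archive_or_delete"
```

For example, `lifecycle_stage(date(2020, 1, 1), date(2020, 6, 1))` falls inside the retention window. Whether the final stage means archiving or deleting is left open, as it is in the record.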