Data Standards and Integration for Biomedical Research

Ptolemy.V™ is our newly released data integration solution. It makes it easy for researchers to discover data of interest and recombine it into new integrated output data sets, while applying and leveraging data standards – even when data standards were not employed in the original source data.

To make data visible to Ptolemy.V™, it must be imported and registered. During this automated process, Ptolemy.V™ searches its data standards repository for potentially related data elements. (The repository is preloaded with many standards, such as the caDSR and NINDS.) It also searches the source data sets that have already been imported and registered for additional potentially related data elements. It records the potential relationships in a special database optimized for storing and searching data relationships. It also stores the name and type of each data element (classification, date, number, text, etc.) in its meta-data repository, along with any additional descriptive information that is provided (such as data descriptions, usage, time stamps, etc.).
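As an illustration only, the registration step can be pictured with the short Python sketch below. It is not Ptolemy.V™ code and does not describe how Ptolemy.V™ actually detects relationships: the DataElement class, the find_related function, the token-overlap (Jaccard) score, and the 0.5 threshold are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Hypothetical meta-data record: name, type, and optional description."""
    name: str
    data_type: str          # e.g. "classification", "date", "number", "text"
    description: str = ""


def name_tokens(name):
    """Split an element name into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def find_related(new_element, known_elements, threshold=0.5):
    """Return (name, score) pairs for known elements with similar names.

    The score is the Jaccard overlap of name tokens. This is an assumed
    stand-in for whatever matching a real system performs.
    """
    new_tokens = name_tokens(new_element.name)
    related = []
    for known in known_elements:
        known_tokens = name_tokens(known.name)
        union = new_tokens | known_tokens
        score = len(new_tokens & known_tokens) / len(union) if union else 0.0
        if score >= threshold:
            related.append((known.name, score))
    return sorted(related, key=lambda pair: pair[1], reverse=True)


# Elements already present in the (hypothetical) standards repository.
standards = [
    DataElement("date_of_birth", "date"),
    DataElement("systolic_blood_pressure", "number"),
]

# A newly imported element is compared against them at registration time.
incoming = DataElement("birth_date", "date", "Subject date of birth")
related = find_related(incoming, standards)
```

Here the incoming element shares two of three name tokens with date_of_birth, so that pair would be recorded as a potential relationship, while the blood pressure element is not matched.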

Next, Ptolemy.V™ imports and stores the source data associated with each data element. For import, Ptolemy.V™ can access data in your local environment or connect to cloud storage. Ptolemy.V™ comes with a flexible and scalable raw data repository designed and optimized to support columnar data, but it can use your Big Data infrastructure instead of, or in addition to, its built-in raw data repository.

Ptolemy.V™ provides a powerful full-text search facility that allows a researcher to find data elements based on biomedical concepts. From the search results, a researcher can select data elements of interest, review the source data sets from which a selected data element originated, and see the other data elements those data sets contain. A researcher can also ‘browse’ the source data associated with a data element. Ptolemy.V™ automatically generates key statistics, such as a list of the unique values that appear in the data, the total number of values, and the number of null values, all of which help inform the researcher about the data element and the actual data.
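The key statistics mentioned above are simple to picture. The following Python sketch is illustrative only and is not Ptolemy.V™ code; the summarize function is hypothetical, nulls are assumed to be represented as None, and the total is assumed to count null values as well.

```python
from collections import Counter


def summarize(values):
    """Compute browse-style statistics for the values of one data element."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "total_values": len(values),                  # assumed to include nulls
        "null_values": len(values) - len(non_null),
        "unique_values": sorted(counts, key=str),     # key=str tolerates mixed types
        "value_counts": dict(counts),
    }


# Example column as it might appear in an imported source data set.
sex_column = ["M", "F", None, "F", "M", "F"]
summary = summarize(sex_column)
```

A summary like this lets a researcher see at a glance which codes a source actually uses and how complete the column is before selecting it for an output data set.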

Ptolemy.V™ allows the researcher to select one or more data elements and incorporate them into an output data set. It also copies the data associated with the selected data elements into the output data set. In many cases, the data from each source will need to be converted into a consistent output format. Ptolemy.V™ allows the researcher to create and store a conversion routine that translates the imported source data into the desired output format, and it uses these conversions to generate the desired output data set on demand. Data can be converted from any registered data element into any related data element, including related standardized data elements, making it easier for researchers to take advantage of standards. Output data sets can be downloaded in a common format that can be imported into tools such as SAS, Excel, or R.
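To make the idea of a conversion routine concrete, here is a minimal Python sketch. It is not Ptolemy.V™ code: the two source data sets, the column names, the code lists, and the choice of CSV as the download format are all assumptions made for this example.

```python
import csv
import io
from datetime import datetime


def convert_sex_code(value):
    """Map assumed source codes to a single standardized vocabulary."""
    if value is None:
        return None
    mapping = {"1": "Male", "2": "Female", "M": "Male", "F": "Female"}
    return mapping.get(str(value).strip().upper(), "Unknown")


def convert_us_date(value):
    """Convert MM/DD/YYYY text to ISO 8601 (YYYY-MM-DD)."""
    if value is None:
        return None
    return datetime.strptime(value, "%m/%d/%Y").date().isoformat()


def identity(value):
    """Pass through values that are already in the output format."""
    return value


# Two hypothetical sources that encode the same concepts differently.
site_a_rows = [{"sex": "1", "visit": "03/15/2021"}, {"sex": "2", "visit": "11/02/2020"}]
site_b_rows = [{"gender": "F", "visit_date": "2021-07-09"}]

# For each source: output column -> (source column, conversion routine).
sources = [
    (site_a_rows, {"sex": ("sex", convert_sex_code), "visit_date": ("visit", convert_us_date)}),
    (site_b_rows, {"sex": ("gender", convert_sex_code), "visit_date": ("visit_date", identity)}),
]


def build_output(source_list):
    """Apply each source's conversions and stack the rows into one output."""
    output = []
    for rows, mapping in source_list:
        for row in rows:
            output.append({
                out_col: convert(row.get(src_col))
                for out_col, (src_col, convert) in mapping.items()
            })
    return output


def to_csv(rows, columns):
    """Serialize the output rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


output_rows = build_output(sources)
csv_text = to_csv(output_rows, ["sex", "visit_date"])
```

Keeping each conversion as a small, named routine per source element means the same output definition can be regenerated whenever it is needed, without editing the source data.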

Ptolemy.V™ stores the selected data elements and conversion routines together in a form that can be easily edited and re-executed. This makes it much easier for a researcher to add data elements or to regenerate the integrated data when the source data is updated. Moreover, other users can make their own copies of the data element selections and conversions, either to generate their own copy of the output data set for download or to edit them and create their own variations of the output. Herein lies the power and innovation provided by Ptolemy.V™: the ability to enable data reuse for a whole community of researchers, to easily accommodate changes and additions in the source data, and to easily regenerate integrated output data sets, all while maintaining conformance to data standards.
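One way to picture a stored, re-executable definition is as a small "recipe" that names the selected data elements and their conversions. The Python sketch below is illustrative only and does not describe Ptolemy.V™'s internal storage format; the JSON layout, the CONVERSIONS registry, and the run_recipe function are hypothetical.

```python
import json

# Hypothetical registry of named conversion routines.
CONVERSIONS = {
    "identity": lambda v: v,
    "lb_to_kg": lambda v: None if v is None else round(v * 0.45359237, 1),
}

# The recipe is plain data, so it can be saved, copied, and edited.
recipe_json = json.dumps({
    "output": "weights_v1",
    "elements": [
        {"output_name": "subject_id", "source": "subject", "conversion": "identity"},
        {"output_name": "weight_kg", "source": "weight_lb", "conversion": "lb_to_kg"},
    ],
})


def run_recipe(recipe_text, source_rows):
    """Regenerate the output data set from a stored recipe."""
    recipe = json.loads(recipe_text)
    return [
        {
            element["output_name"]: CONVERSIONS[element["conversion"]](row.get(element["source"]))
            for element in recipe["elements"]
        }
        for row in source_rows
    ]


source_rows = [{"subject": "S1", "weight_lb": 150.0}]
first_run = run_recipe(recipe_json, source_rows)

# The source is updated; the unchanged recipe is simply run again.
source_rows.append({"subject": "S2", "weight_lb": 200.0})
second_run = run_recipe(recipe_json, source_rows)

# Another user copies the recipe and edits the copy to create a variation.
variant = json.loads(recipe_json)
variant["output"] = "weights_with_lb"
variant["elements"].append(
    {"output_name": "weight_lb", "source": "weight_lb", "conversion": "identity"}
)
variant_run = run_recipe(json.dumps(variant), source_rows)
```

Because the recipe holds only selections and conversion names, the original stays untouched when a copy is edited, and every run reflects the current state of the source data.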