IMS can assist you in converting legacy data into BSI. The conversion process includes analyzing the existing data, developing a data conversion plan, and executing that plan to load the data into a new or existing BSI database. The effort required depends directly on the structure and integrity of the data to be converted. Below are three examples of data conversions that IMS has performed.

Case Study 1 - Highly Structured Validated Data

Problem

IMS was asked to convert the BRINC database for the DAIDS repository. The BRINC data was stored in a highly structured SQL database with tight controls on the data, including data normalization, default values, referential integrity links, and data rules. In addition, the user interface to the database imposed validity checks on user input. The result was an extremely clean database ready for import.

Process

IMS obtained the data and exported it to a text file for analysis. This analysis showed minor data linkage problems, which were corrected. A new BSI database structure was created using a combination of standard BSI fields and new fields created from the BRINC data. Programs were written to convert the data into this new ontology. The output of these programs was then imported into the new BSI DAIDS database.

Result

The entire BRINC database was imported into BSI and made available to the customer. The whole process took less than two weeks.

Case Study 2 - Home-Grown Access Database

Problem

IMS was asked to convert a Microsoft Access database and import it into BSI. The database had a front-end GUI for accessing the data that no one knew how to modify. It had been written by an employee who was no longer with the company, and the system now needed additional fields and capabilities. Rather than patch the system, the customer decided to convert to BSI to gain the features they needed.

Process

IMS obtained the data and started the analysis.
The database design was non-normalized and confusing. There was no referential integrity, and no data validation rules were in place. Most fields were free text, and many contained embedded values. IMS wrote several SAS programs to extract, analyze, and convert the data into BSI structures. Normalized fields and modifiers were created to categorize the data. Embedded data was extracted and placed in separate fields. Edit checks were implemented to ensure future data validity.

Result

The Access database was converted and loaded for the customer. Records that lacked referential integrity were sent to the customer for review. The rest of the database (98%) was converted to BSI in four weeks. The customer now has a clean, searchable system with all the functionality they required.

Case Study 3 - Unstructured Data from Multiple Sources

Problem

IMS was asked to convert 25 separate data files for the NHLBI instance of BSI. Each file had a different structure and content, and the files were stored in a variety of formats, including Microsoft Access, Excel, and plain text. Little or no data validity checking was done during the collection process for the majority of these files. As a result, the files contained missing, duplicated, and conflicting data.

Process

Extensive cleaning of the data was required to prepare it for loading into BSI. Each of the 25 data files had to be analyzed and converted separately. The analysis showed that the data was highly compromised. The problems included empty records, partially filled records, non-identical duplicate records, and data that violated referential integrity constraints. IMS worked with the client and the clinical centers that owned the data to determine the best methods to clean each data set. In some cases, a data set went through multiple revisions as the study investigators provided insights into, or corrections to, the data.
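
The four problem classes listed above (empty records, partially filled records, non-identical duplicates, and referential integrity violations) can be flagged mechanically before any manual review. The case study does not say how IMS implemented these checks, so the following is only a minimal Python sketch under stated assumptions: the function `find_problems`, its parameters, and field names such as `specimen_id` and `subject_id` are hypothetical and do not come from BSI or from the original data files.

```python
from collections import defaultdict


def is_blank(value):
    """Treat None and whitespace-only values as missing."""
    return value is None or str(value).strip() == ""


def find_problems(records, key_field, required_fields, parent_field, valid_parents):
    """Return a dict mapping a problem name to the indexes of affected records.

    All field names are supplied by the caller; nothing here is BSI-specific.
    Record values are assumed to be hashable (for example, strings).
    """
    problems = defaultdict(list)
    by_key = defaultdict(list)

    for i, rec in enumerate(records):
        filled = [f for f in required_fields if not is_blank(rec.get(f))]
        if not filled:
            problems["empty"].append(i)
            continue
        if len(filled) < len(required_fields):
            problems["partial"].append(i)

        # Referential integrity: the parent value must exist in the lookup set.
        parent = rec.get(parent_field)
        if not is_blank(parent) and parent not in valid_parents:
            problems["orphan"].append(i)

        key = rec.get(key_field)
        if not is_blank(key):
            by_key[key].append(i)

    # Non-identical duplicates: same key, but the records differ somewhere.
    for idxs in by_key.values():
        distinct = {tuple(sorted(records[i].items())) for i in idxs}
        if len(idxs) > 1 and len(distinct) > 1:
            problems["conflicting_duplicate"].extend(idxs)

    return dict(problems)
```

Exact duplicates (same key and identical content) are deliberately not reported here, since they can be collapsed without anyone's input; only duplicates that disagree need a decision from the data owner.
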
As a result, each of these data files had to be treated as a separate conversion effort. Each original data set was divided into three subsets named "The Good", "The Bad", and "The Ugly". The Good subset contained records that were ready for conversion and import into BSI. The Bad subset contained records that had some identifying information but also contained data errors that prevented them from being loaded into BSI. These records were sent back to the collection centers for additional information. The Ugly subset contained records that did not have enough data to identify the specimen; these were discarded.

Result

The loading of this data into the BSI database was staggered according to when the individual data sets could be cleaned and prepared. During this conversion, a large amount of time was spent waiting on individual investigators to respond with corrections or information about their data sets. These delays contributed to a labor-intensive process that took over 18 months to complete. However, the result was a clean, usable database that combined all of the NHLBI specimen records under one system.
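
The Good / Bad / Ugly split used in Case Study 3 amounts to a three-way triage of each record. The sketch below shows one way such a split could be expressed in Python. It is an illustration only, not the procedure IMS used: the function `triage`, the rule that a record is identifiable when every field in `id_fields` is filled, and the example rule `example_errors` are all assumptions made for this sketch.

```python
def is_blank(value):
    """Treat None and whitespace-only values as missing."""
    return value is None or str(value).strip() == ""


def triage(records, id_fields, find_errors):
    """Split records into (good, bad, ugly) lists.

    id_fields   -- fields assumed necessary to identify a specimen (hypothetical)
    find_errors -- caller-supplied function returning a list of error messages
    """
    good, bad, ugly = [], [], []
    for rec in records:
        if any(is_blank(rec.get(f)) for f in id_fields):
            ugly.append(rec)   # specimen cannot be identified: discard
        elif find_errors(rec):
            bad.append(rec)    # identifiable but has errors: return to the center
        else:
            good.append(rec)   # ready for conversion and import
    return good, bad, ugly


def example_errors(rec):
    """Hypothetical rule: every record needs a non-blank collection date."""
    if is_blank(rec.get("collection_date")):
        return ["missing collection_date"]
    return []
```

Passing the error check in as a function keeps the triage logic identical across data sets while letting each of the separately converted files supply its own validation rules.
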