May 8, 2008

Patient Matching, The Second Step

Posted in OpenMRS, Summer of Code at 5:54 am by nribeka

I had another discussion last Tuesday with Shaun Grannis and James Egg, and we think the discussion went really well. This time I didn’t ask too many silly questions hehe … We clarified some more of what we want to do with my first project. There were a couple of issues that we focused on, such as how to propagate the u values from the random sampling result to the EM analysis process.

After doing some digging, I found out that the u value is saved in the non-agreement property of the MatchingConfigRow object. At the end of the random sampling calculation, the result of the calculation is assigned to this non-agreement property. So now we already have the u values from the random sampling process. But how do we propagate these u values to the EM analysis process? Dig some more then …

Well, apparently the EM analysis also takes a MatchingConfig object as its parameter, and that object contains all of the MatchingConfigRow objects above. So now we need to tell the EM analysis process to use these values when the user chooses random sampling. We need to add a switch that lets the EM analysis know which values to use: some default values or the values from the random sampling process.
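
Here is a rough sketch of how I picture that switch. This is not the real Record Linker code: the class names `Row` and `Config`, the `useRandomSampling` flag, the `DEFAULT_U` value and both method names are made up for illustration, and the sample u values are invented numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for MatchingConfig / MatchingConfigRow.
// All names and values here are assumptions, not the real Record Linker API.
final class UValueSwitchSketch {

    /** Simplified row: one per column that takes part in the matching. */
    static final class Row {
        final String name;
        double nonAgreement; // the u value lives in the non-agreement property

        Row(String name, double nonAgreement) {
            this.name = name;
            this.nonAgreement = nonAgreement;
        }
    }

    /** Simplified config: the rows plus the switch discussed above. */
    static final class Config {
        final List<Row> rows = new ArrayList<>();
        boolean useRandomSampling; // hypothetical flag set from the user's choice
    }

    static final double DEFAULT_U = 0.1; // assumed placeholder default

    /** End of random sampling: write each calculated u value into its row. */
    static void storeSampledU(Config config, double[] sampledU) {
        if (sampledU.length != config.rows.size()) {
            throw new IllegalArgumentException("need one u value per row");
        }
        for (int i = 0; i < sampledU.length; i++) {
            config.rows.get(i).nonAgreement = sampledU[i];
        }
    }

    /** Start of EM analysis: pick the sampled u values or the defaults. */
    static double[] uValuesForEm(Config config) {
        double[] u = new double[config.rows.size()];
        for (int i = 0; i < u.length; i++) {
            u[i] = config.useRandomSampling
                    ? config.rows.get(i).nonAgreement
                    : DEFAULT_U;
        }
        return u;
    }

    public static void main(String[] args) {
        Config config = new Config();
        config.rows.add(new Row("last_name", DEFAULT_U));
        config.rows.add(new Row("birth_year", DEFAULT_U));

        storeSampledU(config, new double[] {0.02, 0.05}); // invented numbers

        config.useRandomSampling = false;
        System.out.println(Arrays.toString(uValuesForEm(config)));

        config.useRandomSampling = true;
        System.out.println(Arrays.toString(uValuesForEm(config)));
    }
}
```

The point of the sketch is only that both processes touch the same config object, so the flag and the per-row values travel together without any extra plumbing.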

Another thing that we discussed on the phone was connecting this process to the Record Linker GUI. Arghhh, I’m not good at GUI programming. I just don’t have the artistic sense to create a good GUI. But I have to give it a shot hehe …

Some terms explained:

  • Record Linker is the name of the program that I will work on. One of the capabilities of the program is to combine records from different sources using statistical analysis of those records.
  • MatchingConfig is an object that stores the parameters used for analyzing those records. There are lots of parameters that need to be defined, for example where to get the records, what fields can be found in the records, etc.
  • MatchingConfigRow is an object that stores the options for matching one column in the records, for example the algorithm that will be used for the matching process. A MatchingConfig object contains a series of MatchingConfigRow objects, because a single record contains many columns.
  • The random sampling and EM analysis processes both take this MatchingConfig object as their parameter. The MatchingConfig is shared by the two processes to propagate the result from random sampling to EM analysis.

Some facts that I learned:

  • When the records come from a file, there are a few steps that need to be done before the file can be analyzed. The file is chopped so that it only includes the fields that will be used in the analysis process. After the file is chopped, it is sorted on the blocking fields using the operating system’s built-in sort function.
  • Let’s keep some facts for the upcoming posts hehe …
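
To make the chop and sort steps a bit more concrete, here is a small sketch. It is only my own illustration: the class and method names, the pipe delimiter and the sample records are all made up, and it sorts in memory with Java instead of calling the operating system’s sort the way the real program does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: names, delimiter and data are assumptions.
// An in-memory sort stands in for the operating system's sort.
final class ChopAndSortSketch {

    /** Chop: keep only the columns at the given indexes, in that order. */
    static List<String[]> chop(List<String> lines, String delimiter, int[] keep) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            // Pattern.quote treats the delimiter literally; -1 keeps empty trailing fields.
            String[] fields = line.split(Pattern.quote(delimiter), -1);
            String[] kept = new String[keep.length];
            for (int i = 0; i < keep.length; i++) {
                kept[i] = fields[keep[i]];
            }
            out.add(kept);
        }
        return out;
    }

    /** Sort: order the chopped records on one blocking column. */
    static void sortOnBlockingField(List<String[]> records, int blockingIndex) {
        records.sort(Comparator.comparing((String[] r) -> r[blockingIndex]));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Smith|John|1980|IN",
                "Adams|Mary|1975|OH",
                "Jones|Ann|1990|IN");

        // Keep last name (column 0) and birth year (column 2).
        List<String[]> records = chop(lines, "|", new int[] {0, 2});

        // Block on last name, which is column 0 after chopping.
        sortOnBlockingField(records, 0);

        for (String[] r : records) {
            System.out.println(String.join("|", r));
        }
    }
}
```

Sorting on the blocking field puts records with the same blocking value next to each other, so the analysis can walk through the file one block at a time.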

Any questions? I hope I didn’t miss anything …
