04.30.08

Patient Matching, The First Step

Posted in OpenMRS, Summer of Code at 6:50 am by nribeka

My first phone discussion about my project with my mentors, Shaun Grannis and James Egg, went well. Shaun and James explained the project to me in detail, and I think it is really interesting. I asked a couple of silly questions that were not related to the project though, sorry for that Shaun and James hehe …

My first project is to implement a fully functional random sample analyzer that calculates the rate of random agreement among corresponding pairs of records between two data sources. This rate will replace the u rate (the field agreement rate among pairs that are truly non-matches) that currently comes from the Expectation Maximization analyzer. To get a better overview of the linkage process and the rationale behind it, you should read this publication about record linkage. If you want to know more about the Expectation Maximization algorithm, you can read the wiki or some other journals and publications.
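As a rough sketch of the quantity being estimated, in standard record linkage notation (the symbols a_i and n are labels introduced here for illustration, not names from the project code): for field i, the u value is the probability that the field agrees given that the pair is truly a non-match. Since a randomly drawn pair of records from two sources is almost always a non-match, the agreement rate observed in a random sample is a reasonable estimate of it:

$$ u_i = P(\text{field } i \text{ agrees} \mid \text{pair is a non-match}) \approx \frac{a_i}{n} $$

Here n is the number of random pairs examined and a_i is the number of those pairs that agree on field i.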

The process for generating the u value for each column is as follows:

  • Generate two arrays of Record objects, one per data source, each sized to the maximum sampling size
  • Take one Record from each array at a time and do the following:
    • For each demographic field in the Record, compare the two values using the selected String matching algorithm (Jaro-Winkler, Levenshtein, Longest Common Substring or Exact Match)
    • If the values from both Records match each other, increment the match count of the current demographic field.
  • Repeat the above process until all records have been paired and examined
  • Calculate the u value for each demographic field and set the new u value on the MatchConfig object.
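The steps above can be sketched roughly as follows. This is only an illustration in Python, not the project's code: `estimate_u_values`, the dict-based `Record`, and sampling with replacement are all assumptions made for the example, and only the Exact Match comparator is shown. Writing the result back to the MatchConfig object is left out.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for the project's Record object:
# a mapping from demographic field name to its string value.
Record = Dict[str, str]


def exact_match(a: str, b: str) -> bool:
    """Exact Match comparator; any other string matcher could be plugged in."""
    return a == b


def estimate_u_values(
    source_a: Sequence[Record],
    source_b: Sequence[Record],
    fields: List[str],
    sample_size: int,
    matches: Callable[[str, str], bool] = exact_match,
    seed: Optional[int] = None,
) -> Dict[str, float]:
    """Estimate the u value of each field from randomly paired records."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not source_a or not source_b:
        raise ValueError("both data sources must contain records")

    rng = random.Random(seed)

    # Step 1: build two arrays of records of the desired sample size
    # (assumption: records are drawn at random, with replacement).
    sample_a = [rng.choice(source_a) for _ in range(sample_size)]
    sample_b = [rng.choice(source_b) for _ in range(sample_size)]

    # Steps 2-3: pair the records up and count agreements per field.
    agreements = {field: 0 for field in fields}
    for rec_a, rec_b in zip(sample_a, sample_b):
        for field in fields:
            if matches(rec_a[field], rec_b[field]):
                agreements[field] += 1

    # Step 4: u value = agreeing pairs / pairs examined.
    return {field: agreements[field] / sample_size for field in fields}
```

With this sketch, a field that holds the same value in every record of both sources would get a u value of 1.0, while a field whose values never coincide across the two sources would get 0.0. Real demographic fields should land somewhere in between.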

I still need to dig more into the first step and see how each data source is read and converted into Record objects. What do you think about the above process? Did I miss anything?

4 Comments »

  1. beh said,

    May 6, 2008 at 3:06 am

    your blog is so cool…..

  2. imsuryawan said,

    May 6, 2008 at 9:46 am

    what a complex algorithm..

  3. Patient Matching, The Second Step « Stop Bitching, Start Coding said,

    May 8, 2008 at 5:54 am

    [...] ask too much silly questions hehe … We clarify some more on what we want to do with my first project. There are couple of issues that we focus on, such as how to to propagate the u values from the [...]

  4. nribeka said,

    May 8, 2008 at 6:15 am

    #bli beh: thank you

    #bli sur: it's actually not that complex hehehe …
