06.05.08
GSoC – Week 1
Wow, it’s been a while since I last posted on this blog, hehe… Today is just a recap of what I did last week. So here it is:
Last Week
In the patient matching project team, we mainly discussed the possible next steps for the second phase of the project. Shaun Grannis proposed the idea of an OpenMRS patient de-duplication module, which would be beneficial for OpenMRS (and for me, because I will have the chance to learn more about the anatomy of an OpenMRS module). Initially, Shaun proposed two options for accomplishing the de-duplication process, as described on the patient matching project wiki page. In the end, we decided to improve the current patient matching module by adding the new de-duplication feature.
So, as a consequence of this decision, I will have to learn about “adding web pages” to the current patient matching module. Originally, the patient matching module only works behind the scenes, intercepting method calls and wrapping them with AOP. Because de-duplication will be a user-triggered process, we need to add a page with “something” on it that will trigger the de-duplication process.
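To picture what “intercepting method calls” means, here is a minimal sketch using a plain JDK dynamic proxy. It is only an analogy: `PatientService`, `savePatient` and the matching hook are hypothetical names, and the real module’s AOP advice (configured through OpenMRS) is not shown here.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class InterceptionSketch {

    // Hypothetical service interface standing in for the real patient service.
    interface PatientService {
        String savePatient(String name);
    }

    // Records what the (hypothetical) matching hook saw, so the effect is visible.
    static final List<String> matchingLog = new ArrayList<>();

    // Wraps a PatientService so every savePatient call passes through the hook first.
    static PatientService withMatching(PatientService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("savePatient")) {
                // Hypothetical hook: this is where patient matching would run.
                matchingLog.add("checked " + args[0]);
            }
            // Delegate to the original implementation unchanged.
            return method.invoke(target, args);
        };
        return (PatientService) Proxy.newProxyInstance(
                PatientService.class.getClassLoader(),
                new Class<?>[] {PatientService.class},
                handler);
    }

    public static void main(String[] args) {
        PatientService plain = name -> "saved " + name;
        PatientService wrapped = withMatching(plain);
        System.out.println(wrapped.savePatient("Jane Doe")); // prints "saved Jane Doe"
        System.out.println(matchingLog);                     // prints "[checked Jane Doe]"
    }
}
```

The caller never knows the hook is there, which is exactly why a user-triggered feature like de-duplication needs its own page instead.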
Next Week
The first thing I must do is read. Yes, it’s reading time again. This time I need to read the module documentation to find out how to add those web pages. Another dev, Keelhaul, pointed me to this extension point page. Both Keelhaul and bmckown said that adding web pages is pretty straightforward (for them, hehehe…), while bwolfe told me to get one of the modules from the repository and study it, hehehe… Looks like lots of reading documentation and reading code this week.
PS: bwolfe, bmckown and Keelhaul are the IRC nicknames of OpenMRS developers. You can find them (and me) in #openmrs on irc.freenode.net.
05.08.08
Patient Matching, The Second Step
I had another discussion last Tuesday with Shaun Grannis and James Egg, and we think it went really well. This time I didn’t ask too many silly questions, hehe… We clarified more of what we want to do with my first project. There are a couple of issues we focused on, such as how to propagate the u values from the random sampling result to the EM analysis process.
After some digging, I found out that the u value is saved in the MatchingConfigRow object, in the non-agreement property. At the end of the random sampling calculation, the result of the calculation is assigned to this non-agreement property. So now we have the u values from the random sampling process. But how do we propagate these u values to the EM analysis process? Dig some more, then…
Well, apparently the EM analysis also takes a MatchingConfig object as its parameter, and that object contains all of the MatchingConfigRow objects above. So now we need to tell the EM analysis process to use these values when the user chooses random sampling. That means adding a switch that tells the EM analysis which values to use: some default values, or the values from the random sampling process.
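Here is a rough sketch of the idea, assuming simplified stand-ins for MatchingConfig and MatchingConfigRow (not the real Record Linker classes). The u value for a column is estimated as the fraction of randomly chosen record pairs that agree on that column, since random pairs are almost always non-matches; the result goes into the row’s non-agreement field. The `useRandomSampling` flag and `DEFAULT_U` are hypothetical names for the switch and the default value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomSamplingSketch {

    // Simplified stand-in for MatchingConfigRow: one row per column to match.
    static class MatchingConfigRow {
        final String name;      // column name
        double nonAgreement;    // holds the u value, as described in the post
        MatchingConfigRow(String name) { this.name = name; }
    }

    // Simplified stand-in for MatchingConfig, plus the proposed switch.
    static class MatchingConfig {
        final List<MatchingConfigRow> rows = new ArrayList<>();
        boolean useRandomSampling;  // hypothetical switch: sampled u values or defaults
    }

    // Placeholder default u value, not taken from the real code.
    static final double DEFAULT_U = 0.01;

    // Samples random pairs of distinct records (needs at least two records) and
    // stores, per column, the fraction of pairs that agree in nonAgreement.
    static void randomSampling(MatchingConfig config, List<String[]> records,
                               int pairs, long seed) {
        Random rnd = new Random(seed);
        int n = records.size();
        int[] agree = new int[config.rows.size()];
        for (int p = 0; p < pairs; p++) {
            int i = rnd.nextInt(n);
            int j = rnd.nextInt(n - 1);
            if (j >= i) j++;  // make sure the pair uses two different records
            for (int c = 0; c < agree.length; c++) {
                if (records.get(i)[c].equals(records.get(j)[c])) agree[c]++;
            }
        }
        for (int c = 0; c < agree.length; c++) {
            config.rows.get(c).nonAgreement = (double) agree[c] / pairs;
        }
    }

    // The starting u value the EM step would pick for a row, based on the switch.
    static double initialU(MatchingConfig config, MatchingConfigRow row) {
        return config.useRandomSampling ? row.nonAgreement : DEFAULT_U;
    }

    public static void main(String[] args) {
        MatchingConfig config = new MatchingConfig();
        config.rows.add(new MatchingConfigRow("sex"));
        config.rows.add(new MatchingConfigRow("id"));
        List<String[]> records = List.of(
                new String[] {"F", "1"}, new String[] {"F", "2"}, new String[] {"F", "3"});
        randomSampling(config, records, 100, 42L);

        config.useRandomSampling = true;
        System.out.println(initialU(config, config.rows.get(0))); // 1.0: all records agree on "sex"
        System.out.println(initialU(config, config.rows.get(1))); // 0.0: "id" never agrees
        config.useRandomSampling = false;
        System.out.println(initialU(config, config.rows.get(0))); // 0.01: falls back to DEFAULT_U
    }
}
```

Because both steps receive the same MatchingConfig instance, nothing else has to be passed around: the sampling step writes into the rows, and the EM step only needs the switch to decide whether to read them.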
Another thing we discussed on the phone was connecting this process to the Record Linker GUI. Arghhh, I’m not good at GUI programming. I just don’t have the artistic sense to create a good GUI. But I have to give it a shot, hehe…
Some term explanations:
- Record Linker is the name of the program I will work on. One of its capabilities is combining records from different sources using statistical analysis of those records.
- MatchingConfig is an object that stores the parameters used for analyzing those records. Lots of parameters need to be defined, for example where to get the records, what fields can be found in the records, etc.
- MatchingConfigRow is an object that stores the options for matching each column in the records, for example the algorithm used for the matching process. A MatchingConfig object contains a series of MatchingConfigRow objects, since a single record contains many columns.
- The random sampling and EM analysis processes take this MatchingConfig object as their process parameter. This MatchingConfig is shared by the two processes to propagate the result from random sampling to EM analysis.
Some facts I learned:
- When the records come from a file, a few steps need to be done before the file can be analyzed. First, the file is chopped so that it only includes the fields used in the analysis process. Then the chopped file is sorted on the blocking fields using the operating system’s built-in sort function.
- Let’s keep some facts for the upcoming posts, hehe…
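The chop-then-sort preparation can be sketched in memory like this. It is only an illustration: the real code uses the operating system’s sort, while this version uses Java’s `List.sort`, assumes simple comma-separated lines with no quoted commas, and uses hypothetical `keep` and `blocking` column indexes. Java compares strings by UTF-16 code units, so the ordering may differ from a locale-aware OS sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ChopAndSortSketch {

    // Keeps only the listed column indexes of a comma-separated line.
    static String chop(String line, int[] keep) {
        String[] fields = line.split(",", -1);
        String[] out = new String[keep.length];
        for (int k = 0; k < keep.length; k++) {
            out[k] = fields[keep[k]];
        }
        return String.join(",", out);
    }

    // Chops every line, then sorts the chopped lines on the blocking columns
    // (indexes into the chopped line), comparing them in the order given.
    static List<String> chopAndSort(List<String> lines, int[] keep, int[] blocking) {
        List<String> chopped = new ArrayList<>();
        for (String line : lines) {
            chopped.add(chop(line, keep));
        }
        Comparator<String> byBlocking = (a, b) -> {
            String[] fa = a.split(",", -1);
            String[] fb = b.split(",", -1);
            for (int col : blocking) {
                int cmp = fa[col].compareTo(fb[col]);
                if (cmp != 0) return cmp;
            }
            return 0;
        };
        chopped.sort(byBlocking);  // stable sort, like sort -s
        return chopped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "3,Smith,John,M,1970",
                "1,Doe,Jane,F,1980",
                "2,Adams,Ann,F,1975");
        // Keep last name, sex and birth year; block on sex, then last name.
        int[] keep = {1, 3, 4};
        int[] blocking = {1, 0};
        chopAndSort(lines, keep, blocking).forEach(System.out::println);
        // prints:
        // Adams,F,1975
        // Doe,F,1980
        // Smith,M,1970
    }
}
```

Sorting on the blocking fields puts records that share a block next to each other, so later steps can walk through the file one block at a time.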
Any questions? I hope I didn’t miss anything…