ABC easy as 123, what about CDF?


So after one week I have completed my data collection: 102 blinking traces of single Rhodamine B molecules on TiO2. I have been analyzing the blinking data in MATLAB using a change point detection (CPD) code. Just as a reminder, blinking is when a molecule jumps between fluorescent and non-fluorescent states. The code uses a change point detection algorithm to determine the number of significantly different intensity states, what those intensity levels are, and whether each one is ‘on’ or ‘off’. A state is on if it is above a threshold and off if it is below it; the threshold is also determined by the program. After working up the data, the 102 molecules yielded 1002 on segments and 246 off segments. Now how do we analyze this?
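
To make the on/off bookkeeping concrete, here is a minimal Python sketch of the last step only: turning a trace into lists of on and off durations. It is not the CPD code itself. It assumes the threshold is already known, and the function name `segment_durations`, the time step `dt`, and the example trace are all made up for illustration.

```python
from itertools import groupby

def segment_durations(intensities, threshold, dt=1.0):
    """Split a trace into consecutive on/off runs and return their durations.

    A point counts as 'on' when its intensity is above the threshold.
    dt is the (assumed) time per data point.
    """
    on, off = [], []
    for is_on, run in groupby(intensities, key=lambda x: x > threshold):
        duration = sum(1 for _ in run) * dt  # number of points in the run times dt
        (on if is_on else off).append(duration)
    return on, off

# Made-up trace, for illustration only
trace = [5, 6, 5, 0, 1, 0, 0, 7, 6, 1]
on_times, off_times = segment_durations(trace, threshold=3, dt=0.5)
print(on_times, off_times)
```

Pooling these lists over all molecules gives the set of on durations and the set of off durations that get analyzed next.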


We are going to use MATLAB code written by Aaron Clauset to provide us with a histogram and other important values. It is important to understand that right now we are not focusing on the intensity values of the on states, but on how long each state lasts and how many times each duration is observed. First, the program determines the parameters of the power law function that best fits the data. Next, it creates a histogram of time duration vs. occurrences, which is a nice visual representation of what is going on, and overlays the fit line determined in the first step (Fig. 1). Lastly, it calculates a p-value, which indicates how plausible it is that the data actually follow the fitted power law. A p-value is necessary because having a best fit does not mean the data follow a power law: the best power-law fit may still describe the data poorly. From the p-values we would like to see whether or not the data fit a power law and, if not, what they do fit.
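
As a rough illustration of the first step, here is a Python sketch of a maximum-likelihood power-law fit. It is not the Clauset program: it assumes continuous data and a lower cutoff `xmin` that is handed in rather than chosen by the code, and it skips the p-value entirely. The function name `fit_power_law` and the synthetic sample are my own inventions. The exponent comes from the standard continuous estimator, alpha = 1 + n / sum(ln(x_i / xmin)), and the Kolmogorov-Smirnov (KS) distance measures the largest gap between the data's CDF and the fitted one.

```python
import math

def fit_power_law(durations, xmin):
    """Continuous power-law MLE for values >= xmin, plus the KS distance.

    Assumes at least one value lies strictly above xmin
    (otherwise the log sum is zero and the estimator is undefined).
    """
    tail = sorted(x for x in durations if x >= xmin)
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)

    # Fitted CDF for x >= xmin: F(x) = 1 - (x / xmin)^(1 - alpha).
    # Compare it with the empirical step function just before and at each point.
    ks = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        ks = max(ks, abs((i + 1) / n - model), abs(i / n - model))
    return alpha, ks

# Synthetic check: evenly spaced quantiles of a power law with alpha = 2
true_alpha, xmin, n = 2.0, 1.0, 1000
sample = [xmin * (1 - (i + 0.5) / n) ** (-1 / (true_alpha - 1)) for i in range(n)]
alpha_hat, ks = fit_power_law(sample, xmin)
print(alpha_hat, ks)
```

A small KS distance means the fitted curve tracks the data closely. Turning that distance into a p-value requires comparing it against distances from synthetic power-law data sets, which this sketch does not attempt.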

I am currently trying to understand the mathematics behind the Clauset program. The program treats the raw data as real, discrete values, and I am beginning to understand how the parameters of the best fit function are chosen in this case. Hopefully by the next blog post I will be able to explain it, but right now I am just starting to get a grip on it. What would be interesting is to build a continuous function from the data, such as a cumulative distribution function (CDF). A CDF gives the probability of observing a value less than or equal to a specific value. Plugging a continuous function into the Clauset code might provide a more accurate analysis of the data. However, I have a lot of research ahead of me to find out how to create a CDF and to understand the mathematics behind finding the best fit parameters and the p-value.
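
As a starting point, here is a minimal Python sketch of an empirical CDF built directly from a list of durations. This is only one possible way to construct a CDF and is an assumption on my part, not something taken from the Clauset code. The name `empirical_cdf` and the example durations are made up. Strictly speaking the result is a step function rather than a smooth curve, but it is defined for every value of t.

```python
from bisect import bisect_right

def empirical_cdf(durations):
    """Return a function F(t) = fraction of durations that are <= t."""
    ordered = sorted(durations)
    n = len(ordered)

    def cdf(t):
        # bisect_right counts how many sorted values are <= t
        return bisect_right(ordered, t) / n

    return cdf

# Made-up on durations, for illustration only
on_times = [0.5, 0.5, 1.0, 1.5, 3.0]
cdf = empirical_cdf(on_times)
print(cdf(0.4), cdf(1.0), cdf(3.0))
```

One appeal of working with a CDF is that, unlike a histogram, it does not depend on a choice of bin width.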