AldenBlack
Can I put in a request to get a standard deviation function added to the list of "Inference" functions for a function node, please? I'm thinking of a function analogous to "MeanValue(v)", except that it returns the standard deviation instead. Or is there a way to do this already?
As a somewhat related follow-up question (and maybe this belongs under a different thread), what's the best way to create an effective credible interval for a node (different from a confidence interval)? In other words, I want to know the probability that the mean value or expected value of the node falls between a fixed upper and lower bound, e.g., P(c1 < x < c2), where c1 and c2 are constants, and x is the random variable or node. I've been trying to do this with function nodes (hence my request for standard deviation above), but haven't really been able to get a satisfying solution.
Thanks
Dan
Adding StdDev(v) to the "Inference Functions" is indeed a good idea. It will be added in the next release. This should help you compute your credible intervals. Thanks
AldenBlack
Got the link for the download with the updated version. Thanks for the quick turnaround.
AldenBlack
Hi Dan,
I've revisited this topic recently. Using the standard deviation lets me calculate a credible region only in very specific cases, such as an exactly normal distribution, where it's easy to calculate what percentage of the curve lies between two values. For example, +/- 2 sigma captures about 95% of the area under the curve, so mean - 2*sigma to mean + 2*sigma defines the 95% credible region. Obviously, however, we rarely have a perfect normal distribution, especially after entering evidence into other parts of the network.
That being said, I *think* I now know what I need to calculate my desired credible region for any posterior distribution, and I believe it would be rather easy to implement in BayesiaLab: in essence, all I need to do is perform a temporary rediscretization of the node. In other words, I need a function that calculates the minimum-width bin that contains X% of the distribution, where X is user-specified, e.g., 90%, 95%, or 99%. The function would output the lower and upper bounds of this bin. Alternatively, the reverse functionality would be good as well, i.e., stipulating a lower and upper bound and letting the function output the probability. I know this is identical to simply specifying a discretization in "edit" mode, but if I do, the evidence file I'm working with obviously vanishes, which makes doing this manually largely impractical (not to mention that I may not know a priori where I want that bin to be centered in the distribution, only perhaps its width).
Is there any chance of getting this implemented? It appears to me (albeit a naive me) that the required information and structure are already present in BayesiaLab.
Thanks,
Alden
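For readers following along, the bounds-to-probability lookup described above can be sketched outside BayesiaLab once a node's bin edges and posterior probabilities are available as plain lists. This is only an illustration: the function name `interval_probability` and its inputs are hypothetical, not part of any BayesiaLab feature, and it assumes probability mass is spread uniformly inside each bin.

```python
def interval_probability(edges, probs, c1, c2):
    """Estimate P(c1 < x < c2) for a discretized node.

    edges: bin boundaries in increasing order (len(probs) + 1 values).
    probs: posterior probability of each bin.
    Assumption: mass is uniformly distributed inside each bin.
    """
    if len(edges) != len(probs) + 1:
        raise ValueError("need exactly one more edge than probabilities")
    total = 0.0
    for lo, hi, p in zip(edges, edges[1:], probs):
        # Length of the part of this bin that falls inside (c1, c2).
        overlap = min(hi, c2) - max(lo, c1)
        if overlap > 0:
            total += p * overlap / (hi - lo)
    return total


# Hypothetical four-bin posterior for illustration.
edges = [0.0, 10.0, 20.0, 30.0, 40.0]
probs = [0.125, 0.375, 0.375, 0.125]
print(interval_probability(edges, probs, 5.0, 25.0))  # 0.625
```

The coarser the discretization, the more the uniform-within-bin assumption matters, so the estimate is only as good as the bins it is built on.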
Dan
Hi Alden,
I like the idea of using discretization to get the confidence interval of any kind of distribution. This is indeed something we can easily do when data is available. However, this would not be possible for conditional distributions. Would that still be useful for you?
Dan
AldenBlack
When you say "conditional distributions," do you mean posterior distributions, or just networks without an associated database?
In my application, I've learned a network from a big database, and then I begin to enter evidence into nodes. I'd just like to be able to track my desired credible interval as I enter evidence. Does that make sense?
Alden
Dan
Yes, I meant posterior distributions. We can implement a kind of filtering on your data based on the entered evidence, but this will not be equivalent to the posterior distribution you have in the monitors, unless the nodes on which you enter evidence are directly connected to the node for which you want to get the credible interval.
Dan
AldenBlack
Hmmm, ok. Sometimes the evidence nodes might be directly connected to the node for which I want a credible interval, and sometimes it won't be. What will the data filtering you mentioned actually provide? Can you walk me through a basic example of how it might work and differ from what's currently implemented in Bayesialab when computing posterior distributions? I don't really care how the "credible interval" information is displayed or calculated, as long as it's theoretically sound and effectively answers the question about the likelihood of a value lying between two values.
Dan
If the nodes are not directly connected, the probability distributions resulting from the filtering of your data will not be the same as those obtained with the Junction Tree inference. Filtering is a selection and thus simulates a direct link between the variables. Suppose you have two unconnected variables in your network: setting evidence in the Junction Tree on one variable will not impact the distribution of the second one, whereas that distribution can change with data filtering.
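A toy illustration of that difference, using made-up data and plain Python rather than BayesiaLab itself. The two variables A and B are assumed to have no arc and no path between them in the learned network, so the network's answer for B after evidence on A is simply B's marginal; the filtered answer is the frequency of B among the rows that match the evidence.

```python
from collections import Counter

# Hypothetical dataset of (A, B) pairs in which A and B happen to co-vary.
rows = (
    [("low", "low")] * 4
    + [("low", "high")] * 1
    + [("high", "low")] * 1
    + [("high", "high")] * 4
)


def distribution(values):
    """Relative frequency of each state in an iterable of states."""
    counts = Counter(values)
    n = sum(counts.values())
    return {state: counts[state] / n for state in counts}


# Network view (assumed: no link between A and B): evidence on A
# leaves B at its marginal distribution.
network_posterior_b = distribution(b for _, b in rows)

# Filtering view: keep only the rows where A == "low", then tabulate B.
filtered_b = distribution(b for a, b in rows if a == "low")

print(network_posterior_b)  # {'low': 0.5, 'high': 0.5}
print(filtered_b)           # {'low': 0.8, 'high': 0.2}
```

The filtered result behaves as if A were a direct parent of B, which is why it can disagree with what the monitors show.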
AldenBlack
Ah ok, I think I follow now.
So any kind of "credible interval" made with filtering would essentially ignore the network structure, correct? If so, that's still valuable, but not quite as powerful, since I really would like to be able to get that probability given the assumed structure of the dataset (the learned network).
Forgive my naivete for this next question, but why can't the Junction Tree inference perform the calculation to create the credible interval?
Regardless, I'm playing around with BayesiaLab to see if I can find a path to near-equivalent functionality. What's your opinion on this sequence of steps:
1) Learn my network as normal (I usually ultimately discretize with multi-variate R2 GenOpt).
2) Figure out which node I want to compute a credible interval for. Figuring out which node to focus on, for my application, doesn't require any evidence entered in the network, so rediscretizing in the next step won't remove any evidence scenario.
3) Manually rediscretize that node with a lot of intervals (~50). The nodes that I want a credible interval for are usually not parents, so this won't create a massive probability table downstream.
4) Using this fine discretization, where you could reasonably assume that the probability is uniformly distributed between the lower and upper bound of a bin (i.e., the cdf over a given interval would likely be linear), estimate the bounds on the X% credible region using a basic numerical integration scheme.
Performing (4) would simply build off of the existing conditional distribution generated by the Junction Tree inference, but I don't see a way to code (4) in a function node, even using the StateProb and IndexProb functions.
Thoughts?
Alden
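Step 4 can be sketched in plain Python, assuming the fine bin edges and the posterior bin probabilities have already been exported from the network (how they are obtained is not shown, and the function names here are hypothetical). With mass uniform inside each bin the CDF is piecewise linear, so the narrowest interval holding a given share of the mass has at least one endpoint on a bin edge; the sketch tries every such candidate.

```python
from bisect import bisect_left
from itertools import accumulate


def edge_cdf(probs):
    """Cumulative probability at each bin edge: 0 at the first, 1 at the last."""
    total = sum(probs)
    cum = [0.0] + list(accumulate(p / total for p in probs))
    cum[-1] = 1.0  # guard against floating-point drift
    return cum


def quantile(edges, cum, q):
    """Smallest x with CDF(x) >= q, interpolating linearly inside a bin."""
    j = bisect_left(cum, q, lo=1)  # first edge whose cumulative value reaches q
    mass = cum[j] - cum[j - 1]
    if mass <= 0.0:  # empty bin: the CDF is flat here
        return edges[j - 1]
    frac = (q - cum[j - 1]) / mass
    return edges[j - 1] + frac * (edges[j] - edges[j - 1])


def shortest_interval(edges, probs, level):
    """Narrowest (lo, hi) containing `level` of the mass, e.g. level=0.95."""
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    cum = edge_cdf(probs)
    best = None
    for edge, c in zip(edges, cum):
        candidates = []
        if c + level <= 1.0:  # interval that starts on this edge
            candidates.append((edge, quantile(edges, cum, c + level)))
        if c - level >= 0.0:  # interval that ends on this edge
            candidates.append((quantile(edges, cum, c - level), edge))
        for lo, hi in candidates:
            if best is None or hi - lo < best[1] - best[0]:
                best = (lo, hi)
    return best


# Hypothetical four-bin posterior; a real run would use ~50 bins.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [0.125, 0.375, 0.375, 0.125]
print(shortest_interval(edges, probs, 0.75))  # (1.0, 3.0)
```

This returns a single contiguous interval, so for a multimodal posterior it may be wider than a region made of several disjoint pieces.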
Dan
The JT cannot perform this calculation because it's a network that has been machine learned, i.e., it's not parametric. As far as I know, there is thus no other solution than sampling your associated continuous dataset. As mentioned before, the direct filtering algorithm will ignore the network structure. As for the fourth step of your workflow, there is indeed no way to directly use the function nodes for that. Your function requires iteration over the states of your node. You need the API for that.
AldenBlack
That makes more sense now. Thanks.
What is the API? I'm sure it's something I know, but I can't remember it from that acronym...
So, long story short, it seems like there's no easy way to automate this, correct? The best option might be to manually examine the distribution to get an approximation for the fourth step.
Dan
Our APIs are Java libraries. We have three APIs that allow creating models, doing inference, and carrying out some structural learning via Java programming. Please see http://library.bayesia.com/display/BlabC/Bayesia+Engine+API for a description of the Modeling and Inference API (the learning one is brand new and is still under development). As for your credible interval problem, you're right: there is unfortunately no easy way to automate this within BayesiaLab.
AldenBlack
Got it. Thanks for all the help and patience on this, Dan. I really appreciate it.
I should be good from here, but if you guys have the time, I think adding credible interval functionality to a future version of BayesiaLab would be really helpful for many users.
Alden