Are there any "goodness of fit" statistics I can use to decide which discretization is best for my model? That is, if I fit my model with one discretization algorithm and then fit it again with a different discretization algorithm, is there a statistic that will tell me which model fits better? And if I can choose between cutting into 3 categories or cutting into 5, how should I choose (for example)?
Gary M
And to add to this question: how sensitive is the modeling to the discretization algorithm? I realize the answer is case by case, but are there any "rules of thumb" on this sensitivity?
Is it good practice to use one discretization technique, choose the optimum model, and then repeat the process with another technique to test this sensitivity/robustness?
There are several articles in the knowledge database that recommend an appropriate discretization method. You should probably start there. One question I have: there is an option in structural coefficient analysis to re-discretize nodes. Should you elect to change the structural coefficient and re-learn the model, how is this re-discretization applied? For example, suppose the structural coefficient analysis determines that 5 levels are needed versus 7 for a certain SC (structural coefficient). How do you know that, and how can you apply that knowledge when re-learning the model at that SC? Or is that not what really happens when this option is applied during SC analysis?
The discretization defines the perception of the "AI agent". The number of bins and the discretization method therefore have a huge impact on the machine-learned model.

As for the choice of the number of bins, it should be driven by the amount of data available for learning. The number of bins has a direct impact on the size of the conditional probability tables (CPTs). Let's suppose you have a node with 2 parents, each of these three nodes (the child and its parents) being represented with 5 bins: you have a CPT made of 5 x 5 x 5 = 125 cells. An optimistic heuristic consists in trying to get 5 samples per CPT cell. If you envision a network with at most 2 parents per node, you then need at least 625 observations in your data set. This is obviously optimistic because it assumes the data is uniformly distributed across the CPT, which cannot be the case (otherwise there would be no relationship). However, this heuristic usually allows retrieving the skeleton of the network (i.e., the structure without taking into account the arcs' orientation).
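The heuristic above is easy to turn into a quick back-of-the-envelope check. Here is a minimal sketch (the function name and the default of 5 samples per cell are just the illustration from this post, not anything built into BayesiaLab):

```python
def min_observations(n_parents, bins_per_node, samples_per_cell=5):
    """Optimistic lower bound on the sample size needed to learn a CPT.

    A node with `n_parents` parents, where the node and each parent are
    discretized into `bins_per_node` bins, has a CPT with
    bins_per_node ** (n_parents + 1) cells. The heuristic asks for about
    `samples_per_cell` samples per cell.
    """
    cpt_cells = bins_per_node ** (n_parents + 1)
    return cpt_cells * samples_per_cell

# The example from the post: 2 parents, 5 bins each -> 125 cells -> 625 rows.
print(min_observations(n_parents=2, bins_per_node=5))  # 625
```

Running it with 7 bins instead of 5 already asks for 1,715 rows, which shows how quickly the data requirement grows with the bin count.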

Another way to choose the number of bins is to set an initial number, learn the model, and analyze the connectivity of the network. When you have lots of orphan nodes, this usually indicates that you need to decrease the number of bins (you can also decrease the structural coefficient; see below). If your network is too connected, you can increase the number of bins (or increase the structural coefficient). The discretization can be changed via Learning | Discretization, without having to reload the data.

As for the discretization method, it depends on the objective of the model. For unsupervised learning, you can use the univariate methods (R2-GenOpt being the best choice for minimizing the quantization error). For supervised learning, it is usually better to use the bivariate methods (e.g., Tree or Perturbed Tree), which find the thresholds that best characterize the target node.
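To see why the distinction matters, here is a small sketch of the two ideas outside BayesiaLab (R2-GenOpt and Perturbed Tree are proprietary methods; this uses plain quantile binning and a scikit-learn decision tree as stand-ins): a univariate method picks cut points from the distribution of the variable alone, while a tree-based method picks cut points that separate the target classes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = (x > 0.5).astype(int)  # the target depends on a single threshold at 0.5

# Univariate (unsupervised): equal-frequency bin edges ignore the target.
eq_freq_edges = np.quantile(x, [0.25, 0.5, 0.75])

# Bivariate (supervised): a shallow tree finds the cut points that separate
# the target classes; its split thresholds become the bin edges.
tree = DecisionTreeClassifier(max_leaf_nodes=4, random_state=0)
tree.fit(x.reshape(-1, 1), y)
tree_edges = sorted(t for t in tree.tree_.threshold if t != -2.0)  # -2 marks leaves

print(eq_freq_edges)  # quantile edges, near the -0.67 / 0 / 0.67 quantiles of N(0,1)
print(tree_edges)     # the tree recovers a cut point very close to 0.5
```

With quantile bins, none of the edges lands on the threshold that actually drives the target; the supervised method finds it directly. That is the practical argument for the bivariate methods when you have a target node.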

The only way to fairly compare the quality of your choices for unsupervised learning is to use Analysis | Network Performance | Multi-Target. This gives you a broad set of metrics for evaluating which model performs better. For supervised learning, you can use Analysis | Network Performance | Target. It is recommended to use separate learning and test sets, or, even better, cross-validation.
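The general principle (compare candidate discretizations by out-of-sample performance, not in-sample fit) can be sketched outside BayesiaLab as well. This toy comparison of 3 vs 5 quantile bins uses scikit-learn cross-validation with a naive Bayes classifier as a stand-in scorer; in BayesiaLab the Bayesian network itself plays that role:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Score each candidate bin count by cross-validated accuracy rather than
# by how well the model fits the training data.
for n_bins in (3, 5):
    model = make_pipeline(
        KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="quantile"),
        MultinomialNB(),  # stand-in model; BayesiaLab would evaluate the BN itself
    )
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{n_bins} bins: mean CV accuracy = {score:.3f}")
```

Whichever bin count wins here wins on held-out data, which is the fair comparison the Network Performance reports (with a test set or cross-validation) are making.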

As for the Structural Coefficient (SC), it has an impact on the complexity of the learned model: it is a way to increase/decrease the weight of your data set (N' = N/SC). Note that this structural coefficient is also used for learning the trees in the bivariate discretization. If you are not using this type of discretization, it has no impact on the discretization results.
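The N' = N/SC relation is simple but worth making concrete, since it is the knob behind the "too many orphans / too connected" heuristic above (the function name here is just for illustration):

```python
def effective_sample_size(n_rows, structural_coefficient):
    """BayesiaLab weights the data as N' = N / SC: raising the structural
    coefficient shrinks the effective sample size, which penalizes complex
    structures; lowering it does the opposite."""
    return n_rows / structural_coefficient

# With 10,000 rows, SC = 2 makes the learner behave as if it only had 5,000.
print(effective_sample_size(10_000, 2.0))  # 5000.0
```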

Hope this helps