- Abstract
- Understanding any large and complex system depends on unravelling its network of relations. While some networks are readily uncovered, others can only be inferred by processing probabilistic data. The study of relationships through (probabilistic) graphical models has gained considerable traction thanks to their intuitive way of modelling structural dependencies. The task becomes much harder when several classes of graphs must be estimated and compared, since these groups of networks may be globally similar yet differ to some extent. Adding to the difficulty, large networks must often be estimated from high-dimensional data, where the number of parameters p to estimate is much larger than the sample size n. Robust statistical methodologies are needed to address these challenges. In the multivariate Gaussian framework, the precision (concentration) matrix has the useful property of encoding the conditional independence structure: a zero off-diagonal entry means that the two corresponding features are conditionally independent given all the others, i.e. there is no direct relation between them in the network. In high-dimensional settings, however, the maximum likelihood estimate of the covariance matrix is singular (its rank is at most n, which is smaller than p), so it cannot be inverted to produce an estimate of the concentration matrix. Many penalized log-likelihood methods have been proposed in the literature to address these issues. However, none has fully resolved the outstanding problem: either the optimization problem was non-convex, or the emphasis was placed on forcing similarity between classes. One interesting approach from the literature uses two penalties in the optimization problem: one to control the sparsity pattern and a second to force similarity between classes. We built on this idea in this master's thesis. We introduced a new family of penalties and tested the performance of four of its members, one of which was published as the group graphical lasso (GGL) penalty in Danaher et al. (2014).
We evaluated our proposals on synthetic data simulated from underlying scale-free networks, which mimic biological networks. The simulations showed that Proposals 1 and 3 were comparable to the GGL penalty in terms of accuracy and of sensitivity and specificity in detecting (differential) edges. We confirmed that the GGL penalty also contributed to sparsity, whereas our proposals did not encourage it. Proposal 2 converged less reliably and performed worse. The regularization methods were also applied to a real microarray gene expression data set from lung epithelial cells of 90 healthy subjects and 97 patients with cancer. The results confirmed the similarity between Proposals 1 and 3, as most of their visualized sub-networks were identical. Despite many differences in network patterns, the vast majority of the differential edges (those that change between classes) were shared by the GGL penalty and Proposals 1 and 3.
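The ingredients named in the abstract can be made concrete with a short Python/NumPy sketch. It is illustrative only and is not the thesis's implementation. It (1) shows that the sample covariance is singular when p > n, (2) shows that a zero in the precision matrix can coexist with a non-zero covariance entry, and (3) evaluates the GGL-penalized log-likelihood of Danaher et al. (2014), which combines a lasso term for sparsity with a group term for similarity across classes. The function names `sample_covariance`, `ggl_penalty` and `penalized_loglik` and all toy matrices are hypothetical. The sketch only evaluates the objective and does not solve the optimization problem; Danaher et al. use an ADMM algorithm for that.

```python
import numpy as np


def sample_covariance(X):
    """Maximum likelihood covariance estimate (divides by n) for an n x p data matrix X."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def ggl_penalty(thetas, lam1, lam2):
    """Group graphical lasso (GGL) penalty, as in Danaher et al. (2014):

        lam1 * sum_k sum_{i != j} |theta_ij^(k)|          (sparsity within each class)
      + lam2 * sum_{i != j} sqrt(sum_k (theta_ij^(k))^2)  (shared support across classes)

    Diagonal entries are not penalized.
    """
    T = np.stack(thetas)                      # shape (K, p, p)
    off = ~np.eye(T.shape[1], dtype=bool)     # mask selecting off-diagonal entries
    lasso = np.abs(T[:, off]).sum()
    group = np.sqrt((T[:, off] ** 2).sum(axis=0)).sum()
    return lam1 * lasso + lam2 * group


def penalized_loglik(thetas, covs, ns, lam1, lam2):
    """Objective to maximize: sum_k n_k [log det Theta_k - tr(S_k Theta_k)] - GGL penalty."""
    total = 0.0
    for theta, S, n in zip(thetas, covs, ns):
        try:
            L = np.linalg.cholesky(theta)     # raises unless theta is positive definite
        except np.linalg.LinAlgError:
            return -np.inf
        logdet = 2.0 * np.log(np.diag(L)).sum()
        total += n * (logdet - np.trace(S @ theta))
    return total - ggl_penalty(thetas, lam1, lam2)


# 1) With p > n the sample covariance is singular, so it cannot be inverted.
rng = np.random.default_rng(0)
n, p = 20, 50
S_hd = sample_covariance(rng.standard_normal((n, p)))
rank = np.linalg.matrix_rank(S_hd)            # at most n - 1 = 19, well below p
print("rank of S:", rank, "of", p)

# 2) Chain graph 0 - 1 - 2: Theta[0, 2] = 0 (no direct edge),
#    yet features 0 and 2 are still marginally correlated.
Theta_chain = np.array([[2.0, -1.0, 0.0],
                        [-1.0, 2.0, -1.0],
                        [0.0, -1.0, 2.0]])
Sigma_chain = np.linalg.inv(Theta_chain)
print("Theta[0, 2] =", Theta_chain[0, 2], " Sigma[0, 2] =", Sigma_chain[0, 2])

# 3) Two hypothetical classes sharing one edge (0, 1).
Theta_a = np.eye(3)
Theta_a[0, 1] = Theta_a[1, 0] = 0.3
Theta_b = Theta_a.copy()
pen = ggl_penalty([Theta_a, Theta_b], lam1=1.0, lam2=1.0)
obj = penalized_loglik([Theta_a, Theta_b], [np.eye(3), np.eye(3)], [10, 10], 1.0, 1.0)
print("penalty:", pen, " objective:", obj)
```

In the GGL form, the group term sets an edge to zero in all classes at once, which is one reason it also promotes sparsity. The exact forms of Proposals 1 to 3 are not given in this abstract, so the sketch covers GGL only.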