Volume 12, Issue 2
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background: A major current challenge is the treatment and interpretation of real data. Data sets are often high-dimensional, have a small number of observations and are noisy. Furthermore, in recent years, many approaches have been suggested for integrating continuous with categorical/ordinal data, in order to capture the information that is lost in independent studies. Objective: The aim of this paper is to develop a statistical tool for the detection of outliers that is suited to any kind of features and to high-dimensional data. Method: The data form an n × p matrix (n ≪ p) in which the rows correspond to observations and the columns to features of any kind. The new procedure is based on the distances between all the observations and offers a ranking by assigning each observation a value reflecting its degree of outlyingness. It was evaluated by simulation and by using real data from clinical and genetic studies. Results: The simulation studies showed that the procedure correctly identified the outliers, was robust to the masking effect and was useful in the detection of noise. With simulated two-sample microarray data sets, it correctly detected outliers, especially when many genes showed increased expression for only a small number of samples. The method was applied to data sets on adult lymphoid malignancies, human liver cancer and autism multiplex families, obtaining good and valuable results. Conclusion: The real-data and simulation studies show the efficiency of the procedure, which offers a useful tool in applications where the detection of outliers or noise is relevant.
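
The general idea of a distance-based outlyingness ranking can be illustrated with a minimal sketch. This is not the procedure proposed in the paper, whose details are not given in the abstract. It assumes purely continuous features, Euclidean distance, and a score defined as the median distance from each observation to all the others. The function names and the toy data are hypothetical.

```python
import math
from statistics import median


def pairwise_distances(rows):
    """Return the symmetric Euclidean distance matrix between observations."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])
            dist[i][j] = dist[j][i] = d
    return dist


def outlyingness_ranking(rows):
    """Score each observation by its median distance to all the others.

    Assumption for illustration only: a larger median distance is read as
    a higher degree of outlyingness. Returns (order, scores), where order
    lists observation indices from most to least outlying.
    """
    dist = pairwise_distances(rows)
    n = len(rows)
    scores = [
        median(dist[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data with n = 5 observations and p = 8 features (n < p).
# The last observation is shifted away from the rest.
data = [
    [0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.2, 0.0],
    [0.0, 0.1, 0.1, 0.0, 0.2, 0.0, 0.1, 0.1],
    [0.2, 0.1, 0.0, 0.1, 0.1, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.2, 0.0, 0.1, 0.1, 0.0],
    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
]

order, scores = outlyingness_ranking(data)
print("Most outlying observation:", order[0])
```

Using the median rather than the mean keeps a single distant point from inflating the scores of the remaining observations. Mixed continuous and categorical/ordinal features would need a different dissimilarity measure, which this sketch does not attempt.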

/content/journals/cbio/10.2174/1574893611666160606161031
2017-04-01
2025-10-24
  • Article Type:
    Research Article
Keyword(s): Biomedical data; data depth; gene expression; microarray; noise; outlier; robust estimation