Fig. 1 | Respiratory Research

Fig. 1

From: The use of exhaled air analysis in discriminating interstitial lung diseases: a pilot study


Conceptual flowchart presenting the approach used for statistical analysis. In step 1, a database is built with all clinical data and the preprocessed VOC data, containing three main groups: IPF (n = 53), CTD-ILD (n = 51) and healthy controls (n = 51). In step 2, the machine learning method Random Forests (RF) is used to find discriminatory VOCs. For that purpose, three different discriminatory RF models are built. Each discriminatory RF model is constructed on a training set (containing 80% of the samples of each group) and validated using an independent test set (containing the remaining 20% of the samples of each group). Training and test sets are selected using the Duplex method (27). The first RF model is built on VOC data from IPF patients and healthy controls to find compounds linked to IPF. The second RF model is built on chromatograms from CTD-ILD patients and healthy controls to allow selection of VOCs related solely to CTD-ILD. The third RF model is built on breath samples from IPF and CTD-ILD patients to find VOCs that are differentially profiled between these two pulmonary pathologies. To demonstrate the performance of each RF analysis, a receiver operating characteristic (ROC) curve is used, and sensitivities and specificities are determined. In step 3, the compounds selected as significant in step 2 are combined. In step 4, the final RF model is constructed using chromatograms from IPF patients, CTD-ILD patients and healthy controls. In step 5, Principal Component Analysis (PCA) is performed on the proximities obtained from the final RF model to demonstrate the differences between the three groups and to visualize the relation between all breath samples.
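
The performance measures named in step 2 can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the example labels and scores are hypothetical, chosen only to show how sensitivity, specificity and the area under the ROC curve could be computed from test-set predictions of a two-group classifier.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at one decision threshold.

    y_true: 1 for the case group (e.g. IPF), 0 for the comparison group.
    y_score: classifier score for class 1 (e.g. fraction of tree votes).
    The 0.5 threshold is an assumption, not a value taken from the study.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen non-case; ties count as one half.
    """
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and scores, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores)
print(f"sensitivity={sens}, specificity={spec}, AUC={roc_auc(labels, scores)}")
```

The same two functions would apply unchanged to each of the three pairwise comparisons described in step 2, since each is a two-group problem evaluated on its own held-out test set.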
