Documentation

Data format

The algorithm requires two input files: a CSV matrix with gene expression, methylation, or any other numerical data, and a CSV file describing a network.

Numerical data

Numerical data is accepted in the following format:

  • genes as rows
  • patients as columns
  • first column - gene IDs (Entrez IDs)

For instance:

  GSM748056 GSM748059 ... GSM748278 GSM748279 GSM1465989
1454 0.053769 0.117412 ... -0.392363 -1.870838 -1.432554
201931 -0.618279 0.278637 ... 0.803541 -0.514947 2.361925
8761 0.215820 -0.343865 ... 0.700430 0.073281 -0.977656
2703 -0.504701 1.295049 ... 1.861972 0.601808 0.191013
26207 -0.626415 -0.646977 ... 2.331724 2.339122 -0.100924

Our test data can be downloaded here and further used as a reference for the correct format.
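
The format requirements above can be checked before uploading. The following is a minimal sketch using pandas, not part of BiCoN itself: the function name `load_expression`, the comma separator, and the duplicate-ID check are assumptions made for illustration.

```python
import io
import pandas as pd

def load_expression(source, sep=","):
    """Read a genes-by-patients matrix whose first column holds the gene IDs."""
    expr = pd.read_csv(source, sep=sep, index_col=0)
    # Keep Entrez IDs as strings so they can be matched against network IDs.
    expr.index = expr.index.astype(str)
    # Every remaining column is one patient and must contain numbers only.
    non_numeric = [c for c in expr.columns
                   if not pd.api.types.is_numeric_dtype(expr[c])]
    if non_numeric:
        raise ValueError(f"non-numeric patient columns: {non_numeric}")
    # Assumption: each gene should appear only once.
    if expr.index.has_duplicates:
        raise ValueError("duplicate gene IDs in the first column")
    return expr

# Two genes and two patients taken from the example above.
example = io.StringIO(
    ",GSM748056,GSM748059\n"
    "1454,0.053769,0.117412\n"
    "201931,-0.618279,0.278637\n"
)
expr = load_expression(example)
print(expr)
```

If your file uses tabs or another delimiter, pass it as `sep`.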

Network

We support both retrieving pre-built networks from NDEx and uploading custom networks. A custom network should be defined in a CSV file with two columns, where each row lists a pair of interacting genes. The file must not have a header.

For instance:

6416 2318
6416 5371
6416 351
6416 409
6416 5932
6416 1956

We also provide an example of a PPI here.
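
A custom network file can be sanity-checked in the same way. This is a minimal sketch with pandas; the function name `load_network`, the column labels `gene_a` and `gene_b`, and the comma separator are assumptions for illustration and are not prescribed by BiCoN.

```python
import io
import pandas as pd

def load_network(source, sep=","):
    """Read a header-less two-column edge list of interacting genes."""
    # header=None: the first line is already an interaction, not column names.
    net = pd.read_csv(source, sep=sep, header=None, dtype=str)
    if net.shape[1] != 2:
        raise ValueError(f"expected 2 columns, found {net.shape[1]}")
    net.columns = ["gene_a", "gene_b"]  # hypothetical labels for convenience
    return net

# The first three interactions from the example above.
edges = load_network(io.StringIO("6416,2318\n6416,5371\n6416,351\n"))
genes = set(edges["gene_a"]) | set(edges["gene_b"])
print(len(edges), "interactions between", len(genes), "genes")
```

Reading the IDs as strings avoids accidental numeric conversion and makes it easy to compare them with the gene IDs of the numerical data.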

Metadata

Add clinical data/survival data to enable further analysis. If you do not want to upload additional metadata, you can go to the next step.

Data processing

For internal calculations, BiCoN normalizes the data by applying a log2 transformation followed by z-score normalization. If your data is already log2-scaled, please uncheck "Log2 transform".
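
To illustrate what these two steps do, here is a minimal NumPy sketch. It is not BiCoN's internal code: the function name `log2_zscore`, the choice to compute z-scores per gene (row), and the handling of constant genes are assumptions.

```python
import numpy as np

def log2_zscore(values, log2_transform=True):
    """Optionally log2-transform a genes-by-patients array, then z-score it."""
    x = np.asarray(values, dtype=float)
    if log2_transform:
        # log2 is only defined for strictly positive values.
        if (x <= 0).any():
            raise ValueError("log2 requires strictly positive values")
        x = np.log2(x)
    # Assumption: z-scores are computed per gene, i.e. along each row.
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes stay at zero instead of dividing by 0
    return (x - mean) / std

raw = np.array([[1.0, 2.0, 4.0],
                [8.0, 8.0, 8.0]])
z = log2_zscore(raw)
print(z)
```

Data that already contains negative values, such as the example matrix above, cannot be log2-transformed again, which is the case where `log2_transform=False` (the unchecked box) applies.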

Algorithm parameters

The main parameters of the algorithm are the size limits of the desired solution. Please indicate the minimum and maximum number of genes you would like to have in each subnetwork.

In most cases, the default parameters deliver the best performance. If you want to set advanced parameters, please check "Use advanced parameters: Yes". You can then specify the following parameters:

  • Gene set size - number of genes considered in the analysis. By default, the algorithm preselects the 2000 most variable genes to speed up the calculation. This is usually a good choice, because genes that are informative for clustering are expected to have a high variance. If needed, this parameter can be increased, but we do not recommend going higher than 5000, to keep the runtime reasonable.
  • Maximum number of iterations - we generally do not recommend terminating the algorithm before convergence (which is usually reached in 30-60 iterations). Hence, we recommend changing this parameter only for testing purposes.
  • Number of ants - more ants can explore a larger search space, but also increase the runtime.
  • Evaporation rate - a higher evaporation rate speeds up convergence, but also increases the risk of getting stuck in a local optimum.
  • Pheromone significance and Heuristic Information significance - please modify these parameters only if you are well aware of their role in Ant Colony Optimization.
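
The gene preselection controlled by "Gene set size" can be pictured as keeping the rows with the highest variance across patients. The sketch below uses pandas; the function name `top_variance_genes` and its argument `gene_set_size` are hypothetical, and BiCoN's actual preselection may differ in detail.

```python
import pandas as pd

def top_variance_genes(expr, gene_set_size=2000):
    """Keep the `gene_set_size` genes (rows) with the highest variance."""
    variances = expr.var(axis=1)  # variance of each gene across patients
    keep = variances.nlargest(min(gene_set_size, len(expr))).index
    return expr.loc[keep]

# Toy genes-by-patients matrix with made-up values.
toy = pd.DataFrame(
    {"p1": [1.0, 0.0, 1.0], "p2": [1.0, 5.0, 2.0], "p3": [1.0, 10.0, 3.0]},
    index=["a", "b", "c"],
)
selected = top_variance_genes(toy, gene_set_size=2)
print(list(selected.index))
```

In this toy matrix, gene "a" is constant across patients and is therefore dropped first.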

Cite

BiCoN was developed by the Big Data in BioMedicine group and the Computational Systems Medicine group at the Chair of Experimental Bioinformatics.

If you use BiCoN in your research, we kindly ask you to cite the following manuscript:
Olga Lazareva, Stefan Canzar, Kevin Yuan, Jan Baumbach, David B Blumenthal, Paolo Tieri, Tim Kacprowski*, Markus List*, BiCoN: Network-constrained biclustering of patients and omics data, Bioinformatics, 2020, btaa1076, https://doi.org/10.1093/bioinformatics/btaa1076

* joint last author

Contact

If you want to contact us regarding BiCoN: