## Welcome

Welcome to the Molecular Signatures with Biclustering (MoSBi) webtool!

We apply biclustering algorithms to omics data in combination with a novel ensemble approach to find molecular signatures in the data. This can yield new insights that go beyond traditional clustering, where only one dimension is considered.

MoSBi is available as a public web service and as an R package, and the web server can also be deployed on your own computer using Docker. To download and run MoSBi locally, see the **About** page.

Please note that your uploaded data is processed on our servers and deleted automatically once your session is closed. Server capacity is limited. If the page freezes or your data exceeds the limit, consider running MoSBi locally. Further information is available on the **About** page.

## 1. Upload Data

Upload your data in the format explained on the 'Tutorial' page. You can additionally upload labels and colors for your data. Once the data is uploaded, you can proceed to the computation tab.

There are no data size limitations in the version of MoSBi you are currently running.

Example data from:

Ku et al., Molecular Oncology (2020)

### Data summary

#### UMAP visualization

## 2. Compute Biclusters

### 2.1. Algorithms for bicluster computation

Please select one or more biclustering algorithms to run on your uploaded data. The results of all selected algorithms will be combined for the ensemble analysis.

The preselected algorithms were chosen for their runtime and performance. Other algorithms can result in very long runtimes. Inexperienced users should not change the selection.

### 2.2. Optional parameters for the biclustering algorithms

#### Spectral

#### Biclustpy

#### Bi-Force

#### akmbiclust

### 2.3. Run Algorithms

Please press 'Compute Biclusters' before proceeding to the next page.

## 3. Biclustering Results

### 3.1. Distribution of bicluster sizes

Gives an overview of the sizes of biclusters.

### 3.2. Biclusters

Heatmaps of all computed biclusters, sorted by algorithm.

### 3.3. Download biclusters

Biclusters can be downloaded here, or later on page 4.

Download Biclusters (as xlsx table)

## 4. Molecular Signatures

### 4.1. Ensemble parameters (Optional)

### 4.2. Calculate Networks

After the bicluster calculations, bicluster networks and communities can be computed here:

### 4.3. Error model statistics

Visualization of the cut-off estimation with our error model.

### 4.4. Bicluster networks

##### Colored by conditions

##### Colored by algorithm

### 4.5. Molecular signatures

Extract bicluster communities from the network to analyze their signatures in detail.

### 4.6. Download Results

All results can be downloaded here as a zip file.

Download results (as zip file)

### (4.7. Recursive Biclustering)

After communities are calculated, one community can be selected to rerun the biclustering on its content. This generates a link to a new MoSBi instance where the data is uploaded automatically. Note that this link is accessible for only 30 seconds and only to you. During that time, your data is saved on our server.

Cut-offs are necessary for the extraction of a community. They have to be entered according to the 'Fraction' column in the Community tables above.
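As a rough illustration of how such cut-offs act, the following Python sketch assumes that 'Fraction' is the share of biclusters in a community that contain a given row or column, and that an item is kept when its fraction is at least the cut-off. Both the interpretation and the helper names (`fractions`, `extract_community`) are assumptions made for this example; they are not taken from the MoSBi code.

```python
from collections import Counter

def fractions(community, axis):
    """Share of biclusters in the community containing each item.

    `community` is a list of (rows, columns) pairs; `axis` is 0 for rows
    and 1 for columns. This reading of 'Fraction' is an assumption.
    """
    counts = Counter(item for bicluster in community for item in bicluster[axis])
    n = len(community)
    return {item: count / n for item, count in counts.items()}

def extract_community(community, row_cutoff, col_cutoff):
    """Keep rows and columns whose fraction reaches the given cut-offs."""
    rows = sorted(r for r, f in fractions(community, 0).items() if f >= row_cutoff)
    cols = sorted(c for c, f in fractions(community, 1).items() if f >= col_cutoff)
    return rows, cols

# Hypothetical community of three biclusters.
community = [
    ({"Gene1", "Gene2"}, {"Sample1"}),
    ({"Gene1", "Gene2", "Gene3"}, {"Sample1", "Sample2"}),
    ({"Gene1"}, {"Sample1"}),
]

rows, cols = extract_community(community, row_cutoff=0.5, col_cutoff=0.5)
```

With these toy values, `Gene3` and `Sample2` each appear in only one of three biclusters, so they fall below a cut-off of 0.5 and are dropped from the extracted subset.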

## Interactive Molecular Signatures

This page contains interactive versions of the visualizations seen on the molecular signatures page. You may expect increased load times when this page is opened for the first time.

## Tutorial

This tutorial will walk you through the main functionalities of the webtool. For a full analysis, steps 1 to 4 must all be completed.

### 1. Data upload

To start the analysis, a dataset must be uploaded. You can either use an example dataset by clicking the **Use example data** button or upload your own data. Additionally, you can upload labels for the samples in the dataset and custom colors to be used in the generated plots. Both are optional and can be added at any time, even after the analysis is done; the plots will be updated automatically with the new colors or conditions.

The data must be uploaded in **.csv** format (comma-separated values). The following examples show how the uploaded data should look:

#### Dataset (required)

| Names | Sample1 | Sample2 | ... |
|---|---|---|---|
| Gene1 | 0.12 | 2.35 | ... |
| Gene2 | 5.12 | 0.98 | ... |
| ... | ... | ... | ... |

#### Labels for samples (Optional)

| id | condition |
|---|---|
| Sample1 | Disease1 |
| Sample2 | Disease2 |
| ... | ... |

#### Colors (Optional)

| condition | color |
|---|---|
| Disease1 | green |
| Disease2 | red |
| ... | ... |

Row and column names are required for the dataset. If no labels are uploaded, they can also be generated using k-means clustering. Other options include the possibility to normalize the data and replace missing values. Once the data is uploaded, the size of the dataset is shown with the number of missing values and a UMAP visualization is presented to give an overview of the data and labels.
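The three input files described above can be produced with any tool that writes CSV. The following Python sketch builds them with the standard `csv` module, using only the toy values from the example tables. The helper names (`to_csv`, `check_consistency`) are hypothetical, and the consistency check reflects an assumption about what makes a sensible input rather than a documented requirement of the webtool.

```python
import csv
import io

def to_csv(rows):
    """Serialize a list of rows to comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Dataset (required): first row holds column names, first column holds row names.
dataset = [
    ["Names", "Sample1", "Sample2"],
    ["Gene1", 0.12, 2.35],
    ["Gene2", 5.12, 0.98],
]
# Labels (optional): one condition per sample id.
labels = [["id", "condition"], ["Sample1", "Disease1"], ["Sample2", "Disease2"]]
# Colors (optional): one color per condition.
colors = [["condition", "color"], ["Disease1", "green"], ["Disease2", "red"]]

def check_consistency(dataset, labels, colors):
    """List samples without a label and conditions without a color.

    Assumption for this sketch: every sample should be labeled and
    every condition should have a color.
    """
    problems = []
    condition_of = {row[0]: row[1] for row in labels[1:]}
    colored = {row[0] for row in colors[1:]}
    for sample in dataset[0][1:]:
        if sample not in condition_of:
            problems.append(f"no label for {sample}")
    for condition in sorted(set(condition_of.values())):
        if condition not in colored:
            problems.append(f"no color for {condition}")
    return problems

dataset_csv = to_csv(dataset)
labels_csv = to_csv(labels)
colors_csv = to_csv(colors)
problems = check_consistency(dataset, labels, colors)
```

Writing `dataset_csv`, `labels_csv`, and `colors_csv` to separate `.csv` files gives one file per upload field.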

The data size is limited on our servers. Therefore, the maximum number of rows is limited to NA and the number of columns is limited to NA. If your data exceeds that limit, please run MoSBi locally. Information on how to do that can be found on the About page.

The data is now ready to continue with the analysis and compute biclusters.

### 2. Compute biclusters

The first step of the analysis is the computation of biclusters using multiple algorithms. Without prior experience or knowledge of biclustering, it is recommended to use the preselected algorithms. If you are only interested in the result of one algorithm, it is sufficient to select only that algorithm.

The computation can now be started by clicking the **Compute biclusters** button. This will execute all selected algorithms with default parameters. The parameters of each algorithm can be changed individually in the blue boxes. A box in the lower right corner indicates the progress of the computation.

Once the computation has finished, an overview of the executed algorithms is shown. It indicates whether errors occurred in any algorithm and lists all error messages, which allows you to adapt the algorithm parameters and rerun the algorithms.

In the next step, the resulting biclusters can be investigated.

### 3. Biclustering Results

This page gives you an overview of the computed biclusters and the option to download them.

The histogram in 3.1. plots the size distribution of all biclusters for rows and columns. Additionally, in 3.2. each bicluster can be selected individually and inspected as a heatmap with labels for the samples. Finally, all biclusters can be downloaded as an xlsx table.

### 4. Molecular Signatures

The final step is the extraction of molecular signatures by creating bicluster networks and community detection.

Additional parameters can be modified; however, this is not recommended and should only be done after reading the publication. The networks can then be calculated using the corresponding green button. The first two plots then show the estimation of the similarity threshold. For further information, please consult the publication.

The next plots show the bicluster similarity networks, colored by the sample labels and by biclustering algorithm. Two connected biclusters have a higher-than-random similarity. On the left, the biclusters are colored by the sample labels they contain. Together, the plots give you an overview of the network and of the similarities between the results of different algorithms.
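To make the idea of a bicluster similarity network concrete, here is a minimal Python sketch. It assumes the Jaccard index over the (row, column) cells of two biclusters as the similarity measure and a fixed threshold; both are stand-ins chosen for illustration. In the webtool, the threshold is estimated with the error model, and the similarity measure actually used may differ.

```python
from itertools import combinations

def cells(bicluster):
    """All (row, column) cells covered by a bicluster given as (rows, columns)."""
    rows, cols = bicluster
    return {(r, c) for r in rows for c in cols}

def jaccard(a, b):
    """Jaccard index of the cells covered by two biclusters."""
    cells_a, cells_b = cells(a), cells(b)
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0

def similarity_network(biclusters, threshold):
    """Edges (i, j, similarity) for all pairs whose similarity exceeds the threshold."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(biclusters), 2):
        similarity = jaccard(a, b)
        if similarity > threshold:
            edges.append((i, j, similarity))
    return edges

# Hypothetical biclusters: the first two overlap strongly, the third is unrelated.
biclusters = [
    ({"Gene1", "Gene2"}, {"Sample1", "Sample2"}),
    ({"Gene1", "Gene2", "Gene3"}, {"Sample1", "Sample2"}),
    ({"Gene7"}, {"Sample9"}),
]

edges = similarity_network(biclusters, threshold=0.5)
```

Here only the first two biclusters are connected: they share 4 of 6 cells, giving a similarity of 2/3, while the third bicluster remains an isolated node.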

Now, communities can be calculated to extract the signatures. A minimum community size can be defined. Afterwards, all communities are shown in the tabs below as networks and tables, where the included samples and features can be investigated. The visualization of the network as a hybrid matrix diagram plots biclusters as heatmaps in the network. This helps to visually inspect the similarity of biclusters.

All results can be downloaded as a zip file afterwards. Additionally, it is possible to extract the content of a community and perform another round of biclustering on this subset of the data. This temporarily saves the data on our server and makes it available through the link that appears after clicking the corresponding button. The link is invalidated after 30 seconds. Clicking the button again generates a new link.

The page **Interactive Molecular Signatures** provides the same community network plots, but in an interactive form.

## About

Thank you for using MoSBi!

### Citation

If you use MoSBi, please cite our publication:

Rose et al. MoSBi: Automated signature mining for molecular stratification and subtyping. Proceedings of the National Academy of Sciences, 2022; 119 (16): e2118210119. doi: https://doi.org/10.1073/pnas.2118210119

### Feedback

If you have questions or suggestions on how to improve MoSBi, please contact us:

<tim.rose[a.t_)wzw.tum.de>

<josch.pauling[a.t_)wzw.tum.de>

### Code

MoSBi is free software. If you want to run MoSBi locally or have a look at the code, you can find it here:

### Impressum

Dr. Josch K. Pauling

bidt Junior Research Group LipiTUM

Chair of Experimental Bioinformatics

TUM School of Life Sciences Weihenstephan

Maximus-von-Imhof-Forum 3

85354 Freising

Germany