Server-less bioinformatics web-server

Python
web-server
PyScript
Author

Manish Datt

Published

May 4, 2023


PyScript in action.

Bioinformatics research relies heavily on web-servers, i.e., websites that take some input, perform specific calculations on it, and return an output. Research groups across the world have developed many web-servers covering different areas of bioinformatics, and there are well-known catalogues of such online resources, e.g., EBI, CRDD, etc. These web-servers offer biology researchers a convenient way to use bioinformatics applications without installing any software. They are generally built on a LAMP (Linux, Apache, MySQL, and Perl) or a similar technology stack. Setting up such a stack requires a certain level of familiarity with these technologies and has a learning curve associated with it. In addition, keeping a web-server online requires a hosting platform, which comes at a cost.

This kind of technology stack was essential because there was no way to execute Perl, Python, or R code (the languages preferred by bioinformatics researchers) within the browser. PyScript changes that. This offering from Anaconda Inc makes it possible to run Python code within the browser, which opens up many possibilities for web-server and bioinformatics application development. Python code can now be executed on the user's side without the need to install Python!

The tutorial below will show PyScript in action.

The basics

To use PyScript in an HTML file, we need to include the PyScript JS and CSS files. These can be saved locally or included from an online repository as follows:

<script defer src='https://pyscript.net/latest/pyscript.js'></script>
<link rel='stylesheet' href='https://pyscript.net/latest/pyscript.css'/>

The Python code is then enclosed within the py-script tag.

GC content of a DNA sequence

We’ll first make a simple program to calculate the GC content of a DNA sequence. For this, we need a text box for the DNA sequence and a button that triggers the Python function performing the calculation. Below is the Python function for the GC content calculation.

<py-config> packages = ['numpy'] </py-config>

<py-script>
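    def calcGC(*args, **kwargs):
      import numpy as np
      # Read the sequence from the text box and drop surrounding whitespace
      s1 = Element('dna_seq1').element.value.strip()
      if len(s1) == 0:
        # Nothing entered: clear the output instead of dividing by zero
        Element('out1').element.value = ''
        return
      GC_count = 0
      for i in s1.upper():
        if (i=='C' or i=='G'):
          GC_count += 1
      GC_content = GC_count/len(s1)*100
      Element('out1').element.value = np.round(GC_content,2)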
</py-script>

Important points about the code above:

  • Since we’ll be using the round function from the numpy package, we need to add the optional py-config tag and list numpy as an additional package to load.
  • dna_seq1 is the id of the input box in which the user enters the sequence; the function reads its value into the variable s1.
<textarea type="text" id="dna_seq1" cols="30" rows="2"></textarea>
  • out1 is the id of a textarea element in the HTML file in which the GC content value is displayed.
<textarea name="ta1" id="out1" cols="5" rows="1"></textarea>
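
The calculation itself does not depend on the browser, so it can be checked in a regular Python session before wiring it to the page. The sketch below is a minimal stand-alone version; the function name gc_content and the handling of whitespace and empty input are assumptions added for illustration, not part of the code above.

```python
def gc_content(seq):
    """Return the GC content of a DNA sequence as a percentage (0-100)."""
    # Assumption: ignore whitespace such as line breaks pasted into the text box.
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        # Assumption: report 0.0 for empty input instead of dividing by zero.
        return 0.0
    # Count every base that is either G or C.
    gc_count = sum(1 for base in cleaned if base in "GC")
    return round(gc_count / len(cleaned) * 100, 2)


print(gc_content("ATGCGC"))  # 4 of 6 bases are G or C, prints 66.67
```

Keeping the calculation in a plain function like this makes it easy to test separately from the code that reads from and writes to the page elements.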

Now, all we need is a button that calls the calcGC function when clicked. Let’s add a submit button and set its py-click attribute to calcGC().

<button id="GC" type="submit" py-click="calcGC()">Calculate GC content</button>
Full HTML file
<html>
<head>
  <script defer src='https://pyscript.net/latest/pyscript.js'></script>
  <link rel='stylesheet' href='https://pyscript.net/latest/pyscript.css' />
  <py-config> packages = ['numpy'] </py-config>
  <py-script>
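    def calcGC(*args, **kwargs):
      import numpy as np
      # Read the sequence from the text box and drop surrounding whitespace
      s1 = Element('dna_seq1').element.value.strip()
      if len(s1) == 0:
        # Nothing entered: clear the output instead of dividing by zero
        Element('out1').element.value = ''
        return
      GC_count = 0
      for i in s1.upper():
        if (i=='C' or i=='G'):
          GC_count += 1
      GC_content = GC_count/len(s1)*100
      Element('out1').element.value = np.round(GC_content,2)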
  </py-script>
</head>

<body>
    Enter a DNA sequence
    <textarea type="text" id="dna_seq1" cols="30" rows="2"></textarea>
    <br>
    <button id="GC" type="submit" py-click="calcGC()">Calculate GC content</button>
    <textarea name="ta1" id="out1" cols="5" rows="1"></textarea>
</body>
</html>

Below is our web server that takes a DNA sequence as input and returns its GC content.

Enter a DNA sequence

Plotting the results

Let’s up the ante and add an option to visualize the results graphically. For this, we’ll take a protein sequence as input, calculate its amino acid composition, and finally make a graph to display the results. Amino acid composition refers to the percentage of each amino acid in the primary sequence of a protein.
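
As a quick illustration of this definition, the composition can be computed in plain Python outside the browser. This is only a sketch: the function name aa_composition, the whitespace handling, and the choice to return a dictionary are assumptions made for the example.

```python
from collections import Counter


def aa_composition(seq):
    """Return {amino acid: percentage} for a protein sequence."""
    # Assumption: ignore whitespace and treat lower- and upper-case letters alike.
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        # Assumption: an empty sequence has an empty composition.
        return {}
    counts = Counter(cleaned)
    total = len(cleaned)
    # Sort by one-letter code so the result is stable for display or plotting.
    return {aa: counts[aa] / total * 100 for aa in sorted(counts)}


print(aa_composition("MKKA"))  # prints {'A': 25.0, 'K': 50.0, 'M': 25.0}
```

Here lysine (K) accounts for 2 of the 4 residues, so its composition is 50%.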

Enter a protein sequence

Clicking the button below generates a bar plot with amino acids on the X-axis and percentage on the Y-axis. Note that the Matplotlib package is required for plotting the graph.

Full HTML file
<html>
<head>
<script defer src='https://pyscript.net/latest/pyscript.js'></script>
<link rel='stylesheet' href='https://pyscript.net/latest/pyscript.css' />
<py-config> packages = ['matplotlib'] </py-config>
<py-script>
from js import console

def AAC(*args, **kwargs):
    from collections import Counter
    s1 = Element('prot_seq1').element.value
    # Drop whitespace (e.g. line breaks) and ignore letter case
    s1 = ''.join(s1.split()).upper()
    # Count how many times each amino acid occurs
    counts = Counter(s1)
    arr1 = []
    for k1,v1 in counts.items():
        arr1.append(k1+' '+str(v1))
    console.log(arr1)
    # Sort alphabetically and show one 'residue count' pair per line
    arr1.sort()
    Element('aac_out').element.value = '\n'.join(arr1)
    return arr1

def plot_AAC(*args, **kwargs):
    import matplotlib.pyplot as plt
    arr_temp = AAC()
    fig, ax = plt.subplots(figsize=(5,3))
    aa = []
    aa_count = []
    for i in arr_temp:
        aa.append(i.split(' ')[0])
        aa_count.append(i.split(' ')[1])
    # Convert counts to percentages once all residues have been collected
    aa_count = [int(x) for x in aa_count]
    total = sum(aa_count)
    aa_count = [x/total*100 for x in aa_count]
    ax.bar(aa,aa_count)
    ax.set_xlabel('Amino acid')
    ax.set_ylabel('Composition (%)')
    plt.tight_layout()
    display(fig, target='graph_area', append=False)
</py-script>
</head>

<body>
    Enter a Protein sequence
    <textarea type="text" id="prot_seq1" cols="30" rows="2"></textarea>
    <br>
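    <button id="AAcomp" type="submit" py-click="AAC()">Calculate AA composition</button>
    <textarea name="aac" id="aac_out" cols="10" rows="4"></textarea>
    <br>
    <!-- Button and target div for plot_AAC; the id AAplot is illustrative -->
    <button id="AAplot" type="submit" py-click="plot_AAC()">Plot AA composition</button>
    <div id="graph_area"></div>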
</body>
</html>

Traditionally, once a bioinformatics researcher had written Python code for some analysis, they were expected to launch a web-server or build a GUI application to make sure it reached the larger community. That’s because biologists, in general, don’t have Python installed on their computers to run a Python script directly. As you can see, PyScript opens up many possibilities for sharing code with the bioinformatics community. There is no need to worry about setting up a web server, and no need to spend time designing GUIs for downloadable apps. Today, an HTML file featuring PyScript is sufficient to provide a user-friendly interface for Python code.
