genes are so important for triggering the immune system that we can use these genes to predict an individual’s immune response. Here I’ll show how to estimate disease rates just from immune gene frequencies. All of the steps, from getting the immune gene data, to identifying high-risk countries, to assessing the limitations of the model, are discussed, and the full code is available at github.com/DAWells/HLA_spondylitis_rate.
HLA genes are associated with a person’s response to infection and vaccination, and are often very strongly linked to autoimmune diseases. So strongly linked, in fact, that in large groups we can predict disease rates from HLA gene frequencies. HLA frequencies are widely studied and so often available, allowing us to estimate rates of autoimmune conditions that are missing or inaccurate due to the challenges of diagnosis. In this post we’ll combine studies to generate accurate estimates of immune gene frequencies and use these to predict national rates of ankylosing spondylitis.
allelefrequencies.net is a database of human immune gene frequency data from around the world; it is an open access, free and public resource (Gonzalez-Galarza et al 2020). However, it can be difficult to download and combine data from multiple projects, which makes it hard to take advantage of all this data. Fortunately, HLAfreq is a Python package that makes it easy to get the latest data from allelefrequencies.net and prepare it for analysis. (Full disclosure: I’m one of the authors of HLAfreq!)
Ankylosing spondylitis is a type of arthritis, and 90% of patients have a specific version of the HLA-B gene. To get the frequency of this version in different countries, I downloaded all available frequency data for this gene and combined studies of the same country, weighting by sample size. In brief, the combination is based on the Dirichlet distribution, and we can use a Bayesian approach to estimate uncertainty too. Singapore is used as an example in the figure below (all figures in this article were generated by the author). Different versions of the HLA-B gene (known as alleles) are shown on the y axis, with their frequency in Singapore on the x axis. Data from the original Singapore studies are shown in colour, and combined estimates in black. I focused on the weighted average in this analysis, which is shown by the black circles. HLAfreq also calculates a Bayesian estimate with uncertainty, which is indicated by the black bars.

The code used to download, combine, and plot the HLA-B allele frequency data for Singapore is below.
import pandas as pd
import HLAfreq
from HLAfreq import HLAfreq_pymc as HLAhdi
# Download raw data
base_url = HLAfreq.makeURL("Singapore", standard="g", locus="B")
aftab = HLAfreq.getAFdata(base_url)
# Prepare data
aftab = HLAfreq.only_complete(aftab)
aftab = HLAfreq.decrease_resolution(aftab, 1)
# Combine data from multiple studies
caf = HLAfreq.combineAF(aftab)
# Bayesian estimate with a 95% credible interval for each allele
hdi = HLAhdi.AFhdi(aftab, credible_interval=0.95)
caf = pd.merge(caf, hdi, how="left", on="allele")
# Plot gene frequencies
HLAfreq.plotAF(caf, aftab.sort_values("allele_freq"), hdi=hdi, compound_mean=hdi)
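To make the idea of weighting by sample size concrete, here is a minimal sketch in plain Python. It is not HLAfreq's implementation, which may differ in detail. The records and the function name `weighted_allele_freq` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records of (allele, frequency, sample size).
# These numbers are invented and do not come from any real study.
records = [
    ("B*27", 0.04, 100),
    ("B*40", 0.20, 100),
    ("B*27", 0.06, 300),
    ("B*40", 0.10, 300),
]

def weighted_allele_freq(records):
    """Average each allele's frequency across studies, weighted by sample size."""
    weighted_sum = defaultdict(float)
    total_n = defaultdict(int)
    for allele, freq, n in records:
        weighted_sum[allele] += freq * n
        total_n[allele] += n
    return {allele: weighted_sum[allele] / total_n[allele] for allele in weighted_sum}

combined = weighted_allele_freq(records)
print(combined)
```

The larger study pulls the combined estimate towards its own value, which is the behaviour we want when studies differ a lot in size.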
Now that we have the national allele frequencies, we can pair them with national disease rates to study the correlation. I used the disease rates reported in Dean et al 2014. I log transformed the disease rate to make it normally distributed so I could fit an ordinary least squares linear regression. As expected, there was a significant positive correlation: countries with higher frequencies of HLA-B*27 had higher rates of ankylosing spondylitis. The exception was Finland, which had an unusually high frequency of HLA-B*27 but a middling rate of disease. I removed Finland from the model as an outlier, a decision which was supported by its “statistical leverage”. (High leverage means this one point had too large an influence on the overall model; we want the model to tell us about countries in general, not any one country in particular.)
We can use our linear regression model to predict rates of ankylosing spondylitis in countries where we know the HLA-B*27 frequency. This tells us that countries like Austria and Croatia have high predicted ankylosing spondylitis rates. Using these predictions increases the number of countries with disease rate estimates from 16 to 52 and can help identify countries that could benefit from more surveillance. On the world map below, countries with low known or predicted rates of ankylosing spondylitis are plotted in blue and high rates in yellow. Countries with known rates are outlined in black and those with predicted rates are outlined in cyan or orange. Cyan is used for countries within the range of our model and orange for countries outside our model’s range; see below for why this is important.
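The regression and leverage check can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the analysis code from the repository. The data points are invented (they are not the values from Dean et al 2014), and the 2p/n cut-off is a common rule of thumb rather than necessarily the criterion used here.

```python
import math

# Invented (allele frequency, disease rate) pairs for illustration only.
freq = [0.02, 0.03, 0.04, 0.05, 0.06, 0.15]
rate = [5.0, 8.0, 11.0, 16.0, 22.0, 20.0]

def fit_log_linear(x, y):
    """OLS fit of log(y) on x; returns intercept, slope and per-point leverage."""
    log_y = [math.log(v) for v in y]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(log_y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, log_y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Leverage in simple linear regression depends only on the x values.
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return intercept, slope, leverage

intercept, slope, leverage = fit_log_linear(freq, rate)

# Rule of thumb (an assumption): flag points with leverage above 2p/n, p = 2 parameters.
threshold = 2 * 2 / len(freq)
flagged = [i for i, h in enumerate(leverage) if h > threshold]
print(slope, flagged)
```

In this toy data the last point sits far from the other frequencies, so it dominates the fit in the same way a single unusual country can.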

We should be cautious about predicting disease rates for countries with HLA-B*27 frequencies outside the range of our model. Of the 36 countries we have predicted disease rates for, 10 have HLA-B*27 frequencies higher or lower than any country used in our model. Therefore, we can’t be sure the model will give accurate predictions for these countries. In particular, predictions may be unreliable for countries with high HLA-B*27 frequencies; we already know that Finland didn’t fit our model. This could be because of a non-linear trend, but we do not have enough data to explore these high frequencies.
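Checking whether a prediction is an extrapolation is a simple range test. The sketch below assumes we have the list of frequencies used to fit the model. The country names and values are placeholders, not real data.

```python
def in_model_range(freq, training_freqs):
    """True if freq lies within the range of frequencies used to fit the model."""
    return min(training_freqs) <= freq <= max(training_freqs)

# Placeholder values for illustration only.
training_freqs = [0.02, 0.03, 0.04, 0.05, 0.06]
new_countries = {"Country A": 0.035, "Country B": 0.01, "Country C": 0.09}

status = {
    name: "in range" if in_model_range(f, training_freqs) else "extrapolated"
    for name, f in new_countries.items()
}
print(status)
```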

The countries with known disease rates are plotted with filled points. Finland, which was omitted from the model, is plotted in red. The predicted disease rates are plotted as open circles, cyan for countries within the model’s range and orange outside of it. The confidence intervals of the model are shown as dashed lines, and the prediction intervals are shown as a grey ribbon. A quick reminder about the difference: we expect the true relationship to fall within the confidence intervals 95% of the time, and we expect 95% of data points to fall within the prediction intervals.
It’s worth taking a moment to remind ourselves that despite this correlation, there are many other factors influencing disease rates. Clearly an individual’s chance of developing ankylosing spondylitis is also affected by their environment and other genetic factors. So if we wanted really accurate disease rate predictions we would need to consider these other variables. But given how easy it is to get HLA frequency data, it’s a pretty impressive predictor for a disease that can take years to diagnose.
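For readers who want the formulas, these are the standard textbook intervals for simple linear regression, given here as a reference sketch under the usual OLS assumptions. The symbols are introduced here and do not come from the analysis code: $x_0$ is the HLA-B*27 frequency of interest, $\hat{y}_0$ the fitted log disease rate at $x_0$, $n$ the number of countries in the model, and $s$ the residual standard error.

```latex
% Fitted value on the log scale: \hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0
\begin{align}
s^2 &= \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
S_{xx} = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \\
\text{Confidence interval:}\quad & \hat{y}_0 \pm t_{n-2,\,0.975}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\
\text{Prediction interval:}\quad & \hat{y}_0 \pm t_{n-2,\,0.975}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{align}
```

The extra 1 under the square root is why the prediction interval is always the wider of the two. Both intervals grow as $x_0$ moves away from $\bar{x}$, which is another reason to be careful with countries outside the model’s range.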
Conclusion
HLA genes have a strong influence on human health through infection, vaccination, autoimmune diseases, and organ transplants. Because of these strong relationships, we can use widely available HLA frequency data to study these health traits indirectly. Resources like allelefrequencies.net and HLAfreq make it easier to study these relationships, either by looking at these correlations directly or by using allele frequencies as a proxy when other data is missing. I hope this post has got you thinking about questions to ask using HLA frequency data.
References
Gonzalez-Galarza, F. F., McCabe, A., Santos, E. J. M. D., Jones, J., Takeshita, L., Ortega-Rivera, N. D., … & Jones, A. R. (2020). Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Research, 48(D1), D783-D788.
Dean, L. E., Jones, G. T., MacDonald, A. G., Downham, C., Sturrock, R. D., & Macfarlane, G. J. (2014). Global prevalence of ankylosing spondylitis. Rheumatology, 53(4), 650-657.
Wells, D. A., & McAuley, M. (2023). HLAfreq: Download and combine HLA allele frequency data. bioRxiv, 2023-09. https://doi.org/10.1101/2023.09.15.557761