STREAM-PRS: a multi-tool pipeline for streamlining polygenic risk score computation
- Rafal Tekreeti

By: Sara Becelaere, Yasmina Abakkouy, Deborah Sarah Jans, Margaux David, Séverine Vermeire & Isabelle Cleynen

Polygenic risk scores (PRS) are used to estimate a person’s genetic risk for complex traits or diseases such as human height, type 2 diabetes, or inflammatory bowel disease. They combine the small effects of thousands of genetic variants into a single score. However, calculating PRS is not straightforward. Many software tools exist, each using a different statistical approach, and no single tool works best for every trait or disease. This makes it challenging and time-consuming for researchers to identify the most accurate PRS for their specific application.
To tackle this, we developed STREAM-PRS, a flexible and automated pipeline that streamlines the entire process. It compares five widely used PRS tools (PRSice-2, PRS-CS, LDpred2, lassosum and lassosum2) across four steps (Figure 1): (i) quality control of the genome-wide association study (GWAS) data, which serve as input for selecting and weighting the genetic variants included in the score; (ii) score calculation, tuning the parameters of each PRS tool on training data and applying the tuned parameters to test data; (iii) correction for population structure and standardization; and (iv) selection of the best-performing score based on predictive accuracy.
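To make the later steps more concrete, here is a minimal Python sketch of what score calculation, correction for population structure, standardization, and selection can look like. It is not STREAM-PRS code. The function names, the toy data, and the choice to correct for population structure by regressing on a single ancestry principal component are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def polygenic_score(allele_counts, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2) for one person."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is needed per variant")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def residualize(scores, ancestry_pc):
    """Remove the linear effect of one ancestry principal component (assumed method)."""
    pc_mean, score_mean = mean(ancestry_pc), mean(scores)
    sxx = sum((pc - pc_mean) ** 2 for pc in ancestry_pc)
    sxy = sum((pc - pc_mean) * (s - score_mean) for pc, s in zip(ancestry_pc, scores))
    slope = sxy / sxx
    return [s - score_mean - slope * (pc - pc_mean) for pc, s in zip(ancestry_pc, scores)]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1."""
    center, spread = mean(values), pstdev(values)
    return [(v - center) / spread for v in values]


def pick_best(accuracy_by_score):
    """Return the label of the score with the highest accuracy metric."""
    return max(accuracy_by_score, key=accuracy_by_score.get)


# Toy data, invented for illustration only.
weights = [0.5, -0.25, 0.125]                      # per-variant effect sizes
genotypes = [[2, 1, 0], [0, 1, 2], [1, 0, 1], [2, 2, 2]]
ancestry_pc = [-1.0, 0.0, 1.0, 2.0]                # one PC value per person

raw = [polygenic_score(g, weights) for g in genotypes]
adjusted = standardize(residualize(raw, ancestry_pc))

# Hypothetical labels and accuracy values for two candidate scores.
best = pick_best({"tool_a_setting_1": 0.70, "tool_b_setting_3": 0.75})
```

In practice each tool derives its own variant weights from the GWAS summary statistics, and the correction usually involves several principal components. The sketch only shows how the pieces fit together.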

We demonstrated the STREAM-PRS pipeline on an inflammatory bowel disease dataset. Inflammatory bowel disease is a chronic inflammatory disease of the digestive tract, most commonly in the form of Crohn’s disease or ulcerative colitis. We generated a total of 472 scores and found that lassosum worked best, explaining about 20% of disease risk and achieving an AUC of 0.75, i.e. a randomly chosen patient receives a higher score than a randomly chosen control about 75% of the time.
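For readers unfamiliar with the AUC, it can be computed as the fraction of case-control pairs in which the case has the higher score, with ties counting as half. The short Python sketch below illustrates this on four invented scores. The function name and the data are hypothetical and do not come from the study.

```python
def auc(case_scores, control_scores):
    """Fraction of case-control pairs where the case scores higher (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores, invented for illustration only.
cases = [0.9, 0.4]
controls = [0.5, 0.1]
print(auc(cases, controls))  # 3 of 4 pairs are ranked correctly, so this prints 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases and controls.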
By automating and standardizing the calculation of polygenic risk scores, STREAM-PRS saves researchers valuable time and ensures more consistent, reproducible results. It allows side-by-side comparison of methods, helping scientists choose the best tool and parameter settings for their data. Beyond its immediate applications, STREAM-PRS serves as a flexible and comprehensive framework that can be adapted to diverse phenotypes, populations, and research objectives, paving the way for broader adoption of PRS in precision medicine and translational research.
How did VSC contribute to the research?
"The VSC’s high-performance computing environment was essential for developing STREAM-PRS. It enabled us to run memory-intensive Bayesian models, process large-scale genomic datasets like the UK Biobank (N = 357,622), and parallelize hundreds of polygenic risk score computations; making the pipeline scalable, efficient, and reproducible."
Read the full publication in SpringerNature here
STREAM-PRS is available on GitHub here




