Will I Ever Understand What p-value Really Means?

The p-value and hypothesis testing are concepts that almost everyone struggles with at first, largely because schools and statistics classes tend to teach how to calculate a p-value rather than what it means. When people are first introduced to hypothesis testing and p-values, they often fail to see why they are doing what they’re doing, and resort to memorising formulas, searching online for p-value calculators, or cramming the golden rule:

p < 0.05 implies it is significant
p > 0.05 implies it is not significant.

But today we offer an intuitive way of understanding what a p-value means, how we calculate it, and why we even need it, to clear part of the mental cloud that surrounds it. Let us begin with a simple example.

Imagine you’re a teacher in a school and you have a class of 30 students. After you give them a test, most of them fail miserably, and the principal comes in and tells you to do something, or you’re fired!

With tension looming and sleep eluding you, you somehow devise a new method of teaching your students and try it out on them for the next month, after which the students take the test again.

Fortunately, the students nail the exam this time and everyone is happy. You still have your job, but in the middle of the night you start overthinking:

Did the students really perform well because of my new teaching method, or did this happen by chance and have nothing to do with my efforts? It is quite possible that it happened by chance, especially with a class of only 30 students.

After thinking about it for a long time, you decide to get your hands dirty and employ some statistics.

1. First, you decide that comparing the average scores under the two conditions (old teaching method and new teaching method) is the way to go. The claim you hope your observation supports, that the new method raised scores, is the alternative hypothesis. In terms of formulas:

Mean score (new) > Mean score (old)

2. Next, you ask yourself what would happen if the scores in the two cases followed the same distribution (a fancy term for a smoothed-out histogram), so that the new method made no difference. This is called your null hypothesis (another fancy term). Again in terms of formulas:

Mean score (new) = Mean score (old)

3. Now you devise a plan and say to yourself:

“I will assume the null hypothesis to be true” (my hypothesis, my rules)
“Then I will find the probability of seeing a difference at least as large as the one that really occurred” (using more maths). This probability is the p-value
If that probability is really low, the effect (the thing which has happened) is unlikely to be due to chance alone. In the usual terminology, this is called a significant effect
But if that probability is not low, you cannot rule out that the effect occurred by chance
The threshold you compare your probability to is called alpha (or the significance level), and it is usually set at 0.05
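
The plan above can be acted out directly in code. The sketch below is only an illustration: the scores are invented, the function name `permutation_p_value` is hypothetical, and it uses a permutation (label-shuffling) test rather than the t-test discussed later in this post. Shuffling the labels is one simple way to "assume the null hypothesis to be true": if the teaching method made no difference, it should not matter which scores are labelled old and which new.

```python
import random
from statistics import mean


def permutation_p_value(old, new, n_shuffles=10_000, seed=0):
    """Estimate a one-sided p-value by shuffling the old/new labels.

    Returns the share of shuffles whose mean difference (new - old)
    is at least as large as the difference actually observed.
    """
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = mean(new) - mean(old)   # what really happened
    pooled = list(old) + list(new)     # under the null, labels are arbitrary
    n_old = len(old)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = mean(pooled[n_old:]) - mean(pooled[:n_old])
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical scores for ten students, invented for illustration only
old_scores = [35, 42, 38, 51, 45, 40, 37, 48, 44, 39]
new_scores = [62, 58, 71, 66, 55, 69, 60, 73, 64, 57]

alpha = 0.05
p = permutation_p_value(old_scores, new_scores)
print("estimated p-value:", p)
print("significant" if p < alpha else "not significant")
```

Because the p-value here is estimated from a finite number of shuffles, a very small true probability can show up as exactly 0; that should be read as "smaller than about 1 in 10,000", not as impossible.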


Now a glaring question comes into view: “How do you get to this p-value from your data?”

By using something called a hypothesis test. There are probably hundreds of hypothesis tests out there, and the question comes down to which one is best suited to your data. Choosing a test is a detailed discussion in itself, and we’ll save it for another post (let us know in the comments if that is something you want).

Test scores usually follow a normal distribution (this has been observed in populations), and when independent samples are drawn from two normal distributions, the difference in their means, divided by its estimated standard error, follows a particular type of distribution called the t-distribution.

Without going into the details, two independent samples from two normal distributions can be compared using a t-test, and that is what we will use. (Strictly speaking, because the same students sat both tests, a paired t-test would be the more natural choice; we treat the two sets of scores as independent to keep the example simple.) This part involves a lot of calculations. Just remember that to get from data to p-values, we need some distributions.

In short, we assume that the standardised difference in the sample mean scores (of the two conditions) follows the t-distribution under the null hypothesis, and apply the t-test to get the t-statistic, which can be translated to a probability: our p-value.
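
To make "a t-statistic that can be translated to a probability" concrete, here is a minimal sketch using only the Python standard library. It assumes Welch's version of the two-sample t-test (which does not require equal variances), invented scores, and hypothetical helper names (`welch_t`, `t_pdf`, `upper_tail`). The tail probability is obtained by numerically integrating the t-distribution's density, which is a simplification for illustration rather than how statistical libraries do it.

```python
import math
from statistics import mean, variance


def welch_t(old, new):
    """Welch's t-statistic and degrees of freedom for mean(new) - mean(old)."""
    v_old = variance(old) / len(old)   # squared standard error of each mean
    v_new = variance(new) / len(new)
    t = (mean(new) - mean(old)) / math.sqrt(v_old + v_new)
    df = (v_old + v_new) ** 2 / (
        v_old ** 2 / (len(old) - 1) + v_new ** 2 / (len(new) - 1)
    )
    return t, df


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, steps=20_000):
    """P(T >= t): 0.5 minus the Simpson's-rule area of the density on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Hypothetical scores for ten students, invented for illustration only
old_scores = [35, 42, 38, 51, 45, 40, 37, 48, 44, 39]
new_scores = [62, 58, 71, 66, 55, 69, 60, 73, 64, 57]

t_stat, df = welch_t(old_scores, new_scores)
p_value = upper_tail(t_stat, df)       # one-sided: is the new mean larger?
print("t =", t_stat, "df =", df, "p =", p_value)
```

In practice you would not integrate the density by hand. With SciPy installed, `scipy.stats.ttest_ind(new_scores, old_scores, equal_var=False, alternative="greater")` computes the same kind of one-sided Welch test.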

You do all this and see that the p-value is indeed less than 0.05, and say to yourself, “If my new method made no difference, an improvement this large would be very unlikely to arise by mere chance, so the result is significant at the 0.05 significance level.” Feeling proud of your achievement, you let sleep come to you.
