Rigorous Reasoning

Bayesian Probability

Capstone: Bayesian Judgment on Real Evidence

An integrative lesson that asks students to apply Bayesian updating to mixed-evidence scenarios: identify priors, compute likelihoods under rival hypotheses, update to a posterior, and communicate the result in calibrated language.



Start Here

What this lesson is helping you do

This capstone asks you to apply Bayesian updating to mixed-evidence scenarios: identify priors, compute likelihoods under rival hypotheses, update to a posterior, and communicate the result in calibrated language. The practice depends on understanding Likelihood, False Positive Rate, and Calibration, and on correctly applying tools such as Respect Base Rates and Distinguish P(E | H) from P(H | E).

How to approach it

Read the explanation sections first, then use the activities to test whether you can apply the idea under pressure.

What the practice is building

You will put the explanation to work through evaluation practice and quiz activities, so the goal is not just to recognize the idea but to apply it on your own.

What success should let you do

Run the full Bayesian pipeline on at least three mixed scenarios, producing priors, likelihoods, a posterior expressed as a natural frequency, and a calibrated verdict.

Reading Path

Move through the lesson in this order

The page is designed to teach before it tests. Use this sequence to keep the reading, examples, and practice in the right relationship.

Read

Build the mental model

Move through the guided explanation first so the central distinction and purpose are clear before you evaluate your own work.

Study

Watch the move in context

Use the worked example to see how the reasoning behaves when someone else performs it carefully.

Do

Practice with a standard

Only then move into the activities, using the pause-and-check prompts as a final checkpoint before you submit.

Guided Explanation

Read this before you try the activity

These sections give the learner a usable mental model first, so the practice feels like application rather than guesswork.

Framing

Running the unit pipeline end-to-end

Earlier lessons taught the pieces in isolation: priors, base rates, conditional probability, and Bayesian comparison of rival hypotheses. The capstone asks you to run the full update cycle on real-looking evidence scenarios.

Real Bayesian reasoning rarely comes with numbers attached. You have to estimate priors, estimate likelihoods under each hypothesis, carry out the update, and express the posterior as a calibrated verdict. Each step requires discipline.

What to look for

  • Write out the prior before computing anything else.
  • Compute likelihoods for each rival, not just the favored one.
  • Express the posterior as a natural frequency when possible.
Bayesian judgment is a pipeline: prior, likelihood, posterior, communication.

Strategy

Choose the move that matches the case

Use a fixed pattern: (1) list at least two rival hypotheses, (2) estimate a prior for each, (3) estimate the likelihood of the observed evidence under each, (4) compute the posterior using natural frequencies or a Bayes factor, (5) write a calibrated plain-English verdict.

Natural frequencies are usually clearer than probabilities. Instead of saying 'P(H|E) = 0.72', say 'out of 100 cases like this, about 72 turn out to be H'. This phrasing makes your reasoning easier to check and easier to communicate.
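To make step (5) concrete, here is a minimal Python sketch; the function name is illustrative, and the 0.72 example value is taken from the paragraph above.

def natural_frequency_phrase(posterior, hypothesis="H", out_of=100):
    # Render a posterior probability in the natural-frequency phrasing
    # recommended above.
    return (f"Out of {out_of} cases like this, about "
            f"{round(posterior * out_of)} turn out to be {hypothesis}.")

print(natural_frequency_phrase(0.72))
# Out of 100 cases like this, about 72 turn out to be H.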

What to look for

  • Name at least two hypotheses.
  • Estimate priors explicitly.
  • Compute the posterior with natural frequencies.
Natural-frequency updates are easier to check and easier to communicate.

Error patterns

How integration failures actually look

The commonest failure is base-rate neglect: jumping from 'the test is positive' to 'the condition is present' without multiplying by the prior. The capstone trains you to write out the base rate before computing anything else.

The second commonest failure is likelihood confusion: using P(E|H) when you should use P(H|E), or treating 'the evidence is consistent with H' as 'the evidence confirms H'. Always ask what the likelihood of the evidence would be under each rival, not just under the hypothesis you favor.

What to look for

  • Do not skip the prior.
  • Do not confuse P(E|H) with P(H|E).
  • Do not treat consistency with H as confirmation of H.
Base-rate neglect and likelihood confusion are the dominant Bayesian failure modes; both are preventable by writing out the structure.
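The gap between the two conditional probabilities is easy to see numerically. The sketch below uses the rare-disease numbers from the worked example later in this lesson; the variable names are just for illustration.

prior = 0.001            # base rate: 1 in 1,000
p_e_given_h = 0.99       # sensitivity: P(positive | disease)
p_e_given_not_h = 0.05   # false positive rate: P(positive | no disease)

# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)
p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
p_h_given_e = p_e_given_h * prior / p_e

print(p_e_given_h)             # 0.99
print(round(p_h_given_e, 3))   # 0.019, nowhere near 0.99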

Before practice

What this lesson is testing

The cases below involve estimation, not exact numbers. You will have to make reasonable assumptions and carry out the update. Part of the exercise is making your assumptions explicit.

A case is only complete when you have produced priors, likelihoods, a posterior, and a calibrated verdict.

What to look for

  • State your assumptions explicitly.
  • Compute likelihoods for every rival.
  • Deliver a calibrated plain-English verdict.
The capstone measures whether you can apply the Bayesian pipeline to evidence that does not come with numbers attached.

Core Ideas

The main concepts to keep in view

Use these as anchors while you read the example and draft your response. If the concepts blur together, the practice usually blurs too.

Likelihood

The probability of the observed evidence on the assumption that a given hypothesis is true, written P(E | H).

Why it matters: Likelihood measures how well a hypothesis predicts the evidence — not how probable the hypothesis is.

False Positive Rate

The probability that a test or indicator yields a positive result when the hypothesis is false, written P(E | not-H).

Why it matters: A positive indicator is only meaningful when the false positive rate is small relative to the true positive rate.

Calibration

The property of having stated confidence match long-run accuracy — 70%-confident predictions should come true about 70% of the time.

Why it matters: Calibration is the Bayesian virtue that separates well-updated beliefs from overconfident or underconfident ones.
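Calibration is measurable in principle: group past predictions by their stated confidence and compare each group's hit rate to its label. The sketch below uses invented toy data purely to show the shape of such a check.

from collections import defaultdict

# Toy prediction log: (stated confidence, whether it came true). Invented data.
predictions = [
    (0.7, True), (0.7, False), (0.7, True),
    (0.9, True), (0.9, True), (0.9, False),
]

buckets = defaultdict(list)
for confidence, came_true in predictions:
    buckets[confidence].append(came_true)

for confidence, outcomes in sorted(buckets.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {confidence:.0%} -> actual {hit_rate:.0%} "
          f"over {len(outcomes)} predictions")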

Reference

Open these only when you need the extra structure

How the lesson is meant to unfold

  • Review
  • Guided Synthesis
  • Independent Synthesis
  • Reflection
Each step moves the lesson from explanation toward application.

Mastery Check

The final target tells you what successful understanding should enable you to do.

Reasoning tools and formal patterns

Rules and standards

These are the criteria the unit uses to judge whether your reasoning is actually sound.

Respect Base Rates

A probabilistic judgment should not ignore background prevalence or prior probability when the context makes it relevant.

Common failures

  • A striking test result is treated as if it overrides the base rate automatically.
  • Rare-event contexts are assessed as though all hypotheses started equally likely.

Distinguish P(E | H) from P(H | E)

A likelihood is not the same thing as a posterior probability. Swapping them is the 'prosecutor's fallacy'.

Common failures

  • The probability of evidence given a hypothesis is mistaken for the probability of the hypothesis given the evidence.
  • Diagnostic accuracy is confused with posterior certainty.

Update Proportionately to Evidence

Belief revision should reflect both prior plausibility and the relative explanatory weight of the evidence — not the vividness or novelty of the evidence.

Common failures

  • A small piece of evidence causes an excessive revision.
  • Strong contrary evidence produces almost no change in confidence.

Compare Evidence Under All Rival Hypotheses

The weight of evidence depends not only on how well it fits the favored hypothesis, but also on how well it fits the rivals.

Common failures

  • Asking only whether the evidence fits H and ignoring whether it fits not-H equally well.
  • Treating evidence as strong because it 'supports' H without checking whether it also supports rival hypotheses.

Patterns

Use these when you need to turn a messy passage into a cleaner logical structure before evaluating it.

Bayesian Update Schema

Input form

evidence_assessment_problem

Output form

prior_likelihood_posterior_analysis

Steps

  • State the hypothesis under evaluation.
  • Identify the relevant prior probability or base rate.
  • State how likely the evidence would be if the hypothesis were true (the likelihood).
  • State how likely the evidence would be if the hypothesis were false (the false positive rate).
  • Compute or estimate the posterior proportionately (see the sketch after this pattern).

Watch for

  • Skipping the prior entirely.
  • Using only one-sided likelihood information.
  • Treating the posterior as certainty rather than a revised degree of support.
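One compact way to carry out the schema's last step is the odds form of Bayes' rule: posterior odds equal prior odds times the likelihood ratio. The sketch below is one possible implementation, checked against the worked example later in this lesson; the function name is illustrative.

def odds_update(prior, p_e_given_h, p_e_given_not_h):
    prior_odds = prior / (1 - prior)               # step 2: the base rate as odds
    bayes_factor = p_e_given_h / p_e_given_not_h   # steps 3-4: likelihood over false positive rate
    posterior_odds = prior_odds * bayes_factor     # step 5: proportional update
    return posterior_odds / (1 + posterior_odds)   # convert back to a probability

print(round(odds_update(0.001, 0.99, 0.05), 3))    # ~0.019, matching the rare-disease example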

Qualitative Bayesian Comparison

Input form

competing_hypotheses_with_evidence

Output form

relative_support_judgment

Steps

  • State the competing hypotheses.
  • Compare their priors qualitatively (which was more plausible before the evidence?).
  • Compare how strongly each predicts the evidence.
  • Multiply qualitatively: a higher prior and a higher likelihood both push the posterior up.
  • State the posterior ranking with appropriate caution (a short numeric sketch follows this pattern).

Watch for

  • Assuming the hypothesis with the most vivid story automatically gets the higher posterior.
  • Ignoring that weaker priors can sometimes be overcome by much stronger evidence.
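Even a qualitative comparison can be spot-checked with rough numbers. In the sketch below the priors and likelihoods are invented ratings standing in for qualitative judgments; note how the weaker prior is overcome by the much stronger likelihood, the second point in the list above.

# Invented ratings for two rival hypotheses (illustration only).
candidates = {
    "H":     {"prior": 0.2, "p_evidence": 0.8},  # less plausible, predicts E well
    "not-H": {"prior": 0.8, "p_evidence": 0.1},  # more plausible, predicts E poorly
}

support = {name: c["prior"] * c["p_evidence"] for name, c in candidates.items()}
total = sum(support.values())
for name in sorted(support, key=support.get, reverse=True):
    print(f"{name}: posterior share ~ {support[name] / total:.0%}")
# H comes out around 67%: the stronger evidence outweighs the weaker prior.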

Worked Through

Examples that model the standard before you try it

Do not skim these. A worked example earns its place when you can point to the exact move it is modeling and the mistake it is trying to prevent.

Worked Example

Rare-Disease Update in Natural Frequencies

Natural frequencies make the base-rate effect visible; the same computation in probabilities would obscure it.

Scenario

A disease has a base rate of 1 in 1,000. A screening test has 99 percent sensitivity and a 5 percent false positive rate. A patient tests positive.

Posterior

About 10 in 510, or roughly 2 percent.

Calibrated Verdict

A positive result on this screening test, by itself, leaves the probability of actually having the disease around 2 percent. The right next step is usually a confirmatory test, not treatment.

Natural Frequencies

  • Imagine 10,000 people.
  • About 10 actually have the disease.
  • Of those 10, the test catches about 9.9 (call it 10).
  • Of the 9,990 without the disease, about 500 test positive (5 percent false positive rate).
  • Total positives: about 510. Actual cases among them: about 10.
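The same tally can be re-derived line by line in code; a short sketch, reproducing the counts above:

population = 10_000
sick = population / 1_000                  # base rate 1 in 1,000: about 10 people
true_pos = sick * 0.99                     # sensitivity 99%: about 9.9, call it 10
false_pos = (population - sick) * 0.05     # false positive rate 5%: about 500

total_pos = true_pos + false_pos           # about 510 positives in all
print(f"{true_pos:.1f} real cases among {total_pos:.0f} positives "
      f"= {true_pos / total_pos:.1%}")     # roughly 2 percent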

Pause and Check

Questions to use before you move into practice

Self-check questions

  • Did I write out the base rate before computing anything?
  • Did I compute likelihoods for every rival?
  • Is my verdict expressed in calibrated language?

Practice

Now apply the idea yourself

Move into practice only after you can name the standard you are using and the structure you are trying to preserve or evaluate.

Evaluation Practice

Inductive

Full-Cycle Bayesian Update

For each scenario, produce: (1) the rival hypotheses, (2) your estimated priors with a one-sentence justification, (3) your estimated likelihoods, (4) a posterior expressed as a natural frequency, and (5) a calibrated plain-English verdict.

Integrative cases

Work one case at a time. These cases are deliberately mixed; part of the exercise is deciding which moves from the unit each case requires.

Case A

A screening test for a rare disease (base rate 1 in 2,000) has a 95 percent sensitivity and a 5 percent false positive rate. A patient tests positive.

Classic rare-disease update. Natural frequencies will make the result intuitive.

Case B

A fraud-detection system flags a transaction as suspicious. The system has a 98 percent true positive rate and a 2 percent false positive rate. About 1 in 500 transactions are actually fraudulent.

Base rate matters enormously. Compute the posterior carefully.

Case C

A student's essay is flagged as AI-generated by a detector with a 90 percent true positive rate and a 10 percent false positive rate. The instructor estimates that 10 percent of essays in the class are AI-generated.

A non-rare base rate changes the update significantly. Compare with the rare-disease case.

Case D

An employee's wellness tracker reports an unusual heart-rate pattern overnight. The pattern appears in about 3 percent of nights for healthy adults and in about 25 percent of nights for adults with the condition being screened for. The background rate of the condition is about 1 percent in the relevant age group.

Use rival likelihoods and a modest prior to compute a posterior.

Use one of the cases above, identify the evidence base, and judge how strong the conclusion is once you account for the rival hypotheses.
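If you want to check your arithmetic on Cases A-D, a small helper like the one below will do. It is a sketch; the one call shown uses the worked example's numbers rather than giving away any case's answer.

def posterior(base_rate, true_positive_rate, false_positive_rate):
    # Probability the hypothesis is true given a positive indicator.
    hits = base_rate * true_positive_rate
    false_alarms = (1 - base_rate) * false_positive_rate
    return hits / (hits + false_alarms)

print(round(posterior(1 / 1_000, 0.99, 0.05), 3))   # ~0.019: the rare-disease example
# For Case A you would call posterior(1 / 2_000, 0.95, 0.05), and so on.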


Quiz

Inductive

Capstone Check Questions

Answer each short check question in one or two sentences. These questions test whether you can articulate the reasoning you just performed in your own words.

Check questions

Answer each question from memory in your own words. No answer should need more than two sentences.

Question 1

Why does a positive test for a rare condition often leave the posterior well below 50 percent?

Base-rate arithmetic: small prior times large likelihood still gives a small posterior relative to the false-positive mass.

Question 2

What is the difference between P(E|H) and P(H|E), and why does confusing them produce wrong updates?

P(E|H) measures how well the hypothesis predicts the evidence; P(H|E) is the probability of the hypothesis given the evidence. They are related by Bayes' rule, and swapping them is the prosecutor's fallacy.

Question 3

Why are natural frequencies easier to reason with than probabilities?

They keep the denominator visible.

Question 4

Why should a Bayesian verdict always be expressed in calibrated language rather than as a yes/no claim?

A posterior is a degree of belief, not a categorical verdict.



Further Support

Open these only if you need extra help or context

Mistakes to avoid before submitting

  • Letting the test's sensitivity alone decide the verdict.
  • Converting a natural-frequency answer back into a probability without keeping the denominator visible.

Where students usually go wrong

Skipping the base rate entirely.

Treating the test result as a probability of the disease.

Confusing P(E|H) with P(H|E).

Delivering a categorical verdict where a calibrated one is needed.

Historical context for this way of reasoning

Amos Tversky and Daniel Kahneman

Tversky and Kahneman documented base-rate neglect as a robust human tendency. The capstone trains the discipline that counters it: writing out the structure before computing anything.