Fish Welfare Initiative · ARA Eluru

Daily Dissolved-Oxygen Risk Alert

A simple weather-based forecast that flags days when low DO is more likely across Eluru ponds.

Location: Eluru centroid (16.64°N, 81.12°E)

Historical lookup — verify the model on past days

Pick any date from 2024-01-01 onwards. The page will compute what the model would have said for that day and, if the date is in our records, compare it to the actual visits and out-of-range (OOR) results. This is useful for spot-checking whether the model's "High risk" calls actually corresponded to bad days.

Date

Weather data source

Day-ahead forecast | Reanalysis (hindsight)

What's the difference between these two sources?

Day-ahead forecast = what Open-Meteo's forecast actually said for that day, the day before. This matches what staff would have seen in real deployment. Use this to evaluate "would the model have helped us in real life?"

Reanalysis = the best retrospective estimate of what actually happened that day, computed later using all available data. Skye trained the model on this. Use this to see "what would the model say with perfect-information weather?" — i.e., the upper bound on accuracy. Real deployment will be slightly worse because forecasts are noisier than reanalysis.

Comparing the two for the same date shows how much the model degrades from "ideal information" to "what staff would actually have seen."

Practical caveat: For dates more than a few days in the past, Open-Meteo's day-ahead forecast and reanalysis converge to nearly identical values — Open-Meteo recalibrates historical forecasts against actual observations over time. So in practice, picking either source for older dates will usually give the same answer. The difference may only be visible for very recent dates (last 24–48 hours). The good news: this means live deployment performance shouldn't degrade meaningfully from what we measured in testing. The honest caveat: it also means we can't fully separate "what the forecast said" from "what actually happened" using public data — so the historical lookup is a partial validation of the model, not a fully independent one.

Proposed trial design

Before this model changes how field staff visit ponds, we want to shadow-test it for 12 weeks with no operational changes — just daily logging. This page explains how the trial works, why 12 weeks (rather than 4–6), what we'll decide at the end, and when we'd stop early.

Why a shadow trial

The model has been validated on historical data, but historical validation has limits. Performance can drift season-to-season; the data we have so far is mostly dry-season; and real deployment uses day-ahead forecasts, not reanalysis. Before changing field operations, we want to verify the signal still shows up on fresh, real-world conditions.

A shadow trial does exactly that with one important property: nothing about field operations changes during the trial. Staff visit ponds on their normal schedule, in normal order, at normal cadence. They just record the model's alert tier alongside the day's results. At the end, we compare OOR rates on Alert vs Normal days. If the signal holds, we move to a small operational pilot. If it doesn't, we stop — and we've spared Programs the cost of deploying a noisy tool.

The protocol — what staff do each day

Each morning, open this page. Note the alert tier: Normal, or Alert (Elevated and High both count as Alert).
Visit ponds on the normal schedule. No changes to routing, visit count, or order.
At end of day, log: date, alert tier, # visits made, # visits that found OOR.
Send the weekly log to Haven by Friday.

That's it. The model produces a daily call; the trial collects ground truth alongside it.

How long, and why 12 weeks

The earlier "4–6 week" suggestion in our planning was too short for the actual signal we're testing. Here's the math:

YTD validation showed Alert OOR ≈ 11.6%, Normal OOR ≈ 6.7% — a 1.7× lift.
Detecting that 1.7× difference at conventional 80% statistical power requires roughly 270 visits per group.
At ~6.5 visits/day with ~50% of days as Alert, that's ~3.25 visits/day per group, or about 12 weeks to accumulate the needed sample.

Concretely:

Trial length	Total visits	~Visits per group	Min lift detectable at 80% power
4 weeks	~180	~90	~2.5× (would only catch a far stronger signal than expected)
8 weeks	~360	~180	~2.1×
12 weeks	~540	~270	~1.7× — matches the YTD signal
16 weeks	~720	~360	~1.5×
20 weeks	~900	~450	~1.4×

At 4 weeks, we couldn't statistically distinguish the model from noise unless the true lift were dramatically bigger than YTD suggested. 12 weeks is the natural minimum for the signal we actually have.
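The visit-accumulation arithmetic behind the table can be sketched as follows. This is a minimal illustration, not the analysis the team ran: the function names are hypothetical, and the only inputs are the figures quoted above (~6.5 visits/day, ~50% of days as Alert, ~270 visits needed per group). It does not recompute the power analysis itself.

```python
import math

VISITS_PER_DAY = 6.5  # from the text: ~6.5 visits/day
ALERT_SHARE = 0.5     # from the text: ~50% of days assumed to be Alert


def visits_per_group(weeks, visits_per_day=VISITS_PER_DAY, alert_share=ALERT_SHARE):
    """Expected visits in the smaller of the two groups after `weeks` of logging."""
    total_visits = weeks * 7 * visits_per_day
    return total_visits * min(alert_share, 1 - alert_share)


def weeks_needed(target_per_group, visits_per_day=VISITS_PER_DAY, alert_share=ALERT_SHARE):
    """Whole weeks of logging needed for the smaller group to reach the target."""
    per_day_in_group = visits_per_day * min(alert_share, 1 - alert_share)
    return math.ceil(target_per_group / per_day_in_group / 7)


# Reproduce the "Total visits" and "Visits per group" columns (before rounding).
for weeks in (4, 8, 12, 16, 20):
    total = weeks * 7 * VISITS_PER_DAY
    print(weeks, round(total), round(visits_per_group(weeks)))

print(weeks_needed(270))
```

With these inputs, 12 weeks gives 546 expected visits (273 per group), which the table rounds to ~540 and ~270. Passing a different `alert_share` shows how sensitive the timeline is to that assumption.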

Why 12 weeks is genuinely the right length, not just an annoyingly long one

The trial costs almost nothing while it runs. Staff log a tier and a visit count alongside their normal day's records. No behavioural change, no operational risk. The only "cost" is delayed deployment if the signal is real.
12 weeks spans the dry-to-monsoon transition. Today's date is early May; a 12-week trial runs through late July. That's exactly the season we have no validation for, and where the model's weather-driven signal is most likely to behave differently. Useful information regardless of the stat test.
The YTD signal might not generalize. If the lift on monsoon days is dramatically different (better or worse) from the dry-season pattern, that's a critical finding for whether to deploy at all — not something we want to discover after rolling out.

Interim checkpoints

Week 4 — directional sanity check (not a go/no-go). Just look at OOR rates so far. If Alert is at least directionally higher than Normal, continue. If they're tied or flipped, investigate before continuing — something may be off in this season.

Week 8 — interim review. Confidence intervals will still be wide, but you'll have a real read. If the gap is dramatically larger than expected (≥2.5× lift) and the lower bound of its 90% CI is above 1.0, the trial may have already crossed the bar — consider proceeding to pilot early. If the gap roughly tracks 1.7×, keep going. If it's washed out (<1.2× or flipped), seriously consider stopping early.

Week 12 — decision point. See criterion below.

Pre-registered decision criterion (set the bar now, before results come in)

Proceed to operational pilot if, at end of week 12:

Alert OOR rate ≥ 1.5× Normal OOR rate, AND
The lower bound of the 90% confidence interval on the lift is above 1.0× (i.e., the Alert OOR rate is statistically distinguishable from the Normal OOR rate).

Otherwise: stop. No pilot. Document the result as evidence the approach doesn't generalize, and look at other modelling directions (e.g. Sara's per-pond approach) for the next iteration.

Pre-registering the criterion now matters because it removes the temptation to retroactively choose a threshold that makes the result look better than it is.
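One way the criterion could be checked at week 12 is sketched below. The page does not say which confidence-interval method will be used; this sketch assumes the common log-scale (Katz) interval for a ratio of two proportions. The function names and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def lift_with_ci(alert_oor, alert_visits, normal_oor, normal_visits, confidence=0.90):
    """Return (lift, lower, upper) for Alert OOR rate / Normal OOR rate.

    Assumption: log-scale (Katz) interval; the source does not name a method.
    """
    if min(alert_oor, normal_oor) <= 0:
        raise ValueError("the log-scale interval needs at least one OOR in each group")
    lift = (alert_oor / alert_visits) / (normal_oor / normal_visits)
    se_log = math.sqrt(
        1 / alert_oor - 1 / alert_visits + 1 / normal_oor - 1 / normal_visits
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided interval
    return lift, lift * math.exp(-z * se_log), lift * math.exp(z * se_log)


def proceed_to_pilot(alert_oor, alert_visits, normal_oor, normal_visits):
    """Pre-registered rule: lift >= 1.5 AND 90% CI lower bound above 1.0."""
    lift, lower, _ = lift_with_ci(alert_oor, alert_visits, normal_oor, normal_visits)
    return lift >= 1.5 and lower > 1.0


# Hypothetical end-of-trial counts, for illustration only.
lift, lower, upper = lift_with_ci(31, 270, 18, 270)
print(f"lift={lift:.2f}, 90% CI=({lower:.2f}, {upper:.2f})")
print(proceed_to_pilot(31, 270, 18, 270))
```

Both conditions are evaluated together, so a large point estimate with a wide interval does not pass on its own.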

When to stop early

Two scenarios where stopping before the full 12 weeks makes sense:

Stop early because it's clearly working: at week 8, if the lift is dramatically larger than expected (≥2.5×, the Week 8 interim-review bar) and the lower bound of the 90% CI is above 1.0, we have statistically significant evidence and can move to pilot 4 weeks ahead of schedule.
Stop early because it's clearly not working — at week 4 or 8, if Alert OOR rate is below Normal rate (model is anti-predictive), or if alerts are firing every day (uninformative), stop and rebuild before continuing to log.

What we'll learn either way

Even if the trial fails, the data is valuable:

Confirms or refutes monsoon-season generalization. The trial spans May–July, exactly the season we have no validation for. We'd have a real answer either way.
Calibrates real forecast quality vs reanalysis. Live deployment uses day-ahead forecasts; the trial measures how the model performs on those.
Surfaces operational pain points. Whether staff find the morning check usable. Whether the alert frequency feels right. Whether observers manage the daily log without friction.

So 12 weeks of logging, regardless of outcome, gives a real-world evidence base for whatever we do next.

Daily logging template

Use this format (copy into a spreadsheet or print and fill by hand):

Date	Tier (Normal/Alert)	Visits	OOR found	Notes
Fill one row per day. Send weekly log to Haven by Friday EOD.

If staff want to record more detail (observer name, village, weather observed, etc.), that's welcome but not required. The first four columns above (Date, Tier, Visits, OOR found) are the minimum; Notes is optional.
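As a rough sketch of how logs in the format above could be turned into per-tier OOR rates, the snippet below pools visits and OOR counts by tier. It assumes the log is tab-separated with exactly the column headers shown in the template. The function name, dates and counts are hypothetical.

```python
import csv
import io
from collections import defaultdict


def oor_rates_by_tier(log_text):
    """Pool visits and OOR counts per tier, then return OOR rate per tier."""
    totals = defaultdict(lambda: [0, 0])  # tier -> [visits, oor_found]
    reader = csv.DictReader(io.StringIO(log_text), delimiter="\t")
    for row in reader:
        tier = row["Tier (Normal/Alert)"].strip()
        totals[tier][0] += int(row["Visits"])
        totals[tier][1] += int(row["OOR found"])
    return {
        tier: (oor / visits if visits else None)
        for tier, (visits, oor) in totals.items()
    }


# Hypothetical log rows, for illustration only.
SAMPLE_ROWS = [
    ["Date", "Tier (Normal/Alert)", "Visits", "OOR found", "Notes"],
    ["2026-05-11", "Normal", "7", "0", ""],
    ["2026-05-12", "Alert", "6", "1", "windless, overcast"],
    ["2026-05-13", "Alert", "6", "1", ""],
    ["2026-05-14", "Normal", "7", "1", ""],
]
sample_log = "\n".join("\t".join(row) for row in SAMPLE_ROWS)

rates = oor_rates_by_tier(sample_log)
print(rates)
```

Rates are computed from pooled counts rather than by averaging daily rates, so days with more visits carry proportionally more weight.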

Quick reference

Trial duration	12 weeks
What changes for staff	Nothing operationally. They just log the alert tier alongside their normal day.
Decision criterion	Alert OOR ≥ 1.5× Normal OOR, with 90% CI lower bound above 1.0
If it passes	Move to a small operational pilot.
If it fails	Stop. Document. Move on to other approaches.
Interim check-ins	Week 4 (sanity), Week 8 (interim — possible early decision)
Send logs to	Haven (haven@fishwelfareinitiative.org), weekly

What this is, in plain words

This page predicts how likely it is that low dissolved oxygen will be found at Eluru ponds tomorrow morning, based only on the weather forecast for the region.

It does not tell you which specific ponds will be in trouble. It tells you whether tomorrow is the kind of day when low DO is more common across the region — and if so, you may want to put more effort into your visit schedule on that day.

Think of it as a daily weather-based heads-up, not a routing tool.

What to do today

General response protocol

Based on YTD validation: the model reliably distinguishes Alert days (Elevated or High) from Normal days, but does not reliably distinguish Elevated from High within the alert tier. So the recommended action is the same for both: treat any non-Normal day as an Alert.

Status	What it means	Suggested response
Normal	Tomorrow's weather is in the bottom 80% historically. OOR rate on these days runs ~6.7% (a bit below baseline).	Visit on your normal schedule. Optionally trim 1 visit if capacity is tight; redistribute to alert days.
Elevated risk	Tomorrow's weather is in the top ~20% (but not top ~10%) historically. OOR rate on these days runs ~13%.	Add 1–2 extra visits to chronic-risk ponds in your assigned area. Visit earliest in the morning. Log results. Same response as High; the data doesn't reliably differentiate them.
High risk	Tomorrow's weather is in the top ~10% historically. OOR rate on these days runs ~9% (statistically thin sample).	Same response as Elevated: add 1–2 extra visits to chronic-risk ponds in your assigned area, visit earliest in the morning, and log results.

Treat this as a soft signal, not a hard schedule. Field judgement and geographic constraints still come first. The visual color separation between Elevated and High is preserved so you can see how strong the signal is, but the action is the same.

How accurate is this, honestly

Validated against all 513 morning non-follow-up visits in West Godavari from January through early May 2026 (~4 months of real operations). Per-tier OOR rates:

Tier	OOR rate	Visits	OOR found
Baseline (all days)	9.4%	513	48
Normal days	6.7%	298	20
Elevated days	13.3%	135	18
High-risk days	9.1%	77	7 (statistically thin)

The honest takeaway

Treating "High" and "Elevated" together as a single Alert tier, you'll find OOR at 11.6% on Alert days vs 6.7% on Normal days — a 1.7× lift. The fine-grained "High vs Elevated" distinction isn't reliable on this dataset (too few visits per tier per day for the difference to be meaningful). Treat both as a single signal that today is unusually risky.

What we initially thought vs what we now see

An earlier validation against just January–March 2026 looked stronger (1.5× lift on top-10% days, 2.3× on top-5%). With four months of YTD data instead of three, the signal is more modest. The Q1 result was partly a favourable seasonal alignment. The 1.7× lift on Alert vs Normal is the realistic expectation.

Even at 1.7× lift, the model still won't catch most OOR events on its own — your routine visits do most of the work. Its operational value is making your alert-day visits ~70% more productive than your normal-day visits.

Limitations — please read this

It cannot pick which pond is bad. Weather is the same across all Eluru ponds on a given day. The model only tells you whether the day looks risky overall.
The signal is modest. Alert days have ~11.6% OOR rate vs ~6.7% on Normal days — a 1.7× lift. Even on Alert days, ~88% of visits will find no OOR. This isn't a "this pond is bad" signal, it's a "this day is somewhat riskier than usual" signal.
The High vs Elevated distinction isn't reliable on the data we've seen so far (4 months). Treat them as one Alert tier; same action either way.
Validated only on Jan–early May 2026 (dry + early pre-monsoon). We have no validation yet for monsoon or post-monsoon seasons. The relationship between weather and DO may be different in those seasons. Re-validate before assuming the model still works in July–November.
Eluru only. It was trained on Eluru data and the same regional weather. It is not validated for Nellore.
Forecast quality. The model was originally tested using the actual weather that occurred. In real use it relies on a day-ahead forecast (which is very accurate for temperature and humidity, decent for wind, noisier for cloud cover and rainfall).
It's a soft tool, not a routing replacement. Don't change daily routing based on this alone. Use it as one input alongside your normal judgement.

How it works (for the curious)

The model is a Ridge regression that predicts tomorrow's average DO across Eluru ponds, using only weather features from Open-Meteo — temperature, wind, humidity, sunshine, rainfall, cloud cover, pressure, dewpoint. It also looks at 3-day and 7-day rolling averages of these to capture multi-day weather accumulation effects.
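The rolling-average features mentioned above could be built along these lines. This is only a sketch: the source does not say whether each window includes the current day or how missing history is handled, so both choices here (window ends on the current day, `None` until enough history exists) are assumptions. The function name, feature names and temperatures are hypothetical.

```python
def trailing_mean(values, window):
    """Mean of the last `window` values up to and including each day.

    Returns None for days that do not yet have `window` days of history.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical daily maximum temperatures (°C), oldest first.
temps = [34.0, 35.0, 36.0, 38.0, 37.0, 36.0, 35.0, 33.0]

features = {
    "temp_3d": trailing_mean(temps, 3),  # short-term accumulation
    "temp_7d": trailing_mean(temps, 7),  # week-scale accumulation
}
print(features["temp_3d"][-1], features["temp_7d"][-1])
```

The same helper would be applied to each weather variable to produce its 3-day and 7-day columns alongside the raw daily value.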

Predicted DO is converted to a "risk score" (lower predicted DO = higher risk). This is compared to historical thresholds from 2024–2025 training data to decide the alert level:

High risk = score is in the top 10% of historical Eluru days (about 1 in 10 days).
Elevated = score is in the top 20% but not top 10%.
Normal = score is in the bottom 80%.

For operational purposes, treat High and Elevated as a single Alert tier. YTD validation showed the fine-grained High-vs-Elevated distinction isn't statistically reliable on the data we have so far — the 3-tier visual is preserved only to show how strong the underlying signal is.
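The tiering rule described above can be sketched as below. It assumes the risk score is simply the negated predicted DO (the source only says lower predicted DO means higher risk) and that the thresholds are the 80th and 90th percentiles of historical scores. The function names and the historical values are hypothetical.

```python
from statistics import quantiles


def tier_thresholds(historical_scores):
    """Return the 80th and 90th percentile of historical risk scores."""
    deciles = quantiles(historical_scores, n=10, method="inclusive")
    return deciles[7], deciles[8]


def classify(predicted_do, p80, p90):
    """Map a predicted DO value to Normal / Elevated / High."""
    score = -predicted_do  # assumption: risk score is negated predicted DO
    if score >= p90:
        return "High"      # top 10% of historical days
    if score >= p80:
        return "Elevated"  # top 20% but not top 10%
    return "Normal"        # bottom 80%


def is_alert(tier):
    """Operational rule: High and Elevated are one Alert tier."""
    return tier != "Normal"


# Hypothetical historical predicted-DO values (mg/L), for illustration only.
historical_do = [4.0 + 0.02 * i for i in range(100)]
p80, p90 = tier_thresholds([-do for do in historical_do])

for do in (4.1, 4.3, 5.0):
    tier = classify(do, p80, p90)
    print(do, tier, is_alert(tier))
```

Keeping `classify` and `is_alert` separate mirrors the page: the three-tier label is still shown, while the action depends only on the binary Alert flag.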

The model was trained on 2,236 Eluru morning non-follow-up visits from Jan 2024 – Dec 2025 and tested on Jan–early May 2026 data (held out of training). It does not use any pond-specific features — only regional weather. It is one of several models that have been built and compared by FWI volunteers; this one was selected because, at the operational ranking task, it performs as well as or better than the more complex alternatives, and is much simpler to explain and deploy.

Open-Meteo API call (technical)

This page makes a single browser call to Open-Meteo each time it loads:

…

Open-Meteo is a free, open-source weather data API and does not require a key. The call returns the past 7 days plus today plus tomorrow, which is enough to compute today's lagged-weather features and tomorrow's forecast.
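The page's actual request is not reproduced here. Purely as an illustration, a request covering the same window (past 7 days plus today and tomorrow) could be assembled as below, using Open-Meteo's documented `past_days` and `forecast_days` parameters. The function name, the choice of daily variables and the timezone are assumptions, not the page's real configuration.

```python
from urllib.parse import urlencode


def build_forecast_url(latitude=16.64, longitude=81.12):
    """Build an Open-Meteo forecast URL for the Eluru centroid (illustrative only)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        # Assumption: two example daily variables; the real page requests more.
        "daily": "temperature_2m_max,precipitation_sum",
        "past_days": 7,      # the previous 7 days, for lagged features
        "forecast_days": 2,  # today and tomorrow
        "timezone": "Asia/Kolkata",  # assumption: local-day boundaries
    }
    return "https://api.open-meteo.com/v1/forecast?" + urlencode(params, safe=",/")


url = build_forecast_url()
print(url)
```

No API key appears in the query string, consistent with the note above that Open-Meteo does not require one.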

Trial logging

Before this model changes daily operations, we want to shadow-test it for 12 weeks with no operational changes — just daily logging of the alert tier alongside normal visit results. The full trial design (timeline, decision criteria, interim checkpoints, why 12 weeks) lives in the Trial design tab at the top of this page. The short version:

Each morning, check this page. Record the alert tier: Normal or Alert (Elevated and High both count as Alert).
Visit ponds on your normal schedule (no operational change during the trial).
At end of day, log: date, alert tier, # visits, # OOR found.
Send weekly logs to Haven (haven@fishwelfareinitiative.org).

After 12 weeks we'll compute the OOR rate gap between Alert and Normal days. The pre-registered criterion for proceeding to an operational pilot is: Alert OOR ≥ 1.5× Normal OOR with 90% CI lower bound above 1.0. If we hit it, we move to a small pilot. If not, we stop and document.

What changed in this version

The earlier validation used only Q1 2026 data (3 months, 312 visits). With YTD data through early May (4 months, 513 visits), the model's signal is more modest than initially reported. Three changes were made:

Lift expectation restated on a different cut. The earlier figure was a 1.5× lift on top-10% days (Q1 only). On YTD data, the realistic figure is a 1.7× lift on Alert vs Normal days. It is slightly higher only because it groups Elevated and High together, which is the cleaner cut; the two figures measure different comparisons.
3-tier collapsed to binary in the action protocol. The High vs Elevated distinction wasn't statistically reliable in the YTD data (HIGH had 9.1% OOR on 13 days, ELEVATED had 13.3% on 22 days — the order even flipped). Both tiers now recommend the same action.
Honest framing about variability. The Q1 number was partly seasonal alignment, not a stable estimate. Plan to recalibrate every quarter as more data arrives.

Source & contact

Built using volunteer-contributed code from the Fish Welfare Initiative ARA modelling project. Source pipeline: SkyeNygaard/Fishwelfare-Experiments. Weather data: Open-Meteo. Questions, issues, or feedback → Haven (haven@fishwelfareinitiative.org).

Publicly shareable version of the code, analysis, and reports is at github.com/hkingnob/fwi-eluru-do-alert-public.