Pick any date from 2024-01-01 onwards. The page will compute what the model would have said for that day, and (if the date is in our records) compare it to the actual visits and OOR results. Useful for spot-checking whether the model's "High risk" calls actually corresponded to bad days.
Day-ahead forecast = what Open-Meteo's forecast actually said for that day, the day before. This matches what staff would have seen in real deployment. Use this to evaluate "would the model have helped us in real life?"
Reanalysis = the best retrospective estimate of what actually happened that day, computed later using all available data. Skye trained the model on this. Use this to see "what would the model say with perfect-information weather?" — i.e., the upper bound on accuracy. Real deployment will be slightly worse because forecasts are noisier than reanalysis.
Comparing the two for the same date shows how much the model degrades from "ideal information" to "what staff would actually have seen."
Practical caveat: For dates more than a few days in the past, Open-Meteo's day-ahead forecast and reanalysis converge to nearly identical values — Open-Meteo recalibrates historical forecasts against actual observations over time. So in practice, picking either source for older dates will usually give the same answer. The difference may only be visible for very recent dates (last 24–48 hours). The good news: this means live deployment performance shouldn't degrade meaningfully from what we measured in testing. The honest caveat: it also means we can't fully separate "what the forecast said" from "what actually happened" using public data — so the historical lookup is a partial validation of the model, not a fully independent one.
Before this model changes how field staff visit ponds, we want to shadow-test it for 12 weeks with no operational changes — just daily logging. This page explains how the trial works, why 12 weeks (rather than 4–6), what we'll decide at the end, and when we'd stop early.
The model has been validated on historical data, but historical validation has limits. Performance can drift season-to-season; the data we have so far is mostly dry-season; and real deployment uses day-ahead forecasts, not reanalysis. Before changing field operations, we want to verify the signal still shows up on fresh, real-world conditions.
A shadow trial does exactly that with one important property: nothing about field operations changes during the trial. Staff visit ponds on their normal schedule, in normal order, at normal cadence. They just record the model's alert tier alongside the day's results. At the end, we compare OOR rates on Alert vs Normal days. If the signal holds, we move to a small operational pilot. If it doesn't, we stop — and we've spared Programs the cost of deploying a noisy tool.
That's it. The model produces a daily call; the trial collects ground truth alongside it.
The earlier "4–6 week" suggestion in our planning was too short for the actual signal we're testing. Here's the math:
Concretely:
| Trial length | Total visits | ~Visits per group | Min lift detectable at 80% power |
|---|---|---|---|
| 4 weeks | ~180 | ~90 | ~2.5× (would only catch a far stronger signal than expected) |
| 8 weeks | ~360 | ~180 | ~2.1× |
| 12 weeks | ~540 | ~270 | ~1.7× — matches the YTD signal |
| 16 weeks | ~720 | ~360 | ~1.5× |
| 20 weeks | ~900 | ~450 | ~1.4× |
At 4 weeks, we couldn't statistically distinguish the model from noise unless the true lift were dramatically bigger than YTD suggested. 12 weeks is the natural minimum for the signal we actually have.
Week 4 — directional sanity check (not a go/no-go). Just look at OOR rates so far. If Alert is at least directionally higher than Normal, continue. If they're tied or flipped, investigate before continuing — something may be off in this season.
Week 8 — interim review. Confidence intervals will still be wide, but you'll have a real read. If the gap is dramatically larger than expected (≥2.5× lift) and the lower bound of its 90% CI is above 1.0, the trial may have already crossed the bar — consider proceeding to pilot early. If the gap roughly tracks 1.7×, keep going. If it's washed out (<1.2× or flipped), seriously consider stopping early.
Week 12 — decision point. See criterion below.
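The week-8 thresholds above can be written down as a small rule. This is only a sketch of one reading of that checkpoint: the function name and the returned labels are hypothetical, and a lift of 2.5× or more whose CI lower bound is not above 1.0 is assumed to mean "keep going".

```python
def interim_action(lift: float, ci_lower: float) -> str:
    """Suggest a next step at the week-8 interim review.

    lift     -- Alert OOR rate divided by Normal OOR rate so far
    ci_lower -- lower bound of the 90% CI on that lift
    """
    if lift >= 2.5 and ci_lower > 1.0:
        # Gap is dramatically larger than expected and already clear of 1.0.
        return "consider early pilot"
    if lift < 1.2:
        # Washed out, or flipped (a lift below 1.0 is also below 1.2).
        return "consider stopping"
    # Roughly tracking the expected 1.7x lift, or strong but still uncertain.
    return "continue"


print(interim_action(1.7, 0.9))
```

The rule only suggests an action; the text above leaves the final call at week 8 to human judgement.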
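Proceed to operational pilot if, at end of week 12, Alert OOR is at least 1.5× Normal OOR and the lower bound of the 90% CI on that ratio is above 1.0.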
Otherwise: stop. No pilot. Document the result as evidence the approach doesn't generalize, and look at other modelling directions (e.g. Sara's per-pond approach) for the next iteration.
Pre-registering the criterion now matters because it removes the temptation to retroactively choose a threshold that makes the result look better than it is.
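As a rough illustration of how the week-12 check could be computed from the trial totals, here is a minimal sketch. It assumes a two-sided 90% interval on the ratio of OOR rates using the common log-method standard error; the trial plan does not say which interval method will be used, and the function names and the counts below are hypothetical.

```python
import math
from statistics import NormalDist


def lift_with_ci(alert_oor, alert_visits, normal_oor, normal_visits, level=0.90):
    """Return (lift, ci_lower, ci_upper) for Alert OOR rate / Normal OOR rate."""
    if min(alert_oor, normal_oor) <= 0:
        raise ValueError("the log-ratio interval needs at least one OOR in each group")
    lift = (alert_oor / alert_visits) / (normal_oor / normal_visits)
    # Standard error of log(lift): the usual log-method interval for a ratio
    # of two proportions (an assumed choice, not confirmed by the trial plan).
    se = math.sqrt(1 / alert_oor - 1 / alert_visits
                   + 1 / normal_oor - 1 / normal_visits)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    return lift, lift * math.exp(-z * se), lift * math.exp(z * se)


def passes_criterion(lift, ci_lower):
    """Pre-registered bar: lift of at least 1.5 and CI lower bound above 1.0."""
    return lift >= 1.5 and ci_lower > 1.0


# Hypothetical end-of-trial counts, for illustration only.
lift, lower, upper = lift_with_ci(alert_oor=30, alert_visits=200,
                                  normal_oor=20, normal_visits=340)
print(round(lift, 2), round(lower, 2), round(upper, 2), passes_criterion(lift, lower))
```

Fixing the interval method before the trial starts would serve the same purpose as pre-registering the threshold: it leaves no room to pick whichever method gives the most favourable bound afterwards.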
Two scenarios where stopping before the full 12 weeks makes sense:
Even if the trial fails, the data is valuable:
So 12 weeks of logging, regardless of outcome, gives a real-world evidence base for whatever we do next.
Use this format (copy into a spreadsheet or print and fill by hand):
| Date | Tier (Normal/Alert) | Visits | OOR found | Notes |
|---|---|---|---|---|
| | | | | |

Fill one row per day. Send the weekly log to Haven by Friday EOD.
If staff want to record more detail (observer name, village, weather observed, etc.), that's welcome but not required. The first four columns above (Date, Tier, Visits, OOR found) are the minimum.
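Once the weekly logs are typed up, the per-tier totals needed for the Alert vs Normal comparison can be tallied with a few lines. This is a sketch only: the dictionary keys (`tier`, `visits`, `oor_found`) are assumed names mirroring the log columns, and the sample rows are invented for illustration.

```python
from collections import defaultdict


def summarize_log(rows):
    """Total visits and OOR counts per tier, plus the OOR rate for each tier."""
    totals = defaultdict(lambda: {"visits": 0, "oor": 0})
    for row in rows:
        totals[row["tier"]]["visits"] += row["visits"]
        totals[row["tier"]]["oor"] += row["oor_found"]
    summary = {}
    for tier, t in totals.items():
        # Leave the rate undefined if a tier has no visits yet.
        rate = t["oor"] / t["visits"] if t["visits"] else None
        summary[tier] = {"visits": t["visits"], "oor": t["oor"], "oor_rate": rate}
    return summary


# Invented example rows, one per day, in the shape of the log table.
sample_rows = [
    {"date": "day 1", "tier": "Alert", "visits": 8, "oor_found": 2},
    {"date": "day 2", "tier": "Normal", "visits": 6, "oor_found": 0},
    {"date": "day 3", "tier": "Alert", "visits": 2, "oor_found": 0},
]
sample_summary = summarize_log(sample_rows)
print(sample_summary)
```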
| Item | Detail |
|---|---|
| Trial duration | 12 weeks |
| What changes for staff | Nothing operationally. They just log the alert tier alongside their normal day. |
| Decision criterion | Alert OOR ≥ 1.5× Normal OOR, with 90% CI lower bound above 1.0 |
| If it passes | Move to a small operational pilot. |
| If it fails | Stop. Document. Move on to other approaches. |
| Interim check-ins | Week 4 (sanity), Week 8 (interim — possible early decision) |
| Send logs to | Haven (haven@fishwelfareinitiative.org), weekly |
This page predicts how likely it is that low dissolved oxygen will be found at Eluru ponds tomorrow morning, based only on the weather forecast for the region.
It does not tell you which specific ponds will be in trouble. It tells you whether tomorrow is the kind of day when low DO is more common across the region — and if so, you may want to put more effort into your visit schedule on that day.
Think of it as a daily weather-based heads-up, not a routing tool.
Based on YTD validation: the model reliably distinguishes Alert days (Elevated or High) from Normal days, but does not reliably distinguish Elevated from High within the alert tier. So the recommended action is the same for both: treat any non-Normal day as an Alert.
| Status | What it means | Suggested response |
|---|---|---|
| Normal | Tomorrow's weather is in the bottom 80% historically. OOR rate on these days runs ~6.7% (a bit below baseline). | Visit on your normal schedule. Optionally trim 1 visit if capacity is tight; redistribute to alert days. |
| Elevated risk | Tomorrow's weather is in the top ~20% (but not top ~10%) historically. OOR rate on these days runs ~13%. | Add 1–2 extra visits to chronic-risk ponds in your assigned area. Visit earliest in the morning. Log results. Same response as High — the data doesn't reliably differentiate them. |
| High risk | Tomorrow's weather is in the top ~10% historically. OOR rate on these days runs ~9% (statistically thin sample). | Same response as Elevated risk. |
Treat this as a soft signal, not a hard schedule. Field judgement and geographic constraints still come first. The visual color separation between Elevated and High is preserved so you can see how strong the signal is, but the action is the same.
Validated against all 513 morning non-follow-up visits in West Godavari from January through early May 2026 (~4 months of real operations). Per-tier OOR rates:
Treating "High" and "Elevated" together as a single Alert tier, you'll find OOR at 11.6% on Alert days vs 6.7% on Normal days — a 1.7× lift. The fine-grained "High vs Elevated" distinction isn't reliable on this dataset (too few visits per tier per day for the difference to be meaningful). Treat both as a single signal that today is unusually risky.
An earlier validation against just January–March 2026 looked stronger (1.5× lift on top-10% days, 2.3× on top-5%). With four months of YTD data instead of three, the signal is more modest. The Q1 result was partly a favourable seasonal alignment. The 1.7× lift on Alert vs Normal is the realistic expectation.
Even at 1.7× lift, the model still won't catch most OOR events on its own — your routine visits do most of the work. Its operational value is making your alert-day visits ~70% more productive than your normal-day visits.
The model is a Ridge regression that predicts tomorrow's average DO across Eluru ponds, using only weather features from Open-Meteo — temperature, wind, humidity, sunshine, rainfall, cloud cover, pressure, dewpoint. It also looks at 3-day and 7-day rolling averages of these to capture multi-day weather accumulation effects.
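To make the rolling-average features concrete, here is a minimal sketch of a trailing mean over daily values. It assumes each window ends on the day in question and includes that day; whether the actual pipeline aligns its windows this way is not stated here, and the function name and sample temperatures are hypothetical.

```python
def rolling_mean(values, window):
    """Trailing mean of the last `window` daily values, ending at each day.

    Days that do not yet have a full window get None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


# Invented daily mean temperatures (degrees C), oldest first.
daily_temp = [30.0, 31.0, 32.0, 33.0, 31.0, 29.0, 28.0, 30.0]
temp_3d = rolling_mean(daily_temp, 3)
temp_7d = rolling_mean(daily_temp, 7)
print(temp_3d[-1], temp_7d[-1])
```

A 7-day window needs seven daily values before it produces its first number, which is why a week of past weather is required to build the features for a single day.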
Predicted DO is converted to a "risk score" (lower predicted DO = higher risk). This is compared to historical thresholds from 2024–2025 training data to decide the alert level:
For operational purposes, treat High and Elevated as a single Alert tier. YTD validation showed the fine-grained High-vs-Elevated distinction isn't statistically reliable on the data we have so far — the 3-tier visual is preserved only to show how strong the underlying signal is.
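The tiering logic described above can be sketched as follows. Several details are assumptions made for illustration: the risk score is taken to be simply the negated predicted DO, the cut-offs are taken as the scores marking the top ~20% and top ~10% of a historical list, and all names are hypothetical. The real thresholds come from the 2024–2025 training data and are not reproduced here.

```python
def risk_score(predicted_do):
    """Assumed conversion: lower predicted DO means a higher risk score."""
    return -predicted_do


def top_fraction_cutoff(historical_scores, top_fraction):
    """Smallest historical score that still falls in the top `top_fraction`."""
    ordered = sorted(historical_scores)
    k = int(round(len(ordered) * (1 - top_fraction)))
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]


def display_tier(score, elevated_cut, high_cut):
    """Three-tier label shown on the page."""
    if score >= high_cut:
        return "High"
    if score >= elevated_cut:
        return "Elevated"
    return "Normal"


def operational_tier(tier):
    """Collapse High and Elevated into a single Alert tier for field use."""
    return "Normal" if tier == "Normal" else "Alert"


# Invented historical scores 0..99, so the cut-offs are easy to read.
history = list(range(100))
elevated_cut = top_fraction_cutoff(history, 0.20)
high_cut = top_fraction_cutoff(history, 0.10)
print(elevated_cut, high_cut, operational_tier(display_tier(85, elevated_cut, high_cut)))
```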
The model was trained on 2,236 Eluru morning non-follow-up visits from Jan 2024 – Dec 2025 and tested on Jan–early May 2026 data (held out of training). It does not use any pond-specific features — only regional weather. It is one of several models that have been built and compared by FWI volunteers; this one was selected because, at the operational ranking task, it performs as well as or better than the more complex alternatives, and is much simpler to explain and deploy.
This page makes a single browser call to Open-Meteo each time it loads:
…
Open-Meteo is a free, open-source weather data API and does not require a key. The call returns the past 7 days plus today plus tomorrow, which is enough to compute today's lagged-weather features and tomorrow's forecast.
Before this model changes daily operations, we want to shadow-test it for 12 weeks with no operational changes — just daily logging of the alert tier alongside normal visit results. The full trial design (timeline, decision criteria, interim checkpoints, why 12 weeks) lives in the Trial design tab at the top of this page. The short version:
After 12 weeks we'll compute the OOR rate gap between Alert and Normal days. The pre-registered criterion for proceeding to an operational pilot is: Alert OOR ≥ 1.5× Normal OOR with 90% CI lower bound above 1.0. If we hit it, we move to a small pilot. If not, we stop and document.
The earlier validation used only Q1 2026 data (3 months, 312 visits). With YTD data through early May (4 months, 513 visits), the model's signal is more modest than initially reported. Three changes were made:
Built using volunteer-contributed code from the Fish Welfare Initiative ARA modelling project. Source pipeline: SkyeNygaard/Fishwelfare-Experiments. Weather data: Open-Meteo. Questions, issues, or feedback → Haven (haven@fishwelfareinitiative.org).
A publicly shareable version of the code, analysis, and reports is at github.com/hkingnob/fwi-eluru-do-alert-public.