This repository contains the R code and data for estimating the burden and cost of food-borne illness, ten priority pathogens and four sequel illnesses in Australia circa 2024.
The code estimates the burden (cases, hospitalisations, and deaths) and estimates the associated costs (direct medical costs, lost productivity, pain and suffering, and fatality). Estimates can be run by executing Rfiles/BuildCostTable.R. Cost tables are outputted as .csv files, but can also be explored in an interactive shiny app (app.R). The app also allows users to estimate costs for foodborne disease outbreaks based on estimated number of cases, hospitalisations, and deaths using the same costing framework for the main estimates.
Web-hosted version of interactive table can be found at https://angusmclure.shinyapps.io/AusFBDCosting/
The initial development of this code was commissioned by Australian Department of Health, and managed by Food Standards Australia New Zealand (FSANZ). Updates have been commissioned by FSANZ. Code development is by Angus McLure at the Australian National University.
This code was used to calculate the estimates presented in the FSANZ report The annual cost of foodborne disease in Australia (published 2022). The model and results are available as a peer-reviewed journal article in Foodborne Pathogens and Disease.
This app uses the best data available to the ANU team in early 2025 to estimate burden and cost of disease around the year 2024. There were many data inputs, some up-to-date as of the time of analysis, but others dating back to the early 2000s. Going forward it may be beneficial to update some or all of these data inputs to make better estimates of burden and cost.
The estimation process is complex but involves three broad steps: estimation of burden, estimation of number of cost items, and multiplication of cost items by unit costs. Each of these steps require data, many of them specific to each pathogen being considered. The lists below describe the data, the source and list the names of the files where these data are stored in the model, with notes on how to update them.
You can update estimate for inflation without updating or re-running the underlying model. Try launching the app locally (instructions below) or visit FSANZ Foodborne Disease Costing Model. The app has an options for adjusting for inflation on the first page. If the FY quarter you want to update to is not listed as an option, you will need to update Data/CPI-ABS.csv with the latest ABS data, then relaunch your app locally. See end of this document for details about updating Data/CPI-ABS.csv.
A full list of sources for these inputs and the storage type are listed at the end of this document.
All data inputs are stored in one of three ways:
- Data stored and read in the raw, unaltered format in which it is published or provided to authors (e.g. population estimates, CPI).
- Data that is stored in spreadsheets in a custom format (created and maintained by authors)
- Data that is written into model code
Note: The storage type, source, and instructions to update each data type are listed at the end of this document.
Data storage 1 makes up the largest portion of inputs in terms of sheer number of datapoints. Most data of this type are very simple to update.
Data storage 2 was used for data for which we did not have access to a nicely formatted machine-readable datasets, or when we were plucking a one or two numbers from a very large dataset. This category includes all of the unit costs and a few other inputs
Data storage 3 was used for data which would require (often major) additional research studies to update. They are stored by pathogen in a format which is designed to be human and machine readable. These data types include the number of cost items (medications, tests, GP/ED/specialist visits) per case, underreporting and domestically acquired multipliers.
All of the data of storage types 1 and 2 are in the ./Data directory. All data of storage type 3 are in RFiles/Diseases.R. The same file also indicates which method (and therefore data sources) should be used to estimate the burden of the disease for each pathogen or disease. Details for each data type are in the final section of this document.
- Download/access the latest version of the data and save in folder “./Data/”
- For most files you replace old file with new file reusing the old name.
- For AIHW hospitalisation data we need a file for each FY, so we add the latest file and keep the older ones. Make sure to use the consistent naming convention. E.g. the FY2018-19 data is called
Principal_diagnosis_data_cube_2018-19.xlsx
- Sometimes the data custodian will change their data format:
- If the data format hasn’t changed: we’re done!
- If data format has changed: either
- update matching function in “./Rfiles/loadData.R” to read in new format (recommended), OR
- manually reformat the new dataset into the old format (not recommended)
- Source latest data
- Manually enter/copy values into appropriate spreadsheet in
./Data/:- For most files: replace old values with new values
- For NNDSS notifications: paste in new rows after (in addition to) old rows
- Source new data (typically requires a new study!)
- Update matching data in
.Rfiles/Diseases.R
To include an additional pathogen in the costing requires extensive additional information.
First it would require it's own entry in RFiles/Diseases.R. This would tell the model which frameworks are being used to estimate burden (cases and hospitalisations). It would also need to include a wide range of assumptions about symptom type and severity, treatments and healthcare seeking outcomes, underreporting and domestically acquired multipliers and probability of resulting in sequel or ongoing illnesses, and ARDG codes. Many of the inputs would require literature searches, original studies, or expert elicitation.
Second, you would also need to ensure that the matching input data (i.e. deaths and notifications if applicable) are available in the appropriate files in ./Data/. While AIHW data on hospitalisation already includes all ARDG codes, the deaths dataset only includes selected causes of death and may not cover your disease of interest.
The main script in the programme is Rfiles/BuildCostTable.R. Once data and data reading code in ./Rfiles/loadData are up to date you are ready to run the model to get new estimates of burden and cost.
Note: If you are just updating an existing estimate for inflation, you can skip this step and go straight to running the shiny app.
-
Open the project file
FSANZ-ANU-Foodborne-Illness-Costing.Rproj. This should automatically also activaterenvwhich will ensure that you have the correct versions of every package you need. -
Open
Rfiles/BuildCostTable.R. You will be making a few small edits to opening ~15 lines of this script before running the whole thing. -
Set the reference quarter for cost estimates.
EstQ <- "Dec-24" #quarter for overall cost estimates
-
Set the reference year for deaths, cases, and hospitalisations:
YearDeaths <- 2024 # Year for determing population for population-adjusting estimates of deaths (does not change data source for deaths) YearCases <- 2024 # Year that notification data and population estimates are taken from for estimating cases YearHosp <- 2023 # Financial year that hospitalization data is taken from --- FY2024 will probably not be available until Nov 2025 given historical reporting timelines
-
Set the random seed for random number generation. If you don't make changes to code or inputs, running the code twice with the same random seed shouldn't change the results. This is helpful for reproducibility and to identify if changes to the code are making any difference to the outputs.
set.seed(20251013) #I suggest choosing the date of the last run on which inputs/code changed in ways that effected the outputs
-
Run the entire script
Rfiles/BuildCostTable.R. This may take ~10 minutes.- If there are any errors: Fix errors then clear your workspace before trying to re-run script
rm(list = ls(all.names = TRUE))
- If there are no errors: All model outputs have been updated!
- If there are any errors: Fix errors then clear your workspace before trying to re-run script
Note: If you running into errors and find you have to run the model multiple times to fix them, you could speed up run-time be reducing the number of random draws taken. This is set on line ~10 and is currently set to
10^6(i.e. one million). When you have identified any bugs you can set to the defaul values again.
All model outputs are stored in the folder ./Outputs/. All outputs include estimates and uncertainty intervals. There are four human readable outputs:
CostPerCase.csv: The estimated cost per caseCostTable.csv: Detailed breakdown of costs by pathogen, disease, age group, and detailed cost itemCostTableCategories.csv: Costs by pathogen, disease, age group, and four broad cost categoriesEpiTable.csv: Summaries of estimated burden (Cases, Hospitalisations and Deaths) by pathogen, disease, and age group
There is also an additional output AusFBDiseaseImage-Light.RData, which is not human readable, but can be loaded into R to access a (lightweight) version of the full model outputs. This will not be useful most of the time, but is required for the shiny app to run the Outbreak tab.
After you have run the model with updated data successfully (no error messages) you are ready to launch the shiny app.
The simplest option is to launch a local version of the app that is only running on your computer using Rstudio. Simply:
- Open
app.R - Click 'Run app' in the top bar of the script editor pane.
- That's it!
If you want to make a version of this app that can be accessed by anyone using a URL you'll need to host the app on a server. See shinyapps.io for free and paid hosting options and guides for how to set this up. A copy of the app is maintained at angusmclure.shinyapps.io/AusFBDCosting/.
The outputs in ./Outputs are all designed to be easily machine readable so can be used to create Figures and tables programmatically. For example code and outputs for figures See ./Report/Figures.R and ./Paper and Presentations. For example code and outputs for summary tables see ./Report/BuildReportTables.R and the many csv files in the same folder (e.g. ./Report/DetailedCostTable.All pathogens.Initial Disease.csv).
These data inform the underlying estimate of burden (cases, hospitalisation, and deaths) that form the core of the costing model.
-
Population estimates (2024, ABS) Storage format 1
These are used to calculate the total number of gastro cases and the burden of the non-notifiable pathogens.
National estimates are stored in
AustralianPopulationByAge.xlsThe data are read in bygetAusPopSingleYearAge()and further processed bygetAusPopAgeGroupinRfiles/loadData.R.State and territory population estimates are stored in files with names following this format
PopulationAgeYear-XXX.xlsxwhere XXX is the 2 or 3 letter abbreviation for the jurisdiction. The data are read in bygetCasesStateAgeGroup()inRfiles/loadData.R.All data are in the format published by the ABS: 3101.0 Australian Demographic Statistics; Tables 51-59).
Assuming that the ABS continues to use the same format going forward, one should be able to update the files by downloading the latest version of this and replacing the file. Modifications to
Rfiles/loadData.Rmay be required here if the ABS format changes. -
Number of notifications for nationally notifiable pathogens (2024, NNDSS)* Storage format 2
Each disease has a
caseMethodattribute in.Rfiles/Diseases.Rthat determines how cases numbers are estimated. IfcaseMethod = 'Notifications', then notifications for the year must be provided in theDatadirectory.For nationally notifiable diseases, these data are stored in
Data/NNDSS Disease by AgeGroup and Year.xslxand read into the model bygetCasesNNDSSAgeGroupinRfiles/loadData.R. There are rows for the name of the disease and reporting year, followed by columns for each 5 year age-group (and 85+). Updating the data requires adding rows for each notifiable pathogen for each year for which estimates are required.For some data where only (high quality) data were available from certain states, we used older notifications data for those states and performed population adjustment to get national estimates circa 2024. The data are read into the model and population adjusted by
getCasesStateAgeGroupinRfiles/loadData.R. It reads population data for each jurisdiction (from separate files for each jurisdiction inData) both for the reporting years (2013-2015) and the estimation year (2024) by state, sex, and agegroup and performs adjustments accordingly. The format of the data are the same once read into the model and are combined into a single objectNotificationsAgeGroup. Currently Yersinia entercolictica and STEC use this method. Updating this data will likely be on a case-by-case basis. It might be appropriate in the future for estimates to use national notifications data which will simplify the process. -
Number of foodborne gastroenteritis events per person per year from the second national gastroenteritis survey (2008, NGSII) Storage format 3
This is hard-coded into
Rfiles/Diseases.Ras thegastroRateas anrdistobject specifying the uncertainty distribution for this number. -
The proportion of gastroenteritis cases due to specific pathogens from the Victorian water quality study (2001, Hellard) — used only for non-notifiable diseases. Storage format 3
This is hard-coded into
Rfiles/Diseases.Rfor each disease. Each disease has acaseMethodattribute that determines how cases numbers are estimated. IfcaseMethod = 'GastroFraction', then the disease also requires an attributegastroFractionwhich is anotherrdistobject specifying the fraction of gastro that is due to this pathogen. -
Seroprevalence studies for Toxoplasma gondii infection collected 2005-2007 from Busselton, Western Australia (2020, Molan) Storage format 3
This data is used to estimate the force of infection of the disease (units of infections per person per year) by fitting a very simple SIR model to the disease assuming the population is at endemic equilibrium and the force of infection is constant over time. The fitting is performed separately from this code. Each disease has a
caseMethodattribute that determines how cases numbers are estimated. IfcaseMethod = 'Seroprevalence', then the disease also requires an attributeFOIwhich is ardistobject specifying the force of infection estimate. Currently this is only used for Toxoplasmosis. -
Number of hospitalisations (FY 2023 and earlier, AIHW) Storage format 1
These data are used to estimate the number of hospitalisation for each disease. Each disease in
Rfiles/Diseases.Rhas an attribute calledhospMethod. IfhospMethod = 'AllCases'then we assume that every case is hospitalised and therefore do not require additional data. IfhospMethod = 'AIHW'then the reported number of hospitalisations with the principle diagnosis matching the codes listed in attributehospCodesis extracted from AIHW datasets, then multiplied byunderdiagnosisand divided byhospPrincipalDiagnosisattributes for the disease. If there are no hospitalisations reported for the estimation year, the program will use the average number of reported hospitalisations listed across all datasets available to the programme. AIHW reports by financial year and the current version of the software has data for financial years 2015-16 through 2022-23 (latest data as of writing). For circa 2024 estimates we used FY 2022-2023 as the starting point, but expanded to all datasets if there were no hospitalisations reported for FY 2022-23. The data are stored in./Datawith names of the formatPrinciple_diagnosis_data_cube_20XX-YY.xlsx. The data are read in bygetHospitalisationsAgeGroupinRfiles/loadData.Rwhich assumes that all relevant data will be named with exact matching format. File format is exactly as downloaded from AIHW website (e.g. Admitted patient care NMDS 2019-20 (aihw.gov.au)). Updating the model will require downloading the latest data and changing the file name to the matching format. If the format that AIHW uses changes in future releases there may be a requirement to re-writegetHospitalisationsAgeGroup. -
Deaths (2014-2023, ABS) Storage format 1
Each diseases listed in
Rfiles/Diseases.Rhas a an attributemortCodeslisting the codes associated with this disease. Given the small number of deaths, there are sensitivities around these data and they are only available on request. Furthermore, as cause of death coding is not always complete, the number of deaths are underreported. We proceed by applying under-diagnosis multipliers but if the reported number of deaths were 0 for a given year, even multiplying by a large multiplier would have no effect. To deal with these complexities we use data from across 10 years (2014-2023) to estimate mortality incidence rates. Data is read in bygetABSDeathsinRfiles/loadData.Rwhich reads from the fileData/Causes of Death data.xlsxand returns death rates by age-group for different causes of death that can be multiplied by population to get estimates of deaths per year. To update for future years this can either be left as is (in which case population adjustment will be performed for the target year), or reworked with newer data. -
Multipliers (Various sources and assumptions) Storage format 3
There are a number of multipliers used to estimate burden. These are hard-coded for each disease in
Rfiles/Diseases.Ras attributes of each disease. Though different diseases use different multipliers depending on the method used to estimate the burden, these multipliers aredomestic(fraction of cases that are domestically acquired)underreporting(factor to account for underreporting of cases)foodborne(fraction of cases that are foodborne),hospPrincipalDiagnosis(fraction of hospitalisations where the disease or related code is listed as the principle diagnosis)underdiagnosis(factor to account for under-diagnosis of the disease as the cause of death or hospitalisation). There are many different sources for each of these multipliers, and they can be updated by editing the matching attributes inRfiles/Diseases.R.
This data is used to estimate the number of cost items. There are many cost items. Hospitalisation and deaths are two key cost items that are outlined in the previous section, however other cost items are calculated from the estimates of burden (e.g. GP visits, doses of medication, time off work, days of illness).
-
Direct costs Storage method 3
There are many kinds of direct costs considered (GP/ED/Specialist visits, medications and testing) and these are sufficiently different for each disease that they are hard-coded directly into
Rfiles/Diseases.R. -
Duration of illness Storage method 3
These are also inputted directly for each disease into the
Rfiles/Diseases.Runder thedurationattribute, with entries for hospitalised and non-hospitalised cases
-
Days of lost work Storage method 2
For non-hospitalised cases the time off work was assumed to be the same for all initial infections and estimated from the Second National Gastroenteritis Survey (NGSII). The data is stored in and read in by
getMissedDaysGastrofromRfiles/loadData.R. If this data were updated this would probably have to be reworked through the whole model as the data outputs is quite specific to NGSII. The number of days taken off work for hospitalised cases is based on the mean length of stay, manually extracted from the same AIHW files reporting the number of hospitalisation discussed in the previous section. This is combined with workforce participation rates (and average daily earnings) by age-group reported adapted from standard reports from the ABS. The data is stored in a custom format inData/WorkforceAssumptions.csvand read into the model bygetWorkforceAssumptions()fromRfiles/loadData.R. The data can be updated for the latest values by editingData/WorkforceAssumptions.csv. If future updates wish to perform calculations for multiple timepoints,Data/WorkforceAssumptions.csvshould be be modified to include the published workforce data value for a range of years andgetWorkforceAssumptions()should be modified to read in data for all years in a format where the program can select from a list of years.
In the final step we multiply cost items by unit prices.
-
Cost of premature mortality Storage method 2
Premature mortality is costed using the Value of Statistical Life published by the Department of Prime Minister and Cabinet, in 2024. This is stored in
Data/ValueStatisticalLife.csvand read in bygetValueStatisticalLife()fromRfiles/loadData.R. If future updates wish to perform calculations for multiple timepoints,Data/ValueStatisticalLife.csvshould be be modified to include the published workforce data value for a range of years andgetValueStatisticalLife()should be modified to read in data for all years in a format where the program can select from a list of years. -
Direct costs Storage method 2
As described above, the cost items for each diseases are listed in
Rfiles/Diseases.Rin a custom format. This format includes codes that can be used to lookup costs. The costs themselves are stored inData/Costs.xlsxand read in bygetCosts()fromRfiles/loadData.R. The source for each data point is noted in the spreadsheet.In summary, MBS and PBS fees are taken from data published by Medicare (2024), and hospitalisation costs are linked to a AR-DRG (or an average of two such costs) most appropriate for the condition. IHCPA publishes costs for each AR-DRG for public hospitals that can be used to update this data. Calculating costs requires extracting the National Efficient Price (NEP) and the Price weight for each AR-DRG considered. Both NEP and weight tables are updated annually and are available at the IHCPA website. See
./Update/CompareDRG/for historical data tables and a comparison of AR-DRG costs over time.If future updates wish to perform calculations for multiple timepoints,
Data/Costs.xlsxshould be updated to include the published costs for a range of years andgetCosts()should be modified to read data for all years in a format where the model can select from a list of years. Another option may be to transition to storing this data in format 1, i.e. read in directly from datasets provided by Medicare and IHCPA directly. However, there are some costs (e.g. for generic classes of drugs or non-PBS medicines) that are not published in public datasets and would need to be estimated an imported into the model separately complicating this approach. -
Average weekly earnings (2024, ABS) Storage format 2
This is a key cost that is multiplied by number of days of lost work. Data is stored in
Data/WorkforceAssumptions.csvand read in bygetWorkforceAssumptions()fromRfiles/loadData.R. If future updates wish to perform calculations for multiple timepoints, theData/WorkforceAssumptions.csvshould be modified to include the published workforce data value for a range of years andgetWorkforceAssumptions()should be modified to read in all of these data in a format such that the program can select from a list of years. -
Willingness to pay to avoid pain grief and suffering associated with disease estimated from a discrete choice experiment in Australia (2023, CHERE) Storage format 2
The willingness to be pay values (elicited in units of per day or per year) are multiplied by total estimated duration of illness (estimated as above) to get total cost of pain and suffering. As these values come from a research study, any update will likely update the format of the data and thus the data importation and the estimation process. However, currently the WTP values and descriptions of uncertainty are stored in
Data/WTPvaluesUncertainty.xlsxand read into the model bygetWTP()inRfiles/loadData.R. As WTP values are reported in 2017 dollars, these costs need to be adjusted for inflation. -
Inflation (2024 and earlier, ABS) Storage format 1 The base costs estimates are for December 2024 and expressed in 2024 dollars. Most unit costs were sourced late 2024 or early 2025 so do not require inflation adjustments. However WTP values for grief, pain, and suffering are estimated in 2017 dollars and needed to be adjusted up to Dec 2024. Additionally the shiny app has the option to adjust all costs to the latest quarter without updating all the underlying data. CPI data is stored in
Data/CPI-ABS.csvin the format published by the ABS (Accessed June 2025).Assuming that the ABS continues to use the same format for these downloads and continues to include dates back to Dec 2017, the
Data/CPI-ABS.csvcan be replaced by the latest release to include the latest CPI rates. However, if the ABS changes the format, the input file can be updated by appending the CPI values for the latest financial quarters in a consistent format. Note that the program uses the 'change from previous quarter' column of the file, so any appended CPI rates should not be annualised.
Note: Inflation adjustment in the app only effects the cost estimates presented in the shiny app. It does not change any of the estimates in the core model or saved outputs of the model in
Outputs/orReport/. Even in the shiny app it does not update any of the underlying data, estimates of burden, or perform population adjustment to the latest population data.