This view from the International Space Station shows sea ice floes and eddy currents near the coast of Russia’s Kamchatka Peninsula. Credit: NASA JSC Earth Science and Remote Sensing Unit

Data characterizing the ocean are inherently estimates and are therefore uncertain. This is true of all in situ and remotely sensed observations—of, say, sea surface temperature or sea level—as well as of outputs and forecasts from numerical models and of analysis products resulting from the synthesis of observations and models.

Uncertainty with respect to data is a familiar concept for scientists: A numerical value quantifying the state of a variable can be associated with one or more ancillary numerical values characterizing the possible error. However, it is essential to distinguish between quantifying error and quantifying uncertainty. The error of an estimate, defined as the difference between the estimate and the true value of a variable, cannot be known; if the true value were known, the estimate could simply be corrected. In contrast, the uncertainty of an estimate can be assessed using various statistical, theoretical, and numerical methodologies.
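
To make the distinction concrete, here is a minimal sketch in Python with entirely synthetic values: the error of the estimate is computable only because the simulation knows the true value, whereas the uncertainty can be assessed from the measurements alone.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic experiment: we know the truth here; in practice we never do.
true_sst = 15.0  # assumed "true" sea surface temperature, deg C
measurements = true_sst + rng.normal(loc=0.0, scale=0.2, size=100)

estimate = measurements.mean()

# The error requires the (unknowable) true value...
error = estimate - true_sst

# ...whereas the uncertainty of the estimate can be assessed from the data
# alone, here as the standard error of the mean.
uncertainty = measurements.std(ddof=1) / np.sqrt(measurements.size)

print(f"estimate    = {estimate:.3f} deg C")
print(f"error       = {error:+.3f} deg C (knowable only in a simulation)")
print(f"uncertainty = {uncertainty:.3f} deg C (knowable from the data)")
```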

In oceanography and climate science, the nature of uncertainty associated with different types of data—for instance, direct and indirect observations versus analysis products—has been semantically and philosophically debated [Parker, 2016]. This sort of debate is helpful because inadequate understanding and treatment of data uncertainty persist in the research community, decreasing the potential usefulness of—and confidence in—many data sets.

As a requirement for proposing, planning, and implementing ocean observing, modeling, and analysis systems, we advocate that resulting data should be accompanied by clearly described and easily accessible uncertainty information. To put it bluntly, an ocean data set may otherwise be of the highest scientific quality, but if quantified uncertainties do not accompany it, it will not be useful to scientists or other stakeholders [Moroni et al., 2019].

Uncertainty Completes the Data

Reconstructing changes in global mean sea level over the observational era, which began with the first known sea level measurements in the mid-19th century, and attributing these changes to driving factors together constitute a field in which uncertainty quantification is at the core of the scientific investigation. Understanding these changes requires measurements of ocean temperature, cryospheric and terrestrial water mass distributions, and sea surface height at numerous locations and times. From these observations, the contributions of global ocean thermal expansion and global ocean mass change to global mean sea level change can be determined.

The sea level budget is considered “closed” when the sum of these independent components agrees with direct measurements of total sea level, meaning that the body of existing observations is sufficient to interpret the causes of sea level change. Only recently, thanks to the combined efforts of many individual studies across the interdisciplinary fields that contribute to sea level science, has the sea level budget been closed within quantified uncertainties—an achievement that testifies to our adequate understanding of processes influencing sea level, and their uncertainties, at the global scale [Frederikse et al., 2020].
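
To illustrate what closure within quantified uncertainties means in practice, the sketch below checks whether budget components sum to the observed total within their combined uncertainty. The numbers are invented for illustration and are not values from Frederikse et al. [2020].

```python
import numpy as np

# Hypothetical global mean sea level trends and 1-sigma uncertainties (mm/yr);
# illustrative values only, not results from Frederikse et al. (2020).
thermal_expansion = (1.3, 0.3)  # (estimate, uncertainty)
ocean_mass_change = (1.8, 0.4)
observed_total = (3.0, 0.4)

residual = observed_total[0] - (thermal_expansion[0] + ocean_mass_change[0])

# Propagate uncertainties in quadrature, assuming independent errors.
residual_uncertainty = np.sqrt(
    thermal_expansion[1] ** 2 + ocean_mass_change[1] ** 2 + observed_total[1] ** 2
)

# The budget "closes" if the residual is statistically indistinguishable from
# zero, here taken as within two standard deviations.
closed = abs(residual) <= 2 * residual_uncertainty
print(f"residual = {residual:+.2f} +/- {residual_uncertainty:.2f} mm/yr; closed: {closed}")
```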

This example illustrates that determining the magnitude of uncertainties associated with ocean data is necessary not only so that these data can be used meaningfully in scientific investigations but also because uncertainty quantification makes the data complete. In other words, uncertainty quantification is necessary to evaluate the confidence, or, equivalently, the doubt, one can have in ocean data.

Challenges with Ocean Data

Uncertainty is a major focus of metrology, the science of measurement, and standards for uncertainty quantification are well cataloged in documents in that field. These documents should serve as starting points for oceanographers to lay out a strategy for quantifying the uncertainties in their data [e.g., Joint Committee for Guides in Metrology, 2008].

Yet some concepts that are applicable to bench measurements in metrology, such as being able to repeat observations under the exact same conditions, are difficult to translate to the oceanographer’s laboratory—the ocean—because the ocean and the climate system in which it is embedded are constantly changing. For example, repeat sampling of hydrographic properties (e.g., temperature, salinity, oxygen) in some remote parts of the ocean has occurred only after decades, if at all. And some high-resolution, global, numerical ocean models can be run only once because of prohibitive computational costs, so the statistical distributions of their output under different initial conditions are unknown.

There are also challenges distinct to oceanography and related fields. Satellite measurements offer indirect estimates of ocean surface properties that are calibrated and validated with in situ observations, yet these “cal/val” exercises are burdened by multiple sources of uncertainty. One such source is representation error, which arises because pointwise in situ measurements (e.g., of sea surface temperature) and satellite measurements, which represent averages of physical quantities over the satellite’s ground footprint, do not measure the same quantity. The measured values can disagree even when both instruments perform perfectly, and natural variability at short spatial scales masquerades as a possible error in either measurement. Uncertainty related to representation error can be understood only by combining geophysical theories, extensive observations, and methodological knowledge.
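
A synthetic example can make representation error tangible. In the sketch below (all field parameters are assumptions for illustration), a pointwise sample and a footprint average of the same error-free field disagree, and the spread of their differences is one simple measure of representation uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D "sea surface temperature" transect: a large-scale signal plus
# small-scale variability (all parameters are assumptions for illustration).
x = np.linspace(0.0, 100.0, 2001)  # along-track distance, km
sst = 15.0 + 0.02 * x + 0.3 * np.sin(2 * np.pi * x / 5.0)  # deg C

footprint = 25.0  # assumed satellite footprint width, km
diffs = []
for _ in range(1000):
    center = rng.uniform(footprint / 2, 100.0 - footprint / 2)
    in_footprint = np.abs(x - center) <= footprint / 2
    satellite_value = sst[in_footprint].mean()  # footprint average
    in_situ_value = sst[np.argmin(np.abs(x - center))]  # pointwise sample
    diffs.append(in_situ_value - satellite_value)

# Both "measurements" are error free, yet they differ because they represent
# different quantities; the spread of the differences is the representation
# uncertainty for this synthetic field.
print(f"representation uncertainty ~ {np.std(diffs):.3f} deg C")
```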

The example of representation error illustrates the necessity, in oceanography as in other fields, of correctly identifying sources of error and of striving to characterize them with appropriate and traceable uncertainties. This is a challenge because classifications of uncertainties (or errors) based on established statistical principles do not necessarily map readily onto the idiosyncratic classifications used in ocean science. For example, biases (systematic errors) and random errors are often conflated in ocean observations for lack of appropriate knowledge.
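
A small synthetic example shows why this conflation matters: averaging many measurements beats down the random error but leaves a systematic error untouched, so an uncertainty quoted from the scatter alone can be badly overconfident. All values below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# A sensor with an assumed systematic error (bias) and random errors (noise).
true_value, bias, noise = 10.0, 0.5, 0.2
measurements = true_value + bias + rng.normal(0.0, noise, 10_000)

# Averaging reduces the random error but leaves the bias untouched, so a
# quoted uncertainty based only on the scatter understates the true error.
standard_error = measurements.std(ddof=1) / np.sqrt(measurements.size)
print(f"mean = {measurements.mean():.3f} (true value {true_value})")
print(f"standard error = {standard_error:.4f}, unaccounted bias = {bias}")
```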

Another example is in climate and ocean modeling, for which there is a need to consider separately structural or model uncertainties and uncertainties due to the chaotic behavior of the Earth system [National Research Council, 2012]. Further, when models and observations are combined to generate state estimates or forecasts, confidence in their outputs can be accurately assessed only if observational uncertainties are available and are carefully propagated through the machinery of data assimilation, in which models and their output are repeatedly updated to incorporate new observations [Leutbecher and Palmer, 2008].
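
As a hedged illustration, the scalar sketch below mimics the update step at the heart of data assimilation (a textbook Kalman-style update, not the machinery of any operational system): the stated observational variance sets both the weight the observation receives and the uncertainty attached to the resulting analysis.

```python
# A minimal scalar analogue of the data assimilation update: the analysis
# blends a model forecast and an observation, weighted by their error
# variances. All values are illustrative, not from any operational system.
forecast, forecast_var = 14.2, 0.5 ** 2  # model estimate and its variance
observation, obs_var = 15.0, 0.3 ** 2  # observation and its stated variance

# Kalman gain: how far the analysis moves toward the observation.
gain = forecast_var / (forecast_var + obs_var)

analysis = forecast + gain * (observation - forecast)
analysis_var = (1.0 - gain) * forecast_var

print(f"analysis = {analysis:.2f} +/- {analysis_var ** 0.5:.2f}")
# A misstated obs_var would misweight the observation and corrupt both the
# analysis and its quoted uncertainty.
```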

Effective communication of uncertainties among observationalists, modelers, and theoreticians is thus essential. This communication requires coordinated efforts and standardized protocols among these groups—a tall order considering that different oceanographic disciplines have traditionally been insular and have used distinct vocabularies to describe uncertainty. Such disconnects might be remedied if ocean scientists put greater emphasis on training in statistical sciences and collaboration with experts in that field.

A Gap in Guidance

Researchers in observational oceanography have made great strides in understanding oceanic circulation through nationally and internationally coordinated efforts. It’s perhaps surprising, then, that seminal community documents about ocean observing systems do not contain recommendations for quantifying, propagating, and communicating uncertainty estimates. Many do not even mention uncertainty or error. Examples of such documents include the Global Ocean Observing System 2030 Strategy (produced by the United Nations Educational, Scientific and Cultural Organization’s Global Ocean Observing System (GOOS) expert panel in 2019) and the Framework for Ocean Observing (which resulted from the 2009 OceanObs’09 Conference).

Such omissions are problematic because these documents guide high-level, programmatic funding allocations. Observational and modeling research efforts should target ocean variables with the largest (or unknown!) uncertainties for study, so that these uncertainties can be reduced. If uncertainty quantification is not included explicitly in guiding documents, however, efforts to incorporate uncertainty measures into observing and analysis systems are unlikely to be prioritized.

Community-Driven Solutions Are Emerging

In recent years, solutions to some of the challenges highlighted above have begun to emerge from groups within the research community [e.g., Matthews et al., 2013], and discussions of the importance of quantifying and communicating uncertainty in ocean data are becoming more widespread [Merchant et al., 2017]. Collectively, these efforts aim at improving the understanding, derivation, communication, and utilization of the uncertainties in ocean in situ, remote sensing, and model products.

In 2013, the U.S. Climate Variability and Predictability Program (US CLIVAR)—which contributes to both the U.S. Global Change Research Program and the World Climate Research Program’s international CLIVAR program—released a 15-year science plan that includes a goal to better quantify uncertainty in observations, simulations, predictions, and projections of climate variability and change. Since then, US CLIVAR has funded activities promoting ocean uncertainty quantification, including working groups on Large Initial-Condition Earth System Model Ensembles, Emerging Data Science Tools for Climate Variability and Predictability, and Ocean Uncertainty Quantification. The last of these groups (which we are leading), named OceanUQ, is a research community platform that aims at developing strategies and best practices for ocean uncertainty quantification through informational blog posts, Web-based educational resources, a forum, and, ultimately, community meetings and trainings.

The international CLIVAR program has organized the CLIVAR Global Synthesis and Observations Panel’s Ocean Reanalysis Intercomparison Project to evaluate historical ocean state estimates with reliable uncertainties. Other ongoing international efforts include the Ocean Best Practices System, which exists to develop and promote well-established and standardized methods across ocean research, operations, and applications.

Meanwhile, databases like the International Quality-Controlled Ocean Database are being expanded to facilitate assessment of uncertainties using in situ observations. And Earth Science Information Partners has established the Information Quality Cluster to collect information on key aspects of uncertainty of Earth science data [Moroni et al., 2019].

There has also been progress toward summarizing the state of knowledge on uncertainty for ocean variables. For example, GOOS publishes estimates of random uncertainty and uncertainty in the bias of measurements of Essential Ocean Variables (e.g., sea surface temperature, sea state) in its specification sheets. Workshops and special sessions at community meetings, including at the OceanObs’19 conference, have also led to specific recommendations for improving ocean uncertainty quantification, such as providing more training and publishing best practices documents.

Opportunities on the Horizon

Ocean sciences have now entered the realm of “big data.” Among the many benefits of this development is improved knowledge of natural variability and thus of the statistical nature of ocean data. But the vast increase and diversification of ocean data also exacerbate challenges related to uncertainty. It does not have to be this way, however. Instead, this explosion of data availability is an opportunity to apply not only established statistical methods but also novel data science methods.

In the past, analyzing uncertainties in data has typically occurred as a separate step, apart from the data analysis itself, in the research process—a step that is too often skipped. But we can now apply data-driven methods, such as Gaussian process regression, that estimate uncertainties alongside the quantities of interest themselves [Kuusela and Stein, 2018]. At the same time, the accelerating movement in Earth sciences toward providing open data and open software through online platforms can help demystify uncertainty quantification for many researchers by providing explicit methods and how-to instructions.
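
As a sketch of this idea, the example below fits a Gaussian process to synthetic along-track observations with scikit-learn, obtaining the interpolated values and their uncertainties from a single calculation. The data and kernel choices are assumptions for illustration, not the configuration of Kuusela and Stein [2018].

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Synthetic along-track "observations" (illustrative, not real float data).
x_obs = np.sort(rng.uniform(0.0, 10.0, 30))[:, None]  # e.g., latitude
y_obs = np.sin(x_obs).ravel() + rng.normal(0.0, 0.1, 30)  # signal + noise

# An RBF kernel models smooth spatial covariance; a WhiteKernel absorbs
# observational noise. Both choices are assumptions for this sketch.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(x_obs, y_obs)

# The interpolated field and its uncertainty come from the same calculation.
x_new = np.linspace(0.0, 10.0, 200)[:, None]
mean, std = gp.predict(x_new, return_std=True)
print(f"largest 1-sigma uncertainty on the grid: {std.max():.3f}")
```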

With the profusion of new data and methods, new concerns are also emerging. Ocean data can be highly monetized—for the purpose of ensuring the safety of human operations at sea, for example—and thus, commercial enterprises have entered the arena of collecting ocean measurements. For proprietary and profitability reasons, these companies may be reluctant to communicate the uncertainty or doubt that should be associated with their data products. As commercially sourced ocean data products are acquired and provided for research use, they, too, should include quantification of uncertainties to improve their utility in scientific investigations.

We argue that cultural change is needed in the oceanographic community. Ocean data–generating endeavors of all flavors should include a plan for uncertainty quantification alongside their standard data management plan. In addition, published studies should explicitly address how data uncertainties impact results and conclusions. Communication of uncertainty in data sets or products need not be formulated in terms of highly technical probability density functions or statistical concepts, however; it should be tailored to intended users. For example, reports from the Intergovernmental Panel on Climate Change [2021] demonstrate how uncertainty representation can take a variety of forms, both quantitative and qualitative, depending on the context and intended use of (and audience for) the data.

By continuing current efforts and capitalizing on new opportunities to develop standardized guidance for ocean scientists and to promote the importance of quantifying and communicating uncertainty, we can make valuable and hard-earned ocean data more accessible and usable. We can also increase the confidence that scientists and others have in analyses—of sea level change and many other important ocean processes—performed with these data.

Acknowledgments

The authors would like to thank the members of the US CLIVAR Ocean Uncertainty Quantification working group for conversations that led to the improvement of this piece.

References

Frederikse, T., et al. (2020), The causes of sea-level rise since 1900, Nature, 584(7821), 393–397, https://doi.org/10.1038/s41586-020-2591-3.

Intergovernmental Panel on Climate Change (2021), Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, edited by V. Masson-Delmotte et al., Cambridge Univ. Press, Cambridge, U.K., www.ipcc.ch/report/sixth-assessment-report-working-group-i/.

Joint Committee for Guides in Metrology (2008), Guide to the expression of uncertainty in measurement, Rep. 100:2008, Bur. Int. des Poids et Measures, Sèvres, France, www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf.

Kuusela, M., and M. L. Stein (2018), Locally stationary spatio-temporal interpolation of Argo profiling float data, Proc. R. Soc. A, 474(2220), 20180400, https://doi.org/10.1098/rspa.2018.0400.

Leutbecher, M., and T. N. Palmer (2008), Ensemble forecasting, J. Comput. Phys., 227(7), 3,515–3,539, https://doi.org/10.1016/j.jcp.2007.02.014.

Matthews, J. L., E. Mannshardt, and P. Gremaud (2013), Uncertainty quantification for climate observations, Bull. Am. Meteorol. Soc., 94(3), ES21–ES25, https://doi.org/10.1175/BAMS-D-12-00042.1.

Merchant, C. J., et al. (2017), Uncertainty information in climate data records from Earth observation, Earth Syst. Sci. Data, 9, 511–527, https://doi.org/10.5194/essd-9-511-2017.

Moroni, D. F., et al. (2019), Understanding the various perspectives of Earth science observational data uncertainty, Figshare, https://doi.org/10.6084/m9.figshare.10271450.

National Research Council (2012), A National Strategy for Advancing Climate Modeling, Natl. Acad. Press, Washington, D.C., https://doi.org/10.17226/13430.

Parker, W. S. (2016), Reanalyses and observations: What’s the difference?, Bull. Am. Meteorol. Soc., 97(9), 1,565–1,572, https://doi.org/10.1175/BAMS-D-14-00226.1.

Author Information

Shane Elipot (selipot@rsmas.miami.edu), Rosenstiel School of Marine and Atmospheric Science, University of Miami, Fla.; Kyla Drushka, Applied Physics Laboratory, University of Washington, Seattle; Aneesh Subramanian, Department of Atmospheric and Oceanic Sciences, University of Colorado Boulder; and Mike Patterson, US CLIVAR Project Office, Washington, D.C.

Citation: Elipot, S., K. Drushka, A. Subramanian, and M. Patterson (2022), Overcoming the challenges of ocean data uncertainty, Eos, 103, https://doi.org/10.1029/2022EO220021. Published on 12 January 2022.
This article does not represent the opinion of AGU, Eos, or any of its affiliates. It is solely the opinion of the authors.
Text © 2022. The authors. CC BY-NC-ND 3.0