DOI: 10.1016/j.watres.2018.09.009
Scopus record ID: 2-s2.0-85053081189
Title: Are all data useful? Inferring causality to predict flows across sewer and drainage systems using directed information and boosted regression trees
Authors: Hu Y. ; Scavia D. ; Kerkez B.
Journal: Water Research
ISSN: 0043-1354
Publication year: 2018
Volume: 145
Start page: 697
End page: 706
Language: English
Keywords: Boosted regression trees
; Causality
; Data-driven model
; Directed information
; Flow prediction
Scopus keywords: Combined sewers
; Drainage
; Forecasting
; Forestry
; Information use
; Location
; Rain
; Rain gages
; Regression analysis
; Trees (mathematics)
; Boosted regression trees
; Causality
; Data-driven model
; Directed information
; Flow prediction
; Numerical models
; calibration
; drainage water
; measurement method
; numerical model
; parameterization
; prediction
; rainfall
; regression analysis
; sensor
; sewage
Abstract: As more sensor data become available across urban water systems, it is often unclear which of these new measurements are actually useful and how they can be efficiently ingested to improve predictions. We present a data-driven approach for modeling and predicting flows across combined sewer and drainage systems, which fuses sensor measurements with output of a large numerical simulation model. Rather than adjusting the structure and parameters of the numerical model, as is commonly done when new data become available, our approach instead learns causal relationships between the numerically-modeled outputs, distributed rainfall measurements, and measured flows. By treating an existing numerical model – even one that may be outdated – as just another data stream, we illustrate how to automatically select and combine features that best explain flows for any given location. This allows for new sensor measurements to be rapidly fused with existing knowledge of the system without requiring recalibration of the underlying physics. Our approach, based on Directed Information (DI) and Boosted Regression Trees (BRT), is evaluated by fusing measurements across nearly 30 rain gages, 15 flow locations, and the outputs of a numerical sewer model in the city of Detroit, Michigan: one of the largest combined sewer systems in the world. The results illustrate that the Boosted Regression Trees provide skillful predictions of flow, especially when compared to an existing numerical model. The innovation of this paper is the use of the Directed Information step, which selects only those inputs that are causal with measurements at locations of interest. Better predictions are achieved when the Directed Information step is used because it reduces overfitting during the training phase of the predictive algorithm. In the age of “big water data”, this finding highlights the importance of screening all available data sources before using them as inputs to data-driven models, since more may not always be better. We discuss the generalizability of the case study and the requirements of transferring the approach to other systems. © 2018 Elsevier Ltd
Resource type: Journal article
Identifier: http://119.78.100.158/handle/2HF3EXSE/112403
Appears in Collections: Climate Mitigation and Adaptation
Author affiliations: Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, United States; School for Environment and Sustainability, University of Michigan, Ann Arbor, United States
Recommended Citation:
Hu Y., Scavia D., Kerkez B. Are all data useful? Inferring causality to predict flows across sewer and drainage systems using directed information and boosted regression trees[J]. Water Research, 2018, 145: 697-706.