Filtering Input Global Solar Irradiation Data
The data read into the program are filtered because abnormalities in the data sets would otherwise distort the interpolation technique and produce inaccurate results. A number of filtering steps have been incorporated into the MATLAB script:
- A matrix of time-steps has been created, since the data sets contain daily total solar radiation at specific stations.
- The code has been limited to using data from the selected met stations.
- Abnormalities in the data are handled as follows:
  - Data value less than 0 W/m^2 – the station's value is set to 0 W/m^2.
  - Multiple data entries – the value without the error code is selected.
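The steps above can be sketched in Python as follows. This is not the author's MATLAB code: the record layout (station, day, value, error code), the function name `clean_records`, and the choice to leave a value missing when none of its entries is free of an error code are all assumptions made for illustration.

```python
from collections import defaultdict

def clean_records(records, stations, n_days):
    """Build a day-by-station matrix from raw records, applying the basic filters.

    Each record is assumed (hypothetically) to be a tuple
    (station_id, day_index, value, error_code), with error_code None for a clean reading.
    """
    # Group entries by (station, day) so duplicated entries can be resolved,
    # keeping only the selected met stations.
    grouped = defaultdict(list)
    for station, day, value, error_code in records:
        if station in stations:
            grouped[(station, day)].append((value, error_code))

    # One row per time-step (day), one column per selected station; None marks missing data.
    matrix = [[None] * len(stations) for _ in range(n_days)]
    for col, station in enumerate(stations):
        for day in range(n_days):
            # With multiple entries, prefer the value that carries no error code.
            clean = [v for v, err in grouped.get((station, day), []) if err is None]
            if not clean:
                continue  # assumption: with no error-free entry the value stays missing
            # Negative readings are physically impossible, so clamp them to 0.
            matrix[day][col] = max(clean[0], 0.0)
    return matrix


records = [
    ("A", 0, 120.5, None),
    ("A", 1, -3.2, None),   # negative reading
    ("A", 2, 99.0, "E1"),   # duplicate entry carrying an error code
    ("A", 2, 101.0, None),
    ("B", 0, 80.0, None),
    ("C", 0, 50.0, None),   # station not selected
]
matrix = clean_records(records, ["A", "B"], n_days=3)
# matrix == [[120.5, 80.0], [0.0, None], [101.0, None]]
```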
In addition to this straightforward removal of abnormalities, the data are screened for extreme values that are unlikely to occur. This step searches for outliers: observations that deviate markedly from the rest of the data set. Because outliers often indicate measurement errors, removing them is an appropriate way to eliminate extreme values from the data set.
Removing Outliers
Firstly, the data set must be sorted into numerical order before the statistical analysis can be carried out. Using the quartiles of the distribution, the Interquartile Range (IQR) describes the spread of the central half of the data and can be used to determine whether any extreme observations skew it; the IQR test is a robust method for identifying and omitting outliers from the data set. Notably, this method is also used within the well-known Six Sigma methodology, whose quality target is a defect rate of 0.00034% (3.4 defects per million opportunities).
Quartiles are the three quantiles that divide the sorted data into four equal parts: points taken at regular intervals from the cumulative distribution function (CDF) of the variable. The first, second and third quartiles lie 25%, 50% and 75% of the way through the sorted data series respectively, and linear interpolation is used to compute a quartile that falls between two data points.
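A short Python sketch of quartiles computed by linear interpolation. It assumes the (n − 1)p positioning rule, which is NumPy's default; MATLAB's `quantile` and `prctile` use a different default rule, so small samples may give slightly different quartiles. The function names are hypothetical.

```python
import math

def quantile(data, p):
    """Return the p-quantile (0 <= p <= 1) of data by linear interpolation.

    Uses the (n - 1) * p positioning rule; this is an assumption, as other
    tools position quantiles differently.
    """
    if not data:
        raise ValueError("quantile of empty data")
    xs = sorted(data)                  # quartiles need the data in numerical order
    pos = (len(xs) - 1) * p            # fractional rank of the requested quantile
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    # Interpolate linearly between the two neighbouring sorted values.
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def quartiles(data):
    """Return (Q1, Q2, Q3)."""
    return tuple(quantile(data, p) for p in (0.25, 0.5, 0.75))

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.75, 4.5, 6.25)
```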
The mathematical method used to check for outliers defines a lower and an upper fence from the interquartile range, IQR = Q3 - Q1, where Q1 and Q3 are the first and third quartiles respectively.
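In the standard form of this test (Tukey's fences), the bounds can be written as follows. The multiplier k is an assumption, since the value used is not stated; k = 1.5 is the conventional choice for flagging outliers.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \\
\text{Lower fence} &= Q_1 - k\,\mathrm{IQR} \\
\text{Upper fence} &= Q_3 + k\,\mathrm{IQR}
\end{aligned}
```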
The lower fence is the "lower limit" and the upper fence is the "upper limit" of the data; any value lying outside these bounds can be considered an outlier.
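A minimal Python sketch of the fence test, assuming k = 1.5 and the linear-interpolation quartiles described earlier. The original filter is a MATLAB script; the function names and the example values here are hypothetical.

```python
import math

def _quantile(xs_sorted, p):
    # Linear interpolation between neighbouring sorted values, (n - 1) * p rule.
    pos = (len(xs_sorted) - 1) * p
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs_sorted) - 1)
    return xs_sorted[lo] + (xs_sorted[hi] - xs_sorted[lo]) * (pos - lo)

def iqr_filter(values, k=1.5):
    """Split values into (kept, outliers) using lower and upper fences.

    k = 1.5 is the conventional Tukey multiplier, assumed here rather than
    taken from the original script.
    """
    if not values:
        return [], []
    xs = sorted(values)
    q1, q3 = _quantile(xs, 0.25), _quantile(xs, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values on or inside the fences are kept; anything outside is an outlier.
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if not lower <= v <= upper]
    return kept, outliers

daily_totals = [10, 12, 11, 13, 12, 95, 11, 14]  # made-up example values
kept, outliers = iqr_filter(daily_totals)
print(outliers)  # [95]
```

Here Q1 = 11 and Q3 = 13.25, so the fences are 7.625 and 16.625 and only the value 95 falls outside them.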
Graham Cairns
University of Edinburgh, 2013