Sites from Random vs. Nonrandom Sampling Designs

Comparison of land use/land cover for sites from random sampling designs
(random) versus other site selection methods (non-random)
Jo Wilhelm, King County, April 4, 2012
Snohomish and King Counties dominate the site selection regardless of whether random or all sites are
used (Figure 1, green points are random, red are non-random). However, if just random sites are
selected, we lose quite a bit of coverage in Clallam, Pierce, Thurston, and Whatcom counties. Jefferson
County has no sites with data in the PSSB; Skagit County only has 3 sites and all are random; Mason
County only has 6 sites total, three of which are random.
Figure 1. Distribution of sites from random sampling designs (random) versus other site selection methods (non random).
1
100
90
80
70
60
50
40
30
20
10
0
241
250
194
200
150
125
100
79
54
7
Ag
1
0
2
0
Wetland
Shrub
6
3
0
Grass
5
Barren
50
47
31
Forest
15
1
3
0
0
Urban
# of Sites
Percent
Land use/cover (LULC) for each county within Puget Sound, and Puget Sound in its entirety (far right bar)
are summarized in Figure 2 with the number of random and non-random sites within each County
overlaid. Percent forest is the dominant LULC throughout Puget Sound. King and Snohomish Counties
have the most sites, but if all sites are considered, Clallam, Kitsap, Pierce, Thurston, and Whatcom
counties each have at least 16 sites with BIBI data available.
15
2
0
NonRandom
Random
Figure 2. LULC for the portion of each County located within Puget Sound and the number of random and non-random sites
with BIBI data stored in the PSSB.
100
90
80
70
60
50
40
30
20
10
0
173
149
82
66
63
49
45
31
17
15
4
0
1
0
0
1
2
3
4
27
29
1
0
5
6
26
11
1
5
7
8
9
1
0
12
0
1
0
4
3
0
8
0
3
4
200
180
160
140
120
100
80
60
40
20
0
# of Sites
Percent
LULC for each WRIA within Puget Sound are summarized in Figure 3, also with the number of random
and non-random sites overlaid. WRIA 8 (Cedar-Sammamish) has the most number of sites.
10 11 12 13 14 15 16 17 18 19
WRIA #
Ag
Wetland
Shrub
Grass
Barren
Forest
Urban
NonRandom
Random
Figure 3. LULC for the portion of each WRIA located within Puget Sound and the number of random and non-random sites with
BIBI data stored in the PSSB. WRIA 1 – Nooksack, 2 – San Juan, 3 – L. Skagit/Samish, 4 – U. Skagit, 5 – Stillaguamish, 6 – Island, 7
– Snohomish, 8 – Cedar/Sammamish, 9 – Duwamish/Green, 10 – Puyallup/White, 11 – Nisqually, 12 – Chambers/Clover, 13 –
Deschutes, 14 – Kennedy/Goldsborough, 15 – Kitsap, 16 – Skokomish/Dosewallips, 17 – Quilcene-Snow, 18 – Elwha/Dungeness,
19 – Ivre/Hoko.
2
180
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%
160
Frequency
140
120
100
80
60
40
20
0
1
5
10
20
30
40
50
60
70
80
90
Cumulative Frequency %
Both the histogram and the cumulative frequency distribution of % urban land use in the watershed for
random, non-random, and all sites combined (All) have similar shapes to each other (Figure 4). All three
data sets are over weighted in sites with less than 10% urban land use within the watershed and
underweighted in sites with greater than 90% urban land use.
100
% Urban (Watershed)
Random Freq
Non_R Freq
All_Freq
Random Cum%
Non_R Cum%
All Cumulative %
Figure 4. Histogram of random (blue), nonrandom (red) sites, and all sites combined (purple). The bins range from 0-1, 1-5, 510, and then switch to 10 % intervals (10-20, 20-30, etc.). The number shown on the x-axis represents the upper end of the bin
range.
Urban
Regressing the cumulative frequency distribution of the random sites versus either the non-random or
all sites combined yields a very tight fit (Figure 5).
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%
R² = 0.98
R² = 0.9934
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Urban (Random Sites)
vs. Non-Random
vs. All
Figure 5. The cumulative frequencies from the random sites were graphed against the cumulative frequencies of the non2
random (red) and all sites combined (purple). The R values listed are for the best fit line with y-intercept of 0.
3
Another potential consideration is the total surface area sampled (Figure 6, Table 1). The 3 ft2 sites are
limited almost exclusively to Snohomish, King, and Pierce Counties.
Figure 6. The distribution of sites by total surface area sampled.
Table 1. The number of sites for each total surface area sampled based on random and non-random sampling designs.
Surface Area (ft2)
3
8
9
13.5
Total
Non-Random
231
56
196
4
487
Random
254
87
341
Total
485
143
196
4
828
4
Figure 7 attempts to combine the information regarding surface area and site selection design into one
map.
Figure 7. The distribution of sites by random vs. non-random with the size of the dot representing the total surface area
2
sampled (3, 8, 9, or 13.5 ft ).
The histograms based on site selection method and surface area are difficult to interpret, however it
does seem that the nonrandom, 8 sq feet sites generally have less urban in the watershed (Figure 8).
The cumulative frequency diagrams suggest that most of the site distributions are fairly consistent
regardless of site selection method or surface area sampled (Figure 9). However, there are two
exceptions: the nonrandom, 8 ft2 sites and the nonrandom, 9 ft2 sites. The nonrandom 8 ft2 sites have a
5
higher proportion of sites with 10-40% urban and very few sites with > 50% urban compared to the
other categories. In contrast, the non random 9 ft2 sites have a larger proportion of sites in areas with
>70% urban than the other categories.
60
Frequency
50
40
Rdm, 3
Rdm, 8
30
NR, 3
20
NR, 8
10
NR, 9
0
1
5
10
20
30
40
50
60
70
80
90 100
% Urban (Watershed)
Figure 8. Histogram of % urban in the watershed for random (rdm) and non-random (NR) site selection in addition to total
surface area sampled (3, 8, and 9 sq feet).
100%
90%
Cumulative Frequency
80%
Rdm, 3sf
70%
Rdm, 8sf
60%
NR, 3sf
50%
NR, 8sf
40%
NR, 9sf
30%
AllRandom
20%
AllNonRandom
All
10%
0%
0
20
40
60
80
100
% Urban (Watershed)
Figure 9. Cumulative frequency diagram of % urban in the watershed for random (rdm) and non-random (NR) site selection in
addition to total surface area sampled (3, 8, and 9 sq feet). All sites combined are also graphed (purple).
Table 2 summarizes the distribution of sites by jurisdiction.
6
2
Table 2. Breakdown of site selection method (random vs. non-random) and total surface area sampled (3, 8, 9, 13.5 ft ) by
jurisdiction.
Jurisdiction
Adopt-A-Stream Foundation
City of Bainbridge Island
City of Bellevue
City of Bellingham
City of Everett
City of Federal Way
City of Issaquah
City of Kirkland
City of Lake Forest Park
City of Redmond
City of Seattle
Clallam County
King County - DNRP
King County - Roads
Kitsap County
Pierce County
Skokomish Tribal Nation
Snohomish County
Thurston County
Washington State Department of Ecology
Grand Total
3
7
Non-Random
8
9
13.5
Random
3
8
6
17
14
6
11
9
7
4
33
31
72
1
39
82
4
157
25
59
21
33
3
2
14
32
231
14
56
196
Other Info for consideration:
Side by side comparison of all sites (left) and random only (right)
7
97
4
254
28
87
Total
7
6
17
14
6
11
9
7
4
33
35
73
255
82
46
33
3
131
14
42
828
Site Selection – Follow up
Jo Wilhelm, King County, 4/5/2012
Cumulative Frequency Diagrams
First I added the 28 Ecology sites (Random, 8 ft2). The Ecology data set is weighted heavily towards sites
with very little urbanization within the contributing basin and has a unique land use distribution
compared to the other data sets (Figure 1):
100%
90%
80%
Frequency
70%
60%
50%
40%
30%
20%
10%
0%
1
5
10
20
30
40
50
60
70
80
90
100
% Urban (Watershed)
Random
Non-Random
All
Ecology_Random
Figure 1. Cumulative frequency diagram of percent urban within the contributing watershed for random (n=487), non-random
(341), all (828), and Ecology’s random sites (28).
8
Secondly, I added all possible permutations onto the cumulative frequency figure. Good luck making
sense of this (Figure 2)! There was a method to my madness in terms of naming and color coding. The
first name refers to site selection (random, nonrandom, or all); the second name refers to the surface
area (3, 8, 9, 8+9 ft2, or all). Randoms are all graphed in shades of green; non randoms in shades of red;
the combined random and non-random in shades of purple/blue. Of note: combining the 8 and 9 ft2
sites does put those lines (both for non-random and all; there were no random 9 sites) in the mix with
the ‘typical’ cumulative frequency distribution. If I send this document as a word file, I am noticing that
the graph is a living creature. You should be able resize the figure and also click on a data set and
delete what you don’t want to see. Therefore, feel free to play with what you see below to pick out
the most relevant information.
100%
90%
Cumulative Frequency
All, All
80%
Rdm, All
70%
Rdm, 3sf
Rdm, 8sf
60%
Ecology: Rdm, 8
NR, All
50%
NR, 3sf
40%
NR, 8sf
30%
NR, 9sf
NR, 8+9
20%
All, 3
10%
All, 8
All, 8+9
0%
0
20
40
60
80
100
% Urban (Watershed)
Figure 2. Cumulative frequency diagram of percent urban within the contributing watershed for varying combinations of site
2
selection (random, nonrandom, all) and surface area (3, 8, 9, 8+9 ft , and all).
9
Thirdly, here is my reduced chart (Figure 3). I would recommend going forward with recalibrating the
BIBIs on all the 3 ft2 data (blue diamond) and all the 8 plus 9 ft2 data (blue square) regardless of site
selection method. If we want to control the distribution of landuse used for recalibration, we could just
select a subset of the data by randomly drawing a specified # of sites from each 5% urban bin.
100%
90%
80%
Cumulative Frequency
70%
60%
All, All
Rdm, All
50%
NR, All
40%
All, 3
30%
All, 8+9
20%
10%
0%
0
20
40
60
80
100
% Urban (Watershed)
Figure 3. Cumulative frequency diagram of percent urban within the contributing watershed for varying combinations of site
2
selection (random, nonrandom, all) and surface area (3, 8, 9, 8+9 ft , and all). My proposal would be to recalibrate the BIBI
2
2
based on All, 3 for the 3 ft methods and All, 8+9 for the 8 ft methods. Both of these cumulative frequency distributions follow
a similar distribution to all the data sets together.
10
Elevation
There are several sites with BIBI data that are above 500 m (Figure 4). I don’t have an elevation shape file for all of Puget Sound,
so I don’t have a quick way to do a landuse analysis for the Puget Sound basin with elevation < 500m. I’ll see if someone inhouse can help, but if not we’ll have to consider whether we want to see if Peter is able and willing to help.
Figure 4. Elevation in meters of sites with BIBI and landuse data.
11