Assessing Map Quality:
An Exampling Using Grid Soil Sample Data
by T.G. Mueller
BAE 599
Managing Data Files
(Note that you will be learning how to import
data from a text file into Arc GIS)
-
Open explorer (hold the windows key
and press "E" and release) and create a directory called
"shelby_f16". Keep explorer open. Click
here
to access the data for this exercise (chose to open the zip file rather than
downloading; extract all into the "shelby_f16" directory). The Excel works
books contain soil fertility data sets for Field 16 and the shape file is a
polygon boundary of Field 16. In order to make fertility maps, we will need to
convert these files from Excel worksheets into shapefiles. I could have
given you shapefiles but I want you to learn how to import table data
from Excel into ArcGIS. This is a very useful skill.
- Before we can convert the Excel
worksheets into shapefiles, we must first convert each into database files
(.dbf). In windows explorer (not internet explorer), hold the control key
down, select the three Excel files (Field16_2000.xls, Field16_300, and
Field16_val.xls) with your mouse. Push down the shift key and control key and
double click on any of the three Excel files. All three excel fieles should open all three
files. For each of the three excel files, do the following:
- Select all the cells that are in
the range (press
and hold the <control> and <shift> key simultaneously and type the number
"8"). Go to format cells (<control> and "1").
For category click
number and set decimal places to 8. Click ok. Then save as
(while holding down the alt key, press the "F" key, and then the "A" key) a
"DBF 4" files (click tab and hit the "d" key) and save (hit the tab key and
then hit enter) in the
"Shelby_F16" directory. Agree to keep it in the DBF format (hit
the enter key).
Remember to repeat for the other two excel files (to get to the next excel
file hold the control key down and press the tab key).
When all the Excel files have been
modified, close Excel (hold down the alt key and press the F4 key). Do not
agree to make changes (click "no" three times). Open ArcCatalog and
navigate to the "Shelby_F16" directory. We need to convert each of the
DBF files into shape file. Do the following for each dbf file:
- Click on a dbf file to select
it. Right click, create feature class,
and click on from xy table. Note that the longitude and latitude
fields from the dbf file have automatically become the X and Y fields.
Note that we won't define the coordinate system here because this
version of ArcGIS is buggy and it doesn't work.
For the first dbf file, do the following (otherwise go on to where it says
"click OK"): Under output click on the folder button and navigate to the
"Shelby_F16" directory (but do not select the any file names here) and click
save (do this for the first DBF file only).
- Click OK. Repeat for the
other two dbf files.
- You have just turned your excel
file into a shape file!!!!
- We will create a geodatabase where we will
store the soils and field boundary information. In ArcCatalog. Right click on the "Shelby_F16"
directory (left side of screen) and click new and then personal geodatabase. Change
the name from "New Personal Geodatabase.mdb" to "Soils data.mdb".
- To interpolate the soils data, we need to use
Cartesian coordinates (not latitude longitude which are angular coordinates). We will project
the soils data into into the Kentucky State Plane
North projection. But first we need to define the projection.
- Defining the current
projection. Open ArcToolbox. Under
data management tools, click on Projections. Then double click on
define projection wizzard (shapefiles, geodatabase). Then click on the
yellow folder. Navigate to the "shelby_f16" folder. Hold down the control
key and select all three XYfield16 shape files and the f16.shp file.
Click add. Click next. Then click on select coordinate system.
Click select. Double click geographic coordinate systems. Double click
on world. Double click WGS 1984.prj. Click ok. Note
that we have just defined the data as geographic (latitude longitude). Click next. Click
finish. When you clicked on finish, that projection
information became stored with each of the four shapefiles.
- Changing the projection system. Double
click on Project Wizard. Click on the yellow folder button. While
holding down the control key, multiple select the same four shape files with
your mouse. Click add. Click next. For the output location, click
on the yellow folder button, navigate to the "Shelby_F16" directory", and then click
only once on the "Soils data.mdb"
geodatabase you created earlier. Click open. Click next. Click select
coordinate system. Here we are telling ArcGIS to change the coordinate
system to state plane. Click select. Click projected coordinate systems. Click
state plane. Click "NAD 1983 Feet". Select "NAD 1983 StatePlane Kentucky North
FIPS 1601 (Feet).prj" and click add. Click ok. Click next. While holding the control key down select all
three files with your mouse. Click set
transformation. Here we are telling the computer which
transformation algorithm to use. Click ok to accept the default. Click next. Click
next. Then click finish. You have just changed the projection
system from geographic to the Kentucky state plane north for all four four
files. If you had
problems, click here to download the geodatabase.
Making Post Maps.
- Open ArcMap and click OK to start
with a new empty map. Click on the add data button.
Double click on the "Soils data.mdb" geodatabase. While holding the control key down,
select the four Feature classes.
- Save your map (File, Saveas, Navigate to the
shelby_f16 directory, and save as "F16_FertMap".
- Click off all of the data layers
except f16. You can see the boundary of Field 16. Now click on the
"XYField16_200". The points represent the the locations of the 200 foot grid
soil samples that were analyzed and sent to the laboratory for analysis. A 200
foot grid is a little less than a 1 acre grid. A 1 acre grid would be a 208
foot grid. Your map should look something like this.

- Now right click on the
"XYField16_200" layer in the table of contents. Click on properties.
Click on the symbology tab. Under Show, click on quantities. Under Field value, select
pH. Change classes to 6. Click on the first circle under symbol, hold shift
down, and then click on the last circle under symbol. Right click on the
selected legend items. Click on format labels. Chang the rounding to 1 decimal
point. Click OK. See how the labels changed to 1 decimal. Select all legend items again. Right click on the selected and chose
Properties for selected symbols. Under options set size to 10. Click OK. Set the color
ramp to the the red to yellow to green ramp. Click OK. The red areas indicate
low pH whereas the green areas indicate high pH.

- Rather than going to the OGIS site download the doqq's
for this area from site. Extract them to your
Shelby_F16 directory. I have already built pyramids
and defined the projections for these DOQQ's. Add them to your map.
Right click on the F16 shape file and select properties. Click on the symbology tab. Set the symbol
to hollow. Then your map should look like this.

- Pretty cool, huh. Bet you
didn't think you could do this. I want you to make maps of K and then P. I'm
not going to give you all the details again because I want you to get
practice doing it on your own. Refer back to the previous steps if you
need to do so.
- Save your map (Control and "s").
- You will need to hand a copy of this
map in with your homework. Print your P map on the printer in
the lab or if you are really cool and like printed color maps, you can do the
following.
- Hold down the "alt" key and then press the "PrtScn"
key. Now open up Microsoft word. On the main menu, click Edit and then
paste. Now save this word document to your zip disk and print it on your
best friends color printer. Close word. Go have a drink. You deserve it.
- Click off "XYField16_200" and
click on "XYField16_300". Now you will see the 300 foot grid soil samples. If
you are ambitious, you change the symbols for the 300-foot grid, but I'll
leave that up to you. A fundamental question we must ask is what
sampling density should we use for management. This is a difficult question to
answer. But we can begin to gain insight into this problem by assessing the
quality of the maps. So I'm going to teach you how to assess map quality. But
first I'm going to teach you to interpolate.
- Save your map (Control and "s").
Interpolations
- You may be saying to your self
"interpolation, smerpolation, I'm just an old dirt agronomist. I don't need
any stinking interpolations". Others may be saying "I hope he doesn't ask me
what an interpolation is". Please let me explain what all this interpolation
stuff is about..
- When you grid soil sample on a
regular grid you represent only a small area within that field. For example
if you collect samples on a 2.5 acre grid and you take 5 to 10 samples
within a 30 foot area, you represent only a fraction of the area within the
entire field (2.6% = п 302 / 2.5 * 43560). The problem is that we
manage 100% of that field. So interpolation techniques help us predict
soil properties in the other 97.4% of the field. Hey, one other thing.
You know how the weatherman has color contour maps. Red may indicate hot
whereas blue may indicate cold. You know how those maps are generated?
Interpolation.
- There are a lot of spatial
interpolation methods. 10s or 100s. Actually, I not sure how many.
Perhaps millions. No one knows. And no one really knows which are best for
site-specific soil fertility management. It is beyond the scope of this 1
week learning unit to delve into interpolation theory. But in this exercise
I would like you to have an appreciation of what common interpolators used
in precision agriculture are.
- Using Geostatistical Analyst. Click on tools and then extensions.
Make sure that the geostatistical analyst is checked. Click close. Next,
right click on the tool bar at the top of the screen. Then check the geostatisitcal analyst tool bar
if it is not checked already. Click on the geostatistical analyst.
- In the table of contents click off
"XYField16_300" and click on "XYField16_200". Now on the tool bar, click on the
geostatistical analyst button and then click on the "geostatistical Wizard..."
button. Under data set 1, click on XYField16_200 (this is a 200 foot grid) and
under attribute click on pH. Check the box next to the word Validation. Under
Input data, click on XYField16_val (this is the validation data set) and set
attribute to pH. Under methods make sure Inverse Distance
Weighting (IDW) is selected. It should look like this.

- Click next. Note that the power is
set to 2. It is the default because people often use IDW to a power of 2. Often people refer to this as
inverse distance squared interpolation because a power of two has been shown
to predict pretty well. Click Optimize Power Value. Note that the optimum
power is 1.96 (based on cross validation). See what I mean. Click finish.
Click OK. Now you
should see an interpolated map of pH that looks something like this.

- That's a pretty ugly map because the
interpolation does not follow the field boundaries. Let's improve the quality of the
appearance of our map. Right click on layers and click on properties. Click on
the data frame tab. Under
clip to shape, check Enable and click on Specify shape. Note that the polygon
layer f16 is selected. Click ok. Click ok again. Note that the interpolation
does not completely fill the boundary file. Right click on "Inverse Distance
Weighting" and select properties. Click on the extent tab and set the extent
to "the rectangular extent of f16". Click on Symbology and then
click on Filled Contours. Change the color ramp to the red to yellow to green
ramp like before.
Click OK. Your map should look like this.

- Since you can't see the MrSid
layers, right click on them and remove them from your map. This should speed
things up.
- Save your work (<control> and the
"s" key).
Assessing Map Quality
- Is this map of pH of high
quality? Are the red's really red's or are they an artifact of a small
hotspot in the field. How do you assess map quality? Visually inspect of a map is
not adequate for assessing map quality. Map quality assessment requires
comparing predicted areas in the map with known values. There are two methods
to achieve this. The first is jack-knife analysis (also referred to as
validation analysis or validation with an independent data set). This involves
collecting a second data set and trying to use the first data set to predict
the second.
-
- I collected a validation data set.
Click off all the layers except "xyfield16_val" and "f16". These are the
random locations where I took check samples to validate the interpolations
(like the interpolations we just created.

- Right click on the Inverse distance
weighted map and click on method properties. Click next twice. You should see
something like this.

- The Y axis represents the predicted
value at each validation data point and the x value represents the measured
values for each prediction point. If there is a 1:1 correspondence between
predicted in measured, the data will fall along the dashed 1:1 line.
A high quality prediction will be clustered very close to this line. The blue line is the regression of the predicted as a function of measured. If
the map is of high quality, the blue regression line should be along the
dashed 1:1 line and the points should be scattered very close to the dashed
line. (note that the slope of the regression is about 0.35 and the intercept
is about 4.2). Is this a high quality map? There are also quantitative
methods. Look under prediction errors. The mean (-0.038 units of pH units) is
the bias of the map. Bias squared is accuracy. Bias in this map is very small
(overall very accurate). The root mean squared error (rmse, 0.311 pH units) is
the square root of precision + accuracy. It turns out that when you do the
calculations the precision component is about 0.1 pH units. So most of the
errors in this map are errors of precision (99%) rather than accuracy (1%).
Since the map is relatively unbiased, the rmse is nearly the standard
deviation of the residuals. Therefore the interpretation of of the rmse for
the map (0.311 pH units) can be compared with the standard deviation of pH for
the validation data set (0.325 pH units). This suggests that the map is not a
great improvement over the field average. Look at Mueller et al. (2001)
for greater detail on map errors.
- Note that validation analysis is
also referred to as jack-knife analysis or more rarely as cross validation
with an independent data set.
- Homework questions #1. Do you
think you are better able to assess map quality with graphs of predicted
verses measured or quantitative measures of map error? Why?
- Collecting an additional validation
data set is expensive and time consuming and is impractical for many
applications. So often, cross validation is used rather than jack-knife
analysis. With cross validation, rather than collecting an additional data
set, points from the prediction data set are removed, predicted, and then
added back to the data set until this process is complete. Then the results
plotted and analyses in the same way that results from jack-knife analysis
were analyzed. In ArcGIS, click the back key. The cross-validation result should look like this.

- Note that the slope of the
regression line for cross validation is much lower than for jack-knife
analysis (0.13 compared to 0.35). The rmse is greater for cross validation
than jackknife analysis (0.33 compared with 0.32). Especially with regular
grids, cross validation tends to over-predict errors of prediction because
this procedure does not predict the prediction problem of interest (Mueller et
al., 2002).
- Homework Questions #2. Why does
cross validation tend to over predict errors of prediction with regular grids.
Refer to Mueller et al. (2001).
- On the geostatistical wizard
click back. Set the power to 1.0. Note that I've typed
in the root mean square. I've also done the same for the validation
numbers. Note that I have also written these values in for IDW for the for
the optimal powers, 1.0,
and 1.5. Fill in the rest of this table. Click finish when finished.
Soil pH, 200 foot grid RMSE
Table
| IDW Exponent |
Cross validation (RMSE) |
Validation or jackknife
analysis (RMSE) |
| optimal = 1.9572 |
0.3253 |
0.3211 |
| 1.0 |
0.3277 |
0.3002 |
| 1.5 |
0.3259 |
0.3111 |
| 2.0 |
|
|
| 2.5 |
|
|
| 3.0 |
|
|
| 3.5 |
|
|
| 4.0 |
|
|
| 4.5 |
|
|
| 5.0 |
|
|
- Homework Questions #3. What Inverse distance
weighted (IDW) power gave the lowest Cross validation RMSE? What IDW power
gave the lowest validation RMSE? Are they the same. If you chose your
IDW power based on cross validation (pressing the Optimize Power value button)
are you guaranteed to get the most optimal prediction.
- Click Finish. Click Ok. Click on
"XYField16_200". Click on Geostatistical analyst. Then click on geostatistical
wizard. Under input data set, set attribute to "P". Click the check box next
to validation and set input data to "XYField16_val" and attribute to "P".
Click next. Click optimal Power value.
- Homework Questions #4. Conduct
IDW cross-validation and validation IDW interpolation
for soil P. Fill in this table below for soil P (200 foot grid). How does
prediction quality of soil P compare with pH. How do optimal RMSE values
compare. Did prediction quality vary substantially between pH and P and if so,
why might this occur?
Soil P, 200 foot grid RMSE
Table
| IDW Exponent |
Cross validation (RMSE) |
Validation or jackknife
analysis (RMSE) |
| optimal = |
|
|
| 1.0 |
|
|
| 1.5 |
|
|
| 2.0 |
|
|
| 2.5 |
|
|
| 3.0 |
|
|
| 3.5 |
|
|
| 4.0 |
|
|
| 4.5 |
|
|
| 5.0 |
|
|
- In the geostistical wizard, optimize
the power, and click on finish. Click ok. Right click on "Inverse Distance
Weighting_2"and select properties. Click on the extent tab and set the extent
to "the rectangular extent of f16". Click on Symbology and then
click on Filled Contours. Change the color ramp to the same ramp you have been
using.
Click OK. Your map should look like this.
- Save your work (<control> and the
"s" key).
- Repeat for SMP and K and fill
the tables in below.
Soil SMP, 200 foot grid RMSE
Table
| IDW Exponent |
Cross validation (RMSE) |
Validation or jackknife
analysis (RMSE) |
| optimal = |
|
|
| 1.0 |
|
|
| 1.5 |
|
|
| 2.0 |
|
|
| 2.5 |
|
|
| 3.0 |
|
|
| 3.5 |
|
|
| 4.0 |
|
|
| 4.5 |
|
|
| 5.0 |
|
|
Soil K, 200 foot grid RMSE
Table
| IDW Exponent |
Cross validation (RMSE) |
Validation or jackknife
analysis (RMSE) |
| optimal = |
|
|
| 1.0 |
|
|
| 1.5 |
|
|
| 2.0 |
|
|
| 2.5 |
|
|
| 3.0 |
|
|
| 3.5 |
|
|
| 4.0 |
|
|
| 4.5 |
|
|
| 5.0 |
|
|
- Homework questions #5. How does
map quality and the values for SMP and K compare to each other and with pH and
P. Overall how good were the maps in Field 16.
Scale Issues
- Homework Questions #6. Now do the same thing
for the XYField16_300 files and fill in the tables below. How has
scale of measurement (e.g. 200 foot grid vs. 300 foot grid) affected map
quality. Also graph these relationships. What kind of advice would you give to
a farmer about grid sampling intensity based on the based on the 200 and 300
foot data.
Soil pH, 300 foot grid RMSE
Table
| IDW Exponent |
Cross validation (RMSE) |
Validation (RMSE) |
| optimal = |
|
|
| 1.0 |
|
|
| 1.5 |
|
|
| 2.0 |
|
|
| 2.5 |
|
|
| 3.0 |
|
|
| 3.5 |
|
|
| 4.0 |
|
|
| 4.5 |
|
|
| 5.0 |
|
|
Soil P, 300 foot grid RMSE
Table
| IDW Exponent |
Cross validation (RMSE) |
Validation (RMSE) |
| optimal = |
|
|
| 1.0 |
|
|
| 1.5 |
|
|
| 2.0 |
|
|
| 2.5 |
|
|
| 3.0 |
|
|
| 3.5 |
|
|
| 4.0 |
|
|
| 4.5 |
|
|
| 5.0 |
|
|
Homework Questions #7. How did
scale affect prediction quality. Most (but not all) farmers that grid sample, do
so at a grid increment greater than 300 foot. What grid increment would you
recommend to a farmer.
- Save your work (<control> and the
"s" key).
Interpolation Method (Radial basis function
and kriging)
Now we will try another interpolation procedure
called Radial Basis Function. From now on we will only focus on soil P. Open the geostatistical wizard. Set input to
XYField16_200 and attribute to P. Under validation set input data to XYField16_val
and Attribute to P. Set method to Radial Basis function. Click next. Note that
it has optimized the parameter values based on cross validation errors. Click
next. Enter the RMSE information into the table below. Click next. Enter the
RMSE information into the table below. DONT CLICK NEXT. Click back three times.
Now try kriging. The graph on the right is a semivariogram. It is
used in interpolating the data using kriging. Note as sample points get further
apart (increasing on the x-axis), there are more variable (increasing on the
y-axis). Click next. click next again. Write the RMSE values in the table below.
click next and write the validation RMSE values in the table. Click on the back
arrow three times. Now click on anisotropy. Click on show search direction. Set
the Angler tolerance to 40 and the band width to 2. As you click on the up arrow
by angular direction, watch the lines move on the graph to the left. You
can grab the blue line with your mouse and move it around the circle.
Watch as semivariance values (red dots) above and semivariogram model (yellow
lines) moves as you change the Angle Direction. this is because the model
fits the data better. The question is whether this will result in a significant
improvement in prediction quality. Click next two times.
Soil P, 200 foot grid RMSE
Table
| Method |
Cross
validation (RMSE) |
Validation (RMSE) |
notes quality of
the Predicted vs. Measured |
| Radial Basis
Function |
23.42 |
17.8 |
|
| Kriging
(isotropic) |
23.61 |
18.16 |
|
| Kriging
(anisotropic) |
|
|
|
Homework questions #8. Was there a reduction in
map error (RMSE) when modeling anisotropy. Were the plots of predicted versus
measured very different. How does kriging compare to the other
radial basis function and to inverse distance weighted. What kind of advice would you give to a farmer about
interpolation methods.
I would like you to make a printout of
a P interpolation to hand in with your home work. I will tell you during
class which interpolation technique you should use.
References
ESRI. 2001 Using ArcGIS Geostatistical
Analyst. Redlands, CA.
Mueller, T.G., F.J. Pierce, O.
Schabenberger, and D.D. Warncke. 2001. Map quality for site-specific management.
Soil Sci. Soc. Am. J. 65: 1547-1558
Mueller, T.G., K.L. Wells, G.W. Thomas, R.I. Barnhisel, N.J. Hartsock, A. Kumar,
C.R. Dillon, S.A. Shearer. 2000. Soil Fertility Map Quality: Case Studies in
Kentucky. In. P.C. Robert et al. (ed.) Proc. 5th
international conference on precision Agriculture. ASA Misc. Publ., ASA, CSSA,
and SSSA, Madison, WI.
Pierce, F.J., and P. Nowak. 2000.
Precision agriculture. Encyclopedia of Science & Technology. McGraw-Hill, New
York.
Sawyer, J.E. 1994. Concepts of variable rate technology with considerations for fertilizer application. J. Prod. Agric., 7(2):195-201.