By **Timothy C. Anderson, MAI, Msc., CDEI, MAA**

In my on-going attempt to unravel some of the mysteries of real estate appraisal, as well as to give appraisers an idea of what it is that CU is and does, I have studied some actual sales in a mid-western state and then summarized those sales data, in graphic form, in the Figures below.

The exhibits that appear below are from the statistical functions in Excel®. There is some rather scary looking algebra on them. But do not worry: most of it is for comparison purposes. You do NOT have to understand how the computer arrived at those formulae (i.e., the algebra and calculus behind them) to understand the topics in this article. The math behind what those formulae tell us is not really all that difficult, but it is for advanced classes. This is an article, not an advanced class.

To understand this article, you do not even need to understand statistics. Just follow the narrative and the thrust of the charts will become clear to you.

First up is an explanation of the data the chart’s use. These data are from 2013 and 2014, so are recent. The appraiser who amassed them knows what s/he is doing, so there are no reasons to question his/her professional integrity or ability. These are actual sales data, culled from the MLS. All have closed escrow and transferred title from the seller to the buyer. The sales prices are all cash equivalent (i.e., adjusted for non-realty concessions as necessary). All of these sales data are from the same subdivision, but that subdivision has houses of varying ages, sizes, qualities of construction & maintenance, and so forth. In other words, the houses here are all subject to the same market forces, but clearly differ one from another.

Since the data were not property-specific (i.e., not all of them would be applicable to a *hypothetical* subject), what we look at in this article are the subdivision’s *trends*. Specifically, we analyze if there is any correlation between (a) the sales price per square foot and the year built; (b) between sales prices per square foot and total size; (c) between sales prices per square foot and the date of sale; and, finally (d) the correlation (if any) between the absolute sales price and the days on market.

Just to jog your memory about statistics, in any comparison there has to be a *basis* for that comparison. This basis is called the *independent* variable. It is always shown on the graph’s x-axis (e.g., the horizontal line or the base line). The *dependent* variable is always shown on the y-axis or the vertical line.

This article’s topic is the *correlation* between the dependent and independent variables. On the Figures that follow, you will see lots of blue dots and then lines of various colors. *What you are looking for is how well the lines (specifically the red line) track with the blue dots*. When the (red) line and the blue dots are close to each other, there is what is called *high correlation* (as well as low variance). All other things being equal, we look for high correlation, typically above 50% (and really, a correlation close to 90% is more-or-less ideal).

When there is a high correlation it means the data explain will the relationship between the independent variable and the dependent variable. When that correlation is low, however, it means the two variables really do not explain each other. We will see examples of these relationships as the article progresses.

Another purpose of this article is to illustrate (but not explain – too short for that) what it is CU does with all that data with which we have provided it in the past. When CU flags an appraiser’s entry in a field, it is because it has gone thru an analysis such as one of these (although *far* more in depth, breadth, and width), and then determined that the appraiser’s response did not correlate properly with the other data it has in its database. This lack of correlation does not mean the appraiser is “wrong”. It merely means the appraiser needs to explain how/where s/he derived that particular response. While there are many ways to respond to such a request, a graph such as one of those below, goes a long way toward that explanation.

Take a look at Figure 1.

It looks at the relationship between the sales price per square foot of the properties (y-axis) and the year in which a particular house was built (x-axis).

First, look at the red line. Notice its trend is slightly *uphill* from left to right. This means that newer properties tend to sell for more per square foot than to older properties. All other things being equal, you would expect this relationship. However, as you will also notice, relatively few of the blue dots (the sales price per square foot of the component sales) touch the red line. This means there is a lack of correlation (i.e., a high variance) between the two variables. In fact, the formula at the figure’s upper left-hand corner shows a correlation of only 1.85%, which is essentially no correlation at all.

What this statistical analysis tell us is that, assuming a particular property were to have been constructed between 1999 to 2007 (and all 77 in the sample were), its age at the date of sale really has nothing to do with its sales price per square foot, since they do not vary in all that much.

Therefore an age & condition adjustment for a property built within these years is likely not necessary. True, this contradicts the traditional thinking of many appraisers. But are appraisers incapable of change when the need for that change stares them in the face?

Now look at the purple line (ignore the green one, since it is a variation on the red one). While the math behind the purple line is more demanding than the math behind the red line, it is more explanatory, too. What this says is that the market current as of the date of appraisal was willing to pay more for houses built in 2002 that for houses built much before or after that date. However, they do not explain why this is so.

However, despite the fact the purple line touches more of the blue dots than the red line, it shows a correlation of only 13% between year built and sales price per square foot. While this latter line explains the market better than the red line, it does not explain it all that much better.

This Figure, therefore, indicates that, given solely *these* data, there really is no compelling reason to make an adjustment based solely on a house’s date of construction. Given different data, or using less than 77 sales, the graph might have indicated a different result.

Figure 2, however, tells a different story. Looking solely at the blue lines, it is easy to deduce that as size increases (the x-axis), sales price per square foot (the y-axis) decreases. From looking at the dots, however, that there is an overall decrease is clear, but the *rate* of decrease is not. Now look at the red line (ignore the other two since they are essentially the same as the red line). You’ll notice that, not only does the red line touch a lot of the blue dots but that, of the blue dots that don’t touch it, a whole bunch of them are really close to it. This indicates that, given this sample of data, there is a *high correlation* between a house’s square footage and its sales price per square foot. In fact, the math behind the red line (not shown here, but included by reference) shows there is an 82% correlation between the two.

In fact, *the formula in the far upper right-hand corner of the Figure quantifies that change in value*. It says that there is a $0.0302 change in sale price per square foot for every 1 square foot of variance in size from the average square footage of this sample (in this case, the average size is 1,998 square feet). In fact, *these* data indicate that for an average size house (i.e., 1,998 square feet in this sample), the market recognizes an adjustment of $91 per square foot [(-0.0302[1]x * 1,998) + 151.25 = $90.90].

Therefore, were an appraiser to make an adjustment of $15 per square foot for size differences in *this* market, based on *these* sample sales transactions, then CU would (rightly) flag it. Why so? Because the market data clearly indicate *this market* does not support an adjustment at $15 per square foot for this difference. This analysis is based on these sales, not on traditional rules-of-thumb. Obviously, using different sales, or using less than the 77 sales here would provide different results.

Now let us consider changes in sales prices per square foot as they relate to changes in sales dates. In other words, as time progressed over the time period these sales covered, how (if at all) did sales prices per square foot change? Since the sales date is fixed, it is the independent variable (the x-axis), whereas the sales price per square foot is the y-axis. See Figure 3. For purposes of this discussion, we ignore the really funky formulae and concentrate on the “simple” one (the one that calculates the red line).

Note in this Figure there are lots of the blue dots that are relatively *far* away from the red regression line. Again, this indicates the data were all over the place, thus show a great deal of variance[2] or error. Therefore, in Figure 3, there is a lot of error. It also means the data are not really reliable at predicting anything other than a trend (i.e., as time passes, value per square foot increases). The red regression line also shows the correlation of these data in predicting anything is really low at 4.2%, which is no correlation at all. So these graphs, and the data behind it, are something you would toss into the workfile and forget.

Now move on to Figure 4. It shows the relationship between total sales prices and confirmed days on market. Look at the red regression line. Not a lot of the blue dots touch it, so there is a lot of error there. Its correlation of <1% indicates there is no more linear correlation between these variables than the operation of mere random chance would explain.

However, look at the *green* regression *curve*. This is a *lot* more complex to calculate, but as you can see it touches a lot more of the blue dots (approximately 26% of them, as a matter of fact). What this graph demonstrates is that relative inexpensive properties (<$150,000) spent a lot of time on the market before going under contract, whereas more expensive properties ($160,000 to $200,000) spent relatively fewer days on the market before they sold. Then, at about $200,000+, their higher prices meant they appealed to a smaller submarket of buyers, thus their days on the market increased back to between 140 and 160 days. So what does this relationship mean to an appraiser?

On page 1 of the 1004 form, it means the “typical” range of values in the neighborhood is from about $160,000 to $200,000, with the sales outside of this range as outliers. It also means that, were the appraiser to conclude a value outside of the $160,000 to $200,000 range, the appraiser would also be concluding a longer-than-average sales period (here the *average* was ±61days). However, given the low correlation coefficient of 26%, it also means that there are reasons *other than days on the market*, that explain difference in sales prices. Thus, whatever conclusions the appraiser were to draw from this graph merit the use of a liberal seasoning of salt.

So what are the take-aways here? The only graph that really tells us anything is that of Figure 2, given that it shows an 82% correlation. Therefore, the appraiser can confidently conclude that mere square footage alone accounts for 82% of prices differences. Further, given this high degree of correlation, the appraiser could use the regression formula (-0.0302x * 151.25)[3] as one fairly accurate tool to use in forming a value conclusion. Note, however, it is no more than a tool.

What does all of this have to do with CU? CU’s built-in algorithms[4] do all of the above, plus a whole lot more, and have millions of data points to draw on, not 77, which is what we had here. It can compare all of these data points with each other one variable at a time, or it can look at the “big picture” and compare them all at once via a multi-variable regression analysis. While a multi-variable regression analysis is far from infallible, and will not work under some circumstances, if FannieMae can use tools such as this one, why should appraisers not use a similar tool, too? (If you have Excel®, then activate the statistics pack, and you will have all of the statistical computing power and potential you will ever need).

Although some of the regression tools that are popping up all over the web are appealing, Excel® offers everything you need, right at your fingertips. All you will need is a few hours study time to get up to snuff on it, and then it is virtually free. There are courses that are available with different education providers that can walk you through learning how to use it if that is the way you learn best, and even some online tools not related to appraisal that are very inexpensive and accessible (think udemy as well as Microsoft itself).

On a closing note, note the technology to take the appraiser *out* of the mortgage-lending picture has been in FannieMae’s hands for at least the last five years (and the math has been available since the late 1700s). The data *and* technology to do so exist now, and will only become keener in the future. This article was written with the residential appraiser in mind, to offer a simplified version of how Excel works and a sample from the real world where it is applied.

If appraisers do not start to adapt and change, and keep to the status quo of three or four sales on a grid, without providing some support for their analysis, why should FannieMae and local lenders continue to pay appraisers millions of dollars per year to do what FannieMae can already do essentially for free with literally a few keystrokes of CU? Algorithms already “grade” our appraisals. Right now they have the capacity to do everything we do now (for the most part), but CU can do all of this much faster, cheaper and more compliantly. FannieMae is well ahead of is in this race. We appraisers can catch-up with technology and thereby show our clients we are the ones to be doing their appraisals. We should be doing them, not brokers, not AVMs, not unlicensed desk-monkeys, and most certainly not FannieMae whose lenders have a vested interest in getting the numbers it needs to make the loans.

[1] The fact that this coefficient is *negative* means the line slopes downward from upper left to lower right. If this coefficient were *positive*, the opposite would be true.

[2] In stats-speak “variance” is also called “error”. This does not mean there is something amiss or the math is wrong somewhere. It means, instead, that when a point falls well above or below the regression line, it is in error by that distance from the regression line.

[3] In this formula, the “x” is the square footage you want to insert.

[4] Without going into a lot of calculus or philosophy, an algorithm is a “set of rules that precisely defines a sequence of operations”. A computer program is an algorithm. CU uses algorithms. Fortunately, appraisers do not have to write these algorithms since they are built into Excel®. See http://en.wikipedia.org/wiki/Algorithm.