# 8.1 Residential Property Price Data Analysis

```GE2213
Understanding Uncertainty
and Statistical Reasoning
Regression Analysis
Case Study: Residential Property Price
Data Analysis
1
Outline
•
•
•
•
•
•
•
Overview of regression analysis
Simple linear regression model
Determination coefficient
Correlation coefficient
Multiple regression model
3 special regression models
Case study: Residential property price data analysis
2
Overview of Regression Analysis
• Input
• Dependent (or called response) variable, π
• The variable we wish to estimate or predict
• Independent (or called explanatory) variable, π
• The variable used to connect (sometimes, also explain) the
dependent variable
• Output
• A linear function that allows us to
• Provide estimation: Estimate the average value of the dependent
variable based on the corresponding value of the independent
variable, usually in the form of a confidence interval
• Provide prediction: Predict the value of the dependent variable
based on the corresponding value of the independent variable,
usually in the form of a prediction interval (which must be wider
than the corresponding confidence interval)
3
Overview of Regression Analysis
• It is quite possible to develop a relation (e.g. linear,
quadratic) between X and Y that there is no causality (i.e.
no reason and effect) at all. The relationship exists owing
to casualty (i.e. happening by chance).
• e.g. A straight line may appear to provide a good model
for relating monthly output Y of a steel mill to the weight X
of computer printouts appearing on a manager's desk
during the month, but their causal relationship is tenuous.
A 3rd variable may have caused the change in both X and Y,
producing the relationship that we have observed.
• Causality can be inferred only when analysis uncovers
some plausible reasons for its existence. For instance,
workers work faster as the queue length increases, and so
a linear relationship is at least plausible.
4
Overview of Regression Analysis
Cont’d
5
Simple Linear Regression Model
• A simple linear regression model consists of two components
• Regression line: A straight line describes the dependence of a
single variable π on one variable πΏ
• Random error: The unavoidable deviation is between the actual
value and the expected value
Population Intercept
Dependent
Variable
ππ =
Population Slope Coefficient
Independent
Variable
Random Error
π·π + π·π πΏπ + πΊπ
Regression
Line
6
6
Simple Linear Regression Model
Cont’d
π
ππ
ππ
ππ
ππ = ππ + ππ πΏπ
πΏπ
• π0 represents the sample intercept
• π1 represents the sample slope coefficient
πΏ
7
Simple Linear Regression Model
– Method of Least Squares
Cont’d
• π0 and π1 are estimated using the method of least squares,
which minimizes the Sum of Squared Errors (SSE)
π
π
ππ2 =
πππΈ =
π=1
π
(ππ − ππ )2 =
π=1
[ππ − π0 + π1 ππ ]2
π=1
8
Simple Linear Regression Model
– Method of Least Squares
Cont’d
• The formulae of π0 and π1 can be obtained by first partially
differentiating SSE with respect to π0 and π1 , and next solving
π
π
2
π
π
π=1
ππ0
π
= −2
π=1
ππ − π0 + π1 ππ
=0
and
π
π
2
π=1 ππ
ππ1
π
= −2
ππ ππ − π0 + π1 ππ
=0
π=1
simultaneously
9
Simple Linear Regression Model
– Method of Least Squares
Cont’d
• The formulae of π0 and π1 are
π0 = π − π1 π
and
π1 =
π
π=1(ππ − π)(ππ −
π
2
(π
−
π)
π
π=1
π)
10
Simple Linear Regression Model
– Determination Coefficient
Cont’d
• Measures the proportion of variation in π that is removed by
the independent variable π, using the simple linear regression
model
• Sample determination coefficient
π2
=
removed variation
total variation
=
π
2
π=1(ππ −π)
π
2
π=1(ππ −π)
• A unitless number between 0 and +1, inclusive
• “Magnitude” measures how good the regression model fits
• Closer to +1, the better the regression model fits
• Also known as goodness-of-fit measure
11
Simple Linear Regression Model
– Correlation Coefficient
Cont’d
• Describe the trend and strength of a linear relationship
between two numerical variables
• Sample correlation coefficient
either π = − π 2 or π = + π 2
• A unitless number between -1 and +1, inclusive
• “Sign” indicates the trend (down = negative r values / up =
positive r values) of a linear relationship
• “Magnitude” measures the strength of a linear relationship
• Closer to -1, the stronger negative linear relationship
• Closer to +1, the stronger positive linear relationship
• Closer to 0, no linear relationship
12
Scatter Plots against
Correlation Coefficients
π
π
π
r = -1
πΏ
πΏ
r = -0.6
π
Cont’d
r=0
πΏ
π
r = +0.6
πΏ
r = +1
πΏ
13
Simple Linear Regression Model +
Correlation Coefficient in Calculator
• Data set
X
5
5
5
10
10
10
15
15
15
20
20
20
Y
1.6
2.2
1.4
1.9
2.4
2.6
2.3
2.7
2.8
2.6
2.9
3.1
14
Determination or Correlation
Coefficient – Example
• Data set: The sales of vitamin supplement and the
Month
1
2
3
4
5
6
7
8
9
10
Sales
(HK\$’000)
963.500
893.000
1057.250
1183.250
1419.500
1547.750
1580.000
1071.500
1078.250
1122.500
(HK\$’000)
37.427
40.850
41.431
44.842
51.788
63.760
63.572
44.686
48.959
50.056
15
Determination or Correlation
Coefficient – Example
Cont’d
• Scatter diagram
16
Determination or Correlation
Coefficient – Example
Cont’d
• Sample determination coefficient between sales of vitamin
π 2 = 0.8617 (you should use a hand calculator to check)
which indicates that 86.17% variation in π removed by
the independent variable π, using the regression model
• Sample correlation coefficient between sales of vitamin
π = +0.9283 (you should use a hand calculator to check)
• There is a very strong positive linear relationship between
sales of vitamin supplement and advertising expense
17
Simple Linear Regression Model
– Example
Cont’d
• We wish to examine the linear dependency of the sales of
• 10 monthly records were randomly selected
Month
1
2
3
4
5
6
7
8
9
10
Sales
(HK\$’000)
963.500
893.000
1057.250
1183.250
1419.500
1547.750
1580.000
1071.500
1078.250
1122.500
(HK\$’000)
37.427
40.850
41.431
44.842
51.788
63.760
63.572
44.686
48.959
50.056
• Find a straight line function that best fits the data
18
Simple Linear Regression Model
– Example
Cont’d
• The sample simple linear regression model
π = −15.339 + 24.765π (you should use a hand calculator
to check)
where π = sales (in HK\$’000)
π = advertising expense (in HK\$’000)
• For each HK\$1000 (= 1 &times; 1000) increase in advertising expense,
the model estimates or predicts that the sales of vitamin
supplement increases by HK\$24,765 (= 24.765 &times; 1000)
19
Simple Linear Regression Model
– Example
Cont’d
• What is the estimated or predicted amount of sales if the
π = −15.339 + 24.765π
= −15.339 + 24.765 42
= 1024.791 (in HK\$’000, i.e. HK\$1,024,791)
20
Simple Linear Regression Model
in Microsoft Excel
• Step 1: Add the “Analysis ToolPak”
• File ο  Options ο  Add-Ins ο  Click “Go” at the bottom ο  Check
“Analysis ToolPak” and click “OK”
• “Data Analysis” button can be found in the “Data” menu bar
21
Simple Linear Regression Model
in Microsoft Excel
Cont’d
• Step 2: Develop regression model
• Invoke the Data Analysis function, select “Regression” and click
“OK”
22
Simple Linear Regression Model
in Microsoft Excel
Cont’d
• Step 3: Input the necessary information
23
Simple Linear Regression Model
in Microsoft Excel
Cont’d
• Step 4: Obtain the output
|π|
π2
variation in sales of vitamin
supplement can be removed
expense
24
π0 and π1
Multiple Regression Model
• A multiple regression model is to connect one dependent
variable with two or more independent variables in a linear
function
Population Intercept
Population Slope Coefficients
ππ = π·π + π·π πΏππ + π·π πΏππ + β― + π·π² πΏπ²π + πΊπ
Dependent Variable
Independent Variables
Random Error
• π0 represents the sample intercept
• π1 , π2 … , ππΎ represent the sample slope coefficients
25
Multiple Regression Model
Cont’d
π
ππ
ππ
ππ
ππ
ππ = ππ + ππ πΏππ + ππ πΏππ
(π1π , π2π )
π2
π1
26
Multiple Regression Model – Example
Cont’d
paid to salespersons affects the sales of vitamin supplement
• Fit a multiple regression model for estimating or predicting the
sales of vitamin supplement using both plausible independent
variables
Month
1
2
3
4
5
6
7
8
9
10
Sales
(HK\$’000)
963.500
893.000
1057.250
1183.250
1419.500
1547.750
1580.000
1071.500
1078.250
1122.500
(HK\$’000)
37.427
40.850
41.431
44.842
51.788
63.760
63.572
44.686
48.959
50.056
Bonus Paid
(HK\$’000)
23.098
23.628
27.157
29.12
28.217
32.116
29.432
30.569
23.841
27.138
27
Multiple Regression Model – Example
Cont’d
• Excel output (see the Excel file)
88.6% of the variation in sales
of vitamin supplement can be
π 2 removed by the two
independent variables included
in the regression model
28
πππππ  = −279.603 + 21.216 π΄ππ£ + 15.940(π΅πππ’π )
Multiple Regression Model – Example
Cont’d
πππππ  = −279.603 + 21.216 π΄ππ£ + 15.940(π΅πππ’π )
For each HK\$1000 increase in
estimated or predicted sales of
vitamin supplement increased by
HK\$21,216; holding bonus paid
to salespersons unchanged
For each HK\$1000 increase in
bonus paid to salespersons, the
estimated or predicted sales of
vitamin supplement increased by
expense unchanged
29
Multiple Regression Model
– Quick Exercises
Cont’d
• What is the estimated or predicted sales of vitamin
supplement if the advertising expense is HK\$60,000 and
bonus to salespersons is HK\$30,000?
πππππ  = −279.603 + 21.216 π΄ππ£ + 15.940(π΅πππ’π )
60
30
• In this month, the company has an extra HK\$100,000
budget, how should the money be used? Advertising or
paying bonus to salespersons or both?
30
Special Regression Model –
• A regression model connecting one dependent variable with
two or more independent variables in a quadratic function
ππ = π½0 + π½1 π1π + π½2 π1π 2 + ππ
• The second independent variable is the square of the first
independent variable
• It is suitable when a scatter diagram indicates a non-linear
relationship, with one of the following 4 shapes
π
π
π
π
31
ο’2 &gt; 0
πΏπ
ο’2 &gt; 0
πΏπ
ο’2 &lt; 0
πΏπ
ο’2 &lt; 0
πΏπ
Special Regression Model –
Dummy-Variable Model
Cont’d
• It is useful when categorical variable is used as independent
variable
HK
π
Island
ππ = π½0 + π½1 π1π + π½2 π2π + ππ
π0 + π2
• For example
• π = Sales of vitamin supplement (HK\$’000)
• π1 = Advertising expense (HK\$’000)
• π2 = Location of shop =
Kowloon
π0
π1
0 ππ πΎππ€ππππ
1 ππ π»πππ πΎπππ πΌπ ππππ
• For π2 = 0 (i.e. Kowloon)
ππ = π0 + π1 π1π + π2 0 = π0 + π1 π1π
• For π2 = 1 (i.e. Hong Kong Island)
ππ = π0 + π1 π1π + π2 1 = (π0 +π2 ) + π1 π1π
Same
slopes
32
Special Regression Model –
Interaction-Variable Model
Cont’d
• Interaction variable is created by multiplying two or more
independent variables together
• Usually, one of those independent variables is categorical
• It allows regression models for different categories to have
different intercepts and different slopes
ππ = π½0 + π½1 π1π + π½2 π2π + π½3 π1π π2π + ππ
• Without interaction term (i.e. π½3 = 0), effect of π1 (or π2 ) on π
is measured by π½1 (or π½2 )
• With interaction term(i.e. π½3 ≠ 0), effect of π1 (or π2 ) on π is
measured by (π½1 +π½3 π2 ) [or (π½2 +π½3 π1 )] π
π0 + π2
33
π0
π1
Application of Regression Analysis
– Centa-City Index
• Why Property Price Indices?
• Investors and potential home-buyers require indicators to study
the current movement of property prices in Hong Kong
• The creation of the “Centa-City Index” aims to provide such
information to the general public as a source of reference on
trends in Hong Kong’s property market
• How are the Index constructed?
• Multiple regression analysis is used to determine the effect of
various attributes on property price
• Attributes are considered such as floor area, years of occupancy,
location, direction, view, floor level, and so on.
• Who originally developed the “Centa-City Index”?
• Jointly developed by Centaline Property Agency Limited and
Department of Management Sciences in City University of Hong
Kong
34
Application of Regression Analysis
– Centa-City Index
35
Application of Regression Analysis
– Centa-City Index
36
Application of Regression Analysis
– Centa-City Index
37
Application of Regression Analysis
– Centa-City Index
• http://www.cb.cityu.edu.hk/ms/work/hkcci/
38
Case Study:
Residential Property
Price Data Analysis
39
Case Background
• A new ocean side complex consists of two adjacent and
connected 8-floor buildings
• The complex contains 200 flats of equal size (where 96 flats
are in Building 1 and 104 flats are in Building 2)
• 96 full ocean-view flats are those facing the ocean (e.g. 101,
103, 105, 107, 109 in Building 1; 113, 115, 117, 119, 121, 123,
125 in Building 2). 40 full ocean-view flats in Building 1 also
have a good view of the pool (e.g. 101, 103, 105, 107, 109).
• 8 partial ocean-view flats (i.e. 111, 211, 311, 411, 511, 611,
711, 811 in Building 1) are those having part of their ocean
view blocked by Building 2
• 96 bay-view flats are those facing the parking lot (e.g. 102,
104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124).
• The only elevator is located near the office and the game
room, all at the east end of Building 1
40
Layout of the Complex
41
Case Background
Cont’d
• The complex was completed during an economic recession;
thus, sales were slow
• The developer furnished many unsold flats and rented them
out
• Eventually, the developer stopped all leases and sold out
remaining flats by auction
42
Study Objective
• To investigate the relationship between auction price and
• Height of a flat (expressed by the floor number)
• Distance of a flat from the elevator (expressed by the number of
flats)
• Presence or absence of full ocean view
• Presence or absence of partial ocean view
• Presence or absence of furniture in an unsold flat
43
The Data
• 106 auctioned flats (see the Excel file) are examined using a
multiple regression model as follows:
• Auction price (π, called ππ«π’ππ)
• in US \$’00
• Height of a flat (π1 , called Floor)
• Takes the floor number: 1 or 2 or … or 8
• Distance of a flat from the elevator (π2 , called Distance)
• The distance is expressed by the number of flats
• An additional distance of two flats is added to a flat in Building 2
• Takes the number of flats: 1 or 2 or … or 6 in Building 1;
9 or 10 or … or 15 in Building 2
44
The Data
Cont’d
• Full ocean view (π3 , called ππ’ππ°)
• The presence or absence of full ocean view
• A dummy variable is used
• 1 if the flat has full ocean view; 0 if not
• Partial ocean view (π4 , called End)
• The presence or absence of partial ocean view
• A dummy variable is used
• 1 if the flat is 111 or 211 or 311 or 411 or 511 or 611 or 711 or 811;
0 if not
• Furniture (π5 , called Furnish)
• The presence or absence of furniture
• A dummy variable is used
• 1 if the flat has furniture; 0 if not
45
The Data
π
:
:
πΏπ
Cont’d
πΏπ
46
Dummy Variables
πΏπ , πΏπ , πΏπ
Data Analysis
Correlation coefficient between Auction price and Floor height,
π = −0.320
47
Data Analysis
Cont’d
Correlation coefficient between Auction price and Distance from elevator,
π = −0.089
48
Data Analysis
Cont’d
• Effect of full ocean view
View of Ocean
Sample Size
Mean
St. Dev.
0
36
173.944
8.802
1
70
201.000
13.847
• Effect of partial ocean view
End Unit
Sample Size
Mean
St. Dev.
0
99
192.141
18.221
1
7
187.143
10.351
Furniture
Sample Size
Mean
St. Dev.
0
61
189.508
14.880
1
45
194.933
20.942
• Effect of furniture
49
Regression Analysis – Model 1
• All five plausible independent variables are considered
π2
π = 177.703 − 0.715π1 − 0.873π2 +31.273π3
−17.808π4 + 9.984π5
50
Regression Analysis – Model 2
Cont’d
• Effect of Height &amp; Full Ocean View interaction (π1 π3 ) on auction price is added
π2
π = 152.212 + 3.225π1 − 0.828π2 + 60.054π3
−18.651π4 + 11.481π5 − 4.724 π1 π3
51
Regression Analysis – Model 3
Cont’d
• Effect of Height might not be linearly related to Auction price;
hence, interaction variables (π1 2 and π1 2 π3 ) are included
π2
π = 132.378 + 10.211π1 − 0.820π2 + 86.733π3
−19.178π4 + 11.633π5 − 15.303 π1 π3
−0.585π1 2 + 0.958π1 2 π3
52
Conclusions and
Recommendations
• Which model will you choose?
• How the results can help in prediction?
• What additional factor(s) will you consider?
53
```