Determinants of Travellers’ Expenditures

Name:

Affiliation:

Contents

Executive Summary

Brief Introduction

Research Strategy

Task 1

Task 2

Task 3

Task 4

Task 5

Statistical Analysis

Task 1

Task 2

Task 3

Task 4

Task 5

Conclusions

Limitation

Bibliography

Executive Summary

The main aim of this report was to determine the factors that influence travellers’ expenditures. It also aimed to examine travellers’ travel preferences and variations in expenditure with regard to various travel characteristics. The independent samples t-test, the Chi-square test, and multiple regression were used to investigate these aspects.

Results showed a statistically significant difference between the average expenditure per visit of independent and package tour travellers at the 5% level of significance. Results also revealed that age range, duration of stay, number of visits, and number of nights each had a statistically significant influence on expenditure at the 5% significance level. On average, a one-unit increase in duration of stay was associated with a 61,728.84-unit increase in expenditure.

Brief Introduction

This study sought to establish travellers’ travel preferences, variations in expenditure with regard to various travel characteristics, and the determinants of travellers’ expenditures. It applies the independent samples t-test, the Chi-square test, and multiple regression to the top 10 travel data. Firstly, we build charts to identify differences between categories in the data. Secondly, we run a multiple regression of expenditure on the available independent variables. Lastly, we carry out a refined analysis to develop a model for predicting customer expenditure.

Research Strategy

Different research strategies will be applied to the five tasks at hand, as follows:

Task 1

We will use stacked bar charts to identify and demonstrate differences in travellers’ mode of travel within each quarter of the year 2017. We will also use stacked bar charts to establish and illustrate how mode of travel and quarter of travel differ with respect to the travellers’ package type (Woodley, 2001).

Task 2

We will use boxplots to display the distribution of travellers’ expenditure across the different main purposes of travelling and to identify whether any notable differences in expenditure exist between the ‘main purpose of travelling’ categories. We will also use clustered boxplots to show the distribution of expenditure by main purpose of travelling, clustered by traveller package type, and to establish whether expenditure differs between package types within each ‘main purpose of travelling’ category (Woodley, 2001).

Task 3

We seek to determine whether there is a significant difference between the average expenditure per visit of independent and package tour travellers and, if there is, which group spends more. We employ the independent samples t-test; it is the appropriate tool because the dependent variable is continuous and the independent variable is categorical with two independent groups (Field, 2009). To support the validity of the t-test results, we will first test whether the normality assumption is met. If the data do not follow a normal distribution, we will use the non-parametric Mann-Whitney U test, which does not require the assumption of normality (Rajagopalan, 2006).
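As an illustration of the fallback test mentioned above, the following minimal sketch computes the Mann-Whitney U statistic and a large-sample p-value. It is not the SPSS procedure used in this report: the spending values are hypothetical, the function name is our own, and the p-value uses a plain normal approximation without tie or continuity corrections.

```python
import math
from statistics import NormalDist

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b, two-sided p-value from a normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    # U_a counts the pairs in which the value from group_a exceeds the value
    # from group_b; a tie contributes one half.
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in group_a
        for y in group_b
    )
    u_b = n_a * n_b - u_a
    # Normal approximation (assumption: no tie or continuity correction).
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, u_b, p_value

# Hypothetical spending values for the two package types.
independent = [120.0, 250.0, 180.0, 210.0, 160.0, 230.0]
package_tour = [300.0, 280.0, 350.0, 310.0, 400.0]

u_a, u_b, p_value = mann_whitney_u(independent, package_tour)
print(f"U_a = {u_a}, U_b = {u_b}, p = {p_value:.4f}")
```

For samples as small as this illustrative one, an exact p-value would be preferable to the normal approximation; the approximation is more suitable for sample sizes like the one analysed in this report.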

Task 4

We aim to establish whether the mode of travel depends on the purpose of a visit. The dependent variable (mode of travel) is categorical with 3 groups, while the independent variable (purpose of visit) is categorical with 6 independent groups; this meets the variable requirements for the Chi-square test (Hunt, 2001). We will use the Chi-square test to examine the association between mode of travel and the purpose of a visit. We will also check the Chi-square assumption that the expected frequency count is at least 5 in each cell of the contingency table (Hunt, 2001). If this assumption is violated, we will apply the Monte Carlo approach, which gives an unbiased estimate of the exact significance and does not rely on the asymptotic assumptions of the Chi-square test (Hunt, 2001).
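The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPSS implementation: the labels are hypothetical, the function names are our own, and the Monte Carlo p-value is obtained by shuffling one variable, which is one common way to sample tables with the observed row and column totals.

```python
import random
from collections import Counter

def chi_square(rows, cols):
    """Return (Pearson chi-square statistic, smallest expected cell count)."""
    n = len(rows)
    row_totals, col_totals = Counter(rows), Counter(cols)
    observed = Counter(zip(rows, cols))
    statistic, min_expected = 0.0, float("inf")
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            min_expected = min(min_expected, expected)
            statistic += (observed[(r, c)] - expected) ** 2 / expected
    return statistic, min_expected

def monte_carlo_p(rows, cols, n_sim=2000, seed=0):
    """Estimate the p-value by permuting the column labels n_sim times."""
    rng = random.Random(seed)
    observed_stat, _ = chi_square(rows, cols)
    shuffled = list(cols)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        simulated_stat, _ = chi_square(rows, shuffled)
        # Small tolerance guards against floating-point rounding differences.
        if simulated_stat >= observed_stat - 1e-9:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# Hypothetical data: purpose of visit and mode of travel for 24 travellers.
purposes = ["Holiday"] * 12 + ["Business"] * 8 + ["VFR"] * 4
modes = (["Air"] * 9 + ["Sea"] * 3            # Holiday
         + ["Air"] * 2 + ["Tunnel"] * 6       # Business
         + ["Sea"] * 3 + ["Air"] * 1)         # VFR

stat, min_expected = chi_square(purposes, modes)
if min_expected < 5:
    # Expected-count assumption violated: fall back to the Monte Carlo estimate.
    p_value = monte_carlo_p(purposes, modes)
    print(f"chi-square = {stat:.2f}, Monte Carlo p = {p_value:.4f}")
```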

Task 5

A multiple regression model for predicting travel expenditure will be formulated by identifying the most suitable explanatory variables. The statistical assumptions for multiple regression will be examined: a normally distributed dependent variable, absence of multicollinearity, homoscedasticity, independence of observations, and approximately normally distributed residuals (Cooper, 2014). The independent variables with significant p-values (p < 0.05) will be retained for the prediction of travel expenditure.

Statistical Analysis

Task 1: Makeup of Travellers’ Travel Preferences

Variation in travellers’ mode of travel within each quarter of the year 2017

Figure 1. Stacked bar chart showing quarter of travel by mode of travel in the year 2017

Figure 1 illustrates that in the year 2017, “Air” as mode of travel had 39.76% of its travellers in the first quarter (January – March), 20.03% in the second quarter (April – June), 17.28% in the third quarter (July – September), and 22.94% in the fourth quarter (October – December). The peak quarter for “Air” is therefore the first quarter, which has the highest percentage of travellers by air (39.76%).

“Sea” as mode of travel had 21.33% of its travellers in the first quarter, 24.59% in the second quarter, 34.81% in the third quarter, and 19.26% in the fourth quarter. The peak quarter for “Sea” is the third quarter (July – September), which has the highest percentage of travellers by sea (34.81%).

“Tunnel” as mode of travel had 27.30% of its travellers in the first quarter, 16.49% in the second quarter, 28.11% in the third quarter, and 28.11% in the fourth quarter. The peak quarters for “Tunnel” are the third quarter (28.11%) and the fourth quarter (28.11%).

Compared with the other modes, “Sea” has the largest share of its travellers in the second quarter (24.59%) and the third quarter (34.81%). Because these percentages are computed within each mode, they describe when each mode is used rather than which mode carries the most travellers overall.
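The percentages reported for Figure 1 are shares within each mode of travel, so they sum to roughly 100% per mode. A minimal sketch of that calculation is given below; the records are hypothetical and the function name is our own, not part of the SPSS workflow.

```python
from collections import Counter

# Hypothetical (mode, quarter) records; the report itself used the 2017 data.
records = [
    ("Air", "Q1"), ("Air", "Q1"), ("Air", "Q2"), ("Air", "Q4"),
    ("Sea", "Q1"), ("Sea", "Q3"), ("Sea", "Q3"), ("Sea", "Q2"),
    ("Tunnel", "Q3"), ("Tunnel", "Q4"),
]

def within_mode_percentages(rows):
    """Return {mode: {quarter: percentage of that mode's travellers}}."""
    cell_counts = Counter(rows)
    mode_totals = Counter(mode for mode, _ in rows)
    shares = {}
    for (mode, quarter), count in cell_counts.items():
        shares.setdefault(mode, {})[quarter] = 100.0 * count / mode_totals[mode]
    return shares

shares = within_mode_percentages(records)
for mode, by_quarter in shares.items():
    print(mode, {q: round(p, 2) for q, p in sorted(by_quarter.items())})
```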

Differences between travellers’ mode of travel and which quarter they travel with respect to the travellers’ package type

Figure 2. Stacked bar chart showing differences between travellers’ mode of travel and which quarter they travel with respect to the travellers’ package type.

Figure 2 demonstrates that the “independent” traveller package type had more travellers than the “non-independent” traveller package type for all modes of travel in all quarters of the year 2017. The percentages in Figure 2 are shares of all travellers using a given mode, so the two package-type figures for a quarter sum to that mode’s quarterly share in Figure 1 (for example, 29.2% + 10.55% ≈ 39.76% for “Air” in the first quarter). Of all travellers by “Air”, 29.2% travelled in the first quarter on the “independent” package type while 10.55% travelled in the first quarter on the “non-independent” package type. Of all travellers by “Tunnel”, 24.32% travelled in the first quarter on the “independent” package type while 2.97% did so on the “non-independent” package type. In the second quarter, the corresponding figures were 14.68% and 5.35% for “Air”, and 19.26% and 5.33% for “Sea”. In the fourth quarter, they were 16.21% and 6.73% for “Air”, and 23.51% and 4.60% for “Tunnel”.

Task 2: Travellers’ expenditure changes with respect to different characteristics

Single chart of boxplots of expenditure according to main purpose for travelling

Figure 3. Chart showing boxplots of expenditure by main purpose of travelling

Figure 3 demonstrates that the range of expenditure is highest for “Holiday” as the main purpose of travelling, followed by “Business”, “Miscellaneous”, and then “VFR”. The chart suggests that travellers spend more on holidays than they do when travelling for business, study, VFR, or miscellaneous activities. It also suggests that travellers spend more on business trips than they do on study, VFR, or miscellaneous activities. The boxplot for “Study” is comparatively short, indicating that those who travelled mainly to study spent broadly similar amounts of money. Generally, some boxplots sit lower or higher than others, indicating that differences in expenditure exist between the “main purpose of travelling” groups. The 6 boxplots have uneven sizes, implying that expenditure varies greatly for some main purposes of travelling and only slightly for others. The medians (middle quartile lines) also vary across the “main purpose of travelling” groups, which further suggests that differences exist between groups.
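The quantities a boxplot displays (quartiles, median, and the points flagged as outliers) can be computed as in the minimal sketch below. The spending values are hypothetical, the function name is our own, and the quartile method of Python’s `statistics.quantiles` may differ slightly from the hinges SPSS uses, so this is an approximation of the chart logic rather than a reproduction of Figure 3.

```python
import statistics

def boxplot_summary(values):
    """Return quartiles, IQR, and values beyond the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical spending values for one purpose-of-travel group.
spending = [100, 120, 130, 150, 160, 180, 200, 900]

summary = boxplot_summary(spending)
print(summary)
```

A short box corresponds to a small interquartile range, and groups with many flagged outliers (as noted for “VFR” in the Limitation section) have many values beyond the fences.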

Clustered boxplot of expenditure according to main purpose for travelling, cluster defined by traveller package type

Figure 4. Chart showing clustered boxplot of expenditure according to main purpose for travelling, clustered by traveller package type

Figure 4 exhibits a slightly higher median “Holiday” expenditure for the non-independent traveller package type than for the independent traveller package type. Median “Business” expenditure differs greatly between the two package types; the non-independent traveller package type has a higher median expenditure than the independent traveller package type. Median expenditures for the other “main purpose of travelling” groups (VFR, Study, and Miscellaneous) also differ between the two package types. Some boxplots fall lower or higher than others, indicating that differences in expenditure exist between the “main purpose of travelling” groups with respect to the traveller package type.

Task 3: Differences between the average expenditure of a visit of independent and package tour travellers

Hypotheses

Null hypothesis, H0: The average expenditures of a visit of independent and package tour travellers are equal

Alternative hypothesis, HA: The average expenditures of a visit of independent and package tour travellers are not equal

Independent samples t-test

Assumptions

Independence of observations; this assumption has been satisfied since each row/case represents a traveller (person as unit) in our SPSS data.

Normality Assumption

Figure 5. Normal Q-Q plot of spending (expenditure)

Figure 5 shows that the data points lie closely along the diagonal line, implying that the data are approximately normally distributed.

Table 1. Group Summary Statistics

Table 1 shows that the independent traveller package type has a mean spending (£UK) of 226,848.31, which is lower than that of the non-independent package type, which has a mean spending (£UK) of 299,170.03.

Table 2. Independent Samples T-Test Results

Table 2 reports a Levene’s test for equality of variances with p-value = 0.045, which is less than 0.05; the assumption of equal variances therefore does not hold. As such, we use the second line of the t-test results (equal variances not assumed). The second line of the t-test for equality of means shows a p-value reported as 0.000 (that is, p < 0.001), which is less than 0.05, indicating that there is a statistically significant difference between the average expenditures of a visit of independent and package tour travellers at the 5% level of significance. Package tour (non-independent) travellers have the higher average expenditure.
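The “equal variances not assumed” line of an independent samples t-test corresponds to Welch’s t-test. A minimal sketch of its test statistic and degrees of freedom follows; the spending values are hypothetical and the function name is our own. The p-value is omitted because the Python standard library has no t-distribution; a statistics package would normally supply it.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    # Squared standard errors from the sample variances (n - 1 denominator).
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof

# Hypothetical spending values for the two package types.
independent = [120.0, 250.0, 180.0, 210.0, 160.0, 230.0]
package_tour = [300.0, 280.0, 350.0, 310.0, 400.0]

t_stat, dof = welch_t(independent, package_tour)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A negative t statistic here means the first group (independent) has the lower mean, which matches the direction reported in Table 1.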

Task 4: Whether mode of travel is dependent on the purpose of visit

Table 3. Chi-square Test Results: Mode of Travel by Main Purpose of Travelling

Table 3 shows that the assumption of an expected count of at least 5 in each cell was violated by one cell (Note a); hence we use the Monte Carlo Pearson Chi-square significance (2-sided) instead of the asymptotic Pearson Chi-square significance (2-sided). The Monte Carlo Pearson Chi-square significance (2-sided) shows a p-value reported as 0.000 (that is, p < 0.001), which is less than 0.05, indicating that there is a statistically significant association between mode of travel and main purpose for travelling. This suggests that mode of travel depends on the main purpose for travelling.

Task 5: Further analyses of customer’s expenditure

Linear model for predicting expenditure: regression of spending (expenditure) on age range, duration of stay, quarter, number of nights, and number of visits

Table 4. Multiple Regression Coefficients Results

Table 4 shows Variance Inflation Factors (VIF) below 10 for all the independent variables, indicating that they are not highly correlated with one another; that is, multicollinearity is absent. The independent variables Quarter and Gender have p-values of 0.753 and 0.198 respectively. Both p-values are greater than 0.05, implying that neither variable significantly influences spending (expenditure). As such, Quarter and Gender are dropped from the model.

Table 5. Summary Model of Reduced Model

Table 5 reveals an R-squared value of 0.117, implying that 11.7% of the total variance in spending (expenditure) is explained by the model. The adjusted R-squared value is 0.114, indicating that 11.4% of the variability in spending (expenditure) is explained by the independent variables (age range, duration of stay, number of visits, and number of nights) after adjusting for the number of predictors in the model. The Durbin-Watson value is 1.774, which is close to 2, suggesting that there is no substantial autocorrelation in the residuals.
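For reference, the Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals; it ranges from 0 to 4, with values near 2 indicating no first-order autocorrelation. A minimal sketch with hypothetical residuals (the function name is our own):

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for an ordered list of residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals, in the order of the observations.
residuals = [0.5, -0.3, 0.8, -0.6, 0.1, 0.4, -0.9, 0.2]
dw = durbin_watson(residuals)
print(f"Durbin-Watson = {dw:.3f}")
```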

Table 6. ANOVA Results for Reduced Model

Table 6 shows an F-statistic of F(4, 1694) = 55.886 with a p-value reported as 0.000 (that is, p < 0.001). The p-value is less than 0.05, indicating that at least one of the independent variables (age range, duration of stay, number of visits, and number of nights) has a statistically significant influence on spending (expenditure) at the 5% significance level.
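As a consistency check, the adjusted R-squared and the overall F-statistic can both be recovered from R-squared, the number of predictors, and the sample size. The sketch below assumes the model includes an intercept, so that the residual degrees of freedom of 1694 imply n = 1699; the function names are our own. Because the reported R-squared is rounded to three decimals, the results agree with the reported values only approximately.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (with intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F-statistic for the regression, with (k, n - k - 1) df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

k = 4                  # age range, duration of stay, visits, nights
residual_df = 1694     # from F(4, 1694)
n = residual_df + k + 1  # assumption: intercept included in the model
r2 = 0.117             # reported R-squared, rounded to three decimals

adj_r2 = adjusted_r_squared(r2, n, k)
f_value = f_statistic(r2, n, k)
print(f"n = {n}, adjusted R-squared = {adj_r2:.4f}, F = {f_value:.2f}")
```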

Table 7. Multiple Regression Coefficient Results for the Reduced Model

Table 7 reveals that all the independent variables (age range, duration of stay, number of visits, and number of nights) have p-values less than 0.05 indicating that they all have a significant influence on spending (expenditure). The coefficient estimates for age range, duration of stay, number of visits, and number of nights are 7,542.75, 61,728.84, 151.96, and -1.18 respectively.
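The reduced model can be written as a prediction function using the coefficient estimates above. This is a minimal sketch: the intercept is not reproduced in the text, so it is left as a required argument, and the intercept and predictor values in the usage lines are purely hypothetical. The predictors must be coded in the same way as in the fitted model (for example, the category codes used for age range and duration of stay).

```python
# Coefficient estimates as reported for the reduced model.
COEFFICIENTS = {
    "age_range": 7542.75,
    "duration_of_stay": 61728.84,
    "number_of_visits": 151.96,
    "number_of_nights": -1.18,
}

def predict_spending(intercept, **predictors):
    """Return predicted spending for one traveller given all four predictors."""
    missing = set(COEFFICIENTS) - set(predictors)
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    return intercept + sum(
        COEFFICIENTS[name] * predictors[name] for name in COEFFICIENTS
    )

# Hypothetical intercept and predictor values, for illustration only.
base = predict_spending(
    intercept=0.0, age_range=3, duration_of_stay=2,
    number_of_visits=100, number_of_nights=700,
)
longer_stay = predict_spending(
    intercept=0.0, age_range=3, duration_of_stay=3,
    number_of_visits=100, number_of_nights=700,
)
print(f"Effect of one more unit of duration of stay: {longer_stay - base:.2f}")
```

Holding the other predictors fixed, the difference between the two predictions equals the duration-of-stay coefficient, which is how the coefficient is interpreted in the Conclusions.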

Standardized residuals for the final model

Figure 6. Histogram of standardized residual

Figure 6 illustrates an approximately normal distribution of standardized residuals, with mean = 0 and standard deviation = 0.99882, which is approximately equal to 1.

Plot of unstandardized predicted values against spending (expenditure)

Figure 7. Plot of unstandardized predicted values against spending (expenditure)

Figure 7 reveals a positive linear relationship between unstandardized predicted values and spending (expenditure). This suggests that the independent variables adequately predict expenditure.

Conclusions

Task 1: Charts showed differences in the percentage of travellers who used the different modes of travel across the quarters of the year 2017. “Air” as mode of travel had 39.76% of its travellers in the first quarter (January – March), 20.03% in the second quarter (April – June), 17.28% in the third quarter (July – September), and 22.94% in the fourth quarter (October – December). “Sea” as mode of travel had 21.33% of its travellers in the first quarter, 24.59% in the second quarter, 34.81% in the third quarter, and 19.26% in the fourth quarter. The peak quarter for “Air” was the first quarter (January – March), with 39.76% of its travellers in the year 2017. The peak quarter for “Sea” was the third quarter (July – September), with 34.81% of its travellers. Compared with the other modes, “Sea” had the largest share of its travellers in the second quarter (24.59%) and the third quarter (34.81%); because these percentages are computed within each mode, they describe when each mode was used rather than which mode carried the most travellers overall. Charts also revealed that the “independent” traveller package type had more travellers than the “non-independent” traveller package type for all modes of travel in all quarters of the year 2017. Of all travellers by “Air”, 29.2% travelled in the first quarter on the “independent” package type while 10.55% travelled in the first quarter on the “non-independent” package type.

Task 2: Boxplot charts showed that differences in expenditure existed between the “main purpose of travelling” groups. The medians varied across these groups, which further suggests that differences existed between them.

Task 3: Findings revealed a statistically significant difference between the average expenditures of a visit of independent and package tour travellers at the 5% level of significance. Package tour (non-independent) travellers had the higher average expenditure.

Task 4: Results showed that there was a statistically significant association between mode of travel and main purpose for travelling. This suggests that mode of travel depended on the main purpose for travelling.

Task 5: Results showed that age range, duration of stay, number of visits, and number of nights each had a statistically significant influence on expenditure at the 5% significance level. On average, a one-unit increase in duration of stay was associated with a 61,728.84-unit increase in expenditure, and a one-unit increase in number of visits was associated with a 151.96-unit increase in expenditure.

Limitation

The main limitation of this study was the pool of data used. The data covered the year 2017 only; including other years may help develop a more robust model than the one developed here. The “VFR” group also had many outliers, which may have affected the investigation of differences with respect to traveller package type.

Bibliography

Cooper, D. R., 2014. Business research methods. Boston: McGraw-Hill/Irwin.

Field, A., 2009. Discovering Statistics Using SPSS. 3rd ed. s.l.: Sage.

Hunt, A. H., 2001. Chi Square. Orthopaedic Nursing, 20(3), pp. 68-69.

Rajagopalan, V., 2006. Selected statistical tests. New Delhi: New Age International.

Woodley, A., 2001. SPSS for Windows: An Introduction to Use and Interpretation in Research. Computers & Education, 37(3-4), pp. 390-391.