Recommender Systems and HCI
Francesco Ricci
eCommerce and Tourism Research Laboratory
ITC-irst
Trento – Italy
ricci@itc.it
http://ectrl.itc.it

Content
Usability study of 6 recommender systems
– What are the factors that have an impact on user satisfaction?
– Go beyond accuracy
Explanation in collaborative filtering
– Explanations can change the “performance” of the algorithm
Usability study in a travel recommender system
– Multiple decision styles

HCI and Recommender
The accuracy of a recommendation (how is it measured?) depends on the recommendation algorithm
But the effectiveness of a RS depends on factors that go beyond the quality of the algorithm
The ultimate goal of a RS is to introduce users to items that might interest them and to convince users to consider those items [Swearingen & Sinha, 2001]

Swearingen & Sinha
They have shown that an effective recommender system:
– Must inspire trust in the system
– Has system logic that is at least somewhat transparent
– Points users towards new, not-yet-experienced items
– Provides details about recommended items (e.g. pictures and community ratings)
– Provides ways to refine recommendations (e.g. by including or excluding particular genres).

Experimental Study
A total of 19 people participated in the experiment
Each participant tested either 3 book or 3 movie systems, and also evaluated recommendations made by 3 friends
For each of the three book/movie recommender systems (presented in random order), users completed the following tasks:
(a) Completed the online registration process
(b) Rated items on each RS in order to get recommendations
(c) Reviewed the list of recommendations
(d) If the initial set of recommendations did not provide anything that was both new and interesting, users were asked to look at additional items
(e) Completed a satisfaction and usability questionnaire for each RS
After the user had tested and evaluated all three systems, a post-test
interview was conducted.

Measures for the Evaluation
Good Recommendations: percentage of recommended items that the user liked. Good Recommendations were divided into the following two subcategories:
– Useful Recommendations: “good” recommendations that the user had not experienced before
– Previously Liked Recommendations (Trust-Generating Recommendations): “good” recommendations that the user had already experienced and enjoyed; such items indexed users’ confidence in the RS
Overall satisfaction with the recommendations and with the RS (survey)
Time spent registering and receiving recommendations from the system.

Perceived Usefulness = Overall Satisfaction
Users perceived the RSs as being useful
Users did not like all RSs equally
And this is not because of the recommendation algorithm!

Factors that predict usefulness
Correlations between usefulness and other aspects
Good and useful recommendations (accuracy) are important
and strongly correlated with usefulness, BUT …

Good and Useful Recommendations
These recommender systems are very similar in terms of good and useful recommendations but, as seen before, quite different in terms of overall satisfaction
There are other factors.

Impact of the Time dimension
A moderate increase in the number of ratings required does not have a strong negative impact
Users appeared to be willing to invest a little more time and effort if a better outcome seemed likely
Users expressed some impatience, but this seems related not to the number of ratings but to the way the information was displayed (e.g. many movies on a screen, or no detailed information).

Time

Trust-Generating Recommendations
Recommendations of items with which the user has previously had a positive experience correlate with perceived usability
These recommendations are not useful (they do not offer new information), but they index the degree of confidence a user can feel

Unexpected Items
Recommender systems are better than friends at recommending unexpected items
Recommender systems are useful because they “expand
the horizons”

Exploratory Search [Marchionini, 2006]

Information about recommended items
Two versions of the rating zone are compared: without descriptions and with descriptions
The description of individual items correlates positively with both perceived usefulness and ease of use

Interface issues
Interface matters when it gets in the way
Navigation and layout seemed to be the most important factors
– They correlate with ease of use and perceived usefulness
It is important to invest time in user-testing the navigational structure of the RS
– This can have a great impact on user satisfaction

Predicting the degree of liking
Recommending an item and predicting the degree of liking are not the same
Ratings can make users more critical of the recommendations (“why such a rating?”)
And what if the system recommends items with low or medium “predicted liking” ratings?
– The user may be confused about why this item is recommended
Presenting the degree of liking is a high-risk feature
– The system would need a very high degree of accuracy for users to benefit

Reflections
If predicting the degree of liking is not the same as recommending an item, then why bother about the MAE?
Why should the system be right in predicting the degree of liking for items that have average (or low) ratings?
The recommendation algorithm is only a “component” of the recommender system
Instead of building sophisticated prediction algorithms, should we build sophisticated “user manipulation”
methods?

System Transparency
Users like to understand what was driving a system’s recommendation
The reasoning of the RS should be at least somewhat transparent
Users are confused if all recommendations are unrelated to the items they rated

Conclusion of [Swearingen & Sinha, 2001]
The “goodness” of a recommendation and the perceived usefulness of a RS depend heavily upon the user’s expectations
– i.e. the expected range of information that the recommender can provide
E.g. in this study they found users interested in:
– Reminder recommendations = recommendations for items that the user had already thought about
– “More like this” recommendations (e.g. more movies in a particular genre)
– New items (e.g. those recently released in a particular genre)
– “Broaden my horizon”: really new and unexpected items

User Tasks
Annotation in Context: GroupLens suggests which news items are worth reading [Resnick et al. 1994]
Find Good Items: suggest some items as a ranked list [Shardanand and Maes, 1995]
Find all Good Items: all items satisfying some user needs and wants [Ricci et al, 2002]
Recommend Sequence: recommend a sequence that is pleasing as a whole [Hayes and Cunningham, 2001] [Aguzzoli et al., 2002]
Recommend a Bundle: suggest a group of products that fit well together [Ricci et al, 2002]
Just Browsing: users find it pleasant to browse product recommendations
Find Credible Recommender: users try to change the input (e.g. user profile) to find bias in the recommender algorithms
Improve Profile: add ratings or other user info to “improve” their profile
Express Self: users feel good contributing to the system performance by adding their comments or ratings
Help Others or Influence Others.
Extended from [Herlocker et al. 2004]

User Tasks from a HCI perspective
The previous list is biased by the technology (what we can provide)
User tasks and goals should be analyzed case by case
For instance, in a Travel Recommender System:
– A similar place
– Other offers “like this”
– A quieter place
– The attraction that I should not miss in Girona
– The hotel closest to Main Square but not on a high-traffic road
…
Different tasks require different evaluation approaches.

Explanations in Recommender Systems
Transparency of the reasoning process improves user
satisfaction [Swearingen & Sinha, 2001]
Most of the recommendation technologies shown so far are far too complex to be explained
Collaborative filtering is mainly used as a black box
If we want to use RSs in high-risk domains (recommending a camera, a trip, an investment plan …), we must add an explanation component.

Benefits of explanations
Can build trust between the user and the system
Can increase the credibility of the system and confidence in the recommendations
Can reduce errors (an explanation makes clear why the system is making an error)
– The error may be due to a lack of data (e.g. missing ratings, missing user information, or not enough products)
– The error may be due to the process (wrong similarity function, or ACF not considering the context in the prediction)
[Herlocker et al., 2000]

Benefits of explanations (2)
Can increase user involvement: this can push the user to add further knowledge (ratings) to the system
Can strengthen the educational role of a recommender: the information provided becomes a source of new knowledge
Can increase acceptance

Case study in Collaborative Filtering
Investigation of the roles of explanation in collaborative filtering: 3 questions
What models and techniques are effective in supporting explanation in an ACF system?
Can explanation facilities increase the acceptance of automated collaborative filtering systems?
Can explanation facilities increase the filtering performance of ACF system users?

White and Black models of explanation
White box model: the ACF recommendation model is “simple”, with three steps
– The user enters ratings
– ACF locates people with similar interests
– Neighbors’ ratings are combined to form recommendations
The explanations can be linked to the process/algorithm used to generate the recommendations (white model)
Black box model: generate explanations independently of
the algorithm that is really used

White model 1) The user enters ratings
– Explain the current content of the user profile (ratings)
– Explain which ratings have been used the most
– Explain that the user has (not) rated enough items to make the recommendations reliable enough
– Indicate products that should be rated to improve the quality of the recommendations

White model 2) ACF locates people with similar interests
– Explain the behavior of the similarity metric
– Illustrate the concept of “closeness” used by the similarity metric
– Illustrate how many neighbors are considered
– Illustrate the profiles (ratings) of the neighbor users

White model 3) Neighbors’ ratings are combined to form recommendations
– Illustrate the ratings of the neighbors for the target item
– Illustrate the distribution of these ratings
– Show the combination of each neighbor’s rating and that neighbor’s closeness
– Illustrate the method used to combine the ratings of the neighbors into a single prediction

Black box model
Black box model: generate explanations independently of the algorithm that is really used
– Explain that the recommender was correct x% of the time in the past
– Bring in information that has not been used in the prediction
E.g. show product reviews collected from another web site
Or explain how many units of that item have been sold in the last month

Experimental Study 1
Each user is provided with 21 individual movie recommendations, each with a different explanation component
The 21 different explanation interfaces all describe the same movie recommendation (!)
The user was then asked to rate on a 1-7 scale “how likely they would be to go and see the movie”
The 21 different interfaces were presented in a random
order for each user (to account for learning effects)

Results
Past performance is the accuracy of MovieLens in the past
Explanation 5 = “this movie is similar to 4 other movies that you rated 4 stars or higher”
Explanation 6: the importance of providing additional content info
Explanation 17: worked badly!

Histogram with grouping: 1st choice
The neighbor ratings histogram (explanation 3) is similar to this, with one bar for each rating
The histogram with grouping performs better than the full histogram because it reduces the dimensionality

Table of neighbor ratings: 4th choice

Influence of explanation on the user
Prior to the main study, in a small pilot study, participants were interviewed after they took the survey
– Many users perceived each “recommendation” as having been generated using a different model, which was then explained
– Each explanation was changing the user’s internal conceptual model of how the recommender computed predictions
In the primary study they attempted to control for this effect by clearly stating to study participants up front that the model was going to be the same in each case.

Conclusion 1
What models and techniques are effective in supporting
explanation in an ACF system?
– Explanation techniques differ in their effects
– Rating histograms seem to be the most compelling way
– Other good methods: indication of past performance; comparison with similar (highly rated) items; domain-specific content features

The other two hypotheses
Hypothesis 1: adding explanation interfaces to an ACF system will improve the acceptance of that system among users
Hypothesis 2: adding explanation interfaces to an ACF system will improve the performance of filtering decisions made by users of the ACF system.
– This means that one can measure differences in the prediction accuracy of the system when using different explanation capabilities
– In principle this should not be true
– Unless the explanation capability can convince you that the system prediction is correct, and change your true evaluation

New experiment
7 alternative systems are compared
– 2 are the old system and the old system with aesthetic changes
– 5 have different explanation functionalities, mixtures of the following two: confidence, distribution of ratings

Procedure
A survey at the beginning and a survey on exit
The subjects were asked to return to MovieLens whenever they saw a new movie and fill out a mini-survey:
1. Which movie did you see?
2. Did you go because you thought you would enjoy the movie, or did you go for other reasons (such as other viewers)?
3. Did you consult MovieLens before going?
4. If you consulted MovieLens, what did MovieLens predict?
5. How much did MovieLens influence your decision?
6. Was the movie worth seeing?
7. What would you now rate the movie?
Answers 4 and 7 were used to compute the accuracy of the prediction

Results
210 users (210 standard surveys)
743 mini-surveys
– In 315 cases the users consulted MovieLens before
seeing the movie
– In 257 cases MovieLens had some effect on the user’s decision
– In 213 of the cases above (83%), the MovieLens recommendation was not the sole reason for choosing a movie

Effect on performance
NO statistically significant difference between any two experimental groups
Hypothesis 2 is rejected

Effect on user acceptance
In exit surveys given at the end of the study, users in non-control groups were asked if they would like to see the explanation interface they had experienced added to the main MovieLens interface.
97 experimental subjects filled out the exit survey
86% of these users said that they would like to see their explanation interface added to the system
Hypothesis 1 (adding explanation interfaces to an ACF system will improve the acceptance of that system among users) is accepted

DieToRecs development and evaluation
Steps in the development process:
– Development of a user decision model
– Design of the recommendation technologies
– First prototype design
– Iterative design and evaluation (mock-up)
– Key technologies implementation
– Prototype management and evaluation
– Technology improvement
– Final recommender system
[Zins et al., 2004] [Bauernfeind et al., 2003]

Observational study – real travel planning
sessions
N = 200
– 10% dialogues with travel agents (Berlin)
– 40% trip planning from catalogues (Berlin)
– 50% trip planning on the Internet (Vienna): 25% TisCover, 25% AllesReisen.com

Exit survey
Immediately after the trip planning task
Attended computer-interactive interview
– Perceptions and reflections about the planning process
– Characteristics of the prepared trip (main purpose, travel budget, experience, organisation)
– General travel decision making

Coding
Study material: written transcripts, videos, screen clips, catalogues
15 coders
2 independent observations, matched afterwards

Content I
Which trip elements are initially verbalized by the customer?
Which elements determine the trip at the end of the planning process?
What drives information delivery: the user request or the information shown by the medium?
Timing of trip elements: earlier/later?
The way of processing: fixed or flexible?

Content II
Role of additional travel characteristics: travel motivations, travel experience
Technical process characteristics
– e.g. length of interview, interrupts, interface problems, number of alternatives
Additional process characteristics
– e.g. decision mode, decision role, involvement

Average Frequencies in % of respondents
(What? = Start/End; Who? = User/System; When? = Earlier/Later; How? = Fixed/Variable)
Trip element | Start | End | User | System | Earlier | Later | Fixed | Variable
Activities/facilities | 47 | 63 | 59 | 54 | 53 | 18 | 46 | 20
Type of transportation | 42 | 77 | 51 | 56 | 54 | 22 | 55 | 10
Attractions | 45 | 24 | 51 | 45 | 44 | 13 | 38 | 14
Length of stay | 29 | 77 | 60 | 64 | 62 | 22 | 39 | 27
Destination: country | 78 | 94 | 87 | 72 | 93 | 1 | 58 | 21
Destination: community | 19 | 81 | 49 | 79 | 71 | 16 | 13 | 56
Destination: region | 53 | 88 | 77 | 82 | 90 | 4 | 26 | 46
Accessibility of the destination | 13 | 48 | 39 | 43 | 30 | 28 | 35 | 13
Geographical area | 71 | 84 | 73 | 62 | 81 | 0 | 70 | 6
Natural factors | 52 | 78 | 59 | 55 | 67 | 9 | 64 | 9
Price | 29 | 83 | 79 | 89 | 69 | 29 | 24 | 63
Travel party | 60 | 88 | 63 | 59 | 66 | 15 | 70 | 5
Travel type in general | 40 | 81 | 72 | 59 | 81 | 3 | 60 | 18
Travel type: All Inclusive | 14 | 13 | 8 | 10 | 9 | 2 | 6 | 4
Travel type: Independent Traveller | 22 | 43 | 39 | 34 | 43 | 4 | 33 | 10
Travel type: Last Minute | 9 | 10 | 16 | 9 | 14 | 3 | 5 | 10
Travel type: Low Budget | 5 | 15 | 15 | 9 | 14 | 4 | 14 | 3
Travel type: Tour operator product | 15 | 33 | 34 | 38 | 42 | 4 | 22 | 17
Travel type: Special Offer | 2 | 7 | 14 | 9 | 9 | 8 | 5 | 11
Transfer to accommodation | 11 | 30 | 19 | 23 | 9 | 24 | 13 | 11
Accommodation: equipment | 15 | 74 | 47 | 77 | 47 | 40 | 22 | 52
Accommodation: pictures | 10 | 74 | 54 | 87 | 53 | 36 | 36 | 34
Accommodation: category | 25 | 73 | 43 | 81 | 61 | 24 | 18 | 46
Accommodation: place | 26 | 81 | 55 | 75 | 45 | 34 | 47 | 21
Accommodation: catering | 28 | 80 | 53 | 85 | 55 | 38 | 35 | 41
Type of accommodation | 51 | 89 | 75 | 86 | 78 | 14 | 41 | 37
Time of travel | 45 | 77 | 70 | 70 | 76 | 12 | 40 | 42
Additional geographic information | n.a. | 64 | 35 | 30 | 31 | 18 | 22 | 14
Additional information | n.a. | 42 | 52 | 18 | 25 | 34 | 40 | 11
Get in contact | n.a. | 30 | 42 | n.a. | 9 | 22 | 35 | 8
Number of elements | 8.5 | 17.9 | 14.6 | 15.6 | 14.8 | 5.0 | 10.3 | 6.8

How to define decision styles?
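The Start and End columns can be read as a before/after comparison. As a small worked example (an illustration added here, not an analysis from the study), the Python sketch below copies a few rows from the table and ranks them by how many percentage points the share of respondents mentioning each element changes between the start and the end of planning. The choice of rows is arbitrary, and rows with "n.a." at the start are left out.

```python
# Rows copied from the "Average Frequencies" table above:
# (trip element, % of respondents at the start, % at the end of planning).
# The selection is arbitrary; rows with "n.a." at the start are omitted.
ROWS = [
    ("Activities/facilities", 47, 63),
    ("Attractions", 45, 24),
    ("Destination: country", 78, 94),
    ("Destination: community", 19, 81),
    ("Price", 29, 83),
    ("Accommodation: equipment", 15, 74),
    ("Accommodation: pictures", 10, 74),
    ("Travel type: All Inclusive", 14, 13),
]


def start_to_end_gains(rows):
    """Return (element, end - start) pairs, sorted from largest to smallest gain."""
    gains = [(name, end - start) for name, start, end in rows]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, delta in start_to_end_gains(ROWS):
        # A positive delta means more respondents mentioned the element at the end.
        print(f"{name:28s} {delta:+d} percentage points")
```

With these rows the largest growth is for accommodation pictures (+64 points), destination community (+62) and accommodation equipment (+59), while attractions drop (-21). This matches the last row of the table, where the average number of elements roughly doubles from 8.5 at the start to 17.9 at the end.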
Six Decision Styles found
DS1: Highly pre-defined users (15%)
DS2: Accommodation-oriented users (18%)
DS3: Recommendation-oriented users (10%)
DS4: Geography-oriented users (18%)
DS5: Price-oriented users (18%)
DS6: The individual traveler (32%)

Decision Styles I/II
Name | Decision style characteristics | Recommendation/reduction strategy
Highly pre-defined | Many trip attributes pre-defined; natural resources very important | Let the user specify many attributes, possibly in phases: first destination, then accommodation and price, then further details
Accommodation oriented | Highest importance on accommodation; high quality, not price sensitive | Only broad geographical area, then ask for characteristics of accommodation; list attributes of recommended destinations for comparison
Recommendation oriented | Few trip attributes pre-defined; affinity for certain travel types | Come up quickly with pictures, let the user ‘feel’ recommendations

Decision Styles II/II
Name | Decision style characteristics | Recommendation/reduction strategy
Geography oriented | Clear conception of geographical area and region | Let the user search by map (giving detailed information about the areas clicked); no concrete accommodation offers before the village is determined
Price-oriented | Price as most important feature, searching for benefits within a certain price range | Ask for price range and natural resources sought; begin list from cheapest
Activity-driven traveller | Destination as cue for benefits and activities sought | Ask for benefits and activities sought; determine travel typology; describe offers in detail

Six Decision styles ...
... are not:
– exhaustive
– homogeneous in their preferred travel ‘product’
– easily predictable
however, they:
– have similar search strategies
– have specific needs for a specific travel arrangement
– are prototypes which may be used particularly in the
initial phase of a search/reduction process

Common Sequence of a TR Session (user and system actions per phase)
Filtering (alternatives may be shown)
– Specification of details according to styles
– Show # of available alternatives
– Show alternatives (pictures)
– Proceed to ‘specification’
– Recommend action(s) for relaxing constraints
Specification (alternatives must be shown)
– Specification of further details
– Ask for more information
– Show # of available alternatives
– Show alternatives (pictures)
– Proceed to ‘selection/sorting’
– Recommend action(s) for relaxing constraints
Selection/Sorting
– Ask for more information
– Browse through ordered list
– Compare alternatives
– Get recommendations from others
– Show # of available alternatives
– Show alternatives (pictures)
– Present recommendations
– Recommend action(s) for relaxing constraints
Learning from others, products?

The Ladder of Intelligence in Recommendation Systems

Other things learned ...
Most users do not like a long procedure of
answering questions, but want to see things quickly
They become very impatient when they specify their needs and the system does not contain a single offer
They generally show scepticism because they suppose that there must be more
– Important features: Trust, Competence, Usability
– System implications: Be fast, easy and transparent

Recommendations
Facilitate tourist life
– Take account of different ‘decision styles’
– Enhance adaptivity, add capability of learning and real-time personalization
– Reduce the user’s effort & arouse excitement
– Avoid eliciting redundant user input
– Mediate between language levels (consumption-goal & experience oriented versus package-production oriented)

Challenges raised by the findings
Different decision styles require a complex system design that can cope with all these variations
– Design issues are very important
Decision styles are fuzzy concepts, i.e. users never follow only one decision style
– The technical approach to switching behaviour between decision styles is unclear
– The number and characteristics of decision styles may change over time

GUI Design
(Figures: 1st GUI mock-up, 2nd GUI mock-up, V0.5 GUI, V1.0 GUI)

Cognitive Walkthrough and Heuristic Inspection
Applied to a GUI mock-up without functionalities
Qualitative assessment of some user interface design choices
The goal was to detect substantial weaknesses of the user
interface design
– exploratory learning while solving the user’s problem
– identifying violations of heuristics
Applying guidelines from Nielsen (2000): know the user, reduce cognitive work, avoid design errors, keep consistency

Examples: problems found in the walkthrough
“There is no reason for the link to “recommendation market place” to appear in the main area (and not on the left like the other functions)”
“A usability problem can be envisaged for the registration goal. The achievement of this goal seems a pre-condition for accessing SA1 and SA2; these choices should be deactivated unless the user is registered and logged in.”
“Another problem may arise from the fact that most of the choices are duplicated (for example, “kind of accommodation” in “advanced travel wish” and “accommodation”) and the interface does not seem to help the user in keeping consistency (P3).”
…

Heuristic evaluation
Conducted on Prototype V0.5 (with partially implemented functions)
5 experts (2 Trento, 2 Linz, 1 Vienna) “evaluated the system functioning, the interface, and the user-system interaction, according to their preferred heuristic procedure” by
– answering the PUTQ questionnaire
– providing a list of comments (including any problem or error message, improvement suggestions and any other remarks and observations relevant to the usability of the system)

PUTQ
Composed of 100 questions on the system interface, structured by eight factors that are relevant to human-computer interaction
– compatibility, consistency, flexibility, learnability, minimal action, minimal memory load, perceptual limitation, and user guidance
It is possible to compute an index based on the ratings and relate it to the possible perfect score
http://www.acm.org/~perlman/question.cgi?form=PUTQ

PUTQ - Summary of Results
Factor | Effectiveness, average (std. dev.) | PUTQ Index, average (std. dev.) | Not applicable, average 1)
Compatibility | 5.1 (1.4) | 73.5 (18.4) | 1.0
Consistency | 5.1 (1.7) | 70.3 (24.9) | 0.8
Flexibility | 3.8 (1.1) | 41.8 (8.5) 2) | 1.1
Learnability | 5.4 (1.2) | 72.9 (16.9) | 0.7
Minimal action | 4.9 (1.8) | 63.6 (20.4) | 1.5
Min. memory load | 4.9 (1.1) | 67.0 (15.5) | 1.5
Perceptual limitations | 5.9 (1.2) | 75.6 (15.9) | 1.1
User guidance | 4.5 (1.1) | 30.5 (27.9) 2) | 1.5
Total | 4.9 (1.3) | 65.0 (18.9) | 1.2
1) Excluded: not available
2) Expert 5 was excluded from the analysis because of too many "not applicable" values
Effectiveness scale: 1 = bad, 7 = good

PUTQ index
A direct way to assess the usability of a system
100 is the maximum
– an “item” is a question in the survey
Computed for each user (and each factor), then averaged
$$\text{PUTQ Index} = \frac{\sum_i (\text{Score}_i - \text{Penalty}_i) \times w_i}{\sum_i 7 \times \text{Item}_i \times w_i} \times 100$$
where:
i = the ith item
Score_i = the rating score of item i
Penalty_i = 1 if item i is applicable but not available (N/A), 0 otherwise
Item_i = 1 if item i is applicable, 0 if item i is not applicable
w_i = the importance weight of item i

Results PUTQ
Compatibility: the expert evaluation indicated a good
compatibility (PUTQ Index = 73.5). In particular, coding and wording were compatible with familiar conventions
Consistency: the experts identified inconsistencies in displayed symbols and data, feedback and the required user actions
– some of the displayed symbols, data, feedback and required user actions did not fit user expectations and were not clearly understandable
Flexibility: the PUTQ Index of 41.2 is low

Results PUTQ (2)
Learnability: the prototype was judged as being easy to learn (PUTQ Index = 72.9)
Minimal action (the number of actions required for the user to perform a task is minimal): the experts suggested that improvements are still necessary (PUTQ Index = 63.6)
Minimal (long-term) memory load (assists the user in learning an interface fast): overall, the minimal memory load requirements were evaluated quite favorably (PUTQ Index = 67.0)
Perceptual limitations (consider the limitations of human perceptual organization capacities): the best criterion (PUTQ Index = 75.6)
User guidance: very low (the PUTQ Index of 30.5 is the lowest of all)

Detailed Expert Evaluation
Collected problems and remarks concerning:
– General problems / remarks
– Start page
– Navigation (user registration, left menu)
– Layout and design
– Travel planning process
– Recommendation process
– Results
– Searching for inspiration
Many problems were solved before the experimental evaluation:
– Changes in interface and design (start page and menu bars)
– Extension of explanations
– Consistency checks

Experimental Evaluation by potential Users
Rigorous test of system value under experimental conditions
Within- and between-subject design: 2 consecutive, weakly structured travel planning tasks
Testing against a highly developed operative system on the market (Tiscover)
Testing the performance across variants with different recommender-function potential

Two interaction styles
– Traditional query form
– Single item recommendation
– Recommendation by proposing
– Complete bundle recommendation

System Variants
DTR-A: Interactive Query Management only (i.e. empty
case base and no recommendation support via smart sorting or through other means);
DTR-B: Single Item Recommendation with Interactive Query Management and Ranking based on a representative case base;
DTR-C: a variant with all the recommendation functions enabled (SingleItemRecommendation, BundleRecommendation, SeekingForInspiration).
TISCOVER: a fully operational system

Hypotheses
H1 - The recommendation-enhanced system is able to deliver useful recommendations
– The position of the selected item for DTR-B should be nearer to the top of the visualized result list than for DTR-A
H2 - The recommendation-enhanced system is able to foster the construction of good travel plans
– Analyze the differences between the three systems (the DieToRecs variants and TISCover) on the users’ ratings of the selected items
H3 - The recommendation-enhanced system allows a more efficient search
– Users should perform fewer queries, examine fewer pages and need less search and decision time
H4 - The recommendation-enhanced system heightens user satisfaction
– We should find significant differences between DTR-B and DTR-A on the questionnaire.

Experimental procedure
Demographic
Questionnaire: 5 min
System 1: Familiarization 5 min; Training 5 min; Story + Test phase 30 min; Satisfaction Questionnaire 5 min
System 2: Familiarization 5 min; Training 5 min; Story + Test phase 30 min; Satisfaction Questionnaire 5 min

Training Task
Imagine you want to search for accommodation for two persons in the Zillertal in the price range of 30 to 70 Euro (per person and day)
Please take five minutes to perform this task using the system

First test task
You won a trip to Tyrol, Austria. All transportation necessities will be arranged according to your travel plans and will not be debited from your given travel budget. This travel budget amounts to 150 euro per person per day. You may allocate this budget to accommodation, events, sports, cultural activities or anything else you may want to do during this vacation trip. Any budget you did not allocate in advance you will receive as pocket money for other trip expenses. You may exceed the total budget if you want to spend additional money on this trip.
Now, it is your task to plan your individual trip on the travel site to which you were assigned by the tutor. The only restrictions are that the trip must last at least 7 days and is limited to a maximum of 4 persons (including yourself) in your travel party. The trip can be taken any time between May and October 2003. Please avoid locations that you have already selected in previous tasks.
Before you start looking for information on the system, please describe in a few sentences the specifics (travel wishes) of the trip you are going to plan (when, how, travel group, destination, accommodation, activities, etc.) with the help of this travel recommender system, taking the above-mentioned criteria into account.

Second Planning Task
After having completed the first travel planning task, we would like to
invite you to repeat a quite similar trip preparation task on a second travel web site. The following restrictions apply to this task:
– The travel destination is Tyrol, Austria
– You are already back home from the previously planned trip to Tyrol
– Budget handling and travel party conditions are the same as with the first task
– Please avoid the locations that you have already selected in previous tasks.
Before you start looking for information on the system, please describe in a few sentences the specifics (travel wishes) of the trip you are going to plan (when, how, travel group, destination, accommodation, activities, etc.) with the help of this travel recommender system, taking the above-mentioned criteria into account.

Design and sample size
Sequence | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6
First system | TISCover | DTR-A | TISCover | DTR-B | TISCover | DTR-C
Second system | DTR-A | TISCover | DTR-B | TISCover | DTR-C | TISCover
N (total = 47) | 10 | 11 | 10 | 10 | 2 | 4

Socio-demographics
How familiar are you with Tyrol? Not familiar 30%; Quite familiar 51%; Very familiar 15%; No answer 4%
(Figure: “How often do you inform yourself or purchase travels via the Internet?”; percentage of test persons for Travel Information and Travel Purchase, by frequency: never, once a year, several times a year, once a month, once a week, several times a week)
Gender | Age <25 | Age >25 | Total
F | 26 (56%) | 3 (7%) | 29 (63%)
M | 9 (20%) | 8 (17%) | 17 (37%)
Total | 35 (76%) | 11 (24%) | 46 (100%)

Travel wish specification
before using the system (DTR)
Needs specified in the essay before the trial, by whether the travel plan was finished:

Essay on travel plan     | Average | Finished: yes (30%) | Finished: no (70%) | p-value
Destination specified    | 87%     | 79%                 | 91%                | n.s.
  detailed by attributes | 11%     |                     |                    |
  number of attributes   | 1.2     | 2                   | 1                  | n.s.
Accommodation specified  | 94%     | 79%                 | 100%               | 0.05
  detailed by attributes | 89%     |                     |                    |
  number of attributes   | 1.8     | 2.6                 | 1.5                | 0.10
Activities: one          | 49%     | 50%                 | 49%                | n.s.
Activities: two          | 40%     | 43%                 | 39%                | n.s.
  detailed by attributes | 21%     |                     |                    |
  number of attributes   | 2.4     | 1.0                 | 2.8                | 0.10

=> Those who finished a plan seem to have specified their needs (destination and accommodation) in more detail before searching.

86 Finished Travel Plans
Travel plan elements not found as intended, by whether the element was specified:

Element       | System    | Average | Specified: yes | Specified: no | p-value
Destination   | DieToRecs | 78%     | 79%            | 67%           | n.s.
Destination   | TISCover  | 88%     | 93%            | 50%           | n.s.
Accommodation | DieToRecs | 31%     | 31%            | 0%            | ---
Accommodation | TISCover  | 56%     | 56%            | 0%            | ---
Activities    | DieToRecs | 49%     | 50%            | 48%           | n.s.
Activities    | TISCover  | 50%     | 50%            | 0%            | n.s.
DieToRecs vs. TISCover: n.s. for destination, accommodation and activities.

Finished the travel planning process: 64% (TISCover) vs. 30% (DieToRecs).
– A large percentage of users were not able to find the destination as intended (especially if they had specified the elements).
– The database (the actual content) is very important, with or without recommendations!

87 H1 - Average position of items in the result list
by DieToRecs variants

Items               | DTR-A avg. | DTR-A std. dev. | DTR-B avg. | DTR-B std. dev. | t-test
Items in general    | 4.3        | 4.6             | 2.9        | 2.8             | n.s.
Accommodation items | 5.0        | 0.4             | 2.2        | 1.2             | n.s.
Destination items   | 3.9        | 0.1             | 2.5        | 1.3             | n.s.
Interest items      | 4.0        | 4.8             | 3.5        | 3.0             | n.s.

=> Cautious confirmation of H1: item rankings are substantially better (lower average positions) for DTR-B, although the differences are not statistically significant.

88 H2 - Item ratings by DieToRecs variants

Travel plan element   | Average | DTR-A | DTR-B | DTR-C | p-value
Finished plans        | 30%     | 10%   | 30%   | 100%  | 0.001
Rating: destination   | 4.0     | 2.8   | 4.5   | 5.3   | 0.10
Rating: accommodation | 4.1     | 4.1   | 3.6   | 5.9   | 0.15
Rating: activities    | 4.2     | 3.2   | 4.9   | 7.0   | 0.05

Pairwise significant differences between variants: destination p = 0.10; accommodation p = 0.01 and p = 0.05; activities p = 0.1, p = 0.01 and p = 0.001.
Note: "1" = very dissatisfied, "7" = very satisfied.
=> Ratings on the selected products are better the
more recommendation functions the variants have.

89 PSSUQ
1) I liked using the interface of the system.
2) The organization of information on the system's screen was clear.
3) The interface of this system was pleasant.
4) This system has all the functions and capabilities that I expect it to
5) The information retrieved by the system was effective in helping complete the tasks.
6) The products listed by the system as a reply to my request were
7) I found the "recommend travel" function useful. (DieToRecs only)
8) I found the "seeking for inspiration" function useful. (DieToRecs only)
9) It was simple to use this system.
10) It was easy to find the information I needed.
11) The information (such as online help, on-screen messages, and
12) Overall, this system was easy to use.
13) It was easy to learn to use the system.
14) There is too much information to read before I can use the system.
15) The information provided for the system was easy to understand.
16) I felt comfortable using this system.
17) I enjoyed constructing my travel plans through this system.
18) Overall, I am satisfied with this system.
19) I was able to complete the tasks quickly using this system.
20) I could not complete the tasks in the preset time frame.
21) I believe I could become productive quickly using this system.
22) The system was able to convince me about the goodness of the
23) From my current experience with the system, I think I would use it
24) Whenever I made a mistake using the system, I could recover
25) The system gave error messages that clearly told me how to fix
The items are grouped into Design/Layout, Functionality, Ease of Use, Learnability, Satisfaction, Outcome/Future Use and Errors/System Reliability, plus additional questions.

90 Usability and Satisfaction Evaluation
Values linking each dimension to user/system satisfaction:
– Ease-of-use/Learnability: DTR 0.30, TIS 0.37
– Effectiveness/Outcome: DTR 0.73, TIS 0.61
– Reliability: DTR n.s., TIS n.s.

91 H4 - Average usability and satisfaction scores

Dimension             | TISCover Ø | DTR Ø | DTR-A | DTR-B | DTR-C
User satisfaction     | 3.2        | 4.6   | 5.2   | 4.5   | 3.3
Ease-of-use           | 2.8        | 3.6   | 3.9   | 3.5   | 3.1
Effectiveness/Outcome | 3.4        | 4.6   | 4.9   | 4.6   | 3.4
Reliability           | 3.5        | 3.7   | 4.0   | 3.4   | 3.7

Note: "1" = strongly agree, "7" = strongly disagree; smaller numbers are better. DTR Ø is the average over all users of the DTR variants.
=> H4 confirmed: the more recommendation-enhanced the variant, the better the user satisfaction.

92 Conclusion
– Differences in subjective evaluations between a system
without ranking support (DTR-A) and with ranking support (DTR-B) are substantial.
– The comparison between the DTR-C variant (recommending full travel plans) and the baseline system TISCover showed almost no performance difference.
– Complex products like tourism destinations challenge the evaluation procedures.
– Performance evaluations should be run in an environment that is as realistic as possible.
– No adequate usability and satisfaction instruments are available.
– User satisfaction is expected to be higher once the GUI and navigation facilities have been improved.

93 Conclusion II
Evaluating recommender systems entails a higher level
of sophistication:
– for experts
– in user modelling
– for experimental tasks
– for evaluation instruments
– for data-logging procedures

94 Recommendation Evaluation
[Figure: recommendation evaluation; labels: recommendation, predicted rating, pre-consumption user rating, accept/reject, post-consumption user rating.]

95 Recommendation Evaluation
A recommender system has two goals:
1) A high acceptance rate:
– the user must accept the recommendation and buy the product;
– the user must judge the suggested item to be useful;
– the user must trust the recommender.
2) A high post-consumption rating:
– the user must be truly satisfied with the product.

96 Impact of the recommender
The recommender system (and the predicted rating) may
have an impact on:
– the accept/reject decision;
– the pre-consumption rating.
The recommender system has NO impact on the post-consumption rating.
The system MUST correctly predict the post-consumption rating, but at the same time it must convince the user to accept a recommendation, i.e., it must raise the pre-consumption rating.
These two goals may conflict (e.g., it is easy to convince someone to buy a blockbuster movie, but it is not easy to guess whether the user will really like it).

97 Conclusions
– Recommender systems are more than a recommendation
algorithm.
– The success of a recommender system also depends on HCI factors.
– Usability is a major issue.
– Explanation of the recommendations plays an important role in user satisfaction.
– Recommender systems should support multiple user tasks.
– Recommender systems should support tasks with multiple interaction styles (decision styles).

98 Questions
– How can we define an evaluation metric that takes into account the trust that the system may generate?
– Think about this statement: "Recommending an item and predicting the degree of liking is not the same." How does this impact a recommendation algorithm?
– Is a "white model" approach feasible for a hybrid recommender system?
– In the design of a recommender system, is it better to focus on acceptance of the recommendation or on the post-consumption rating?
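The two goals on slides 95 and 96 (a high acceptance rate, and a high post-consumption rating that the system predicts correctly) can be measured separately, which is one way to approach the second and fourth questions. The following is a minimal Python sketch under stated assumptions: the `Interaction` record, the metric functions and the toy logs are hypothetical and do not come from the study; ratings are assumed to be on a 1-7 scale, and a post-consumption rating is assumed to exist only for accepted items.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interaction:
    """One recommendation shown to a user (hypothetical log record)."""
    predicted_rating: float        # rating predicted by the recommender (1-7 scale assumed)
    accepted: bool                 # whether the user accepted (e.g., bought) the item
    post_rating: Optional[float]   # post-consumption rating; None if never consumed


def acceptance_rate(log: List[Interaction]) -> float:
    """Goal 1: fraction of recommendations that the user accepted."""
    return sum(1 for i in log if i.accepted) / len(log)


def _consumed(log: List[Interaction]) -> List[Interaction]:
    # Only accepted items were consumed, so only they have a post-consumption rating.
    items = [i for i in log if i.accepted and i.post_rating is not None]
    if not items:
        raise ValueError("no consumed items to evaluate")
    return items


def mean_post_rating(log: List[Interaction]) -> float:
    """Goal 2: average post-consumption rating of the accepted items."""
    items = _consumed(log)
    return sum(i.post_rating for i in items) / len(items)


def post_rating_mae(log: List[Interaction]) -> float:
    """Mean absolute error between predicted and post-consumption ratings."""
    items = _consumed(log)
    return sum(abs(i.predicted_rating - i.post_rating) for i in items) / len(items)


# Invented toy logs: a "blockbuster" strategy that users readily accept but
# whose predictions overestimate liking, and a more cautious strategy.
blockbuster = [
    Interaction(6.5, True, 3.0),
    Interaction(6.5, True, 4.0),
    Interaction(6.0, True, 5.0),
    Interaction(6.0, False, None),
]
cautious = [
    Interaction(5.0, True, 5.0),
    Interaction(4.0, False, None),
    Interaction(6.0, True, 5.5),
    Interaction(3.0, False, None),
]

if __name__ == "__main__":
    for name, log in [("blockbuster", blockbuster), ("cautious", cautious)]:
        print(f"{name}: acceptance={acceptance_rate(log):.2f}, "
              f"mean post-rating={mean_post_rating(log):.2f}, "
              f"MAE={post_rating_mae(log):.2f}")
```

In this toy data the "blockbuster" log has the higher acceptance rate (0.75 vs. 0.50) but a lower mean post-consumption rating (4.00 vs. 5.25) and a much larger prediction error (MAE of about 2.33 vs. 0.25), which mirrors the conflict between the two goals. Reporting the metrics side by side, rather than folding them into one accuracy number, keeps that trade-off visible.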