Dr. Andreas Holzinger - Enjoy Thinking -

Usability Engineering Methods (UEMs) for Software Developers


by Andreas Holzinger (Original paper in Communications of the ACM (CACM), 2005, Vol. 48, Issue 1, pp. 71-74; available in the Digital Library of the ACM)


One of the basic lessons that we have learned in the area of HCI is that usability must be considered before prototyping takes place. There are techniques (e.g. Usability Context Analysis) intended to facilitate such early focus and commitment (Thomas and Bevan, 1996). When usability inspection or testing is first carried out at the end of the design cycle, changes to the interface can be costly and difficult to implement, so the findings are reduced to mere usability recommendations. These are often ignored, in line with the philosophy “We don’t have usability problems”. The earlier critical design flaws are detected, the greater the chance that they can be corrected. Thus User Interface Design should more properly be called User Interface Development, analogous to Software Development, since design usually focuses on the synthesis stages, and user interface components include metaphors, mental models, navigation, interaction, appearance and usability (Marcus, 2002).

Meanwhile, it is generally accepted that the following five essential characteristics of usability should be part of any software project: Learnability - so that the user can rapidly begin working with the system; Efficiency - enabling a user who has learned the system to attain a high level of productivity; Memorability - allowing the casual user to return to the system after a period of non-use without having to re-learn everything; Errors - a low error rate, so that users make few errors while using the system and can easily rectify those they do make (further, catastrophic errors must not occur); and finally, Satisfaction - pleasant to use, so that users are subjectively satisfied when using it. There are trade-offs, and some criteria are more important than others depending on the situation. For example, long-term efficiency may be important enough to justify sacrificing rapid learnability (Shneiderman, 1997).

To ensure that these essential characteristics of usability exist in the software project we use methods, which we divide into inspection methods (without end users) and test methods (with end users):

Method & Category: Heuristic Evaluation (HE), inspection method

Description: Heuristic evaluation (from Greek heuriskein = to discover) is the most common informal method. It involves having usability specialists judge whether each dialogue element follows established usability principles (Nielsen and Mack, 1994). In the original approach, each evaluator inspects the interface alone. Only after all the evaluations have been completed are the evaluators allowed to communicate and aggregate their findings. This is important in order to ensure independent and unbiased evaluations. During a single evaluation session, the evaluator goes through the interface several times, inspects the various dialogue elements and compares them with a list of recognized usability principles (e.g. the Usability Heuristics by Nielsen (Nielsen, 1994)). Different versions of HE are currently available, some of which have a cooperative character. The heuristics to be used need to be carefully selected so that they reflect the specific system being inspected, especially for Web-based services, where additional heuristics become increasingly important. Usually 3 to 5 expert evaluators are necessary (cost factor). Less experienced people can perform an HE, but the results are not as good; such an HE can nevertheless be appropriate at times, depending on who is available to participate.

Advantages (Pros): application of recognized and accepted principles; intuitive; usability early in the development process; effective identification of major and minor problems; rapidity; HE can be used throughout the development process.

Disadvantages (Cons): disassociation from end users; does not identify or allow for unknown users’ needs; unreliable domain-specific problem identification; HE does not necessarily result in evaluating the complete design, since there is no mechanism to ensure the entire design is explored and evaluators can focus too much on one section or another; the validity of Nielsen's guidelines has been questioned (Sears, 1997).

References (Harvard Style):

Nielsen, J. and Molich, R. (1990), Heuristic evaluation of user interfaces, CHI 90, ACM, Seattle (WA), pp. 249-256.

Nielsen, J. (1992), Finding usability problems through heuristic evaluation, CHI 92, pp. 373-380.

Nielsen, J. (1994), Heuristic evaluation, in Nielsen, J. and Mack, R. L. (eds.), Usability Inspection Methods, John Wiley & Sons, New York, pp. 25-62.

Muller, M. J. and McClard, A. (1995), Validating an extension to participatory heuristic evaluation: quality of work and quality of work life, CHI 95, ACM, Denver (CO), pp. 115-116.

Levi, M. D. and Conrad, F. G. (1996), A heuristic evaluation of a World Wide Web prototype, interactions, 3, 4, 50-61.

Sears, A. L. (1997), Heuristic walkthroughs: Finding problems without the noise, International Journal of Human-Computer Interaction, 9, 3, 213-234.

Muller, M. J., Matheson, L., Page, C. and Gallup, R. (1998), Methods & tools: participatory heuristic evaluation, interactions, 5, 5, 13-18.

Pointers:

Useit.com > Heuristic Evaluation > How to conduct a Heuristic Evaluation

Heuristic Evaluation System Checklist

How we do it: heuristic evaluation > Fourteen heuristics used in OCLC heuristic evaluations

From 9 Heuristics to 10 Heuristics

Web-Creators Users Group @ Stanford > Nielsens 10 Heuristics and Tognazzinis Principles

Interaction Design: beyond human-computer interaction. > Interactive Heuristic Evaluation Toolkit
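
The aggregation step described for heuristic evaluation above (evaluators inspect alone, then pool their findings) can be illustrated with a minimal Python sketch. The evaluator names, the findings, the 1-4 severity scale and the `aggregate` function are all hypothetical and only serve to show one possible way of merging independent reports; they are not taken from the original paper.

```python
from collections import defaultdict

# Hypothetical findings: each evaluator inspects the interface alone and
# records (dialogue element, violated heuristic, severity on an assumed 1-4 scale).
findings_by_evaluator = {
    "evaluator_a": [("login form", "error prevention", 3),
                    ("search box", "visibility of system status", 2)],
    "evaluator_b": [("login form", "error prevention", 4)],
    "evaluator_c": [("search box", "visibility of system status", 2),
                    ("help page", "help and documentation", 1)],
}


def aggregate(findings):
    """Merge the independent findings once every evaluation is complete."""
    merged = defaultdict(lambda: {"evaluators": set(), "severities": []})
    for evaluator, problems in findings.items():
        for element, heuristic, severity in problems:
            entry = merged[(element, heuristic)]
            entry["evaluators"].add(evaluator)
            entry["severities"].append(severity)

    report = [
        {
            "element": element,
            "heuristic": heuristic,
            "found_by": len(entry["evaluators"]),
            "max_severity": max(entry["severities"]),
        }
        for (element, heuristic), entry in merged.items()
    ]
    # Most severe and most widely reported problems first.
    report.sort(key=lambda r: (-r["max_severity"], -r["found_by"], r["element"]))
    return report


report = aggregate(findings_by_evaluator)
for row in report:
    print(row)
```

Keeping the per-evaluator lists separate until `aggregate` is called mirrors the requirement that evaluators do not communicate before all evaluations are finished.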


Method & Category: Cognitive Walkthrough (CW), inspection method

Description: A cognitive walkthrough is a task-oriented method with which the analyst explores the system functionalities, i.e. CW simulates step-by-step user behavior for a given task. The emphasis is put on cognitive theory, such as learnability, by analyzing the mental processes required of the users. Learnability can be supported during design by making the repertory of available actions salient, providing an obvious way to undo actions and offering limited alternatives (Lewis and Wharton, 1997). The background is derived from exploratory learning principles. Several versions of CW exist, including e.g. pluralistic walkthroughs, wherein end users, software developers and usability experts go through the system together, discussing every single dialogue element.

Advantages (Pros): independence from end users and from a fully functioning prototype; helps designers to take on a potential user’s perspective; effective identification of problems arising from interaction with the system; can help to define users’ goals and assumptions.

Disadvantages (Cons): possible tediousness and the danger of an inherent bias due to improper task selection; emphasis on low-level details; non-involvement of the end user.

References (Harvard Style):

Lewis, C., Polson, P., Wharton, C. and Rieman, J. (1990), Testing a Walkthrough Methodology for Theory-Based Design of Walk-Up-and-Use Interfaces, CHI 90, ACM, Seattle, 235-242.

Polson, P. G., Lewis, C., Rieman, J. and Wharton, C. (1992), Cognitive walkthroughs: a method for theory-based evaluation of user interfaces, International Journal of Man-Machine Studies, 36, 741-773.

Barnard, J. M. and Barnard, P. (1995), The case for supportive evaluation during design, Interacting with Computers, 7, 2, 115-143.

Lewis, C. and Wharton, C. (1997), Cognitive Walkthroughs, in Helander, M. (ed.), Handbook of Human-Computer Interaction, Second Edition, Elsevier, Amsterdam, 717-732.


Pointers:

Performing a Cognitive Walkthrough

Cognitive Walkthrough Example

Cognitive Walkthrough Procedure


Method & Category: Action Analysis (AA), inspection method

Description: The method is divided into formal and back-of-envelope action analysis; in both, the emphasis is more on what the practitioners do than on what they say they do. The formal method requires close inspection of the action sequences that a user performs to complete a task. This is also called keystroke-level analysis (Card et al., 1983). It involves breaking the task into individual actions such as move-mouse-to-menu or type-on-the-keyboard and calculating the times needed to perform the actions. The back-of-envelope analysis is less detailed and gives less precise results; however, it can be performed much faster. It involves a similar walkthrough of the actions a user will perform, with regard to physical, cognitive and perceptual loading. To understand this thoroughly we have to keep three terms apart. Goals are external tasks: we achieve goals. Tasks are those processes applied through some device in order to achieve the goals: we perform tasks. Actions are tasks with no problem-solving and no internal control structure: we do actions. The main problem of task analysis (Carroll, 2002) is the difficulty in accommodating complicated tasks completed by more than one individual. Furthermore, the representation of a task analysis is complex even when a simple task is studied, and tends to become unwieldy very rapidly. Such representations can often only be interpreted by those who conducted the analysis.

Advantages (Pros): precise prediction of how long a task will take; a deep insight into users’ behavior.

Disadvantages (Cons): very time-consuming; needs high expertise.

References (Harvard Style):

Card, S. K., Moran, T. P. and Newell, A. (1980), The keystroke-level model for user performance time with interactive systems, Communications of the ACM, 23, 7, 396-410.

Card, S. K., Moran, T. P. and Newell, A. (1983), The psychology of Human-Computer Interaction, Erlbaum, Hillsdale (NJ).

Norman, D. A. (1986), Cognitive engineering, in Norman, D. and Draper, S. (ed.), User Centered System Design: New Perspectives on Human-Computer interaction, Erlbaum, Hillsdale (NJ).

de Haan, G., van der Veer, G. C. and van Vliet, J. C. (1991), Formal modelling techniques in human-computer interaction, Acta Psychologica, 78, 1-3, 27-67.

Bourges-Waldegg, P. and Scrivener, S. A. R. (1998), Meaning, the central issue in cross-cultural HCI design, Interacting with Computers, 9, 3, 287-309.

Sutcliffe, A. G. and Carroll, J. M. (1999), Designing claims for reuse in interactive systems design, International Journal of Human-Computer Studies, 50, 3, 213-241.

Hackos, J. T. and Redish, J. C. (1998), User and task analysis for interface design, John Wiley & Sons, Inc., New York.

Pointers:

The Applied Cognitive Science Lab > Introduction to Human Factors > Task Analysis
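
The formal (keystroke-level) action analysis described above breaks a task into individual actions and adds up their times. The following Python sketch shows that arithmetic under stated assumptions: the operator times are the commonly cited averages from the keystroke-level model (Card et al., 1980) and vary considerably between users, while the two task breakdowns and the function name `predict_seconds` are hypothetical illustrations.

```python
# Commonly cited average operator times in seconds (keystroke-level model,
# Card et al., 1980). Treat them as rough estimates, not measurements.
OPERATOR_SECONDS = {
    "K": 0.2,   # press a key or mouse button (average skilled typist)
    "P": 1.1,   # point with the mouse at a target on the screen
    "H": 0.4,   # move the hands between keyboard and mouse
    "M": 1.35,  # mentally prepare for the next step
}


def predict_seconds(actions):
    """Sum the operator times for a sequence of (operator, count) pairs."""
    return sum(OPERATOR_SECONDS[op] * count for op, count in actions)


# Hypothetical task: rename an item via a menu, then type an 8-character name.
rename_via_menu = [
    ("H", 1),  # hand to mouse
    ("M", 1), ("P", 1), ("K", 1),  # decide, point at the menu, click
    ("M", 1), ("P", 1), ("K", 1),  # decide, point at "Rename", click
    ("H", 1),  # hand back to keyboard
    ("K", 8),  # type the new name
]

# Hypothetical alternative: a single keyboard shortcut, then the same typing.
rename_via_shortcut = [("M", 1), ("K", 1), ("K", 8)]

menu_time = predict_seconds(rename_via_menu)
shortcut_time = predict_seconds(rename_via_shortcut)
print(f"menu: {menu_time:.2f} s, shortcut: {shortcut_time:.2f} s")
```

With these assumed values the menu variant comes out at roughly 7.7 seconds and the shortcut variant at roughly 3.15 seconds, which is the kind of comparison the formal method is meant to support.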


Method & Category: Thinking Aloud (THA), test method

Description: Thinking aloud (Nielsen, 1994) may be the single most valuable usability engineering method. It involves having an end user continuously think out loud while using the system. By verbalizing their thoughts, the test users enable us to understand how they view the system, which in turn makes it easier to identify the end users' major misconceptions. By showing how users interpret each individual interface item, THA facilitates a direct understanding of which parts of the dialogue cause the most problems. In THA timing is very important: it is the contents of working memory that are desired, so retrospective reports are much less useful, since they rely on the users' memory of what they were thinking some time ago. A variant of THA is called constructive interaction and involves having two test users use a system together (co-discovery learning). The main advantage is that the test situation is much more natural than standard THA with single users working alone, since people are used to verbalizing their thoughts when trying to solve a problem together. Therefore, users may make more comments when engaged in constructive interaction than when simply thinking aloud for the benefit of an experimenter.

Advantages (Pros): reveals why users do something; a very close approximation to the individual usage; the provision of a wealth of data, which can be collected from a fairly small number of users; comments of the users often contain vivid and explicit quotes; preference and performance information can be collected simultaneously; helps some users to focus and concentrate; early clues can help to anticipate and trace the source of problems to avoid later misconceptions and confusion in the early stage of design.

Disadvantages (Cons): does not lend itself well to most types of performance measurement; the different learning style is often perceived as unnatural, distracting and strenuous by the users; non-analytical learners generally feel inhibited; time-consuming, since briefing the end users is a necessary part of the preparation.

Causing users to focus and concentrate is both an advantage and a disadvantage: it sometimes results in less-than-natural interaction, and users may work faster under THA because of their focus.

References (Harvard Style):

Duncker, K. (1945), On problem-solving, in Dashiell, J. F. (ed.), Psychological Monographs of the American Psychological Association, Vol. 58, APA, Washington (DC), pp. 1-114.

Nisbett, R. E. and Wilson, T. D. (1977), Telling More Than We Can Know: Verbal Reports on Mental Processes, Psychological Review, 84, 3, 231-259.

Lewis, C. and Mack, R. (1982), Learning to use a text processing system: Evidence from thinking aloud protocols, SIGCHI conference on Human factors in computing systems, Gaithersburg (MD), pp. 387-392.

Bereiter, C. and Bird, M. (1985), Use of thinking aloud in identification and teaching of reading comprehension strategies, Cognition and Instruction, 2, 131-156.

Gould, J. D. and Lewis, C. (1985), Designing for Usability: Key Principles and What Designers Think, Communications of the ACM, 28, 3, 300-311.

Bordage, G. and Lemieux, M. (1991), Semantic structures and diagnostic thinking of experts and novices, Academic Medicine: Journal of the Association of American Medical Colleges, 66, 9, 80-72.

Nielsen, J. (1994), Estimating the number of subjects needed for a thinking aloud test, International Journal of Human-Computer Studies, 41, 3, 385-397.

Spool, J. M., Snyder, C. and Robinson, M. (1996), Smarter usability testing: practical techniques for developing products, Conference companion on Human factors in computing systems: common ground, Vancouver, British Columbia, Canada, 365-366.

Waes, L. V. (2000), Thinking Aloud as a Method for Testing the Usability of Websites: The Influence of Task Variation on the Evaluation of Hypertext, IEEE Transactions on Professional Communication, 43, 4, 279-291.

Andrews, K. (2001), Web Usability on the Cheap, in Holzinger, A. (ed.), Human-Computer Interaction in the 21st Century, Austrian Computer Society, Vienna, pp. 83-95.

Holzinger, A. (2003), Experiences with User Centered Development (UCD) for the Front End of the Virtual Medical Campus Graz, in Jacko, J. A. and Stephanidis, C. (ed.), Human-Computer Interaction, Theory and Practice, Lawrence Erlbaum, Mahwah (NJ), pp. 123-127.

Pointers:

The Interaction and Presentation Laboratory > Thinking Aloud

Thinking Aloud Protocol

The Usability Methods Toolbox > Thinking Aloud Protocol

Method & Category: Field Observation (FO), test method

Description: Observation is the simplest of all methods. It involves visiting one or more users in their workplaces. Notes must be taken as unobtrusively as possible to avoid interfering with their work. Noise and disturbance can also lead to false results. Ideally, the observer should be virtually invisible to ensure normal working conditions. Sometimes video is used to make the observation process less obtrusive, but it is rarely necessary. Observation focuses on major usability catastrophes that tend to be so glaring that they are obvious the first time they are observed and thus do not require repeated perusal of a recorded test session. Considering that the time needed to analyze a videotape is approximately 10 times that of a user test, the time is better spent testing more subjects or testing more iterations of the design. Video is, however, appropriate in some situations. For example, a complete record of a series of user tests can be used to perform formal impact analysis of usability problems (Holzinger, 2003).

Another means of electronic observation is Data Logging, which involves collecting statistics about the detailed use of a system. Data Logging can provide extensive timing data, which is generally important in HCI and usability. Normally, logging is used as a way to collect information about the field use of a system after release, but it can also be used as a supplementary method of collecting more detailed data during user testing. Typically, an interface log will contain statistics about the frequency with which each user has used each feature in the program and the frequency with which various events of interest (such as error messages) have occurred.

Advantages (Pros): simple; examines real-life settings in real workplaces.

Disadvantages (Cons): applicable mainly in final testing, or at least requires prototypes; relatively many users needed (20+); required expertise is high.

References (Harvard Style):

Nielsen, J. and Phillips, V. L. (1993), Estimating the relative usability of two interfaces: heuristic, formal, and empirical methods, Conference on Human Factors and Computing, Amsterdam, The Netherlands, pp. 214-221.

Rowley, D. E. (1994), Usability testing in the field: bringing the laboratory to the user, SIGCHI conference on Human factors in computing systems: celebrating interdependence, Boston (MA), pp. 252-257.

Beyer, H. R. and Holtzblatt, K. (1995), Apprenticing with the customer, Communications of the ACM, 38, 5, 45-52.

Wixon, D. and Ramey, J. (1996), Field Methods Casebook for Software Design, John Wiley & Sons, New York.

Wood, L. E. (1996), The ethnographic interview in user-centered work/task analysis, Field methods casebook for software design, John Wiley & Sons, Inc, New York.

Brown, D. S. (1996), The challenges of user-based design in a medical equipment market, Field methods casebook for software design, John Wiley & Sons, Inc., New York.

Bauersfeld, K. and Halgren, S. (1996), “You've got three days!” Case studies in field techniques for the time-challenged, Field methods casebook for software design, John Wiley & Sons, Inc., New York.

Beyer, H. and Holtzblatt, K. (1998), Contextual design: defining customer-centered systems, Morgan Kaufmann Publishers Inc, San Francisco (CA).

Helms, J., Neale, D. C., Isenhour, P. L. and Carroll, J. M. (2000), Data Logging: Higher-Level Capturing and Multi-Level Abstracting of User Activities, Proceedings of the 40th annual meeting of the Human Factors and Ergonomics Society.

Wixon, D. R., Ramey, J., Holtzblatt, K., Beyer, H., Hackos, J., Rosenbaum, S., Page, C., Laakso, S. A. and Laakso, K.-P. (2002), Usability in practice: field methods evolution and revolution, CHI 02, Minneapolis (MN), pp. 880-884.
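
The interface log described under Data Logging above (how often each user used each feature, and how often events such as error messages occurred) can be sketched in a few lines of Python. The record layout, the user and feature names and the `summarize` function are hypothetical; a real logging component would define its own format.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (user, kind, name), where kind is either
# "feature" (the user invoked a feature) or "event" (something of interest
# happened, such as an error message being shown).
log = [
    ("u1", "feature", "print"),
    ("u1", "feature", "search"),
    ("u1", "event", "error_message"),
    ("u2", "feature", "search"),
    ("u2", "feature", "search"),
    ("u2", "event", "error_message"),
    ("u2", "event", "error_message"),
]


def summarize(records):
    """Count feature use per user and occurrences of each event overall."""
    feature_use = defaultdict(Counter)  # user -> feature -> frequency
    event_counts = Counter()            # event name -> frequency
    for user, kind, name in records:
        if kind == "feature":
            feature_use[user][name] += 1
        elif kind == "event":
            event_counts[name] += 1
    return feature_use, event_counts


feature_use, event_counts = summarize(log)
for user, counts in feature_use.items():
    print(user, dict(counts))
print(dict(event_counts))
```

Adding a timestamp to each record would be the natural extension for the timing data mentioned in the text; it is left out here to keep the sketch short.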

Method & Category: Questionnaires (Q), test method

Description: Many aspects of usability can best be studied by querying the users. This is especially true for issues concerning the subjective satisfaction of the users and their possible anxieties, which are hard to measure objectively. Questionnaires are useful for studying how end users use the system and which features they prefer, but designing them takes some experience. The questionnaire is an indirect method, since it does not study the actual user interface; it only collects the opinions of the users about the user interface. One cannot always take user statements at face value. Data about people's actual behavior should have precedence over people's claims of what they think they do.

A still simpler form of questionnaire is the Interview (I). The form of the interview can be adjusted to respond to the user and encourage elaboration.

Advantages (Pros): subjective user preferences, satisfaction and possible anxieties can be easily identified; can be used to compile statistics.

Disadvantages (Cons): indirect methods result in low validity (discrepancies between subjective and objective user reactions must be taken into account); needs sufficient responses to be significant (we are of the opinion that 30 users is the lower limit for a study); identifies only a low number of problems relative to the other methods.

References (Harvard Style):

Lewis, J. R. (1995), IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use, International Journal of Human-Computer Interaction, 7, 57-78.


Pointers:

Quantitative Usability Methods: Surveys

Web-Based User Interface Evaluation with Questionnaires by Gary Perlman
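
Compiling statistics from questionnaire answers, as mentioned above, can be sketched as follows. The 7-point rating scale, the sample scores and the function `summarize_item` are hypothetical; only the lower limit of 30 respondents comes from the text above.

```python
from statistics import mean, stdev

MIN_RESPONDENTS = 30  # lower limit for a study suggested in the text above


def summarize_item(scores, minimum=MIN_RESPONDENTS):
    """Summarize the ratings given to a single questionnaire item."""
    return {
        "n": len(scores),
        "mean": mean(scores),
        # The sample standard deviation needs at least two ratings.
        "stdev": stdev(scores) if len(scores) > 1 else 0.0,
        "enough_responses": len(scores) >= minimum,
    }


# Hypothetical ratings on an assumed 7-point satisfaction scale (30 respondents).
satisfaction_scores = [5, 6, 4, 7, 5, 6] * 5

summary = summarize_item(satisfaction_scores)
print(summary)

# A pilot with only three respondents is flagged as too small.
pilot = summarize_item([5, 6, 4])
print(pilot["enough_responses"])
```

The `enough_responses` flag only checks the sample size; it says nothing about the validity concerns listed under the disadvantages, which still have to be weighed against observed behavior.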


General References:

Bevan, N. (1995), Measuring Usability as Quality of Use, Software Quality Journal, 4, 115-130.
Card, S. K., Moran, T. P. and Newell, A. (1983), The psychology of Human-Computer Interaction, Erlbaum, Hillsdale (NJ).
Carroll, J. M. (2002), Making use is more than a matter of task analysis, Interacting with Computers, 14, 5, 619-627.
Holzinger, A. (2003), Application of Rapid Prototyping to the User Interface Development for a Virtual Medical Campus, IEEE Software
Marcus, A. (2002), Dare we define user-interface design?, interactions, 9, 5, 19-24.
Nielsen, J. (1994), Usability Engineering, Morgan Kaufmann, San Francisco.
Nielsen, J. and Mack, R. L. (1994), Usability Inspection Methods, Wiley, New York.
Shneiderman, B. (1997), Designing the User Interface, Third Edition, Addison-Wesley, Reading (MA).
Stephanidis, C., Salvendy, G., Akoumianakis, D., Arnold, A., Bevan, N., Dardailler, D., Emiliani, P. L., Iakovidis, I., Jenkins, P., Karshmer, A., Korn, P., Marcus, A., Murphy, H., Oppermann, C., Stary, C., Tamura, H., Tscheligi, M., Ueda, H., Weber, G. and Ziegler, J. (1999), Toward an Information Society for All: HCI challenges and R&D recommendations, International Journal of Human-Computer Interaction, 11, 1, 1-28.
Thomas, C. and Bevan, N. (1996), Usability Context Analysis: A Practical Guide, National Physical Laboratory, Teddington (UK).

General Pointers:

A Quiz Designed to Give You Fitts


Ben Shneiderman, Andreas Holzinger & Keith Andrews at the HCI 2003 Conference in Crete

© Dr. Andreas Holzinger | andreas.holzinger@medunigraz.at

Last changed: 12.08.10