While many individual technical descriptors exist to quantify and describe different kinds of acoustic phenomena, each describes only the technical aspects of a sound itself, without considering any additional non-acoustic context. Human perception, however, is strongly informed by this context. For example, humans have different expectations for the sound of an electric razor than for an internal combustion engine, even though both can be characterized by sound pressure level or a measure of roughness. No single technical descriptor works in all contexts as a gold standard that objectively determines whether a sound is “good.” Jury tests, however, provide a valuable means of capturing this context. To quantify the sound quality of a device effectively, the perceptual information obtained from the results of a jury test must be combined with one or more technical descriptors to provide a meaningful method of evaluation. This combination of perceptual data and technical descriptors ultimately forms a calculation rule, called a metric, which typically yields a single value that accurately characterizes how a sound will be perceived by a human. This paper describes a methodology for creating such metrics, beginning with the definition of a context in which to analyze a set of sounds. Appropriate methods of data acquisition are discussed, along with jury test creation and administration. Finally, statistical methods are described for jury test postprocessing and for selecting the technical descriptors used in the development of a metric.
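As an illustrative sketch only (the descriptors and functional form shown here are assumptions for the sake of example, not results of this work), such a metric is often expressed as a regression of jury ratings on a small set of technical descriptors, for instance

\[
\mathrm{SQ} = \beta_0 + \beta_1 N_5 + \beta_2 R,
\]

where $\mathrm{SQ}$ is the predicted jury rating of sound quality, $N_5$ is a percentile loudness, $R$ is a roughness measure, and the coefficients $\beta_i$ are fitted to the jury test responses. The actual selection of descriptors and the form of the calculation rule follow from the statistical methods described later in this paper.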