GoodGuide's health ratings methodology is grounded in the sciences of informatics and health risk assessment. We identify widely accepted, science-based methods that can be used to define the chemical safety or nutritional value of products.

Selecting Products and Companies to Rate

GoodGuide focuses on rating everyday household consumer products bought from offline or online retail outlets, such as supermarkets or e-commerce sites. Our core product categories are personal care, household chemical and food products.

Our goal is to rate the products that comprise the top 80% of current sales in a category, plus innovative products that are marketed as being healthy. We use a variety of sources to define our catalogue of available products, identify relevant brands and companies and collect information about product attributes required for our ratings system.

To identify, track and organize relationships between products, brands, companies, and product categories, GoodGuide follows informatics standards used to organize consumer product and corporate information. For example, we use standard UPC codes to identify unique products. We can then link our product records to retailer-specific product identifiers as well as respond to bar code scans from our mobile users. We supplement this with a custom classification system to organize products into categories, because there is no standardized method for grouping products into consumer-relevant product categories.
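As a rough illustration of how a UPC can serve as the key that ties a bar code scan to a product record, the sketch below validates a 12-digit UPC-A code with the standard check-digit rule and looks it up in a catalogue. The `ProductRecord` fields, the `lookup_scan` helper and the catalogue layout are hypothetical and are not taken from GoodGuide's systems.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def upc_a_check_digit(first_eleven: str) -> int:
    """Return the check digit for the first 11 digits of a UPC-A code."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(ch) for ch in first_eleven]
    # Digits in odd positions (1st, 3rd, ...) are weighted by 3, the rest by 1.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """Check that a string is a well-formed 12-digit UPC-A code."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])


@dataclass
class ProductRecord:
    """Hypothetical product record keyed by UPC."""
    upc: str
    name: str
    category: str  # custom, consumer-relevant category label
    retailer_ids: Dict[str, str] = field(default_factory=dict)  # retailer -> retailer-specific id


def lookup_scan(scanned_code: str, catalogue: Dict[str, ProductRecord]) -> Optional[ProductRecord]:
    """Resolve a scanned bar code to a record; None if malformed or unknown."""
    if not is_valid_upc_a(scanned_code):
        return None
    return catalogue.get(scanned_code)
```

Rejecting malformed codes before the lookup keeps scanning errors from being mistaken for products that are simply missing from the catalogue.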

Designing a Ratings System

GoodGuide uses “ontologies,” or structural frameworks for organizing information, to define “what matters” when assessing the health performance of a product or company. The major issues covered are summarized in our ratings overview. Our issue framework is derived from current standards of practice in the scientific domains relevant to assessing health impacts. For example, we track issues that mirror the standard output of chemical risk assessments or nutritional evaluations. Our reliance on the informatics systems that have been developed by scientific, regulatory or other authorities to address specific issues ensures that our system provides science-based ratings and can take advantage of standardized information generation.

For each issue, we then identify a set of “indicators” that provide evidence about how a product performs on that issue. Product-level indicators are based on attributes of a product related to its potential health impacts (e.g., the level of health concern about the ingredients of a personal care product).

Data availability is one of the most important criteria for selecting indicators. To ensure we have comparable information available for rating products, we require that indicator information be publicly available for the majority of rated products. Data availability influences GoodGuide's rating system in two important ways:

  • In many cases, data availability considerations require GoodGuide to rely on “screening-level” indicators rather than “data-intensive” indicators. In a world of perfect information, for example, product health ratings would be based on detailed health risk assessments that combine information about the health hazards of ingredients with data characterizing consumer exposure to those chemicals. Unfortunately, these data are almost never made available by manufacturers, so GoodGuide utilizes more readily ascertainable hazard indicators (e.g., the number of ingredients of health concern in a product).

  • Because the pervasive lack of transparency about product attributes undermines the public's ability to evaluate performance, GoodGuide has created a number of indicators that track data availability and impact product ratings. At the product level, Data Adequacy indicators track whether the specific data elements that are needed to assess a product's health impact are public. We penalize personal care or household chemical products missing complete ingredient lists in our scoring system because these products lack the data needed to assess chemical safety.
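The two kinds of indicator described above can be sketched in a few lines. This is only an illustration under stated assumptions: the flagged-ingredient names are placeholders, and the `ProductDisclosure` type and both helper functions are hypothetical rather than part of GoodGuide's system.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Placeholder names only; a real list would come from an authoritative hazard source.
INGREDIENTS_OF_CONCERN: Set[str] = {"example hazard a", "example hazard b"}


@dataclass
class ProductDisclosure:
    """Hypothetical record of what a manufacturer has made public."""
    name: str
    ingredients: Optional[List[str]]  # None when no ingredient list is published
    list_is_complete: bool = False


def has_adequate_data(product: ProductDisclosure) -> bool:
    """Data adequacy indicator: is a complete ingredient list public?"""
    return product.ingredients is not None and product.list_is_complete


def count_ingredients_of_concern(product: ProductDisclosure) -> Optional[int]:
    """Screening-level hazard indicator: number of flagged ingredients.

    Returns None when no list is published, so that missing data is not
    confused with a clean product.
    """
    if product.ingredients is None:
        return None
    return sum(
        1 for item in product.ingredients
        if item.strip().lower() in INGREDIENTS_OF_CONCERN
    )
```

Keeping "no data" (None) distinct from "zero ingredients of concern" is what lets a scoring step penalize missing disclosure separately from hazard.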

At the product level, our ratings system is designed to support comparisons of products within a product category. The evaluative framework used to assess personal care products contains a different set of issues and indicators than the framework used to assess food products: the former focuses on characterizing the health impacts of ingredients, while the latter focuses on characterizing the nutritional value of products.

Collecting Data

For each issue and its associated indicators, GoodGuide acquires data from institutions, governmental agencies, commercial data aggregators, non-governmental organizations, media outlets and corporations. See our data page for information about our data quality procedures, update frequency and error correction policies. Product-level information is typically obtained from a manufacturer's website or product labels. GoodGuide defines the data elements required by each rating indicator and employs automated data extraction and information organization tools to create structured data from online sources. Note that GoodGuide itself does not test products to generate the data we use in our ratings.

Scoring Indicators

Upon acquiring indicator data, we score observed values according to GoodGuide's standard scale ranging from 0 to 10, where 0 represents the lowest performance and 10 the highest.
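The text does not say how observed values are mapped onto the 0 to 10 scale. One common approach, shown here purely as an assumption, is a clamped linear rescaling between a chosen worst and best value. The function name and the `worst` and `best` anchors are hypothetical.

```python
def score_on_scale(value: float, worst: float, best: float) -> float:
    """Map an observed value linearly onto 0-10 (worst -> 0, best -> 10).

    Works in either direction: `worst` may be larger than `best`, as with a
    count of ingredients of concern, where fewer is better. Values outside
    the [worst, best] range are clamped to the ends of the scale.
    """
    if worst == best:
        raise ValueError("worst and best must differ")
    fraction = (value - worst) / (best - worst)
    return 10.0 * min(1.0, max(0.0, fraction))
```

For example, with `worst=5` and `best=0`, a product with 2 ingredients of concern lands 60% of the way from worst to best, giving a score of about 6.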

For product-level data, GoodGuide selects indicators and utilizes rating methods that vary by product category. Full details of the issues addressed and scoring methods used in different product categories are provided in the following pages:

Aggregating Indicator Scores to Generate Ratings

We roll up indicator scores into issue-specific groups (e.g., human health impacts, ingredient disclosure) to assign ratings. Not all issues and indicators are equally important. To generate a rating that accurately reflects the relative importance of different issues or indicators, we apply weights to issues and use different aggregation algorithms.

Our rating frameworks define what is known as a “value tree” in multi-attribute utility theory. Each set of indicators, sub-issues or major issues is organized hierarchically into “nodes.” For each node, we specify the weights or aggregation algorithm used to compile scores from the constituents of that node.
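A value tree of this kind can be sketched as a small recursive structure. The example below assumes the simplest case, a weighted average at every node. The `Node` class, the node names and the weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a value tree: a leaf indicator or a group of sub-nodes."""
    name: str
    weight: float = 1.0            # importance relative to sibling nodes
    score: Optional[float] = None  # 0-10 score, set only on leaf indicators
    children: List["Node"] = field(default_factory=list)

    def rating(self) -> float:
        """Return the leaf score, or the weighted average of the children."""
        if not self.children:
            if self.score is None:
                raise ValueError(f"leaf node {self.name!r} has no score")
            return self.score
        total_weight = sum(child.weight for child in self.children)
        weighted_sum = sum(child.weight * child.rating() for child in self.children)
        return weighted_sum / total_weight


# Illustrative tree: health impacts count three times as much as disclosure.
tree = Node("product", children=[
    Node("human health impacts", weight=3.0, children=[
        Node("indicator A", score=4.0),
        Node("indicator B", score=8.0),
    ]),
    Node("ingredient disclosure", weight=1.0, score=10.0),
])
```

Here the health node averages to 6, so the product rating is (3 × 6 + 1 × 10) / 4 = 7.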

Aggregation algorithms are used throughout our ratings system to combine sets of scores. Available methods include:

  • Maximum (select the highest score in a set). This is generally used in positive nodes that include certification indicators because this value promotes the most positive signal about an issue, without dilution due to inaction or no data on other indicators relevant to the same issue.
  • Minimum (select the lowest score in a set). This is generally used in negative nodes that include Hazard or Restriction indicators because this value promotes the most negative signal about an issue, without dilution by positive values on other indicators relevant to the same issue.
  • Mean (calculate the average of all scores in a set). This is generally used when aggregating scores from a set of positive and negative sub-nodes in order to allow real-world signals (from either quantitative metrics or compliance counts) to influence a score in either a positive or negative direction.
  • Preferred (select score from top available indicator in a rank ordered set of indicators). This is used in nodes where data sources or indicators have been rank ordered based on quality or relevance to an issue. It promotes the score from the best available source or indicator.
  • Matrix (apply a custom calculation to a set of indicators). This is used in product-level ratings when a set of indicators has to be combined using domain-specific rules to correctly characterize an issue. Prominent examples include the scoring rules applied to rate food products on their nutritional value; personal care products on their potential human health impact; and attribution of extra credit for product performance.
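The five methods listed above can be sketched as small functions over a set of 0 to 10 scores. This is a minimal illustration, not GoodGuide's implementation. The function names are hypothetical, missing data is represented as `None` by assumption, and the Matrix method is modelled as a caller-supplied rule because the actual domain-specific rules are not given here.

```python
from statistics import mean
from typing import Callable, Optional, Sequence


def aggregate_maximum(scores: Sequence[float]) -> float:
    """Maximum: keep the most positive signal in the set."""
    return max(scores)


def aggregate_minimum(scores: Sequence[float]) -> float:
    """Minimum: keep the most negative signal in the set."""
    return min(scores)


def aggregate_mean(scores: Sequence[float]) -> float:
    """Mean: let every score pull the result up or down."""
    return mean(scores)


def aggregate_preferred(ranked_scores: Sequence[Optional[float]]) -> float:
    """Preferred: first available score from indicators ranked best-first.

    None marks an indicator with no data (an assumption of this sketch).
    """
    for score in ranked_scores:
        if score is not None:
            return score
    raise ValueError("no indicator in the ranked set has a score")


def aggregate_matrix(scores: Sequence[float],
                     rule: Callable[[Sequence[float]], float]) -> float:
    """Matrix: apply a custom, domain-specific rule to the whole set."""
    return rule(scores)


def capped_mean_rule(scores: Sequence[float]) -> float:
    """Invented example rule: average, capped at 5 if any indicator scores 0."""
    average = mean(scores)
    return min(average, 5.0) if 0 in scores else average
```

With the scores `[0.0, 8.0, 10.0]`, Maximum gives 10, Minimum gives 0, Mean gives 6, and the invented capped rule gives 5, which shows how much the choice of method at a node can change the resulting rating.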

The Role of Value Judgments

Value judgments are unavoidable in rating systems, and GoodGuide's is no exception. Even the most scientifically grounded assessment requires value judgments about the relative importance of various issues and types of evidence, as well as the treatment of data gaps. We acknowledge that users can disagree over the relative weight given to different health hazards as there is no objective, correct solution to the problem of how to aggregate such disparate concerns.