October 18, 2019

Comments to the Department of Housing and Urban Development on Disparate Impact

Aaron Rieke, Logan Koepke, and Urmila Janardan

We argued that HUD’s proposed changes to its disparate impact rule would undermine crucial housing protections for vulnerable communities by reducing plaintiffs’ ability to address discriminatory effects arising from the use of algorithmic models.

Re: Reconsideration of HUD's Implementation of the Fair Housing Act's Disparate Impact Standard, Docket No. FR-6111-P-02

Upturn writes to provide comments in response to the above-docketed notice of proposed rulemaking (“NPRM”) concerning proposed changes to the disparate impact standard (the “proposed rule”) as interpreted by the U.S. Department of Housing and Urban Development (“HUD”).

Upturn is a 501(c)(3) non-profit organization that advances equity and justice in the design, governance, and use of digital technology. Upturn's staff has years of experience working in partnership with the nation’s leading civil rights and public interest organizations, and has developed unique expertise at the intersection of civil rights, law, and computer science.

The proposed rule would undermine crucial housing protections for vulnerable communities. It would eviscerate the ability of HUD and other plaintiffs to address discriminatory effects arising from the use of algorithmic models (hereinafter “models”), despite the fact that such models are “increasingly commonly used” in determining people's eligibility for a range of housing opportunities. It would effectively create a “special exemption” for parties who use such models, even though these models can have significant discriminatory effects. In sum, the proposed rule will likely result in harm to the very groups that the Fair Housing Act (“FHA”) seeks to protect, and is flatly incompatible with HUD's legal obligation to affirmatively further fair housing.

It is important to emphasize that the disparate impact doctrine is the most effective legal tool — and often the only legal tool — with which to combat discrimination arising from the use of models, in housing markets or elsewhere. Without the disparate impact doctrine, opaque, automated decisions will effectively rise above the law when there is no clear evidence of disparate treatment.

Upturn has endorsed a separate comment signed by a range of individuals and organizations with expertise in the fields of computer science, statistics, and digital and civil rights. We write separately here to further underscore some basic, technical facts that HUD must reconcile with any modification to its existing disparate impact rule.

1. Models can produce significant discriminatory effects, even when they do not rely on factors that are “substitutes or close proxies for protected classes.”

The proposed rule would allow a defendant using a model to defeat a disparate impact claim by “[p]rovid[ing] the material factors that make up the inputs used in the challenged model and show[ing] that these factors do not rely in any material part on factors that are substitutes or close proxies for protected classes under the Fair Housing Act and that the model is predictive of credit risk or other similar valid objective.”

At a basic level, this new proposed defense is overbroad because models that do not take “substitutes or close proxies for protected classes” as inputs can nevertheless have discriminatory effects. This is well-understood in the context of fair lending. For example, in 2003, Congress ordered the Federal Reserve Board ("FRB") and the Federal Trade Commission, in consultation with HUD, to conduct a study of the effects of credit history scoring, including negative or differential effects on protected classes. This substantial report was commissioned despite the fact that consumer credit files — the inputs for the models at issue — clearly did not contain “substitutes or close proxies” for protected classes. Nevertheless, Congress was interested in the effects of credit history scoring as a general matter. And rightly so: The FRB's report discovered, among other things, that the length of an individual’s credit history served as a proxy for age in ways that could not be easily addressed. The proposed rule appears to entirely disregard the need for this type of inquiry, which is only becoming more important as models become more common and more complex.

More significantly, the proposed defense is flawed as it fails to acknowledge the many phases of model development, including problem definition, data collection and labeling, model selection and training, data partitioning, and the model's actual deployment. It is widely understood that statistical models can inherit biases against protected classes at each of these steps, even when protected class attributes are not considered. Often, discrimination that arises within statistical models is not obvious: it can come from “subtle correlations discovered by training algorithms, and [is] therefore difficult to detect.” There is a rich and growing technical literature expounding on these and related issues.

These are not merely academic observations. There are many notable examples of models exhibiting a range of discriminatory effects in the real world. For example:

In the domain of criminal justice, the risk assessment instrument COMPAS has shown demonstrable bias against black individuals, resulting in longer prison sentences and harsher terms. COMPAS does not take race or ethnicity as an input, but disproportionately and incorrectly labels black individuals as highly likely to commit future crimes. There may not be one individual feature in COMPAS that is responsible for this disparity, but rather an interaction between inputs in addition to data sampling biases.

Today's facial recognition technologies are often significantly better at recognizing white faces than black and brown faces. This phenomenon has less to do with the inputs to the model than with skewed training data that includes a much larger volume of white, male faces than faces of any other race or gender. This is a clear example of how machine learning can create biased models as a result of incomplete or nonrepresentative training data.

As HUD is well aware, Facebook's ad platform “optimizes” the delivery of housing advertisements on its platform, above and beyond the targeting criteria specified by an advertiser. This optimization can lead to racially skewed delivery patterns despite the fact that Facebook does not collect data about its users' race. HUD has alleged this practice violates the FHA. However, it is difficult to envision how HUD could succeed in establishing a prima facie case under the strictures of this proposed rule.

In sum, models that exhibit discriminatory effects cannot be diagnosed merely by examining their inputs. On the contrary, to properly assess a model for discriminatory effects, an investigator will likely need to first understand the model's purpose, and then consult design documentation, training data, executable code, test results, and other artifacts.
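The point that a model can produce disparate outcomes without ever receiving a protected attribute as an input can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the applicant records are invented, the group names are hypothetical, and the one-line `approve` rule is a toy stand-in for a real scoring model. It loosely mirrors the credit history example discussed above, in which length of credit history tracked age.

```python
from collections import defaultdict

# Hypothetical applicants: (age_group, credit_history_years).
# These records are invented for illustration only.
APPLICANTS = [
    ("under_30", 1), ("under_30", 2), ("under_30", 3), ("under_30", 6),
    ("30_plus", 4), ("30_plus", 7), ("30_plus", 9), ("30_plus", 12),
]


def approve(credit_history_years: int) -> bool:
    """Toy model: the only input is credit history length; age is never seen."""
    return credit_history_years >= 5


def approval_rates(applicants):
    """Return the share of approved applicants within each group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, years in applicants:
        total[group] += 1
        approved[group] += approve(years)  # True counts as 1
    return {group: approved[group] / total[group] for group in total}


rates = approval_rates(APPLICANTS)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} approved")
```

In this invented data set, the toy model approves one in four younger applicants and three in four older applicants, even though a review of its inputs alone would find no protected attribute. Note also that the disparity only becomes visible because the group label is available to the person auditing the outcomes.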

2. Models can be complex, opaque, and difficult to assess — so it's unreasonable to offer blanket safe harbors based on vague or superficial criteria.

The proposed rule would allow a defendant using a model to insulate themselves from disparate impact claims by showing that, in part, “the challenged model is produced, maintained, or distributed by a recognized third party that determines industry standards” or “the model has been subjected to critical review and has been validated by an objective and unbiased neutral third party.”

These proposed defenses amount to safe harbors, which are severely out of step with adjacent regulatory regimes. For example, in the context of consumer credit, Regulation B defines requirements for “an empirically derived, demonstrably and statistically sound, credit scoring system.” This definition has become a helpful yardstick for the industry. However, unlike this proposed rule, Regulation B does not effectively create a blanket safe harbor. Rather, if a consumer credit scoring system conforms to the “empirically derived, demonstrably and statistically sound” definition, then that system is allowed to consider an applicant’s age directly, something a system would not otherwise be able to consider under ECOA. In the context of employment, the Uniform Guidelines on Employee Selection Procedures offer a nonbinding rule of thumb known as the “4/5ths rule” to help determine when an employer might be at risk of a disparate impact case. However, this rule is merely a heuristic: it is not sufficient to say whether or not an employer will ultimately be found liable for deploying a discriminatory selection procedure. Importantly, neither regime provides entities with the kind of blanket immunity contemplated by this proposed rule.
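For readers unfamiliar with the 4/5ths rule, the arithmetic can be sketched as follows. This is an illustrative sketch only: the applicant counts and group names are hypothetical, and the function names are ours rather than anything defined in the Uniform Guidelines. Each group's selection rate is divided by the highest group's rate, and a ratio below four-fifths is treated as a signal of possible adverse impact.

```python
from fractions import Fraction


def selection_rates(counts):
    """Map each group to selected / applicants, kept as an exact fraction."""
    return {
        group: Fraction(selected, applicants)
        for group, (selected, applicants) in counts.items()
    }


def impact_ratios(rates):
    """Divide each group's selection rate by the highest group's rate."""
    highest = max(rates.values())
    return {group: rate / highest for group, rate in rates.items()}


def below_four_fifths(ratios, threshold=Fraction(4, 5)):
    """Flag groups whose ratio falls under the four-fifths threshold."""
    return {group: ratio < threshold for group, ratio in ratios.items()}


# Hypothetical counts per group: (selected, applicants).
COUNTS = {"group_a": (60, 100), "group_b": (45, 100)}

ratios = impact_ratios(selection_rates(COUNTS))
flags = below_four_fifths(ratios)
print(ratios["group_b"], flags["group_b"])
```

With these invented counts, group_b's selection rate is three-quarters of group_a's, so it falls below the four-fifths threshold. The calculation is a screening step and nothing more: a ratio above the threshold does not establish that a procedure is lawful, and a ratio below it does not establish liability.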

Finally, as a practical and legal matter, FHA-covered entities should not enjoy immunity from disparate impact claims simply because they rely on a model provided by a third party. Such a policy would disincentivize covered entities from fully understanding and testing the models that they commission and deploy. Moreover, such an approach would create an accountability black hole, since not all model vendors are clearly covered by the FHA, and are likely to consider their models proprietary. Today, many prominent vendors, including FICO, will indemnify their customers (e.g., lenders), because those customers are ultimately responsible for their compliance with antidiscrimination laws. There is no reason to disturb this sensible structure of accountability.

3. There is no widely recognized definition of “substitutes or close proxies for protected classes.”

In two separate instances under the proposed rule, defendants can defeat a disparate impact claim in part by demonstrating that the challenged model does not “rel[y] in any material part on factors that are substitutes or close proxies for protected classes under the Fair Housing Act.”

Critically, the proposed rule does not define what constitutes a factor that might be a “close prox[y],” nor does it offer a guide as to how one might determine whether or not a factor, as a statistical matter, is a close proxy for a protected class covered under the Fair Housing Act. Ultimately, the absence of clarity on how to define or measure a close proxy for a protected class, as a statistical proposition, has clear consequences: courts across the country, when faced with these affirmative algorithmic defenses, will be enlisted en masse in adjudicating statistical disputes. Not only are judges ill-equipped to arbitrate such disputes, but the experts upon whom courts would rely would likely be unable to offer a consensus view, needlessly drawing courts into line-drawing exercises about what does and does not count as a close proxy.
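A short sketch shows why the lack of a definition matters in practice. It assumes, purely for illustration, that proxy strength is measured by the Pearson correlation between a feature and group membership; the proposed rule names no such measure, the data are invented, and the two cutoffs are hypothetical values that different experts might plausibly defend.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def is_close_proxy(r, cutoff):
    """Hypothetical test: call a feature a close proxy if |r| meets the cutoff."""
    return abs(r) >= cutoff


# Invented data: 1 marks membership in a hypothetical "under 30" group.
UNDER_30 = [1, 1, 1, 1, 0, 0, 0, 0]
CREDIT_HISTORY_YEARS = [1, 2, 3, 6, 4, 7, 9, 12]

r = pearson_r(UNDER_30, CREDIT_HISTORY_YEARS)
print(f"correlation: {r:.3f}")
print("close proxy at cutoff 0.5:", is_close_proxy(r, 0.5))
print("close proxy at cutoff 0.8:", is_close_proxy(r, 0.8))
```

On this invented data the correlation is roughly -0.71, so the same feature is a “close proxy” under one cutoff and not under the other. Choosing a different statistical measure altogether could change the answer again, which is the kind of dispute courts would be left to resolve.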

4. Assessing models for discriminatory effects often requires access to data about individuals' protected class status.

The proposed rule states that HUD does not “encourag[e] the collection of data with respect to race, color, religion, sex, handicap, familial status, or national origin” and that the “absence of any such collection efforts shall not result in any adverse inference against a party.”

This provision risks discouraging covered entities from developing robust processes to test their models for discriminatory effects. There is an ongoing debate about when protected class data should be collected and for what purposes. For example, in the context of consumer credit, the FRB twice considered amendments to Regulation B, which implements the Equal Credit Opportunity Act, that would allow voluntary collection of protected class data for non-mortgage loan applicants in order to surface discriminatory lending decisions. To the extent HUD wishes to clarify its stance on how and when FHA-covered entities should or should not collect data about protected attributes, it should do so in a separate proceeding.

It's important to recognize that the collection of protected class data for antidiscrimination purposes has a long and important history. Banks, hospitals, housing providers, and employers routinely collect or infer race data as a key tool to measure and remediate discriminatory effects. Indeed, federal laws and regulations often mandate such collection for the purpose of measuring and addressing disparities. One example is the Home Mortgage Disclosure Act, which has made important data mutually accessible to lending institutions and community organizations — resulting in positive outcomes for banks and borrowers alike.

Looking to the future, machine learning practitioners have emphasized that awareness of sensitive attributes, such as race and gender, can be critical to detecting and remediating bias in complex models. HUD should be open to exploring how these emerging methods might help affirmatively further fair housing, rather than preemptively shutting the door.
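The dependence of testing on protected class data can be made concrete with a small sketch. The records, group names, and field meanings below are all hypothetical. The sketch compares false positive rates (people wrongly flagged as high risk) across two groups, and shows that only a pooled figure can be computed when group labels are unavailable.

```python
from collections import defaultdict

# Hypothetical records: (group, flagged_high_risk, had_bad_outcome).
RECORDS = [
    ("group_a", True, False), ("group_a", True, False),
    ("group_a", False, False), ("group_a", False, False),
    ("group_a", True, True), ("group_a", False, True),
    ("group_b", True, False), ("group_b", False, False),
    ("group_b", False, False), ("group_b", False, False),
    ("group_b", True, True), ("group_b", True, True),
]


def false_positive_rate(rows):
    """Share of people without a bad outcome who were nonetheless flagged."""
    negatives = [flagged for _, flagged, bad_outcome in rows if not bad_outcome]
    return sum(negatives) / len(negatives)


def false_positive_rate_by_group(rows):
    """Compute the false positive rate separately for each group label."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[0]].append(row)
    return {group: false_positive_rate(members) for group, members in by_group.items()}


pooled = false_positive_rate(RECORDS)
by_group = false_positive_rate_by_group(RECORDS)
print("pooled:", pooled)
print("by group:", by_group)
```

In this invented data the pooled false positive rate is 37.5 percent, which conceals the fact that group_a is wrongly flagged at twice the rate of group_b. An entity that never collects or infers group membership can compute only the pooled figure.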


We appreciate the research assistance of Shazeda Ahmed and Brian Remlinger in drafting these comments.

For example, relevant to these proceedings, Upturn’s project on Economic Opportunity has produced research papers on discriminatory advertising (Ali, Muhammad, Piotr Sapiezynski, Miranda Bogen, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. “Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes.” Retrieved from: https://arxiv.org/abs/1904.02095) and hiring algorithms (Bogen, Miranda, and Aaron Rieke. Help wanted: an examination of hiring algorithms, equity, and bias. (Upturn, 2018). Retrieved from: https://www.upturn.org/reports/2018/hiring-algorithms/), testified before the House Committee on Financial Services Task Force on Financial Technology on equitable uses of new credit data for underwriting (Testimony of Aaron Rieke, Examining the Use of Alternative Data in Underwriting and Credit Scoring to Expand access to Credit (Task Force on Financial Technology, 2019). Retrieved from: https://financialservices.house.gov/uploadedfiles/hhrg-116-ba00-wstate-riekea-20190725.pdf), and filed an amicus brief in district court detailing the operation of Facebook’s advertising platform (Upturn’s Motion for Administrative Relief for Leave to File Brief as Amicus Curiae in Support of Plaintiff’s Opposition to Facebook’s Motion to Dismiss First Amended Complaint (US Northern District of California, 2018), Retrieved from: https://www.upturn.org/static/files/2018-11-16_Upturn_Facebook_Amicus.pdf.)

This comment focuses primarily on statistical models, i.e., models that express a mathematical relationship between variables. However, many of the arguments in this comment will also apply to traditional “algorithms,” i.e., a process or set of rules.

Proposed Rule, Supplementary Information.

Jung Hyun Choi, Karan Kaul, Laurie Goodman, FinTech Innovation in the Home Purchase and Financing Market (Urban Institute, 2019), Retrieved from: https://www.urban.org/research/publication/fintech-innovation-home-purchase-and-financing-market.

Proposed Rule, Supplementary Information.

See infra, Section 1.

42 U.S.C. § 3608(d), (e)(5).

See, Comments of the Center on Democracy and Technology, et al., Re: Reconsideration of HUD’s Implementation of the Fair Housing Act’s Disparate Impact Standard, (2018), Docket No. FR-6111-P-02.

Proposed Rule, § 100.500(c)(2)(i).

The Federal Reserve Board, Report to the Congress on Credit Scoring and Its Effects on the Availability and Affordability of Credit (2007), Retrieved from: https://www.federalreserve.gov/boarddocs/rptcongress/creditscore/.

Board of Governors of the Federal Reserve System, Report to the Congress on Credit Scoring and its Effects on the Availability and Affordability of Credit (2007), S-6.

David Lehr & Paul Ohm, Playing with the Data: What Legal Scholars Should Learn About Machine Learning, (U.C. DAVIS L. REV. 51, 2017), 653. (Observing that machine-learned systems “are the complicated outputs of intense human labor — labor from data scientists, statisticians, analysts, and computer programmers. From the moment these humans conceptualize a predictive task to the moment the running model is deployed, they exert significant and articulable influence over everything from how the data are cleaned to how simple or complex the algorithm’s learning process is. Along the way, they have the power to affect the running model’s accuracy, explainability, and discrimination.”)

See, e.g., Anupam Datta et al., Proxy Discrimination in Data Driven Systems, (arXiv:1707.08120, 2017), 1.

Ibid.

Julia Angwin et al., Machine Bias, (ProPublica, 2016), Retrieved from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

Joy Buolamwini and Timnit Gebru, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (Conference on Fairness, Accountability and Transparency, 2018), 77-91. (“The substantial disparities in the accuracy of classifying darker females, lighter female, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms”, 1).

Id.

See Tony Sun et al., Mitigating Gender Bias in Natural Language Processing: Literature Review (arXiv:1906.08976, 2019).

Secretary, United States Department of Housing and Urban Development v. Facebook, Inc., Charge of Discrimination, (2019), at 13, 19. Retrieved from: https://www.hud.gov/sites/dfiles/Main/documents/HUD_v_Facebook.pdf.

Muhammad Ali et al., Discrimination through optimization: How Facebook’s ad delivery can lead to skewed outcomes (2019), Retrieved from: https://arxiv.org/abs/1904.02095.

Secretary, United States Department of Housing and Urban Development v. Facebook, Inc., Charge of Discrimination, (2019), at 13, 19. Retrieved from: https://www.hud.gov/sites/dfiles/Main/documents/HUD_v_Facebook.pdf.

For a general overview of lessons and methods for scrutinizing automated decisions, see: Public Scrutiny of Automated Decisions: Early Lessons and Emerging Methods (Upturn and Omidyar Network, 2018), Retrieved from: https://www.omidyar.com/insights/public-scrutiny-automated-decisions-early-lessons-and-emerging-methods.

Proposed Rule, § 100.500(c)(2)(ii), (iii).

12 C.F.R. § 202.2(p) (2014).

See, David Skanderson & Dubravka Ritter, Fair Lending Analysis of Credit Cards, (FRB of Philadelphia-Payment Cards Center Discussion Paper, 2014), Retrieved from: https://www.philadelphiafed.org/-/media/consumer-finance-institute/payment-cards-center/publications/discussion-papers/2014/D-2014-Fair-Lending.pdf.

Uniform Guidelines on Employee Selection Procedures (EEOC, 1978), § 4.D. (“A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact.”)

Proposed Rule at § 100.500(c)(2)(i) and (iii).

Proposed Rule at § 100.5(d).

See generally: Winnie Taylor, Racial Discrimination and Monitoring Fair Lending Compliance: The Missing Data Problem in Nonmortgage Credit, (Review of Banking and Financial Law 31, 2011), 199.

National Consumer Law Center et al., Group Letter to Consumer Financial Protection Bureau Regarding Public Disclosure of New HMDA Data Points (2015), Retrieved from https://www.nclc.org/images/pdf/foreclosure_mortgage/predatory_mortgage_lending/letter-re-hmda-benefitsand-privacy.pdf.

See, e.g., Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel, Fairness Through Awareness, (Proc. of Innovations in Theoretical Computer Science, 2012), Retrieved from: https://arxiv.org/abs/1104.3913; Michael Feldman et al, Certifying and removing disparate impact, (Proc. 21st ACM KDD, 2015), Retrieved from: https://arxiv.org/abs/1412.3756; Jiahao Chen et al, Fairness Under Unawareness: Assessing Disparity When Protected Class Is Unobserved, (FAT* ’19: Conference on Fairness, Accountability, and Transparency, 2019), Retrieved from: https://doi.org/10.1145/3287560.3287594.