Inside a cavernous stone fortress in downtown Pittsburgh, attorney Robin Frank defends parents at one of their lowest points – when they risk losing their children.
The job is never easy, but in the past she knew what she was up against when squaring off against child protective services in family court. Now, she worries she’s fighting something she can’t see: an opaque algorithm whose statistical calculations help social workers decide which families should be investigated in the first place.
“A lot of people don’t know that it’s even being used,” Frank said. “Families should have the right to have all of the information in their file.”
From Los Angeles to Colorado and throughout Oregon, as child welfare agencies use or consider tools similar to the one in Allegheny County, Pennsylvania, an Associated Press review has identified a number of concerns about the technology, including questions about its reliability and its potential to harden racial disparities in the child welfare system. Related issues have already torpedoed some jurisdictions’ plans to use predictive models, such as the tool notably dropped by the state of Illinois.
According to new research from a Carnegie Mellon University team obtained exclusively by AP, Allegheny’s algorithm in its first years of operation showed a pattern of flagging a disproportionate number of Black children for a “mandatory” neglect investigation, when compared with white children. The independent researchers, who received data from the county, also found that social workers disagreed with the risk scores the algorithm produced about one-third of the time.
County officials said that social workers can always override the tool, and called the research “hypothetical.”
Child welfare officials in Allegheny County, the cradle of Mister Rogers’ TV neighborhood and the icon’s child-centric innovations, say the cutting-edge tool – which is capturing attention around the country – uses data to support agency workers as they try to protect children from neglect. That nuanced term can include everything from inadequate housing to poor hygiene, but is a different category from physical or sexual abuse, which is investigated separately in Pennsylvania and is not subject to the algorithm.
“Workers, whoever they are, shouldn’t be asked to make, in a given year, 14, 15, 16,000 of these kinds of decisions with incredibly imperfect information,” said Erin Dalton, director of the county’s Department of Human Services and a pioneer in implementing the predictive child welfare algorithm.
Critics say it gives a program powered by data mostly collected about poor people an outsized role in deciding families’ fates, and they warn against local officials’ growing reliance on artificial intelligence tools.
If the tool had acted on its own to screen in a comparable rate of calls, it would have recommended that two-thirds of Black children be investigated, compared with about half of all other children reported, according to another study published last month and co-authored by a researcher who audited the county’s algorithm.
Advocates worry that if similar tools are used in other child welfare systems with minimal or no human intervention – akin to how algorithms have been used to make decisions in the criminal justice system – they could reinforce existing racial disparities in the child welfare system.
“It’s not decreasing the impact among Black families,” said Logan Stapleton, a researcher at Carnegie Mellon University. “On the point of accuracy and disparity, (the county is) making strong statements that I think are misleading.”
Because family court hearings are closed to the public and the records are sealed, AP wasn’t able to identify first-hand any families who the algorithm recommended be mandatorily investigated for child neglect, nor any cases that resulted in a child being sent to foster care. Families and their attorneys can never be sure of the algorithm’s role in their lives either, because they aren’t allowed to know the scores.
Safer, faster
Incidents of potential neglect are reported to Allegheny County’s child protection hotline. The reports go through a screening process where the algorithm calculates the child’s potential risk and assigns a score. Social workers then use their discretion to decide whether to investigate.
The Allegheny Family Screening Tool is specifically designed to predict the risk that a child will be placed in foster care in the two years after they are investigated. Using a trove of detailed personal data collected from birth, Medicaid, substance abuse, mental health, jail and probation records, among other government data sets, the algorithm calculates a risk score of 1 to 20: The higher the number, the greater the risk.
Given the high stakes – skipping a report of neglect could end with a child’s death but scrutinizing a family’s life could set them up for separation – the county and developers have suggested their tool can help “course correct” and make the agency’s work more thorough and efficient by weeding out meritless reports so that social workers can focus on children who truly need protection.
The developers have described using such tools as a moral imperative, saying child welfare officials should use whatever they have at their disposal to make sure children aren’t neglected.
“There are children in our communities who need protection,” said Emily Putnam-Hornstein, a professor at the University of North Carolina at Chapel Hill’s School of Social Work who helped develop the Allegheny tool, speaking at a virtual panel held by New York University in November.
Dalton said algorithms and other predictive technologies also provide a scientific check on call center workers’ personal biases because the workers see the risk score when deciding if the case merits an investigation. If the case is escalated, Dalton said, the full investigation is carried out by a different social worker who probes in person, decides if the allegations are true and helps determine if the children should be placed in foster care.
CMU researchers found that from August 2016 to May 2018, the tool calculated scores that suggested 32.5% of Black children reported as being neglected should be subject to a “mandatory” investigation, compared with 20.8% of white children.
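For readers who want to check the arithmetic, the two rates reported by the researchers can be compared directly. The short sketch below is only an illustration, not the CMU team's method: the function name `disparity` is hypothetical, and the only inputs are the 32.5% and 20.8% figures cited above.

```python
# Rates reported for August 2016 to May 2018 (from the CMU findings cited above).
BLACK_RATE = 0.325  # share of Black children scored for "mandatory" investigation
WHITE_RATE = 0.208  # share of white children scored for "mandatory" investigation


def disparity(rate_a: float, rate_b: float) -> tuple[float, float]:
    """Return (gap in percentage points, ratio) of rate_a relative to rate_b.

    Hypothetical helper for illustration only.
    """
    gap_points = (rate_a - rate_b) * 100
    ratio = rate_a / rate_b
    return gap_points, ratio


gap_points, ratio = disparity(BLACK_RATE, WHITE_RATE)
print(f"Gap: {gap_points:.1f} percentage points")
print(f"Ratio: {ratio:.2f} times the white rate")
```

On those figures, the gap is 11.7 percentage points, and Black children were flagged at roughly one and a half times the rate of white children.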
In addition, the county confirmed to the AP that for more than two years, a technical glitch in the tool sometimes presented social workers with the wrong scores, either underestimating or overestimating a child’s risk. County officials said the problem has since been fixed.
The county didn’t challenge the CMU researchers’ figures, but Dalton said the research paper represented a “hypothetical scenario that is so removed from the manner in which this tool has been implemented to support our workforce.”
The CMU research found no difference in the percentage of Black families investigated after the algorithm was adopted. The study also found that workers were able to reduce the disparity produced by the algorithm.
The county says that social workers are always in the loop and are ultimately responsible for deciding which families are investigated because they can override the algorithm, even if it flags a case for mandatory investigation. Dalton said the tool would never be used on its own in Allegheny, and doubted any county would allow for completely automated decision-making about families’ lives.
“Of course, they could do that,” she said. “I think that they are less likely to, because it doesn’t make any actual sense to do that.”
Despite what the county describes as safeguards, one former contractor for the child welfare agency says there is still cause for concern.
“When you have technology designed by humans, the bias is going to show up in the algorithms,” said Nico’Lee Biddle, who has worked for nearly a decade in child welfare, including as a family therapist and foster care placement specialist in Allegheny County. “If they designed a perfect tool, it really doesn’t matter, because it’s designed from very imperfect data systems.”
Biddle is a former foster care kid turned therapist, social worker and policy advocate. In 2020, she quit, largely due to her growing frustrations with the child welfare system. She also said officials dismissed her concerns when she asked why families were originally referred for investigation.
“We could see the report and that decision, but we were never able to see the actual tool,” she said. “I would be met with … ‘What does that have to do with now?’”
In recent years, movements to reshape – or dismantle – child protective services have grown, as generations of dire foster care outcomes have been shown to be rooted in racism.
In a memo last year, the U.S. Department of Health and Human Services cited racial disparities “at nearly every major decision-making point” of the child welfare system, an issue Aysha Schomburg, the associate commissioner of the U.S. Children’s Bureau, said leads more than half of all Black children nationwide to be investigated by social workers. “Over surveillance leads to mass family separation,” Schomburg wrote in a recent blog post.
With discussions about race and equity looming large in child welfare circles, Putnam-Hornstein last fall took part in a roundtable of experts convened by the conservative American Enterprise Institute and co-authored a paper that slammed advocates who believe child welfare systems are inherently racist.
She said she collaborated with the group that suggested there are “racial disparities in the incidence of maltreatment” because she sees the need for reforms, and believes “that the adoption of algorithmic decision aids can help guard against subjectivity and bias.”
Some researchers worry that as other government agencies implement similar tools, the algorithms could be allowed to make some decisions on their own.
“We know there are many other child welfare agencies that are looking into using risk assessment tools and their decisions about how much fully to automate really vary,” said Stapleton. “Had Allegheny County used it as a fully automated tool, we would have seen a much higher racial disparity in the proportion of kids who are investigated.”
‘Lab rats’
A decade ago, the developers of Allegheny’s tool – Putnam-Hornstein and Rhema Vaithianathan, a professor of health economics at New Zealand’s Auckland University of Technology – began collaborating on a project to design a predictive risk model for New Zealand’s child welfare system.
Vaithianathan and colleagues prototyped a new child abuse screening model that proposed using national data to predict the risk that the child protection system would confirm allegations that a child had been mistreated by age 5. The plan was scrapped after documents revealed the Ministry of Social Development’s head sharply opposed the project, declaring: “These are children, not lab rats.”
The minister wasn’t the only one concerned. Emily Keddell, a professor of social work at Otago University in New Zealand who analyzed the tool in the peer-reviewed Critical Social Policy journal, found that it would likely have resulted in more Māori families being tagged for investigation, reinforcing “existing structural inequalities by contributing to the ongoing stigmatisation of this population.”
In response, Vaithianathan said that she and her collaborators are open to community criticism and committed to showing their work, even if jurisdictions decide against it. She added that she has worked extensively with Indigenous Māori researchers.
“We encourage agencies to listen to those critical voices and to make leadership decisions themselves,” she said.
Vaithianathan and Putnam-Hornstein said they have since expanded their work to at least half a dozen cities and counties across the United States and have explored building tools in Chile and Australia.
Brian Chor, a clinical psychologist and child welfare researcher at the University of Chicago’s Chapin Hall, said the pair are respected for confronting ethical and racial concerns in creating the tool. He also said that Pittsburgh was the perfect place to create a model algorithm for other public welfare agencies.
“Allegheny County is probably an early adopter where the stars seem to be aligned, where they have the data,” Chor said. “They have a solid recipe that I think is replicable.”
In several public presentations and media interviews, Vaithianathan and Putnam-Hornstein said they want to use public data to help families in need.
“We’re researchers and we’re trying to model what good, good approaches look like in this field,” Vaithianathan said in an interview. The developers also noted in a document sent to Pennsylvania’s Department of Human Services last year that demand for their tools had increased due to the pandemic, as the state weighed a proposal for a statewide tool that would cost $520,000 to develop and implement.
Vaithianathan has said the tool ultimately can help address racial bias, and has pointed to a 2019 Stanford University evaluation commissioned by Allegheny County that suggests it may have had a modest impact on some disparities.
“I’ve always felt that these are tools that have the opportunity to improve the quality of decision making,” Vaithianathan said at a November panel. “To the extent that they are used with careful guardrails around them, I think they also offer an opportunity for us to try and address some of those systemic biases.”
But when AP asked county officials to address Carnegie Mellon’s findings on the tool’s pattern of flagging a disproportionate number of Black children for a “mandatory” child neglect investigation, Allegheny County questioned the researchers’ methodology by saying they relied on old data.
The researchers reran the analysis using newer data to address the county’s concerns and reached many of the same conclusions.
In response to AP, Allegheny County provided research that acknowledges the tool has not helped with combating disparities in the rates at which Black and white child neglect cases are investigated. A recent unpublished analysis written by the developers themselves determined “no statistically significant effect of the algorithm on this disparity.”
“We don’t frame the entire decision-making process around race, though clearly it’s an important thing that we think about,” Dalton said.
Dalton said her team wants to keep improving the tool and is considering new updates, including adding available private insurance data to capture more information about middle class and upper income families, as well as exploring other ways to avoid needless interventions.
Dalton also downplayed the algorithm’s role in neglect investigations.
“If it goes into court, then there’s attorneys on both sides and a judge,” Dalton said. “They have evidence, right?”
Chor disagreed, saying Allegheny’s tool is applied at the most important point of the child welfare system.
“The very front end of child protection decision-making is understandably the most impactful decision that you can make on a child’s life, because once you come into contact with the hotline, with an investigator, then your chance of being removed, of course, is increased,” Chor said.
The latest version of the tool excludes information about whether a family has received welfare dollars or food stamps, data that was initially included in calculating risk scores. It also stopped predicting whether a child would be reported again to the county in the two years that followed. However, much of the current algorithm’s design remains the same, according to American Civil Liberties Union researchers who have studied both versions.
The county initially considered including race as a variable in its predictions about a family’s relative risk but ultimately decided not to, according to a 2017 document. Critics say even if race is not measured outright, data from government programs used by many communities of color can be a proxy for race. In the document, the developers themselves urged continuing monitoring “with regard to racial disparities.”
“If over a million dollars have been spent creating and maintaining this tool, only for call screeners to disagree with it, for racial disparities to stay essentially level, and for screen-ins to continue at unreasonably high rates, is that the best use of Allegheny County’s resources?” asked Kath Xu, an attorney at the ACLU.
Child welfare agencies in at least 26 states and Washington, D.C., have considered using algorithmic tools, and at least 11 have deployed them, according to a recent ACLU white paper by Xu and colleagues.
Little transparency, growing influence
Family law attorney Frank says she’s always worried about the lack of due process and secrecy surrounding Allegheny County’s child welfare algorithm. Some of her clients have asked if the system was surveilling them because they used public assistance or community programs, but she can’t answer.
“I just don’t understand why it’s something that’s kept in secret,” Frank said.
Once, Frank recalled, a judge demanded to know a family’s score, but the county resisted, claiming it didn’t want to influence the legal proceeding with the numbers spat out by the algorithm.
Bruce Noel, who oversees call screeners using Allegheny’s tool, said that while the risk score advises their decision on whether to launch an investigation, he is torn about sharing that information with families because of the tool’s complexity. He added that he is cognizant of the racial disparities in the underlying data, and said his team didn’t have much input into development.
“Given that our data is drawn from public records and involvement with public systems, we know that our population is going to garner scores that are higher than other demographics, such as white middle class folks who don’t have as much involvement with public systems,” Noel said.
Dalton said she personally doesn’t support giving parents their score because she worries it could discourage people from seeking services when they need them.
“I do think there are risks and I want the community to also be on board with … the risks and benefits of transparency,” Dalton said.
Other counties using algorithms are taking a different approach. Larimer County, Colorado, home to Fort Collins, is now testing a tool modeled on Allegheny’s and plans to share scores with families if it moves forward with the program.
“It’s their life and their history,” said Thad Paul, a manager with the county’s Child, Youth & Family Services. “We want to minimize the power differential that comes with being involved in child welfare … we just really think it is unethical not to share the score with families.”
In the suburbs south of Denver, officials in Douglas County, Colorado, are using a similar tool and say they will share scores with families who request them.
Oregon does not share risk score numbers from its statewide screening tool, which was first implemented in 2018 and inspired by Allegheny’s algorithm. The Oregon Department of Human Services – currently preparing to hire its eighth new child welfare director in six years – explored at least four other algorithms while the agency was under scrutiny by a crisis oversight board ordered by the governor.
It recently paused a pilot algorithm built to help decide when foster care children can be reunified with their families. Oregon also explored three other tools – predictive models to assess a child’s risk for death and severe injury, whether children should be placed in foster care and if so, where.
For years, California explored data-driven approaches to the statewide child welfare system before abandoning a proposal to use a predictive risk modeling tool Putnam-Hornstein’s team developed in 2019. The state’s Department of Social Services spent $195,273 on a two-year grant to develop the concept.
“During the project, the state also explored concerns about how the tool may impact racial equity. These findings resulted in the state ceasing exploration,” department spokesman Scott Murray said in an email.
Putnam-Hornstein’s team is currently working with one of the nation’s largest local child welfare systems in Los Angeles County as it pilots a related tool.
The embattled agency is being audited following high-profile child deaths, and is currently seeking a new director after its previous one stepped down late last year. The “complex-risk algorithm” helps to isolate the highest-risk cases that are being investigated, according to the county’s Department of Children and Family Services.
So far, the experiment has been limited to the Belvedere, Lancaster, and Santa Fe Springs offices, the agency said. The tool also has allowed the agency to generate and review reports about cases involving Black children and families who were deemed low-risk but were still investigated, and whose cases didn’t result in any conclusive or substantiated allegations, the county said.
In the Mojave Desert city of Lancaster, U.S. Census data shows 22% of the city’s child population is Black. In the first few months that social workers started using the tool, county data shows that Black children were the subject of nearly half of all the investigations flagged for additional scrutiny.
The county did not immediately say why, but said it will decide whether to expand the tool later this year.
Back in Pittsburgh, family law attorney Frank is still trying to untangle how, exactly, the county’s algorithm is impacting each client she shepherds through the system.
To find strength on the brutal days, she keeps a birthday calendar for the children she’s helped and sends them handwritten cards to remember times when things went right.
She’s still haunted by a case in which she says she heard a social worker discuss a mother’s risk score in court around 2018. The case ultimately escalated to foster care, but Frank has never been able to understand how that number influenced the family’s outcome.
County officials said they could not imagine how a risk score could end up in court.
“There’s no way to prove it – that’s the problem,” Frank said.
Associated Press reporter Camille Fassett contributed to this report.
This story, supported by the Pulitzer Center for Crisis Reporting, is part of an ongoing Associated Press series, “Tracked,” that investigates the power and consequences of decisions driven by algorithms on people’s everyday lives.