If you are a researcher, faculty member, assistant, or student who spends time in one of the University of Pennsylvania's laboratories, you probably know that some substances you work with may be hazardous and require special handling. In some instances, it might be obvious what substances are dangerous and how to keep yourself safe — wear gloves when handling hydrogen peroxide! — but other times the rules aren’t so clear. It’s the job of the Office of Environmental Health and Radiation Safety (EHRS) to develop, implement, and manage the systems that keep Penn’s labs as safe as possible. But in the last few years, the team has begun turning to chemistry librarian Judith Currano as a vital partner in these processes.
Why? In 2014, Penn announced that all Penn laboratories would be required to inventory their hazardous chemicals. Former Senior Lab Safety Specialist Kimberly Brown was tasked with leading this monumental project to gather inventories of all substances kept in individual laboratories across Penn’s campus. And that was only the beginning. She quickly realized that the information laboratories supplied was often inconsistent and incomplete, and that to create a genuinely thorough record of hazardous substances in Penn’s laboratories, she needed someone with experience searching for and gathering chemical information. That person was Judith Currano.
Today, the inventory includes over 130,000 containers stored across more than 1,300 lab rooms. I recently spoke with Currano and Brown to learn more about how they tackled this project, what it takes to keep this massive inventory up to date, and why understanding how to effectively search databases of chemical information is vital to keeping scientists safe.
Can you describe how you started collaborating on this inventory?
In 2020, Kimi [Brown] presented me with a list of substances that researchers had added to the system but that we didn’t have enough information on. My goal was to identify these unknown substances and determine whether or not they were hazardous.
There were about 5,800 substances in the initial list we gave you. We knew we couldn’t correct all of them, so we decided to correct only the high-hazard ones. But how do we figure out which ones are scary? That's when I thought, I don't know the answer, but maybe Judith does.
And I didn't know how to answer that question either. I could have looked up 6,000 chemicals, but that was just too much!
The first thing we had to do was look up the chemicals in PubChem, the National Library of Medicine’s public chemistry database, to determine whether they had known health and safety risks coded with certain GHS codes. GHS codes, from the Globally Harmonized System of Classification and Labelling of Chemicals, are standardized codes that describe the risk a chemical poses to humans or to the environment. Is it something that's going to kill you? Is it something that's going to irritate your skin? Is it something that's going to cause serious groundwater contamination?
I contacted my colleagues at PubChem, who told me exactly how to write a short script that would allow me to search for many substances simultaneously using their CAS Registry numbers. CAS Registry numbers are unique, one-to-one identifiers assigned by Chemical Abstracts Service (CAS) the first time a substance is published in a piece of literature that they index, and they are frequently used when purchasing substances because they are unambiguous identifiers.
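Currano’s actual script isn’t reproduced in the interview, but a batch lookup of this kind might look roughly like the minimal Python sketch below. It assumes PubChem’s PUG REST interface, which can resolve a CAS number through its name/synonym lookup; the helper names are hypothetical. Before querying, it checks each number’s final check digit, one of the features that makes CAS Registry numbers dependable identifiers: the check digit equals the sum of the preceding digits, weighted 1, 2, 3, … from the right, modulo 10.

```python
import json
import re
import time
import urllib.error
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# CAS format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Check the CAS Registry number format and its check digit."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def cid_lookup_url(cas: str) -> str:
    """Build a PUG REST URL that resolves a CAS number (as a synonym) to PubChem CIDs."""
    return f"{PUG_REST}/compound/name/{urllib.parse.quote(cas.strip())}/cids/JSON"


def parse_cids(payload: dict) -> list[int]:
    """Extract CIDs from a PUG REST IdentifierList response."""
    return payload.get("IdentifierList", {}).get("CID", [])


def lookup_cids(cas_numbers: list[str]) -> dict[str, list[int]]:
    """Resolve each valid CAS number to PubChem CIDs (requires network access)."""
    results: dict[str, list[int]] = {}
    for cas in cas_numbers:
        if not is_valid_cas(cas):
            continue  # skip malformed entries instead of sending bad queries
        try:
            with urllib.request.urlopen(cid_lookup_url(cas), timeout=30) as resp:
                results[cas] = parse_cids(json.load(resp))
        except urllib.error.HTTPError:
            results[cas] = []  # PubChem answers with an error status when nothing matches
        time.sleep(0.2)  # stay under PubChem's requested limit of about five requests per second
    return results
```

Separating the check-digit test from the network call means typos in a lab’s inventory (a transposed digit, say) are caught locally rather than turning into silent “no match” results.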
I used that script to get the set of substances from Kimi’s list that had CAS Registry numbers and a record in the PubChem Compound database. Then I took the list of GHS codes that Kimi had given me and ran a series of searches through PubChem that essentially said, ‘Give me every substance that has this particular code.’ I combined my searches to get a final list of the substances owned by Penn that had the relevant GHS codes and sent the results to Kimi's team.
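Combining searches this way is essentially set logic: OR together the per-code result sets, then AND that union with the substances Penn actually holds. The sketch below assumes the per-code searches have already been run, and its CIDs are made up for illustration. The three hazard codes shown (H300, H310, H330: fatal if swallowed, fatal in contact with skin, fatal if inhaled) are real GHS hazard statements, but the list Kimi’s team actually supplied isn’t given in the interview.

```python
def flag_high_hazard(
    inventory_cids: set[int],
    hits_by_code: dict[str, set[int]],
    codes: list[str],
) -> dict[int, list[str]]:
    """Return inventory CIDs that matched any of the given hazard codes,
    along with which codes each one matched."""
    # OR: every CID returned by at least one per-code search.
    any_code = set().union(*(hits_by_code.get(c, set()) for c in codes))
    # AND: keep only substances that are actually in the inventory.
    flagged = inventory_cids & any_code
    return {
        cid: sorted(c for c in codes if cid in hits_by_code.get(c, set()))
        for cid in sorted(flagged)
    }


if __name__ == "__main__":
    # Hypothetical data: CIDs in the inventory and per-code search results.
    inventory = {101, 202, 303, 404}
    hits = {"H300": {101, 999}, "H330": {202, 101}, "H310": {555}}
    print(flag_high_hazard(inventory, hits, ["H300", "H310", "H330"]))
```

Keeping the matched codes alongside each flagged CID makes it easy to tell a lab not just that a substance is dangerous but why (for example, that it is fatal if inhaled).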
We got a list of about 80 of those substances, and then we were able to correct those records in our inventory. Keep in mind, it was 80 substances, but each substance might have multiple containers related to it. Once we made those corrections, we contacted those labs and said, ‘We made this correction, and by the way, did you know you have this chemical that's fatal if it’s inhaled? And by the way, if you don't have a purpose for having it, would you like us to get rid of it for you?’ So many labs took us up on that, which was great. It had a real impact on the safety of the labs.
The importance of chemical safety in the lab is hopefully obvious to chemists. But for the rest of us, can you explain why creating this kind of inventory is so important?
There are two types of benefits that come with the chemical inventory. One is efficiency and economy in lab operations, and the other is safety and compliance.
If researchers know what chemicals they have and where to find them in their lab, they're not going to inadvertently purchase duplicates. That means they're going to save time and money. It should also mean fewer expired materials, and fewer unused or unneeded chemicals. That's especially useful to research groups in chemistry that have a lot of chemicals and are heavy chemical users, but it benefits the less chemically intensive labs too, because they can more easily borrow chemicals from other labs when they only need a small quantity of something.
When it comes to safety and compliance, we can use inventory information for benchmarking storage needs when we're doing lab renovations. We can use the inventory to send targeted communications to labs about hazardous substances. If we have a change in guidelines or if an incident happens, we can use our inventory to generate a list of labs that have that chemical and message them about it.
We also use the inventory in lab moves and clean outs. We can use this to redistribute unwanted chemicals to other groups, so we’re not disposing of everything or making it a free-for-all, which isn't really a responsible way to redistribute those materials.
But all of those benefits require us not just to have the inventory, but to keep it up to date, accurate, and available. If we can't trust the inventory, then we can't have any of those benefits.
What is the process for keeping the inventory up to date?
One of the things we do is focus our efforts on the high-rise biomedical buildings. That’s because fire codes place limits on hazardous material storage, and those limits decrease precipitously as you go up through the floors of a building or down below grade.
We’ve got an urban campus, which means we have tall buildings with labs on, say, a 13th floor. We really have to be careful about monitoring what's stored there, and it's very easy for people to forget to scan chemicals out when they remove them from the lab, or to scan them in when they receive them. So, on a regular schedule, we have a technician go to those buildings, physically rescan the inventory, and then do a reconciliation process in the software.
Not only are we able to clean up the inventory, but we can also look for places where we can educate labs on how to better manage their inventories. We can tell them, ‘This many chemicals weren't barcoded.’ Or, ‘These were never scanned out, and here are instructions for how to do that.’ That's really time-intensive, but pretty important.
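The interview doesn’t describe how the inventory software reconciles a rescan, but the core comparison can be sketched as two set differences between the barcodes on record and the barcodes the technician actually scanned. This is a hypothetical illustration with made-up barcodes; containers that were never barcoded can’t be scanned at all, so they would still have to be counted by hand.

```python
def reconcile(recorded: set[str], scanned: set[str]) -> dict[str, set[str]]:
    """Compare the barcodes on record for a room with the barcodes found on a rescan."""
    return {
        # On record but not found: likely used up or removed, never scanned out.
        "missing_from_shelf": recorded - scanned,
        # Found on the shelf but not on record: likely never scanned in.
        "missing_from_records": scanned - recorded,
        # Present in both: the record is confirmed.
        "confirmed": recorded & scanned,
    }


if __name__ == "__main__":
    # Hypothetical barcodes for a single lab room.
    print(reconcile({"B1", "B2", "B3"}, {"B2", "B3", "B4"}))
```

Each bucket maps to a different follow-up message for the lab, which is where the education opportunity comes in.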
We can also electronically monitor for things like people forgetting to enter the barcodes, or forgetting to put a location, or odd container sizes. We can say, ‘Hmm, if you have that many kilograms of that substance, that's a problem. Are you sure you didn't put that in wrong?’
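Checks like these are straightforward validation rules applied to each container record. The sketch below is hypothetical: the record fields, the per-substance quantity ceilings, and the default ceiling are illustrative assumptions, not Penn’s actual rules or fire-code limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerRecord:
    container_id: str
    substance: str
    barcode: Optional[str]
    location: Optional[str]
    quantity_kg: float


def find_problems(
    record: ContainerRecord,
    max_kg: dict[str, float],
    default_max_kg: float = 20.0,  # hypothetical fallback ceiling
) -> list[str]:
    """Return human-readable problems with one container record."""
    problems = []
    if not record.barcode:
        problems.append("missing barcode")
    if not record.location:
        problems.append("missing location")
    limit = max_kg.get(record.substance, default_max_kg)
    if record.quantity_kg <= 0:
        problems.append("non-positive quantity")
    elif record.quantity_kg > limit:
        # An implausibly large amount is more often a data-entry slip than reality.
        problems.append(
            f"quantity {record.quantity_kg} kg exceeds expected maximum {limit} kg"
        )
    return problems
```

A record such as `ContainerRecord("C1", "acetone", None, "Room 101", 500.0)` checked against a hypothetical 4 kg ceiling for acetone would come back with both a missing-barcode flag and an oversized-quantity flag, prompting exactly the “Are you sure you didn’t put that in wrong?” follow-up.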
Even if the inventory system were fully automated, it wouldn't be perfect. Whenever two systems communicate with one another, you can have problems with information being transferred to the wrong fields of a record or being parsed oddly by the receiving system. Unless we were able to get all companies everywhere to agree with all other companies everywhere on how things should be input, I don't think it would ever be perfect in this world.
This really gets at my next question! What other challenges have you run into while creating and maintaining this inventory?
An inherent challenge is that much of the chemical safety information online pertains to pure chemical substances, and that isn't always what the chemists or researchers are working with. They're trying to enter the name of a product: mixtures, solutions, different salt forms. And trying to associate that with a specific chemical with known hazards is a sticky problem.
I relied a lot on Kimi’s knowledge about how certain changes to the molecule would change its safety profile. For example, is the salt form going to be more hazardous than the neutral form? And in many cases the databases don’t index different forms of the same molecule separately, even though they may have different uses and properties. I was sending a lot of emails to Kimi’s team saying, ‘OK, I found all these neutral substances. I have no idea if the salt is going to be hazardous, but...’
This has always been a problem for us over here in the chemistry library. It frustrates some of my colleagues because they want to know the behavior of something that, say, has a positive charge over here, and a negative charge over there. Technically the whole substance is neutral, but the way it is represented gives really important clues as to its behavior.
Reactive chemical hazards are another issue. For example, if you mix two things, are you likely to get a reaction that is hazardous or that could get out of control? And that information historically has not been recorded well.
People like to write in their articles, ‘I mixed this with this and got this,’ because they figure that’s what people want to know. But people also want to know that they are going to be safe while they are making it. It's only relatively recently that people have started to say, ‘Gee, I think I should probably be recording this.’
Judith, when we’ve spoken before, you talked about the importance of not relying solely on an algorithm to conduct searches. Can you elaborate on that?
I like to describe this as degrees of interpretation. Let’s say I’m a chemist. I do my experiment, I make a chemical, I observe it. That leads me to discover its structure, and then I encode that structure in a standardized way, right? That’s one degree of interpretation. Then I publish my paper that has my representation of the information, and it goes to a database where either a human or a machine picks out certain pieces of information from my paper and encodes them for retrieval in the database. So now we have another degree of interpretation.
Then we have the search system, which is coded by someone to work in a particular way using a particular algorithm. Now we have an algorithmic interpretation.
Meanwhile, we've got our researcher over here, and they are very interested in finding something in the search system. They take whatever's in their head and they use whatever tools the database has given them, and they run a search. Then the algorithm grabs that information and it parses it according to its instructions and attempts to match what it thinks it has received with what it sees within the database.
But the researcher won’t always find what they’re looking for. When you’re searching, it's really important to remember that you are searching for a particular string of information that is being interpreted by an algorithm. The algorithm may search for variants of the word drawn from its own built-in dictionary, but if I don't know what that algorithm is doing, I'm going to have to try to reverse engineer it.
That's something that novice searchers don't do very well. If they do a search and don't find anything, they might say, ‘There isn't anything.’ Or they do a search, find something, and say, ‘Oh great, I found everything.’ That's why we teach a course in our department that all PhD students are required to take: finding chemical information is not easy.
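A toy example makes the point concrete. The sketch below is hypothetical and far simpler than any real chemical database’s search engine: an exact-match search for “sodium chloride” misses a record stored as “NaCl” unless the system’s internal variant dictionary happens to link the two. A searcher who can’t see that dictionary only sees the empty or partial result.

```python
from typing import Optional


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(term.lower().split())


def search(
    records: list[str],
    query: str,
    variants: Optional[dict[str, list[str]]] = None,
) -> list[str]:
    """Return records whose normalized name matches the query or a known variant."""
    terms = {normalize(query)}
    if variants:
        # The query is expanded only with variants the dictionary happens to contain.
        terms |= {normalize(v) for v in variants.get(normalize(query), [])}
    return [r for r in records if normalize(r) in terms]


if __name__ == "__main__":
    records = ["Sodium chloride", "NaCl", "Potassium chloride"]
    print(search(records, "sodium chloride"))
    print(search(records, "sodium chloride", {"sodium chloride": ["NaCl", "table salt"]}))
```

The first call finds only `Sodium chloride`; the second also finds `NaCl`, but only because the hypothetical dictionary knew about that synonym.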
And these are exactly the principles we’re using to maintain the chemical inventory. I guarantee you, because I’ve asked, that none of the other customers of this product have done what Penn is doing to fill this gap. We recognized the problem and we went, ‘Wait a second, I think we're missing a whole bunch of information here.’
Rebecca, I think you can now see why I saw that problem and thought, ‘I know who to call to help me.’