NURA (curated NUclear Receptor Activity dataset): curated dataset of nuclear receptor modulators. It contains bioactivity annotations for 15,206 molecules and 11 selected Nuclear Receptors (NRs) obtained by integrating and curating data from toxicological and pharmacological databases.
ChemTastesDB: database of curated information on 2944 molecular tastants. For each molecule, the following information is provided: name, PubChem CID, CAS registry number, canonical SMILES string, and taste class. The molecular structure of each chemical is also provided in HyperChem (.hin) format. The database is available on Zenodo: https://zenodo.org/record/5747393#.Yhnx8ujMKUk
LC-MS/MS to fingerprints dataset: data to reproduce the models proposed in the following manuscript: Multi-task neural networks and molecular fingerprints to enhance compound identification from LC-MS/MS data, submitted to Molecules (2022) [link]. In this study, deep-learning-based approaches to predict molecular fingerprints and retrieve the structure of unknown compounds from their LC-MS/MS spectra have been developed.
Acute oral toxicity: MATLAB code and data to reproduce the QSAR models proposed in the following manuscript: D. Ballabio, F. Grisoni, V. Consonni, R. Todeschini (2019), Integrated QSAR models to predict acute oral systemic toxicity, Molecular Informatics, 38, 1800124 [link]
AR Binding (CoMPARA project): MATLAB code and data to reproduce the QSAR models proposed in the following manuscript: F. Grisoni, V. Consonni, D. Ballabio (2019), Machine Learning Consensus to Predict the Binding to the Androgen Receptor within the CoMPARA project, Journal of Chemical Information and Modeling, 59, 1839-1848 [link]
Mechanisms of bioconcentration: QSAR dataset of 779 compounds, 9 molecular descriptors and 3 mechanistic classes of bioconcentration; experimental BCF and KOW values are also provided.
Biodegradation: QSAR dataset containing 41 molecular descriptors used to classify 1055 chemicals into 2 classes (readily and not readily biodegradable). The dataset is available at the UCI Data Repository. Details on the dataset can be found in the literature: Mansouri, K., Ringsted, T., Ballabio, D., Todeschini, R., Consonni, V. (2013). Journal of Chemical Information and Modeling, 53, 867-878 [link]
Acute aquatic toxicity to Daphnia magna: QSAR dataset consisting of 546 organic molecules to predict acute aquatic toxicity towards Daphnia magna.
Acute toxicity to fish: QSAR dataset consisting of 908 organic molecules to predict acute fish toxicity towards Pimephales promelas (fathead minnow).
Cytochrome P450 – Drug interaction: QSAR datasets consisting of more than 11,900 drug-like compounds to evaluate the CYP3A4/CYP2C9 – drug interaction.
The CYP3A4 dataset has been used as tutorial data for the chapter: Grisoni, F., Ballabio, D., Todeschini, R., Consonni, V. (2018) Molecular Descriptors for Structure – Activity Applications: A Hands-On Approach. In: Nicolotti O. (eds) Computational Toxicology. Methods in Molecular Biology, vol 1800. Humana Press, New York, NY [link]
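Several of the QSAR datasets listed above, such as Biodegradation, pair a fixed number of molecular descriptors with one class label per chemical. The following is a minimal sketch of how such a table might be read in Python. It assumes a header-less, semicolon-delimited text layout with the label in the last column; the function name `load_descriptor_table`, the toy values, and the labels `RB`/`NRB` are hypothetical and used only for illustration, so the actual file layout should be checked against the dataset's own documentation.

```python
import csv
import io
from collections import Counter


def load_descriptor_table(text, n_descriptors=41, delimiter=";"):
    """Parse rows of n_descriptors numeric values followed by one class label.

    Assumed layout (not confirmed by the source): no header row, one chemical
    per line, descriptors first, class label last.
    """
    X, y = [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if len(row) != n_descriptors + 1:
            raise ValueError(
                f"expected {n_descriptors + 1} fields, got {len(row)}"
            )
        X.append([float(value) for value in row[:n_descriptors]])
        y.append(row[n_descriptors].strip())
    return X, y


# Toy stand-in with 3 descriptors per row; values and labels are made up.
sample = "3.9;2.6;0;RB\n4.1;1.9;2;NRB\n5.0;3.1;1;NRB\n"

X, y = load_descriptor_table(sample, n_descriptors=3)
class_counts = Counter(y)  # number of chemicals per class label
print(len(X), "chemicals;", dict(class_counts))
```

For a real file, the text could be read with `open(path).read()` and passed with the default `n_descriptors=41`; the strict field-count check is a deliberate choice so that a wrong delimiter or an unexpected header fails loudly rather than silently producing a malformed table.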