Bias analysis, detection and mitigation in deep learning-based language models
I am currently pursuing a PhD on biases in deep learning, focusing on the development of ethical AI systems and on enhancing AI transparency. My research aims to detect and mitigate biases in AI models to ensure fairness and equity. I am also dedicated to improving AI interpretability, striving to move beyond the "black box" nature of deep learning algorithms. By promoting ethical standards and fostering a clear understanding of AI, my work contributes to creating more reliable and trustworthy AI technologies. To achieve these goals, I combine machine learning, natural language processing, and data visualization techniques, developing tools and methodologies to detect and mitigate biases in deep learning models and to ensure they operate fairly and transparently.
TFM (Master's thesis): Confirmation bias analysis from massive social media data.
Recent advancements in large language models like GPT-4, Gemini, Llama 3, and Claude 3 have brought these systems closer to replicating human linguistic capabilities. These models, enhanced by vast amounts of data, improved computational resources, and increasingly complex architectures, are now capable of performing tasks that appear to require an understanding of language. However, this "understanding" is superficial, as the models operate on statistical correlations between tokens rather than true comprehension. Despite their usefulness, these models can also encode and perpetuate biases and prejudices, raising concerns about their impartiality. To address this issue, the paper proposes a technique for detecting where biases are embedded within the model's hidden states, focusing on smaller models with the intention of applying these findings to larger models.
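A minimal sketch of this kind of layer-wise inspection, assuming a small HuggingFace encoder (distilbert-base-uncased here) and a toy template corpus; the model, sentences, and probe are illustrative and not the paper's actual setup:

# Sketch: locating where a gender signal is most linearly accessible
# in a model's hidden states, via a tiny per-layer probe.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL = "distilbert-base-uncased"  # assumption: any small encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

# Toy corpus: identical sentences with the gendered subject swapped.
sentences = [("He works as a nurse.", 0), ("She works as a nurse.", 1),
             ("He works as an engineer.", 0), ("She works as an engineer.", 1)]

per_layer = {}  # layer index -> list of [CLS] vectors
labels = []
with torch.no_grad():
    for text, label in sentences:
        out = model(**tokenizer(text, return_tensors="pt"))
        labels.append(label)
        for i, h in enumerate(out.hidden_states):  # embeddings + each layer
            per_layer.setdefault(i, []).append(h[0, 0].numpy())

# The layer at which a linear probe best recovers the protected attribute
# indicates where that signal is embedded. With four sentences the scores
# are purely illustrative.
for i, feats in per_layer.items():
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(f"layer {i}: probe accuracy {probe.score(feats, labels):.2f}")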
A visualization tool developed during our investigation of bias in Spanish-language deep learning language models. The tool allows us to explore in detail the responses of the models to a set of template sentences, comparing the behavior of the models when the templates are presented with a context alluding to a man or to a woman. The data can be explored at several levels of detail, from the model output itself with its weights to results aggregated by category.
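At the finest level of detail, the comparison the tool supports looks roughly like the following sketch, assuming BETO (dccuchile/bert-base-spanish-wwm-cased) as the Spanish model; the template pair is made up for illustration and the tool's actual templates and categories are not reproduced here:

# Sketch: raw masked-LM predictions, with their weights, for paired
# templates that differ only in the gendered subject.
from transformers import pipeline

fill = pipeline("fill-mask", model="dccuchile/bert-base-spanish-wwm-cased")

templates = {
    "man":   "El hombre trabaja como [MASK].",
    "woman": "La mujer trabaja como [MASK].",
}

for context, template in templates.items():
    print(context)
    for pred in fill(template, top_k=5):
        print(f"  {pred['token_str']:<12} {pred['score']:.3f}")

Aggregating such per-template outputs over many templates, grouped by category, yields the coarser views the tool offers.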
Recent advances in artificial intelligence have made it possible to improve our everyday lives. However, these models capture the biases present in society and incorporate them into their knowledge. A model can produce vastly different results depending on attributes such as the subject's gender, race, or religion. Bias in AI is encompassed in study areas such as Fairness and Explainability.
Deep neural networks are the hegemonic approach in many machine learning areas, including natural language processing (NLP). These networks learn, in effect, a probability distribution of words and relations across the training collection used, inheriting the potential flaws, inconsistencies, and biases contained in such a collection. As pre-trained models have proved to be very useful approaches to transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias in a formal way and explore how it has been treated in several networks, in terms of detection and correction.
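The abstract does not reproduce the paper's formal definition, but one standard formalization of (un)fairness is demographic parity: a predictor $\hat{Y}$ is unbiased with respect to a protected attribute $A$ when

$$P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = a') \quad \text{for all } a, a',$$

i.e., the prediction is statistically independent of the protected attribute; bias can then be quantified as the deviation between these conditional probabilities.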