Making the Internet a Safer Place with AI

Guest Post by Karina Halevy, Stanford AI4ALL ’16

Header image: Karina presenting Humanly at the Research Fellowship Program Celebration

AI4ALL Editor’s Note: Karina Halevy participated in AI4ALL’s first AI Research Project Fellowship program, where she was paired with a mentor who works in AI to collaborate on an AI research project over the course of 3 months. She explains her project — a system that uses natural language processing to detect cyberbullying in text — and key findings below. Learn more about AI4ALL’s Research Project Fellowship Program and other mentee projects here.


Over the twelve weeks of the AI4ALL mentorship program, I worked with IBM’s CTO of Applied AI to create Humanly, a natural language processing system designed to detect cyberbullying in text.

Above all, the program was a wonderful learning experience. On the technical front, I learned how to implement support vector machines and Naive Bayes algorithms, how to refine default machine learning models by tuning parameters, how to use Python to turn text into vectors, and how to calculate evaluation metrics such as accuracy, precision (macro and micro), and runtime. On the research front, I learned how to glean and synthesize information from technical papers, and how to run experiments, compare results, and prioritize evaluation metrics in context. I also appreciated the experience of working through a challenging and multifaceted problem while applying technical concepts to tackle a pervasive social issue.

The technical and non-technical guidance I received from my mentor, from AI4ALL, and from the larger network of other mentors and mentees was invaluable.

Using the working definition of cyberbullying as “willful and repeated harm inflicted through the use of computers, cell phones, and other electronic devices,” I experimented with an array of machine learning models to create a program that could detect and address cyberbullying in real time.

As someone who avidly uses social media for personal, academic, and professional communication, as well as to stay informed, I believe that having welcoming, safe, and constructive dialogue platforms is paramount to fruitful conversations. Trolls, bots, and others who post toxic speech on platforms such as Facebook and Twitter not only undermine this culture of constructive dialogue but also cause harm on a personal level, in severe cases triggering depression and self-harm.

My model followed this general pipeline: load a piece of text, turn it into a numerical vector with a vectorizer, run that vector through a classification algorithm, and determine whether the text is toxic, severely toxic, identity-attacking, insulting, threatening, obscene, or any combination thereof.
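Here is a minimal sketch of that pipeline shape, assuming scikit-learn; the toy texts, labels, and variable names below are placeholders for illustration, not data or code from the actual project.

```python
# A minimal pipeline sketch: text -> vector -> multi-label classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Placeholder training rows; real training data has positive examples
# for every label.
texts = ["have a wonderful day", "you pathetic idiot", "I will hurt you"]
labels = [
    [0, 0, 0, 0, 0, 0],  # clean
    [1, 0, 0, 0, 1, 0],  # toxic + insult
    [1, 0, 0, 1, 0, 0],  # toxic + threat
]

vectorizer = TfidfVectorizer()                 # text -> numerical vector
classifier = OneVsRestClassifier(LinearSVC())  # one binary classifier per label
classifier.fit(vectorizer.fit_transform(texts), labels)

# Predicted flags for a new piece of text (any combination can be active).
flags = classifier.predict(vectorizer.transform(["you are an idiot"]))[0]
print(dict(zip(LABELS, flags)))
```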

In my experiment, I tried the support vector machine (SVM) and Naive Bayes algorithms. Within SVMs, I tried the linear, RBF, sigmoid, and polynomial kernels. I also tried different vectorizers: the TFIDF vectorizer, the hashing vectorizer, the count vectorizer, and custom feature engineering, all in Python using scikit-learn. To train my models, I used textual data from the Kaggle Perspective API Challenge, which contained strings of text along with binary labels for the six bullying categories in question. To compare the performance of these algorithm-vectorizer combinations, I measured each combination’s runtime and accuracy, as well as its precision, recall, and F1 score under both macro- and micro-averaging. My results indicated that the SVM with the linear kernel (namely LinearSVC) was the quickest and most accurate; within that, the hashing vectorizer and the TFIDF vectorizer had comparable accuracies.
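A rough sketch of such a comparison loop is below, again assuming scikit-learn. The toy texts and labels stand in for the Kaggle data; a real run would load the full competition CSV and score a held-out split rather than the training set.

```python
import time

from sklearn.feature_extraction.text import (CountVectorizer,
                                             HashingVectorizer,
                                             TfidfVectorizer)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC, LinearSVC

# Placeholder data standing in for the Kaggle comments and labels.
texts = ["have a wonderful day", "you pathetic idiot",
         "I will hurt you", "thanks for the help"]
labels = [[0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0],
          [1, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0]]

vectorizers = {
    "count": CountVectorizer(),
    "tfidf": TfidfVectorizer(),
    # alternate_sign=False keeps features non-negative so that
    # Naive Bayes can use the same matrix as the SVMs.
    "hashing": HashingVectorizer(alternate_sign=False),
}
classifiers = {
    "linear_svm": LinearSVC(),
    "rbf_svm": SVC(kernel="rbf"),  # sigmoid and poly kernels work the same way
    "naive_bayes": MultinomialNB(),
}

for v_name, vectorizer in vectorizers.items():
    X = vectorizer.fit_transform(texts)
    for c_name, base in classifiers.items():
        model = OneVsRestClassifier(base)  # one binary classifier per label
        start = time.time()
        model.fit(X, labels)
        predictions = model.predict(X)
        runtime = time.time() - start
        accuracy = accuracy_score(labels, predictions)
        macro = precision_recall_fscore_support(labels, predictions,
                                                average="macro", zero_division=0)
        micro = precision_recall_fscore_support(labels, predictions,
                                                average="micro", zero_division=0)
        print(f"{v_name}+{c_name}: {runtime:.3f}s acc={accuracy:.2f} "
              f"macro P/R/F1={macro[:3]} micro P/R/F1={micro[:3]}")
```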

Taking a closer look, I tried a few variations on the LinearSVC setup. In addition to the default hashing and TFIDF vectorizers, I tuned the hashing vectorizer to apply a Euclidean normalization and tuned the TFIDF vectorizer to skip lowercasing and apply a sublinear term frequency modifier.
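In scikit-learn parameter names, the two tuned configurations would look roughly like this (my interpretation of the description above, not confirmed code from the project):

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer

# Hashing vectorizer with explicit Euclidean (L2) normalization of each vector.
hashing_tuned = HashingVectorizer(norm="l2")

# TFIDF without lowercasing and with sublinear term frequency, i.e. 1 + log(tf).
tfidf_tuned = TfidfVectorizer(lowercase=False, sublinear_tf=True)
```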

A closer look at Karina’s experiment results
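As a reminder of what the two metrics compared there capture, counting true positives (TP), false positives (FP), and false negatives (FN) for each label:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$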

The default hashing vectorizer and the modified TFIDF had similar accuracies, but hashing had a higher precision, while TFIDF had a higher recall. As recall measures the proportion of toxic texts that the classifier recognizes, I determined that it was more important in this case: between falsely flagging a clean piece of text and letting a toxic comment slip through, the latter is much more dangerous. Therefore, I concluded that the modified TFIDF vectorizer with the LinearSVC classifier was optimal. Using the aforementioned pipeline, I diagrammed my final model.

Humanly’s final model in a diagram

In sum, the Kaggle training data is first fed into the vectorizer, and the vectorized data and labels are fed into the LinearSVC. Now, with the trained vectorizer and trained classifier, I could pass in a piece of text, vectorize it, and classify it. Watch a snapshot of the classifier in action below.

Humanly text classification demo
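For readers who want to reproduce that flow, here is a minimal end-to-end sketch, assuming scikit-learn, pandas, and a local copy of the Kaggle training data; the file name and column names are assumptions based on the post, not taken from the original project.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Assumed file and column names for the Kaggle training data.
data = pd.read_csv("train.csv")
vectorizer = TfidfVectorizer(lowercase=False, sublinear_tf=True)  # the tuned TFIDF
classifier = OneVsRestClassifier(LinearSVC())

# Step 1: training text -> vectorizer; step 2: vectors + labels -> LinearSVC.
X = vectorizer.fit_transform(data["comment_text"])
classifier.fit(X, data[LABELS].values)

def classify(text: str) -> dict:
    """Vectorize one piece of text and return its predicted label flags."""
    flags = classifier.predict(vectorizer.transform([text]))[0]
    return {label: bool(flag) for label, flag in zip(LABELS, flags)}

print(classify("You are a wonderful person."))
```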

Now that I have a basic classifier, I hope to extend and improve my model. First, I hope to try deep learning methods such as LSTMs and convolutional neural networks. I also hope to try more complex text representations such as GloVe, Mittens, and Word2Vec embeddings.

Ultimately, I would like to integrate my classifier with common social media platforms such as Facebook and Twitter so that it runs in real time, making the internet a safer and more welcoming space.


About Karina

Karina Halevy is a Los Altos High School junior, a Stanford AI4ALL 2016 alumna, and the founder of LingHacks, the world’s first high school computational linguistics conference and hackathon. Fusing her fluency in five languages with her penchant for math, she was drawn to computational linguistics at Stanford AI4ALL, where she created a classifier that connected people to resources during natural disasters. Karina also enjoys math modeling, piano, ballet, and scientific and policy research.

Help ensure a human-centered and inclusive future for AI by making a tax-deductible donation today. Gifts will be matched for the month of December!