Using LIME to “explain” the Snorkel Labeler

If you have already read the previous blogs on Snorkel (blog 1, blog 2) and on LIME, there is not much new here.

In the meantime, I received a question from a Singaporean student asking how to explain the Snorkel Labeler with LIME.

This blog simply tries to answer that question. As in the previous posts, we do so by playing with a toy model for credit sentiment built on the DataGrapple blogs.

TL;DR Using LIME on the scikit-learn Random Forest in this blog, we saw that it badly overfitted irrelevant words such as tickers and people's names. Here, using LIME on the Snorkel Labeler, we observe that it does not suffer from this problem, and that its labeling can be explained by a simple linear model of meaningful words.

%matplotlib inline

import re
import pickle
import numpy as np
import pandas as pd
from pprint import pprint
from scipy import sparse
import sklearn
import sklearn.ensemble
import sklearn.metrics
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

from metal.label_model import LabelModel
from metal.analysis import lf_summary, label_coverage
from metal.label_model.baselines import MajorityLabelVoter

from lime import lime_text

with open('./blogs', 'rb') as file:
    blogs = pickle.load(file)
    
print("We consider for the in-sample", len(blogs), "blogs.")

blogs = [blog['title'].replace('\t', '')
         + ' ' + blog['content']
         .replace('\t', '')
         .replace('\n', '')
         .replace('\r', '')
         for blog in blogs]


We consider 1238 blogs for the in-sample set.
blogs[0]
'That Is A Big Deal In a decently risk-on session (CDX IG -2.8 CDX HY -8.9 SPX @ 2,900), the CDS of Anadarko Petroleum Corp. (APC) outperformed the broader market, tightening by c65bp. Bonds are also 75-100bp tighter. That is because the oil giant Chevron Corp. (CVX) agreed to buy APC. The equity is valued $33B, which will be paid in stocks and cash (75/25: 0.3869 CVX shares and $16.25 in cash per APC share). That is a 39% premium therefore APC share soared towards the offer price (+23% on day). The transaction is expected to close in 2H19. CVX management doesn’t expect any regulatory issues. From a credit standpoint, CVX will assume $15B net debt from APC, making APC EV c$50B. CVX will issue 200M shares and pay $8B in cash. A very tight name, CVX widened 6bp to 33bp mid, making the APC/CVX spread tighten 71bp, from +70bp to -1bp! CVX is not really a story for credit. Indeed, CVX has c$9.4B cash on hand and past experience proves that it generates $8B+ FCF per year at $50-55/bbl crude (vs now WTI $64), so it looks unlikely that they will fund the non-share cash part (c$8B) with debt. And even in the unlikely event it would do that, the combined leverage would be somewhere around 1x. Adding to this point, the news that 1/ CVX expects to realize $2B synergies (proceeds partly used for debt reduction) 2/ CVX plans to sell $15-20bn of assets in 2020-2022 confirms that CVX credit is not in trouble anytime soon. Therefore the consensus expects CVX to keep its current rating (AA/Aa2), while APC will converge to CVX from its Ba1/BBB, although we don’t know if CVX will explicitly guarantee them. CVX aside, this news dragged all the US/Canada IG energy tighter, with Hess -22 Devon -15 Encana -13, partly because the market knew APC was a target and consolidation was expected. This acquisition shows the importance of size in this business, where the biggest and the most diversified players do well. '
ABSTAIN = 0 
POSITIVE = 1
NEGATIVE = 2


def vader_sentiment(text):
    # Off-the-shelf VADER sentiment: vote only on strong compound scores
    # (|compound| > 0.8), otherwise abstain and let the other LFs decide.
    analyzer = SentimentIntensityAnalyzer()
    vs = analyzer.polarity_scores(text)
    if vs['compound'] > 0.8:
        return POSITIVE
    elif vs['compound'] < -0.8:
        return NEGATIVE
    else:
        return ABSTAIN
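
For intuition, here is a quick sanity check of this labeling function on two made-up sentences (not part of the original notebook); the compound scores depend on the installed VADER lexicon, and the ±0.8 threshold means the LF abstains on most texts:

analyzer = SentimentIntensityAnalyzer()
for sentence in ["Spreads tightened sharply and the results were excellent.",
                 "The company defaulted and bondholders suffered heavy losses."]:
    # Print the VADER compound score and the resulting vote (0 / 1 / 2)
    print(analyzer.polarity_scores(sentence)['compound'],
          vader_sentiment(sentence),
          sentence)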



PERFORMING = r"""\b(\d+bps tighter|tighter by \d+bp|credit spreads tighten across the board|back to the lowest spread level|CDS tightens back|little appetite to bid for single-name protection|stock up|strong performance|spreads tightening|performed|tighter|tighten|beating expectations|best performers|best performing|outperformance|outperforming|outperformer|outperformers)"""
def contains_performing_expressions(text):
    return POSITIVE if re.search(PERFORMING, text) else ABSTAIN

GOOD_RATINGS = r"\b(S&P upgraded|upgrade|upgraded|upgraded by Fitch|upgraded by Moody's)"
def contains_upgrade_expressions(text):
    return POSITIVE if re.search(GOOD_RATINGS, text) else ABSTAIN

GOOD_MOODS = r"\b(reassured credit investors|good short-term option|risk-on|positively in credit|dovish|guidance was positive|good for credit|issues have been pushed back|bullish)"
def contains_good_mood_expressions(text):
    return POSITIVE if re.search(GOOD_MOODS, text) else ABSTAIN

GOOD_LIQUIDITY = r"\b(resolve the liquidity issue)"
def contains_good_liquidity_expressions(text):
    return POSITIVE if re.search(GOOD_LIQUIDITY, text) else ABSTAIN




BAD_MOODS = r"\b(risk-off|tough test|disappointed|continued deterioration|challenging for credit|brutal punishment|hawkish|profit warning|dampen credit outlook|bearish)"
def contains_bad_mood_expressions(text):
    return NEGATIVE if re.search(BAD_MOODS, text) else ABSTAIN

UNDERPERFORMING = r"""\b(cut its profit forecast|stocks fall|higher leverage|shares plunged|widened \d+bp|CDS widened c\d+bp|bonds fell roughly \d+pts|stock got crushed|quarterly profit miss|shares sunk|loses money|risk premium through the roof|stock lost|revenues declined|downtrend in revenue|Q[1-4] results missed|bonds were trashed|defaulted on its debt|survival of the company is under threat|lost its leadership|Q[1-4] sales missed|weaker demand|sales down|stocks declined|bid single-name protection|weakens credit metrics|profit warnings|guidance dropped|missed the estimates|worst-performing|widening|underperformers|widen +\d+bp|under more pressure|curve is inverted|worst performing|CDS is wider|underperforming|underperformed|bonds were down|CDS widen by c\d+bp)"""
def contains_underperforming_expressions(text):
    return NEGATIVE if re.search(UNDERPERFORMING, text) else ABSTAIN

BAD_RATINGS = r"\b(Fitch downgraded|outlook to negative|downgrade|downgraded|outlook at negative)"
def contains_downgrade_expressions(text):
    return NEGATIVE if re.search(BAD_RATINGS, text) else ABSTAIN

FRAUDS = r"\b(money laundering|scandal)"
def contains_fraud_expressions(text):
    return NEGATIVE if re.search(FRAUDS, text) else ABSTAIN

DEFAULTS = r"\b(filed for bankruptcy|chapter 11|filed for creditor protection|continue as a going concern)"
def contains_default_expressions(text):
    return NEGATIVE if re.search(DEFAULTS, text) else ABSTAIN

BAD_MOMENTUM = r"\b(risk premium has tripled)"
def contains_bad_momentum_expressions(text):
    return NEGATIVE if re.search(BAD_MOMENTUM, text) else ABSTAIN

CATASTROPHE = r"\b(devastating impact|struck by hurricane)"
def contains_catastrophe_expressions(text):
    return NEGATIVE if re.search(CATASTROPHE, text) else ABSTAIN


LFs = [
    vader_sentiment,
    contains_performing_expressions,
    contains_upgrade_expressions,
    contains_good_mood_expressions,
    contains_good_liquidity_expressions,
    contains_underperforming_expressions,
    contains_downgrade_expressions,
    contains_bad_mood_expressions,
    contains_fraud_expressions,
    contains_default_expressions,
    contains_bad_momentum_expressions,
    contains_catastrophe_expressions,
]

LF_names = [
    'vader',
    'performing',
    'upgrade',
    'good_mood',
    'good_liquidity',
    'underperforming',
    'downgrade',
    'bad_mood',
    'fraud',
    'default',
    'bad_momentum',
    'catastrophe',
]
def make_Ls_matrix(data, LFs):
    # Build the (n_examples, n_LFs) matrix of noisy votes: entry (i, j) is the
    # label (ABSTAIN, POSITIVE or NEGATIVE) that LF j assigns to example i.
    noisy_labels = np.empty((len(data), len(LFs)))
    for i, row in enumerate(data):
        for j, lf in enumerate(LFs):
            noisy_labels[i][j] = lf(row)
    return noisy_labels
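
As an optional sanity check (not in the original notebook), we can look at which labeling functions actually vote on the first blog, the APC/CVX deal quoted above; the exact votes naturally depend on the text:

first_row = make_Ls_matrix(blogs[:1], LFs)[0]
# Keep only the LFs that did not abstain on this blog
pprint({name: int(vote)
        for name, vote in zip(LF_names, first_row)
        if vote != ABSTAIN})
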
with open('labels_for_training_labelling', 'rb') as file:
    labels = pickle.load(file)

LF_matrix = make_Ls_matrix(blogs[:len(labels)], LFs)
Y_LF_set = np.array([labels[i] for i in range(len(labels))])

Ls_train = make_Ls_matrix(blogs, LFs)
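
Since lf_summary and label_coverage are already imported from metal.analysis, one can optionally inspect the labeling functions on the hand-labelled subset before training the label model. A minimal sketch, assuming the MeTaL analysis helpers expect a scipy sparse label matrix (signatures may differ across snorkel-metal versions):

# Per-LF polarity, coverage, overlaps, conflicts and empirical accuracy
print(lf_summary(sparse.csr_matrix(LF_matrix), Y=Y_LF_set, lf_names=LF_names))
# Fraction of blogs on which at least one LF votes
print(label_coverage(sparse.csr_matrix(Ls_train)))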

label_model = LabelModel(k=2, seed=42)
label_model.train_model(Ls_train,
                        Y_dev=Y_LF_set,
                        n_epochs=1000,
                        lr=0.01,
                        log_train_every=2000)
Computing O...
Estimating \mu...
Finished Training

Y_train_ps = label_model.predict_proba(Ls_train)
Y_train_ps
array([[0.79361875, 0.20638125],
       [0.32987538, 0.67012462],
       [0.14150953, 0.85849047],
       ...,
       [0.2567571 , 0.7432429 ],
       [0.59824714, 0.40175286],
       [0.7680894 , 0.2319106 ]])
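
Column 0 of Y_train_ps is the probability of POSITIVE (class 1) and column 1 the probability of NEGATIVE (class 2). As a rough check of the label model on the hand-labelled subset, here is a sketch assuming MeTaL's generic predict API and dev labels in {1, 2}:

# Hard labels on the hand-labelled subset, then standard sklearn metrics
Y_dev_preds = label_model.predict(LF_matrix)
print("F1 score:", f1_score(Y_LF_set, Y_dev_preds))
print(confusion_matrix(Y_LF_set, Y_dev_preds))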
class_names = ['POSITIVE', 'NEGATIVE']
predict_fn = (
    lambda x: label_model.predict_proba(
        make_Ls_matrix(x, LFs)))
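
LIME will call predict_fn on lists of perturbed texts, so it is worth a quick (optional) check that it returns an (n_samples, 2) array of class probabilities; note that each call re-runs all twelve labeling functions on every perturbed text, which makes the explanations below fairly slow.

# Sanity check of the wrapper LIME will call; expected shape: (2, 2)
print(predict_fn([blogs[0], blogs[1]]).shape)
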
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=class_names)

indices = [0, 500, 700, 1100, 600, 65, 627, 114, 858,
           50, 52, 757, 190, 769, 47]
for idx in indices:
    exp = explainer.explain_instance(
        blogs[idx],
        predict_fn,
        num_features=10)

    exp.show_in_notebook(text=True)
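
Outside a notebook, the same explanations can be retrieved as plain (token, weight) pairs of the local linear surrogate; a small sketch (by default LIME explains class index 1, which here corresponds to NEGATIVE):

for idx in indices[:3]:
    exp = explainer.explain_instance(blogs[idx], predict_fn, num_features=10)
    print('Blog', idx)
    # (token, weight) pairs of the local linear model fitted by LIME
    pprint(exp.as_list())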

Conclusion: Inspecting the Snorkel Labeler with LIME, we do not find blatant signs of overfitting to irrelevant tokens. This is not unexpected, since the Snorkel Labeler only sees the text through the heuristics defined in the labeling functions (here, financial jargon and the common vocabulary encoded in VADER). It would be interesting to see how this goes when LIME is applied to a fine-tuned pre-trained end-to-end model such as BERT or XLNet.