The justification engine shouldn't be hard to understand, otherwise it's no good for making explanations. What they're doing is giving criteria for what makes a good simple model of a very complicated model, and using that to learn a member of a particular class of simple models (sparse linear classifiers: draw a few lines, classify points based on which sides of the lines they lie on) that approximates some arbitrary complicated model.

So, say you have a bazillion-layer deep learning model that you're using to classify people as terrorists or not terrorists. No one understands the bazillion-layer model, not even the authors; they just know that it performs well enough on the test set. You're just asking your users to trust you when you tell them that Little Timmy's kitten is a threat to national security. Now, you probably couldn't use a simpler model to do the classification, otherwise you would have saved yourself a lot of trouble and a lot of waving your hands at scary guys with crew cuts. But you might be able to approximate it locally with a simpler model, in a way roughly analogous to approximating a complex surface with tangent planes, and then you get the explainability of the simple model with the accuracy of the complex one.

Then when your algorithm tells the FBI to investigate Little Timmy's kitten, you don't have to shrug and mumble about doing funny things with tensors that have no visible relation to kittens or terrorists, much less an explainable one. You can use the human-understandable approximation to see that Little Timmy has a chemistry hobby, that his parents thought it would be cute to give him some supplies as a present from the cat, and that those chemicals happen to also be useful for making bombs. Then you don't lose the trust of your users because your algorithm did a stupid thing: they can see that it was stupid in the idiot-savant way computers are stupid, not because it just doesn't work.

edit: so tldr, the clever thing here is using simple models to approximate a complicated model locally. You use the complicated model to get better classifications and the simple model to explain why it gave the classification it did, and you're justified in explaining the complicated model in terms of the simple one because the simple model is a good local approximation of it.
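edit 2: to make the "approximate it locally with a simpler model" step concrete, here's a rough sketch in Python. This is not the paper's actual algorithm; the function name explain_locally, the Gaussian perturbation, the exponential proximity kernel, and the alpha penalty are all illustrative choices I made up, and it assumes a scikit-learn-style black-box classifier with a predict_proba method.

    import numpy as np
    from sklearn.linear_model import Lasso

    def explain_locally(black_box, x, num_samples=5000, kernel_width=0.75, alpha=0.01):
        """Fit a sparse linear model to a black-box classifier's behaviour near x.

        black_box: any fitted classifier exposing predict_proba (assumption).
        x: the single instance (1-D feature vector) we want an explanation for.
        """
        rng = np.random.default_rng(0)

        # Sample points in a neighbourhood of x by adding Gaussian noise.
        perturbed = x + rng.normal(scale=0.5, size=(num_samples, x.shape[0]))

        # Ask the complicated model what it thinks of each perturbed point.
        targets = black_box.predict_proba(perturbed)[:, 1]

        # Weight samples by proximity to x so the fit is *local*:
        # points far from x barely influence the simple model at all.
        distances = np.linalg.norm(perturbed - x, axis=1)
        weights = np.exp(-(distances ** 2) / kernel_width ** 2)

        # Sparse linear fit. The L1 penalty zeroes out most coefficients,
        # which is what keeps the explanation small enough for a human to read.
        simple_model = Lasso(alpha=alpha)
        simple_model.fit(perturbed, targets, sample_weight=weights)

        # The surviving nonzero coefficients are the features "responsible"
        # for the prediction in this neighbourhood (e.g. "chemistry supplies").
        return {i: coef for i, coef in enumerate(simple_model.coef_) if coef != 0.0}

The key design choice is the sample weighting: without it you'd just be fitting a (bad) global linear model, but with it the sparse model only has to match the complicated model in a small neighbourhood around the one point you're trying to explain, which is exactly the tangent-plane analogy above.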