tech
@arina_artemis the neural network here is CLIP, which was trained to score how well a given text caption fits a picture
so a picture of an iPod would obviously be captioned "ipod" and it'd learn that's a good caption, but a picture of text containing the word would also be captioned with the word
therefore the network learns that these are related and assigns them to roughly the same internal neurons
which usually isn't much of a problem, since it's not an image classifier; it just checks whether a given text matches the image
but what they've done here is look at one of the neurons, go "hmm this seems to fire super often when it sees an iPod", and use it as a classifier
too bad it's actually trained to fire when the word "ipod" is part of an appropriate caption, whether that word names the object or is just written in the picture
that's very long sorry if it sucks as an explanation
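a toy sketch of the matching step, if that helps: all the embedding vectors below are completely made up, just set so that the word "ipod" and the iPod-object share directions in the space, the way the training described above would tie them together

```python
import numpy as np

def cosine(a, b):
    """Similarity score between a caption embedding and an image embedding."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embedding" space, dims roughly: (apple-ness, ipod-ness, text-says-ipod).
# The numbers are invented purely to illustrate the failure mode.
caption_apple = np.array([1.0, 0.0, 0.0])
caption_ipod  = np.array([0.0, 1.0, 1.0])   # training tied the word to both senses

img_plain_apple   = np.array([1.0, 0.0, 0.0])
img_labeled_apple = np.array([1.0, 0.0, 1.5])  # an apple with "iPod" written on it

def classify(img):
    # misusing the matcher as a classifier: pick the best-scoring caption
    scores = {"apple": cosine(img, caption_apple),
              "ipod":  cosine(img, caption_ipod)}
    return max(scores, key=scores.get)

print(classify(img_plain_apple))    # -> apple
print(classify(img_labeled_apple))  # -> ipod (the written word outweighs the fruit)
```

the point being: nothing is broken inside the matcher, the scores are "correct" for what it was trained on; the mistake is treating the best-matching caption as an object label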
@arina_artemis @cdmnky It's probably intended to just do object recognition! It probably got as training data a bunch of images that were supposed to just be pictures of objects. ...but some of the pictures had text in them. So the system learned that the text was often a label for the object. ...so now it trusts random text it sees in an image, if it matches labels it's seen before.
Just comes down to the same problem these systems often have - nobody has any idea of how they're doing their classifications, so they just... do things.