In the first stage of face detection, the camera attempts to find "features" in the image - parts that look a bit like eyes, mouths, noses. It does this by examining each position in the image and comparing it to a set of masks: generic eye-like or nose-like pictures. This essentially produces a second, grayscale image, where the brightness of a particular spot is equal to the degree to which that spot matches a feature. In total, this first stage produces several such images, one for each kind of feature.
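The mask-comparison step can be sketched as normalized cross-correlation: slide a small mask over the image and score each position by how well the patch underneath resembles it. This is a toy illustration of the idea, not any real camera's implementation; the function name and the loop-based scan are my own.

```python
import numpy as np

def feature_map(image, mask):
    """Slide a small mask over a grayscale image and score every position.

    Returns a smaller "image" covering each valid mask placement, where
    brighter values mean the patch under the mask looks more like the
    feature (normalized cross-correlation, so scores lie in [-1, 1]).
    """
    mh, mw = mask.shape
    ih, iw = image.shape
    out = np.zeros((ih - mh + 1, iw - mw + 1))
    m = mask - mask.mean()          # remove overall brightness of the mask
    m_norm = np.linalg.norm(m)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + mh, x:x + mw]
            p = patch - patch.mean()  # remove local brightness of the patch
            denom = np.linalg.norm(p) * m_norm
            out[y, x] = (p * m).sum() / denom if denom > 0 else 0.0
    return out
```

Running one mask per feature (eye, nose, mouth) over the same image gives the several grayscale "feature-ness" maps the paragraph describes.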
In the second stage, the camera again examines each position in the image, but this time looks for a constellation of features centered on that position that makes up a face - two eyes, one nose, one mouth, arranged in the right way. This yields yet another grayscale image where the brightness of each spot is equal to the degree to which the camera can find a face-like constellation of features. The image is hence turned into a map of "face-ness" in these two stages.
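One simple way to sketch this second stage: for each candidate face center, sample the per-feature maps at fixed offsets (eyes up and to the sides, nose in the middle, mouth below) and average the scores. The fixed toy geometry and the `spread` parameter are assumptions for illustration; real detectors use far more flexible layouts.

```python
import numpy as np

def face_map(eye_map, nose_map, mouth_map, spread=2):
    """Combine per-feature score maps into a single "face-ness" map.

    At each position, check for two eyes above, a nose at the centre and
    a mouth below, at fixed (dy, dx) offsets. The face score is the
    average of the feature scores found at those offsets.
    """
    h, w = eye_map.shape
    out = np.zeros((h, w))
    layout = [
        (-spread, -spread, eye_map),   # left eye: up and to the left
        (-spread,  spread, eye_map),   # right eye: up and to the right
        (0, 0, nose_map),              # nose: at the centre
        (spread, 0, mouth_map),        # mouth: below the centre
    ]
    # skip a border of width `spread` so every offset stays in bounds
    for y in range(spread, h - spread):
        for x in range(spread, w - spread):
            out[y, x] = sum(m[y + dy, x + dx] for dy, dx, m in layout) / len(layout)
    return out
```

A position only scores highly when all four features light up in the right arrangement, which is what turns feature maps into the face-ness map.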
Finally, a threshold function is applied. Those parts of the image that are considered sufficiently face-like are treated as faces, while parts that are only weakly face-like are ignored. This is why you sometimes get erroneous faces: there's a constellation of elements in the image that are just feature-like enough to add up to something that's just face-like enough to make it across the threshold. It's also why cameras struggle with beards, glasses - and sometimes, embarrassingly, black people.
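The thresholding step itself is the simplest part of the pipeline - a sketch, with an arbitrary default cutoff of 0.8:

```python
import numpy as np

def detect_faces(faceness, threshold=0.8):
    """Keep only positions whose face-ness score clears the threshold.

    Returns a list of (row, col) coordinates treated as faces. Anything
    merely somewhat face-like is discarded; anything just over the line
    is kept - false positives included.
    """
    ys, xs = np.where(faceness >= threshold)
    return list(zip(ys.tolist(), xs.tolist()))
```

Everything above the cutoff gets a rectangle drawn around it; everything below is silently dropped, however close it came.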
Most likely, you've had the experience of seeing a human face out of the corner of your eye that turned out to be nothing more than some foliage, or the folds in a towel. The reason is that your own facial recognition systems are not dissimilar from a camera's. A face-like arrangement of feature-like details can trigger the feeling that there is a face there. Faces are important to us, so a face-like perception gets a lot of mental priority, much like a camera's colored rectangle.
This accidental seeing of faces that aren't there happens more often at night. This is because the visual information you're processing is much, much poorer. Your eyes adjust to darkness, but not fully. So in order to make sure it doesn't miss any faces, your brain lowers the threshold of face-ness. You could do the same thing to a camera by adjusting its threshold. It would make more mistakes, but it would also catch the occasional real face it would otherwise have missed.
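The tradeoff is easy to see with a few invented face-ness scores - the numbers here are made up purely to illustrate the effect of moving the cutoff:

```python
import numpy as np

# Toy face-ness scores (invented): a well-lit face, a dim face at
# night, and a faintly face-like patch of foliage.
faceness = np.array([0.9, 0.6, 0.5])

def count_detections(scores, threshold):
    """How many positions clear the face-ness threshold."""
    return int((scores >= threshold).sum())

# A strict daytime threshold misses the dim face; lowering it catches
# the dim face, and lowering it further starts "seeing" the foliage.
for t in (0.8, 0.55, 0.4):
    print(t, count_detections(faceness, t))
```

Lower the cutoff and you trade false positives for the occasional real face you'd otherwise have missed - exactly the tradeoff the tired brain makes.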
This brings us to altered mental states. Another reason you're more likely to see false faces at night is that you're tired. Your brain is sluggish and less good at processing information, so again, it attempts to compensate by lowering the threshold yet further. This applies to other kinds of perception, too. When I'm very tired, I sometimes get simple auditory hallucinations: I think I hear someone calling my name. There's nothing there, but my tired brain, lowering its threshold to try and compensate, seizes on a random murmur caused by pipes, a car, or a washing machine. It also makes sense that I'm hearing my name: it's very useful and important to hear when someone is calling for you.
If you've ingested certain drugs - which may well be illegal where you're reading this from, but let's just assume you might have - you may have experienced a stronger version of this. In terms of hallucinations, there's a difference between "full-blown" hallucinations, where you perceive things that have no basis in reality, like in a dream, and milder ones, where what you see remains much the same, but how you see it changes. Users commonly report seeing all kinds of shapes, but especially faces. The actual visual perception, as much as there is such a thing, is unchanged, but you discover face-ness in the random patterns of the wallpaper, of wood, of the clouds, and the leaves.
By adjusting its thresholds enough, you could make your camera hallucinate. Not useful, perhaps, but certainly interesting.