Although there have been many key advances in connecting text and perception, computer-generated image captions still lack common sense. As a first step toward constraining these perception mechanisms to commonsense judgment, we have developed reasonableness monitors: a wrapper interface that can explain whether the descriptive output of an opaque deep neural network is plausible. These monitors form a standalone system that uses careful dependency tracking, commonsense knowledge, and conceptual primitives to explain whether a perceived scene description is reasonable. If no such explanation can be constructed, that failure is evidence that something unreasonable has been perceived. The development of reasonableness monitors is a step toward generalizing this vision, with the intention of developing a system-construction methodology that enhances robustness at run time by dynamically checking and explaining the behavior of scene understanders for reasonableness in context.