We present a framework that allows an observer to estimate the structure of the occluded portions of an assembly in a way that is consistent with visible image evidence and world knowledge. Doing so requires determining which portions of the assembly are occluded in the first place. Since each problem relies on the solution to the other, we solve both in tandem. We extend our framework to determine the confidence of one's assessment of which portions of an observed assembly are occluded, and of one's estimate of the structure of those occluded portions, by determining the sensitivity of that assessment to potential new observations. We further extend our framework to determine a robotic action whose execution would allow a new observation that would maximally increase that confidence. The formulation of our framework further allows for the elegant integration of evidence across modalities. We demonstrate this ability by integrating information from natural-language statements that describe the assembly, which aids the estimation of its structure and allows the simultaneous resolution of both visual and linguistic ambiguity.