1) They're getting effectively nothing above 400 Hz. Speech fundamentals actually sit lower than that (roughly 85-180 Hz for men, 165-255 Hz for women), but most of what makes speech intelligible - the formants and consonants - lives well above 400 Hz.

2) Their limiting factor is the resolution of the lens, not the resolution of the camera, which is why they've got an extreme telephoto pointed at their bag'o'chips even though it's 6 feet away. The technique is simply not applicable from any kind of useful distance for surveillance.

3) This shit works much better in microwave. Leon Theremin (yes, that Theremin) developed a bug for the KGB that had a passive membrane built into a "Great Seal of the United States" (which the US ambassador dutifully hung over his desk). When the KGB beamed microwaves at it, vibration of the membrane altered the capacitance of the dipole circuit, modulating speech onto the carrier and allowing the KGB to monitor and record conversations inside the embassy from across the street, through walls and everything. That was in 1945. (The cavity opening was the eagle's mouth - droll, no?)

Espionage services have used laser interferometry to detect vibrations on panes of glass for decades. Thermal glass makes it a lot harder... but a polarizing filter and other trickery gives you much more useful results than camming a bag of chips with your iPhone.
Yeah, you are so right. The Russian Great Seal story is something else. I love this heading: Q. What does "Good Vibrations" by The Beach Boys have in common with the Great Seal's bad vibrations? Back in May 1966, Esquire Magazine was all about "Bugging the Bedroom".
Wow. That's an amazing article. Did you find that in the link I had above? I'll be honest, I didn't read what I posted; I know about "the thing" from Robert Wallace's book Spycraft, which didn't have any pictures, since I listened to it as an audiobook. Fun fact: the article you linked was written by Nick Pileggi, author and screenwriter of Goodfellas, Casino, City Hall and American Gangster and, from 1987 until her death in 2012, Mr. Nora Ephron.
I liked the martini olive microphone:

"The martini-olive microphone with transmitter has attracted a good deal of attention in the press, was mentioned in a Senate hearing on bugging devices and is de rigueur in spy films. The mike is at the left end of the olive, where the hole should be, and the aerial is concealed inside the toothpick. Actually, this is too expensive ($200), too low-powered and gimmicky for a real pro. Its range is not more than fifty feet, and an operative would need his receiver and tape in the next room. Nevertheless, smart hostesses dispense twists of lemon."

The link to the Esquire article was in an email discussion, ensuing from the MIT article, on a mailing list that I happen to be on. ʞɐıuzoʍ ǝʌǝʇs is on this mailing list as well... Anything good that I post usually comes from that crowd.
Interesting, if true. A naive application of the described approach (assuming no rolling shutter trickery) would sample one point on the edge of the visual reactor and interpret the deviation of its position in each frame as a (scalar) amplitude. Clearly, under such circumstances Nyquist's theorem would apply, and the highest frequency that could be captured faithfully would be half the framerate. Doubling that would require getting more data out of each frame, which seems like it would be easy under just the right circumstances but nigh impossible otherwise.

One approach would be to sample two visual reactors, yielding two samples per frame with their effective times differing by the amount of time it takes sound to travel from one to the other. This would be easy to do, but you would need sample sources at the right relative distances: a difference of about 57 cm (sound at ~343 m/s) would turn a 300 fps framerate into 600 samples/s. A higher or lower difference in distance between the visual reactors and the sound source, modulo ~114 cm (assuming 300 fps), would yield lower-quality results, with the times between samples alternating between two different values. You'd want to normalize the two sample sets to the same volume to avoid artifacts at the frequency of their offset. (If you want to reconstruct human speech up to 300 Hz, you need at least 600 samples/s, hence the interest in doubling a 300 fps capture.)
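To make the interleaving idea concrete, here's a minimal sketch (my own illustration, not the MIT method), assuming two per-frame displacement streams from objects offset by roughly half the per-frame sound-travel distance; the function name and RMS normalization are just illustrative choices:

    # Two-reactor interleaving sketch: reactor_b sits ~57 cm farther from the
    # sound source than reactor_a, so its per-frame samples land half a frame
    # period later in time.
    import numpy as np

    FPS = 300.0
    SPEED_OF_SOUND = 343.0                              # m/s at ~20 C
    HALF_PERIOD_DISTANCE = SPEED_OF_SOUND / (2 * FPS)   # ~0.57 m

    def interleave(reactor_a, reactor_b):
        """Merge two per-frame displacement streams into one 600-samples/s stream."""
        a = np.asarray(reactor_a, dtype=float)
        b = np.asarray(reactor_b, dtype=float)
        n = min(len(a), len(b))
        # Normalize both streams to the same RMS so the level difference between
        # the two objects doesn't show up as an artifact at the offset frequency.
        a = a[:n] / (np.sqrt(np.mean(a[:n] ** 2)) + 1e-12)
        b = b[:n] / (np.sqrt(np.mean(b[:n] ** 2)) + 1e-12)
        out = np.empty(2 * n)
        out[0::2] = a   # samples at t = k / FPS
        out[1::2] = b   # samples at t = k / FPS + 1 / (2 * FPS), via the extra travel time
        return out      # effective sample rate: 2 * FPS = 600 samples/s

If the distance offset isn't exactly half a period, the sample times alternate between two spacings and you'd need non-uniform resampling instead of this simple interleave.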
You'd run into serious frame lock problems, too - lossy codecs such as h.264 (and pretty much everything else in consumer gear) don't much give a crap about temporal frame length. This doesn't matter when you're recording video with audio, as the frame captures both. But when you're syncing two systems, the footage tends to drift after about five minutes. If you're looking for interframe CMOS roll and comparing two different samples in order to get an interpolated waveform, that shit would have to be locked tight to provide anything useful, as the harmonic effects would be completely swamped by the inaccuracies of the frame start and stop times.
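To put a rough number on the drift (my own back-of-the-envelope figures, not from the post above): even an optimistic ~33 ppm clock mismatch between two unsynced cameras blows through the half-frame offset the two-reactor trick depends on within a few minutes:

    # Illustrative arithmetic only; the 300.01 fps mismatch (~33 ppm) is an assumption.
    NOMINAL_FPS = 300.0
    ACTUAL_FPS_B = 300.01                      # camera B's clock runs slightly fast
    HALF_FRAME = 1.0 / (2 * NOMINAL_FPS)       # the ~1.67 ms offset interleaving relies on

    def drift_at_frame(n):
        """Wall-clock misalignment between frame n of camera A and frame n of camera B."""
        return n / NOMINAL_FPS - n / ACTUAL_FPS_B

    frames_in_five_minutes = int(5 * 60 * NOMINAL_FPS)            # 90,000 frames
    print(drift_at_frame(frames_in_five_minutes) / HALF_FRAME)    # ~6 half-frame periods
    # After five minutes the two streams are several half-frames apart, so without
    # hardware genlock the interpolated waveform is swamped by timing error.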