Concerning the alignment, you have to build a mechanical part that keeps the two cameras rigidly tied together. The camera support plates from the kit will do. It is important that the cameras do not move relative to each other, because you calibrate once and then reapply the parameters you found to every photograph afterwards.
The StereoPi team provides a tutorial on creating depth maps, which requires the two images to be aligned. At the end of step 4, before they start playing with depth maps, the script has extracted all the parameters needed to correct the images.
Unfortunately, I did not manage to make it work well enough myself, but I will eventually try again.
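For reference, the "calibrate once, reuse everywhere" idea translates to OpenCV roughly as in the sketch below. This is my own minimal sketch, not the tutorial's exact code: the calibration file name and its key names (M1, d1, R1, P1, and so on) are assumptions.

    import cv2
    import numpy as np

    # Load the parameters found during the one-time calibration
    # (file name and key names are assumptions).
    calib = np.load("stereo_calibration.npz")
    SIZE = (2592, 1944)  # maximum resolution of the V1 camera, width x height

    # Build the rectification lookup maps once; they stay valid as long
    # as the cameras do not move relative to each other.
    map_lx, map_ly = cv2.initUndistortRectifyMap(
        calib["M1"], calib["d1"], calib["R1"], calib["P1"], SIZE, cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(
        calib["M2"], calib["d2"], calib["R2"], calib["P2"], SIZE, cv2.CV_32FC1)

    def rectify_pair(left, right):
        # Warp both photographs so corresponding points land on the same row.
        return (cv2.remap(left, map_lx, map_ly, cv2.INTER_LINEAR),
                cv2.remap(right, map_rx, map_ry, cv2.INTER_LINEAR))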
In the first prototype of the stereomaton, we used Hugin on the StereoPi. We first aligned one stereo pair on the PC with Hugin and then copied the .pto project file to the board. From there, we just had to split the image produced by raspistill with ImageMagick, place the halves where the project expected the two images of each new pair, and render directly with nona (the command-line part of Hugin that is useful in our case). It was slow [1] because it ran on the CPU, but it worked pretty well.
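The per-photograph step looked roughly like the sketch below; the paths and file names are reconstructions for illustration, not the actual script.

    import shutil
    import subprocess

    CAPTURE = "/tmp/capture.jpg"      # side-by-side image from raspistill
    PROJECT = "/home/pi/stereo.pto"   # alignment project made once on the PC

    # ImageMagick: cut the capture into its left and right halves,
    # written as /tmp/half-0.jpg and /tmp/half-1.jpg.
    subprocess.run(["convert", CAPTURE, "-crop", "50%x100%", "+repage",
                    "/tmp/half.jpg"], check=True)

    # Overwrite the files the .pto project refers to (left.jpg/right.jpg
    # stand for whatever names were used when the project was created).
    shutil.copy("/tmp/half-0.jpg", "/home/pi/left.jpg")
    shutil.copy("/tmp/half-1.jpg", "/home/pi/right.jpg")

    # nona remaps both images according to the project, producing
    # /tmp/out0000.tif and /tmp/out0001.tif.
    subprocess.run(["nona", "-o", "/tmp/out", PROJECT], check=True)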
In a second version (running since today, actually), we use OpenCV's warpPerspective method. It is far quicker [2] but does not correct the barrel distortion of the cameras. However, this distortion is small enough for the brain to ignore in most situations. The downside is that you have to find the homography matrices, which is also a bit challenging. Our approach was to ask pano_trafo (a Hugin tool) to compute projected coordinates from the calibration project of the previous prototype, and then feed them (corrected by the right translation factor) into OpenCV's findHomography method to obtain the matrices.
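Concretely, the per-camera setup reduces to a handful of OpenCV calls. In the sketch below, src_pts are pixel coordinates picked in the raw image and dst_pts are the matching coordinates reported by pano_trafo (after applying the translation factor); the actual point values shown are placeholders.

    import cv2
    import numpy as np

    # Four (or more) point correspondences per camera; the values here
    # are placeholders, the real ones come from pano_trafo.
    src_pts = np.float32([[0, 0], [2591, 0], [2591, 1943], [0, 1943]])
    dst_pts = np.float32([[14, 9], [2578, 17], [2570, 1928], [6, 1922]])

    # Computed once per camera and stored for reuse.
    H, _ = cv2.findHomography(src_pts, dst_pts)

    # The fast per-photograph step: apply the stored homography.
    img = cv2.imread("left.jpg")
    h, w = img.shape[:2]
    cv2.imwrite("left_aligned.jpg", cv2.warpPerspective(img, H, (w, h)))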
Another downside of the automated method is that you are stuck with the stereo window placement you fixed during the calibration phase. That is why we added a rope to the stereomaton to mark the zone that must be kept clear of objects, so as to avoid any window violation.
That is the big picture of how we automated the process.
[1] About 40s to process an image at the maximum resolution of the V1 camera
[2] About 5s to process an image at the maximum resolution of the V1 camera