The eyepiece of a clip-on (or of any thermal scope, for that matter) projects an image out to the eye. For a clip-on to be 1x, it needs to project an image whose angular size exactly matches the field of view (FOV) it sees. That is how magnification is defined: AFOV divided by FOV, where AFOV is the Apparent Field of View.
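The definition reduces to one line of arithmetic. Here is a minimal Python sketch. The function name and the example angles are illustrative only and do not come from any particular device. It uses the simple angle ratio stated above; for wide angles, some optics texts use the ratio of the tangents of the half-angles instead.

```python
def magnification(afov_deg: float, fov_deg: float) -> float:
    """Magnification as AFOV / FOV, both given in degrees."""
    if fov_deg <= 0:
        raise ValueError("fov_deg must be positive")
    return afov_deg / fov_deg


# A clip-on that sees 24 degrees and projects 24 degrees is exactly 1x.
print(magnification(24.0, 24.0))  # 1.0
# Projecting 30 degrees from a 24-degree view would be 1.25x.
print(magnification(30.0, 24.0))  # 1.25
```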
In a digital optic like a thermal, the AFOV is set by the size of the microdisplay and the specifics of the optics in the eyepiece.
In many modern clip-ons, there are more pixels in the microdisplay than in the image sensor.
To get the magnification to exactly 1x, you confine the image captured through the objective to a smaller portion of the microdisplay (a technique called windowing).
That way, you can decrease the projected AFOV to exactly match the incoming FOV.
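Windowing comes down to a little arithmetic. The Python below is a minimal sketch under two assumptions that are not from the post: the full microdisplay fills the eyepiece's AFOV, and every display pixel subtends the same angle through the eyepiece (a linear approximation). The function name and all the numbers are hypothetical.

```python
def window_width_px(display_px: int, display_afov_deg: float, sensor_fov_deg: float) -> int:
    """Width, in display pixels, of a window that projects exactly sensor_fov_deg.

    Assumes a uniform angle per display pixel:
    deg_per_px = display_afov_deg / display_px (hypothetical linear model).
    """
    if sensor_fov_deg > display_afov_deg:
        raise ValueError("the full display cannot cover the sensor FOV at 1x")
    deg_per_px = display_afov_deg / display_px
    return round(sensor_fov_deg / deg_per_px)


# Hypothetical numbers: a 1280-px-wide microdisplay seen as 40 degrees through
# the eyepiece, and a 640-px sensor whose objective covers 24 degrees.
w = window_width_px(1280, 40.0, 24.0)
print(w)        # 768 display pixels wide
print(w / 640)  # 1.2, so each sensor pixel is resampled to 1.2 display pixels
```

The window is larger than the sensor's pixel count but smaller than the display, which is why having more display pixels than sensor pixels makes exact 1x possible.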
By changing where that window sits on the microdisplay, you can also perform the alignment (collimation).
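Collimation by moving the window can be sketched the same way. This is a minimal illustration, assuming a hypothetical 1280x960 microdisplay with a 40-degree horizontal AFOV and square pixels (so 40/1280 = 0.03125 degrees per pixel in both axes), a 768x576 window, and a made-up sign convention where a positive shift moves the window right or down in display coordinates. None of these names or numbers come from a real device.

```python
def window_origin(display_w: int, display_h: int, win_w: int, win_h: int,
                  shift_x_deg: float, shift_y_deg: float,
                  deg_per_px: float) -> tuple[int, int]:
    """Top-left corner of the window after an angular alignment correction.

    The window starts centred on the display and is moved by the requested
    angle, converted to whole display pixels. The result is clamped so the
    window never runs off the edge of the microdisplay.
    """
    x0 = (display_w - win_w) // 2 + round(shift_x_deg / deg_per_px)
    y0 = (display_h - win_h) // 2 + round(shift_y_deg / deg_per_px)
    x0 = max(0, min(x0, display_w - win_w))
    y0 = max(0, min(y0, display_h - win_h))
    return x0, y0


# Correct a hypothetical misalignment of 0.3 deg right and 0.1 deg up.
print(window_origin(1280, 960, 768, 576, 0.3, -0.1, 40 / 1280))  # (266, 189)
```

In this sketch, corrections come in whole display-pixel steps: 0.03125 degrees, or about 1.9 arcminutes, with these hypothetical numbers. The clamp also shows the practical limit: the alignment range is bounded by the spare pixels around the window.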
That is why Risley prisms are an unnecessary complication in most modern clip-ons.
Risley prisms are necessary in analog devices, such as night vision clip-ons, where the image is not digitized in any way.
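For contrast, here is roughly how a Risley pair steers the image mechanically. This is a minimal sketch using the standard thin-prism approximation, where each wedge deviates the beam by roughly (n - 1) times its wedge angle in the direction set by its rotation, and the two deviations add as vectors. The function name and the 0.5-degree deviation are hypothetical.

```python
import math


def risley_deviation(delta_deg: float, theta1_deg: float, theta2_deg: float) -> tuple[float, float]:
    """Net (x, y) beam deviation in degrees from two identical thin wedge prisms.

    delta_deg is the deviation of one wedge; theta1_deg and theta2_deg are the
    rotation angles of the two wedges. Thin-prism approximation: the two
    deviation vectors simply add.
    """
    t1, t2 = math.radians(theta1_deg), math.radians(theta2_deg)
    dx = delta_deg * (math.cos(t1) + math.cos(t2))
    dy = delta_deg * (math.sin(t1) + math.sin(t2))
    return dx, dy


# Wedges at 0 and 90 degrees, each deviating the beam by 0.5 degree.
print(f"{math.hypot(*risley_deviation(0.5, 0, 90)):.4f}")  # 0.7071
```

Rotating the two wedges together or against each other lets the pair point the image anywhere within a circle of radius 2 x delta. That is the mechanical counterpart of moving the window on a digital display.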
ILya