ainterpo3ds.f segmentation fault

rdrighetto
Joined: 5 Aug 2015

Hi,

I am getting a segmentation fault error originating in function AINTERPO3DS (v9.11 151031). Backtracing gives:

Final lines:

frealign_v9.exe 0000000001ABC9DF Unknown Unknown Unknown
frealign_v9.exe 0000000001AC03FD Unknown Unknown Unknown
frealign_v9.exe 0000000001B731C0 Unknown Unknown Unknown
frealign_v9.exe 0000000000451A76 ainterpo3ds_ 88 ainterpo3ds.f
frealign_v9.exe 00000000004461F9 presb_ 104 presb.f
frealign_v9.exe 000000000043BAB2 lmain_ 812 lmain.f
frealign_v9.exe 000000000040643E MAIN__ 693 frealign_v9.f
frealign_v9.exe 00000000004005BE Unknown Unknown Unknown
frealign_v9.exe 0000000001B74A3B Unknown Unknown Unknown
frealign_v9.exe 0000000000400429 Unknown Unknown Unknown

Line 88 of ainterpo3ds.f is:

CBUF = A3DF(ID)

But sometimes the segfault originates in line 83:

Final lines:

frealign_v9.exe 0000000001B05346 Unknown Unknown Unknown
frealign_v9.exe 0000000001B087A6 Unknown Unknown Unknown
frealign_v9.exe 0000000001BD1020 Unknown Unknown Unknown
frealign_v9.exe 000000000044A6A8 ainterpo3ds_ 83 ainterpo3ds.f
frealign_v9.exe 000000000043F240 presb_ 104 presb.f
frealign_v9.exe 000000000043601B lmain_ 812 lmain.f
frealign_v9.exe 0000000000407513 MAIN__ 693 frealign_v9.f
frealign_v9.exe 00000000004005EE Unknown Unknown Unknown
frealign_v9.exe 0000000001BD289B Unknown Unknown Unknown
frealign_v9.exe 00000000004004A9 Unknown Unknown Unknown

Line 83 of ainterpo3ds.f is:

RBUF = A1 * A2 * A3

Some additional info that I could gather:

-It happens for specific particles in specific cycles for specific references. The particles themselves seem to be fine, since refinement/reconstruction of this dataset works under other conditions, and previous cycles of the same refinement ran without problems;

-It happens for these same particles in the same cycles regardless of whether I'm refining the alignment parameters or just doing classification (i.e. if PMASK is 0 0 0 0 0)

-If I slightly change the alignment and classification resolution limits (0.1 A higher or lower, or 1 resolution shell higher or lower in this case), the error doesn't happen, at least not in the cycles I observed (am I just unlucky in my choice of resolution limits?)

-Also, if I compile with -O0 the error seems to go away (but it does happen with -O1, -O2 and -O3)

-Finally, the error does not happen with the distributed binaries, at least not in the cycles I observed (btw, which Makefile is used to generate them?)
I am compiling the code myself because I modified other parts of it for my project, but nothing related to ainterpo3ds.f. I also compiled the original, unmodified source code and still got the error. I tried both the Intel and GNU makefiles.

Could this be a bug? Any other hint?

Thanks for helping.

niko
Joined: 5 May 2009
Thanks for your careful testing and documentation! Yes, this looks like a bug. Maybe some variable does not get initialized correctly and then, depending on the compiler, this leads to unreasonable values. Alternatively, rounding errors may sometimes lead to out-of-bounds addresses. If you would like to test whether an out-of-bounds error occurs, you could check that the index ID is larger than 0 and never exceeds NSAM*IPAD/2*NSAM*IPAD*(NSAM*IPAD+2). If it is outside this range, skip CBUF = A3DF(ID) and set AINTERPO3DS = 0.0 before returning from AINTERPO3DS.
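
The range check described above could be sketched roughly as follows. This is only a standalone illustration, not the actual Frealign code: the subroutine name GUARDED_LOOKUP, the variable IDMAX, the OK flag and the small test array are all hypothetical, and it is written in free-form Fortran rather than the fixed-form style of ainterpo3ds.f. The upper limit is computed exactly as written in the expression above.

```fortran
! Minimal sketch of a guarded array lookup. All names here except
! NSAM, IPAD, A3DF, ID and CBUF are hypothetical (not from Frealign).
program bounds_guard_demo
  implicit none
  integer, parameter :: nsam = 4, ipad = 1   ! toy sizes for the demo
  integer :: idmax
  complex, allocatable :: a3df(:)
  complex :: cbuf
  logical :: ok

  ! Upper limit for ID, using the expression given in the post.
  idmax = nsam*ipad/2*nsam*ipad*(nsam*ipad+2)
  allocate(a3df(idmax))
  a3df = (1.0, 0.0)

  ! An index inside the valid range: the stored value is returned.
  call guarded_lookup(a3df, idmax, 5, cbuf, ok)
  print *, 'ID =', 5, ' in range =', ok, ' value =', cbuf

  ! An index one past the end: the lookup is skipped and zero returned.
  call guarded_lookup(a3df, idmax, idmax + 1, cbuf, ok)
  print *, 'ID =', idmax + 1, ' in range =', ok, ' value =', cbuf

contains

  subroutine guarded_lookup(a, n, id, val, ok)
    integer, intent(in)  :: n, id
    complex, intent(in)  :: a(n)
    complex, intent(out) :: val
    logical, intent(out) :: ok

    ! Only touch the array when 1 <= ID <= N.
    ok = (id >= 1 .and. id <= n)
    if (ok) then
      val = a(id)
    else
      val = (0.0, 0.0)
    end if
  end subroutine guarded_lookup

end program bounds_guard_demo
```

In the real code the same test would wrap the line CBUF = A3DF(ID). Printing ID whenever the test fails would also show whether the index really goes out of range for the affected particles.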

The makefiles used to compile the binaries are Makefile_linux_amd64_pgi_static and Makefile_linux_amd64_pgi_mp_static.

rdrighetto
Joined: 5 Aug 2015
Thanks a lot for the quick and detailed reply, Niko.
I will first check whether the error persists with the PGI compiler, to compare the behavior of different compilers, and then implement the additional checks. I'll post the results here.

rdrighetto
Joined: 5 Aug 2015
PGI compiler prevents crash

I confirm that the PGI compiler is somehow able to prevent this segmentation fault.

For the other compilers (I tested Intel only) I tried to patch ainterpo3ds.f as suggested, but this produced .par files in which all alignment parameters were equal to 0.0, along with many ***** and NaN values.
The patched code can be downloaded here:
https://drive.switch.ch/index.php/s/f6xWrRS8FhVragY
Maybe I did something wrong.

niko
Joined: 5 May 2009
Thanks for trying. I am glad it works with the PGI compiler.