Platforms that need both config and arch def support:
XX ( 1) Windows/x86:
XX      a. Works with gfortran/gcc
XX      b. Works with MSVC++ as interface compiler
XX      c. Works w/o cygwin lib
XX      d. Works with ifort as f77 compiler
XX ( 2) Solaris x86
   ( 3) G4
        a. OS X
        b. Linux
   ( 4) G5
XX      a. OS X
        b. Linux (no access)
   ( 5) POWER5
        a. AIX (IBM)
XX      b. Linux (IBM) -- gcc only
   ( 6) POWER4
XX      a. Linux (kate) -- gcc only, poor s/c perf due to lack of PPC
           asg kernel
        b. AIX (no access) -- maybe caeser from Kate?
   ( 7) Most present x86 platforms have only 64-bit arch defs, must decide
        on policy, and support 32 bits where practical
        - For non-x86 this is mostly easy: after dinking wt assembly
          kernels, can use same arch defs for both. For x86, 32 & 64 are
          different always (diff # of regs), so more of a pain. Old x86
          exist only in 32 bit, so that is easy. Modern archs are a pain:
          need 32 bit for windoze and people who haven't upgraded real OSes.

Platforms that need for sure to update arch defs for 4.2:
XX ( 8) Solaris/SPARC -- USIV only
   ( 9) Itanium2

Kernel support needed:
XX (10) SPARC assemblies need to support 64 bit ABI -- USIV kernels only!
XX (11) Need to write single precision Core2Duo kernel
XX (12) Need to try intermixing M-loop for double MIPS kernel -- no win
XX (13) Need to write single precision MIPS kernel
   (22) Need single precision PPC kernel based on dp kernel
        -- POWER4 has no altivec, and sp perf is 63% of peak, dp is 73%

Must be investigated/done:
   (14) Threaded code perf drops through floor when MaxMalloc is exceeded
XX (15) Change atlconf.txt to use names rather than numbers (too fragile)
   (16) Have ATL_gemm0, called from C & F77 ifaces. This routine has all
        checks: (a) doing, gemv,ger,axpy,dot, (b) doing syrk, etc.
        Must be compiled wt. iface somehow to avoid having gemm install
        die . . .
ZZ (17) Initial timings indicate icc can vectorize at least some kernels;
ZZ      need to try supporting icc on x86.
ZZ      --> Cannot fathom error behavior of icc, not supporting.
YY (18) Historically, icc has dominated gcc on the Itanium for all codes
YY      except hand-tuned matmul kernels. Need to see if still true, and
YY      support icc if so.
YY      --> No access to icc on Itanium
ZZ (19) See if cc is worth supporting independently on UltraSPARC, or just
ZZ      use gcc 4 all the time.
ZZ      --> No time, blow this off
   (20) See if we can use USIV arch defs for USIII.
XX (21) Need to change it so only interface compiled with ICC (now compiles
XX      most non-kernel routines).
XX (23) Need to scope if xlc can provide better gened perf on POWERx;
XX      - Seems to be slightly slower for DGEMM on Pwr5, according to hand
XX        search
XX (24) Need to scope how to do prefetch in xlc -- throw -qasm=gcc flag