Build Instructions
==================

See the online documentation for more detailed instructions:
http://portal.nersc.gov/project/sparse/strumpack/master/index.html
http://portal.nersc.gov/project/sparse/strumpack/v5.1.0/index.html

Or see example_build.sh for examples on how to call CMake.

STRUMPACK uses the CMake build system. It is best to use an up-to-date
version of CMake, at least CMake 3.11. To build, use the following
steps:

> tar -xvzf strumpack-x.y.z.tar.gz
> cd strumpack-x.y.z
> mkdir build
> mkdir install
> cd build
> # see below or in the manual for extra options for CMake
> export METIS_DIR=/path/to/metis/install
> # the compiler options are optional, the compilers should be detected
> cmake ../ \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=../install \
    -DCMAKE_CXX_COMPILER=<C++ (MPI) compiler> \
    -DCMAKE_C_COMPILER=<C (MPI) compiler> \
    -DCMAKE_Fortran_COMPILER=<Fortran77 (MPI) compiler>
> make -j4
> make install
> # optional
> make examples -j4
> make tests -j4
> make test  # parallel tests might fail on login nodes


This will build STRUMPACK in a folder separate from the source
directory, which is the recommended way of building.
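
As a quick sanity check before configuring, the CMake version
requirement above can be tested from the shell. The following is a
minimal sketch, not part of STRUMPACK: version_ge is a hypothetical
helper name, and it assumes plain dotted numeric versions such as
3.22.1.

```shell
#!/bin/sh
# version_ge A B: succeeds if version A >= version B.
# Hypothetical helper for illustration, not part of STRUMPACK or CMake.
version_ge() {
    # Sort the two versions numerically, field by field; if B sorts
    # first (or both are equal), then A >= B.
    [ "$(printf '%s\n%s\n' "$2" "$1" \
        | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$2" ]
}

# Only run the check if cmake is on the PATH.
if command -v cmake >/dev/null 2>&1; then
    # The first line of the output looks like: cmake version 3.22.1
    found=$(cmake --version | awk 'NR == 1 { print $3 }')
    if version_ge "$found" 3.11; then
        echo "CMake $found is recent enough"
    else
        echo "CMake $found is older than 3.11" >&2
    fi
fi
```

If the check fails, install a newer CMake or put one earlier on your
PATH before running the build steps.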

There are a number of dependencies which must be met in order for the
build to succeed. These are:
- C++11, C and Fortran77 compilers.
- BLAS, LAPACK.
- Metis.
- getopt_long.

Optional dependencies:
- MPI, ScaLAPACK (BLACS usually comes with ScaLAPACK)
  Disable by adding -DSTRUMPACK_USE_MPI=OFF
- OpenMP >= 4.5 support in the C++ compiler.
  Disable by adding -DSTRUMPACK_USE_OPENMP=OFF
- ParMetis.
- Scotch and PT-Scotch.
- CUDA, cuBLAS and cuSOLVER
- SLATE.
- CombBLAS.

The (C++/C/Fortran) (MPI) compiler can be specified as follows:
> cmake ../ \
    -DCMAKE_CXX_COMPILER=mpic++ \
    -DCMAKE_C_COMPILER=mpicc \
    -DCMAKE_Fortran_COMPILER=mpif90


BLAS/LAPACK/ScaLAPACK
=====================

CMake will try to find BLAS, LAPACK and ScaLAPACK libraries
automatically. CMake will look in the standard locations, or you can
specify the libraries directly using:
  -DTPL_BLAS_LIBRARIES=".." \
  -DTPL_LAPACK_LIBRARIES=".." \
  -DTPL_SCALAPACK_LIBRARIES=".."
as lists of ;-separated libraries, or directly via the linker flags as
follows:
> cmake ../ \
   -DCMAKE_EXE_LINKER_FLAGS="-L/usr/lib64/mpich/lib/ -lscalapack -lmpiblacs"

or by setting the following environment variable for SCALAPACK:
> export SCALAPACK_DIR=/path/to/scalapack/

In order to get good performance, one should install an optimized or
vendor supplied BLAS implementation. Examples are Intel MKL, Cray
LibSci, AMD ACML, OpenBLAS, IBM ESSL or ATLAS.

On Cray systems, the compiler wrappers usually take care of linking
BLAS/LAPACK and ScaLAPACK. In that case, nothing needs to be set
manually. If CMake still finds the wrong libraries, you can set:
  -DTPL_BLAS_LIBRARIES="" \
  -DTPL_LAPACK_LIBRARIES="" \
  -DTPL_SCALAPACK_LIBRARIES=""
to prevent CMake from looking for the libraries.

When using Intel MKL, we recommend using the link line advisor:
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor
Always select the 32-bit integer interface layer, and check the
ScaLAPACK and BLACS options.

For instance when the Intel MKL link line advisor returns the
following:

${MKLROOT}/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_gnu_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lgomp -lpthread -lm -ldl

Set this:

-DTPL_SCALAPACK_LIBRARIES="${MKLROOT}/lib/intel64/libmkl_scalapack_lp64.a;-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_gnu_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group;-lgomp;-lpthread;-lm;-ldl"

i.e., separate the arguments with a ';', where everything between
-Wl,--start-group and -Wl,--end-group is a single argument.
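
The splitting rule above can be applied mechanically. The sketch below
is illustrative only: to_cmake_list is a hypothetical helper, not part
of STRUMPACK or MKL, and it assumes the link line contains no quoted
arguments with embedded spaces.

```shell
#!/bin/sh
# to_cmake_list "LINK LINE": print the link line as a ;-separated list,
# keeping everything from -Wl,--start-group through -Wl,--end-group
# together as a single list element.
# Hypothetical helper for illustration. The body runs in a subshell so
# that 'set -f' and the variables do not leak into the caller.
to_cmake_list() (
    set -f          # no pathname expansion while word splitting
    out=""
    in_group=0
    for tok in $1; do   # intentional word splitting on whitespace
        if [ -z "$out" ]; then
            out=$tok
        elif [ "$in_group" -eq 1 ]; then
            out="$out $tok"     # inside the group: join with spaces
        else
            out="$out;$tok"     # outside the group: join with ';'
        fi
        case $tok in
            -Wl,--start-group) in_group=1 ;;
            -Wl,--end-group)   in_group=0 ;;
        esac
    done
    printf '%s\n' "$out"
)

# Example with placeholder library names:
to_cmake_list 'libA.a -Wl,--start-group libB.a libC.a -Wl,--end-group -lm'
```

The printed value can then be passed, in double quotes, as the value of
-DTPL_SCALAPACK_LIBRARIES.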


METIS
=====
For Metis, the following environment variable can be set:
> export METIS_DIR=/path/to/metis

METIS can be downloaded and installed as follows:
> wget http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/metis-5.1.0.tar.gz
> tar -xvzf metis-5.1.0.tar.gz
> cd metis-5.1.0
> make config cc=gcc prefix=`pwd`/install
> make install
> export METIS_DIR=`pwd`/install

STRUMPACK's CMake will then read the environment variable $METIS_DIR
and look in $METIS_DIR/include for metis.h. Likewise, it will look
in $METIS_DIR/lib for libmetis.
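
Before running CMake, the expected layout can be checked by hand. The
following is a minimal sketch, assuming a conventional install with the
header in include/ and the library in lib/. check_metis_dir is a
hypothetical helper, not part of STRUMPACK or METIS.

```shell
#!/bin/sh
# check_metis_dir DIR: succeeds if DIR/include/metis.h exists and
# DIR/lib contains at least one libmetis.* file (static or shared).
# Hypothetical helper for illustration.
check_metis_dir() {
    dir=$1
    if [ ! -f "$dir/include/metis.h" ]; then
        echo "missing $dir/include/metis.h" >&2
        return 1
    fi
    # If the glob matches nothing it stays literal and the -e test fails.
    for lib in "$dir"/lib/libmetis.*; do
        if [ -e "$lib" ]; then
            return 0
        fi
    done
    echo "no libmetis.* found in $dir/lib" >&2
    return 1
}

# Only check when METIS_DIR is actually set.
if [ -n "${METIS_DIR:-}" ]; then
    if check_metis_dir "$METIS_DIR"; then
        echo "METIS_DIR looks usable: $METIS_DIR"
    fi
fi
```

If the check reports a missing file, point METIS_DIR at the install
prefix rather than at the METIS source directory.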

Alternatively, one can specify the paths directly when calling CMake
as follows:
> cmake ../ \
    -DTPL_METIS_INCLUDE_DIRS=/usr/local/metis/include \
    -DTPL_METIS_LIBRARIES=/usr/local/metis/lib/libmetis.a


CUDA (optional)
===============
CUDA can be used to accelerate the sparse direct solver, but it is not
used in the approximate factorizations.

> cmake ../ -DSTRUMPACK_USE_CUDA=ON

CMake will look for the CUDA compiler and libraries (cuBLAS and
cuSOLVER) in the default location or at CUDAToolkit_ROOT, which can be
set as
> export CUDAToolkit_ROOT=..
or passed directly to CMake
> cmake ../ -DSTRUMPACK_USE_CUDA=ON -DCUDAToolkit_ROOT=...

For full GPU support in the distributed memory sparse direct solver,
one should also compile with support for SLATE, see below.



HIP (optional)
==============
To enable support for HIP in the sparse solver:

> export HIP_DIR=...
> cmake ../ \
      -DSTRUMPACK_USE_HIP=ON \
      -DCMAKE_HIP_ARCHITECTURES=gfx90a \
      -DCMAKE_CXX_COMPILER=hipcc

In the above, adjust the HIP_DIR, the GPU architecture, and the HIP
compiler.

For full GPU support in the distributed memory sparse direct solver,
one should also compile with support for SLATE, with the HIP backend,
see below.



SLATE (optional) for GPU accelerated ScaLAPACK
==============================================
SLATE is a modern ScaLAPACK alternative, bringing support for GPU
acceleration. It can be found at: https://bitbucket.org/icl/slate

Support for SLATE in STRUMPACK can be enabled with, for instance:
> export SLATEHOME=$HOME/slate/
> cmake ../ -DTPL_ENABLE_SLATE=ON \
    -DTPL_SLATE_INCLUDE_DIRS="$SLATEHOME/include/;$SLATEHOME/blaspp/include;$SLATEHOME/lapackpp/include" \
    -DTPL_SLATE_LIBRARIES="$SLATEHOME/lib/libslate_scalapack_api.so;$SLATEHOME/lib/libslate.so;$SLATEHOME/blaspp/lib/libblaspp.so;$SLATEHOME/lapackpp/lib/liblapackpp.so"

Or if you simply define SLATE_DIR to point to the SLATE and blas++ and
lapack++ installation directories, STRUMPACK's CMake should find them
and you do not need to specify TPL_SLATE_INCLUDE_DIRS and
TPL_SLATE_LIBRARIES.

Note that SLATE requires MPI_THREAD_MULTIPLE. So you need to
initialize MPI with MPI_Init_thread with the required argument set to
MPI_THREAD_MULTIPLE.



ParMETIS and (PT)Scotch (optional)
==================================
ParMETIS and (PT)Scotch are optional dependencies, enabled by default
unless CMake cannot find them. ParMETIS can be enabled/disabled
explicitly by adding the following option to the CMake command:
    -DTPL_ENABLE_PARMETIS=ON
 or
    -DTPL_ENABLE_PARMETIS=OFF
And similarly for (PT-)Scotch:
    -DTPL_ENABLE_SCOTCH=ON
 or
    -DTPL_ENABLE_SCOTCH=OFF

For ParMETIS and PT-Scotch, the following environment variables can be
set:
> export ParMETIS_DIR=/path/to/parmetis
> export SCOTCH_DIR=/path/to/scotch

CMake will then look in $ParMETIS_DIR/include for parmetis.h and in
$SCOTCH_DIR/include for scotch.h and ptscotch.h. Likewise, it will look
in $ParMETIS_DIR/lib and $SCOTCH_DIR/lib for libparmetis, libscotch,
libscotcherr, libptscotch, libptscotcherr.

Or, one can set -DTPL_PARMETIS_INCLUDE_DIRS, -DTPL_PARMETIS_LIBRARIES,
-DTPL_SCOTCH_INCLUDE_DIRS and -DTPL_SCOTCH_LIBRARIES:

    -DTPL_PARMETIS_INCLUDE_DIRS=/path/to/parmetis/include \
    -DTPL_PARMETIS_LIBRARIES=/path/to/parmetis/libparmetis.a

    -DTPL_SCOTCH_INCLUDE_DIRS=/path/to/scotch/include \
    -DTPL_SCOTCH_LIBRARIES="/path/to/ptscotch/libscotch.a;...libscotcherr.a;...libptscotch.a;...libptscotcherr.a"



CombBLAS (optional)
===================
Combinatorial BLAS can be used as an alternative to MC64, in order to
permute the matrix to get nonzeros on the diagonal.

https://people.eecs.berkeley.edu/~aydin/CombBLAS/html/

Enable this feature by adding
  -DTPL_ENABLE_COMBBLAS=ON
to the CMake command. Also set the following environment variables:

  export COMBBLAS_DIR=/path/to/combinatorial-blas-2.0/CombBLAS/install/
  export COMBBLASAPP_DIR=/path/to/combinatorial-blas-2.0/CombBLAS/Applications/

to the appropriate directories for your setup.



ButterflyPACK (optional, needs >= 1.2.0)
========================================
See: https://github.com/liuyangzhuan/ButterflyPACK

ButterflyPACK provides support for Hierarchically Off-Diagonal
Low-Rank and Butterfly matrix compression. ButterflyPACK is written in
Fortran. STRUMPACK provides C++ interfaces to the functionality of
ButterflyPACK and STRUMPACK uses ButterflyPACK in the sparse
preconditioner.

Enable/disable support for ButterflyPACK using:
    -DTPL_ENABLE_BPACK=ON
 or
    -DTPL_ENABLE_BPACK=OFF

And then set
> export ButterflyPACK_DIR=/path/to/butterflypack/

This will find the required libraries and include directories.
Alternatively, add the following to the CMake invocation:
    -DTPL_BPACK_INCLUDE_DIRS="$BPACKHOME/SRC_DOUBLE/;$BPACKHOME/SRC_DOUBLECOMPLEX" \
    -DTPL_BPACK_LIBRARIES="$BPACKHOME/build/SRC_DOUBLE/libdbutterflypack.a;$BPACKHOME/build/SRC_DOUBLECOMPLEX/libzbutterflypack.a"


Usage Instructions
==================
Please see the examples in the examples/ folder.

To build an application called myapp that depends on STRUMPACK using
CMake, add the following lines to your CMakeLists.txt file:

> cmake_minimum_required(VERSION 3.13)
> project(myapp VERSION 0.1 LANGUAGES CXX)
> find_package(STRUMPACK REQUIRED)
> add_executable(myexe main.cpp)
> target_link_libraries(myexe PRIVATE STRUMPACK::strumpack)


And then invoke CMake with the path to the STRUMPACK installation
folder set:

> export STRUMPACK_DIR=/some/path/STRUMPACK/install
> cd myapp
> mkdir build
> cd build
> cmake ../
> make


How to generate the doxygen documentation
=========================================

After running CMake, just type
> make doc

This requires that the doxygen executable is available on your machine.
The documentation will be generated in your build folder in the doc subdir.
