Trusted execution environments (TEEs) such as Intel’s Software Guard Extensions
(SGX) have been widely studied to boost security and privacy protection for the
computation of sensitive data such as human genomics. However, SGX often
introduces a significant performance hurdle, largely because of its small
enclave memory. In
this paper, we propose a new Hybrid Secured Flow framework (called
“HySec-Flow”) for large-scale genomic data analysis using SGX platforms. In
HySec-Flow, data-intensive computing tasks are partitioned into independent
subtasks that are deployed into distinct secured and non-secured containers,
allowing parallel execution while working around the limited size of the
Enclave Page Cache (EPC) in each enclave. We illustrate our contributions using
a workflow that supports indexing, alignment, dispatching, and merging across
SGX-enabled containers. We describe the architecture of the trusted and
untrusted components, as well as the underlying support from Scone and
Graphene, generic shielded-execution frameworks for porting legacy code. We
thoroughly evaluate the performance of our privacy-preserving read-mapping
algorithm on real human genome sequencing data. The results show that
partitioning the time-consuming genomic computation into subtasks improves
performance compared with conventionally executing the data-intensive
read-mapping algorithm in an enclave. The proposed HySec-Flow framework is
available as open source and can be adapted to the data-parallel computation of
other large-scale genomic tasks that require security and scalable
computational resources.
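The abstract describes the partition, dispatch, and merge pattern only in prose. The following is a minimal sketch of that pattern under stated assumptions; it is not HySec-Flow's implementation. It assumes a hypothetical per-enclave memory budget (`EPC_BUDGET_BYTES`) and uses a thread pool as a stand-in for separate SGX-enabled containers. It also replaces real read alignment with a toy exact-match lookup (`align_batch`). All of these names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# Hypothetical memory budget (bytes) available for read data inside one enclave.
EPC_BUDGET_BYTES = 64 * 1024 * 1024


def partition_reads(reads: Sequence[str], budget_bytes: int = EPC_BUDGET_BYTES) -> List[List[str]]:
    """Split reads into batches whose total encoded size stays within budget_bytes.

    A single read larger than the budget is placed alone in its own batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for read in reads:
        size = len(read.encode())
        # Start a new batch when adding this read would exceed the budget.
        if current and used + size > budget_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(read)
        used += size
    if current:
        batches.append(current)
    return batches


def align_batch(reference: str, batch: List[str]) -> List[Tuple[str, int]]:
    """Toy stand-in for an enclave alignment subtask: exact-match offset, or -1 if absent."""
    return [(read, reference.find(read)) for read in batch]


def run_workflow(reference: str, reads: Sequence[str], budget_bytes: int = EPC_BUDGET_BYTES,
                 workers: int = 4) -> List[Tuple[str, int]]:
    """Partition reads, dispatch batches in parallel, and merge results in input order."""
    batches = partition_reads(reads, budget_bytes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so merging is a concatenation.
        results = pool.map(lambda batch: align_batch(reference, batch), batches)
        return [hit for batch_hits in results for hit in batch_hits]


if __name__ == "__main__":
    ref = "ACGTACGGTACCTTAG"
    print(run_workflow(ref, ["ACGG", "TTAG", "GGGG", "ACGT"], budget_bytes=8))
```

With an 8-byte budget, the four 4-byte reads form two batches that are processed in parallel. The merged output keeps the original read order because `Executor.map` returns results in submission order. A real deployment would size batches against the measured EPC headroom rather than a fixed constant.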
