High-Throughput Molecular Data Storage Readout Using Reconfigurable Lightweight Hybrid Symbolic Encoding

Maraveni Abhishek; Pabbala Priyanka

doi:10.64751/ijdim.2026.v5.n3.1124

Keywords:

Deoxyribonucleic Acid (DNA) Data Storage, Molecular Digital Archive (MDA), Hardware-Accelerated Data Readout, Heterogeneous CPU Computing, Hybrid Symbolic Consensus Coding (HSCC), Base Quality-Aware Filtering (BQAF)

Abstract

The exponential growth of global digital data is creating unprecedented storage challenges, with worldwide data generation expected to exceed 180 zettabytes by 2025, while conventional storage media suffer from limited lifespan and increasing energy consumption. Recent studies indicate that Deoxyribonucleic Acid (DNA)-based molecular storage can achieve storage densities exceeding 200 PB per gram and preserve data for hundreds to thousands of years, making it a promising solution for long-term archival applications. However, existing hardware-accelerated DNA data readout platforms rely on fixed primer identification, static index verification, conventional majority voting, and iterative Low-Density Parity-Check (LDPC) decoding, resulting in limited adaptability, high memory traffic, increased bandwidth requirements, substantial hardware complexity, elevated power consumption, and longer decoding latency. Furthermore, the absence of early-stage quality-aware filtering allows low-confidence sequencing reads to propagate through the recovery pipeline, reducing reconstruction accuracy and scalability. To address these limitations, this work proposes a Hardware-Accelerated Data Readout Platform for Molecular Digital Archive (MDA) using heterogeneous CPU computing. The proposed architecture incorporates Primer and Index Reconfigurable Templates (PIRT) for adaptive fragment identification, Base Quality-Aware Filtering (BQAF) for early elimination of unreliable sequencing reads, and On-Chip Lightweight Compression (OLC) to reduce memory traffic and storage overhead. Additionally, a novel Hybrid Symbolic Consensus Coding (HSCC) decoder replaces conventional LDPC decoding by combining symbolic consensus generation with parity-aware correction to efficiently handle substitution, insertion, and deletion errors with lower hardware complexity. Through deep pipelining, FIFO-based decoupling, and parallel hardware execution, the proposed framework achieves improved throughput, reduced latency, enhanced scalability, lower resource utilization, and reliable data reconstruction for next-generation molecular digital archive systems.
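
The abstract does not give the internals of BQAF or HSCC, so the following is only an illustrative software sketch of the two generic ideas it names: discarding low-confidence reads early, and forming a consensus by conventional majority voting (the baseline the paper aims to improve on). The function names, the mean-Phred criterion, the threshold of 20, and the Phred+33 quality encoding are all assumptions made for this example and are not taken from the paper. The sketch handles substitution errors only and does not model the insertion and deletion handling attributed to HSCC.

```python
from collections import Counter

def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)

def filter_reads(reads, min_mean_q: float = 20.0):
    """Hypothetical early filter: keep sequences whose mean quality meets the threshold."""
    return [seq for seq, qual in reads if qual and mean_phred(qual) >= min_mean_q]

def majority_consensus(seqs):
    """Per-position majority vote over reads of the most common length.

    Reads of any other length are skipped, so insertions and deletions
    are ignored here rather than corrected.
    """
    if not seqs:
        return ""
    length = Counter(len(s) for s in seqs).most_common(1)[0][0]
    aligned = [s for s in seqs if len(s) == length]
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

# Toy cluster of reads for one fragment: (sequence, quality string).
reads = [
    ("ACGTAC", "IIIIII"),  # 'I' encodes Q40
    ("ACGTAC", "IIIIII"),
    ("ACCTAC", "IIIIII"),  # one substitution, high quality
    ("TTTTTT", "######"),  # '#' encodes Q2, removed by the filter
]

kept = filter_reads(reads)
consensus = majority_consensus(kept)
print(kept)
print(consensus)
```

In this toy cluster the low-quality read is removed before voting, and the single substitution in the third read is outvoted at its position. A hardware pipeline would apply the same two stages per fragment cluster, with the filter placed ahead of the consensus stage so that rejected reads never consume downstream memory bandwidth.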