Extraction of Genes and Transcripts Associated with Liver Cancer Using Machine Learning

  • Koshiro Sekine Kyoto Institute of Technology
  • Teruhisa Hochin Kyoto Institute of Technology
  • Hiroki Nomiya Kyoto Institute of Technology
  • Hideki Yoshida Kyoto Institute of Technology
Keywords: Differential Gene Expression, Differential Transcript Expression, Feature Extraction, Machine Learning

Abstract

The rapid development of large-scale genome analysis technology in recent years has made genome data easier to acquire, but extracting useful information from such large volumes of data remains difficult. Machine learning has attracted attention as a way to address this problem. In this paper, we use machine learning to extract genes and transcripts associated with liver cancer. Liver cancer is difficult to cure completely and is generally treated by surgery, which makes treatment difficult for elderly patients and those with reduced physical strength. Our method works in two stages on the NBDC liver cancer dataset. First, genes and transcripts are selected with statistical hypothesis tests and used as input features to train a classifier. Second, the genes and transcripts that contribute most to the classification are extracted from the trained classifier. In this way, we obtained genes and transcripts that are considered to be associated with liver cancer. These results are expected to contribute to the development of gene therapy for liver cancer. In addition, because the method is not specific to liver cancer, it can be expected to apply to other cancers as well.
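
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not name the hypothesis test, the classifier, or the contribution measure, so everything specific here is an assumption chosen for illustration: a Welch t-test as the filter, a scikit-learn random forest as the classifier, impurity-based feature importance as the contribution measure, and a synthetic matrix in place of the NBDC data. The function name rank_features and its parameters are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, alpha=0.01, n_trees=200, seed=0):
    """Filter features with a Welch t-test, then rank survivors by importance.

    Assumed stand-ins: the abstract does not specify the test or the classifier.
    """
    tumor, normal = X[y == 1], X[y == 0]
    # Stage 1: keep features whose values differ between the two groups.
    _, p_values = ttest_ind(tumor, normal, axis=0, equal_var=False)
    kept = np.flatnonzero(p_values < alpha)
    # Stage 2a: train a classifier on the filtered features only.
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(X[:, kept], y)
    # Stage 2b: sort the kept features by their contribution to the classifier.
    order = np.argsort(clf.feature_importances_)[::-1]
    return [(int(kept[i]), float(clf.feature_importances_[i])) for i in order]


# Synthetic stand-in for an expression matrix (samples x features).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)          # 40 "normal" and 40 "tumor" samples
X = rng.normal(size=(80, 50))      # 50 features, all noise at first
X[y == 1, :3] += 3.0               # features 0-2 are the only differential ones

ranking = rank_features(X, y)
top3 = sorted(idx for idx, _ in ranking[:3])
print(top3)
```

The sketch applies no multiple-testing correction and uses raw p-values only as a coarse filter; a real analysis of RNA-seq data would normally rely on a dedicated differential expression tool for the first stage.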

Published
2022-03-11
Section
Technical Papers