Wanyang Dai (戴万阳)

Professor (doctoral supervisor, key discipline post)
Affiliation: School of Mathematics, Nanjing University
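President, International Industrial Revolution Forum on Quantum Computing and Blockchain
Director, Jiangsu Special Committee on Big Data, Blockchain and Intelligent Information
President, Jiangsu Probability and Statistics Society
Invited Expert, Jiangsu FinTech Research Center
Chief Reviewer, international journal "Artificial Intelligence, Machine Learning and Data Science"
Chief Reviewer, international journal "Radio Engineering and Technology"
Guest Editor, special issue on probability and statistics, MDPI journal "Mathematics"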


Machine Learning for Gene Mutation, Codon Optimization, and Protein Generation


  • Paper title



  • Abstract

      We estimate gene mutation rates by developing a convolutional neural network (CNN) and machine learning algorithms based on mutual information and Ewens sampling. More precisely, we develop a systematic methodology by constructing a CNN. In addition, we develop two machine learning algorithms to study protein production with target gene sequences and protein structures. The core of the CNN and machine learning approach is to solve a two-stage optimization problem that balances gene mutation rates during protein production. That is, we try to optimally coordinate the consistency between the given input DNA sequences and the given (or optimally computed) target ones by controlling their intermediate gene mutation rates. The purpose is to support gene editing and protein structure prediction. For example, once the gene mutation rates are estimated, the computational complexity of protein structure prediction is reduced to a reasonable degree. Our CNN numerical optimization scheme consists of two newly designed machine learning algorithms. The stochastic gradients for the two algorithms are designed according to the Kuhn-Tucker conditions with boundary constraints, with the support of Ewens sampling, multi-input multi-output (MIMO) mutual information, and codon optimization techniques. The associated learning rate bounds are derived explicitly, and the two algorithms are implemented numerically. The convergence and optimality of the algorithms are proved mathematically. To illustrate the use of our study, we also conduct an implementation on real-world data.
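
      To make the Ewens sampling ingredient named in the abstract more concrete, here is a minimal Python sketch of the classical Ewens sampling formula and the textbook estimate of the scaled mutation parameter theta from the number of distinct alleles in a sample. It is an illustration under stated assumptions, not the CNN or the two algorithms of the paper: the function names (`ewens_log_prob`, `expected_num_alleles`, `estimate_theta`) and the bisection bounds are hypothetical choices made for this example.

```python
import math


def ewens_log_prob(counts, theta):
    """Log-probability of an allelic partition under the Ewens sampling formula.

    counts[j - 1] is a_j, the number of allele types observed exactly j times.
    theta is the scaled mutation parameter (assumed > 0).
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))  # sample size
    # log of n! / (theta * (theta + 1) * ... * (theta + n - 1))
    log_p = math.lgamma(n + 1) - sum(math.log(theta + i) for i in range(n))
    # log of prod_j (theta / j)^{a_j} / a_j!
    for j, a in enumerate(counts, start=1):
        log_p += a * math.log(theta / j) - math.lgamma(a + 1)
    return log_p


def expected_num_alleles(n, theta):
    """Expected number of distinct alleles in a sample of size n."""
    return sum(theta / (theta + i) for i in range(n))


def estimate_theta(n, k, lo=1e-9, hi=1e6, iters=200):
    """Estimate theta by solving E[K] = k with bisection on a log scale.

    Assumes 1 < k < n, so that a root exists inside (lo, hi).
    The bounds lo and hi are illustrative defaults, not values from the paper.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if expected_num_alleles(n, mid) < k:
            lo = mid  # E[K] increases with theta, so the root lies above mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    # Hypothetical sample: 50 sequences containing 8 distinct alleles.
    theta_hat = estimate_theta(50, 8)
    print(f"estimated theta: {theta_hat:.4f}")
```

      Because the number of distinct alleles is a sufficient statistic for theta under this formula, matching its expectation to the observed count gives the maximum-likelihood estimate. How the paper combines Ewens sampling with mutual information and the CNN is not specified here and is not reproduced by this sketch.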


  • Keywords and key techniques

    • Gene mutation rate, convolutional neural network (CNN), machine learning, Ewens sampling, multi-input multi-output (MIMO) mutual information, stochastic gradient, Kuhn-Tucker condition


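  • Summary

    • Gene mutation occurs naturally as life evolves and as organisms contend with and adapt to their environment, and how to predict and control it well is a matter of human concern. The big-data, large-model artificial intelligence and machine learning algorithms and the system architecture studied in this paper can be used to predict and control gene sequences and protein structures, and they offer a new system model and line of thinking for developing effective drugs to treat disease. The associated Quantum-Transformer, designed with quantum encryption and decoding, provides an efficient computing platform scheme for future applications of quantum artificial intelligence in biomedical engineering and bioinformatics. In addition, as discussed in our earlier papers, IBM, Google, Microsoft, and others are developing quantum computers in an attempt to supply very large computing power for this line of research, which could connect directly with the work in this paper.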


  • Related papers

  • Click here for more related papers