Special Issue: Winners of the 9th Science and Technology Award of the Operations Research Society of China

Generative methods in artificial intelligence: Mathematical models, optimization algorithms and applications

  • Tiande GUO,
  • Tianchi XING,
  • Congying HAN,
  • Shuai MENG
  • 1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
Congying HAN  E-mail: hancy@ucas.ac.cn

Received date: 2025-04-21

Online published: 2025-09-09

Funding

Key Program of the National Natural Science Foundation of China (12431012); Key Program of the National Natural Science Foundation of China (U23B2012)

Copyright

Editorial Office of Operations Research Transactions, 2025. All rights reserved. Unauthorized reproduction is prohibited.

Abstract

With the continued development of deep learning and neural network technologies, generative methods have achieved major breakthroughs in machine learning and shown great potential across many application scenarios. This paper constructs a unified mathematical framework for generative methods in artificial intelligence and systematically introduces their core techniques, including the variational autoencoder (VAE), the generative adversarial network (GAN), diffusion models, and flow-based models, together with an in-depth analysis of the strengths and limitations of each method across tasks. It then explores the application prospects of these methods in mathematics, physics, the life sciences, medicine, computer science, and engineering. Finally, it summarizes the key challenges currently facing generative methods in artificial intelligence and focuses on their future directions in mathematics and intelligent optimization research. The paper aims to provide useful reference and insight for researchers and practitioners in related fields.

Cite this article

Tiande GUO, Tianchi XING, Congying HAN, Shuai MENG. Generative methods in artificial intelligence: Mathematical models, optimization algorithms and applications[J]. Operations Research Transactions, 2025, 29(3): 1-33. DOI: 10.15960/j.cnki.issn.1007-6093.2025.03.001
