Auto Lip-Sync on 3D Virtual Characters Using Blendshapes

Matahari Bhakti Nendya, Syahri Mu’min




Creating a 3D virtual character that can speak like a human is a challenge for animators. The work takes a long time, and it is complicated by the many different phonemes that make up a sentence. The auto lip-sync technique is used to build 3D virtual characters that can speak the way people normally do. The Preston Blair phoneme series serves as the reference for forming the character's visemes. Breaking speech down into phonemes and synchronizing them with the audio in 3D software form the final stage of building auto lip-sync for a 3D virtual character.
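The pipeline summarized above (split speech into timed phonemes, map each phoneme to a Preston Blair viseme, then key the matching blendshape in sync with the audio) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that phoneme timings are already available (for example from a forced aligner) and that the phoneme symbols and the phoneme-to-viseme table are illustrative rather than taken from the paper. It also assumes that the viseme names follow a commonly used 10-shape rendering of the Preston Blair chart and that blendshapes use the standard delta form v = b + sum_i w_i (t_i - b). The names `viseme_keys`, `weights_at`, and `blend_vertex` are hypothetical.

```python
from dataclasses import dataclass

# Viseme names: an assumed, commonly used 10-shape version of the Preston Blair chart.
VISEMES = ["AI", "E", "O", "U", "FV", "L", "MBP", "WQ", "etc", "rest"]

# Illustrative phoneme -> viseme table (an assumption, not the authors' mapping).
# Any phoneme not listed falls back to the generic consonant shape "etc".
PHONEME_TO_VISEME = {
    "a": "AI", "i": "AI",
    "e": "E",
    "o": "O",
    "u": "U",
    "f": "FV", "v": "FV",
    "l": "L",
    "m": "MBP", "b": "MBP", "p": "MBP",
    "w": "WQ",
}


@dataclass
class Phoneme:
    symbol: str
    start: float  # seconds from the start of the audio clip
    end: float


def to_viseme(symbol: str) -> str:
    """Map one phoneme symbol to a viseme name; silence maps to 'rest'."""
    if symbol in ("", "sil"):
        return "rest"
    return PHONEME_TO_VISEME.get(symbol, "etc")


def viseme_keys(phonemes, fps=24):
    """Turn timed phonemes into (frame, viseme) keys with strictly increasing frames."""
    keys = []
    for ph in phonemes:
        frame = round(ph.start * fps)  # sync step: audio time -> animation frame
        vis = to_viseme(ph.symbol)
        if keys and keys[-1][1] == vis:
            continue  # same mouth shape as the previous key: no new key needed
        if keys and frame <= keys[-1][0]:
            continue  # phoneme shorter than one frame at this fps: dropped
        keys.append((frame, vis))
    if phonemes and keys[-1][1] != "rest":
        end_frame = round(phonemes[-1].end * fps)
        if end_frame > keys[-1][0]:
            keys.append((end_frame, "rest"))  # close the mouth when speech ends
    return keys


def weights_at(keys, frame):
    """Blendshape weights at `frame`, interpolated linearly between neighbouring keys."""
    weights = {v: 0.0 for v in VISEMES}
    if not keys:
        weights["rest"] = 1.0
        return weights
    if frame <= keys[0][0]:
        weights[keys[0][1]] = 1.0
        return weights
    for (f0, v0), (f1, v1) in zip(keys, keys[1:]):
        if f0 <= frame < f1:
            t = (frame - f0) / (f1 - f0)
            weights[v0] += 1.0 - t
            weights[v1] += t
            return weights
    weights[keys[-1][1]] = 1.0  # after the last key, hold its shape
    return weights


def blend_vertex(base, targets, weights):
    """Delta blendshape for one vertex: base + sum_i w_i * (target_i - base)."""
    out = list(base)
    for name, weight in weights.items():
        if weight and name in targets:
            for k in range(len(out)):
                out[k] += weight * (targets[name][k] - base[k])
    return out


if __name__ == "__main__":
    # Hypothetical timings for the word "mama".
    word = [Phoneme("m", 0.00, 0.08), Phoneme("a", 0.08, 0.25),
            Phoneme("m", 0.25, 0.33), Phoneme("a", 0.33, 0.50)]
    keys = viseme_keys(word, fps=24)
    print(keys)  # [(0, 'MBP'), (2, 'AI'), (6, 'MBP'), (8, 'AI'), (12, 'rest')]
    print(weights_at(keys, 1))  # halfway between MBP and AI
```

Placing a weight of 1 on one viseme and 0 on the others at each key, with linear interpolation between keys, approximates what a 3D package does once such keys are set on blendshape channels. `weights_at` simply evaluates that curve so the result can be inspected outside the software. Merging repeated visemes and dropping phonemes shorter than one frame are simplifications made for this sketch.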


lip-sync; blendshapes; 3D virtual character; phoneme; viseme; animation






This work is licensed under a Creative Commons Attribution 4.0 International License
