NVIDIA: Facial Performance Capture with Deep Neural Networks


[Figure 1 caption, partially recovered] (a) … must be processed using a conventional capture pipeline. This provides the input/target pairs that are necessary for the network to learn how to perform the mapping from video footage to vertex positions. According to our experiments, approximately 10 minutes' worth of training material is sufficient per actor. (b) For processing the bulk of the material, the conventional capture pipeline can be skipped. Depending on the amount of material, this can yield significant cost savings in production.

… shots. Naturally, some amount of movement needs to be allowed, and we achieve this through input data augmentation in the training phase (Section ). In our case, the outputs of the capture pipeline are the per-frame positions of the control vertices of a facial mesh, as illustrated in Figure 2. There are various other ways to encode the facial expression, including rig parameters or blend shape weights. In the system in which our work was developed, those kinds of encodings are introduced in later stages, mainly for compression and rendering purposes, but the primary capture output consists of the positions of approximately 5000 animated vertices on a fixed-topology facial mesh.

Existing capture pipeline at Remedy

Target data necessary for training the neural network was generated using Remedy Entertainment's existing capture pipeline, based on the commercial DI4D PRO system [Dimensional Imaging 20xx], which employs nine video cameras. The benefit of this system is that it captures the nuanced interactions of the skull, muscles, fascia, and skin of an actor, so as to bypass complex and expensive facial rigging and tissue simulations for digital doubles.

[Figure 2: Input for the conventional capture pipeline is a set of nine images, whereas our network only uses a cropped portion of the center camera image, converted to grayscale. The output of both the conventional capture pipeline and our network consists of the 3D positions of ~5000 animated control vertices for each frame.]

First, an unstructured mesh with texture and optical flow data is created from the images for each frame of a facial performance. A fixed-topology template mesh, created prior to the capture work using a separate photogrammetry pipeline, is then projected onto the unstructured mesh and associated with the optical flow. The template mesh is tracked across the performance, and any issues are fixed semi-automatically in the DI4DTrack software by a tracking artist. The position and orientation of the head are then stabilized using a few key vertices of the tracking mesh. Finally, a point cache of the facial performance is exported for the fixed-topology template mesh. The point cache file contains the positions of each vertex in the mesh for each frame of animation in the shot.

Additional automated deformations are later applied to the point cache to fix the remaining issues; these deformations were not applied to the point caches in the training set. For example, the eyelids are deformed to meet the eyeball exactly and to slide slightly with the motion of the eyes. Also, opposite vertices of the lips are smoothly brought together to improve lip contacts when needed. After animating the eye directions, the results are compressed for runtime use in Remedy's Northlight engine using 416 facial joints. Pose space deformation is used to augment the facial animation with detailed wrinkle normal map blending.
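The head stabilization step admits a compact illustration. The sketch below shows one standard way to rigidly align each frame to a reference frame using a handful of key vertices, via the Kabsch (SVD-based) least-squares rigid fit; the choice of algorithm and all names here are our own illustrative assumptions, not details of the DI4D software.

```python
# A minimal sketch of rigid head stabilization from a few key vertices,
# assuming a least-squares rigid fit (Kabsch algorithm). Illustrative only;
# not taken from the DI4D pipeline.
import numpy as np

def stabilize_frame(frame_verts, ref_verts, key_idx):
    """Rigidly align frame_verts (N, 3) so that its key vertices best
    match those of ref_verts (N, 3) in the least-squares sense."""
    P = frame_verts[key_idx]           # (K, 3) key vertices, current frame
    Q = ref_verts[key_idx]             # (K, 3) key vertices, reference frame
    mu_p, mu_q = P.mean(0), Q.mean(0)
    H = (P - mu_p).T @ (Q - mu_q)      # 3x3 cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation
    t = mu_q - R @ mu_p                      # optimal translation
    return frame_verts @ R.T + t       # apply rigid transform to all vertices
```

Aligning every frame to a common reference in this way factors out head pose, so the exported point cache encodes facial deformation rather than gross head movement.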
2 Previous Work

While facial performance capture systems come in many varieties, all share a fundamental goal: the non-rigid tracking of the shape of the actor's head throughout a performance, given inputs in the form of video sequences, RGB-D sequences, or other measurements. Once solved for, the moving geometry is often further retargeted onto an existing animation rig for further processing. In this work, we concentrate on the first problem: given a video sequence as input, our system outputs a time-varying mesh sequence that tracks the performance.

There are numerous existing methods for time-varying facial 3D reconstruction (tracking). Markers drawn at specific locations on the actor's face enable multi-view stereo techniques to find the markers' trajectories in space, and knowledge of their positions on the face geometry allows estimating a deformation for the entire head [Williams 1990]. Markerless techniques, on the other hand, attempt to track the entire face simultaneously, often with the help of a parameterized template head model that may include animation priors, e.g., [Zhang et al. 20xx]. […] However, after our network has learned to mimic the host algorithm, it produces results at a fraction of the cost. While we base our system on a specific commercial solution, the same general idea can be built on top of any facial motion capture technique taking video inputs.

3 Network Architecture

Our input footage is divided into a number of shots, with each shot typically consisting of 100–2000 frames at 30 FPS. Data for each input frame consists of a 1200×1600 pixel image from each of the nine cameras. As explained above, the output is the per-frame vertex positions for each of the ~5000 facial mesh vertices, i.e., ~15000 scalars (= N_out) in total. As the input for the network, we take the 1200×1600 video frame from the central camera, crop it with a fixed rectangle so that the face remains in the picture, and scale the remaining portion to 240×320 resolution. Furthermore, we convert the image to grayscale, resulting in a total of 76800 scalars to be fed to the network. The resolution may seem low, but numerous tests confirmed that increasing it did not improve the results.

During the course of the project, we experimented with two neural network architectures.
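As a concrete illustration of the input preprocessing just described (fixed-rectangle crop, grayscale conversion, downscaling to 240×320), here is a minimal sketch assuming OpenCV. The crop rectangle values are hypothetical placeholders standing in for the per-actor constant, and the scaling to [0, 1] is our assumption rather than a detail stated in the text.

```python
# A minimal sketch of the input preprocessing: crop the center-camera frame
# with a fixed rectangle, convert to grayscale, and downscale to 240x320,
# yielding the 76800 scalars fed to the network. Crop values are placeholders.
import cv2
import numpy as np

CROP = (120, 160, 960, 1280)  # (x, y, w, h) -- hypothetical fixed rectangle

def preprocess(frame_bgr):
    """frame_bgr: 1200x1600 center-camera frame as a (1600, 1200, 3) uint8 array."""
    x, y, w, h = CROP
    face = frame_bgr[y:y + h, x:x + w]              # fixed crop around the face
    gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)   # drop color information
    small = cv2.resize(gray, (240, 320), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32).ravel() / 255.0  # 76800 scalars in [0, 1]
```

The same preprocessing would be applied identically to training footage and to new shots at inference time, since the network expects the same 76800-scalar input layout in both cases.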