1. Gabatarwa
Canja-zaman salon hotuna yana wakiltar aikace-aikacen koyon na'ura mai zurfi mai ƙwazo a hangen nesa na kwamfuta, yana ba da damar rabuwa da haɗa abun ciki da salo daga hotuna daban-daban. Wannan fasahar ta ginu akan cibiyoyin sadarwar jijiyoyi masu haɗawa (CNNs) kuma ta sami ci gaba sosai tun aikin farko na Gatys da sauransu (2016). Tushen tushen ya haɗa da amfani da cibiyoyin sadarwar da aka riga aka horar kamar VGG-19 don ciro siffofin siffofi waɗanda ke ɗauke da abun ciki na ilimin harshe da halayen salon fasaha.
Mahimman Fahimta
- Canja-zaman salo yana ba da damar haɗa hotunan fasaha ba tare da tsoma baki na hannu ba
- Siffofi masu zurfi daga CNNs suna raba wakilcin abun ciki da salo yadda ya kamata
- Aiwatar da ainihin lokaci ya sanya fasahar ta zama mai sauƙi don aikace-aikace masu amfani
2. Tsarin Fasaha
2.1 Tsarin Canja-zaman Salon Neur
Tsarin ginshiƙi yana amfani da cibiyar sadarwar VGG-19 da aka riga aka horar, inda ƙananan yadudduka ke ɗaukar cikakkun bayanai na salo yayin da manyan yadudduka ke ɓoye abun ciki na ilimin harshe. Kamar yadda aka nuna a cikin takardar asali ta CycleGAN (Zhu da sauransu, 2017), wannan hanyar tana ba da damar fassarar hoto ta hanyoyi biyu ba tare da bayanan horo biyu ba.
Yaduddukan VGG-19 Da Ake Amfani Da Su
conv1_1, conv2_1, conv3_1, conv4_1, conv5_1
Girman Taswirar Siffofi
Tasoshin 64, 128, 256, 512, 512
2.2 Tsarin Ayyukan Asara
Jimlar aikin asara ya haɗa abubuwan abun ciki da salo tare da ma'auni mai dacewa:
$L_{total} = \alpha L_{content} + \beta L_{style}$
Inda aka ayyana asarar abun ciki kamar haka:
$L_{content} = \frac{1}{2} \sum_{i,j} (F_{ij}^l - P_{ij}^l)^2$
Kuma asarar salo tana amfani da wakilcin matrix na Gram:
$L_{style} = \sum_l w_l \frac{1}{4N_l^2 M_l^2} \sum_{i,j} (G_{ij}^l - A_{ij}^l)^2$
A nan, $G^l$ da $A^l$ suna wakiltar matrices na Gram na hotunan da aka samar da na salo bi da bi a Layer $l$.
2.3 Hanyoyin Ingantawa
Tsarin ingantawa yawanci yana amfani da L-BFGS ko Adam optimizer tare da tsara ƙimar koyo. Ci gaban kwanan nan ya haɗa da asarar fahimta da horon adawa kamar yadda aka gani a cikin aiwatar da StyleGAN (Karras da sauransu, 2019).
3. Sakamakon Gwaji
3.1 Kimantawa Ta Ƙididdiga
Ma'aunin aiki ya haɗa da Fihirisar Kamanceceniya ta Tsari (SSIM), Matsakaicin Sigina-zuwa-Ƙara (PSNR), da nazarin fifikon mai amfani. Gwaje-gwajenmu sun sami maki SSIM na 0.78-0.85 da ƙimar PSNR na 22-28 dB a cikin haɗuwar salo da abun ciki daban-daban.
3.2 Bincike Ta Halayya
Hotunan da aka samar suna nuna ingantaccen canja-zaman salo yayin kiyaye tsarin abun ciki. Hoto na 1 yana nuna nasarar canja wurin salon "Starry Night" na Van Gogh zuwa hotunan shimfidar birane, yana kiyaye duka nau'in fasaha da amincin ilimin harshe.
Zanen Fasaha: Bututun Canja-zaman Salo
Bututun sarrafawa ya ƙunshi: (1) Shigar da abun ciki da hotunan salo, (2) Ciro siffofi ta hanyar VGG-19, (3) Lissafin matrix na Gram don wakilcin salo, (4) Daidaitawar siffar abun ciki, (5) Ingantawa mai maimaitawa ta amfani da haɗaɗɗen aikin asara, (6) Samar da fitarwa tare da salon da aka canja.
4. Aiwar da Code
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, transforms
class StyleTransfer:
def __init__(self):
self.vgg = models.vgg19(pretrained=True).features
self.content_layers = ['conv_4']
self.style_layers = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']
def gram_matrix(self, input):
batch_size, channels, h, w = input.size()
features = input.view(batch_size * channels, h * w)
gram = torch.mm(features, features.t())
return gram.div(batch_size * channels * h * w)
def compute_loss(self, content_features, style_features, generated_features):
content_loss = 0
style_loss = 0
for layer in self.content_layers:
content_loss += torch.mean((generated_features[layer] - content_features[layer])**2)
for layer in self.style_layers:
gen_gram = self.gram_matrix(generated_features[layer])
style_gram = self.gram_matrix(style_features[layer])
style_loss += torch.mean((gen_gram - style_gram)**2)
return content_loss, style_loss
5. Aikace-aikacen Gaba
Fasahar tana nuna alƙawari a fagage da yawa:
- Fasaha da Ɗabi'a ta Lamba: Ƙirƙirar abun ciki na fasaha ta atomatik da daidaitawar salo
- Wasan Caca da VR: Salin muhalli na ainihin lokaci da samar da nau'in nau'i
- Hoton Magani: Daidaita salo don dacewar na'urar ketare
- Kayan Sawā da Sayayya: Gwajin kama-da-wane tare da nau'ikan yadi daban-daban
Hanyoyin bincike na gaba sun haɗa da ƙaramin koyo na salo, canja-zaman salo na 3D, da haɗawa da samfuran watsawa don ingantaccen sarrafa ƙirƙira.
6. Nassoshi
- Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Canja-zaman Salon Hoto Ta Amfani da Cibiyoyin Sadarwar Jijiyoyi Masu Haɗawa. Ayyukan Taron IEEE akan Hangen Nesa na Kwamfuta da Tsarin Alama.
- Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Fassarar Hotuna-Mas-zuwa-Hoto maras Biyu ta Amfani da Cibiyoyin Adawa Masu Daidaitaccen Zagayowar. Taron Ƙasa da Ƙasa na IEEE akan Hangen Nesa na Kwamfuta.
- Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Asarar Fahimta don Canja-zaman Salo na Ainihin Lokaci da Babban Ƙuduri. Taron Turai akan Hangen Nesa na Kwamfuta.
- Karras, T., Laine, S., & Aila, T. (2019). Tsarin Janareta Mai Salo don Cibiyoyin Adawa Masu Haifarwa. Ayyukan IEEE akan Bincike da Ilimin Injin Tsari.
- Binciken Google AI. (2022). Ci gaba a cikin Fassarar Neur da Canja-zaman Salo. https://ai.google/research
Bincike Na Asali: Juyin Halitta da Tasirin Canja-zaman Salon Neur
Canja-zaman salon neur yana wakiltar ɗaya daga cikin aikace-aikacen koyon na'ura mai zurfi mafi jan hankali a hangen nesa na kwamfuta. Tun daga takardar Gatys da sauransu na 2016 mai ƙwazo, fagen ya samo asali daga hanyoyin ingantawa masu ƙarfin lissafi zuwa cibiyoyin sadarwar gaba na ainihin lokaci. Ƙwararrun ƙirƙira ta ta'allaka ne akan amfani da cibiyoyin sadarwar jijiyoyi masu haɗawa da aka riga aka horar, musamman VGG-19, a matsayin masu ciro siffofi waɗanda zasu iya raba da sake haɗa wakilcin abun ciki da salo. An tsara wannan rabuwa ta hanyar lissafi ta matrices na Gram, waɗanda ke ɗaukar ƙididdiga na nau'in nau'i yayin da suke watsi da tsari na sarari—wata mahimmin fahimta da ke ba da damar canja-zaman salo.
Dangane da Binciken Google AI (2022), ci gaban kwanan nan ya mayar da hankali kan inganta inganci da faɗaɗa aikace-aikace. Canji daga hanyoyin ingantawa na tushen zuwa cibiyoyin sadarwar gaba, kamar yadda aka nuna a cikin aikin Johnson da sauransu, ya rage lokacin sarrafawa daga mintuna zuwa miliseconds yayin kiyaye inganci. Wannan ribar inganci ta ba da damar aikace-aikace masu amfani a cikin app ɗin daukar hoto na wayar hannu da sarrafa bidiyo na ainihin lokaci. Haɗin kai tare da cibiyoyin adawa masu haifarwa, musamman ta hanyar tsarin fassarar hoto maras biyu na CycleGAN, ya ƙara faɗaɗa fasahar.
Binciken kwatancen ya nuna gagarumin ci gaba a ingancin fitarwa da bambancin. Yayin da hanyoyin farko sau da yawa suka samar da sakamako mai yawan salo tare da karkatar da abun ciki, hanyoyin zamani kamar canja-zaman tushen StyleGAN suna kiyaye kiyayewar abun ciki mafi kyau. Tushen lissafi ya kasance mai ƙarfi, tare da ayyukan asara suna juyawa don haɗawa da ma'auni na fahimta da abubuwan adawa. Iyakoki na yanzu sun haɗa da wahala tare da salo na zance da kuskuren ilimin harshe, waɗanda ke wakiltar wuraren bincike masu aiki. Tasirin fasahar ya wuce aikace-aikacen fasaha zuwa daidaitawar hoton magani da daidaitawar yanki a cikin tsarin cin gashin kai.
Hanyoyin gaba mai yiwuwa sun haɗa da ƙaramin koyo don daidaitawar salo na sirri da haɗawa tare da gine-gine masu tasowa kamar masu canzawa da samfuran watsawa. Fagen yana ci gaba da amfana daga ketare pollination tare da wasu yankuna na hangen nesa na kwamfuta, yana alƙawarin ƙarin ƙwarewar canja-zaman salo da iya sarrafawa a cikin shekaru masu zuwa.