
An Analysis of Generative Adversarial Networks for Image-to-Image Translation

An in-depth analysis of GAN architectures, training methods, and applications in image translation, including technical details, experimental results, and future directions.
rgbcw.org | PDF Size: 0.4 MB

1. Introduction

Generative Adversarial Networks (GANs) have transformed image generation and manipulation. This paper provides an in-depth analysis of GAN architectures designed specifically for image-to-image translation tasks. The central challenge addressed is learning a mapping between two distinct image domains (e.g., photos to paintings, day to night) without requiring paired training data, a significant advance over traditional supervised approaches.

The analysis covers foundational concepts, prominent architectures such as CycleGAN and Pix2Pix, their underlying mathematical principles, experimental performance on benchmark datasets, and an assessment of their strengths and limitations. The goal is to provide a comprehensive resource for researchers and practitioners seeking to understand, apply, or extend these powerful generative models.

2. Foundations of Generative Adversarial Networks (GANs)

GANs, introduced by Goodfellow et al. in 2014, consist of two neural networks, a Generator (G) and a Discriminator (D), trained simultaneously in an adversarial game.

2.1. Core Architecture

The Generator learns to produce realistic data samples from a noise vector or a source image. The Discriminator learns to distinguish real samples (from the target domain) from fake samples produced by the Generator. This competition drives both networks to improve until the Generator produces highly convincing outputs.
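The end point of this competition can be made concrete with a result from Goodfellow et al. (2014): for a fixed generator, the best discriminator is $D^*(v) = p_{data}(v) / (p_{data}(v) + p_g(v))$, and once the generator matches the data distribution it outputs 1/2 everywhere and the game value reaches $-\log 4$. The sketch below checks this numerically on a finite support; the toy distributions are invented for illustration.

```python
import math

def optimal_discriminator(p_data, p_g):
    """Best discriminator for a fixed generator: D*(v) = p_data(v) / (p_data(v) + p_g(v))."""
    return [pd / (pd + pg) if pd + pg > 0 else 0.5 for pd, pg in zip(p_data, p_g)]

def value(p_data, p_g, d):
    """V(D, G) = E_{v~p_data}[log D(v)] + E_{v~p_g}[log(1 - D(v))] on a finite support."""
    real_term = sum(pd * math.log(dv) for pd, dv in zip(p_data, d) if pd > 0)
    fake_term = sum(pg * math.log(1.0 - dv) for pg, dv in zip(p_g, d) if pg > 0)
    return real_term + fake_term

# Toy distributions over four outcomes (made up for illustration).
p_data = [0.1, 0.2, 0.3, 0.4]
p_g_early = [0.4, 0.3, 0.2, 0.1]   # generator still far from the data
p_g_converged = list(p_data)       # generator matches the data exactly

for name, p_g in [("early", p_g_early), ("converged", p_g_converged)]:
    d_star = optimal_discriminator(p_data, p_g)
    print(name, [round(x, 3) for x in d_star], round(value(p_data, p_g, d_star), 4))
```

While the generator is still wrong, the optimal discriminator can exploit the mismatch and the value stays above $-\log 4$; at convergence the discriminator is reduced to guessing.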

2.2. Training Dynamics

Training is formulated as a minimax optimization problem: the Discriminator tries to maximize its ability to detect fakes, while the Generator tries to minimize the Discriminator's success. This often makes training unstable and calls for careful techniques such as gradient penalties, spectral normalization, and experience replay.
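One form of experience replay used in image translation is the history buffer that the CycleGAN paper adopts: the discriminator is shown a mix of current and previously generated images, so it cannot simply overfit to the generator's latest output. The sketch below is modelled on that idea; the buffer size of 50 matches the paper, while the swap probability of 0.5 and the class name `ImageHistoryPool` are assumptions for illustration.

```python
import random

class ImageHistoryPool:
    """Hypothetical buffer of past generator outputs fed to the discriminator.

    While the pool is not full, each new image is stored and returned unchanged.
    Once full, with probability 0.5 a random stored image is returned and replaced
    by the new one; otherwise the new image is returned as-is.
    """

    def __init__(self, size=50, rng=None):
        self.size = size
        self.images = []
        self.rng = rng if rng is not None else random.Random()

    def query(self, image):
        if self.size == 0:
            return image                      # buffer disabled: pass through
        if len(self.images) < self.size:
            self.images.append(image)         # still filling the pool
            return image
        if self.rng.random() < 0.5:
            idx = self.rng.randrange(self.size)
            old = self.images[idx]
            self.images[idx] = image          # keep the newest image for later
            return old                        # show the discriminator an older fake
        return image

pool = ImageHistoryPool(size=3, rng=random.Random(0))
outputs = [pool.query(f"fake_{i}") for i in range(10)]
print(outputs)
```

Strings stand in for image tensors here; in a real training loop, `query` would be called on each batch of generated images before computing the discriminator loss.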

3. Image-to-Image Translation Architectures

This section details the major architectures that adapt the core GAN idea to translating images from one domain to another.

3.1. Pix2Pix

Pix2Pix (Isola et al., 2017) is a conditional GAN (cGAN) framework for paired image translation. It uses a U-Net architecture for the generator and a PatchGAN discriminator that classifies local image patches as real or fake, which encourages high-frequency detail. It requires paired training data (e.g., a map and its corresponding satellite photo).
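The size of a PatchGAN's "patch" is simply the receptive field of one output unit. Assuming the commonly cited 70×70 configuration (five 4×4 convolutions with strides 2, 2, 2, 1, 1 and padding 1; this layer list is our assumption, not stated in this survey), the short calculation below recovers the 70-pixel patch and the grid of real/fake decisions produced for a 256×256 input.

```python
# (kernel, stride, padding) per convolution; assumed 70x70 PatchGAN layout.
PATCHGAN_70 = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]

def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit, walking from the last layer back."""
    rf = 1
    for kernel, stride, _ in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

def output_size(input_size, layers):
    """Side length of the decision map: floor((n + 2p - k) / s) + 1 for each layer."""
    n = input_size
    for kernel, stride, padding in layers:
        n = (n + 2 * padding - kernel) // stride + 1
    return n

print(receptive_field(PATCHGAN_70), output_size(256, PATCHGAN_70))
```

Under these assumptions each of the 30×30 outputs judges one overlapping 70×70 patch, and the discriminator loss is averaged over all of them.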

3.2. CycleGAN

CycleGAN (Zhu et al., 2017) enables unpaired image-to-image translation. Its key innovation is the cycle-consistency loss. It uses two generator-discriminator pairs: one translates from domain X to Y ($G$, $D_Y$) and the other translates back from Y to X ($F$, $D_X$). The cycle-consistency loss ensures that translating an image and then translating it back recovers the original: $F(G(x)) \approx x$ and $G(F(y)) \approx y$. This constraint forces the translation to be meaningful even without paired data.
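A toy illustration of the round-trip constraint: below, "images" are short lists of pixel values and the two mappings are simple affine functions invented for illustration. When $F$ is the exact inverse of $G$, both round trips return the input and the cycle loss is zero; a mismatched $F$ is penalized. The L1 norm is averaged per element here, as common implementations do; using a sum instead would only rescale $\lambda$.

```python
def l1(a, b):
    """Mean absolute error between two equal-length pixel vectors."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def cycle_consistency_loss(G, F, xs, ys):
    """Empirical L_cyc: mean ||F(G(x)) - x||_1 over xs plus mean ||G(F(y)) - y||_1 over ys."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

# Hypothetical mappings between domain X and domain Y.
G = lambda img: [2 * p + 1 for p in img]        # X -> Y
F_inverse = lambda img: [(p - 1) / 2 for p in img]  # exact inverse of G
F_bad = lambda img: [p / 2 for p in img]        # ignores G's offset

xs = [[0.0, 0.5], [1.0, -1.0]]
ys = [[3.0, 1.0]]
print(cycle_consistency_loss(G, F_inverse, xs, ys))
print(cycle_consistency_loss(G, F_bad, xs, ys))
```

Note that the loss only constrains the pair $(G, F)$ jointly; it is the adversarial terms that push $G(x)$ to look like a sample from domain Y.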

3.3. DiscoGAN

DiscoGAN (Kim et al., 2017) is a contemporaneous framework similar to CycleGAN, also designed for unpaired translation using a bidirectional reconstruction loss. It emphasizes learning cross-domain relations by discovering shared latent representations.

4. Technical Details & Mathematical Formulation

The adversarial loss for the mapping $G: X \to Y$ and its discriminator $D_Y$ is:

$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y\sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x\sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$

The full CycleGAN objective combines the adversarial losses for both mappings ($G: X \to Y$, $F: Y \to X$) with the cycle-consistency loss:

$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)$

where $\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x\sim p_{data}(x)}[||F(G(x)) - x||_1] + \mathbb{E}_{y\sim p_{data}(y)}[||G(F(y)) - y||_1]$ and $\lambda$ controls the relative importance of cycle consistency.
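To see how the three terms combine, the sketch below evaluates a mini-batch estimate of the full objective from discriminator outputs (probabilities strictly inside (0, 1)) and per-sample reconstruction errors. $G$ and $F$ minimize this quantity while $D_X$ and $D_Y$ maximize it. The input numbers are placeholders, and $\lambda = 10$ is the value reported in the CycleGAN paper; treat both as assumptions. The CycleGAN paper also replaces the log-likelihood adversarial term with a least-squares loss in practice, so this mirrors the equations above rather than the reference implementation.

```python
import math

def gan_loss(d_real, d_fake):
    """Batch estimate of L_GAN = E[log D(y)] + E[log(1 - D(G(x)))].

    d_real: discriminator outputs on real target-domain samples.
    d_fake: discriminator outputs on translated samples.
    All outputs must lie strictly in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def cyclegan_objective(dy_real, dy_fake, dx_real, dx_fake, cyc_x, cyc_y, lam=10.0):
    """L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + lam * L_cyc(G, F).

    cyc_x: per-sample ||F(G(x)) - x||_1 values; cyc_y: per-sample ||G(F(y)) - y||_1 values.
    """
    l_cyc = sum(cyc_x) / len(cyc_x) + sum(cyc_y) / len(cyc_y)
    return gan_loss(dy_real, dy_fake) + gan_loss(dx_real, dx_fake) + lam * l_cyc

# Placeholder batch: discriminators fairly confident, small reconstruction errors.
total = cyclegan_objective(
    dy_real=[0.9, 0.8], dy_fake=[0.2, 0.3],
    dx_real=[0.85, 0.75], dx_fake=[0.25, 0.1],
    cyc_x=[0.05, 0.07], cyc_y=[0.04, 0.06],
)
print(round(total, 4))
```

With $\lambda = 10$, even small reconstruction errors carry substantial weight relative to the adversarial terms, which is what keeps the translated content anchored to the input.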

5. Experimental Results & Evaluation

Experiments were conducted on multiple datasets to validate the architectures.

5.1. Datasets

5.2. Quantitative Metrics

Performance was measured using:

5.3. Key Findings

CycleGAN successfully translated horses to zebras and vice versa, changing texture while preserving pose and background. On the maps↔aerial photos task, Pix2Pix (trained on paired data) outperformed CycleGAN in pixel-level accuracy, but CycleGAN still produced plausible results from unpaired data. The cycle-consistency loss is essential: models trained without it failed to preserve the structure of the input content, often altering it arbitrarily.

6. Analysis Framework & Case Study

Case Study: Artistic Style Transfer with CycleGAN

Goal: Convert modern landscape photographs into the style of Impressionist painters (e.g., Monet) without paired {photo, painting} examples.

Applying the Framework:

  1. Data Collection: Gather two unpaired sets: Set A (Monet paintings collected from museum collections) and Set B (landscape photos from Flickr).
  2. Model Architecture: Initialize CycleGAN with ResNet-based generators and 70x70 PatchGAN discriminators.
  3. Training: Train the model with the combined loss (adversarial + cycle consistency). Monitor the cycle reconstruction loss to confirm that content is preserved.
  4. Evaluation: Use the FCN score to check whether the trees, sky, and mountains in the generated "Monet-style" image are semantically aligned with the input photo. Conduct a user study to assess stylistic quality.
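Steps 2 and 3 can be summarized as a training configuration. The values below are hyperparameters reported in the CycleGAN paper as we recall them (λ = 10, identity-loss weight 0.5λ for painting↔photo, batch size 1, a 50-image history buffer, Adam with learning rate 0.0002, and a learning rate held constant for 100 epochs then decayed linearly to zero over the next 100). The `beta1 = 0.5` value follows common implementations. Treat every number, and the hypothetical names `CycleGANConfig` and `learning_rate`, as assumptions to verify against the paper and official code.

```python
from dataclasses import dataclass

@dataclass
class CycleGANConfig:
    """Hypothetical configuration for the Monet <-> photo case study (values are assumptions)."""
    image_size: int = 256
    resnet_blocks: int = 9          # residual blocks in each generator (for 256x256 inputs)
    patch_size: int = 70            # 70x70 PatchGAN discriminators
    lambda_cyc: float = 10.0        # weight of the cycle-consistency term
    lambda_identity: float = 5.0    # 0.5 * lambda_cyc, used for painting <-> photo
    batch_size: int = 1
    pool_size: int = 50             # history buffer of generated images
    lr: float = 2e-4                # Adam learning rate
    beta1: float = 0.5              # Adam first-moment decay
    epochs_constant: int = 100      # epochs at the initial learning rate
    epochs_decay: int = 100         # epochs of linear decay to zero

def learning_rate(cfg, epoch):
    """Constant LR for `epochs_constant` epochs, then linear decay to zero over `epochs_decay`."""
    if epoch < cfg.epochs_constant:
        return cfg.lr
    progress = (epoch - cfg.epochs_constant) / cfg.epochs_decay
    return cfg.lr * max(0.0, 1.0 - progress)

cfg = CycleGANConfig()
for epoch in (0, 99, 150, 199, 200):
    print(epoch, learning_rate(cfg, epoch))
```

Keeping the schedule as a plain function of the epoch makes it easy to log alongside the cycle reconstruction loss monitored in step 3.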

Result: The model learns to apply Monet's characteristic brushstroke textures, color palettes, and lighting while retaining the structure of the original scene. This demonstrates the framework's ability to separate "content" from "style" across domains.

7. Applications & Future Directions

7.1. Current Applications

7.2. Future Research Directions

8. References

  1. Goodfellow, I., et al. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems (NeurIPS).
  2. Isola, P., et al. (2017). Image-to-Image Translation with Conditional Adversarial Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  3. Zhu, J.-Y., et al. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. IEEE International Conference on Computer Vision (ICCV).
  4. Kim, T., et al. (2017). Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. International Conference on Machine Learning (ICML).
  5. Ronneberger, O., et al. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).

9. Expert Analysis: Core Insight, Logical Flow, Strengths & Flaws, Actionable Insights

Core Insight: The real leap made by CycleGAN and its contemporaries is not merely unpaired translation; it is the framing of unsupervised domain alignment through cycle consistency as a structural prior. Pix2Pix had shown that GANs can be excellent supervised translators, but the field was held back by the scarcity of paired data. CycleGAN's key insight was that for many real-world problems the relationship between domains is approximately bijective (a horse has a single zebra counterpart; a photo has a corresponding painted rendition). By enforcing this through the cycle loss $F(G(x)) \approx x$, the model is forced to learn a meaningful, content-preserving mapping instead of collapsing or producing nonsense. This reframed the problem from "learning from paired examples" to "discovering the shared underlying structure," a more broadly applicable paradigm supported by research from Berkeley AI Research (BAIR) on unsupervised representation learning.

Logical Flow: The paper builds its argument from first principles. It opens with the foundational GAN minimax game and immediately flags its instability, the core challenge. It then introduces the conditional GAN (Pix2Pix) as a solution to a different problem (paired data), setting the stage for the central innovation. CycleGAN and DiscoGAN are presented as the evolution needed to break the dependence on paired data, with the cycle-consistency loss positioned as the enabling constraint. The exposition moves from theory (mathematical formulation) to practice (experiments, metrics, case study), backing conceptual claims with empirical evidence. This mirrors the methodology of top conference publications such as those at ICCV and NeurIPS.

Strengths & Flaws: The dominant strengths are conceptual elegance and practical utility. The cycle-consistency idea is simple, intuitive, and highly effective, unlocking applications from medical imaging to art, and these frameworks made high-quality image translation widely accessible. However, the flaws are significant and well documented in the follow-up literature. First, the bijection assumption is often violated. Translating "glasses on" to "glasses off" is ill-posed: many plausible "off" states are consistent with a single "on" state. This leads to information loss and averaging artifacts. Second, training remains unstable. Despite tricks such as the identity loss, achieving convergence on new datasets is often more alchemy than science. Third, control is limited. You get whatever the model produces; fine-grained control over specific attributes (e.g., "make only the car red, not the sky") is not supported by the original framework. Compared with more recent diffusion models, GANs for translation can also struggle with global coherence and high-resolution detail.
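The "averaging artifacts" point can be checked with a toy model. When one input has several equally plausible targets (say, two possible appearances of the eye region hidden behind a pair of glasses), the single deterministic output that minimizes a squared-error penalty is their mean, which may match none of them. The one-pixel "targets" below are invented for illustration; a real image would show this as blur.

```python
def best_constant_prediction_l2(targets):
    """The constant c minimizing the mean of (c - t)^2 over the targets is their mean."""
    return sum(targets) / len(targets)

def mse(c, targets):
    """Mean squared error of predicting c for every target."""
    return sum((c - t) ** 2 for t in targets) / len(targets)

# Hypothetical intensities of two equally likely "glasses off" completions.
plausible = [0.0, 1.0]
c = best_constant_prediction_l2(plausible)

# Brute-force check over a grid of candidate outputs in [0, 1].
grid = [i / 100 for i in range(101)]
best_on_grid = min(grid, key=lambda v: mse(v, plausible))
print(c, best_on_grid, mse(c, plausible), [mse(t, plausible) for t in plausible])
```

The optimal prediction (0.5) has lower error than committing to either real completion, yet it is not a realistic sample; a deterministic generator facing a many-to-one mapping is pushed toward exactly this kind of compromise.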

Actionable Insights: For practitioners, the message is clear: start with CycleGAN for proofs of concept, but be prepared to move beyond it. For any new project, first rigorously assess whether your domains are truly cycle-consistent. If they are not, look to newer architectures such as MUNIT or DRIT++, which explicitly model multimodal mappings. Invest heavily in data curation, since the quality of the unpaired sets is critical. When attempting high-resolution translation, use modern regularization techniques such as path length regularization and lazy regularization (e.g., from StyleGAN2/3). For industrial applications that demand robustness, consider hybrid approaches that use a model like CycleGAN for coarse translation, followed by a supervised refinement network trained on a small, curated set of pairs. The future lies not in abandoning the cycle-consistency insight but in combining it with more expressive, stable, and controllable generative models, a trend already visible in recent work from institutions such as MIT CSAIL and Google Research.