Kandelia: CAPTCHA

Tuesday, March 20, 2007

Image Spam & CAPTCHA

A few days ago, Professor Huang sent me a DigiTimes news link over Skype. The article relates to CAPTCHA, Xiao He's research topic, and is reproduced below:

Image Spam: A New Attack Technique Evolves Again!
Using Captcha Techniques to Evade OCR-Based Mail Filtering

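Symantec recently released its spam security research report for January 2007, which is also the company's first survey report devoted specifically to spam activity. According to the report, spam in December 2006 largely continued the main trends of the preceding months. For example, image-based spam (Image Spam) remained one of the prevailing attack techniques, accounting for about 35% of all spam, and the proportion of spam in December 2006 stayed high, at about 80% of all email traffic.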

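In December 2006, however, two new techniques and trends aimed specifically at evading spam filters emerged. The first is the repurposing of the Captcha technique, now very common on the web, to evade spam filters based on OCR (optical character recognition). Many discussion groups, blogs, and shareware download sites use Captcha to prevent automated bots from mass-registering accounts or flooding comment sections with advertisements. The technique uses a graphics library to generate an image-based verification code that bots cannot read reliably; users must read it by eye and type it in before they can log in, download, or post a comment. Now spammers are applying the same technique to Image Spam attacks.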

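Another new trend worth watching is that more and more spam is delivered by embedding it in what look like legitimate official communications, newsletters, or advertising mail. This approach can evade the signature matching commonly used in spam filters, so that this kind of spam gets distributed successfully.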

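In addition, Symantec used its own honeypot network to collect and analyze the share of each category of mail in December. Health-related spam accounted for 27% of all spam, financial spam for 26%, product spam for about 23%, internet marketing for 10%, leisure for 4%, fraud for 4%, phishing for 3%, and adult content for 3%.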
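
As a quick sanity check on the figures quoted above, the sketch below combines the two headline percentages and sums the category shares. It assumes that the 35% and 80% figures describe the same mail stream over the same period, which the article does not state explicitly; the variable names are only illustrative.

```python
# Percentages quoted in the article above.
image_spam_pct_of_spam = 35   # image spam as a share of all spam
spam_pct_of_all_mail = 80     # spam as a share of all email traffic

# Assumption: both figures refer to the same mail stream and period,
# so the shares can simply be multiplied.
image_spam_pct_of_all_mail = image_spam_pct_of_spam * spam_pct_of_all_mail / 100

# Category shares of spam in December, as listed in the article.
categories = {
    "health": 27,
    "finance": 26,
    "products": 23,
    "internet marketing": 10,
    "leisure": 4,
    "fraud": 4,
    "phishing": 3,
    "adult": 3,
}
category_total = sum(categories.values())

print(image_spam_pct_of_all_mail, category_total)
```

Under that assumption, image spam alone would make up roughly 28% of all email traffic, and the eight categories add up to 100%.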

Saturday, September 16, 2006

Human Interaction Proofs

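Yesterday morning I had a meeting with my students. Since Xiao He had no concrete progress on the experiments, we discussed a paper published by Microsoft instead. The first author, Kumar Chellapilla, works at Microsoft Research, and one of his research topics is the Human Interaction Proof, on which he has published four papers so far. The paper we discussed yesterday was: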
Building Segmentation Based Human-friendly Human Interaction Proofs (HIPs)

A Human Interaction Proof (HIP) is a mechanism used on the Internet to prove that an interaction was initiated by a human, not issued automatically by a program.

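For example, when you reply to a post without logging in to your account, the system asks you to type the letters and digits shown in an image. If you enter them correctly, the system takes the reply to have come from a person, not from an automated program on the network. The goal is to stop advertising programs, and programs that try to maliciously cripple the system, from achieving their ends by posting replies in bulk.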
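
The server-side half of that flow can be sketched as follows. This is only a minimal illustration, not how any particular site implements it: the function names, the code length, and the choice to drop look-alike characters are all assumptions, and rendering the code as a distorted image is left out entirely.

```python
import hmac
import secrets
import string

# Hypothetical alphabet: letters and digits, minus look-alikes such as 0/O and 1/I.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "0O1I"
)

def new_challenge(length=6):
    """Return a random code that the server would render as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def is_human_response(expected, typed):
    """Accept the reply only if the typed text matches the code (case-insensitive)."""
    return hmac.compare_digest(
        expected.upper().encode(), typed.strip().upper().encode()
    )

code = new_challenge()
print(is_human_response(code, code.lower()))  # a correct answer is accepted
print(is_human_response(code, code[:-1]))     # a truncated answer is rejected
```

The check itself is trivial; the whole strength of the scheme rests on how hard the rendered image is for a program to read, which is exactly what the paper goes on to study.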

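The abstract of this paper is very well written and fully meets the requirements of a scientific abstract: not one sentence too many, not one too few. Every key point is there, and the main issues and considerations of HIPs are laid out clearly. Students can use it as a reference when writing their own papers.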

Abstract
Human interaction proofs (HIPs) have become common place on the internet due to their effectiveness in deterring automated abuse of online services intended for humans. However, there is a co-evolutionary arms race in progress and these proofs are becoming more difficult for genuine users while attackers are getting better at breaking existing HIPs. We studied various popular HIPs on the internet to understand their strength and human friendliness. To determine HIP strength, we adopted a direct approach of building computer attacks using image processing and machine learning techniques. To understand human-friendliness, a sequence of users studies were conducted to investigate HIP character recognition by humans under a variety of visual distortions and clutter commonly employed in reading-based HIPs. We found that many of the online HIPs are pure recognition tasks that can be easily broken using machine learning. The stronger HIPs tend to pose a combination of segmentation and recognition challenges. Further, the HIP user studies show that given correct segmentation, computers are much better at HIP character recognition than humans.

The first sentence of the abstract explains why HIPs have become increasingly common.
Human interaction proofs (HIPs) have become common place on the internet due to their effectiveness in deterring automated abuse of online services intended for humans.

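The second sentence uses "however" to signal a turn and immediately points out the difficulty HIPs face; it also introduces the problem the authors want to solve.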
However, there is a co-evolutionary arms race in progress and these proofs are becoming more difficult for genuine users while attackers are getting better at breaking existing HIPs.

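The third sentence explains that, by studying many HIP systems, the authors came to understand that a good HIP must take two factors into account together: security strength and human friendliness.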
We studied various popular HIPs on the internet to understand their strength and human friendliness.

Next come two sentences, each introduced by an infinitive "To" phrase, which explain in turn how the authors examined these two factors.

To determine HIP strength, we adopted a direct approach of building computer attacks using image processing and machine learning techniques.

To understand human-friendliness, a sequence of users studies were conducted to investigate HIP character recognition by humans under a variety of visual distortions and clutter commonly employed in reading-based HIPs.

Then the authors state their research results.
We found that many of the online HIPs are pure recognition tasks that can be easily broken using machine learning.
The stronger HIPs tend to pose a combination of segmentation and recognition challenges.
Further, the HIP user studies show that given correct segmentation, computers are much better at HIP character recognition than humans.
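
To make the distinction between segmentation and recognition concrete, here is a toy sketch of a naive column-projection segmenter. It is not the attack built in the paper; the bitmaps, the `#`/`.` encoding, and the function name are made up for illustration. When glyphs are separated by blank columns, cutting them apart is trivial and only recognition remains; a single clutter line across the gaps is enough to defeat this particular approach.

```python
def segment_columns(bitmap):
    """Split a binary bitmap into (start, end) column ranges separated by blank columns."""
    width = len(bitmap[0])
    # A column contains ink if any row has '#' in it.
    has_ink = [any(row[x] == "#" for row in bitmap) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a glyph begins
        elif not ink and start is not None:
            segments.append((start, x))  # a glyph ends at the first blank column
            start = None
    if start is not None:
        segments.append((start, width))  # last glyph runs to the right edge
    return segments

# Two hypothetical glyphs separated by blank columns.
clean = [
    "##..#.#",
    "#...#.#",
    "##..###",
]
# The same glyphs with a hypothetical clutter line drawn across the gap.
cluttered = [
    "##..#.#",
    "#######",
    "##..###",
]

print(segment_columns(clean))      # two separate column ranges
print(segment_columns(cluttered))  # one merged range: the naive cut fails
```

This is the intuition behind the paper's finding: once the characters are isolated, a learned classifier does the rest, so the stronger HIPs are those that make the cutting step itself hard.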