Summer Concerto (夏天協奏曲): in cinemas nationwide on November 27
Archive for September, 2009
September 28th, 2009
The best choice to grab data from websites: Python + Twisted + lxml
September 26th, 2009
This is a translation of my article 抓取網頁的最佳語言 : Python ("The best language for grabbing web pages: Python"), which was written in Chinese.
At first
At first, I used C/C++ to write programs for grabbing data from websites. I tried to write a library for these tasks, but I realized that it's not easy to implement an HTTP client library. Then I used libcurl for downloading pages, but even with libcurl it was not productive. I had to modify the program frequently, and compile times were costly. There was also no regular expression support in the C/C++ standard libraries, and I had to deal with many annoying details like memory management and string handling.
Then
After that, I began to think that C/C++ is not a good choice for grabbing data from websites. Why should I have to handle so many details? Why not just use a scripting language or some other language? At first I worried about performance, but then I realized that the language's performance is not the bottleneck. What's more, a scripting language brings many benefits: it is easier to develop and debug. So I decided to find another way to grab data from websites.
How about Perl?
A long time ago, I used Perl to write CGI programs such as guestbooks and website management systems. But Perl is a "write-once" language: lots of Perl programs are filled with terse syntax and symbols, which makes them really difficult to read. It is also not easy to modularize Perl programs, and its OO support is weak. And Perl seems to have stopped moving: the new Perl (Perl 6) is under construction, but it is taking so long that I still think Perl is almost dead. For these reasons, and out of personal taste, I don't like Perl.
PHP
PHP is a popular language designed for websites, but I don't think it is suitable for anything else. And although it is popular, it is a really badly designed language. It is also hard to modularize PHP programs, and its OO support is weak too. Namespaces are another big problem: there are so many functions that look like mysql_xxxx, mysql_oooo. But even such a bad language has its advantage: it is popular, popular, and popular. Someone said:
PHP is the BASIC of the 21st century
Well, whatever. PHP is out.
Lua
Lua is a lightweight scripting language, and almost everything about its design is aimed at performance. I wanted to wrap C/C++ libraries for Lua, but Lua has lots of weaknesses too. It is not easy to modularize either, and because its design puts performance first, its syntax is not very friendly. What's more, there are few resources for Lua, so I might have to build everything I need myself. So Lua is not on the list.
Java
Java grew up with the Internet, and it is absolutely qualified. But I don't like it because it is too verbose. What's more, it is too fat! When I run Eclipse on my laptop, which has only 256MB of RAM, I want to throw the laptop out the window. I'm sorry, I don't like Java. The guy I mentioned in the PHP section also said:
Java is the COBOL of the 21st century
Python
Finally, I posted questions on PTT, and someone recommended Python. Python? WTF? I had never heard of it. I searched for it and asked some questions, and found that it was exactly what I wanted! It can be extended easily: if I need performance, I can write Python modules in C. And there are so many resources: you can find almost any Python library you can imagine, and those libraries are easy to install, since typing "easy_install" installs almost anything you want. Most scripting languages are not suitable for big programs, but Python is an exception: it is easy to modularize, and it supports OO well. What's more, it is really easy to read and write. Lots of big players use Python too, like Google and YouTube. When I decided to learn Python, I bought a copy of Learning Python and started my journey with it.
Fall in love with Python
It didn't disappoint me. Developing with Python is very productive, and I rewrote almost everything I had previously written in C/C++. But for grabbing data from websites, there was still a lot of work to do.
Twisted
Getting a web page in Python is a piece of cake: there are the standard modules urllib and urllib2. But they were not good enough for me. Then I found Twisted.
Twisted is an event-driven networking engine written in Python and licensed under the MIT license.
It is very powerful, and it has a beautiful callback design for handling asynchronous operations, called Deferred. You can grab a page with one line:
getPage("http://www.google.com").addCallback(printPage)
You can also chain callbacks on the Deferred to process the data step by step:
d = getPage("http://www.google.com")
d.addCallback(parseHtml)
d.addCallback(extractData)
d.addCallback(saveResult)
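To show how a value flows through a chain like this, here is a toy, synchronous sketch in plain Python 3. It is not Twisted's Deferred (which also has errbacks and fires asynchronously); `CallbackChain` is a hypothetical class, the three callbacks are stand-ins that only mirror the names above, and the HTML string stands in for a downloaded page.

```python
import re


class CallbackChain:
    """Toy stand-in for the callback half of a Deferred (no errbacks, no async)."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, func):
        self.callbacks.append(func)
        return self  # allows d.addCallback(a).addCallback(b)

    def callback(self, result):
        # Each callback receives the previous callback's return value.
        for func in self.callbacks:
            result = func(result)
        return result


def parseHtml(html):
    # Hypothetical "parser": collect the text of every <li> with a regex.
    return re.findall(r"<li>(.*?)</li>", html)


def extractData(items):
    return [item.strip().upper() for item in items]


def saveResult(data):
    # A real saveResult would write to a file or database.
    return ",".join(data)


d = CallbackChain()
d.addCallback(parseHtml)
d.addCallback(extractData)
d.addCallback(saveResult)
result = d.callback("<ul><li> a </li><li>b</li></ul>")
print(result)
```

The point of the design is that each step stays a small, independent function; the chain only decides the order in which they run.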
What’s more, I wrote an auto-retry function for twisted to retry any async function automatically, you can read An auto-retry recipe for Twisted.
BeautifulSoup
Getting pages from a website is not difficult; parsing the HTML is much harder. Python has standard modules for parsing, but they are too simple. The biggest trouble with parsing HTML is that so many websites don't follow the HTML or XHTML standards, and their pages are full of syntax errors, which makes parsing difficult. So I needed an HTML parser that copes well with broken HTML. Here comes BeautifulSoup, an HTML parser written in Python that handles broken HTML well. But there is a problem: it is not efficient. For example, to find a specific tag, you write:
soup.find('div', dict(id='content'))
That is fine on a small page, but it becomes a big problem on a big page, because its tag-finding method is very, very slow. At first I expected the bottleneck to be the network, but with BeautifulSoup the bottleneck was parsing and finding tags: when you run your spider, the CPU usage stays at 100% all the time. When I profiled my program, most of the running time was spent in soup.find. For performance reasons, I had to find another solution.
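The profiling step can be reproduced with the standard library's `cProfile` and `pstats` modules. This Python 3 sketch profiles a deliberately slow, hypothetical `find_tag` function that scans a synthetic list of "tags" linearly, standing in for a slow tag search; the data is made up, so only the technique carries over.

```python
import cProfile
import io
import pstats


def find_tag(tags, name, tag_id):
    # Linear scan over every tag: the cost grows with page size.
    for tag in tags:
        if tag["name"] == name and tag.get("id") == tag_id:
            return tag
    return None


def crawl(tags, lookups):
    found = 0
    for _ in range(lookups):
        if find_tag(tags, "div", "content") is not None:
            found += 1
    return found


# Synthetic "page": many tags, with the target at the very end.
tags = [{"name": "span", "id": str(i)} for i in range(20000)]
tags.append({"name": "div", "id": "content"})

profiler = cProfile.Profile()
profiler.enable()
found = crawl(tags, 50)
profiler.disable()

# Show the five most expensive functions by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In a real spider you would profile the whole crawl the same way and look for the parser's lookup function near the top of the report.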
lxml
Then I found a nice article, Python HTML Parser Performance, which compares the performance of different Python HTML parsers. The most impressive one is lxml. At first I worried that finding target tags with lxml would be difficult, but then I noticed that it supports XPath! Writing XPath is much easier than using BeautifulSoup's find methods, and parsing and finding target tags with lxml is also much more efficient. Here are some real-life examples from my code:
def getNextPageLink(self, tree):
    """Get next page link

    @param tree: tree to get link
    @return: Return url of next page, if there is no next page, return None
    """
    paging = tree.xpath("//span[@class='paging']")
    if paging:
        # Links in the paging block whose text contains the localized "next"
        links = paging[0].xpath("./a[contains(text(), '%s')]" % self.localText['next'])
        if links:
            return str(links[0].get('href'))
    return None
listPrice = tree.xpath("//*[@class='priceBlockLabel']/following-sibling::*")
if listPrice:
    detail['listPrice'] = self.stripMoney(listPrice[0].text)
With BeautifulSoup, I have to write the logic for finding target tags in Python. With lxml, almost all of that logic goes into the XPath expression, which is much easier to write.
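To show the difference in style, here is a Python 3 sketch that uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without extra packages. ElementTree supports only a small XPath subset (for example `.//tag[@attr='value']` and, from Python 3.7, `[.='text']`, but no `following-sibling` axis or `contains()`), and it needs well-formed markup, so the page below is a hypothetical XHTML fragment rather than a real site.

```python
import xml.etree.ElementTree as ET

PAGE = """
<html><body>
  <span class="paging"><a href="/p/1">prev</a><a href="/p/3">next</a></span>
  <div id="content">hello</div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())


def next_link_manual(root):
    # Python logic: walk the elements and test each condition by hand.
    for span in root.iter("span"):
        if span.get("class") == "paging":
            for a in span:
                if a.tag == "a" and a.text == "next":
                    return a.get("href")
    return None


def next_link_path(root):
    # Path-expression style: the selection logic lives in the query string.
    a = root.find(".//span[@class='paging']/a[.='next']")
    return a.get("href") if a is not None else None


print(next_link_manual(root), next_link_path(root))
```

Both functions find the same link; the second one reads like the lxml examples above, which is the point the paragraph makes.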
Useful Firefox tools
With XPath, finding target tags is not difficult. But it would be wonderful if you could try out XPath expressions on live websites, right? I found that some Firefox plugins are very useful for writing spiders. Here are some useful tools for analysis:
Example
I wrote an example to show what it looks like:
# -*- coding: utf8 -*-
import cStringIO as StringIO

from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.python.util import println
from lxml import etree

def parseHtml(html):
    parser = etree.HTMLParser(encoding='utf8')
    tree = etree.parse(StringIO.StringIO(html), parser)
    return tree

def extractTitle(tree):
    titleText = unicode(tree.xpath("//title/text()")[0])
    return titleText

d = getPage('http://www.google.com')
d.addCallback(parseHtml)
d.addCallback(extractTitle)
d.addBoth(println)
# Stop the reactor after the title (or the error) has been printed
d.addBoth(lambda _: reactor.stop())
reactor.run()
This is a very simple program: it grabs the title of google.com and prints it out. Very elegant, isn't it? :D
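For trying the title-extraction step offline, without Twisted or lxml, here is a minimal Python 3 sketch built on the standard library's lenient `html.parser`. `TitleParser` and `extract_title` are hypothetical names and the HTML strings stand in for downloaded pages; `html.parser` builds no tree and has no XPath, so this is no replacement for lxml in a real spider.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Broken markup (unclosed <body> and <p>) is tolerated.
print(extract_title("<html><head><title>Google</title></head><body><p>unclosed"))
```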
Conclusion
One year has passed since I wrote this article in Chinese. Today, I still use Python + Twisted + lxml for grabbing data from websites. You might not agree with me, but for me they are the best tools for writing spiders (crawlers, or whatever you call them).
An auto-retry recipe for Twisted.
September 26th, 2009
You can find the Chinese version of this article here: Deferred應用的實例 : retry ("A practical example of Deferred: retry").
To grab a page from a website with Twisted, you can write:
getPage("http://www.google.com").addCallback(printPage)
But what if the HTTP server is too busy to respond? Then your getPage fails. To cope with that, you have to modify your program to retry when getPage fails, so you might write a function like this:
getPageRetry(5, "http://www.google.com")
That seems to work, but what if you have other operations to retry, such as connectToClient, getTime, and so on? Then you have to write:
connectToClientRetry(5, "example.com")
getTimeRetry(5, "timeserver.com")
You would have to write a retry version of every function that may fail. That's not a good idea. Twisted provides a very beautiful async callback object named Deferred. It looks like Chain-of-responsibility, but it also handles errors well. Thanks to its design, I wrote a retry function that can wrap any function that returns a Deferred and might fail. There is no need to modify any existing code of getPage to make it retry automatically. Here is the recipe:
import logging

from twisted.internet import defer

log = logging.getLogger('retry')

def retry(times, func, *args, **kwargs):
    """retry a defer function

    @param times: how many times to retry
    @param func: defer function
    """
    errorList = []
    deferred = defer.Deferred()

    def run():
        log.info('Try %s(*%s, **%s)', func.__name__, args, kwargs)
        # maybeDeferred also turns a synchronous exception into a failure,
        # so that case is retried too
        d = defer.maybeDeferred(func, *args, **kwargs)
        d.addCallbacks(deferred.callback, onError)

    def onError(failure):
        errorList.append(failure)
        # Retry
        if len(errorList) < times:
            log.warning('Failed to try %s(*%s, **%s) %d times, retry...',
                        func.__name__, args, kwargs, len(errorList))
            run()
        # Fail
        else:
            log.error('Failed to try %s(*%s, **%s) over %d times, stop',
                      func.__name__, args, kwargs, len(errorList))
            deferred.errback(errorList)

    run()
    return deferred
There you go! Now you can use it to make any operation retry automatically:
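from twisted.internet import defer, reactor
from twisted.web.client import getPage

# retry() is the recipe defined above

def output(data):
    print 'output', data

def error(error):
    print 'final error', error

d1 = retry(3, getPage, 'http://www.google2.com')
d1.addCallbacks(output, error)
d2 = retry(3, getPage, 'http://www.google.com')
d2.addCallbacks(output, error)
# Stop the reactor once both requests have finished or failed
defer.DeferredList([d1, d2]).addBoth(lambda _: reactor.stop())
reactor.run()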
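The same idea also works outside Twisted. Below is a minimal synchronous sketch in Python 3 for plain functions that raise exceptions, not for Deferreds: `retry_call` and `RetryError` are hypothetical names, and, like the recipe, the helper collects every error and gives up after `times` attempts. `flaky` is a made-up stand-in for an unreliable operation.

```python
import logging

log = logging.getLogger("retry")


class RetryError(Exception):
    """Raised when every attempt failed; .errors holds each exception."""

    def __init__(self, errors):
        super().__init__("failed after %d attempts" % len(errors))
        self.errors = errors


def retry_call(times, func, *args, **kwargs):
    errors = []
    while len(errors) < times:
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # retry on any ordinary error
            errors.append(exc)
            log.warning("Attempt %d of %s failed: %r",
                        len(errors), func.__name__, exc)
    raise RetryError(errors)


# Hypothetical flaky operation: fails twice, then succeeds.
attempts = {"count": 0}


def flaky(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise IOError("server busy")
    return "page from " + url


print(retry_call(3, flaky, "http://example.com"))
```

Unlike the Deferred version, this one blocks the caller during every attempt, which is exactly what Twisted's event-driven design avoids.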
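Who Sold Your Shopping Data?
September 24th, 2009
A few days ago, for my graduation project, I bought a USB-to-RS232 cable online from DA量販 on PCHome. Interestingly, just yesterday I got a scam call. The caller knew exactly what I had bought and told me they had made a billing mistake that turned my payment into automatic installment deductions. I answered "Oh..." in a dismissive tone; maybe my tone told them there was no hope, because the call ended after a while. This is not the first time I have received a scam call like this, and every time it makes me furious. It's not the call itself that makes me angry, but the fact that the seller leaked my data. What makes me even angrier is that usually nobody wants to find out where the leak happened. Every party in the chain claims that it scans for viruses regularly, changes passwords, and runs a secure platform. Because proof is hard to get, shifting the blame is easier than finding the leak. That made me really angry, so I started thinking about who actually leaked the data, and did some research and reasoning.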
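Everyone is just as angry
I believe anyone who gets a scam call and discovers that their data has leaked in full detail is just as angry, and certainly won't leave a good review. So I looked at DA量販's negative reviews, and sure enough, other buyers complained about scam calls too, at around the same time: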
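Rating: Needs improvement (2009/09/17 20:57:59) (latest)
Comment: I bought a battery last month and got a scam call today. How did they know the details of my transaction? Please keep your transaction data more confidential. Thank you!
Rating: Needs improvement (2009/08/29 01:31:54) (latest)
Comment: Product quality: the product is poorly made and rough. As soon as I fitted it, it wore down my controller, which cost over a thousand dollars!! The hole is the wrong size: hard to put in and even harder to pull out!! Terrible!! I went for the cheap price and ended up throwing it away as garbage!! It also damaged my original controller, so I lost even more!!! Tonight I also got a baffling call from the "vendor" saying there was a problem with my payment, even though the money had already been charged to my credit card! What problem could there be?!! Vendor, please check things yourself next time. There are too many scams these days; don't call customers saying there's a payment problem for no reason. I almost called the police!!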
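From this alone, it seems DA量販 leaked the data, intentionally or not, but that conclusion is too hasty, because many parties can access personal data. I don't know how their back-end process works, but the data passes through many links, and any of them could leak it: the user's computer, the store, logistics, the bank, and so on. Let's look at them one by one.
The user
As the saying goes, suspect yourself before suspecting others. The user is usually the weakest link: most users lack security knowledge and are easy to compromise. But once you consider one key point, you'll see that although data can leak from the user's side, it very rarely does. Just look at it from the scammers' point of view: the problem is cost. Users run different operating systems and environments. For a scam ring, getting data from users, for example by planting trojans, is like looking for a needle in a haystack: how would they know that a given user shops online, and how would they capture that user's purchase data? Of course they could also phish by mailing out fake login pages, but buyers' contact details are much harder to obtain than stores' mailboxes. Besides, how many times can the same person be scammed? So even the dumbest scam ring knows that stealing data from individual users' computers is not worth it. Yet I have seen many stores say things like this:
Store reply: Thank you very much for your support! Regarding this scam call, PChome has already taken protective measures. The scammers used the purchase confirmation emails received on the customer's side; the data was not leaked by Store Street. Please change your own password to keep scammers out of your mailbox! We hope to serve you again! ^^ (2009/09/22 09:50:34)
Replies like this are pure nonsense! How much data could scammers get by watching your mailbox? And how many times can you be scammed? So next time, stores, please come up with a better excuse for dodging responsibility.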
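The store
A store's security protection and knowledge are really no better than an ordinary user's. But unlike a user, a store yields a whole batch of data at once, so stores are naturally the best target: poorly protected and easy to fool. For example, I could simply send an email pretending to be a PCHome engineer: "Recently many computers have been infected with trojans by scam rings, so we are offering a free trojan removal tool." Or I could phish: mass-mail a fake login page to stores. Someone is bound to take the bait, and a single store that falls for it hands over a whole batch of data to scam with. Only a fool would steal the tiny bit of data on a user's machine. Of course, an insider at the store is also possible. In my view, the store is the weakest link, yet stores usually refuse to admit it and don't seriously look for the cause of the leak, so the data keeps leaking. That makes them the scam rings' main accomplices.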
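The platform
Getting data from the platform takes somewhat more skill than stealing it from stores, though not that much more. If the platform has a vulnerability and someone finds it, they will often dump all the data and sell it to scam rings, perhaps for only a few dollars per record. That is how everyone's shopping data gets sold to scammers dirt cheap. So if the people receiving scam calls are spread across the whole platform, it is very likely that the platform has a hole and its data was harvested.
Besides the official platforms, it is also common for sellers to use third-party payment-reporting systems: after transferring the money, the buyer logs into the seller's system and enters what they bought, the price, the time and amount of the transfer, and so on. Many sellers on Yahoo!奇摩拍賣 (Yahoo! Kimo Auctions) use such systems. I believe a considerable share of leaked data very likely comes from them, because most are probably far less secure than platforms like Yahoo! Auctions and PCHome. After all, they may have been written by novices, and a simple SQL injection might dump a pile of data. If anyone is interested, you could crawl the data and check the correlation between scam-related negative reviews and the use of such payment-reporting systems. I'd guess it is quite high.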
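The bank
Interestingly, besides blaming users, some stores blame the banks:
Comment: The day after I received the goods, I got a scam call. When I saw through it, the woman even swore at me ))&^%&*^$*&()…. Your company should protect customer data better.
Store reply: Hello, we have seen your comment. With information so widespread today, personal data leaks easily. Whenever I get a call from an unknown number, I always ask first, "How did you get my number?", so we really hate harassment from unknown callers~~~ We at 緹悠樂活 focus solely on our own business, and customers' personal data is our treasure and is never leaked; you can absolutely rest assured of that. Online shopping and all kinds of applications require you to fill in personal data, and banks or telecom companies may also steal and sell our personal data. I personally rarely give my details to others, and I often suspect banks or telecom companies of leaking my personal data, or even rogue government staff. There are far too many possibilities, but we assure you that your data definitely did not leak from us. If you feel better about it, please give us a slightly better rating!! Have a wonderful day~~~ ^╴╴╴^ (2009/09/03 09:45:57)
Although this review doesn't describe the scammers' method, they usually tell you exactly what you bought (xxx) and how much it cost, so that you believe they really are the seller, and only then start deceiving you. If the bank leaked the data, how would the bank know what you bought? Do stores tell the bank the product names?
98/09/04 Card purchase $169 – xxx 網路家庭國際資
Judging from my card statement for the DA量販 purchase, the bank does not record what you bought; at most it knows, as in this line, that you spent money at PCHome. So this excuse is just as weak. Moreover, banks are among the most security-conscious institutions, with far higher security standards than stores that casually hire part-time students to process orders. It is rather ridiculous for a store to suspect the bank instead of examining itself.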
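Logistics
The logistics company is the one that puts the goods in the customer's hands, so it is also a handy target for shifting blame. But when you receive a package, how many stores are dumb enough to write what you bought and how much it cost right on the outside? I have indeed seen it; such stores probably have no concept of privacy at all, but they are a tiny minority. So basically, the package should not say what was bought. The next question is: when a store hands goods to the logistics company, are they already packed or not? Judging from my own experience mailing things to buyers at the post office, we always pack the goods before handing them over. Does anyone hand goods to the logistics company and let it do the packing? Maybe, but if not, blaming logistics doesn't hold up. That said, some stores may share declared-value information with the logistics company, and in that case the data could indeed leak.
Let the data speak
Of course, all of the above is just talk without data. To understand the leaks more clearly, I spent some time writing a small crawler that fetched the first page of negative reviews for every store on PChome Store Street and picked out the reviews containing the word "詐騙" ("scam"). Some of them turned out to be buyers whose goods took too long to arrive, asking whether they had been scammed, so I filtered the results by hand and kept only the reviews that actually mention receiving a scam call. Here is the list: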
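The keyword step can be sketched in a few lines of Python 3. This is a hypothetical reconstruction, not the crawler used here: it assumes the review texts are already downloaded as strings, and it only approximates the manual filtering with a second keyword check for a phone call (電話 "phone call", 來電 "incoming call"), so a human should still review the results. Apart from the first review, which is quoted from the list, the sample reviews are made up.

```python
SCAM_WORD = "詐騙"             # "scam"
CALL_HINTS = ("電話", "來電")  # "phone call", "incoming call"


def scam_mentions(reviews):
    """Step 1: every review that mentions 詐騙 at all."""
    return [r for r in reviews if SCAM_WORD in r]


def likely_scam_calls(reviews):
    """Step 2 (heuristic): keep only mentions that also talk about a call."""
    return [r for r in scam_mentions(reviews)
            if any(hint in r for hint in CALL_HINTS)]


reviews = [
    "我上個月買一顆電池, 今天就接到詐騙電話",  # quoted from the list
    "等了兩週還沒到貨,是不是被詐騙了?",       # hypothetical: late delivery
    "出貨很快,推薦!",                          # hypothetical: unrelated
]

for review in likely_scam_calls(reviews):
    print(review)
```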
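Rating: Needs improvement (2009/09/17 20:57:59) (latest)
Comment: I bought a battery last month and got a scam call today. How did they know the details of my transaction? Please keep your transaction data more confidential. Thank you!
Rating: Needs improvement (2009/08/29 01:31:54) (latest)
Comment: Product quality: the product is poorly made and rough. As soon as I fitted it, it wore down my controller, which cost over a thousand dollars!! The hole is the wrong size: hard to put in and even harder to pull out!! Terrible!! I went for the cheap price and ended up throwing it away as garbage!! It also damaged my original controller, so I lost even more!!! Tonight I also got a baffling call from the "vendor" saying there was a problem with my payment, even though the money had already been charged to my credit card! What problem could there be?!! Vendor, please check things yourself next time. There are too many scams these days; don't call customers saying there's a payment problem for no reason. I almost called the police!!
Rating: Needs improvement (2009/07/06 05:09:32) (latest)
Comment: I was going to rate you Excellent! But unfortunately... today I got a call from a scam ring! He told me exactly what I bought, the amount, and the quantity! And of course the method was... the same! He said my payment had been changed to installments, and that if I didn't deal with it, a lot of money would be deducted! I don't know whether the problem is PChome or your company! Buy from you... and your customer data leaks completely! Everyone else, please watch out for this!
Store reply: Hello~ We're very sorry. All our data is kept in the PChome back end, so we will ask PChome to tighten its controls, and we will also change our back-end account password and step up virus scanning on our computers to find the cause. Thank you for letting us know, and we deeply apologize for the trouble!! (2009/07/06 08:23:30)
Rating: Needs improvement (2009/09/02 14:45:01) (latest)
Comment: Received a call from a scam ring.. I strongly suspect the PChome Store Street system of leaking personal data. I will think carefully about whether to keep shopping on PChome Store Street. Thanks
Store reply: Hello, we have seen your comment. With information so widespread today, personal data leaks easily. Whenever I get a call from an unknown number, I always ask first, "How did you get my number?", so we really hate harassment from unknown callers~~~ We at 緹悠樂活 focus solely on our own business, and customers' personal data is our treasure and is never leaked; you can absolutely rest assured of that. Online shopping and all kinds of applications require you to fill in personal data, and banks or telecom companies may also steal and sell our personal data. I personally rarely give my details to others, and I often suspect banks or telecom companies of leaking my personal data, or even rogue government staff. There are far too many possibilities, but we assure you that your data definitely did not leak from us. If you feel better about it, please give us a slightly better rating!! Have a wonderful day~~~ ^╴╴╴^ (2009/09/03 09:45:57)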
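Rating: Excellent (2009/08/24 03:10:28)
Store reply: You are a super buyer! Thank you so much for visiting ╭☆°蔓蒂 小鋪. Wishing you all the best~ (2009/08/24 09:54:49)
Rating: Needs improvement (2009/09/01 05:05:00)
Comment: The customer service attitude was good, but my order data leaked and I was targeted by a scam ring. Totally unhappy!
Store reply: Hello. On this PChome platform we use one-time passwords, change our passwords regularly, and scan for viruses, but criminals are getting more and more sophisticated, and we don't know how they stole the data. It may also have been stolen from your computer while you were uploading data, so we suggest you scan for viruses too. We have immediately changed our passwords and scanned for viruses. Thank you for taking the time to tell us! We have also notified PChome. Please remember: there is no such thing as pressing the wrong ATM button and ending up on an installment plan with monthly deductions. If you receive a suspicious call, hang up immediately and call 165 to report it! You can also email or call us to confirm. (2009/09/01 09:19:47)
Rating: Needs improvement (2009/09/01 05:07:04) (latest)
Comment: Customer service attitude
Store reply: We're very sorry~ that you're unhappy~ We have also reported this to PChome (2009/09/01 09:21:59)
Rating: Needs improvement (2009/09/16 01:49:54) (latest)
Comment: The product works well! But more than a month after the purchase, I actually got a scam call! That feels terrible!! Why did this happen? Please give a reasonable explanation!! Thanks!!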
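Rating: Needs improvement (2009/09/20 21:37:11) (latest)
Comment: I bought beef jerky from you! How did the scam ring know? They actually used it to call me and try to scam me! I want to ask: why did my personal data leak from you? My address! phone number, items, amount, credit card company... the scam ring knew all of it! This seriously affects my safety! When I called you, you denied all responsibility!! Extremely irresponsible!!
Store reply: Thank you very much. PChome Store Street would like to remind you that neither Store Street sellers nor PChome staff will ever ask consumers to perform any operation at an ATM, so please be careful not to be fooled. If you receive an email or call from an unknown party, immediately call the 165 anti-fraud hotline or verify through the PChome Store Street customer service center at https://storessl.pchome.com.tw/adm/appeal.htm. PChome Store Street works with you to keep online transactions safe! (2009/09/20 22:10:11)
Rating: Needs improvement (2007/11/13 11:10:38) (latest)
Comment: Only now do I realize that the call I received was also from a scam ring!! I hereby formally express my strong dissatisfaction with, and protest against, your company and PChome Store Street!! We shop here because we trust the security of your company and of the PChome Store Street platform, and now our mobile numbers and other personal data have all been stolen. How will your company and PChome Store Street take responsibility and resolve this??!
Store reply: Sorry. We have reported this to PChome and notified the police. Please forgive the inconvenience. (2007/11/14 12:09:14)
Rating: Needs improvement (2007/11/12 14:55:15) (latest)
Comment: Your company leaked my transaction data, so a scam ring called to harass me. They stated exactly when I bought this electric mosquito swatter set from you, even asked how it was working, and then lied that there was a problem with my transaction record: your trading platform had malfunctioned and turned my one-time payment into installments, and I had to go to an ATM to get a transaction slip with the reference number and time on it in order to cancel the installments. Later I checked and learned it was a scam. Everyone, please beware of data leaks when shopping at this store!!!!
Store reply: We have reported this to PChome and the police. Sorry for the trouble. We will never call customers to tell them to change their payment method; if you have any questions, please call us to confirm. (2007/11/12 18:27:51)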
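Rating: Needs improvement (2009/09/21 22:01:38) (latest)
Comment: Today I got a call from so-called customer service who clearly knew about this purchase, but it turned into a scam call, saying I would be charged every month…. I think your system is terrible; my personal data leaked right away.
Store reply: Thank you very much for your support! Regarding this scam call, PChome has already taken protective measures. The scammers used the purchase confirmation emails received on the customer's side; the data was not leaked by Store Street. Please change your own password to keep scammers out of your mailbox! We hope to serve you again! ^^ (2009/09/22 09:50:34)
Rating: Needs improvement (2008/01/06 23:51:55) (latest)
Comment: This store disregarded consumers' rights and leaked customer data, which led to scam calls. When I told the store, it replied: "The online world is full of holes. We're very sorry that everyone's rights were harmed. Just be careful not to be fooled. Thanks." If a scam actually succeeded, who would we turn to? It's safer to buy from other stores.
Store reply: Hello~ We were unfortunately chosen as a springboard for fraud, and we are frustrated and upset too; we worry every day that customers will be fooled. As soon as we learned that customers were being harassed by the scam ring, we reported it to the police and notified the 165 anti-fraud hotline, so we did not disregard consumers' rights. But the police said they don't handle this kind of online fraud, and the 165 hotline could only confirm that they were a scam ring and could not bring them to justice, so we had nowhere to turn. In the end, all we could do was call over a hundred customers to warn them. It's not that we did nothing; we hope you understand. Thank you (2008/01/07 14:30:13)
Comment: The day after I received the goods, I got a scam call. When I saw through it, the woman even swore at me ))&^%&*^$*&()…. Your company should protect customer data better.
Store reply: Dear customer, hello. We were shocked to see your comment. It is an undeniable fact that hackers are rampant online, but our company has never had an incident of customer data being stolen. A /hacker intrusion/ could have happened on the web server (the PChome host) or on the buyer's or seller's computer, where a trojan planted by hackers caused the data to leak. After your report, our company immediately had engineers scan and inspect the computers that manage our site, but found nothing abnormal. We have also reported your comment and the incident to the relevant PChome department, and we especially want to express our concern about what happened to you. Hackers are truly rampant online. Besides always protecting your personal data, we suggest you scan your own computer regularly (please note that ordinary antivirus software cannot prevent hackers' trojan attacks, intrusion, theft, and control), avoid opening unfamiliar emails and links, and if your network becomes slow or behaves abnormally, have a professional computer engineer inspect and maintain it. We will further strengthen personal data protection in light of this incident. If you have any related questions, feel free to contact PChome customer service at any time, or call the 亞訊e機棒_科技生活館 service center. Finally, we wish you peace and happiness. 亞訊e機棒_科技生活館, Manager 林茂森, 98/03/20 (2009/03/20 03:00:03)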
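Rating: Average (2009/08/25 17:00:24)
Comment: I received a scam call today. The caller claimed there was a problem with the payment for the headphones I bought at your store, which would cause repeated deductions. I'd like to know how your store's data was leaked.
Store reply: Thank you for your rating~ With your support and recommendation, we will work even harder, and we hope to serve you again!! (2009/08/25 17:38:19)
Rating: Needs improvement (2009/09/12 11:55:29) (latest)
Comment: I received a scam call claiming that the item I bought at your store, "JABRA BT-530~brand-new 遠寬 official import~ original [one-to-two] A2DP noise-cancelling Bluetooth headset BT530", had mistakenly been set up as an installment purchase, and asking me to operate an ATM. I notified your store two weeks ago and asked you to find out how the data leaked, but you ignored it. Your indifference to this data leak is baffling. Since your store has no sincere intention of understanding and fixing this personal data leak, I will not shop here again, so that my personal data does not keep leaking.
Store reply: Hello. We take your rights as a shopper very seriously. Regarding the data leak, we have been discussing the cause with PCHOME, and PCHOME has now stopped displaying the data in purchase notifications. We also ran a security audit of all computers in our company and found no virus infections or malicious trojans. On 9/10 we sent you a letter with the reply from PCHOME's system staff and a description of how we handled the matter, so we did not ignore you; we ask for your understanding! We apologize for causing you worry. Many stores were affected by this PCHOME incident; neither you nor we are the only victims. We believe that as long as we stay alert, we can fight crime~ Thank you (2009/09/12 18:46:58)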
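What the data shows
DA量販 has two negative reviews about scam calls. Add my own case (I couldn't leave a review, either because too much time had passed or because I had already rated the store) and that makes three. Three scam-call complaints in such a short time clearly mean the data leaked from either the store or the platform. The data I collected covers more than six thousand stores, but scam reports usually concern only a few of them and cluster within short periods. This indicates that a leak from the platform is less likely, while a leak from the store is highly likely. As for DA量販, it is almost certain that the data leaked from them, though whether intentionally or not is unknown. Among the other stores I found, 電氣男購物城 also has two such reviews within a short time, which likewise indicates that the data leaked from them. Why don't other stores have this problem, while the stores listed above do? Obviously, it's their own problem. Yet what is infuriating is the variety of excuses: blaming users, blaming banks, anything but admitting that the data leaked from themselves. I know this is hard to prove, but the data and the analysis of each possibility all point the same way.
Trojans are not the only way data leaks
Many stores say, "We scan for viruses and found nothing abnormal." But data usually leaks through something even simpler: phishing, a way of getting account credentials that is simpler yet more effective than trojans. It works like this: build a fake Store Street login page that looks exactly like the real one, then send the link to stores under various pretexts, such as a reply to a message or "Your account may have been compromised; please change your password." The store opens the email, clicks the link, is asked to log in, and naively enters its account and password into the fake page and submits them, handing them straight to the scam ring. And whereas a trojan can be detected, once the scammers have your credentials and quietly log in to look at your buyers' data, you may hardly notice. The platform could provide login records, which would make such sneaky logins easier to spot. For example, if I clearly didn't log in yesterday morning but there is a login record for that time, my account and password have obviously leaked.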
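The login-record check suggested above could look like this minimal Python 3 sketch. It assumes the platform exports login times and that the store keeps its own record of when staff actually used the back end; both the data format and the sample data are hypothetical, and an intruder who logs in during a genuine session would not be caught.

```python
from datetime import datetime


def unexplained_logins(login_times, my_sessions):
    """Return the login times that fall outside every session we know we had."""
    def explained(t):
        return any(start <= t <= end for start, end in my_sessions)
    return [t for t in login_times if not explained(t)]


# Hypothetical login log exported from the platform.
login_times = [
    datetime(2009, 9, 21, 14, 5),
    datetime(2009, 9, 22, 9, 30),  # a morning nobody at the store logged in
    datetime(2009, 9, 22, 15, 0),
]
# Hypothetical record of when the store's staff actually used the back end.
my_sessions = [
    (datetime(2009, 9, 21, 14, 0), datetime(2009, 9, 21, 16, 0)),
    (datetime(2009, 9, 22, 14, 30), datetime(2009, 9, 22, 17, 0)),
]

for t in unexplained_logins(login_times, my_sessions):
    print("Login at %s does not match any of our sessions" % t)
```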
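A system for detecting the leaking link
Even with this data, it is still not enough to pinpoint exactly where the data leaked. So I think a system could locate the leaking link scientifically. The method is simple: every time a scam call is reported, add one point to every link that could have leaked that order's data. Among all the links, the one whose score is far higher than the others is the culprit. If your leak score is that high and you didn't leak the data, did a ghost leak it? Unfortunately, it seems nobody has ever seriously tried to find out where the data leaks from; the data just keeps flowing out, and those responsible become the scam rings' biggest accomplices.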
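The scoring idea is simple enough to sketch directly in Python 3. Each report lists the parties that handled that order, and every party on the list gets one point; all party names, reports, and order volumes below are hypothetical. Raw points favour links that touch every order, such as the platform itself, so the sketch also adds a refinement that is not part of the original proposal: dividing each score by the number of orders that party handled.

```python
from collections import Counter


def leak_scores(reports):
    """Add one point to every party that could have leaked each reported order."""
    scores = Counter()
    for parties in reports:
        scores.update(set(parties))  # count each party once per order
    return scores


def prime_suspects(scores):
    """Parties tied for the highest raw score."""
    if not scores:
        return []
    top = max(scores.values())
    return sorted(p for p, s in scores.items() if s == top)


def leak_rates(scores, orders_handled):
    """Added refinement: points per order that each party handled."""
    return {p: scores[p] / orders_handled[p] for p in scores}


# Hypothetical scam-call reports: each list is the chain behind one order.
reports = [
    ["store_A", "platform", "bank_X", "courier_1"],
    ["store_A", "platform", "bank_Y", "courier_2"],
    ["store_A", "platform", "bank_X", "courier_2"],
    ["store_B", "platform", "bank_Y", "courier_1"],
]
# Hypothetical order volumes over the same period.
orders_handled = {"store_A": 50, "store_B": 400, "platform": 10000,
                  "bank_X": 3000, "bank_Y": 3000,
                  "courier_1": 2000, "courier_2": 2000}

scores = leak_scores(reports)
rates = leak_rates(scores, orders_handled)
print(prime_suspects(scores), max(rates, key=rates.get))
```

With raw points the platform comes out on top simply because it appears in every order; per-order rates single out the store whose customers are reporting scams far more often than its volume would explain.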
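Finally
Users have a responsibility too: if you receive a scam call and the data leaked from a particular store, don't hesitate to give that store the worst rating. A store that can't even protect its customers' data deserves no excuses. Make sure everyone knows which stores, or which links behind them, cannot keep data safe. Stores, for their part, should develop good habits: don't install programs of unknown origin, and don't casually click links in emails people send you. Even if you open one, check whether the URL is correct; some scammers even register look-alike domain names to pass as the real thing. Never naively click a link and then enter your account and password. It is best to log in from your own bookmark of the back-end login page rather than through a link in an email, because links in emails may well be phishing links. Otherwise, once your credentials are stolen again, you will not only lose your business reputation, but your buyers will become victims too.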
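Checking the address before typing a password can be partly automated. This Python 3 sketch uses the standard `urllib.parse.urlsplit` to accept a link only if it uses https and its host is exactly an allowed domain or a subdomain of it. The allow-list (`pchome.com.tw`, taken from the customer-service URL quoted in the reviews) and the https requirement are assumptions, and no check like this catches everything; a bookmark is still the safer habit.

```python
from urllib.parse import urlsplit

ALLOWED_DOMAINS = ("pchome.com.tw",)  # assumption: the real platform's domain


def is_trusted_link(url, allowed=ALLOWED_DOMAINS):
    """True only for https links whose host is an allowed domain or its subdomain."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname drops any "user@" prefix and port, and lowercases the host.
    host = (parts.hostname or "").rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


for link in ["https://storessl.pchome.com.tw/adm/appeal.htm",
             "https://storessl.pchome.com.tw.evil.example/login",
             "https://storessl.pchome.com.tw@evil.example/login"]:
    print(link, is_trusted_link(link))
```

Comparing the whole host against the allowed suffix, rather than searching for "pchome" anywhere in the URL, is what defeats the look-alike tricks in the last two sample links.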
Autostart script for TurboGears2
September 20th, 2009
When I finished a TurboGears2 application, I ran into a problem: how do I keep my tg2 application running all the time on Webfaction? Virtual hosts have scheduled downtime, so if you just run "paster serve --daemon production.ini", your application goes down whenever the machine does and stays down afterwards. You need a way to keep your tg2 application up. I could use crontab to check the application every 5 minutes, but then it could be down for up to 5 minutes after the machine restarts, so that is not a good idea. I noticed that Webfaction uses an autostart CGI script for TurboGears1 applications, so I decided to use that script to run my tg2 application. What is an autostart script, and how does it work? It is a simple script that keeps your web application up. It is executed when Apache's mod_rewrite can't connect to your server. With an autostart script, the application runs on demand: if nobody is browsing your application, it doesn't need to run, which saves memory.
However, the autostart script Webfaction provides is for tg1, so I modified it for tg2. Here is the modified version:
#!/usr/bin/env python2.4
import os

# Shell prefix: activate the tg2 virtualenv, then cd into the project.
# "." is the POSIX spelling of "source"; os.popen/os.system run /bin/sh.
ENV = ('. /home/victorlin/webapps/tiange/tg2env/bin/activate;'
       'cd /home/victorlin/webapps/tiange/tiange/tiange;')

# Test if the process is already running
running = False
# read status of tg2 application
lines = os.popen(ENV + 'paster serve --status production.ini').readlines()
if lines and lines[0].startswith('Server running in PID'):
    running = True

print "Content-type: text/html\r\n"
if running:
    print """<head><META HTTP-EQUIV="Refresh" CONTENT="2; URL=."></head><body>
Site is starting ...<a href=".">click here</a></body>"""
else:
    print """<head><META HTTP-EQUIV="Refresh" CONTENT="2; URL=."></head><body>
Restarting site ...<a href=".">click here</a></body>"""
    # Start the tg2 application as a background daemon
    os.system(ENV + 'paster serve --daemon production.ini')
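Parsing the text printed by `paster serve --status` works, but a common alternative is to read the daemon's PID file and ask the kernel whether that process exists. The Python 3 sketch below is POSIX-only and assumes the daemon wrote its PID to a file such as `paster.pid`; that file name is an assumption, so check the `--pid-file` setting you actually use. `os.kill(pid, 0)` sends no signal; it only checks that the process exists. A stale PID that has been reused by an unrelated process would still count as running.

```python
import os


def pid_is_running(pid_path):
    """Return True if the PID stored in pid_path belongs to a live process."""
    try:
        with open(pid_path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or garbled PID file
    if pid <= 0:
        return False  # 0 or negative would address process groups instead
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


print(pid_is_running("paster.pid"))
```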
My review of Webfaction
September 17th, 2009
One year has passed since I bought Webfaction's virtual hosting service. Today I want to share my experience.
Webfaction’s virtual hosting is more than virtual hosting
What does "virtual hosting is more than virtual hosting" mean? Well, normally people think all virtual hosting is the same: you upload your application by FTP, you can run only a few types of web application, maybe PHP, ASP, or some other out-of-date web stuff, and you configure your site with a control panel like cPanel. That's all you can do with your host. Pretty limited, isn't it? You pay for the service, but they limit what you can do.
Webfaction gives you full SSH access to your account, and you can do almost anything you like with it. You can install whatever you want in your home directory, though they have already installed most popular tools for you.
For example, you can compile and install the software you like in your home directory. You can even compile a server written in C and run it on Webfaction! I wrote an article demonstrating how: WebFaction能不能跑自己的C/C++語言Server? ("Can WebFaction run your own C/C++ server?", in traditional Chinese).
Another example: as I said, it is more than virtual hosting, and in fact it is more than web hosting too. What if you want to run a non-HTTP server? You have to buy a VPS, right? No! You can run your non-HTTP server on Webfaction's virtual hosting! I wrote a little chat room server for my Android chat room client, and it runs well on Webfaction. (It is not running now, because I have no time to maintain it, and my client has lots of problems: it consumes a lot of CPU even when running in the background, because I didn't handle the life-cycle events well. So I stopped running it until I have time to update it.)
Also, full SSH access makes managing your web applications much more convenient. For example, deploying a web application by uploading everything over FTP is a big hassle: you have to remember which pages changed and upload them manually, which is killing me. With Webfaction, I use SVN to deploy. I develop the web application on my machine, then check it out on Webfaction, and everything is there! If you modify some pages of your site, you just commit them and run "svn update" in your application directory, and that's it! If you don't like SVN, you have a choice: you can install git, or, if you prefer Mercurial, you can install that too.
Easy-to-use panel: install most popular web applications in a few minutes
What if you don't know anything about Linux commands, SSH, or other geek stuff? Well, SSH access is actually an extra that Webfaction gives you. If you don't know how to use SSH, that's okay: you can use their powerful and easy-to-use control panel to install most popular web applications without typing a single command. Just fill in the form and click, and everything is there.
There are many popular applications you can install from their control panel, e.g. WordPress, Django, TurboGears, Trac, Joomla, Ruby on Rails, Subversion, and so on. They use a simple but powerful architecture for configuring your websites.
As you can see in this figure, you mount applications on domain names, and that combination is a website. It is pretty easy to use once you know how it works.
Friendly customer support
Webfaction provides the best customer support I have ever seen. Whenever I run into problems, they give me good answers and are willing to help me out. That makes me happy; they really care about how their customers feel. They also know a lot about technology: although they offer so much cool new stuff, they still know the technical details and can help you with their extensive experience. Their great customer support means I never regret recommending them.
It's a balance between convenience, freedom and price
You might ask why I don't just buy a VPS. Well, the reasons are obvious. First, I don't have time to manage a VPS: upgrading and monitoring a server all the time is no fun, and it is annoying to receive a database-down alert while you're sleeping. All I want is to run my sites, so why should I have to take care of so many details? Second, as you can see, I have some new stuff to run, like TurboGears2, and some non-HTTP servers; why should I buy a VPS just for those? Finally, price matters: a good VPS is not cheap. So my conclusion is that Webfaction strikes a balance between convenience, freedom, and price. That's why I chose it.
Finally
Finally, if you're interested in WebFaction's hosting service, just try it: they provide a 60-day money-back guarantee, so you can get your money back if you don't like it for any reason. And when you fill in the signup form, you can enter my account "victorlin" in the "Promo code or referrer" field. I would appreciate it.