The Insanity of Email and Malware
July 3, 2008 2:02 PM
We used to have a single mail server that processed all incoming email for the viruslab. It also served as our email honeypot.
I have written several other blog entries about the amount of email we process, and every time I concluded that it is insane. We do process an immense amount of email every day. Several log analysis products we tried simply gave up, or gave wildly inaccurate results, when they tried to process our email logs.
Because of the volume of email, we had to upgrade our infrastructure. We now have multiple email servers handling the viruslab email. One of the servers is dedicated to incoming malware samples and the rest to spam. The server handling the malware samples does see some spam, but probably a thousand times less than the rest of the infrastructure sees.
To give a sense of scale: we process around 400,000 spam messages a day, yet that amounts to only around 700 MB of email a day. So where did the 1 TB of email we processed last year come from? Did spam suddenly get smaller? If it stays at current levels, this year's spam may account for around 250 GB of mail.
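A quick back-of-envelope check of those spam figures follows. It is only a sketch: the post does not say which units it uses, so the code assumes binary units (1 GB = 1024 MB, 1 MB = 1024 kB), and the function names are hypothetical.

```python
# Back-of-envelope check of the daily spam figures quoted in the post.
# Assumption: binary units (1 GB = 1024 MB, 1 MB = 1024 kB); the post does not say.
SPAM_MESSAGES_PER_DAY = 400_000
SPAM_MB_PER_DAY = 700
DAYS_PER_YEAR = 365


def yearly_spam_gb(mb_per_day: float, days: int = DAYS_PER_YEAR) -> float:
    """Total spam volume over `days` days, in GB."""
    return mb_per_day * days / 1024


def average_message_kb(mb_per_day: float, messages_per_day: int) -> float:
    """Mean size of a single spam message, in kB."""
    return mb_per_day * 1024 / messages_per_day


if __name__ == "__main__":
    print(f"Spam per year: {yearly_spam_gb(SPAM_MB_PER_DAY):.1f} GB")
    print(f"Average spam message: "
          f"{average_message_kb(SPAM_MB_PER_DAY, SPAM_MESSAGES_PER_DAY):.2f} kB")
```

Run as a script, this prints about 249.5 GB per year and an average of about 1.79 kB per spam message. That is consistent with the "around 250 GB" estimate and shows why hundreds of thousands of spam messages still add up to relatively little data.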
My prediction for the amount of email we will process this year is around 2 TB.
The difference is malware samples. We expect to receive 1.75 TB of samples by email this year. If you make the wild assumptions that the average piece of malware is around 200 kB in size and that it is 40% larger as an email, then we will receive and process around 6.7 million malware samples by email this year.
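The sample-count estimate can be reproduced with the following sketch. The 200 kB average size, the 40% email overhead (plausibly Base64 attachment encoding, which alone inflates data by about a third, plus headers) and the 1.75 TB total come from the post. Treating 1 TB as 1024³ kB is an assumption on our part, and `estimated_samples` is a hypothetical helper.

```python
# Rough estimate of how many malware samples arrive by email in a year.
# From the post: 200 kB average sample, 40% larger once wrapped in an email,
# 1.75 TB of sample mail per year.
# Assumption (not stated in the post): binary units, 1 TB = 1024**3 kB.
KB_PER_TB = 1024 ** 3


def estimated_samples(total_tb: float, sample_kb: float, overhead: float) -> int:
    """Number of samples that fit in `total_tb` of mail, if each sample of
    `sample_kb` kB grows by the fraction `overhead` when sent as email."""
    email_kb = sample_kb * (1 + overhead)  # size of one sample-carrying email
    return round(total_tb * KB_PER_TB / email_kb)


if __name__ == "__main__":
    print(estimated_samples(1.75, 200, 0.40))
```

With binary units this gives 6,710,886, the "around 6.7 million" quoted above. With decimal units (1 TB = 10⁹ kB) the same inputs give 6.25 million, so the choice of units shifts the answer by roughly 7%.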
Did I mention that our definition files now list more than 1 million signatures? And as my previous blog entry about useless statistics showed, the signatures represent only about half of what we detect. We have been, and will keep, focusing very strongly on turning that 50% into 70% or better.
I even include pictures in this blog entry.
The first picture illustrates the number of emails processed by the original email server for the year. Look at the sudden drop in April. That is when we introduced the new infrastructure.

The second picture shows the total size of the messages being processed. Note how there is virtually no change even though the number of messages has dropped significantly.