Playing with Awk for log / text file processing

One programming language on Linux that is widely used by professors, programmers, and researchers abroad is Awk. It is a kind of "little" programming language. Awk is typically used to analyze long logs, or to grab text and then modify it. In short, if you have data in a text file that needs analyzing and you don't want the hassle, you can use Awk.

I am using Ubuntu Natty 11.04, where Awk comes preinstalled. As basic background, the structure of an awk invocation is:

awk 'program' file

The program statements go inside curly braces {}, for example awk '{print $3}'. Let's try it; please open your terminal / console.

1. Using Awk to print the n-th word

echo 'this is for testing purpose only' | awk '{print $3}'

This prints "for". Awk reads the echoed string, then runs print $3, which prints the third word of the string it received.
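
As a small extension of the example above, here is a sketch that prints the number of words together with the first and last word. The variable names line and summary are only for illustration; NF is awk's built-in field count.

```shell
line='this is for testing purpose only'

# NF is the number of whitespace-separated fields on the line;
# $1 is the first field and $NF is the last one.
summary=$(echo "$line" | awk '{ print NF, $1, $NF }')
echo "$summary"   # 6 this only
```

The comma in print separates the values with a single space in the output.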

2. Parsing log files
Here is a sample NGINX log file from one of my domains; save it as test.log.

66.249.71.137 - - [27/Aug/2011:06:25:16 +0000] "GET /dashboard-interface-2010-ford-fusion-hybrid HTTP/1.1" 200 4285 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:06:26:49 +0000] "GET /why-people-buy-insurance HTTP/1.1" 200 5265 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:06:31:00 +0000] "GET /toyota-yaris-honda-fit-hyundai-accent-nissan-versa-chevrolet-aveo5-sub-compact-shootout HTTP/1.1" 200 4363 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:06:35:11 +0000] "GET /auto-show-test-drive-nissan-teana-part-2 HTTP/1.1" 200 10142 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
209.85.224.87 - - [27/Aug/2011:06:40:23 +0000] "GET /feed HTTP/1.1" 200 3635 "-" "FeedBurner/1.0 (http://www.FeedBurner.com)"
66.249.71.137 - - [27/Aug/2011:06:49:17 +0000] "GET /95-f150-door-repair HTTP/1.1" 200 5151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:06:54:16 +0000] "GET /assets/image/nissan-march-0-11-07-25-03-59-28.jpg HTTP/1.1" 200 27262 "http://www.autopartsz.com/nissan-march-2003-54km-1200cc-auto" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:06:54:30 +0000] "GET /1932-ford-high-boy-street-rod-roadster.-sold- HTTP/1.1" 200 3975 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
202.96.51.155 - - [27/Aug/2011:06:55:50 +0000] "GET /aaa-auto-insurance-quotes-get-free-instant-quotes-here HTTP/1.1" 200 12229 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; TencentTraveler ; .NET CLR 1.1.4322; .NET CLR 2.0.50215)"
66.249.71.137 - - [27/Aug/2011:06:59:12 +0000] "GET /55-ford-fairlane-club-sedan-wfordomatic-sold HTTP/1.1" 200 3842 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:01:33 +0000] "GET /toyota-tacoma-product-information-speaker-adapter-baffle-mounts-2005-2006-2007-2008-2009-2010-2011 HTTP/1.1" 200 4074 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
77.88.43.26 - - [27/Aug/2011:07:01:36 +0000] "GET /professional-drivers-progressive-insurance-automotive-x-prize HTTP/1.1" 200 7676 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
66.249.71.137 - - [27/Aug/2011:07:07:33 +0000] "GET /stereo-replacement-installation-guide-for-toyota-tundra HTTP/1.1" 200 3598 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:20:05 +0000] "GET /how-to-replace-ford-explorer-ball-joints HTTP/1.1" 200 4868 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:20:37 +0000] "GET /2008-suzuki-reno-t4589-eastern-shore-toyota HTTP/1.1" 200 3011 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:24:16 +0000] "GET /ford-taurus-door-panel-and-door-ajar-switch-remove-and-replace-ford-taurus-1997 HTTP/1.1" 200 3239 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:31:50 +0000] "GET /viewed HTTP/1.1" 200 5501 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:32:22 +0000] "GET /watched HTTP/1.1" 200 5220 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:34:43 +0000] "GET /toyota-camry-kelley-blue-book HTTP/1.1" 200 6479 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:36:01 +0000] "GET /2008-lexus-rx-350-awd-4dr HTTP/1.1" 200 8538 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.137 - - [27/Aug/2011:07:39:25 +0000] "GET /2009-toyota-yaris-video-review HTTP/1.1" 200 3310 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
94.63.4.57 - - [27/Aug/2011:07:39:27 +0000] "GET /2010-ford-ranger-cruise-control-issue HTTP/1.1" 200 23175 "-" "Mozilla/4.0 (compatible; ICS)"
66.249.71.137 - - [27/Aug/2011:07:42:33 +0000] "GET /2007-dodge-ram-1500-regular-cab-st-pickup-8-ft-bed-one-owner-clean-auto-check-8995.-yelm-wa HTTP/1.1" 200 4416 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
209.85.224.94 - - [27/Aug/2011:07:42:49 +0000] "GET /feed HTTP/1.1" 200 3635 "-" "FeedBurner/1.0 (http://www.FeedBurner.com)"
66.249.71.137 - - [27/Aug/2011:07:46:12 +0000] "GET /2010-toyota-prius-gear-ring-light-making-upgrade-no.-1 HTTP/1.1" 200 3662 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Let's look at the first line:
66.249.71.137 - - [27/Aug/2011:06:25:16 +0000] "GET /dashboard-interface-2010-ford-fusion-hybrid HTTP/1.1" 200 4285 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Its structure is: IP address (space) - (space) - (space) visit date (space), and so on.

Suppose we want to get the IP address (the first word) from this log; we can use print $1.

awk '{print $1}' test.log

And the result (an excerpt; it appears to come from a different part of the log than the sample above):

66.249.71.137
66.249.71.137
66.249.71.137
66.249.71.137
66.249.71.137
66.249.71.137
66.249.71.137
66.249.71.137
66.249.71.137
119.63.196.75
77.88.43.26
66.249.71.137
66.249.71.137
66.249.71.137
66.249.71.137
173.24.212.249
173.24.212.249
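
Once the IP address is available as $1, a natural next step is counting requests per address. The following is a minimal sketch, not part of the original walkthrough: it uses three lines from the sample log with the trailing fields trimmed, and the names log, hits, and counts are chosen here for illustration.

```shell
# A tiny stand-in for test.log: three sample lines, shortened.
log='66.249.71.137 - - [27/Aug/2011:06:25:16 +0000] "GET /dashboard-interface-2010-ford-fusion-hybrid HTTP/1.1" 200 4285
209.85.224.87 - - [27/Aug/2011:06:40:23 +0000] "GET /feed HTTP/1.1" 200 3635
66.249.71.137 - - [27/Aug/2011:06:26:49 +0000] "GET /why-people-buy-insurance HTTP/1.1" 200 5265'

# hits[] is an associative array keyed by IP address.
# The END block runs once, after the last line has been read.
counts=$(printf '%s\n' "$log" |
  awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
  sort -rn)
echo "$counts"
```

awk does not guarantee any order for "for (ip in hits)", so the result is piped through sort -rn to put the busiest address first. On a real file, replace the printf with the file name as awk's last argument.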

3. What if we want to display the last word of a line? For the first line of this log, that is the tail of the user-agent string: +http://www.google.com/bot.html)". Awk splits on whitespace by default, so the last word is not the entire quoted user agent.

For this we can use $NF. For example:

awk '{print $1, $NF}' test.log

And the result:

119.63.196.110 +http://www.baidu.com/search/spider.html)"
66.249.71.137 +http://www.google.com/bot.html)"
66.249.71.137 "Googlebot-Image/1.0"
76.92.113.147 Firefox/6.0"
76.92.113.147 Firefox/6.0"
66.249.71.137 "Googlebot-Image/1.0"
98.139.241.248 "YahooCacheSystem"
98.139.241.248 "YahooCacheSystem"
66.249.71.137 "Googlebot-Image/1.0"
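
As the output shows, $NF only returns the last whitespace-separated piece of the user agent. If the whole user agent is wanted, one option is to split on the double-quote character instead. This is a sketch that assumes the log format shown above, where the user agent is the last quoted string on the line; the variable names are illustrative.

```shell
line='66.249.71.137 - - [27/Aug/2011:06:25:16 +0000] "GET /dashboard-interface-2010-ford-fusion-hybrid HTTP/1.1" 200 4285 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'

# With " as the field separator the line splits into:
#   $2 = request, $4 = referer, $6 = user agent
agent=$(printf '%s\n' "$line" | awk -F'"' '{ print $6 }')
echo "$agent"
```

Because the line ends with a closing quote, the last field is empty, so $(NF-1) would also give the user agent here.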

4. Displaying line numbers
When parsing a long log, line numbers make the analysis easier. For that we can use NR. For example:

awk '{ print NR ") " $1 " -> " $(NF-1) }' test.log

The result:

387) 204.236.155.105 -> Resolver,
388) 184.72.47.71 -> "UnwindFetchor/1.0
389) 204.236.147.26 -> Resolver,
390) 184.72.47.71 -> "UnwindFetchor/1.0
391) 50.18.121.47 -> "UnwindFetchor/1.0
392) 50.18.121.47 -> "UnwindFetchor/1.0
393) 46.20.47.43 -> "Mozilla/5.0
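
NR can also be used as a condition, not just printed. The sketch below, using made-up four-line input, selects a range of lines and counts the total number of lines; picked and total are names invented for this example.

```shell
input='first
second
third
fourth'

# A pattern before the braces restricts the action to matching lines:
# here only lines 2 through 3 are printed, each with its line number.
picked=$(printf '%s\n' "$input" | awk 'NR >= 2 && NR <= 3 { print NR ") " $0 }')
echo "$picked"

# In the END block, NR still holds the number of the last line read.
total=$(printf '%s\n' "$input" | awk 'END { print NR }')
echo "$total"
```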

5. Formatting
When we parse the date from the log with:

awk '{ print $4 }' test.log

We get:

[28/Aug/2011:03:57:29
[28/Aug/2011:04:04:18
[28/Aug/2011:04:06:08
[28/Aug/2011:04:06:10
[28/Aug/2011:04:06:10
[28/Aug/2011:04:06:21
[28/Aug/2011:04:10:58
[28/Aug/2011:04:10:58
[28/Aug/2011:04:10:58
[28/Aug/2011:04:16:15
[28/Aug/2011:04:21:48

Suppose we want only the date. The delimiter between the date and the time is ":" (a colon). We can set a delimiter in awk with -F, for example:

awk '{print $4 }' test.log | awk -F: '{print $1}'

This produces:

[28/Aug/2011
[28/Aug/2011
[28/Aug/2011
[28/Aug/2011
[28/Aug/2011
[28/Aug/2011
[28/Aug/2011
[28/Aug/2011
[28/Aug/2011
[28/Aug/2011
[28/Aug/2011

We want the date to appear without the leading "[". We can remove unwanted characters using sed. Example:

awk '{ print $4 }' test.log | awk -F: '{ print $1 }' | sed 's/\[//'

This produces:

27/Aug/2011
27/Aug/2011
27/Aug/2011
27/Aug/2011
27/Aug/2011
27/Aug/2011
28/Aug/2011
28/Aug/2011
28/Aug/2011
28/Aug/2011

Update!
Following ferdianto's comment, we can use AWK's substr function instead:

awk '{ print $4 }' test.log | awk -F: '{ print substr($1,2) }'

The substr function extracts part of a string: substr(s, 2) returns s from its second character onward, so it can likewise remove a leading character.
However, substr and sed are not the same.

When we want to filter out a particular character regardless of whether it sits at the front or at the back, sed is the more effective choice.

substr itself works much like PHP's substr: we extract starting from a given position. Note that awk counts positions from 1, while PHP counts from 0.
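
The three-stage pipeline above can also be collapsed into a single awk call. This is one possible sketch using awk's built-in split and substr on one line from the sample log; the names line, parts, and day are illustrative.

```shell
line='209.85.224.87 - - [27/Aug/2011:06:40:23 +0000] "GET /feed HTTP/1.1" 200 3635 "-" "FeedBurner/1.0 (http://www.FeedBurner.com)"'

# split() breaks $4 on ":" into the array parts[]; parts[1] is "[27/Aug/2011".
# substr(parts[1], 2) then drops the leading "[" (awk positions start at 1).
day=$(printf '%s\n' "$line" | awk '{ split($4, parts, ":"); print substr(parts[1], 2) }')
echo "$day"   # 27/Aug/2011
```

Doing everything in one process avoids the extra pipes, though the piped version is arguably easier to build up step by step.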

6. Splitting on a specific character (explode)

awk 'BEGIN { FS=","}; { print $1} ' test.log

Or

awk -F, '{ print $1} ' test.log
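
To see what the comma separator does to every field rather than only the first one, here is a small sketch on a made-up comma-separated row (the data and variable names are invented for the example):

```shell
row='apple,banana,cherry'

# With -F, each comma-separated piece becomes a field;
# the loop walks from field 1 to field NF and prints each with its index.
pieces=$(echo "$row" | awk -F, '{ for (i = 1; i <= NF; i++) print i ": " $i }')
echo "$pieces"
```

This is roughly what explode gives you in PHP, except that awk fields are numbered from 1.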

7. Filtering emails

We want to keep only the gmail addresses, with "," as the separator:

awk -F, '/@gmail/ { print $1; }' emails.txt
awk -F, '/@gmail/ { print $1 $2 }' emails.txt | sed 's/"//;s/"$//;s/""/,/'

To strip the extra whitespace:

awk '/@gmail/ { $1=$1; print $1 "," $2 }'
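
The article does not show the contents of emails.txt, so the following sketch assumes a simple "name, address" layout with placeholder entries. It combines the pattern filter with a comma separator and trims the spaces around the second field using gsub.

```shell
# Placeholder data standing in for emails.txt (format assumed: name, address).
emails='Alice, alice@gmail.com
Bob, bob@example.com'

# /@gmail/ keeps only lines containing "@gmail".
# gsub() removes leading and trailing spaces from the second field.
gmail_only=$(printf '%s\n' "$emails" |
  awk -F, '/@gmail/ { gsub(/^ +| +$/, "", $2); print $1 "," $2 }')
echo "$gmail_only"
```

The /@gmail/ pattern is matched against the whole line, so it would also match a name or any other field containing that text; $2 ~ /@gmail/ is a stricter alternative.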

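To extract links from an HTML / text file (a regular expression as RS needs gawk or mawk; the pattern only matches the uppercase form <A HREF=; and NR>2 skips the first link as well as the text before it, so use NR>1 to keep every link):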

awk 'BEGIN{ RS="<A *HREF *= *\""} NR>2 {sub(/".*/,"");print; }' index.html
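
An alternative that does not rely on a regular-expression record separator is to loop with match(), which sets RSTART and RLENGTH for each hit. This is only a sketch on a made-up HTML fragment with lowercase, double-quoted href attributes; it is not a real HTML parser.

```shell
html='<p><a href="http://example.com/one">one</a> and <a href="http://example.com/two">two</a></p>'

links=$(printf '%s\n' "$html" | awk '{
  rest = $0
  # match() returns the position of the next href="..." or 0 when none is left.
  while (match(rest, /href="[^"]*"/)) {
    # Skip the 6 characters of href=" and drop the closing quote.
    print substr(rest, RSTART + 6, RLENGTH - 7)
    # Continue searching after the current match.
    rest = substr(rest, RSTART + RLENGTH)
  }
}')
echo "$links"
```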

Read this far? Good! For the record, AWK is older than Perl: it was created in 1977!

References worth reading further:
1. http://gregable.com/2010/09/why-you-should-know-just-little-awk.html#
2. http://www.reddit.com/r/programming/comments/dkew8/why_you_should_know_just_a_little_bit_of_awk/
3. http://news.ycombinator.com/item?id=1738688
