Parsing web log files
The other day, I was trying to import the log files from NorthshoreCoop.org, so I ran VFP’s Import Wizard. Unfortunately, it doesn’t cope with the mix of delimiters (spaces, square brackets, and double quotes) in a typical Apache log. So I gave Excel’s Import Wizard a chance at it, with identical results.
Time to do things the hard/fun way.
CREATE CURSOR crsrLog ( ;
    cClientIP C(15), ;
    cIdentity C(20), ;
    cRemoteUser C(40), ;
    cDateTime C(30), ;
    cRequestLine C(80), ;
    cStatusCode C(3), ;
    cSize C(10), ;
    cDomain C(80), ;
    cReferer C(80), ;
    cUserAgent C(80), ;
    cUnknown C(20))

lcLog = FILETOSTR(GETFILE())

*!* Read each line into an array element
lnLineCount = ALINES(laLogLines, lcLog)

FOR i = 1 TO lnLineCount
    APPEND BLANK
    lcLogLine = laLogLines(i)

    *!* Break the line down into space-delimited tokens.
    lnTokenCount = ALINES(laTokens, lcLogLine, .T., " ")
    lnField = 1
    lcField = ""
    llInString = .F.

    FOR j = 1 TO lnTokenCount
        lcToken = laTokens(j)

        *!* Does the token start or end with a delimiter?
        llLeft = INLIST(LEFT(lcToken, 1), '"', '[')
        llRight = INLIST(RIGHT(lcToken, 1), '"', ']')

        DO CASE
            CASE llLeft = llRight
                IF llInString
                    *!* If we're in the middle of a string, append the token to the string.
                    lcField = lcField + " " + lcToken
                ELSE
                    *!* Otherwise, just fill in the field and go to the next one.
                    REPLACE (FIELD(lnField)) WITH lcToken
                    lnField = lnField + 1
                ENDIF

            *!* If the field starts with a delimiter, start building the string.
            CASE llLeft AND NOT llRight
                llInString = .T.
                lcField = lcToken

            *!* If the field ends with a delimiter, finish building the string,
            *!* and fill in the next field
            CASE NOT llLeft AND llRight
                lcField = lcField + " " + lcToken
                REPLACE (FIELD(lnField)) WITH lcField
                lnField = lnField + 1
                llInString = .F.
        ENDCASE
    ENDFOR

    lnField = 1
ENDFOR
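Once the cursor is filled, a natural next step is to tidy the values and summarize them. The parser above stores the surrounding quotes and brackets along with each multi-token value, so the sketch below strips them and then counts requests per status code. It is only a rough illustration, assuming crsrLog is still open and populated; crsrStatus and nHits are names made up for this example, not part of the original code.

```foxpro
*!* Sketch only: assumes crsrLog was filled by the parser above.
SELECT crsrLog

*!* The parser keeps the quotes and brackets with each value; strip them.
REPLACE ALL ;
    cDateTime    WITH CHRTRAN(cDateTime, "[]", ""), ;
    cRequestLine WITH CHRTRAN(cRequestLine, '"', ""), ;
    cReferer     WITH CHRTRAN(cReferer, '"', ""), ;
    cUserAgent   WITH CHRTRAN(cUserAgent, '"', "")

*!* Count requests per status code, busiest first.
*!* crsrStatus and nHits are illustrative names, not from the original post.
SELECT cStatusCode, COUNT(*) AS nHits ;
    FROM crsrLog ;
    GROUP BY cStatusCode ;
    ORDER BY 2 DESC ;
    INTO CURSOR crsrStatus

BROWSE
```

Stripping the delimiters after the fact keeps the parsing loop itself unchanged; the same CHRTRAN() calls could instead be applied at the point where each field is written, if that turns out to be more convenient.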
Posted by Garrett on May 8th, 2004 in Uncategorized
