Parsing web log files

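The other day, I was trying to import the log files from NorthshoreCoop.org, so I ran VFP’s Import Wizard. Unfortunately, it doesn’t cope with the mix of delimiters in a typical Apache log: fields are separated by spaces, but the timestamp is wrapped in square brackets and the request line, referer, and user agent are wrapped in double quotes, so those fields contain spaces of their own. I gave Excel’s Import Wizard a chance at it, with the same result.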

Time to do things the hard/fun way.

CREATE CURSOR crsrLog ( ;
    cClientIP C(15), ;
    cIdentity C(20), ;
    cRemoteUser C(40), ;
    cDateTime C(30), ;
    cRequestLine C(80), ;
    cStatusCode C(3), ;
    cSize C(10), ;
    cDomain C(80), ;
    cReferer C(80), ;
    cUserAgent C(80), ;
    cUnknown C(20))

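*!* GETFILE() returns an empty string if the dialog is cancelled,
*!* and FILETOSTR() would raise an error on it.
lcFile = GETFILE()
IF EMPTY(lcFile)
    RETURN
ENDIF
lcLog = FILETOSTR(lcFile)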

*!* Read each line into an array element
lnLineCount = ALINES(laLogLines, lcLog)

FOR i = 1 TO lnLineCount
    lcLogLine = laLogLines(i)

    *!* Skip blank lines so they don't create empty records.
    IF EMPTY(lcLogLine)
        LOOP
    ENDIF

    APPEND BLANK

    *!* Break the line down into space-delimited tokens.
    lnTokenCount = ALINES(laTokens, lcLogLine, .T., " ")

    lnField = 1
    lcField = ""
    llInString = .F.

    FOR j = 1 TO lnTokenCount
        lcToken = laTokens(j)

        *!* Does the token start or end with a delimiter?
        llLeft = INLIST(LEFT(lcToken, 1), '"', '[')
        llRight = INLIST(RIGHT(lcToken, 1), '"', ']')

        DO CASE
            CASE llLeft = llRight
                IF llInString
                    *!* If we're in the middle of a string, append the token to the string.
                    lcField = lcField + " " + lcToken
                ELSE
                    *!* Otherwise, just fill in the field and go to the next one.
                    REPLACE (FIELD(lnField)) WITH lcToken
                    lnField = lnField + 1
                ENDIF 

            *!* If the field starts with a delimiter, start building the string.
            CASE llLeft AND NOT llRight
                llInString = .T.
                lcField = lcToken

            *!* If the field ends with a delimiter, finish building the string,
            *!* and fill in the next field
            CASE NOT llLeft AND llRight
                lcField = lcField + " " + lcToken
                REPLACE (FIELD(lnField)) WITH lcField
                lnField = lnField + 1
                llInString = .F.
        ENDCASE
    ENDFOR

ENDFOR
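
Once the loop finishes, the fields still carry their surrounding brackets and quotes. Here is a rough sketch of a cleanup pass, followed by a quick hit count per status code. It assumes the quoted values ended up in cRequestLine, cReferer, and cUserAgent, which depends on your log format. crsrStatus and nHits are names made up for this example.

```foxpro
*!* Strip the delimiters left in the parsed fields.
*!* CHRTRAN() with an empty replacement string removes the listed
*!* characters. Note that it also drops any quotes inside a value.
REPLACE ALL ;
    cDateTime WITH CHRTRAN(cDateTime, "[]", ""), ;
    cRequestLine WITH CHRTRAN(cRequestLine, '"', ""), ;
    cReferer WITH CHRTRAN(cReferer, '"', ""), ;
    cUserAgent WITH CHRTRAN(cUserAgent, '"', "") ;
    IN crsrLog

*!* Count requests per HTTP status code, most frequent first.
*!* crsrStatus and nHits are hypothetical names for this sketch.
SELECT cStatusCode, COUNT(*) AS nHits ;
    FROM crsrLog ;
    GROUP BY cStatusCode ;
    ORDER BY 2 DESC ;
    INTO CURSOR crsrStatus

BROWSE
```

The summary goes into its own cursor, so crsrLog stays intact for further queries.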

Posted by Garrett on May 8th, 2004