pylogsparser-0.4/normalizers/postfix.xml

Postfix log normalization. Postfix logs consist of a message UID and a list of keys and values. Normalized keys are "client", "to", "from", "orig_to", "relay", "size", "message-id" and "status". Ce normaliseur analyse les logs émis par le service Postfix. Les messages Postfix consistent en un UID de message et une liste variable de clés et de valeurs associées. Les clés extraites par ce normaliseur sont "client", "to", "from", "orig_to", "relay", "size", "message-id" et "status". mhu@wallix.com the hexadecimal message UID l'UID associé au message, exprimé sous forme d'un nombre hexadécimal [0-9A-F]{11} # find the component and trim the program if log.get("program", "").startswith("postfix"): log["component"] = log['program'][8:] log["program"] = "postfix" ACCEPTED = [ "client", "to", "from", "orig_to", "relay", "size", "message-id", "status" ] # re trying to match client and relay addresses r=re.compile('(?P<host>[A-Za-z0-9\-\.]+)(?P<ip>\[.*\])?(?P<port>\:\d+)?$') couples = value.split(', ') for couple in couples: tagname, tagvalue = couple.split('=', 1) if tagname in ACCEPTED: tagvalue = tagvalue.strip('<>') log[tagname] = tagvalue if tagname == 'status': log[tagname] = log[tagname].split()[0] TRANSLATE = {"to": "message_recipient", "from": "message_sender", "size": "len", "message-id": "message_id"} for k,v in TRANSLATE.items(): if k in log.keys(): val = log[k] del log[k] log[v] = val if 'client' in log.keys(): host, ip, port = r.match(log['client']).groups() if host: log['source_host'] = host if ip: log['source_ip'] = ip.strip("[]") if port: log['source_port'] = port.strip(':') if 'relay' in log.keys(): host, ip, port = r.match(log['relay']).groups() if host: log['dest_host'] = host if ip: log['dest_ip'] = ip.strip("[]") if port: log['dest_port'] = port.strip(':') postfix.+ Generic postfix message with a UID and many key-value couples. Message Postfix générique comportant un UID et plusieurs couples clé-valeur. UID: KEYVALUES the Postfix message UID l'UID du message UID the Postfix key-value couples les couples clé-valeur du log KEYVALUES decode_postfix_key_value 74275790B06: to=<root@ubuntu>, orig_to=<root>, relay=none, delay=0.91, delays=0.31/0.07/0.53/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=ubuntu type=A: Host not found) 74275790B06 root@ubuntu root none bounced mail

pylogsparser-0.4/normalizers/cisco-asa_header.xml

This normalizer is able to parse logs received via the syslog export facility from a Cisco ASA. The normalizer has been validated with Cisco ASA version 8.4. The standard export format (No EMBLEM format) with "Device ID" and "timestamp" options must be selected for this normalizer. Ce normaliseur reconnaît les logs Cisco ASA exportés via la facilité syslog. Ce normaliseur a été validé avec la version 8.4 de l'IOS Cisco ASA. Le format d'export standard (Non EMBLEM) doit être sélectionné avec les options "Device ID" et "timestamp". fbo@wallix.com Expression matching a syslog line priority, defined as 8*facility + severity. Expression correspondant à la priorité du message, suivant la formule 8 x facilité + gravité. \d{1,3} Expression matching a date in the MMM dd YYYY hh:mm:ss format.
Expression correspondant à la date au format MMM dd YYYY hh:mm:ss. [A-Z]{1}[a-z]{2} [0-9]{1,2} [0-9]{4} \d{2}:\d{2}:\d{2} Expression matching the device ID. Expression correspondant à l'identifiant de l'équipement. [^: ]+ # define facilities FACILITIES = { 0: "kernel", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog", 6: "print", 7: "news", 8: "uucp", 9: "ntp", 10: "secure", 11: "ftp", 12: "ntp", 13: "audit", 14: "alert", 15: "ntp" } for i in range(0, 8): FACILITIES[i+16] = "local%d" % i # define severities SEVERITIES = { 0: "emerg", 1: "alert", 2: "crit", 3: "error", 4: "warn", 5: "notice", 6: "info", 7: "debug" } facility = int(value) / 8 severity = int(value) % 8 if facility not in FACILITIES or severity not in SEVERITIES: raise ValueError('facility or severity is out of range') log["facility"] = "%s" % FACILITIES[facility] log["severity"] = "%s" % SEVERITIES[severity] log["facility_code"] = "%d" % facility log["severity_code"] = "%d" % severity SEVERITIES = { 0: "emerg", 1: "alert", 2: "crit", 3: "error", 4: "warn", 5: "notice", 6: "info", 7: "debug" } log["severity_code"] = "%s" % str(value) log["severity"] = "%s" % SEVERITIES[int(value)] Expression matching the Cisco ASA Syslog header Expression validant l'entête Syslog d'un équipement Cisco ASA <PRIORITY>DATE SOURCE : %ASA-SEVERITY-MNEMONIC: BODY the log's priority la priorité du log, égale à 8 x facilité + gravité PRIORITY decode_priority the log's date l'horodatage du log DATE MMM dd YYYY hh:mm:ss the log's source (device ID) l'équipement ASA à l'origine de l'événement SOURCE the log's severity la sévérité du log SEVERITY decode_asa_severity the Cisco ID of the event l'identifiant Cisco de l'événement MNEMONIC the actual event message le message décrivant l'événement BODY cisco-asa <165>Jan 25 2012 18:31:09 ciscoasa : %ASA-5-111008: User 'enable_15' executed the 'logging host inside2 192.168.30.2 6/11508' command. local4 notice 5 ciscoasa 2012-01-25 18:31:09 111008 cisco-asa User 'enable_15' executed the 'logging host inside2 192.168.30.2 6/11508' command.

pylogsparser-0.4/normalizers/UserAgent.xml

This normalizer extracts additional info from the useragent field in an HTTP request. Ce normaliseur extrait des données supplémentaires du champ useragent présent dans les requêtes HTTP. mhu@wallix.com m = extras.robot_regex.search(value) if m: log["search_engine_bot"] = m.group().lower() known_os = {"Mac OS" : "Mac/Apple", "Windows" : "Windows", "Linux" : "Linux"} guess = "unknown" for i,j in known_os.items(): if i in value: guess = j break log['source_os'] = guess USERAGENT USERAGENT findBot guessOS Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html) baiduspider unknown Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10 Mac/Apple Nokia6680/1.0 (4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 unknown

pylogsparser-0.4/normalizers/URLparser.xml

This normalizer extracts additional info from URLs such as domain, protocol, etc. Ce normaliseur extrait des données supplémentaires des URLs telles que le domaine, le protocole, etc.
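The decodeURL callback that follows can be read as a small standalone function. Below is a minimal, self-contained sketch of it; the get_domain() helper is a naive stand-in for the one the package actually ships in its extras module (the real helper handles more cases, such as multi-label country suffixes):

try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2, as used by pylogsparser

def get_domain(hostname):
    # naive stand-in: keep only the last two labels of the host name
    return ".".join(hostname.split(".")[-2:])

def decode_url(value):
    log = {}
    parsed = urlparse(value)
    if parsed.hostname:
        log["url_hostname"] = parsed.hostname
        log["url_domain"] = get_domain(parsed.hostname)
    if parsed.path:
        log["url_path"] = parsed.path
    if parsed.scheme:
        log["url_proto"] = parsed.scheme
    return log

url = "http://www.wallix.org/2011/09/20/how-to-use-linux-containers-lxc-under-debian-squeeze/"
assert decode_url(url)["url_domain"] == "wallix.org"   # matches the documented example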
mhu@wallix.com parsed = urlparse.urlparse(value) if parsed.hostname: log['url_hostname'] = parsed.hostname log['url_domain'] = extras.get_domain(parsed.hostname) if parsed.path: log['url_path'] = parsed.path if parsed.scheme: log['url_proto'] = parsed.scheme URL URL decodeURL http://www.wallix.org/2011/09/20/how-to-use-linux-containers-lxc-under-debian-squeeze/ www.wallix.org http /2011/09/20/how-to-use-linux-containers-lxc-under-debian-squeeze/ wallix.org

pylogsparser-0.4/normalizers/cisco-asa_msg.xml

This normalizer is able to parse logs from Cisco ASA devices. It performs normalization on the log body extracted by the cisco-asa_header parser. The normalizer has been tested with Cisco ASA IOS version 8.4. Ce normaliseur reconnaît les logs des équipements Cisco ASA. La normalisation est réalisée sur le corps du log extrait par le normaliseur cisco-asa_header. Ce normaliseur a été testé avec un Cisco ASA IOS en version 8.4. fbo@wallix.com Matches a word. Un mot. [^ ]+ Matches a protocol word. Un protocole. (?:(?:TCP)|(?:UDP)|(?:ICMP)|(?:tcp)|(?:udp)|(?:icmp)|\d+) Matches an action word. Une action. (?:(?:denied)|(?:permitted)|(?:Successful)|(?:Rejected)|(?:failed)|(?:succeeded)) Matches an authentication word. Un type d'autorisation. (?:(?:authentication)|(?:authorization)|(?:accounting)|(?:Authentication)|(?:Authorization)) log['protocol'] = value.lower() log['action'] = value.lower() log['type'] = value.lower() cisco-asa Format of log event_id 305011 Format du log event_id 305011 Built TYPE PROTOCOL translation from SINT:SIP/SPT to DINT:DIP/DPT Translation type Le type de translation TYPE The protocol of the translated connection. Le protocole de la connexion translatée.
PROTOCOL lower_protocol Inbound interface Interface d'entrée SINT Source IP address Adresse IP source SIP Source port Port source SPT Outbound interface Interface de sortie DINT Destination IP address Adresse IP de destination DIP Destination port Port de destination DPT Group related to this event Groupe en rapport avec cet événement GROUP Deny icmp src outside:192.168.208.63 dst inside:192.168.150.77 (type 8, code 0) by access-group "OUTSIDE" [0xd3f63b90, 0x0] icmp outside 192.168.208.63 192.168.150.77 OUTSIDE Deny tcp src outside:192.168.208.63/51585 dst inside:192.168.150.77/288 by access-group "OUTSIDE" [0x5063b82f, 0x0] tcp outside 192.168.208.63 192.168.150.77 288 OUTSIDE Format of log event_id 106010 Format du log event_id 106010 Deny inbound protocol PROTOCOL src SINT:SIP dst DINT:DIP The protocol of translated connection. Le protocole de la connexion translatée. PROTOCOL lower_protocol Inbound interface Interface d'entrée SINT Source IP address Adresse IP source SIP Outbound interface Interface de sortie DINT Destination IP address Adresse IP de destination DIP Deny inbound protocol 47 src outside:192.168.0.1 dst outside:127.0.0.10 47 outside outside 192.168.0.1 127.0.0.10 Format of logs event_id 605005/605004 Format des logs event_id 605005/605004 Login ACTION from SIP/SPT to SINT:DIP/DPT for user "USER" Source IP address Adresse IP source SIP Inbound interface Interface d'entrée SINT Destination IP address Adresse IP de destination DIP Source port Port source SPT Destination port Port de destination DPT User related to this event Utilisateur en rapport avec cet événement USER Action taken by the device Action prise par l'équipement ACTION lower_action Login permitted from 192.168.202.51/3507 to inside:192.168.2.20/ssh for user "admin" inside 3507 192.168.2.20 admin permitted Format of logs event_id 113004/113005 Format des logs event_id 113004/113005 AAA user AAATYPE ACTION : (?:reason = [^:]+: )?server = DIP : user = USER AAA type AAA type AAATYPE lower_type Destination IP address Adresse IP de destination DIP User related to this event Utilisateur en rapport avec cet événement USER Action taken by the device Action prise par l'équipement ACTION lower_action AAA user authentication Successful : server = 10.1.206.27 : user = userx 10.1.206.27 userx authentication successful AAA user authentication Rejected : reason = AAA failure : server = 10.10.1.2 : user = vpn_user 10.10.1.2 vpn_user authentication rejected Format of logs event_id 109005/109006/109007/109008 Format des logs event_id 109005/109006/109007/109008 AAATYPE ACTION for user 'USER' from SIP/SPT to DIP/DPT on interface SINT AAA type AAA type AAATYPE lower_type Action taken by the device Action prise par l'équipement ACTION lower_action User related to this event Utilisateur en rapport avec cet événement USER Source IP address Adresse IP source SIP Source port Port source SPT Destination IP address Adresse IP de destination DIP Destination port Port de destination DPT Inbound interface Interface d'entrée SINT Authentication succeeded for user 'userjane' from 172.28.4.41/0 to 10.1.1.10/24 on interface outside 10.1.1.10 24 userjane authentication succeeded outside Authorization denied for user 'user1' from 192.168.208.63/57315 to 192.168.134.21/21 on interface outside 192.168.134.21 21 57315 user1 authorization denied outside Format of logs event_id 611101/611102 Format des logs event_id 611101/611102 User authentication ACTION: Uname: USER Action taken by the device Action prise par l'équipement ACTION lower_action User 
related to this event Utilisateur en rapport avec cet événement USER User authentication succeeded: Uname: alex alex succeeded Generic pattern matching logs like 109024/109025/201010/109023/... Règle générique pour les logs de type 109024/109025/201010/109023/... .+ from SIP/SPT to DIP/DPT (?:\(.+\) )?on interface DINT(?:(?: using PROTOCOL)|.+|$) Source IP address Adresse IP source SIP Source port Port source SPT Destination IP address Adresse IP de destination DIP Destination port Port de destination DPT Inbound interface Interface d'entrée DINT The protocol of translated connection. Le protocole de la connexion translatée. PROTOCOL lower_protocol User related to this event Utilisateur en rapport avec cet événement USER Authorization denied from 111.111.111.111/12345 to 222.222.222.222/12345 (not authenticated) on interface inside using https inside https Authorization denied (acl=RS1) for user 'username' from 10.10.10.9/137 to 10.10.10.255/137 on interface outside using UDP outside udp 10.10.10.9 137 User from 192.168.5.2/56985 to 192.168.100.2/80 on interface outside must authenticate before using this service outside 80 Generic pattern matching logs like 108003/410002/324005/421007/500005/109028/608002/... Règle générique pour les logs de type 108003/410002/324005/421007/500005/109028/608002/... .+ (?:from|for) (?:SINT:)?SIP/SPT to (?:DINT:)?DIP/DPT.* Source IP address Adresse IP source SIP Source port Port source SPT Destination IP address Adresse IP de destination DIP Destination port Port de destination DPT Inbound interface Interface d'entrée SINT Outbound interface Interface de sortie DINT Dropped 189 DNS responses with mis-matched id in the past 10 second(s): from outside:192.0.2.2/3917 to inside:192.168.60.1/53 outside inside Generic pattern trying to match a user id in a log Règle essayant de trouver un identifiant dans un log .+(?:(?:[U|u]ser =)|(?:Uname:)|(?:Username =)) USER.* User related to this event Utilisateur en rapport avec cet événement USER [aaa protocol] Unable to decipher response message Server = 10.10.3.2, User = fbo fbo pylogsparser-0.4/normalizers/GeoIPsource.xml0000644000175000017500000000724411645625573017505 0ustar fbofbo This filter evaluates the country of origin associated to the source_ip tag. Ce filtre détermine le pays d'origine associé à la valeur du tag source_ip. mhu@wallix.com country = country_code_by_address(value) if country: log['source_country'] = country This pattern simply checks the source_ip tag. Ce motif se contente d'analyser le tag source_ip. IP IP decodeCountryOfOrigin 8.8.8.8 US 77.207.23.14 FR pylogsparser-0.4/normalizers/xferlog.xml0000644000175000017500000004314211710220641016740 0ustar fbofbo This normalizer handles FTP logs in the xferlog format. This format is supported by a wide range of FTP servers like Wu-Ftpd, VSFTPd, ProFTPD or standard BSD ftpd. The "program" tag is therefore set to the generic value "ftpd". Ce normaliseur traite les logs au format xferlog. Le format xferlog est utilisé pour consigner les événements par de nombreux serveurs FTP, tels que Wu-Ftpd, ProFTPD ou la version BSD de ftpd. La métadonnée "program" reçoit de fait la valeur générique "ftpd". clo@wallix.com Expression matching a date in the DDD MMM dd hh:mm:ss YYYY format. [A-Z]{1}[a-z]{2} [A-Z]{1}[a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} \d{4} Expression matching a vsftpd field (any non-whitespace character). \S+ Expression matching a vsftpd field more accurately than the 'vsftpd field' tagType. 
Possible values are a or b, see the description of the tag [with the same name] for details. a|b Expression matching a vsftpd field more accurately than the 'vsftpd field' tagType. Possible values are _, C, U or T, see the description of the tag [with the same name] for details. _|C|T|U Expression matching a vsftpd field more accurately than the 'vsftpd field' tagType. Possible values are o or i, see the description of the tag [with the same name] for details. o|i Expression matching a vsftpd field more accurately than the 'vsftpd field' tagType. Possible values are a, g or r, see the description of the tag [with the same name] for details. a|g|r Expression matching a vsftpd field more accurately than the 'vsftpd field' tagType. Possible values are 0 or 1, see the description of the tag [with the same name] for details. 0|1 Expression matching a vsftpd field more accurately than the 'vsftpd field' tagType. Possible values are c or i, see the description of the tag [with the same name] for details. c|i decoder = {'a' : 'ascii', 'b' : 'binary'} log['transfer_type'] = decoder.get(value, 'UNKNOWN') decoder = {'C' : 'compressed', 'U' : 'uncompressed', 'T' : "tar'ed", "_" : "none"} log['special_action'] = decoder.get(value, 'UNKNOWN') decoder = {'o' : 'outgoing', 'i' : 'ingoing'} log['direction'] = decoder.get(value, 'UNKNOWN') decoder = {'a' : 'anonymous', 'g' : 'guest', 'r' : 'real'} log['access_mode'] = decoder.get(value, 'UNKNOWN') decoder = {'0' : 'none', '1' : 'RFC931'} log['authentication_method'] = decoder.get(value, 'UNKNOWN') decoder = {'c' : 'complete', 'i' : 'incomplete'} log['completion_status'] = decoder.get(value, 'UNKNOWN') DATE\s+TSF_TIME\s+RMT_HOST\s+BYT_COUNT\s+FILENAME\s+TSF_TYPE\s+SPE_ACT_FLAG\s+DIR\s+ACC_MODE\s+USERNAME\s+SVC_NAME\s+AUTH_METHOD\s+AUTHENTICATED_USER_ID\s+COMPLETION_STATUS The current local time in the form "DDD MMM dd hh:mm:ss YYYY", where DDD is the day of the week, MMM is the month, dd is the day of the month, hh is the hour, mm is the minutes, ss is the seconds, and YYYY is the year. DATE DDD MMM dd hh:mm:ss YYYY The total time of the transfer in seconds. TSF_TIME The remote host name. RMT_HOST The amount of transferred bytes. BYT_COUNT The canonicalized (all symbolic links are resolved) absolute pathname of the transferred file. In case of a chrooted FTP session this field can be interpreted as the pathname in the chrooted environment (the default interpretation) or as the one in the real file system. The second type of interpretation can be enabled by the command-line options of the ftpd. FILENAME The single character that indicates the type of the transfer. The set of possible values is: 'a' (an ascii transfer) or 'b' (a binary transfer). TSF_TYPE decode_transfer_type One or more single character flags indicating any special action taken. The set of possible values is: '_'(no action was taken), 'C'(the file was compressed [not in use]), 'U'(the file was uncompressed [not in use]) or 'T'(the file was tar'ed [not in use]). SPE_ACT_FLAG decode_special_action_flag The direction of the transfer. The set of possible values is: 'o'(the outgoing transfer) or 'i'(the incoming transfer) DIR decode_direction The method by which the user is logged in. The set of possible values is: 'a'[anonymous](the anonymous guest user), 'g'[guest](the real but chrooted user [this capability is guided by ftpchroot(5) file]) or 'r'[real](the real user).
ACC_MODE decode_access_mode The user's login name in case of the real user, or the user's identification string in case of the anonymous user (by convention it is an email address of the user). USERNAME The name of the service being invoked. The ftpd (utility uses the 'ftp' keyword). SVC_NAME The used method of the authentication. The set of possible values is: '0' None or '1' RFC931 Authentication (not in use). AUTH_METHOD decode_authentication_method The user id returned by the authentication method. The '*' symbol is used if an authenticated user id is not available. AUTHENTICATED_USER_ID The single character that indicates the status of the transfer. The set of possible values is: 'c' a complete transfer or 'i' an incomplete transfer. COMPLETION_STATUS decode_completion_status Thu Mar 4 08:12:30 2004 1 202.114.40.242 37 /incoming/index.html a _ o a guest@my.net ftp 0 * c 1 202.114.40.242 37 /incoming/index.html a ascii _ none o outgoing a anonymous guest@my.net ftp 0 none * complete c ftpd file transfer ftpd pylogsparser-0.4/normalizers/MSExchange2007MessageTracking.xml0000644000175000017500000004037411705765631022602 0ustar fbofbo This parser defines how to normalize specific MS Exchange flat files based on observed behavior of MS Exchange 2007 (trial version); while it would have to be confirmed that it is consistent with other versions, it is likely that it won't cause any trouble. This parser describes the format of Exchange 2007's Message Tracking Log (something similar to Postfix logs), a CSV-like flat file that can be found at C:\Program Files\Microsoft\Exchange Server\TransportRoles\Logs\MessageTracking on a standard install. Ce normaliseur analyse certains fichiers de logs générés par MS Exchange 2007. Bien que ce normaliseur ait été écrit par rétro-analyse du comportement d'une version d'évaluation, il devrait être adapté aux versions complètes d'Exchange. Ce normaliseur décrit le format du "Message Tracking Log" (un journal d'événements similaire à celui d'un serveur Postfix), un fichier plat de type CSV qui se trouve à l'emplacement suivant dans une installation standard : C:\Program Files\Microsoft\Exchange Server\TransportRoles\Logs\MessageTracking . mhu@wallix.com The log's specific dateformat Le format d'horodatage spécifique à ce type de log \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[.]\d{3}Z the message ID l'identifiant de message <.+@.+> the source context le contexte de la source "?(?:[^,]+, )*[^,]+"? log['message_id'] = value[1:-1] if value.startswith('"'): value = value[1:-1] d = dict( [ u.split(':', 1) for u in value.split(', ') ] ) # convert camelCase fields into underscore names r = re.compile('[A-Z][a-z0-9]+') rdate = re.compile(""" (?P<year>\d{4})- (?P<month>\d{2})- (?P<day>\d{2}) T(?P<hour>\d{2}): (?P<minute>\d{2}): (?P<second>\d{2})\. 
(?P<microsecond>\d{1,3})?Z""", re.VERBOSE) for name, v in d.items(): new_name = name words = r.findall(name) if words: new_name = '_'.join(words) new_value = v if rdate.match(v): m = rdate.match(v).groupdict() m.setdefault('microsecond', 0) m = dict( [ (u, int(v)) for u,v in m.items() ] ) m['microsecond'] = m['microsecond'] * 1000 new_value = datetime( **m ).ctime() del d[name] d[new_name.lower()] = new_value log.update(d) The Message Tracking Log Format as described in the first line of the log file Le format du Message Tracking Log, tel qu'il apparaît en première ligne du journal d'événements DATE,CLIENT_IP,CLIENT_HOSTNAME,SERVER_IP,SERVER_HOSTNAME,CONTEXT_EXCHANGE_SOURCE,CONNECTOR_ID,EXCHANGE_SOURCE,EVENT_ID,INTERNAL_MESSAGE_ID,MESSAGE_ID,RECIPIENT_ADDRESS,RECIPIENT_STATUS,TOTAL_BYTES,RECIPIENT_COUNT,RELATED_RECIPIENT_ADDRESS,REFERENCE,MESSAGE_SUBJECT,SENDER_ADDRESS,RETURN_PATH,MESSAGE_INFO the log's timestamp l'horodatage de l'événement DATE ISO8601 the client's IP address l'adresse IP du client CLIENT_IP the client's hostname le nom d'hôte du client CLIENT_HOSTNAME the server's IP address l'adresse IP du serveur SERVER_IP the server's hostname le nom d'hôte du serveur SERVER_HOSTNAME the source context le contexte de la source CONTEXT_EXCHANGE_SOURCE decode_MTLSourceContext the connector ID l'identifiant du connecteur CONNECTOR_ID EXCHANGE_SOURCE the event ID l'identifiant d'événement EVENT_ID the internal message ID l'identifiant interne du message INTERNAL_MESSAGE_ID the message ID l'identifiant du message MESSAGE_ID decode_MTLMessageID the recipient's address l'adresse du destinataire RECIPIENT_ADDRESS the recipient's status le statut du destinataire RECIPIENT_STATUS total bytes in the transaction le nombre de bits total pour la transaction TOTAL_BYTES RECIPIENT_COUNT RELATED_RECIPIENT_ADDRESS REFERENCE the message's subject le sujet du message MESSAGE_SUBJECT the sender's address l'adresse de l'expéditeur SENDER_ADDRESS the return path l'adresse de retour RETURN_PATH some additional information about the message de l'information supplémentaire sur le message MESSAGE_INFO MS Exchange 2007 Message Tracking 2010-04-19T12:29:07.390Z,10.10.14.73,WIN2K3DC,,WIN2K3DC,"MDB:ada3d2c3-6f32-45db-b1ee-a68dbcc86664, Mailbox:68cf09c1-1344-4639-b013-3c6f8a588504, Event:1440, MessageClass:IPM.Note, CreationTime:2010-04-19T12:28:51.312Z, ClientType:User",,STOREDRIVER,SUBMIT,,<C6539E897AEDFA469FE34D029FB708D43495@win2k3dc.qa.ifr.lan>,,,,,,,Coucou !,user7@qa.ifr.lan,, MS Exchange 2007 Message Tracking 10.10.14.73 WIN2K3DC WIN2K3DC 68cf09c1-1344-4639-b013-3c6f8a588504 User STOREDRIVER SUBMIT C6539E897AEDFA469FE34D029FB708D43495@win2k3dc.qa.ifr.lan Coucou ! user7@qa.ifr.lan mail pylogsparser-0.4/normalizers/common_tagTypes.xml0000644000175000017500000001244311710220606020443 0ustar fbofbo ]> Matches everything and anything. Chaîne de caractères de longueur arbitraire. .* Matches a variable-length integer. Entier positif. \d+ Matches an EPOCH timestamp or a positive decimal number. Horodatage au format EPOCH, ou nombre décimal positif. \d+(?:.\d*)? Expression matching syslog dates. Date au format syslog. [A-Z][a-z]{2} [ 0-9]\d \d{2}:\d{2}:\d{2} Matches an URL. Correspond à une URL (http/https). http[s]?://[^ "'*]+ Matches a MAC address. Correspond à une adresse MAC. [0-9a-fA-F]{2}:(?:[0-9a-fA-F]{2}:){4}[0-9a-fA-F]{2} Matches an E-mail address. Correspond à une adresse e-mail. [a-zA-Z0-9+_\-\.]+@[0-9a-zA-Z][.-0-9a-zA-Z]*.[a-zA-Z]+ Matches a numeric IP. Correspond à une adresse IP numérique. 
(?<![.0-9])(?:\d{1,3}.){3}\d{1,3}(?![.0-9]) Matches a date written in Zulu Time Correspond à une date exprimée au format "Zulu" ou UTC. \d{4}-\d{2}-\d{2}(?:T\d{1,2}:\d{2}(?::\d{2}(?:[.]\d{1,5})?)?)? pylogsparser-0.4/normalizers/arkoonFAST360.xml0000644000175000017500000004545411710225177017513 0ustar fbofbo fbo@wallix.com .*$ \d+ # define facilities FACILITIES = { 0: "kernel", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog", 6: "print", 7: "news", 8: "uucp", 9: "ntp", 10: "secure", 11: "ftp", 12: "ntp", 13: "audit", 14: "alert", 15: "ntp" } for i in range(0, 8): FACILITIES[i+16] = "local%d" % i # define severities SEVERITIES = { 0: "emerg", 1: "alert", 2: "crit", 3: "error", 4: "warn", 5: "notice", 6: "info", 7: "debug" } facility = int(value) / 8 severity = int(value) % 8 if facility not in FACILITIES or severity not in SEVERITIES: raise ValueError('facility or severity is out of range') log["facility"] = "%s" % FACILITIES[facility] log["severity"] = "%s" % SEVERITIES[severity] log["facility_code"] = "%d" % facility log["severity_code"] = "%d" % severity # Key that must be found in log mandatory_keys = ('id', 'time', 'gmtime', 'fw', 'aktype', ) key_modifiers = {'pri' : 'priority', 'op' : 'method', 'aktype': 'event_id', 'src': 'source_ip', 'dst': 'dest_ip', 'port_src': 'source_port', 'port_dest': 'dest_port', 'dstname': 'dest_host', 'intf_in': 'inbound_int', 'intf_out': 'outbound_int'} def extract_fw(data): ip_re = re.compile("(?<![.0-9])((?:[0-9]{1,3}[.]){3}[0-9]{1,3})(?![.0-9])") if ip_re.match(data['fw']): data['local_ip'] = data['fw'] else: data['local_host'] = data['fw'] def extract_protocol(data): if 'proto' in data.keys(): if data['proto'].find('/') > 0: nump, protocol = data['proto'].split('/') else: protocol = data['proto'] data['protocol'] = protocol del data['proto'] def quote_stripper(data): for k in data.keys(): data[k] = data[k].strip('"') def extract_date(data): data['date'] = datetime.utcfromtimestamp(float(data['gmtime'])) for key in ('time', 'gmtime'): del data[key] def alert_description_modify(data): messages = [ re.compile("TCP from (?P<source_ip>.+):(?P<source_port>.+) to (?P<dest_ip>.+):(?P<dest_port>.+)\s+\[(?P<description>.*)\]"), re.compile("UDP from (?P<source_ip>.+):(?P<source_port>.+) to (?P<dest_ip>.+):(?P<dest_port>.+)\s+\[(?P<description>.*)\]"), re.compile('ICMP:(?P<dest_port>.+)\.(?P<source_port>.+) from (?P<source_ip>.+) to (?P<dest_ip>.+) \[(?P<description>.*)\]'), re.compile('PROTO:(?P<protocol>.+) from (?P<source_ip>.+) to (?P<dest_ip>.+) \[(?P<description>.*)\]'), re.compile('Unsequenced packet on non-TCP proto from (?P<source_ip>.+):(?P<source_port>.+)'), re.compile('Unsequenced TCP packet from (?P<source_ip>.+):(?P<source_port>.+) to (?P<dest_ip>.+):(?P<dest_port>.+)'), re.compile('ACK unsequenced packet on non-TCP proto from (?P<source_ip>.+):(?P<source_port>.+)'), re.compile('ACK unsequenced TCP packet from (?P<source_ip>.+):(?P<source_port>.+) to (?P<dest_ip>.+):(?P<dest_port>.+)'), re.compile('Bad flags on non-TCP proto from (?P<source_ip>.+):(?P<source_port>.+)'), re.compile('Bad TCP flags (?P<flags>.+) from (?P<source_ip>.+):(?P<source_port>.+) to (?P<dest_ip>.+):(?P<dest_port>.+)'), re.compile('Bad packet from (?P<source_ip>.+):(?P<source_port>.+) to (?P<dest_ip>.+):(?P<dest_port>.+) \[(?P<description>.*)\]'), re.compile('Land attack from (?P<source_ip>.+) to (?P<dest_ip>.+)'), re.compile('New value: (?P<source_ip>.+)/(?P<network_mask>.+) \[(?P<ports>.+)\]'), ] if 'alert_desc' in data.keys(): for m in messages: values = 
m.match(data['alert_desc']) if values: data.update(values.groupdict()) def profile_modifier(data): profiles = { 1: 'FTP_BADFILES', 2: 'FTP_SCAN', 3: 'FTP', 4: 'HTTP', 5: 'HTTP_BADURL', 6: 'HTTP_COLDFUSION', 7: 'HTTP_FRONTPAGE', 8: 'HTTP_IIS', 9: 'HTTP_PHP', 10: 'HTTP_NETSCAPE', 11: 'HTTP_TOMCAT', 12: 'HTTP_APACHE', 13: 'HTTP_WINDOWS', 14: 'HTTP_ORACLE', 15: 'HTTP_TALENTSOFT', 16: 'HTTP_LOTUS', 17: 'HTTP_UNIX', 18: 'HTTP_CISCO', 19: 'HTTP_WEBLOGIC', 20: 'HTTP_MYSQL', 21: 'HTTP_MACOS', 22: 'HTTP_VIRUSWALL', 23: 'SMTP', 24: 'IMAP4', 25: 'POP3', 26: 'DNS'} if 'profile' in data.keys(): data['profile'] = profiles.get(int(data['profile']), data['profile']) def reason_modify(data): messages = [ re.compile('Virus (?P<virus_name>.+) found in (?P<file_name>.+)'), re.compile('File (?P<file_name>.+) encrypted'), re.compile('File (?P<file_name>.+): analyze error'), re.compile('Denied by rule (?P<rule_name>.+)'), re.compile('Denied by rule (?P<rule_name>.+), put mail in quarantine') ] if 'reason' in data.keys(): for m in messages: values = m.match(data['reason']) if values: data.update(values.groupdict()) kvre = '(?P<key>[A-Za-z_\-]{2,})=(?P<val>[^" ]+|"[^"]*")' reg = re.compile(kvre) data = reg.findall(value) data = dict(data) # Verify it is the expected log if not set(mandatory_keys).issubset(set(data.keys())): return log if data['id'] != 'firewall': return log # Remove quoted values quote_stripper(data) # Set tag body data['body'] = value # Add a date field from gmtime field extract_date(data) # Extract useful fields from alert description alert_description_modify(data) # Convert IDPS profile profile_modifier(data) # SMTP reason modifier reason_modify(data) # Apply keys modifiers for k, v in key_modifiers.items(): if k in data.keys(): s_val = data[k] del data[k] data[v] = s_val # Process fw tag extract_fw(data) # Process proto field extract_protocol(data) # Remove tag with empty value for k, v in data.items(): if not v: del data[k] # Convert tag name with hyphen to underscore for k, v in data.items(): if k.find('-') > -1: del data[k] k = k.replace('-', '_') data[k] = v log['program'] = 'arkoon' log.update(data) (?:<PRIORITY>[^\s]+:\s)?AKLOG\s*-\s*KEYVALUES KEYVALUES extractAKkv PRIORITY decode_priority AKLOG - id=firewall time="2004-02-25 17:38:51" pri=4 fw=myArkoon aktype=ALERT gmtime=1077727131 alert_type="Blocked by application control" user="userName" alert_level="Low" alert_desc="TCP from 10.10.192.61:33027 to 10.10.192.156:25 [default rule]" arkoon ALERT 4 myArkoon userName Low 10.10.192.156 25 10.10.192.61 33027 default rule Low id=firewall time="2004-02-25 17:38:51" pri=4 fw=myArkoon aktype=ALERT gmtime=1077727131 alert_type="Blocked by application control" user="userName" alert_level="Low" alert_desc="TCP from 10.10.192.61:33027 to 10.10.192.156:25 [default rule]" firewall AKLOG-id=firewall time="2004-02-25 17:38:57" fw=myArkoon aktype=IP gmtime=1077727137 ip_log_type=ENDCONN src=10.10.192.61 dst=10.10.192.255 proto="137/udp" protocol=17 port_src=137 port_dest=137 intf_in=eth0 intf_out= pkt_len=78 nat=NO snat_addr=0 snat_port=0 dnat_addr=0 dnat_port=0 user="userName" pri=3 rule="myRule" action=DENY reason="Blocked by filter" description="dst addr received from Internet is private" arkoon IP 3 myArkoon userName 10.10.192.255 10.10.192.61 Blocked by filter ENDCONN eth0 udp 2004-02-25 16:38:57 id=firewall time="2004-02-25 17:38:57" fw=myArkoon aktype=IP gmtime=1077727137 ip_log_type=ENDCONN src=10.10.192.61 dst=10.10.192.255 proto="137/udp" protocol=17 port_src=137 port_dest=137 intf_in=eth0 
intf_out= pkt_len=78 nat=NO snat_addr=0 snat_port=0 dnat_addr=0 dnat_port=0 user="userName" pri=3 rule="myRule" action=DENY reason="Blocked by filter" description="dst addr received from Internet is private" firewall AKLOG-id=firewall time="2004-02-25 17:38:57" fw=myArkoon aktype=IDPSMATCH gmtime=1077727137 src=10.10.192.61 dst=10.10.192.255 proto="137/udp" protocol=17 port_src=137 port_dest=137 profile=1 sid=123 score=50 arkoon IDPSMATCH myArkoon 10.10.192.255 10.10.192.61 50 FTP_BADFILES udp 2004-02-25 16:38:57 id=firewall time="2004-02-25 17:38:57" fw=myArkoon aktype=IDPSMATCH gmtime=1077727137 src=10.10.192.61 dst=10.10.192.255 proto="137/udp" protocol=17 port_src=137 port_dest=137 profile=1 sid=123 score=50 firewall AKLOG-id=firewall time="2004-02-25 17:38:57" fw=myArkoon aktype=IDPSALERT gmtime=1077727137 src=10.10.192.61 dst=10.10.192.255 proto="137/udp" protocol=17 port_src=137 port_dest=137 profile=1 endcnx_score=100 ch=1 reaction=0 arkoon IDPSALERT myArkoon 10.10.192.255 10.10.192.61 137 137 1 FTP_BADFILES udp 2004-02-25 16:38:57 id=firewall time="2004-02-25 17:38:57" fw=myArkoon aktype=IDPSALERT gmtime=1077727137 src=10.10.192.61 dst=10.10.192.255 proto="137/udp" protocol=17 port_src=137 port_dest=137 profile=1 endcnx_score=100 ch=1 reaction=0 firewall AKLOG-id=firewall time="2004-02-25 17:42:54" fw=myArkoon pri=6 aktype=HTTP gmtime=1077727374 src=10.10.192.61 proto=http user="userName" op="GET" dstname=www arg="http://www/ HTTP/1.1" ref="" agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020623 Debian/1.0.0-0.woody.1" rcvd=355 result=407 arkoon HTTP myArkoon 10.10.192.61 http://www/ HTTP/1.1 407 GET www http 2004-02-25 16:42:54 id=firewall time="2004-02-25 17:42:54" fw=myArkoon pri=6 aktype=HTTP gmtime=1077727374 src=10.10.192.61 proto=http user="userName" op="GET" dstname=www arg="http://www/ HTTP/1.1" ref="" agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020623 Debian/1.0.0-0.woody.1" rcvd=355 result=407 firewall <134>IP-Logs: AKLOG - id=firewall time="2010-10-04 10:38:37" gmtime=1286181517 fw=doberman.jurassic.ta aktype=IP ip_log_type=NEWCONN src=172.10.10.107 dst=204.13.8.181 proto="http" protocol=6 port_src=2619 port_dest=80 intf_in=eth7 intf_out=eth2 pkt_len=48 nat=HIDE snat_addr=10.10.10.199 snat_port=16176 dnat_addr=0 dnat_port=0 tcp_seq=1113958286 tcp_ack=0 tcp_flags="SYN" user="" vpn-src="" pri=6 rule="surf_normal" action=ACCEPT arkoon IP doberman.jurassic.ta http surf_normal ACCEPT id=firewall time="2010-10-04 10:38:37" gmtime=1286181517 fw=doberman.jurassic.ta aktype=IP ip_log_type=NEWCONN src=172.10.10.107 dst=204.13.8.181 proto="http" protocol=6 port_src=2619 port_dest=80 intf_in=eth7 intf_out=eth2 pkt_len=48 nat=HIDE snat_addr=10.10.10.199 snat_port=16176 dnat_addr=0 dnat_port=0 tcp_seq=1113958286 tcp_ack=0 tcp_flags="SYN" user="" vpn-src="" pri=6 rule="surf_normal" action=ACCEPT firewall pylogsparser-0.4/normalizers/sshd.xml0000644000175000017500000001267611705765631016264 0ustar fbofbo This normalizer can parse connection messages logged by a SSH server. Ce normaliseur analyse les événements de connexion à un serveur SSH. mhu@wallix.com matches the action logged for a connection correspond à l'action de connexion Failed|Accepted if value == "Failed": log['action'] = 'fail' else: log['action'] = 'accept' A generic sshd log line. Une notification standard de connexion à un serveur SSH. ACTION METHOD for(?: invalid user)? 
USER from IP port [0-9]+ ssh[0-9] the outcome of the connection attempt le résultat de la tentative de connexion ACTION decode_action the connection method (password or key) la méthode de connexion utilisée (mot de passe ou clé asymétrique) METHOD the user requesting the connection l'utilisateur à l'origine de la connexion USER the inbound connection's IP address l'IP entrante de la connexion IP Failed password for admin from 218.49.183.17 port 49468 ssh2 fail password admin 218.49.183.17 access control pylogsparser-0.4/normalizers/apache.xml0000644000175000017500000003400611705765631016533 0ustar fbofbo Apache normalizer. This parser supports log formats defined in apache's documentation, see http://httpd.apache.org/docs/current/logs.html . Ce normaliseur analyse les logs émis par les serveurs web Apache. Seuls les formats décrits dans la documentation Apache sont supportés en standard : cf http://httpd.apache.org/docs/current/logs.html . mhu@wallix.com Matches apache's common time format. Une expression correspondant au format d'horodatage par défaut d'Apache. \[\d{1,2}/.{3}/\d{4}:\d{1,2}:\d{1,2}:\d{1,2}(?: [+-]\d{4})?\] IP address or None. Une adresse IP, ou un champ vide. (?:(?:\d{1,3}\.){3}\d{1,3})|- Integer or float, or None. Une valeur numérique entière ou décimale, ou un champ vide. [\d.,]+|- DN, user name ... Un "mot", ou un champ vide. [\w.-]+|- try: path = value.split(' ')[1].split('?')[0] log['url_path'] = path log['method'] = value.split(' ')[0] except: pass Common Log Format. Structure des logs selon le schéma "Common Log Format". %h %l %u %t "%r" %>s %b$ the remote host initiating the request l'hôte distant à l'initiative de la requête %h the remote logname used to initiate the request l'identifiant distant à l'initiative de la requête %l the remote user initiating the request l'utilisateur distant à l'initiative de la requête %u the time at which the request was issued - please note that the timezone information is not carried over la date à laquelle la requête a été émise. Veuillez noter que l'information de fuseau horaire n'est pas prise en compte %t dd/MMM/YYYY:hh:mm:ss the first line of the request la première ligne de la requête %r decode_url_path the final status code for the request le code de statut final pour la requête %>s the size of the response in bytes, including HTTP headers la taille de la réponse émise en octets, en-têtes HTTP inclus %b 127.0.0.1 - - [20/Jul/2009:00:29:39 +0300] "GET /index/helper/test HTTP/1.1" 200 889 127.0.0.1 - - GET /index/helper/test HTTP/1.1 200 889 apache /index/helper/test GET web server "Combined" Log Format. Structure des logs selon le schéma "Combined". %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"$ the remote host initiating the request l'hôte distant à l'initiative de la requête %h the remote logname used to initiate the request l'identifiant distant à l'initiative de la requête %l the remote user initiating the request l'utilisateur distant à l'initiative de la requête %u the time at which the request was issued - please note that the timezone information is not carried over la date à laquelle la requête a été émise. 
Veuillez noter que l'information de fuseau horaire n'est pas prise en compte %t dd/MMM/YYYY:hh:mm:ss the first line of the request la première ligne de la requête %r decode_url_path the final status code for the request le code de statut final pour la requête %>s the size of the response in bytes, including HTTP headers la taille de la réponse émise en octets, en-têtes HTTP inclus %b the contents of the "Referer" request header le contenu de l'en-tête "Referer" de la requête %{Referer}i the contents of the "User-agent" request header le contenu de l'en-tête "User-agent" de la requête %{User-agent}i 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)" 127.0.0.1 - frank GET /apache_pb.gif HTTP/1.0 200 2326 apache http://www.example.com/start.html Mozilla/4.08 [en] (Win98; I ;Nav) /apache_pb.gif GET web server apache

pylogsparser-0.4/normalizers/wabauth.xml

This normalizer is used to parse Wallix Admin Bastion authentication logs. nba@wallix.com primary_authentication primary_authentication session opened session closed A user name as defined in Wallix Admin Bastion [^: ]+ an IP [^: ]+ An arbitrary string [^: ]* An event raised when a user tries to authenticate on the WAB type='PRIMARY_AUTHENTICATION' timestamp='[^']+' username='USERNAME' client_ip='CLIENT_IP' diagnostic='DIAG' The event type PRIMARY_AUTHENTICATION The user account used. USERNAME The IP of the connecting client. CLIENT_IP Connection attempt result. DIAG type='primary_authentication' timestamp='2011-12-20 16:21:50.427830' username='admin' client_ip='10.10.4.25' diagnostic='SUCCESS' primary_authentication admin 10.10.4.25 SUCCESS access control type='SESSION_OPENED' username='USERNAME' secondary='ACCOUNT@RESOURCE' client_ip='CLIENT_IP' src_protocol='SOURCE_PROTO' dst_protocol='DEST_PROTO' message='MESSAGE' The event type SESSION_OPENED The user account used. USERNAME The target account used. ACCOUNT The target/resource accessed RESOURCE The IP of the connecting client. CLIENT_IP The protocol used by the client to connect to the WAB SOURCE_PROTO The protocol used by the WAB to connect to the target/resource. DEST_PROTO Other comment. MESSAGE type='session opened' username='admin' secondary='root@debian32' client_ip='10.10.4.25' src_protocol='SFTP_SESSION' dst_protocol='SFTP_SESSION' message='' root 10.10.4.25 SFTP_SESSION debian32 SFTP_SESSION session opened admin access control An event raised when a session is closed on the WAB type='SESSION_CLOSED' username='USERNAME' secondary='ACCOUNT@RESOURCE' client_ip='CLIENT_IP' src_protocol='SOURCE_PROTO' dst_protocol='DEST_PROTO' message='MESSAGE' The event type SESSION_CLOSED The user account used. USERNAME The target account used. ACCOUNT The target/resource accessed RESOURCE The IP of the connecting client. CLIENT_IP The protocol used by the client to connect to the WAB SOURCE_PROTO The protocol used by the WAB to connect to the target/resource. DEST_PROTO Other comment. MESSAGE type='session closed' username='admin' secondary='root@debian32' client_ip='10.10.4.25' src_protocol='SFTP_SESSION' dst_protocol='SFTP_SESSION' message='' root 10.10.4.25 SFTP_SESSION debian32 SFTP_SESSION session closed admin access control

pylogsparser-0.4/normalizers/Fail2ban.xml

This normalizer can parse Fail2ban logs (version 0.8.4).
Ce normaliseur traite les logs de l'applicatif Fail2ban (version 0.8.4). mhu@wallix.com \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} fail2ban \w+ (?:Ban)|(?:Unban) timestamp, milliseconds = value.split(',', 1) newdate = datetime(int(timestamp[:4]), int(timestamp[5:7]), int(timestamp[8:10]), int(timestamp[11:13]), int(timestamp[14:16]), int(timestamp[17:19])) log["date"] = newdate.replace(microsecond = int(milliseconds) * 1000 ) An information message about the application's general status. Un message informatif concernant le statut général de l'application. TIMESTAMP PROGRAM\.COMPONENT\s*: INFO\s+BODY TIMESTAMP decodeF2bTimeStamp the program, set to "fail2ban" le programme, cette métadonnée est toujours évaluée à "fail2ban" PROGRAM the program's component emitting the log le composant du programme à l'origine du message COMPONENT the body of the message le descriptif de l'événement BODY 2011-09-27 05:02:26,908 fail2ban.server : INFO Changed logging target to /var/log/fail2ban.log for Fail2ban v0.8.4 fail2ban server Changed logging target to /var/log/fail2ban.log for Fail2ban v0.8.4 TIMESTAMP PROGRAM\.COMPONENT\s*: WARNING\s+\[PROTOCOL\] ACTION SOURCE_IP TIMESTAMP decodeF2bTimeStamp the program, set to "fail2ban" le programme, cette métadonnée est toujours évaluée à "fail2ban" PROGRAM the program's component emitting the log le composant du programme à l'origine du message COMPONENT the protocol for which an action was taken le protocole pour lequel une action a été appliquée PROTOCOL the action taken : ban, or unban l'action appliquée : bannissement (ban) ou levée du bannissement (unban) ACTION the IP address for which the action was taken l'adresse IP à l'origine de l'action appliquée SOURCE_IP 2011-09-26 15:12:58,388 fail2ban.actions: WARNING [ssh] Ban 213.65.93.82 fail2ban actions ssh Ban 213.65.93.82 access control pylogsparser-0.4/normalizers/squid.xml0000644000175000017500000002201211705765631016431 0ustar fbofbo This normalizer parses messages issued by the Squid proxy server. Please note that only Squid's "native log format" is supported by this normalizer. Ce normaliseur analyse les messages émis par les proxys Squid. Seul le format "natif" des logs Squid est supporté par ce normaliseur. mhu@wallix.com single lexeme without inner spaces unité sémantique sans espace intersticiel [^ ]+ if value != "-": log["user"] = value This pattern parses Squid's native log format. Cette structure décrit le format "natif" des logs Squid. 
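Before the pattern definition that follows, here is a minimal, self-contained sketch of how the ten whitespace-separated fields of Squid's native format line up. It is not the library's matching engine (which compiles the pattern below); it only illustrates the field layout and the decode_user rule above:

def parse_squid_native(line):
    (epoch, elapsed, ip, code_status, size, method,
     url, user, peer, mimetype) = line.split()
    # two fields are slash-separated pairs: code/status and peerstatus/peerhost
    code, request_status = code_status.split("/", 1)
    peer_status, peer_host = peer.split("/", 1)
    log = {"date": epoch, "elapsed": elapsed, "ip": ip, "code": code,
           "requeststatus": request_status, "size": size, "method": method,
           "url": url, "peerstatus": peer_status, "peerhost": peer_host,
           "mimetype": mimetype}
    if user != "-":        # same rule as the decode_user callback above
        log["user"] = user
    return log

line = ("1259844091.407    307 82.238.42.70 TCP_MISS/200 1015 GET "
        "http://www.ietf.org/css/ietf.css fbo DIRECT/64.170.98.32 text/css")
parsed = parse_squid_native(line)
assert parsed["code"] == "TCP_MISS" and parsed["user"] == "fbo"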
EPOCH +ELAPSED IP CODE/REQUESTSTATUS SIZE METHOD URL USER PEERSTATUS/PEERHOST MIMETYPE the log EPOCH timestamp l'horodatage du log au format EPOCH EPOCH EPOCH the user concerned by the request l'utilisateur concerné par la requête USER decode_user the elapsed time for the request le temps écoulé pour la requête ELAPSED the remote host's IP l'adresse IP de l'hôte distant IP the code returned by the proxy le code de la réponse émise par le proxy CODE the request's status le statut de la requête REQUESTSTATUS the size of the request's result la taille du résultat de la requête SIZE the request's method la méthode associée à la requête METHOD the requested URL l'URL requêtée URL the peer's status le statut du pair PEERSTATUS the peer's host l'hôte du pair PEERHOST the MIME type of the result of the request le type MIME du résultat de la requête MIMETYPE 1259844091.407 307 82.238.42.70 TCP_MISS/200 1015 GET http://www.ietf.org/css/ietf.css fbo DIRECT/64.170.98.32 text/css TCP_MISS 307 82.238.42.70 GET text/css 64.170.98.32 DIRECT 1015 200 http://www.ietf.org/css/ietf.css fbo web proxy squid pylogsparser-0.4/normalizers/deny_traffic.xml0000644000175000017500000003710711705765631017754 0ustar fbofbo clo@wallix.com [-a-z0-9]+ GET|OPTIONS|HEAD|POST|PUT|DELETE|TRACE|CONNECT HTTP/[0-9]+[.][0-9]+ \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+]\d{2} [^,]+ DENYALL_UID,DATE,LOCAL_IP,APP_ID,HOST_HEADER,REMOTE_IP,REMOTE_PORT,FORWARDED_FOR,VIA,REMOTE_USER,USER_AGENT,HTTPS_FLAGS,SSL_PROTOCOL,DN,CERTIFICATE_START,CERTIFICATE_END,HTTP_METHOD,URL_ADDRESS,URL_OPTIONS,HTTP_PROTOCOL_VERSION,HTTP_RESPONSE_CODE,RESPONSE_TIME,BYTES_SENT,BYTES_RECEIVED,REFERER,XCACHE,GZRATIO DENYALL_UID DATE YYYY-MM-DD hh:mm:ss LOCAL_IP APP_ID HOST_HEADER REMOTE_IP REMOTE_PORT FORWARDED_FOR VIA REMOTE_USER USER_AGENT HTTPS_FLAGS SSL_PROTOCOL DN CERTIFICATE_START CERTIFICATE_END HTTP_METHOD URL_ADDRESS URL_OPTIONS HTTP_PROTOCOL_VERSION HTTP_RESPONSE_CODE RESPONSE_TIME BYTES_SENT BYTES_RECEIVED REFERER XCACHE GZRATIO 1,2011-01-24 18:07:55+01,192.168.80.10,d74ca776-265b-11e0-a54a-000c298895c5,192.168.80.10,192.168.80.1,57548,,,,Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2.13) Gecko/20101203 AskTbTRL2/3.9.1.14019 Firefox/3.6.13,0,,,,,GET,/,,HTTP/1.1,200,215872,1625,409,,, 1 192.168.80.10 d74ca776-265b-11e0-a54a-000c298895c5 192.168.80.10 192.168.80.1 57548 Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2.13) Gecko/20101203 AskTbTRL2/3.9.1.14019 Firefox/3.6.13 0 GET / HTTP/1.1 200 215872 1625 409 web proxy pylogsparser-0.4/normalizers/netfilter.xml0000644000175000017500000001536211705765631017312 0ustar fbofbo Netfilter log normalization. Netfilter logs consist of a list of keys and values. Normalized keys are "in", "out", "mac", "src", "spt", "dst", "dpt", "len", "proto". Ce normaliseur analyse les logs émis par le composant kernel Netfilter. Les messages Netfilter consistent en une liste de clés et de valeurs associèes. Les clés extraites par ce normaliseur sont "in", "out", "mac", "src", "spt", "dst", "dpt", "len", "proto". fbo@wallix.com Some typical fields used for log identification. Quelques champs propres aux logs NETFILTER. 
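One detail of the netfilter decoder that follows deserves a note: the MAC= field emitted by the kernel concatenates the destination MAC, the source MAC and the EtherType (here 08:00, i.e. IPv4), which is why the code slices it at fixed offsets. A standalone check of that slicing, using the documented example value:

mac = "ff:ff:ff:ff:ff:ff:00:15:5d:20:c2:06:08:00"
dest_mac = mac[:17]      # first 17 characters: destination MAC
source_mac = mac[18:-6]  # next 17 characters: source MAC; the slice drops
                         # the trailing ":08:00" EtherType bytes
assert dest_mac == "ff:ff:ff:ff:ff:ff"
assert source_mac == "00:15:5d:20:c2:06"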
IN=.* OUT=.* SRC=.* DST=.* ACCEPTED = [ "in", "out", "mac", "src", "spt", "dst", "dpt", "len", "proto" ] # Retreive elements separeted by space elms = value.split() candidates = [elm for elm in elms if not elm.find('=') == -1 and not elm.endswith('=')] kv_dict = dict([x.split('=') for x in candidates]) for k,v in kv_dict.items(): kl = k.lower() if kl in ACCEPTED: log[kl] = v TRANSLATE = {'in': 'inbound_int', 'out': 'outbound_int', 'src': 'source_ip', 'dst': 'dest_ip', 'proto': 'protocol', 'spt': 'source_port', 'dpt': 'dest_port'} for k, v in TRANSLATE.items(): if k in log.keys(): val = log[k] del log[k] log[v] = val if 'mac' in log.keys(): log['dest_mac'] = log['mac'][:17] log['source_mac'] = log['mac'][18:-6] del log['mac'] log['program'] = 'netfilter' kernel (?:USERPREFIX )?KEYVALUES a user defined log prefix un préfixe défini par l'utilisateur USERPREFIX Generic Netfilter message with many key-values couples Message Netfilter générique comportant plusieurs couples clé-valeur KEYVALUES decode_netfilter_key_value *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:15:5d:20:c2:06:08:00 SRC=69.10.39.115 DST=255.255.255.255 LEN=166 TOS=0x00 PREC=0x00 TTL=128 ID=22557 PROTO=UDP SPT=55439 DPT=6112 netfilter *UDP_IN Blocked* eth0 ff:ff:ff:ff:ff:ff 00:15:5d:20:c2:06 69.10.39.115 255.255.255.255 166 UDP 55439 6112 firewall pylogsparser-0.4/normalizers/symantec.xml0000644000175000017500000013435511705765631017145 0ustar fbofbo This normalizer parses messages issued by the Symantec Antivirus. Ce normaliser analyse les messages émis par Symantec Antivirus. fbo@wallix.com ^(?:[A-F0-9]{2}){6}$ [1-4] \d{1,3} [012]? date_params = [ int(value[2*i : 2*(i+1)], 16) for i in range(6) ] date_params[0] += 1970 date_params[1] += 1 log['date'] = datetime(*date_params) # the list below is compliant up to Symantec Endpoint Protection 11.0 events = [ '', 'GL_EVENT_IS_ALERT', 'GL_EVENT_SCAN_STOP', 'GL_EVENT_SCAN_START', 'GL_EVENT_PATTERN_UPDATE', 'GL_EVENT_INFECTION', 'GL_EVENT_FILE_NOT_OPEN', 'GL_EVENT_LOAD_PATTERN', 'GL_STD_MESSAGE_INFO NOT USED', 'GL_STD_MESSAGE_ERROR NOT USED', 'GL_EVENT_CHECKSUM', 'GL_EVENT_TRAP', 'GL_EVENT_CONFIG_CHANGE', 'GL_EVENT_SHUTDOWN', 'GL_EVENT_STARTUP', 'UNDOCUMENTED', 'GL_EVENT_PATTERN_DOWNLOAD', 'GL_EVENT_TOO_MANY_VIRUSES', 'GL_EVENT_FWD_TO_QSERVER', 'GL_EVENT_SCANDLVR', 'GL_EVENT_BACKUP', 'GL_EVENT_SCAN_ABORT', 'GL_EVENT_RTS_LOAD_ERROR', 'GL_EVENT_RTS_LOAD', 'GL_EVENT_RTS_UNLOAD', 'GL_EVENT_REMOVE_CLIENT', 'GL_EVENT_SCAN_DELAYED', 'GL_EVENT_SCAN_RESTART', 'GL_EVENT_ADD_SAVROAMCLIENT_TOSERVER', 'GL_EVENT_REMOVE_SAVROAMCLIENT_FROMSERVER', 'GL_EVENT_LICENSE_WARNING', 'GL_EVENT_LICENSE_ERROR', 'GL_EVENT_LICENSE_GRACE', 'GL_EVENT_UNAUTHORIZED_COMM', 'GL_EVENT_LOG_FWD_THRD_ERR', 'GL_EVENT_LICENSE_INSTALLED', 'GL_EVENT_LICENSE_ALLOCATED', 'GL_EVENT_LICENSE_OK', 'GL_EVENT_LICENSE_DEALLOCATED', 'GL_EVENT_BAD_DEFS_ROLLBACK', 'GL_EVENT_BAD_DEFS_UNPROTECTED', 'GL_EVENT_SAV_PROVIDER_PARSING_ERROR', 'GL_EVENT_RTS_ERROR', 'GL_EVENT_COMPLIANCE_FAIL', 'GL_EVENT_COMPLIANCE_SUCCESS', 'GL_EVENT_SECURITY_SYMPROTECT_POLICYVIOLATION', 'GL_EVENT_ANOMALY_START', 'GL_EVENT_DETECTION_ACTION_TAKEN', 'GL_EVENT_REMEDIATION_ACTION_PENDING', 'GL_EVENT_REMEDIATION_ACTION_FAILED', 'GL_EVENT_REMEDIATION_ACTION_SUCCESSFUL', 'GL_EVENT_ANOMALY_FINISH', 'GL_EVENT_COMMS_LOGIN_FAILED', 'GL_EVENT_COMMS_LOGIN_SUCCESS', 'GL_EVENT_COMMS_UNAUTHORIZED_COMM', 'GL_EVENT_CLIENT_INSTALL_AV', 'GL_EVENT_CLIENT_INSTALL_FW', 'GL_EVENT_CLIENT_UNINSTALL', 'GL_EVENT_CLIENT_UNINSTALL_ROLLBACK', 
'GL_EVENT_COMMS_SERVER_GROUP_ROOT_CERT_ISSUE', 'GL_EVENT_COMMS_SERVER_CERT_ISSUE', 'GL_EVENT_COMMS_TRUSTED_ROOT_CHANGE', 'GL_EVENT_COMMS_SERVER_CERT_STARTUP_FAILED', 'GL_EVENT_CLIENT_CHECKIN', 'GL_EVENT_CLIENT_NO_CHECKIN', 'GL_EVENT_SCAN_SUSPENDED', 'GL_EVENT_SCAN_RESUMED', 'GL_EVENT_SCAN_DURATION_INSUFFICIENT', 'GL_EVENT_CLIENT_MOVE', 'GL_EVENT_SCAN_FAILED_ENHANCED', 'GL_EVENT_MAX_EVENT_NUMBER', 'GL_EVENT_HEUR_THREAT_NOW_WHITELISTED', 'GL_EVENT_INTERESTING_PROCESS_DETECTED_START', 'GL_EVENT_LOAD_ERROR_COH', 'GL_EVENT_LOAD_ERROR_SYKNAPPS', 'GL_EVENT_INTERESTING_PROCESS_DETECTED_FINISH', 'GL_EVENT_HPP_SCAN_NOT_SUPPORTED_FOR_OS', 'GL_EVENT_HEUR_THREAT_NOW_KNOWN', 'UNDOCUMENTED', 'UNDOCUMENTED', 'GL_EVENT_MAX_EVENT_NUMBER'] # not super sure about the last one ... ? The kb article seems pasted out of code log['event_id'] = events[int(value)] categories = ['','Infection', 'Summary', 'Pattern', 'Security'] log['category'] = categories[int(value)] loggers = { '0' : 'Scheduled', '1' : 'Manual', '2' : 'Real Time', '6' : 'Console', '7' : 'VPDOWN', '8' : 'System', '9' : 'Startup', '101' : 'Client', '102' : 'Forwarded', '65637' : 'Manual Scan', '131173' : 'Real Time', '524389' : 'System', '720997' : 'Defwatch', '6619237' : 'Client', } log['event_logger_type'] = loggers.get(value, value) actions = ['', 'Quarantine infected file', 'Rename infected file', 'Delete infected file', 'Leave alone (log only)', 'Clean virus from file', 'Clean or delete macros'] try: trans = actions[int(value)] except: trans = "Unknown action" log['primary_action_configuration'] = trans actions = ['', 'Quarantine infected file', 'Rename infected file', 'Delete infected file', 'Leave alone (log only)', 'Clean virus from file', 'Clean or delete macros'] try: trans = actions[int(value)] except: trans = "Unknown action" log['secondary_action_configuration'] = trans actions = ['', 'Quarantined', 'Renamed', 'Deleted', 'Left alone', 'Cleaned', 'Cleaned or macros deleted', 'Saved file as...', 'Sent to Intel (AMS)', 'Moved to backup location', 'Renamed backup file', 'Undo action in Quarantine View', 'Write protected or lack of permissions - Unable to act on file', 'Backed up file', 'Pending analysis', 'First action was partially successful; second action was Leave Alone. 
Results of the second action are not mentioned.', 'A process needs to be terminated to remove a risk', 'Prevent a risk from being loggged or a user interface from being displayed', 'Performing a request to restart the computer', 'Shows as Cleaned by Deletion in the Risk History in the UI and the Logs in the SSC', 'Auto-Protect prevented a file from being created; reported "Access denied."'] log['action'] = actions[int(value)] virus_index = hex(int(value))[2:] expanded_threat_index = 0 if len(virus_index) >= 2: expanded_threat_index = int(virus_index[-2], 16) virus_types = { '1' : 'VEBOOTVIRUS', '3' : 'VEBOOT1VIRUS', '5' : 'VEBOOT2VIRUS', '9' : 'VEBOOT3VIRUS', '100' : 'VEFILEVIRUS', '300' : 'VEMUTATIONVIRUS', '500' : 'VEFILEMACROVIRUS', '900' : 'VEFILE2VIRUS', '1100' : 'VEFIL3VIRUS', '10000' : 'VEMEMORYVIRUS', '30000' : 'VEMEMOSVIRUS', '50000' : 'VEMEMMCBVIRUS', '90000' : 'VEMEMHIGHESTVIRUS', '1000000' : 'VEVIRUSBEHAVIOR', '3000000' : 'VEVIRUS1BEHAVIOR', '8000000' : 'VEFILECOMPRESSED', '10000000' : 'VEHURISTIC', } expanded_threats = ['', ' + VE_NON_VIRAL_MALICIOUS', ' + VE_RESERVED_MALICIOUS', ' + VE_HEURISTIC', ' + VE_SECURITY_RISK_ON', ' + VE_HACKER_TOOLS', ' + VE_SPYWARE', ' + VE_TRACKWARE', ' + VE_DIALERS', ' + VE_REMOTE_ACCESS', ' + VE_ADWARE', ' + VE_JOKE_PROGRAMS', ' + VE_SECURITY_RISK_OFF', ' + UNDOCUMENTED', ' + UNDOCUMENTED', ' + UNDOCUMENTED',] res = virus_types.get(virus_index, "UNDOCUMENTED") +\ expanded_threats[expanded_threat_index] log['virus_type'] = res flags = { '4194304': 'EB_ACCESS_DENIED', '268435456': 'EB_NO_LOG', '536870912': 'EB_FROM_CLIENT', '134217728': 'EB_LAST_ITEM', '16777216': 'EB_LOG', '33554432': 'EB_REAL_CLIENT', '4095': 'EB_FA_OVERLAYS', '4190208': 'EB_N_OVERLAYS', '67108864': 'EB_FIRST_ITEM', '8388608': 'EB_REPORT'} log['eventblock_action'] = flags.get(value, "UNDOCUMENTED") status = {'0' : 'QF_NONE', '1' : 'QF_FAILED', '2' : 'QF_OK'} log['quarantine_attempt_status'] = status.get(value) def bits(x): # helper function to decrypt the mask if x == 0: return () else: top_pow = int(math.log(x, 2)) return (top_pow,) + bits(x - 2**top_pow) flags = ['FA_READ', 'FA_WRITE', 'FA_EXEC', 'FA_IN_TABLE', 'FA_REJECT_ACTION', 'FA_ACTION_COMPLETE', 'FA_DELETE_WHEN_COMPLETE', 'FA_CLIENT_REQUEST', 'FA_OWNED_BY_USER', 'FA_DELETE', 'FA_OWNED_BY_QUEUE', 'FA_FILE_IN_CACHE', 'FA_SCAN', 'FA_GET_TRAP_DATA', 'FA_USE_TRAP_DATA', 'FA_FILE_NEEDS_SCAN', 'FA_BEFORE_OPEN', 'FA_AFTER_OPEN', 'FA_SCAN_BOOT_SECTOR', 'FA_COMING_FROM_NAVAP', 'FA_BACKUP_TO_QUARANTINE'] if value != '0': ret = ' + '.join( [ flags[i] for i in bits(int(value)) ] ) else: ret = value log['operation_flags'] = ret if value == '0': ret = "No" elif value == '1': ret = "Yes" else: ret = "UNDOCUMENTED" log['compressed_file'] = ret status = {'0' : 'VECLEANABLE', '1' : 'VENOCLEANPATTERN', '2' : 'VENOTCLEANABLE'} log['cleanable'] = status.get(value, "UNDOCUMENTED") status = {'0' : 'VEDELETABLE', '1' : 'VENOTDELETABLE'} log['deletable'] = status.get(value, "UNDOCUMENTED") def bits(x): # helper function to decrypt the mask if x == 0: return () else: top_pow = int(math.log(x, 2)) return (top_pow,) + bits(x - 2**top_pow) meanings = { 0: 'The file could not be opened', 1: 'The file was wiped clean of data', 2: 'The file was truncated to 0 bytes', 3: 'The file could not be deleted', 8: 'Flag created files due to special handling', 9: 'The just created infected file was deleted', 10: 'Dir2-type infected files are not quarantined', 11: 'Dir2-type infected files are deleted if the file is being created', 12: 'Dir2-type infected files 
are not deleted', 16: 'File was deleted due to the DESTROY flag', } if value == '0': ret = "No information" else: ret = " + ".join( [ meanings[i] for i in bits(int(value)) ] ) log['action1_status'] = ret def bits(x): # helper function to decrypt the mask if x == 0: return () else: top_pow = int(math.log(x, 2)) return (top_pow,) + bits(x - 2**top_pow) meanings = { 0: 'The file could not be opened', 1: 'The file was wiped clean of data', 2: 'The file was truncated to 0 bytes', 3: 'The file could not be deleted', 8: 'Flag created files due to special handling', 9: 'The just created infected file was deleted', 10: 'Dir2-type infected files are not quarantined', 11: 'Dir2-type infected files are deleted if the file is being created', 12: 'Dir2-type infected files are not deleted', 16: 'File was deleted due to the DESTROY flag', } if value == '0': ret = "No information" else: ret = " + ".join( [ meanings[i] for i in bits(int(value)) ] ) log['action2_status'] = ret Pattern definition for Symantec Antivirus version 8 Definition de pattern pour Symantec Antivirus version 8 DATE,EVENT_NUMBER,CATEGORY,EVENT_LOGGER_TYPE,COMPUTER,USERNAME,VIRUS_NAME,VIRUS_LOCATION,PRIMARY_ACTION_CONFIGURATION,SECONDARY_ACTION_CONFIGURATION,ACTION_TAKEN,VIRUS_TYPE,EVENTBLOCK_ACTION,BODY,SCAN_ID,UNKNOWN1,GROUP_ID,EVENT_DATA,QUARANTINED_FILE_ID,VIRUS_ID,QUARANTINE_ATTEMPT_STATUS,OPERATION_FLAGS,UNKNOWN2,COMPRESSED_FILE,VIRUS_DEPTH_IN_COMPRESSED_FILE,AMOUNT_OF_REMAINING_INFECTED_FILES,VIRUS_DEFINITIONS_VERSION,VIRUS_DEFINITION_SEQUENCE_NUMBER,CLEANABLE,DELETABLE,BACKUP_ID,PARENT,GUID,CLIENT_GROUP,ADDRESS,SERVER_GROUP,DOMAIN_NAME,MAC_ADDRESS,VERSION DATE decode_date Category Catégorie EVENT_NUMBER event_translator CATEGORY category_translator EVENT_LOGGER_TYPE logger_translator COMPUTER Utilisateur Utilisateur USERNAME Virus name Nom du virus' VIRUS_NAME Virus location Emplacement du virus VIRUS_LOCATION PRIMARY_ACTION_CONFIGURATION action1_translator SECONDARY_ACTION_CONFIGURATION action2_translator Action taken Action effectuée ACTION_TAKEN action0_translator Virus type Type du virus VIRUS_TYPE virustype_translator EVENTBLOCK_ACTION flag_translator Message body describing the event Corps du message, décrivant l'événement BODY Scan identifier Identifiant du scan SCAN_ID UNKNOWN1 GROUP_ID EVENT_DATA QUARANTINED_FILE_ID Virus identifier Identifiant du virus VIRUS_ID Quarantine attempt status Statut de la tentative de mise en quarantaine QUARANTINE_ATTEMPT_STATUS quarantinest_translator OPERATION_FLAGS access_translator UNKNOWN2 COMPRESSED_FILE compressed_translator VIRUS_DEPTH_IN_COMPRESSED_FILE Amount of remaining infected files Nombre de fichiers encore infectés AMOUNT_OF_REMAINING_INFECTED_FILES Virus definition file version Version du fichier de définitions des virus VIRUS_DEFINITIONS_VERSION VIRUS_DEFINITION_SEQUENCE_NUMBER CLEANABLE clean_translator DELETABLE delete_translator BACKUP_ID PARENT GUID CLIENT_GROUP ADDRESS SERVER_GROUP DOMAIN_NAME MAC Address Adresse MAC MAC_ADDRESS Version Version VERSION symantec 200A13080122,23,2,8,TRAVEL00,SYSTEM,,,,,,,16777216,"Symantec AntiVirus Realtime Protection Loaded.",0,,0,,,,,0,,,,,,,,,,SAMPLE_COMPUTER,,,,Parent,GROUP,,8.0.93330 symantec 2002-11-19 08:01:34 Summary TRAVEL00 GROUP System GL_EVENT_RTS_LOAD EB_LOG 0 0 SAMPLE_COMPUTER 0 Parent SYSTEM 8.0.93330 antivirus Pattern definition for Symantec Antivirus version 9 Definition de pattern pour Symantec Antivirus version 9 
DATE,EVENT_NUMBER,CATEGORY,EVENT_LOGGER_TYPE,COMPUTER,USERNAME,VIRUS_NAME,VIRUS_LOCATION,PRIMARY_ACTION_CONFIGURATION,SECONDARY_ACTION_CONFIGURATION,ACTION_TAKEN,VIRUS_TYPE,EVENTBLOCK_ACTION,BODY,SCAN_ID,UNKNOWN1,GROUP_ID,EVENT_DATA,QUARANTINED_FILE_ID,VIRUS_ID,QUARANTINE_ATTEMPT_STATUS,OPERATION_FLAGS,UNKNOWN2,COMPRESSED_FILE,VIRUS_DEPTH_IN_COMPRESSED_FILE,AMOUNT_OF_REMAINING_INFECTED_FILES,VIRUS_DEFINITIONS_VERSION,VIRUS_DEFINITION_SEQUENCE_NUMBER,CLEANABLE,DELETABLE,BACKUP_ID,PARENT,GUID,CLIENT_GROUP,ADDRESS,SERVER_GROUP,DOMAIN_NAME,MAC_ADDRESS,VERSION,REMOTE_MACHINE,REMOTE_MACHINE_IP,ACTION1_STATUS,ACTION2_STATUS,LICENSE_FEATURE_NAME,LICENSE_FEATURE_VER,LICENSE_SERIAL_NUM,LICENSE_FULFILLMENT_ID,LICENSE_START_DT,LICENSE_EXPIRATION_DT,LICENSE_LIFECYCLE,LICENSE_SEATS_TOTAL,LICENSE_SEATS,ERR_CODE,LICENSE_SEATS_DELTA,STATUS,DOMAIN_GUID,LOG_SESSION_GUID,VBIN_SESSION_GUID,LOGIN_DOMAIN DATE decode_date Category Catégorie EVENT_NUMBER event_translator CATEGORY category_translator EVENT_LOGGER_TYPE logger_translator COMPUTER Utilisateur Utilisateur USERNAME Virus name Nom du virus' VIRUS_NAME Virus location Emplacement du virus VIRUS_LOCATION PRIMARY_ACTION_CONFIGURATION action1_translator SECONDARY_ACTION_CONFIGURATION action2_translator Action taken Action effectuée ACTION_TAKEN action0_translator Virus type Type du virus VIRUS_TYPE virustype_translator EVENTBLOCK_ACTION flag_translator Message body describing the event Corps du message, décrivant l'événement BODY Scan identifier Identifiant du scan SCAN_ID UNKNOWN1 GROUP_ID EVENT_DATA QUARANTINED_FILE_ID Virus identifier Identifiant du virus VIRUS_ID Quarantine attempt status Statut de la tentative de mise en quarantaine QUARANTINE_ATTEMPT_STATUS quarantinest_translator OPERATION_FLAGS access_translator UNKNOWN2 COMPRESSED_FILE compressed_translator VIRUS_DEPTH_IN_COMPRESSED_FILE Amount of remaining infected files Nombre de fichiers encore infectés AMOUNT_OF_REMAINING_INFECTED_FILES Virus definition file version Version du fichier de définitions des virus VIRUS_DEFINITIONS_VERSION VIRUS_DEFINITION_SEQUENCE_NUMBER CLEANABLE clean_translator DELETABLE delete_translator BACKUP_ID PARENT GUID CLIENT_GROUP ADDRESS SERVER_GROUP DOMAIN_NAME MAC Address Adresse MAC MAC_ADDRESS Version Version VERSION REMOTE_MACHINE REMOTE_MACHINE_IP ACTION1_STATUS action1s9_translator ACTION2_STATUS action2s9_translator LICENSE_FEATURE_NAME LICENSE_FEATURE_VER LICENSE_SERIAL_NUM LICENSE_FULFILLMENT_ID LICENSE_START_DT LICENSE_EXPIRATION_DT LICENSE_LIFECYCLE LICENSE_SEATS_TOTAL LICENSE_SEATS ERR_CODE LICENSE_SEATS_DELTA STATUS DOMAIN_GUID LOG_SESSION_GUID VBIN_SESSION_GUID LOGIN_DOMAIN symantec 200A13080122,23,2,8,TRAVEL00,SYSTEM,,,,,,,16777216,"Symantec AntiVirus Realtime Protection Loaded.",0,,0,,,,,0,,,,,,,,,,SAMPLE_COMPUTER,,,,Parent,GROUP,,9.0.93330,,,,,,,,,,,,,,,,,,,, symantec 2002-11-19 08:01:34 Summary TRAVEL00 GROUP System GL_EVENT_RTS_LOAD EB_LOG 0 0 SAMPLE_COMPUTER 0 Parent SYSTEM 9.0.93330 antivirus pylogsparser-0.4/normalizers/RefererParser.xml0000644000175000017500000000742611645625573020072 0ustar fbofbo This normalizer extracts additional info from URLs such as domain, protocol, etc. Ce normaliseur extrait des données supplémentaires des URLs telles que le domaine, le protocole, etc. 
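Before the callback definition, here is a minimal standalone sketch of the extraction this normalizer performs — Python 2, matching the library's own code; the decode_referer wrapper name and the sample invocation are ours, not part of the normalizer:

import urlparse  # renamed urllib.parse in Python 3

def decode_referer(value, log):
    # same naive approach as the decodeURL callback below: keep the hostname,
    # derive a "domain" by dropping the leftmost label, and keep the path
    parsed = urlparse.urlparse(value)
    if parsed.hostname:
        log['referer_hostname'] = parsed.hostname
        labels = parsed.hostname.split('.')
        domain = '.'.join(labels[1:]) if len(labels) >= 2 else None
        log['referer_domain'] = domain or parsed.hostname
    if parsed.path:
        log['referer_path'] = parsed.path

log = {}
decode_referer('http://www.wallix.org/2011/09/20/how-to-use-linux-containers-lxc-under-debian-squeeze/', log)
# log now holds referer_hostname 'www.wallix.org', referer_domain 'wallix.org'
# and the article path under referer_path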
mhu@wallix.com parsed = urlparse.urlparse(value) if parsed.hostname: log['referer_hostname'] = parsed.hostname # naive approach if len(parsed.hostname.split('.')) < 2: domain = None else: domain = '.'.join(parsed.hostname.split('.')[1:]) log['referer_domain'] = domain or parsed.hostname if parsed.path: log['referer_path'] = parsed.path URL URL decodeURL http://www.wallix.org/2011/09/20/how-to-use-linux-containers-lxc-under-debian-squeeze/ www.wallix.org /2011/09/20/how-to-use-linux-containers-lxc-under-debian-squeeze/ wallix.org pylogsparser-0.4/normalizers/syslog.xml0000644000175000017500000002062611673644166016640 0ustar fbofbo This normalizer is used to parse syslog lines, as defined in RFC3164. Priority, when present, is broken into the facility and severity codes. Ce normaliseur traite les événements au format syslog, tel qu'il est défini dans la RFC3164. Si le message contient une information de priorité, celle-ci est décomposée en deux valeurs : facilité et gravité. mhu@wallix.com Expression matching a syslog line priority, defined as 8*facility + severity. Expression correspondant à la priorité du message, suivant la formule 8 x facilité + gravité. \d{1,3} Expression matching the log's source. Expression correspondant à la source du message. [^: ]+ Expression matching the log's program. Expression correspondant au programme notifiant l'événement. [^: []* # define facilities FACILITIES = { 0: "kernel", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog", 6: "print", 7: "news", 8: "uucp", 9: "ntp", 10: "secure", 11: "ftp", 12: "ntp", 13: "audit", 14: "alert", 15: "ntp" } for i in range(0, 8): FACILITIES[i+16] = "local%d" % i # define severities SEVERITIES = { 0: "emerg", 1: "alert", 2: "crit", 3: "error", 4: "warn", 5: "notice", 6: "info", 7: "debug" } facility = int(value) / 8 severity = int(value) % 8 if facility not in FACILITIES or severity not in SEVERITIES: raise ValueError('facility or severity is out of range') log["facility"] = "%s" % FACILITIES[facility] log["severity"] = "%s" % SEVERITIES[severity] log["facility_code"] = "%d" % facility log["severity_code"] = "%d" % severity A syslog line with optional priority (sent through network), source, program and optional PID. Une ligne de log encapsulée par syslog comprenant une priorité (optionnelle), une source, un programme et un PID (optionnel). (?:<PRIORITY>)?DATE SOURCE PROGRAM(?:\[PID\])?: BODY the log's priority la priorité du log, égale à 8 x facilité + gravité PRIORITY decode_priority the log's date l'horodatage du log par le démon syslog DATE MMM dd hh:mm:ss the log's source l'équipement d'origine de l'événement SOURCE the log's program le programme à l'origine de l'événement PROGRAM the program's process ID le PID du programme PID the actual event message le message décrivant l'événement BODY <29>Jul 18 08:55:35 naruto dhclient[2218]: bound to 10.10.4.11 -- renewal in 2792 seconds. daemon notice naruto dhclient 2218 bound to 10.10.4.11 -- renewal in 2792 seconds. A syslog line with optional priority (sent through network), source, and no information about program and PID. Une ligne de log encapsulée par syslog comprenant une priorité (optionnelle), une source, et pas d'information sur le programme. 
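Both syslog patterns start with the same optional priority prefix; the arithmetic applied by decode_priority above boils down to two lines (a sketch only — the <29> sample value comes from the example log line already shown):

pri = 29                        # value captured from a leading "<29>"
facility, severity = pri // 8, pri % 8
# facility 3 maps to "daemon" and severity 5 to "notice" in the tables above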
(?:<PRIORITY>)?DATE SOURCE BODY the log's priority la priorité du log, égale à 8 x facilité + gravité PRIORITY decode_priority the log's date l'horodatage du log par le démon syslog DATE MMM dd hh:mm:ss the log's source l'équipement d'origine de l'événement SOURCE the actual event message le message décrivant l'événement BODY <29>Jul 18 08:55:35 naruto bound to 10.10.4.11 -- renewal in 2792 seconds. daemon notice naruto bound to 10.10.4.11 -- renewal in 2792 seconds. pylogsparser-0.4/normalizers/dhcpd.xml0000644000175000017500000002050711705765631016375 0ustar fbofbo This normalizer is used to parse DHCPd messages. Ce normaliseur analyse les messages émis par les serveurs DHCPd. mhu@wallix.com Expression matching a single word or lexeme. Expression correspondant à un mot sans espace intersticiel. [^ ]+ Expression matching the action notified by the DCHP daemon. Expression correspondant à l'action DHCP. DHCP[A-Z]+ log["action"] = value[4:] Generic DHCP discovery message. Structure générique d'un message de découverte DHCP. DHCPACTION from MACADDRESS via ADDRESS DHCPACTION decode_action MACADDRESS ADDRESS DHCPDISCOVER from 02:1c:25:a3:32:76 via 183.213.184.122 DISCOVER 02:1c:25:a3:32:76 183.213.184.122 address assignation Generic DHCP inform message. Message générique informatif. DHCPACTION from IP DHCPACTION decode_action IP DHCPINFORM from 183.231.184.122 INFORM 183.231.184.122 address assignation Other DHCP messages : offer, request, acknowledge, non-acknowledge, decline, release. Autres messages DHCP : offre de bail, requête, confirmation, réfutation, refus, libération de bail. DHCPACTION [a-z]+ IP [a-z]+ MACADDRESS via VIA DHCPACTION decode_action IP MACADDRESS VIA DHCPOFFER on 183.231.184.122 to 00:13:ec:1c:06:5b via 183.213.184.122 OFFER 183.231.184.122 00:13:ec:1c:06:5b 183.213.184.122 address assignation DHCPREQUEST for 183.231.184.122 from 00:13:ec:1c:06:5b via 183.213.184.122 REQUEST 183.231.184.122 00:13:ec:1c:06:5b 183.213.184.122 address assignation pylogsparser-0.4/normalizers/bitdefender.xml0000644000175000017500000005301611705765631017567 0ustar fbofbo This normalizer parses BitDefender (Mail servers UNIX) logs. Ce normaliseur analyse les logs de BitDefender (version Mail servers UNIX). fbo@wallix.com .* action, action_info = value.split() log['action'] = action log['action_info'] = action_info log['stamp'] = value r1 = re.compile('.*, hit signature: (?P<sign>.*), .*') m1 = r1.match(value) if m1: log['reason_detail'] = m1.groupdict()['sign'] log['reason'] = 'signature' return r2 = re.compile('.*, blacklisted, .*') m2 = r2.match(value) if m2: log['reason'] = 'blacklisted' return r3 = re.compile('.*, URI DNSBL: \[(?P<reporter>.*)\], .*') m3 = r3.match(value) if m3: log['reason_detail'] = m3.groupdict()['reporter'] log['reason'] = 'URI DNSBL' return r4 = re.compile('.*, spam url, .*') m4 = r4.match(value) if m4: log['reason'] = 'spam url' return r5 = re.compile('.*, SQMD Hits: (?P<hits>.*) , .*') m5 = r5.match(value) if m5: log['reason_detail'] = m5.groupdict()['hits'] log['reason'] = 'SQMD Hits' return log['body'] = log['raw'].split(': ', 1)[1] Logs contained in spam.log file. Logs contenus dans le fichier spam.log. DATE BDMAILD SPAM: sender: SENDER, recipients: RECIPIENTS, sender IP: SADDR, subject: "SUBJECT", score: SCORE, stamp: "STAMP", agent: AGENT, action: ACTION, header recipients: HRECIPS, headers: HEADERS, group: "GROUP" The time at which the spam was detected. La date à laquelle le spam a été détécté. DATE MM/dd/YYYY hh:mm:ss extract_body The mail sender. 
L'expéditeur de mail. SENDER The mail recipients list. La liste des mails destinataires. RECIPIENTS Client IP address. L'adresse IP du client. SADDR The mail subject. Le sujet du mail. SUBJECT SCORE Spam identification informations. Informations d'identifications du spam. STAMP extract_spam_reason AGENT Action taken by BitDefender. Action prise par BitDefender. ACTION decode_action HRECIPS HEADERS GROUP bitdefender 12/08/2010 11:18:42 BDMAILD SPAM: sender: bounces+333785.61449158.669496@icpbounce.com, recipients: jack@corp.com, sender IP: 127.0.0.1, subject: "=?iso-8859-1?Q?N=B07_sur_7_de_votre_s=E9rie_sur_le_management_du_changeme?= =?iso-8859-1?Q?nt?=", score: 1000, stamp: " v1, build 2.8.60.118893, rbl score: 0(0), hit signature: AUTO_B_IPX_20100613_110223_1_555, total: 1000(775)", agent: Smtp Proxy 3.1.3, action: drop (move-to-quarantine;drop), header recipients: ( "jack@corp.com" ), headers: ( "Received: from localhost [127.0.0.1] by BitDefender SMTP Proxy on localhost [127.0.0.1] for localhost [127.0.0.1]; Wed, 8 Dec 2010 11:18:42 +0100 (CET)" "Received: from paris.office.corp.com (unknown [10.10.1.254]) by as-bd-64.ifr.lan (Postfix) with ESMTP id 305B28A001 for <jack@corp.com>; Wed, 8 Dec 2010 11:18:42 +0100 (CET)" "Received: from smtp16.icpbounce.com (smtp16.icpbounce.com [216.27.93.110]) by paris.office.corp.com (Postfix) with ESMTP id 746D86A423B for <jack@corp.com>; Wed, 8 Dec 2010 11:17:48 +0100 (CET)" "Received: from drone21.rtp.icpbounce.com (agent004.colo.icontact.com [172.27.2.15]) by smtp16.icpbounce.com (Postfix) with ESMTP id 4C5653C7327 for <jack@corp.com>; Wed, 8 Dec 2010 05:15:46 -0500 (EST)" "Received: from localhost.localdomain (unknown [127.0.0.1]) by drone21.rtp.icpbounce.com (Postfix) with ESMTP id 8ED7022BD6 for <jack@corp.com>; Wed, 8 Dec 2010 05:10:39 -0500 (EST)" ), group: "Default" 2010-12-08 11:18:42 1000 Smtp Proxy 3.1.3 bounces+333785.61449158.669496@icpbounce.com jack@corp.com =?iso-8859-1?Q?N=B07_sur_7_de_votre_s=E9rie_sur_le_management_du_changeme?= =?iso-8859-1?Q?nt?= v1, build 2.8.60.118893, rbl score: 0(0), hit signature: AUTO_B_IPX_20100613_110223_1_555, total: 1000(775) drop (move-to-quarantine;drop) Default AUTO_B_IPX_20100613_110223_1_555 signature antivirus 10/20/2011 10:01:19 BDMAILD SPAM: sender: debimelva@albaad.com, recipients: djoume@corp.com;lchapuis@cpr.com;matallah@corp.com;mhoulbert@corp.com;rca@corp.com;sales@corp.com;sset@corp.com;steph@corp.com;vbe@corp.com, sender IP: 127.0.0.1, subject: "Replica watches - THE MOST POPULAR MODELS All our replica watches have the same look and feel of the original product", score: 1000, stamp: " v1, build 2.10.1.12405, rbl score: 0(0), hit signature: S_REPL_IPX_080830_02, total: 1000(750)", agent: Smtp Proxy 3.1.3, action: drop (move-to-quarantine;drop), header recipients: ( "<sset@corp.com>" ), headers: ( "Received: from localhost [127.0.0.1] by BitDefender SMTP Proxy on localhost [127.0.0.1] for localhost [127.0.0.1]; Thu, 20 Oct 2011 10:01:19 +0200 (CEST)" "Received: from paris.office.corp.com (go.corp.lan [10.10.1.254]) by as-bd-64.ifr.lan (Postfix) with ESMTP id 5AB6E1C7; Thu, 20 Oct 2011 10:01:19 +0200 (CEST)" "Received: from wfxamsklgv25z.py5nq1lz4i.com (unknown [190.234.5.86]) by paris.office.corp.com (Postfix) with SMTP id 006366A4895; Thu, 20 Oct 2011 09:54:40 +0200 (CEST)" ), group: "Default" 2011-10-20 10:01:19 1000 debimelva@albaad.com djoume@corp.com;lchapuis@cpr.com;matallah@corp.com;mhoulbert@corp.com;rca@corp.com;sales@corp.com;sset@corp.com;steph@corp.com;vbe@corp.com drop Default 
v1, build 2.10.1.12405, rbl score: 0(0), hit signature: S_REPL_IPX_080830_02, total: 1000(750) S_REPL_IPX_080830_02 signature antivirus 10/20/2011 16:07:40 BDMAILD SPAM: sender: 2363840z15263@bounce.crugeman.net, recipients: presse@corp.com, sender IP: 127.0.0.1, subject: "Conventions collectives nationales", score: 1000, stamp: " v1, build 2.10.1.12405, SQMD Hits: Spam FuzzyHit CRT_BGU , rbl score: 0(0), apm score: 500, SQMD: 6e74b86f401125abf381712e9dcc808e.fuzzy.fzrbl.org, total: 1000(750)", agent: Smtp Proxy 3.1.3, action: drop (move-to-quarantine;drop), header recipients: ( "<presse@corp.com>" ), headers: ( "Received: from localhost [127.0.0.1] by BitDefender SMTP Proxy on localhost [127.0.0.1] for localhost [127.0.0.1]; Thu, 20 Oct 2011 16:07:39 +0200 (CEST)" "Received: from paris.office.corp.com (go.corp.lan [10.10.1.254]) by as-bd-64.ifr.lan (Postfix) with ESMTP id BE4641C7 for <presse@corp.com>; Thu, 20 Oct 2011 16:07:39 +0200 (CEST)" "Received: from mx01.crugeman.net (mx01.crugeman.net [195.43.150.178]) by paris.office.corp.com (Postfix) with ESMTP id DF33E6A42A4 for <presse@corp.com>; Thu, 20 Oct 2011 16:01:10 +0200 (CEST)" "Received: by mx01.crugeman.net (Postfix, from userid 0) id C57BE89416; Thu, 20 Oct 2011 16:01:09 +0200 (CEST)" ), group: "Default" 2011-10-20 16:07:40 1000 presse@corp.com drop Default v1, build 2.10.1.12405, SQMD Hits: Spam FuzzyHit CRT_BGU , rbl score: 0(0), apm score: 500, SQMD: 6e74b86f401125abf381712e9dcc808e.fuzzy.fzrbl.org, total: 1000(750) Spam FuzzyHit CRT_BGU SQMD Hits bitdefender antivirus Logs contained in the update.log file. Logs contenus dans le fichier update.log. DATE BDLIVED INFO: .* The time at which the event was detected. La date à laquelle l'événement a été détecté. DATE MM/dd/YYYY hh:mm:ss extract_body bitdefender 10/24/2011 15:33:30 BDLIVED INFO: Downloading files for location 'antispam_sig_nx' from 'upgrade.bitdefender.com' 2011-10-24 15:33:30 bitdefender antivirus Logs contained in the mail.log file. Logs contenus dans le fichier mail.log. DATE BDMAILD INFO: .* The time at which the event was detected. La date à laquelle l'événement a été détecté. DATE MM/dd/YYYY hh:mm:ss extract_body bitdefender 10/24/2011 13:33:11 BDMAILD INFO: cannot use an empty footer 2011-10-24 13:33:11 bitdefender antivirus Logs contained in the error.log file. Logs contenus dans le fichier error.log. DATE BDSCAND ERROR: .* The time at which the event was detected. La date à laquelle l'événement a été détecté. DATE MM/dd/YYYY hh:mm:ss extract_body bitdefender 10/24/2011 04:31:39 BDSCAND ERROR: failed to initialize the AV core 2011-10-24 04:31:39 bitdefender antivirus pylogsparser-0.4/normalizers/snare.xml0000644000175000017500000004305611673644166016426 0ustar fbofbo This normalizer handles event logs sent by the Snare agent for Windows. Ce normaliseur analyse les logs envoyés par Snare agent for Windows clo@wallix.com String containing Windows' authorized characters for computers, users, etc. Chaîne contenant les caractères autorisés de Windows pour les noms d'ordinateur, utilisateurs etc. [^\t]+|(?:N/A) 'MSWinEventLog' for Snare for Windows. 'MSWinEventLog' pour Snare for Windows. MSWinEventLog The Criticality tag is a number between 0 and 4. Le tag criticité est un nombre entre 0 et 4. [0-4] Based on testing, event_log_source is a string matching [a-zA-Z-_]+, but it may contain other characters as well; adjust the regexp in that case.
Après plusieurs tests, event_log_source est une chaîne contenant [a-zA-Z-_]+, mais elle pourrait contenir d'autres caractères ; il faudra changer la regexp dans ce cas. [a-zA-Z]+(?:[ -][a-zA-Z]+)* This is the type of SID used. Type de SID utilisé. (?:[A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*)|(?:N/A) String that can match any one of the following alternatives. Chaîne pouvant correspondre à une des alternatives suivantes. Success Audit|Failure Audit|Error|Warning|Information String matching one of the Windows audit event categories. Chaîne d'une des catégories d'audit event de Windows. [^\t]+|(?:N/A) Hexadecimal number. Nombre hexadécimal. [0-9a-fA-F]{32} url = "http://www.microsoft.com/technet/support/ee/SearchResults.aspx?Type=0&Message=" log['technet_link'] = url + str(value) This is the Snare log format. Description du format des logs Snare. SNARE_EVENT_LOG_TYPE\s+CRITICALITY\s+SOURCE_NAME\s+SNARE_EVENT_COUNTER\s+[a-zA-Z]{3}. \w+ [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{3}\s+EVENT_ID\s+EXPANDED_SOURCENAME\sUSER_NAME\s+SID_TYPE\s+EVENT_LOGTYPE\s+COMPUTER_NAME\s+CATEGORY_STRING\s+DATA_STRING(?:\s+MD5_CHECKSUM)? 'MSWinEventLog' for Snare for Windows. 'MSWinEventLog' pour Snare for Windows. SNARE_EVENT_LOG_TYPE This is determined by the Alert level the user gave to the objective (in the Snare Agent). Niveau d'alerte configuré depuis les objectifs de l'agent Snare. CRITICALITY This is the Windows Event Log from which the event record was derived. Nom du journal Windows d'où vient l'évènement. SOURCE_NAME Based on the internal Snare event counter. Basé sur le compteur interne des évènements de Snare. SNARE_EVENT_COUNTER This is the Windows event ID. ID de l'évènement Windows. EVENT_ID add_technet_link This is the Windows Event Log from which the event record was derived. Nom du journal Windows d'où vient l'évènement. EXPANDED_SOURCENAME This is the Windows' user name. Nom d'utilisateur Windows. USER_NAME This is the type of SID used. Type de SID utilisé. SID_TYPE This can be any one of 'Success Audit', 'Failure Audit', 'Error', 'Information', 'Warning'. Peut être un des suivants: 'Success Audit', 'Failure Audit', 'Error', 'Information', 'Warning'. EVENT_LOGTYPE This is the Windows computer name. Nom de l'ordinateur. COMPUTER_NAME This is the Windows event category string. La catégorie de l'évènement Windows. CATEGORY_STRING This contains the data string. Contient la chaîne de données. DATA_STRING This is an MD5 checksum. Empreinte MD5. MD5_CHECKSUM clo-PC MSWinEventLog 0 Security 191 mer. août 24 14:20:19 201 4688 Microsoft-Windows-Security-Auditing WORKGROUP\CLO-PC$ N/A Success Audit clo-PC Création du processus Un nouveau processus a été créé. Sujet : ID de sécurité : S-1-5-18 Nom du compte : CLO-PC$ Domaine du compte : WORKGROUP ID d’ouverture de session : 0x3e7 Informations sur le processus : ID du nouveau processus : 0x654 Nom du nouveau processus : C:\Windows\servicing\TrustedInstaller.exe Type d’élévation du jeton : Type d’élévation de jeton par défaut (1) ID du processus créateur : 0x1c8 Le type d’élévation du jeton indique le type de jeton qui a été attribué au nouveau processus conformément à la stratégie de contrôle du compte d’utilisateur. Le type 1 est un jeton complet sans aucun privilège supprimé ni aucun groupe désactivé. Un jeton complet est uniquement utilisé si le contrôle du compte d’utilisateur est désactivé, ou si l’utilisateur est le compte d’administrateur intégré ou un compte de service. Le type 2 est un jeton aux droits élevés sans aucun privilège supprimé ni aucun groupe désactivé.
Un jeton aux droits élevés est utilisé lorsque le contrôle de compte d’utilisateur est activé et que l’utilisateur choisit de démarrer le programme en tant qu’administrateur. Un jeton aux droits élevés est également utilisé lorsqu’une application est configurée pour toujours exiger un privilège administratif ou pour toujours exiger les privilèges maximum, et que l’utilisateur est membre du groupe Administrateurs. Le type 3 est un jeton limité dont les privilèges administratifs sont supprimés et les groupes administratifs désactivés. Le jeton limité est utilisé lorsque le contrôle de compte d’ utilisateur est activé, que l’application n’exige pas le privilège administratif et que l’utilisateur ne choisit pas de démarrer le programme en tant qu’administrateur. 133 MSWinEventLog 0 Security 191 4688 Microsoft-Windows-Security-Auditing WORKGROUP\CLO-PC$ N/A Success Audit clo-PC Création du processus Un nouveau processus a été créé. Sujet : ID de sécurité : S-1-5-18 Nom du compte : CLO-PC$ Domaine du compte : WORKGROUP ID d’ouverture de session : 0x3e7 Informations sur le processus : ID du nouveau processus : 0x654 Nom du nouveau processus : C:\Windows\servicing\TrustedInstaller.exe Type d’élévation du jeton : Type d’élévation de jeton par défaut (1) ID du processus créateur : 0x1c8 Le type d’élévation du jeton indique le type de jeton qui a été attribué au nouveau processus conformément à la stratégie de contrôle du compte d’utilisateur. Le type 1 est un jeton complet sans aucun privilège supprimé ni aucun groupe désactivé. Un jeton complet est uniquement utilisé si le contrôle du compte d’utilisateur est désactivé, ou si l’utilisateur est le compte d’administrateur intégré ou un compte de service. Le type 2 est un jeton aux droits élevés sans aucun privilège supprimé ni aucun groupe désactivé. Un jeton aux droits élevés est utilisé lorsque le contrôle de compte d’utilisateur est activé et que l’utilisateur choisit de démarrer le programme en tant qu’administrateur. Un jeton aux droits élevés est également utilisé lorsqu’une application est configurée pour toujours exiger un privilège administratif ou pour toujours exiger les privilèges maximum, et que l’utilisateur est membre du groupe Administrateurs. Le type 3 est un jeton limité dont les privilèges administratifs sont supprimés et les groupes administratifs désactivés. Le jeton limité est utilisé lorsque le contrôle de compte d’ utilisateur est activé, que l’application n’exige pas le privilège administratif et que l’utilisateur ne choisit pas de démarrer le programme en tant qu’administrateur. 133 MSWinEventLog 0 Security 313 ven. août 26 15:42:40 201 4689 Microsoft-Windows-Security-Auditing AUTORITE NT\SERVICE LOCAL N/A Success Audit a-zA-Z0-9_ Fin du processus Un processus est terminé. Sujet : ID de sécurité : S-1-5-19 Nom du compte : SERVICE LOCAL Domaine du compte : AUTORITE NT ID d’ouverture de session : 0x3e5 Informations sur le processus : ID du processus : 0xdf4 Nom du processus : C:\Windows\System32\taskhost.exe État de fin : 0x0 189 MSWinEventLog 0 Security 313 4689 Microsoft-Windows-Security-Auditing AUTORITE NT\SERVICE LOCAL N/A Success Audit a-zA-Z0-9_ Fin du processus Un processus est terminé. Sujet : ID de sécurité : S-1-5-19 Nom du compte : SERVICE LOCAL Domaine du compte : AUTORITE NT ID d’ouverture de session : 0x3e5 Informations sur le processus : ID du processus : 0xdf4 Nom du processus : C:\Windows\System32\taskhost.exe État de fin : 0x0 189 >13<Aug 31 15:42:55 clo-vbox-win-7 MSWinEventLog 1 Security 103 mer. 
août 31 15:42:54 201 4776 Microsoft-Windows-Security-Auditing clo N/A Failure Audit clo-vbox-win-7 Validation des informations d’identification L’ordinateur a tenté de valider les informations d’identification d’un compte. Package d’authentification : MICROSOFT_AUTHENTICATION_PACKAGE_V1_0 Compte d’ouverture de session : clo Station de travail source : CLO-VBOX-WIN-7 Code d’erreur : 0xc000006e 77 MSWinEventLog 1 Security 103 4776 Microsoft-Windows-Security-Auditing clo N/A Failure Audit clo-vbox-win-7 Validation des informations d’identification L’ordinateur a tenté de valider les informations d’identification d’un compte. Package d’authentification : MICROSOFT_AUTHENTICATION_PACKAGE_V1_0 Compte d’ouverture de session : clo Station de travail source : CLO-VBOX-WIN-7 Code d’erreur : 0xc000006e 77 EventLog pylogsparser-0.4/normalizers/dansguardian.xml0000644000175000017500000010112611705765631017750 0ustar fbofbo This normalizer parses DansGuardian's access.log file. This file logs every request made, whether allowed or denied, and gives the reason why a specific action was taken. Ce normaliseur traite le contenu du fichier access.log utilisé par DansGuardian pour consigner les requêtes d'accès et le résultat associé. mhu@wallix.com the standard value format le format utilisé pour les différents champs du log [^\t]+ the date as it is logged by DansGuardian la date telle qu'elle est consignée par DansGuardian \d{4}\.\d{1,2}\.\d{1,2} \d{1,2}:\d{1,2}:\d{1,2} one of the HTTP methods defined by W3C une méthode HTTP parmi celles définies par le W3C GET|HEAD|CHECKOUT|SHOWMETHOD|PUT|DELETE|POST|LINK|UNLINK|CHECKIN|TEXTSEARCH|SPACEJUMP|SEARCH the possible actions taken by dansguardian les actions que peut appliquer dansguardian (?:[*][A-Z]+[*] ?)* r = re.compile("(?P<year>\d{4})\.(?P<month>\d{1,2})\.(?P<day>\d{1,2}) (?P<hour>\d{1,2}):(?P<minute>\d{1,2}):(?P<second>\d{1,2})") m = r.match(value).groupdict() m = dict( [(u, int(m[u])) for u in m.keys() ] ) log['date'] = datetime(**m) if "DENIED" in value: log['action'] = "DENIED" elif "EXCEPTION" in value: log['action'] = "EXCEPTION" if "INFECTED" in value: log['scan_result'] = "infected" # the following tags are not the "official" designation. There is no official designation actually. if "SCANNED" in value: log['scan_result'] = "clean" if "CONTENTMOD" in value: log['content_modified'] = "true" else: log['content_modified'] = "false" if "URLMOD" in value: log['url_modified'] = "true" else: log['url_modified'] = "false" if "HEADERMOD" in value: log['header_modified'] = "true" else: log['header_modified'] = "false" The standard access.log line pattern. Le format usuel d'une ligne de log issue du fichier access.log. WHEN WHO FROM WHERE WHAT WHY HOW SIZE NAUGHTINESS (?:CATEGORY)? FILTERGROUPNUMBER RETURNCODE (?:MIMETYPE)? (?:CLIENTNAME)? (?:FILTERGROUPNAME)? (?:USERAGENT)? the log's date la date du log WHEN decode_DGDate a user or a computer, if an "authplugin" has identified it un nom d'utilisateur ou d'équipement, s'il a été identifié par un "authplugin" WHO the IP address of the requestor l'adresse IP d'origine de la requête FROM the complete requested URL l'URL de la requête WHERE the list of actions taken by dansguardian, as it appears in the log file. This list is refined in relevant tags such as "action", "scan_result", "url_modified", "content_modified" and "header_modified" when applicable. la liste des actions prises par dansguardian, telle qu'elle apparaît dans le fichier de log. 
Cette liste sert à définir d'autres tags : "action", "scan_result", "url_modified", "content_modified" et "header_modified" quand cela est pertinent. WHAT decode_DGactions why the actions were taken la raison pour laquelle les actions ont été appliquées WHY the HTTP request verb la méthode HTTP HOW the size in bytes of the document, if fetched la taille du document en bytes, si il a été récupéré SIZE the sum of all the weighted phrase scores le score total d'inadéquation NAUGHTINESS the contents of the #listcategory tag, if any, in the list that is most relevant to the action le contenu éventuel de la métadonnée #listcategory la plus pertinente par rapport à l'action CATEGORY the filter group the request was assigned to le groupe de filtrage auquel la requête a été assignée FILTERGROUPNUMBER the HTTP return code le code HTTP de la réponse RETURNCODE the MIME type, if relevant, of the returned document le type MIME, si applicable, de la réponse MIMETYPE if configured, the result of a reverse DNS IP lookup on the requestor's IP address si activée, la résolution DNS inversée de l'IP d'origine de la requête CLIENTNAME the name of the filter group le nom du groupe de filtrage FILTERGROUPNAME the browser's user agent string la valeur du champ "user agent" exposée par le navigateur USERAGENT 2011.12.13 7:38:50 10.10.42.23 10.10.42.23 http://backports.debian.org/debian-backports/dists/squeeze-backports/main/binary-i386/Packages.diff/2011-12-02-1137.04.gz *DENIED* Type de fichier interdit: .gz GET 0 0 Banned extension 2 403 application/x-gzip limited_access - 10.10.42.23 10.10.42.23 http://backports.debian.org/debian-backports/dists/squeeze-backports/main/binary-i386/Packages.diff/2011-12-02-1137.04.gz *DENIED* DENIED Type de fichier interdit: .gz GET 0 0 Banned extension 2 403 application/x-gzip limited_access - web proxy 2011.12.13 12:10:48 10.10.42.23 10.10.42.23 http://safebrowsing-cache.google.com/safebrowsing/rd/ChNnb29nLW1hbHdhcmUtc2hhdmFyEAEY9p8EIPafBDIF9g8BAAE *EXCEPTION* Site interdit trouvé. GET 326 0 2 200 - limited_access - 10.10.42.23 10.10.42.23 http://safebrowsing-cache.google.com/safebrowsing/rd/ChNnb29nLW1hbHdhcmUtc2hhdmFyEAEY9p8EIPafBDIF9g8BAAE *EXCEPTION* EXCEPTION Site interdit trouvé. GET 326 0 2 200 - limited_access - web proxy A variation on the access.log line pattern, as it appears in dansguardian's sourcecode. Une variation sur le format usuel, telle qu'elle apparaît dans le code source de dansguardian. WHEN\tWHO\tFROM\tWHERE\tWHAT\tWHY\tHOW\tSIZE\tNAUGHTINESS\t(?:CATEGORY)?\tFILTERGROUPNUMBER\tRETURNCODE\t(?:MIMETYPE)?\t(?:CLIENTNAME)?\t(?:FILTERGROUPNAME)?\t(?:USERAGENT)? the log's date la date du log WHEN decode_DGDate a user or a computer, if an "authplugin" has identified it un nom d'utilisateur ou d'équipement, s'il a été identifié par un "authplugin" WHO the IP address of the requestor l'adresse IP d'origine de la requête FROM the complete requested URL l'URL de la requête WHERE the list of actions taken by dansguardian, as it appears in the log file. This list is refined in relevant tags such as "action", "scan_result", "url_modified", "content_modified" and "header_modified" when applicable. la liste des actions prises par dansguardian, telle qu'elle apparaît dans le fichier de log. Cette liste sert à définir d'autres tags : "action", "scan_result", "url_modified", "content_modified" et "header_modified" quand cela est pertinent. 
WHAT decode_DGactions why the actions were taken la raison pour laquelle les actions ont été appliquées WHY the HTTP request verb la méthode HTTP HOW the size in bytes of the document, if fetched la taille du document en bytes, si il a été récupéré SIZE the sum of all the weighted phrase scores le score total d'inadéquation NAUGHTINESS the contents of the #listcategory tag, if any, in the list that is most relevant to the action le contenu éventuel de la métadonnée #listcategory la plus pertinente par rapport à l'action CATEGORY the filter group the request was assigned to le groupe de filtrage auquel la requête a été assignée FILTERGROUPNUMBER the HTTP return code le code HTTP de la réponse RETURNCODE the MIME type, if relevant, of the returned document le type MIME, si applicable, de la réponse MIMETYPE if configured, the result of a reverse DNS IP lookup on the requestor's IP address si activée, la résolution DNS inversée de l'IP d'origine de la requête CLIENTNAME the name of the filter group le nom du groupe de filtrage FILTERGROUPNAME the browser's user agent string la valeur du champ "user agent" exposée par le navigateur USERAGENT A CSV version on the access.log line pattern, as it appears in dansguardian's sourcecode. Une version CSV de la ligne de log, telle qu'elle apparaît dans le code source de dansguardian. "WHEN","WHO","FROM","WHERE","WHAT","WHY","HOW","SIZE","NAUGHTINESS","(?:CATEGORY)?","FILTERGROUPNUMBER","RETURNCODE","(?:MIMETYPE)?","(?:CLIENTNAME)?","(?:FILTERGROUPNAME)?","(?:USERAGENT)?" the log's date la date du log WHEN decode_DGDate a user or a computer, if an "authplugin" has identified it un nom d'utilisateur ou d'équipement, s'il a été identifié par un "authplugin" WHO the IP address of the requestor l'adresse IP d'origine de la requête FROM the complete requested URL l'URL de la requête WHERE the list of actions taken by dansguardian, as it appears in the log file. This list is refined in relevant tags such as "action", "scan_result", "url_modified", "content_modified" and "header_modified" when applicable. la liste des actions prises par dansguardian, telle qu'elle apparaît dans le fichier de log. Cette liste sert à définir d'autres tags : "action", "scan_result", "url_modified", "content_modified" et "header_modified" quand cela est pertinent. 
WHAT decode_DGactions why the actions were taken la raison pour laquelle les actions ont été appliquées WHY the HTTP request verb la méthode HTTP HOW the size in bytes of the document, if fetched la taille du document en bytes, si il a été récupéré SIZE the sum of all the weighted phrase scores le score total d'inadéquation NAUGHTINESS the contents of the #listcategory tag, if any, in the list that is most relevant to the action le contenu éventuel de la métadonnée #listcategory la plus pertinente par rapport à l'action CATEGORY the filter group the request was assigned to le groupe de filtrage auquel la requête a été assignée FILTERGROUPNUMBER the HTTP return code le code HTTP de la réponse RETURNCODE the MIME type, if relevant, of the returned document le type MIME, si applicable, de la réponse MIMETYPE if configured, the result of a reverse DNS IP lookup on the requestor's IP address si activée, la résolution DNS inversée de l'IP d'origine de la requête CLIENTNAME the name of the filter group le nom du groupe de filtrage FILTERGROUPNAME the browser's user agent string la valeur du champ "user agent" exposée par le navigateur USERAGENT dansguardian pylogsparser-0.4/normalizers/named-2.xml0000644000175000017500000005605411705765631016544 0ustar fbofbo fbo@wallix.com \S+ \S+ \S+ \d+-\w+-\d{4} \d+:\d+:\d+\.\d+ default:|general:|database:|security:|config:|resolver:|xfer-in:|xfer-out:|notify:|client:|unmatched:|network:|update:|update-security:|queries:|dispatch:|dnssec:|lame-servers:|edns-disabled: emerg:|alert:|crit:|error:|warn:|notice:|info:|debug: log['category'] = value.rstrip(':') # define severities SEVERITIES = [ "emerg", "alert", "crit", "error", "warn", "notice", "info", "debug" ] severity = value.rstrip(':') log["severity"] = severity log["severity_code"] = SEVERITIES.index(severity) (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: transfer of 'ZONE/CLASS': TYPE ACTION$ Client IP address related to this request Adresse IP du client ayant généré la requête IP UDP client port Port UDP du client PORT DNS Zone related to this request Zone DNS concernée par la requête ZONE Action prise par le serveur Action taken by server ACTION Requested DNS Class (CLASS) Classe DNS de la requête CLASS Requested DNS recording Type (TYPE) Type (TYPE) d'enregistrement DNS demandé TYPE DATE dd-MMM-YYYY hh:mm:ss Subsystem category Catégorie de sous-système CATEGORY decode_named_category Message severity Sévérité du message SEVERITY decode_named_severity zone_transfer named client 10.10.4.4#35129: transfer of 'qa.ifr.lan/IN': AXFR started 10.10.4.4 zone_transfer qa.ifr.lan IN started AXFR named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?transfer of 'ZONE/CLASS' from IP#PORT: ACTION of transfer IP PORT ZONE ACTION CLASS DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_transfer named transfer of 'localdomain/IN' from 192.168.1.3#53: end of transfer 192.168.1.3 zone_transfer localdomain IN end named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?transfer of 'ZONE/CLASS' from IP#PORT: failed while receiving responses: REFUSED IP PORT ZONE CLASS DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_transfer_failure named failed transfer of 'ns2.domain.de/IN' from 192.168.0.5#53: failed while receiving responses: REFUSED 192.168.0.5 zone_transfer_failure ns2.domain.de IN failed named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?lame server resolving 'DOMAIN' \(in 'ZONE'\?\): IP#PORT IP PORT ZONE 
DOMAIN DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity lame_server_report named lame server resolving 'www.cadenhead.org' (in 'cadenhead.org'?): 67.19.3.218#53 67.19.3.218 lame_server_report cadenhead.org www.cadenhead.org named name resolution pylogsparser-0.4/normalizers/s3.xml0000644000175000017500000003044611715703401015627 0ustar fbofbo S3 log normalization. S3 logs consist of a list of values. Normalized keys are "bucket_owner", "bucket", "date", "ip", "requestor", "requestid", "operation", "key", "http_method", "http_target", "http_proto", "http_status", "s3err", "sent", "object_size", "total_request_time", "turn_around_time", "referer", "user_agent" and "version_id". Ce normaliseur analyse les logs émis par S3. Les messages S3 consistent en une liste de valeurs. Les clés extraites par ce normaliseur sont "bucket_owner", "bucket", "date", "ip", "requestor", "requestid", "operation", "key", "http_method", "http_target", "http_proto", "http_status", "s3err", "sent", "object_size", "total_request_time", "turn_around_time", "referer", "user_agent" et "version_id". olivier.hervieu@tinyclues.com Matches S3 common time format. Une expression correspondant au format d'horodatage des logs S3. \[\d{1,2}/.{3}/\d{4}:\d{1,2}:\d{1,2}:\d{1,2}(?: [+-]\d{4})?\] Matches S3 quoted strings. Permet de parser les chaînes de caractères S3. \".*\" \S+ value = value[1:-1].split(' ') log['http_method'] = value[0] log['http_target'] = value[1] log['protocol'] = value[2] value = value[1:-1] log['referer'] = value value = value[1:-1] log['user_agent'] = value Generic s3 log pattern. Parseur générique des logs S3. OWNER NAME DATE IP REQUESTOR REQUESTID OP KEY HTTP_METHOD HTTP_STATUS S3ERR SENT SIZE TOTAL TAT REF AGENT VID The canonical user id of the owner of the source bucket Identifiant canonique du propriétaire du bucket OWNER the bucket name le nom du bucket NAME the time at which the request was issued. Please note that the timezone information is not carried over la date à laquelle la requête a été émise. L'information de fuseau horaire n'est pas prise en compte DATE dd/MMM/YYYY:hh:mm:ss The apparent Internet address of the requester. Adresse IP apparente de la requête. IP The canonical user id of the requester. Identifiant canonique du requêteur. REQUESTOR request id id de la requête REQUESTID operation type type de l'opération OP The "key" part of the request, URL encoded, or "-" if the operation does not take a key parameter. KEY The Request-URI part of the HTTP request message. HTTP_METHOD split_s3_info The numeric HTTP status code of the response. Code numérique de retour de la requête HTTP. HTTP_STATUS The Amazon S3 Error Code, or "-" if no error occurred. Code d'erreur S3 ou "-". S3ERR The number of response bytes sent, excluding HTTP protocol overhead, or "-" if zero. SENT The total size of the object in question. SIZE The number of milliseconds the request was in flight from the server's perspective. TOTAL The number of milliseconds that Amazon S3 spent processing your request. TAT The value of the HTTP Referer header, if present. REF refer_unquote The value of the HTTP User-Agent header. AGENT agent_unquote The version ID in the request, or "-" if the operation does not take a versionId parameter.
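To make the quoted Request-URI handling concrete, here is a small sketch of what the split_s3_info callback above does (the sample value is lifted from the example log line that follows; the method/target/proto names are ours):

log = {}
value = '"GET /?acl HTTP/1.1"'
method, target, proto = value[1:-1].split(' ')
log['http_method'], log['http_target'], log['protocol'] = method, target, proto
# -> 'GET', '/?acl', 'HTTP/1.1', matching the expected tags of the example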
VID s3 DEADBEEF testbucket [19/Jul/2011:13:17:11 +0000] 10.194.22.16 FACEDEAD CAFEDECA REST.GET.ACL - "GET /?acl HTTP/1.1" 200 - 951 - 397 - "-" "Jakarta Commons-HttpClient/3.0" - DEADBEEF testbucket 10.194.22.16 FACEDEAD CAFEDECA REST.GET.ACL - GET /?acl HTTP/1.1 200 - 951 - 397 - - Jakarta Commons-HttpClient/3.0 - pylogsparser-0.4/normalizers/normalizer.template0000644000175000017500000000714711705765631020515 0ustar fbofbo pylogsparser-0.4/normalizers/named.xml0000644000175000017500000021766611705765631016415 0ustar fbofbo fbo@wallix.com \S+ \S+ \S+ \d+-\w+-\d{4} \d+:\d+:\d+\.\d+ default:|general:|database:|security:|config:|resolver:|xfer-in:|xfer-out:|notify:|client:|unmatched:|network:|update:|update-security:|queries:|dispatch:|dnssec:|lame-servers:|edns-disabled: emerg:|alert:|crit:|error:|warn:|notice:|info:|debug: view \S+: log['category'] = value.rstrip(':') # define severities SEVERITIES = [ "emerg", "alert", "crit", "error", "warn", "notice", "info", "debug" ] severity = value.rstrip(':') log["severity"] = severity log["severity_code"] = SEVERITIES.index(severity) view = value.rstrip(':').split()[-1] log["view"] = view (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: (?:VIEW )?query: DOMAIN CLASS TYPE \S+$ Client IP address related to this request Adresse IP du client ayant généré la requête IP UDP client port Port UDP du client PORT Domain requested by client Domaine concerné par la requête du client DOMAIN Requested DNS Class (CLASS) Classe DNS de la requête CLASS Requested DNS recording Type (TYPE) Type (TYPE) d'enregistrement DNS demandé TYPE DATE dd-MMM-YYYY hh:mm:ss Subsystem category Catégorie de sous-système CATEGORY decode_named_category Message severity Sévérité du message SEVERITY decode_named_severity DNS view related to this request Vue DNS associée à cette requête VIEW decode_named_view client_query named client 10.10.4.4#39583: query: tpf.qa.ifr.lan IN SOA + client_query tpf.qa.ifr.lan 10.10.4.4 39583 SOA IN named name resolution client 10.10.4.4#39583: view external: query: tpf.qa.ifr.lan IN SOA + client_query tpf.qa.ifr.lan 10.10.4.4 39583 SOA IN named external name resolution 28-Feb-2000 15:05:32.863 client 10.10.4.4#39583: query: tpf.qa.ifr.lan IN SOA + client_query tpf.qa.ifr.lan 10.10.4.4 39583 SOA IN named 2000-02-28 15:05:32.863000 name resolution 28-Feb-2000 15:05:32.863 general: client 10.10.4.4#39583: query: tpf.qa.ifr.lan IN SOA + client_query tpf.qa.ifr.lan 10.10.4.4 39583 SOA IN named 2000-02-28 15:05:32.863000 general name resolution 28-Feb-2000 15:05:32.863 general: client 10.10.4.4#39583: view external: query: tpf.qa.ifr.lan IN SOA + client_query tpf.qa.ifr.lan 10.10.4.4 39583 SOA IN named 2000-02-28 15:05:32.863000 general external name resolution queries: client 10.10.4.4#39583: query: tpf.qa.ifr.lan IN SOA + client_query tpf.qa.ifr.lan 10.10.4.4 39583 SOA IN named queries name resolution 28-Feb-2000 15:05:32.863 general: crit: client 10.10.4.4#39583: query: tpf.qa.ifr.lan IN SOA + client_query tpf.qa.ifr.lan 10.10.4.4 39583 SOA IN named 2000-02-28 15:05:32.863000 general crit 2 name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: query 'DOMAIN/TYPE/CLASS' denied$ IP PORT DOMAIN CLASS TYPE DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity client_query_denied denied named client 127.0.0.1#44063: query 'www.example.com/A/IN' denied client_query_denied www.example.com 127.0.0.1 44063 A IN named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: query denied$ IP PORT 
DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity client_query_denied denied named client 127.0.0.1#1126: query denied client_query_denied 127.0.0.1 1126 denied named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: query \(cache\) denied$ IP PORT DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity client_query_denied denied named client 127.0.0.1#1126: query (cache) denied client_query_denied 127.0.0.1 1126 denied named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: query \(cache\) 'DOMAIN/TYPE/CLASS' denied$ IP PORT DOMAIN CLASS TYPE DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity client_query_denied denied named client 219.135.228.103#17635: query (cache) 'mycompany.com.cn/MX/IN' denied client_query_denied mycompany.com.cn 219.135.228.103 17635 MX IN named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?createfetch: DOMAIN TYPE$ DOMAIN TYPE DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity fetch_request named createfetch: 126.92.194.77.zen.spamhaus.org A fetch_request 126.92.194.77.zen.spamhaus.org A named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?zone ZONE/CLASS: transferred serial SERIAL$ DNS Zone related to this request Zone DNS concernée par la requête ZONE CLASS Transaction serial number Numéro de série de la transaction SERIAL DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_transfer named zone localdomain/IN: transferred serial 2006070304 zone_transfer localdomain IN 2006070304 named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: zone transfer 'ZONE/CLASS' denied$ IP PORT ZONE CLASS DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_transfer_denied denied named client 219.135.228.103#17635: zone transfer 'somedomain.com/IN' denied 219.135.228.103 17635 zone_transfer_denied somedomain.com IN named denied name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: bad zone transfer request: 'ZONE/CLASS': .* IP PORT ZONE CLASS DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_transfer_bad named client 192.168.198.130#4532: bad zone transfer request: 'www.abc.com/IN': non-authoritative zone (NOTAUTH) 192.168.198.130 4532 zone_transfer_bad www.abc.com IN named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?zone ZONE/CLASS: refresh: failure trying master IP#PORT: timed out$ DNS master IP address Adresse IP du master DNS IP DNS master PORT Port du master DNS PORT ZONE CLASS DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_transfer_timeout named failure zone example.com/IN: refresh: failure trying master 1.2.3.4#53: timed out 1.2.3.4 53 zone_transfer_timeout example.com IN named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?zone ZONE/CLASS: refresh: retry limit for master IP#PORT exceeded$ IP PORT ZONE CLASS DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_refresh_limit named retry zone somedomain.com.au/IN: refresh: retry limit for master 1.2.3.4#53 exceeded 1.2.3.4 53 zone_refresh_limit somedomain.com.au IN named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: updating zone 'ZONE/CLASS': ACTION an \S+ at 'DOMAIN' TYPE$ IP PORT ZONE CLASS TYPE Action prise par le serveur Action taken by server 
ACTION DOMAIN DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_update named client 127.0.0.1#32839: updating zone 'home.whootis.com/IN': adding an RR at 'pianogirl.home.whootis.com' TXT 127.0.0.1 zone_update home.whootis.com IN TXT adding pianogirl.home.whootis.com named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: updating zone 'ZONE/CLASS': update failed: .* IP PORT ZONE CLASS DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_update_failure named failed client 10.10.1.8#53147: updating zone 'clima-tech.com/IN': update failed: rejected by secure update (REFUSED) 10.10.1.8 zone_update_failure clima-tech.com IN failed named name resolution (?:DATE )?(?:CATEGORY )?(?:SEVERITY )?client IP#PORT: update 'ZONE/CLASS' denied IP PORT ZONE CLASS DATE dd-MMM-YYYY hh:mm:ss CATEGORY decode_named_category SEVERITY decode_named_severity zone_update_failure named denied client 10.10.1.8#53147: update 'clima-tech.com/IN' denied 10.10.1.8 zone_update_failure clima-tech.com IN denied named name resolution pylogsparser-0.4/normalizers/common_callBacks.xml0000644000175000017500000002501211710220376020522 0ustar fbofbo ]> dd matches the number of the day (1, 2, 3, etc...) MM matches the name of the month (Jan, Feb, Mar, etc...) YYYY matches the year (2012) hh:mm:ss matches the time (23:54:42) r = re.compile('(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4}) (?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})') date = r.search(value).groupdict() date = dict([(u, int(date[u])) for u in date.keys()]) newdate = datetime(**date) log['date'] = newdate dd matches the number of the day (1, 2, 3, etc...) MMM matches the name of the month (Jan, Feb, Mar, etc...) YYYY matches the year (2012) hh:mm:ss matches the time (23:54:42) english_months = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12} ctf = re.compile("(?P<day>\d+)/(?P<month>[a-zA-Z]+)/(?P<year>\d+):(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)") m = ctf.search(value) if m: vals = m.groupdict() vals['month'] = english_months[vals['month']] vals = dict( [ (u, int(vals[u])) for u in vals.keys() ]) newdate = datetime(**vals) log['date'] = newdate else: raise Exception, "invalid date string %s" % value MMM matches the name of the month (Jan, Feb, Mar, etc...) dd matches the number of the day (1, 2, 3, etc...) hh:mm:ss matches the time (23:54:42) MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] now = datetime.now() currentyear = now.year # Following line may throw a lot of ValueError newdate = datetime(currentyear, MONTHS.index(value[0:3]) + 1, int(value[4:6]), int(value[7:9]), int(value[10:12]), int(value[13:15])) if newdate > datetime.today(): newdate = newdate.replace(year = newdate.year - 1) log['date'] = newdate MMM matches the name of the month (Jan, Feb, Mar, etc...) dd matches the number of the day (1, 2, 3, etc...) hh:mm:ss matches the time (23:54:42) YYYY matches the year (2012) MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] # Following line may throw a lot of ValueError newdate = datetime(int(value[7:11]), MONTHS.index(value[0:3]) + 1, int(value[4:6]), int(value[12:14]), int(value[15:17]), int(value[18:20])) log['date'] = newdate DDD matches the name of the day (Mon, Tue, Wed, etc...) MMM matches the name of the month (Jan, Feb, Mar, etc...) 
dd matches the number of the day (1, 2, 3, etc...) hh:mm:ss matches the time (23:54:42) YYYY matches the year (2012) reg = re.compile(u'(?P<month>[A-Z]{1}[a-z]{2}) (?P<day>\d{1,2}) (?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2}) (?P<year>\d{4})') month = {'Jan' : 1, 'Feb' : 2, 'Mar' : 3, 'Apr' : 4, 'May' : 5, 'Jun' : 6, 'Jul' : 7, 'Aug' : 8, 'Sep' : 9, 'Oct' : 10, 'Nov' : 11, 'Dec' : 12} date = reg.search(value).groupdict() year = int(date.get('year')) month = month.get(date.get('month', None), None) day = int(date.get('day')) hours = int(date.get('hours')) minutes = int(date.get('minutes')) seconds = int(date.get('seconds')) newdate = datetime(year, month, day, hours, minutes, seconds) log['date'] = newdate YYYY matches the year (2012) MM matches the number of the month (01, 02, 03 etc...) DD matches the number of the day (01, 02, 03, etc...) hh:mm:ss matches the time (23:54:42) reg = re.compile('(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2}) (?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})') date = reg.search(value).groupdict() year= int(date.get('year')) month = int(date.get('month')) day = int(date.get('day')) hours = int(date.get('hours')) minutes = int(date.get('minutes')) seconds = int(date.get('seconds')) newdate = datetime(year, month, day, hours, minutes, seconds) log['date'] = newdate MM matches the number of the month (01, 02, 03 etc...) DD matches the number of the day (01, 02, 03, etc...) YY matches the year (12) hh:mm:ss matches the time (23:54:42) The year is set arbitrarily in the XXIst century. reg = re.compile('(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{2}), (?P<hours>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})') date = reg.search(value) date = date.groupdict() year= int(date.get('year')) month = int(date.get('month')) day = int(date.get('day')) hours = int(date.get('hours')) minutes = int(date.get('minutes')) seconds = int(date.get('seconds')) newdate = datetime(2000 + year, month, day, hours, minutes, seconds) log['date'] = newdate YY matches the year (12) MM matches the number of the month (01, 02, 03 etc...) DD matches the number of the day (01, 02, 03, etc...) hh:mm:ss matches the time (23:54:42) reg = re.compile('(?P<year>[0-9]{2})(?P<month>[0-9]{2})(?P<day>[0-9]{2}) (?P<hours>(?:[0-9]{2}| [0-9])):(?P<minutes>[0-9]{2}):(?P<seconds>[0-9]{2})') date = reg.search(value) date = date.groupdict() year= int(date.get('year')) month = int(date.get('month')) day = int(date.get('day')) hours = int(date.get('hours')) minutes = int(date.get('minutes')) seconds = int(date.get('seconds')) newdate = datetime(2000 + year, month, day, hours, minutes, seconds) log["date"] = newdate Converts a combined date and time in UTC expressed according to the ISO 8601 standard. Also commonly referred to as "Zulu Time". Precision can be up to the millisecond. r = re.compile(""" (?P<year>\d{4})- (?P<month>\d{2})- (?P<day>\d{2}) T(?P<hour>\d{2}): (?P<minute>\d{2}): (?:(?P<second>\d{2}) (?:\.(?P<microsecond>\d{3}))?)?Z""", re.VERBOSE) m = r.match(value).groupdict() m = dict( [ (u, v and int(v) or 0) for u,v in m.items() ] ) m['microsecond'] = m['microsecond'] * 1000 log['date'] = datetime(**m) Converts an EPOCH timestamp to a human-readable date. log['date'] = datetime.utcfromtimestamp(float(value)) Converts a date as in 28-Feb-2010 23:15:54 . This format is used in BIND9 logs among others. Precision can be up to the millisecond. 
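For orientation, a rough standalone equivalent of the callback that follows — a sketch only, assuming an English/C locale for the month abbreviation (the real code below matches the month against its own MONTHS list and therefore does not depend on the locale):

from datetime import datetime
d = datetime.strptime('28-Feb-2010 23:15:54', '%d-%b-%Y %H:%M:%S')
# d == datetime(2010, 2, 28, 23, 15, 54); an optional trailing '.840' is handled
# by the callback below, which scales the captured value to 840000 microseconds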
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] r = re.compile('(?P<day>\d+)-(?P<month>\w+)-(?P<year>\d{4}) (?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)(?:\.(?P<microsecond>\d+))?') m = r.match(value).groupdict() m['month'] = MONTHS.index(m['month']) + 1 m = dict( [ (u, v and int(v) or 0) for u,v in m.items() ] ) m['microsecond'] = m['microsecond'] * 1000 log['date'] = datetime(**m) pylogsparser-0.4/normalizers/normalizer.dtd0000644000175000017500000001706611705765631017456 0ustar fbofbo pylogsparser-0.4/normalizers/LEA.xml0000644000175000017500000002024311705765631015711 0ustar fbofbo This normalizer handles LEA (Log Export API) normalization. The LEA format is used by CheckPoint products to export logs to a LogBox. The formatting with | as a fields separator is due to the use of FW1-LogGrabber for log fetching. Due to the dynamic nature of this logging format, please refer to your product's documentation to find out more about tagging. Ce normaliseur analyse les logs émis en utilisant l'API d'export de logs (LEA). Cette API peut être utilisée pour la réception de logs en provenance d'équipements CheckPoint. Le formatage des champs séparés par le caractère | est dû à la récupération des logs via l'utilitaire FW1-LogGrabber. En raison de la nature dynamique de ce format de log, les tags extraits peuvent varier en fonction des événements consignés. Veuillez vous référer à la documentation de votre équipement exposant LEA pour de plus amples informations. mhu@wallix.com LEA fields as "key=value", separated by | Champs descriptifs au format "clé=valeur", séparés par le caractère | (?:[^ =]+=[^|]+|)*[^ =]+=[^|]+ # These are the only tags we extract KNOWN = [ ("loc", "id"), "product", "i/f_dir", "i/f_name", "orig", "type", "action", ("proto", "protocol"), "rule", "src", "dst", ("s_port", "source_port"), ("service", "dest_port"), ("uuid", "lea_uuid") ] def src_dst_extract(data): ip_re = re.compile("(?<![.0-9])((?:[0-9]{1,3}[.]){3}[0-9]{1,3})(?![.0-9])") if ip_re.match(data['src']): data['source_ip'] = data['src'] else: data['source_host'] = data['src'] if ip_re.match(data['dst']): data['dest_ip'] = data['dst'] else: data['dest_host'] = data['dst'] if ip_re.match(data['orig']): data['local_ip'] = data['orig'] else: data['local_host'] = data['orig'] del data['src'] del data['dst'] del data['orig'] def int_extract(data): if 'i/f_dir' in data.keys(): if data['i/f_dir'] == 'inbound': data['inbound_int'] = data['i/f_name'] if data['i/f_dir'] == 'outbound': data['outbound_int'] = data['i/f_name'] del data['i/f_dir'] del data['i/f_name'] dic = {} body = value.split('|') for l in body: key, val = l.split("=", 1) dic[key] = val # keep only known tags for t in KNOWN: if isinstance(t, basestring): t = (t,t) old, new = t if old in dic.keys(): log[new] = dic[old] # improve body readability log['body'] = log['body'].replace("|", " ") # Try to retrieve the date try: log['date'] = datetime.utcfromtimestamp(int(dic['time'])) except: try: log['date'] = datetime.strptime(dic['time'], "%Y-%m-%d %H:%M:%S") except: # cannot parse it, keep it safe log['time'] = dic['time'] src_dst_extract(log) int_extract(log) lea A list of key-value couples, separated by a | character. L'événement est décrit à l'aide d'une série de couples clé-valeur, séparés par le caractère |. 
LEAFIELDS a list of key-value couples, separated by a | character, needing some post-processing la liste des couples clé-valeur, à passer à une fonction de post-traitement LEAFIELDS decode_LEA loc=3707|time=1199716450|action=accept|orig=fw1|i/f_dir=inbound|i/f_name=PCnet1|has_accounting=0|uuid=<47822e42,00000001,7b040a0a,000007b6>|product=VPN-1 & FireWall-1|__policy_id_tag=product=VPN-1 & FireWall-1[db_tag={9F95C344-FE3F-4E3E-ACD8-60B5194BAAB4};mgmt=fw1;date=1199701916;policy_name=Standard]|src=naruto|s_port=56840|dst=fw1|service=https|proto=tcp|rule=1 3707 accept VPN-1 & FireWall-1 PCnet1 fw1 tcp 1 naruto fw1 56840 https firewall pylogsparser-0.4/normalizers/deny_event.xml0000644000175000017500000006205511705765631017457 0ustar fbofbo clo@wallix.com [-0-9a-z]* \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[.]\d+ [^,]* Ugly hack for IPv6, IPv4 addresses (?:((([0-9A-Fa-f]{1,4}:){7}(([0-9A-Fa-f]{1,4})|:))|(([0-9A-Fa-f]{1,4}:){6}(:|((25[0-5]|2[0-4]\d|[01]?\d{1,2})(\.(25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})|(:[0-9A-Fa-f]{1,4})))|(([0-9A-Fa-f]{1,4}:){5}((:((25[0-5]|2[0-4]\d|[01]?\d{1,2})(\.(25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|((:[0-9A-Fa-f]{1,4}){1,2})))|(([0-9A-Fa-f]{1,4}:){4}(:[0-9A-Fa-f]{1,4}){0,1}((:((25[0-5]|2[0-4]\d|[01]?\d{1,2})(\.(25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|((:[0-9A-Fa-f]{1,4}){1,2})))|(([0-9A-Fa-f]{1,4}:){3}(:[0-9A-Fa-f]{1,4}){0,2}((:((25[0-5]|2[0-4]\d|[01]?\d{1,2})(\.(25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|((:[0-9A-Fa-f]{1,4}){1,2})))|(([0-9A-Fa-f]{1,4}:){2}(:[0-9A-Fa-f]{1,4}){0,3}((:((25[0-5]|2[0-4]\d|[01]?\d{1,2})(\.(25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|((:[0-9A-Fa-f]{1,4}){1,2})))|(([0-9A-Fa-f]{1,4}:)(:[0-9A-Fa-f]{1,4}){0,4}((:((25[0-5]|2[0-4]\d|[01]?\d{1,2})(\.(25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|((:[0-9A-Fa-f]{1,4}){1,2})))|(:(:[0-9A-Fa-f]{1,4}){0,5}((:((25[0-5]|2[0-4]\d|[01]?\d{1,2})(\.(25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})?)|((:[0-9A-Fa-f]{1,4}){1,2})))|(((25[0-5]|2[0-4]\d|[01]?\d{1,2})(\.(25[0-5]|2[0-4]\d|[01]?\d{1,2})){3})))(%.+)?)? 
TYPES = { "1" : "Resources", "2" : "System", "3" : "Configuration", "4" : "Security", "5" : "Backend", "6" : "Acceleration" } log['alert_type'] = TYPES.get(value, "Unknown") SUBTYPES = { "1.1" : "CPU", "1.2" : "Memory", "2.1" : "Access", "2.2" : "Device Operations", "3.1" : "Configuration change", "3.2" : "Backup and Restore", "4.1" : "HTTP Security", "4.2" : "XML Security", "4.3" : "Authentication", "5.1" : "Backend availability", "5.2" : "Backend performances", "6.1" : "Server Load-Balancing", "6.2" : "Caching" } MESSAGES = { # Resources "1.1.6.0" : "CPU utilization below 60%", "1.1.4.0" : "CPU utilization over 60%", "1.1.2.0" : "CPU utilization over 80%", "1.2.6.0" : "Memory utilization below 70%", "1.2.4.0" : "Memory utilization over 70%", "1.2.2.0" : "Memory utilization over 90%", # System "2.1.6.0" : "User logout", "2.1.4.0" : "User successful login", "2.1.2.0" : "User failed login attempts", "2.2.6.0" : "Instance started", "2.2.5.0" : "rWeb started", "2.2.4.0" : "Instance stopped", "2.2.2.0" : "rWeb stopped", # Configuration "3.1.5.0" : "Configuration change successful", "3.1.3.0" : "Configuration change failed", "3.2.5.0" : "Configuration backup successful", "3.2.5.1" : "Configuration restore successful", "3.2.3.0" : "Configuration backup failure", "3.2.2.0" : "Configuration restore failure", # Security "4.1.4.0" : "Attack blocked by Blacklist", "4.1.4.1" : "Attack blocked by Whitelist", "4.1.4.2" : "Attack blocked by Scoringlist", "4.1.4.3" : "Attack blocked by UBT (DoS protection)", "4.1.4.4" : "Attack blocked by UBT (Site Crawling)", "4.1.4.5" : "Attack blocked by UBT (Brute Force)", "4.1.4.6" : "Attack blocked by UBT (Cookie Theft)", "4.1.4.7" : "Attack blocked by UBT (Direct Access)", "4.1.4.8" : "Attack blocked by UBT (Restricted Access)", "4.1.4.9" : "Attack blocked by Stateful engine (Link Tracking)", "4.1.4.10" : "Attack blocked by Stateful engine (Parameter Tracking)", "4.1.4.11" : "Attack blocked by Stateful engine (Cookie Tracking)", "4.1.4.12" : "Attack blocked by Canonization engine (URI Wrong Encoding)", "4.1.4.13" : "Attack blocked by Canonization engine (URI Decoding)", "4.1.4.14" : "Attack blocked by Canonization engine (Parameter Decoding)", "4.1.4.15" : "Attack blocked by HTTP requests filter (Forbidden Method)", "4.1.4.16" : "Attack blocked by HTTP requests filter (Header Size)", "4.1.4.17" : "Attack blocked by HTTP requests filter (Body Size)", "4.1.4.18" : "Attack blocked by HTTP requests filter (Number of Request Fields)", "4.1.4.19" : "Attack blocked by HTTP requests filter (Size of Request Fields)", "4.1.4.20" : "Attack blocked by HTTP requests filter (Number of Request Lines)", "4.1.4.21" : "Attack blocked by HTTP responses filter", "4.2.4.0" : "Attack blocked by Blacklist", "4.2.4.1" : "Attack blocked by Scoringlist", "4.2.4.2" : "Attack blocked by XML Schema validation engine", "4.2.4.3" : "Attack blocked by Stateful engine", "4.2.4.4" : "Attack blocked by Canonization engine", "4.2.4.5" : "Attack blocked by Attachment validation engine", "4.2.4.6" : "Attack blocked by Source filtering engine", "4.3.5.0" : "Authentication successful", "4.3.3.1" : "Authentication failed", # Backend "5.1.6.0" : "Server available", "5.1.1.0" : "Server error response", "5.1.0.0" : "Server not response", "5.2.6.0" : "Response time < 70% of maximum allowed", "5.2.4.0" : "Response time 70% of maximum allowed", "5.2.2.0" : "Response time 90% of maximum allowed", # Acceleration "6.1.6.0" : "Server back in farm", "6.1.2.0" : "Server down, removed from farm", "6.1.0.0" : "All 
servers down", "6.2.6.0" : "Cache utilization < 70%", "6.2.4.0" : "Cache 70% full", "6.2.2.0" : "Cache 90% full", "6.2.1.0" : "Cache 100% full, increase cache size", } event_subtype = log['alert_type_id'] + "." + log['alert_subtype_id'] log['alert_subtype'] = SUBTYPES.get(event_subtype, "Unknown") event_id = event_subtype + "." + log['severity_code'] + "." + log['alert_id'] log['event'] = MESSAGES.get(event_id, "Unknown") SEVERITIES=["Emerg", "Alert", "Crit", "Error", "Warn", "Notice", "Info", "Debug"] try: log['severity'] = SEVERITIES[int(value)] except: # no big deal if we don't get this one pass EVENT_UID,START_DATE,END_DATE,ACKDATE,ACKUSER,IP_DEVICE,IP_SOURCE,TARGET_IP,ALERT_TYPE_ID,ALERT_SUBTYPE_ID,SEVERITY,ALERT_ID,ALERT_VALUE,USER,INTERFACE,OBJECT_NAME,PARAMETER_CHANGED,PREVIOUS_VALUE,NEW_VALUE,UNKNOWN1,UNKNOWN2,UNKNOWN3,UUID_BLACKLIST,UUID_POLICY,UUID_APP,ACTION,HTTP_METHOD_USED,URL,PARAMETERS,URI,ATTACK_ID,ATTACK_USER,AUTH_MECHANISM,UNKNOWN4,UNKNOWN5,UNKNOWN6,UNKNOWN7 EVENT_UID START_DATE YYYY-MM-DD hh:mm:ss END_DATE ACKDATE ACKUSER IP_DEVICE IP_SOURCE TARGET_IP ALERT_TYPE_ID decode_alert_type ALERT_SUBTYPE_ID SEVERITY decode_severity ALERT_ID ALERT_VALUE USER INTERFACE OBJECT_NAME PARAMETER_CHANGED PREVIOUS_VALUE NEW_VALUE UNKNOWN1 UNKNOWN2 UNKNOWN3 UNKNOWN4 UNKNOWN5 UNKNOWN6 UNKNOWN7 UUID_BLACKLIST UUID_POLICY UUID_APP ACTION HTTP_METHOD_USED URL PARAMETERS URI ATTACK_ID ATTACK_USER AUTH_MECHANISM 228,2011-01-24 18:08:06.957252,2011-01-24 18:08:06.957252,,,192.168.80.10,192.168.80.1,,4,1,4,0,,,,,,,,,,,11111111-1111-1111-1111-111111111111,7ed198ca-26d5-11e0-a46f-000c298895c5,d74ca776-265b-11e0-a54a-000c298895c5,deny,GET,/cgi-bin/badstore.cgi?searchquery=1%27+OR+1%3D1+%23&action=search&x=0&y=0,GET /cgi-bin/badstore.cgi?searchquery=1' OR 1=1 #&action=search&x=0&y=0,(uri) ,11230-0 ,,,,,, 228 192.168.80.10 192.168.80.1 4 1 4 0 Security HTTP Security Attack blocked by Blacklist 11111111-1111-1111-1111-111111111111 7ed198ca-26d5-11e0-a46f-000c298895c5 d74ca776-265b-11e0-a54a-000c298895c5 deny GET /cgi-bin/badstore.cgi?searchquery=1%27+OR+1%3D1+%23&action=search&x=0&y=0 GET /cgi-bin/badstore.cgi?searchquery=1' OR 1=1 #&action=search&x=0&y=0 (uri) 11230-0 firewall decode_message pylogsparser-0.4/normalizers/pam.xml0000644000175000017500000002050211705765631016063 0ustar fbofbo This normalizer can parse messages issued by the Pluggable Authentication Module (PAM). Ce normaliseur analyse les messages émis par le module d'authentification par greffons (PAM). mhu@wallix.com the name of the PAM component le nom du composant PAM pam_\w+ the user information l'utilisateur concerné par l'authentification [^ ]+ the session action l'action de session opened|closed log["action"] = {'opened' : 'open', 'closed' : 'close'}.get(value, value) This type of message is logged at session opening or closing. Structure des message émis à l'ouverture ou la fermeture de session. PAMCOMPONENT\(PROGRAM:TYPE\):.* session ACTION for user USER the PAM component le composant PAM PAMCOMPONENT the program calling PAM le programme invoquant l'authentification via PAM PROGRAM the authentication type le type d'authentification TYPE the action taken regarding the session l'action associée à la session ACTION decode_action the user for which an authentication request is issued l'utilisateur pour lequel la demande d'authentification est émise USER pam_unix(cron:session): session opened for user www-data by (uid=0) cron pam_unix session open www-data access control A generic PAM message. 
Structure générique des messages PAM non relatifs à une ouverture ou fermeture de session. PAMCOMPONENT\(PROGRAM:TYPE\):.* (?:user=USER)? the PAM component le composant PAM PAMCOMPONENT the program calling PAM le programme invoquant l'authentification via PAM PROGRAM the authentication type le type d'authentification TYPE the user for which an authentication request is issued l'utilisateur pour lequel la demande d'authentification est émise USER pylogsparser-0.4/normalizers/IIS.xml0000644000175000017500000004740011705765631015740 0ustar fbofbo This normalizer handles IIS 6.0 (Internet Information Service) logs, which are in w3c ELFF (Extended Log File Format). Ce normaliseur gère les logs IIS 6.0, qui sont au format w3c ELFF (Extended Log File Format). clo@wallix.com Expression matching a w3c ELFF format field which is any non-whitespace character. Expression correspondant à un champ du format w3c ELFF, correspondant à tous les caractères 'non-espace' (ex.: espace, tabulation, saut de ligne, etc...) [^\s]+|- Expression matching a date in the yyyy-mm-dd hh:mm:ss format. Expression correspondant à une date au format yyyy-mm-dd hh:mm:ss. [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{1,2}:[0-9]{2}:[0-9]{2} Expression matching a date in mm/dd/yy format and a time in hh:mm:ss format. Expression correspondant à une date au format mm/dd/yy et une heure au format hh:mm:ss. [0-9]{2}/[0-9]{2}/[0-9]{2}, [0-9]{2}:[0-9]{2}:[0-9]{2} value = float(value) value = value / 1000 log['time_taken'] = value This is the default log format in w3c ELFF format. Log par défaut au format w3c ELFF. DATE\s+SERVICE\s+SERVER_IP\s+REQUEST_TYPE\s+RESOURCE\s+QUERY\s+PORT\s+USERNAME\s+CLIENT_IP\s+AGENT\s+ACTION_STATUS\s+SUB_STATUS\s+WIN_STATUS The date that the activity occurred. La date de l'évènement. DATE YYYY-MM-DD hh:mm:ss The Internet service and instance number that was accessed by a client. Le service et le numéro de la demande du client. SERVICE The IP address of the server on which the log entry was generated. IP du serveur. SERVER_IP The action the client was trying to perform. Nom de la méthode. Comme GET, PASS, etc.. REQUEST_TYPE The resource accessed. La cible de l'opération. RESOURCE The query, if any, the client was trying to perform. La requête que le client a tenté. QUERY The port number the client is connected to. Le port auquel le client est connecté. PORT The name of the authenticated user who accessed your server. This does not include anonymous users, who are represented by a hyphen (-). Le nom de l'utilisateur ayant accédé au serveur. USERNAME The IP address of the client that accessed your server. L'adresse IP du client. CLIENT_IP The browser used on the client. Le navigateur utilisé. AGENT The status of the action, in HTTP or FTP terms. Le status de l'action. ACTION_STATUS The substatus of the action. Le sous-status de l'action. SUB_STATUS The status of the action, in terms used by Microsoft Windows®. Le status de l'action, avec les termes de Windows. WIN_STATUS 2011-09-26 13:57:48 W3SVC1 127.0.0.1 GET /tapage.asp - 80 - 127.0.0.1 Mozilla/4.0+(compatible;MSIE+6.0;+windows+NT5.2;+SV1;+.NET+CLR+1.1.4322) 404 0 2 W3SVC1 127.0.0.1 GET /tapage.asp - 80 - 127.0.0.1 Mozilla/4.0+(compatible;MSIE+6.0;+windows+NT5.2;+SV1;+.NET+CLR+1.1.4322) 404 0 2 web server This is a log format in w3c ELFF format. Format de log au format w3c ELFF. 
CLIENT_IP,\s*USERNAME,\s*DATE,\s*SERVICE_NAME,\s*SERVER_NAME,\s*SERVER_IP,\s*TIME_TAKEN,\s*CLIENT_BYTES_SENT,\s*SERVER_BYTES_SENT,\s*SERVICE_STATUS_CODE,\s*WINDOWS_STATUS_CODE,\s*REQUEST_TYPE,\s*TARGET_OF_OPERATION,\s*PARAMETERS,\s* The IP address of the client that accessed your server. L'adresse du client ayant accédé au serveur. CLIENT_IP The name of the authenticated user who accessed your server. This does not include anonymous users, who are represented by a hyphen (-). Le nom de l'utilisateur ayant accédé au serveur. USERNAME The date that the activity occurred. La date de l'évènement. DATE MM/DD/YY, hh:mm:ss The Internet service and instance number that was accessed by a client. Le service et le numéro de la demande du client. SERVICE_NAME Server's hostname. Le nom du serveur. SERVER_NAME The IP address of the server on which the log entry was generated. IP du serveur. SERVER_IP Elapsed time to complete request. Temps écoulé pour réaliser la requête. TIME_TAKEN convert_time Number of bytes sent by client. (Request size) Taille de la requête. CLIENT_BYTES_SENT Number of bytes returned by the server. Nombre d'octets retournés par le serveur. SERVER_BYTES_SENT Service status code. (A value of 200 indicates that the request was fulfilled successfully.) Code de status du service. SERVICE_STATUS_CODE Windows status code. (A value of 0 indicates that the request was fulfilled successfully.) code de status de Windows. WINDOWS_STATUS_CODE Method name. Such as GET, PASS, ... Nom de la méthode. Comme GET, PASS, etc.. REQUEST_TYPE The target of the operation. La cible de l'opération. TARGET_OF_OPERATION the parameters that are passed to a script if any. Les paramètres du script s'il y en a un. PARAMETERS 172.16.255.255, anonymous, 03/20/01, 23:58:11, MSFTPSVC, SALES1, 172.16.255.255, 60, 275, 0, 0, 0, PASS, /Intro.htm, -, 172.16.255.255 anonymous MSFTPSVC SALES1 172.16.255.255 275 0 0 0 PASS /Intro.htm - web server pylogsparser-0.4/normalizers/VMWare_ESX4-ESXi4.xml0000644000175000017500000001603111705765631020146 0ustar fbofbo This normalizer parses VMware ESX 4.x and ESXi 4.x logs that are not handled by the Syslog normalizer. Ce normaliseur analyse les logs de VMware ESX 4.x et ESXi 4.x. qui ne sont pas gérés pas le normaliseur Syslog. clo@wallix.com Expression matching a date in the format yyyy-mm-dd hh:mm:ss. Expression correspondant à une date au format yyyy-mm-dd hh:mm:ss. \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3} Expression matching a hexadecimal number. Expression correspondant à un nombre héxadécimal. [A-F0-9]{8} Expression matching the 'alpha' field, words between '. Expression correspondant au champ 'alpha', qui contient les mots entre '. [^']+(?: [^']+)* Expression matching the 'level' field. Expression correspondant au champ 'level'. [^\s]+ Logs contained in hostd.log file. Logs contenus dans le fichier hostd.log. \[DATE NUMERIC LEVEL 'ALPHA'[^\]]*\] BODY The time at which the request was issued - please note that the timezone information is not carried over. La date à laquelle la requête a été émise. Veuillez noter que l'information de fuseau horaire n'est pas prise en compte. DATE YYYY-MM-DD hh:mm:ss NUMERIC The level is the type of the log. Le level correspond au type du log. LEVEL ALPHA The actual event message. Le message décrivant l'événement. 
BODY [2011-09-05 17:03:15.220 F6F74B90 verbose 'App'] [VpxaMoVm::CheckMoVm] did not find a VM with ID 67 in the vmList F6F74B90 verbose App [VpxaMoVm::CheckMoVm] did not find a VM with ID 67 in the vmList hypervisor [2011-09-05 17:19:21.741 F63E6900 info 'Vmomi' opID=996867CC-0000030B] Throw vmodl.fault.RequestCanceled F63E6900 info Vmomi Throw vmodl.fault.RequestCanceled hypervisor Logs contained in sysboot.log file. Log contenu dans le fichier sysboot.log. sysboot: EVENT The actual event message. Le message décrivant l'événement. EVENT sysboot: Executing 'esxcfg-init --set-boot-progress done' Executing 'esxcfg-init --set-boot-progress done' hypervisor pylogsparser-0.4/INSTALL0000644000175000017500000000035711627706147013256 0ustar fbofboStart pylogsparser unittests
----------------------------
The command below runs the logsparser test suite.

$ NORMALIZERS_PATH=normalizers/ python tests/test_suite.py

Install pylogsparser
--------------------

# python setup.py install
pylogsparser-0.4/README.rst0000644000175000017500000004035011715703401013675 0ustar fbofboLogsParser
==========

Description
:::::::::::

LogsParser is an open source Python library created by Wallix
( http://www.wallix.org ). It is used as the core mechanism for log tagging
and normalization by Wallix's LogBox
( http://www.wallix.com/index.php/products/wallix-logbox ).

Logs come in a variety of formats. In order to parse many different types of
logs, a developer used to need to write an engine based on a large list of
complex regular expressions, which rapidly becomes unreadable and
unmaintainable.

By using LogsParser, a developer can free herself from the burden of writing
a log parsing engine, since the module comes with "batteries included".
Furthermore, this engine relies upon XML definition files that can be loaded
at runtime. The definition files were designed to be easily readable and to
require very little skill in programming or regular expressions, without
sacrificing power or expressiveness.

Purpose
:::::::

The LogsParser module uses normalization definition files in order to tag
log entries. The definition files are written in XML; they allow anyone with
a basic understanding of regular expressions and knowledge of a specific log
format to create and maintain a customized pool of parsers.

Basically a definition file will consist of a list of log patterns, each
composed of many keywords. A keyword is a placeholder for a notable and/or
variable part in the described log line, and is therefore associated with a
tag name. It is paired with a tag type, i.e. a regular expression matching
the expected value to assign to this tag. If the raw value extracted this
way needs further processing, callback functions can be applied to it. This
format also makes it possible to add useful meta-data about parsed logs,
such as extensive documentation about expected log patterns and log samples.

Format Description
------------------

A normalization definition file must strictly follow the specifications as
they are detailed in the file normalizer.dtd. A simple template, called
normalizer.template, is provided to help parser writers get started with
their task.

Most definition files will include the following sections:

* Some generic documentation about the parsed logs: emitting application,
  application version, etc ... (non-mandatory)
* the definition file's author(s) (non-mandatory)
* custom tag types (non-mandatory)
* callback functions (non-mandatory)
* Prerequisites on tag values prior to parsing (non-mandatory)
* Log pattern(s) and how they are to be parsed
* Extra tags with a fixed value that should be added once the parsing is
  done (non-mandatory)

Root
....

The definition file's root must hold the following elements:

* the normalizer's name.
* the normalizer's version.
* the flags to apply to the compilation of regular expressions associated
  with this parser: unicode support, multiple lines support, and ignore
  case.
* how to match the regular expression: from the beginning of the log line
  (match) or from anywhere in the targeted tag (search)
* the tag value to parse (raw, body...)
* the service taxonomy, if relevant, of the normalizer. See the end of this
  document for more details.

Default tag types
.................

A few basic tag types are defined in the file common_tagTypes.xml. In order
to use them, this file has to be loaded when instantiating the Normalizer
class; see the class documentation for further information.

Here is a list of default tag types shipped with this library.

* Anything : any character chain of any length.
* Integer
* EpochTime : an EPOCH timestamp of arbitrary precision (to the second and
  below).
* syslogDate : a date as seen in syslog-formatted logs (example: Mar 12
  20:13:23)
* URL
* MACAddress
* Email
* IP
* ZuluTime : a "Zulu Time"-type timestamp (example: 2012-12-21T13:45:05)

Custom Tag Types
................

It is always possible to define new tag types in a parser definition file,
and to overwrite default ones. To define a new tag type, the following
elements are needed:

* a type name. This will be used as the type reference in log patterns.
* the python type of the expected result: this element is not used yet and
  can be safely set to anything.
* a non-mandatory description.
* the regular expression defining this type.

Callback Functions
..................

One might want to transform a raw value after it has been extracted from a
pattern: the syslog normalizer converts the raw log timestamp into a python
datetime object, for example. In order to do this, the tag "callback" must
be used to define a callback function. It requires a function name as a
mandatory attribute. Its text defines the function body as in Python,
meaning the PEP8 indentation rules are to be followed.

When writing a callback function, the following rules must be respected (a
minimal example is given after this list):

* Your callback function will take ONLY two arguments: **value** and
  **log**. "value" is the raw value extracted from applying the log pattern
  to the log, and "log" is the dictionary of the normalized log in its
  current state (prior to normalization induced by this parser definition
  file).
* Your callback function can modify the "log" argument (especially assign
  the transformed value to the concerned tag name) and must not return
  anything.
* Your callback function has restricted access to the following facilities:

  ::

    "list", "dict", "tuple", "set", "long", "float", "object", "bool",
    "callable", "True", "False", "dir", "frozenset", "getattr", "hasattr",
    "abs", "cmp", "complex", "divmod", "id", "pow", "round", "slice",
    "vars", "hash", "hex", "int", "isinstance", "issubclass", "len", "map",
    "filter", "max", "min", "oct", "chr", "ord", "range", "reduce", "repr",
    "str", "unicode", "basestring", "type", "zip", "xrange", "None",
    "Exception"

* Importing modules is therefore forbidden and impossible. The *re* and
  *datetime* modules are available for use as if the following lines were
  present:

  ::

    import re
    from datetime import datetime

* In version 0.4, the "extras" package is introduced. It allows more
  freedom in what can be used in callbacks. It also increases execution
  speed in some cases; typically when you need to use complex objects in
  your callback, like a big set or a big regular expression. In the old
  approach, such an object would be created each time the function is
  called; by moving the object's creation into the extras package, it is
  created once and for all. See the modules in logsparser.extras for use
  cases.
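For instance, the Python body of a "callback" element could look like the
following minimal sketch. The function name "decode_status" and the value
mapping are made up for illustration; only the two-argument convention and
the absence of a return value are mandated by the rules above.

::

    # Body of a hypothetical callback registered as "decode_status".
    # "value" is the raw string extracted by the pattern; "log" is the
    # normalized log dictionary built so far. "re" and "datetime" are
    # already available; nothing may be imported and nothing is returned.
    statuses = {"ok": "success", "ko": "FAIL"}
    # assign the transformed value to the relevant tag
    log["status"] = statuses.get(value.lower(), value)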
Default callbacks
.................

As with default tag types, a few generic callbacks are defined in the file
common_callBacks.xml. Currently they are meant to deal with common date
formats. Therefore they will automatically set the "date" tag. In order to
use them, the callbacks file has to be loaded when instantiating the
Normalizer class; see the class documentation for further information. In
case of name collisions, callbacks defined in a normalizer description file
take precedence over common callbacks.

Here is a list of default callbacks shipped with this library.

* MM/dd/YYYY hh:mm:ss : parses dates such as 04/13/2010 14:23:56
* dd/MMM/YYYY:hh:mm:ss : parses dates such as 19/Jul/2009:12:02:43
* MMM dd hh:mm:ss : parses dates such as Oct 23 10:23:12 . The year is
  guessed so that the resulting date is the closest in the past.
* DDD MMM dd hh:mm:ss YYYY : parses dates such as Mon Sep 11 09:13:54 2011
* YYYY-MM-DD hh:mm:ss : parses dates such as 2012-12-21 00:00:00
* MM/DD/YY, hh:mm:ss : parses dates such as 10/23/11, 07:24:04 . The year
  is assumed to be in the XXIst century.
* YYMMDD hh:mm:ss : parses dates such as 070811 17:23:12 . The year is
  assumed to be in the XXIst century.
* ISO8601 : converts a combined date and time in UTC expressed according
  to the ISO 8601 standard. Also commonly referred to as "Zulu Time".
* EPOCH : parses EPOCH timestamps
* dd-MMM-YYYY hh:mm:ss : parses dates such as 28-Feb-2010 23:15:54

Final callbacks
...............

One might want to wait until a pattern has been fully applied before
processing data: for example, to tag a log with a value built from a
concatenation of other values. It is possible to specify a list of
callbacks to apply at the end of the parsing with the XML tag
"finalCallbacks". Such callbacks will follow the mechanics described above,
with one notable change: they will be called with the argument "value" set
to None. Therefore, you have to make sure your callback will work correctly
that way. There are a few examples of use available: in the
test_normalizer.py test code, and in the deny_all normalizer.
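As an illustration, a final callback's body could look like the sketch
below. The "source_endpoint" tag and the logic are hypothetical; the point
is that "value" is ignored, since it is always None for final callbacks.

::

    # Body of a hypothetical final callback: build a composite tag from
    # tags that the pattern has already set. "value" is None here.
    if "source_ip" in log and "source_port" in log:
        log["source_endpoint"] = "%s:%s" % (log["source_ip"],
                                            log["source_port"])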
Pattern definition
..................

A definition file can contain as many log patterns as one sees fit. These
patterns are simplified regular expressions and are applied in alphabetical
order of their names, so it is important to name them so that the more
precise patterns are tried before the more generic ones.

A pattern is a "meta regular expression", which means that every syntactic
rule from Python's regular expressions is to be followed when writing a
pattern, especially escaping special characters. To make the patterns
easier to read than an obtuse regular expression, keywords act as "macros"
and correspond to a part of the log to assign to a tag (a rough sketch of
this mechanism is given at the end of this section).

A log pattern has the following components:

* A name.
* A non-mandatory description of the pattern's context.
* The pattern itself, under the tag "text".
* The tags as they appear in the pattern, the name to associate to each of
  them once the normalization is over, and the callback functions to apply,
  if any, to their raw values.
* Non-mandatory log samples. These can be used for self-validation.

If a tag name starts with __ (double underscore), this tag won't be added
to the final normalized dictionary. This makes it possible to create
temporary tags that will typically be used in conjunction with a series of
callback functions, when the original raw value has no actual interest.

To define log patterns describing a CSV-formatted message, one must add the
following attributes to the tag "text":

* type="csv"
* separator="," or the relevant separator character
* quotechar='"' or the relevant quotation character

Tags are then defined normally. Pylogsparser will automatically deal with
missing fields.
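To make the keyword mechanism more concrete, here is a rough sketch of how
a pattern could be expanded into a regular expression: each keyword is
replaced by its tag type's regular expression, wrapped in a named group.
This is an illustration only, not the library's actual implementation, and
the pattern, keywords and tag types are made up.

::

    import re

    # hypothetical tag types and pattern, for illustration only
    tag_types = [("USERNAME", r"\w+"),
                 ("SOURCEIP", r"\d{1,3}(?:\.\d{1,3}){3}")]
    pattern = r"USERNAME logged in from SOURCEIP"
    # expand each keyword into a named group holding its tag type's regex
    for keyword, regex in tag_types:
        pattern = pattern.replace(keyword, "(?P<%s>%s)" % (keyword, regex))
    m = re.compile(pattern).match("admin logged in from 10.0.0.1")
    print m.groupdict()   # {'USERNAME': 'admin', 'SOURCEIP': '10.0.0.1'}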
Best practices
..............

* Order your patterns in decreasing order of specificity. Not doing so
  might trigger errors, as more generic patterns will match earlier.
* The more precise your tagTypes' regular expressions, the more accurate
  your parser will be.
* Use description tags liberally. The more documented a log format, the
  better. Examples are also invaluable.

Tag naming convention
.....................

The tag naming convention is lowercase, underscore-separated words. It is
strongly recommended to stick to that naming convention when writing new
normalizers, for consistency's sake. In case of dynamic fields, it is
advised to make sure dynamic naming follows the convention. There's an
example of this in MSExchange2007MessageTracking.xml; see the callback
named "decode_MTLSourceContext".

Logs contain common information such as a username, an IP address, or
details about the transport protocol. In order to ease log post-processing,
a common naming scheme must be defined for those tags, so that one does not
have to deal with, for example, a series of "login", "user", "username" and
"userid" tags all describing a user id. The alphabetical list below is a
series of tag names that must be used when relevant (a short usage sketch
follows the list).

- action : action taken by a component such as DELETED, migrated, DROP,
  open.
- bind_int : binding interface for a network service.
- dest_host : hostname or FQDN of a destination host.
- dest_ip : IP address of a destination host.
- dest_mac : MAC address of a destination host.
- dest_port : destination port of a network connection.
- event_id : id describing an event.
- inbound_int : network interface for incoming data.
- len : a data size.
- local_host : hostname or FQDN of the local host.
- local_ip : IP address of the local host.
- local_mac : MAC address of the local host.
- local_port : listening port of a local service.
- message_id : message or transaction id.
- message_recipient : message recipient id.
- message_sender : message sender id.
- method : component access method such as GET, key_auth.
- outbound_int : network interface for outgoing data.
- protocol : network or software protocol name or numeric id such as TCP,
  NTP, SMTP.
- source_host : hostname or FQDN of a source host.
- source_ip : IP address of a source host.
- source_mac : MAC address of a source host.
- source_port : source port of a network connection.
- status : component status such as FAIL, success, 404.
- url : a URL as defined in RFC 1738
  (scheme://netloc/path;parameters?query#fragment).
- user : a user id.
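As a usage sketch, based on the calls made in this package's test code
(tests/test_norm_chain_speed.py), here is how conventionally named tags can
be read back after normalization. The sample log comes from that test; the
exact tags set depend on which normalizer matched, so the last line is
indicative only.

::

    import os
    from logsparser.lognormalizer import LogNormalizer

    ln = LogNormalizer(os.environ['NORMALIZERS_PATH'])
    log = {'raw': "<29>Jul 18 08:55:35 naruto squid[3245]: 1259844091.407 "
                  "307 82.238.42.70 TCP_MISS/200 1015 GET "
                  "http://www.ietf.org/css/ietf.css fbo "
                  "DIRECT/64.170.98.32 text/css"}
    log = ln.uuidify(log)
    ln.normalize(log)
    # conventional tags such as "user", "source_ip" or "url" should now
    # be set by the matching normalizer
    print log.get("user"), log.get("source_ip"), log.get("url")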
Service taxonomy
................

As of pylogsparser 0.4 a taxonomy tag is added to relevant normalizers. It
helps classify logs by service type, which can be useful for reporting,
among other things. Here is a list of identified services; suggestions and
improvements are welcome!

+-----------+----------------------------------------+------------------------+
| Service   | Description                            | Normalizers            |
+===========+========================================+========================+
| access    | A service dealing with authentication  | Fail2ban               |
| control   | and/or authorization                   | pam                    |
|           |                                        | sshd                   |
|           |                                        | wabauth                |
+-----------+----------------------------------------+------------------------+
| antivirus | A service dealing with malware         | bitdefender            |
|           | detection and prevention               | symantec               |
+-----------+----------------------------------------+------------------------+
| database  | A database service such as mySQLd,     | mysql                  |
|           | postmaster (PostgreSQL), ...           |                        |
+-----------+----------------------------------------+------------------------+
| address   | A service in charge of network address | dhcpd                  |
|assignation| assignations                           |                        |
+-----------+----------------------------------------+------------------------+
| name      | A service in charge of network name    | named                  |
| resolution| resolutions                            | named-2                |
+-----------+----------------------------------------+------------------------+
| firewall  | A service in charge of monitoring      | LEA                    |
|           | and filtering network traffic          | arkoonFAST360          |
|           |                                        | deny_event             |
|           |                                        | netfilter              |
+-----------+----------------------------------------+------------------------+
| file      | A file transfer service                | xferlog                |
| transfer  |                                        |                        |
+-----------+----------------------------------------+------------------------+
| hypervisor| A virtualization platform service      | VMWare_ESX4-ESXi4      |
|           |                                        |                        |
+-----------+----------------------------------------+------------------------+
| mail      | A mail server                          | MSExchange2007-        |
|           |                                        | MessageTracking        |
|           |                                        | postfix                |
+-----------+----------------------------------------+------------------------+
| web proxy | A service acting as an intermediary    | dansguardian           |
|           | between clients and web resources;     | deny_traffic           |
|           | access control and content filtering   | squid                  |
|           | can also occur                         |                        |
+-----------+----------------------------------------+------------------------+
| web server| A service exposing web resources       | IIS                    |
|           |                                        | apache                 |
+-----------+----------------------------------------+------------------------+
pylogsparser-0.4/LICENSE0000644000175000017500000006364211627706147013230 0ustar fbofbo
                  GNU LESSER GENERAL PUBLIC LICENSE
                       Version 2.1, February 1999

 Copyright (C) 1991, 1999 Free Software Foundation, Inc.
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

[This is the first released version of the Lesser GPL.  It also counts
 as the successor of the GNU Library Public License, version 2, hence
 the version number 2.1.]

                            Preamble

  The licenses for most software are designed to take away your
freedom to share and change it.  By contrast, the GNU General Public
Licenses are intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.

  This license, the Lesser General Public License, applies to some
specially designated software packages--typically libraries--of the
Free Software Foundation and other authors who decide to use it.
You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. 
However, the Lesser license provides advantages in certain special circumstances. For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. 
You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. 
Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. 
Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. 
However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. 
Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. 
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. , 1 April 1990 Ty Coon, President of Vice That's all there is to it! pylogsparser-0.4/tests/0000755000175000017500000000000011715707344013360 5ustar fbofbopylogsparser-0.4/tests/test_norm_chain_speed.py0000644000175000017500000000344511627706151020271 0ustar fbofbo# -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA import os import timeit from logsparser.lognormalizer import LogNormalizer if __name__ == "__main__": path = os.environ['NORMALIZERS_PATH'] ln = LogNormalizer(path) def test(): l = {'raw' : "<29>Jul 18 08:55:35 naruto squid[3245]: 1259844091.407 307 82.238.42.70 TCP_MISS/200 1015 GET http://www.ietf.org/css/ietf.css fbo DIRECT/64.170.98.32 text/css" } l = ln.uuidify(l) ln.normalize(l) print "Testing speed ..." t = timeit.Timer("test()", "from __main__ import test") speed = t.timeit(100000)/100000 print "%.2f microseconds per pass, giving a theoretical speed of %i logs/s." % (speed * 1000000, 1 / speed) print "Testing speed with minimal normalization ..." ln.set_active_normalizers({'syslog' : True}) ln.reload() t = timeit.Timer("test()", "from __main__ import test") speed = t.timeit(100000)/100000 print "%.2f microseconds per pass, giving a theoretical speed of %i logs/s." % (speed * 1000000, 1 / speed) pylogsparser-0.4/tests/test_normalizer.py0000644000175000017500000004452211710267252017154 0ustar fbofbo# -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. 
# # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # import os import unittest from datetime import datetime from logsparser.normalizer import Normalizer, TagType, Tag, CallbackFunction, CSVPattern, get_generic_tagTypes from lxml.etree import parse, DTD from StringIO import StringIO class TestSample(unittest.TestCase): """Unit tests for logsparser.normalize. Validate sample log example""" normalizer_path = os.environ['NORMALIZERS_PATH'] def normalize_samples(self, norm, name, version): """Test logparser.normalize validate for syslog normalizer.""" # open parser n = parse(open(os.path.join(self.normalizer_path, norm))) # validate DTD dtd = DTD(open(os.path.join(self.normalizer_path, 'normalizer.dtd'))) dtd.assertValid(n) # Create normalizer from xml definition normalizer = Normalizer(n, os.path.join(self.normalizer_path, 'common_tagTypes.xml'), os.path.join(self.normalizer_path, 'common_callBacks.xml')) self.assertEquals(normalizer.name, name) self.assertEquals(normalizer.version, version) self.assertTrue(normalizer.validate()) def test_normalize_samples_001_syslog(self): self.normalize_samples('syslog.xml', 'syslog', 0.99) def test_normalize_samples_002_apache(self): self.normalize_samples('apache.xml', 'apache', 0.99) def test_normalize_samples_003_dhcpd(self): self.normalize_samples('dhcpd.xml', 'DHCPd', 0.99) def test_normalize_samples_004_lea(self): self.normalize_samples('LEA.xml', 'LEA', 0.99) def test_normalize_samples_005_netfilter(self): self.normalize_samples('netfilter.xml', 'netfilter', 0.99) def test_normalize_samples_006_pam(self): self.normalize_samples('pam.xml', 'PAM', 0.99) def test_normalize_samples_007_postfix(self): self.normalize_samples('postfix.xml', 'postfix', 0.99) def test_normalize_samples_008_squid(self): self.normalize_samples('squid.xml', 'squid', 0.99) def test_normalize_samples_009_sshd(self): self.normalize_samples('sshd.xml', 'sshd', 0.99) def test_normalize_samples_010_named(self): self.normalize_samples('named.xml', 'named', 0.99) def test_normalize_samples_011_named2(self): self.normalize_samples('named-2.xml', 'named-2', 0.99) def test_normalize_samples_012_symantec(self): self.normalize_samples('symantec.xml', 'symantec', 0.99) def test_normalize_samples_013_msexchange2007MTL(self): self.normalize_samples('MSExchange2007MessageTracking.xml', 'MSExchange2007MessageTracking', 0.99) def test_normalize_samples_014_arkoonfast360(self): self.normalize_samples('arkoonFAST360.xml', 'arkoonFAST360', 0.99) def test_normalize_samples_015_s3(self): self.normalize_samples('s3.xml', 's3', 0.99) def test_normalize_samples_016_snare(self): self.normalize_samples('snare.xml', 'snare', 0.99) def test_normalize_samples_017_vmware(self): self.normalize_samples('VMWare_ESX4-ESXi4.xml', 'VMWare_ESX4-ESXi4', 0.99) # def test_normalize_samples_018_mysql(self): # self.normalize_samples('mysql.xml', 'mysql', 0.99) def test_normalize_samples_019_IIS(self): self.normalize_samples('IIS.xml', 'IIS', 0.99) def test_normalize_samples_020_fail2ban(self): self.normalize_samples('Fail2ban.xml', 'Fail2ban', 0.99) def test_normalize_samples_021_GeoIPsource(self): try: import GeoIP #pyflakes:ignore self.normalize_samples('GeoIPsource.xml', 'GeoIPsource', 0.99) except ImportError: # cannot test pass def test_normalize_samples_022_URL_parsers(self): self.normalize_samples('URLparser.xml', 'URLparser', 0.99) 
self.normalize_samples('RefererParser.xml', 'RefererParser', 0.99) def test_normalize_samples_023_bitdefender(self): self.normalize_samples('bitdefender.xml', 'bitdefender', 0.99) def test_normalize_samples_024_denyall_traffic(self): self.normalize_samples('deny_traffic.xml', 'deny_traffic', 0.99) def test_normalize_samples_025_denyall_event(self): self.normalize_samples('deny_event.xml', 'deny_event', 0.99) def test_normalize_samples_026_xferlog(self): self.normalize_samples('xferlog.xml', 'xferlog', 0.99) def test_normalize_samples_027_wabauth(self): self.normalize_samples('wabauth.xml', 'wabauth', 0.99) def test_normalize_samples_028_dansguardian(self): self.normalize_samples('dansguardian.xml', 'dansguardian', 0.99) def test_normalize_samples_029_cisco_asa_header(self): self.normalize_samples('cisco-asa_header.xml', 'cisco-asa_header', 0.99) def test_normalize_samples_030_cisco_asa_msg(self): self.normalize_samples('cisco-asa_msg.xml', 'cisco-asa_msg', 0.99) class TestCSVPattern(unittest.TestCase): """Test CSVPattern behaviour""" normalizer_path = os.environ['NORMALIZERS_PATH'] tt1 = TagType(name='Anything', ttype=str, regexp='.*') tt2 = TagType(name='SyslogDate', ttype=datetime, regexp='[A-Z][a-z]{2} [ 0-9]\d \d{2}:\d{2}:\d{2}') tag_types = {} for tt in (tt1, tt2): tag_types[tt.name] = tt generic_tagTypes = get_generic_tagTypes(path = os.path.join(normalizer_path, 'common_tagTypes.xml')) cb_syslogdate = CallbackFunction(""" MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] now = datetime.now() currentyear = now.year # Following line may throw a lot of ValueError newdate = datetime(currentyear, MONTHS.index(value[0:3]) + 1, int(value[4:6]), int(value[7:9]), int(value[10:12]), int(value[13:15])) log["date"] = newdate """, name = 'formatsyslogdate') def test_normalize_csv_pattern_001(self): t1 = Tag(name='date', tagtype = 'Anything', substitute = 'DATE') t2 = Tag(name='id', tagtype = 'Anything', substitute = 'ID') t3 = Tag(name='msg', tagtype = 'Anything', substitute = 'MSG') p_tags = {} for t in (t1, t2, t3): p_tags[t.name] = t p = CSVPattern('test', 'DATE,ID,MSG', tags = p_tags, tagTypes = self.tag_types, genericTagTypes = self.generic_tagTypes) ret = p.normalize('Jul 18 08:55:35,83,"start listening on 127.0.0.1, pam auth started"') self.assertEqual(ret['date'], 'Jul 18 08:55:35') self.assertEqual(ret['id'], '83') self.assertEqual(ret['msg'], 'start listening on 127.0.0.1, pam auth started') def test_normalize_csv_pattern_002(self): t1 = Tag(name='date', tagtype = 'SyslogDate', substitute = 'DATE') t2 = Tag(name='id', tagtype = 'Anything', substitute = 'ID') t3 = Tag(name='msg', tagtype = 'Anything', substitute = 'MSG') p_tags = {} for t in (t1, t2, t3): p_tags[t.name] = t p = CSVPattern('test', 'DATE,ID,MSG', tags = p_tags, tagTypes = self.tag_types, genericTagTypes = self.generic_tagTypes) ret = p.normalize('Jul 18 08:55:35,83,"start listening on 127.0.0.1, pam auth started"') self.assertEqual(ret['date'], 'Jul 18 08:55:35') self.assertEqual(ret['id'], '83') self.assertEqual(ret['msg'], 'start listening on 127.0.0.1, pam auth started') ret = p.normalize('2011 Jul 18 08:55:35,83,"start listening on 127.0.0.1, pam auth started"') self.assertEqual(ret, None) def test_normalize_csv_pattern_003(self): t1 = Tag(name='date', tagtype = 'SyslogDate', substitute = 'DATE', callbacks = ['formatsyslogdate']) t2 = Tag(name='id', tagtype = 'Anything', substitute = 'ID') t3 = Tag(name='msg', tagtype = 'Anything', substitute = 'MSG') p_tags = {} for t in 
(t1, t2, t3): p_tags[t.name] = t p = CSVPattern('test', 'DATE,ID,MSG', tags = p_tags, tagTypes = self.tag_types, callBacks = {self.cb_syslogdate.name:self.cb_syslogdate}, genericTagTypes = self.generic_tagTypes) ret = p.normalize('Jul 18 08:55:35,83,"start listening on 127.0.0.1, pam auth started"') self.assertEqual(ret['date'], datetime(datetime.now().year, 7, 18, 8, 55, 35)) self.assertEqual(ret['id'], '83') self.assertEqual(ret['msg'], 'start listening on 127.0.0.1, pam auth started') def test_normalize_csv_pattern_004(self): t1 = Tag(name='date', tagtype = 'Anything', substitute = 'DATE') t2 = Tag(name='id', tagtype = 'Anything', substitute = 'ID') t3 = Tag(name='msg', tagtype = 'Anything', substitute = 'MSG') p_tags = {} for t in (t1, t2, t3): p_tags[t.name] = t p = CSVPattern('test', ' DATE; ID ;MSG ', separator = ';', quotechar = '=', tags = p_tags, tagTypes = self.tag_types, genericTagTypes = self.generic_tagTypes) ret = p.normalize('Jul 18 08:55:35;83;=start listening on 127.0.0.1; pam auth started=') self.assertEqual(ret['date'], 'Jul 18 08:55:35') self.assertEqual(ret['id'], '83') self.assertEqual(ret['msg'], 'start listening on 127.0.0.1; pam auth started') def test_normalize_csv_pattern_005(self): t1 = Tag(name='date', tagtype = 'Anything', substitute = 'DATE') t2 = Tag(name='id', tagtype = 'Anything', substitute = 'ID') t3 = Tag(name='msg', tagtype = 'Anything', substitute = 'MSG') p_tags = {} for t in (t1, t2, t3): p_tags[t.name] = t p = CSVPattern('test', 'DATE ID MSG', separator = ' ', quotechar = '=', tags = p_tags, tagTypes = self.tag_types, genericTagTypes = self.generic_tagTypes) ret = p.normalize('=Jul 18 08:55:35= 83 =start listening on 127.0.0.1 pam auth started=') self.assertEqual(ret['date'], 'Jul 18 08:55:35') self.assertEqual(ret['id'], '83') self.assertEqual(ret['msg'], 'start listening on 127.0.0.1 pam auth started') def test_normalize_csv_pattern_006(self): t1 = Tag(name='date', tagtype = 'Anything', substitute = 'DATE') t2 = Tag(name='id', tagtype = 'Anything', substitute = 'ID') t3 = Tag(name='msg', tagtype = 'Anything', substitute = 'MSG') p_tags = {} for t in (t1, t2, t3): p_tags[t.name] = t p = CSVPattern('test', 'DATE ID MSG', separator = ' ', quotechar = '=', tags = p_tags, tagTypes = self.tag_types, genericTagTypes = self.generic_tagTypes) # The csv reader's default behaviour is to double the quotechar in order to escape it.
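# For instance, with quotechar '=', the doubled quotechar inside the raw field =start listening on ==127.0.0.1 pam auth started= is read back as a single '=', yielding 'start listening on =127.0.0.1 pam auth started', as the assertions below verify.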
ret = p.normalize('=Jul 18 08:55:35= 83 =start listening on ==127.0.0.1 pam auth started=') self.assertEqual(ret['date'], 'Jul 18 08:55:35') self.assertEqual(ret['id'], '83') self.assertEqual(ret['msg'], 'start listening on =127.0.0.1 pam auth started') class TestCommonElementsPrecedence(unittest.TestCase): """Unit test used to validate that callbacks defined in a normalizer take precedence over common callbacks.""" normalizer_path = os.environ['NORMALIZERS_PATH'] fake_syslog = StringIO(""" Uh Ah mhu@wallix.com Oh Eh \d{1,3} log["TEST"] = "TEST" Hoo Hi MYMAC MYWHATEVER the log's priority urrrh MYMAC the log's date bleeeh MYWHATEVER MMM dd hh:mm:ss 99 HERPA DERP 99 TEST """) n = parse(fake_syslog) def test_00_validate_fake_syslog(self): """Validate the fake normalizer""" dtd = DTD(open(os.path.join(self.normalizer_path, 'normalizer.dtd'))) self.assertTrue(dtd.validate(self.n)) def test_10_common_elements_precedence(self): """Testing callback priority""" normalizer = Normalizer(self.n, os.path.join(self.normalizer_path, 'common_tagTypes.xml'), os.path.join(self.normalizer_path, 'common_callBacks.xml')) self.assertTrue(normalizer.validate()) class TestFinalCallbacks(unittest.TestCase): """Unit test used to validate FinalCallbacks""" normalizer_path = os.environ['NORMALIZERS_PATH'] fake_syslog = StringIO(""" Uh Ah mhu@wallix.com Oh Eh [a-zA-Z] log["toto"] = log["a"] + log["b"] if not value: log["tata"] = log["toto"] * 2 else: log["tata"] = log["toto"] * 3 log['b'] = value * 2 Hoo Hi A B C the log's priority urrrh A the log's date bleeeh B tutu the log's priority urrrh C a b c a bb c abb abbabb toto tata """) n = parse(fake_syslog) def test_00_validate_fake_syslog(self): """Validate the fake normalizer""" dtd = DTD(open(os.path.join(self.normalizer_path, 'normalizer.dtd'))) self.assertTrue(dtd.validate(self.n)) def test_10_final_callbacks(self): """Testing final callbacks""" normalizer = Normalizer(self.n, os.path.join(self.normalizer_path, 'common_tagTypes.xml'), os.path.join(self.normalizer_path, 'common_callBacks.xml')) self.assertTrue(['toto', 'tata'] == normalizer.finalCallbacks) self.assertTrue(normalizer.validate()) if __name__ == "__main__": unittest.main() pylogsparser-0.4/tests/test_suite.py0000644000175000017500000000265011700571003016110 0ustar fbofbo# -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA """ The LogNormalizer needs to be instantiated with the path to the normalizers' XML definitions. The tests expect to find the normalizer path in the NORMALIZERS_PATH environment variable, e.g.:
$ NORMALIZERS_PATH=normalizers/ python tests/test_suite.py """ import unittest import test_normalizer import test_lognormalizer import test_log_samples import test_commonElements tests = (test_commonElements, test_normalizer, test_lognormalizer, test_log_samples, ) load = unittest.defaultTestLoader.loadTestsFromModule suite = unittest.TestSuite(map(load, tests)) unittest.TextTestRunner(verbosity=2).run(suite) pylogsparser-0.4/tests/__init__.py0000644000175000017500000000000011627706151015454 0ustar fbofbopylogsparser-0.4/tests/test_lognormalizer.py0000644000175000017500000001627611700571003017652 0ustar fbofbo# -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # import os import unittest import tempfile import shutil from logsparser.lognormalizer import LogNormalizer from lxml.etree import parse, fromstring as XMLfromstring class Test(unittest.TestCase): """Unit tests for logsparser.lognormalizer""" normalizer_path = os.environ['NORMALIZERS_PATH'] def test_000_invalid_paths(self): """Verify that we cannot instantiate LogNormalizer on invalid paths""" def bleh(paths): n = LogNormalizer(paths) return n self.assertRaises(ValueError, bleh, [self.normalizer_path, "/path/to/nowhere"]) self.assertRaises(ValueError, bleh, ["/path/to/nowhere",]) self.assertRaises(StandardError, bleh, ["/usr/bin/",]) def test_001_all_normalizers_activated(self): """ Verify that all normalizers are activated when we instantiate LogNormalizer with an empty activation dict. """ ln = LogNormalizer(self.normalizer_path) self.assertTrue(len(ln)) self.assertEqual(len([an[0] for an in ln.get_active_normalizers() if an[1]]), len(ln)) self.assertEqual(len(ln._cache), len(ln)) def test_002_deactivate_normalizer(self): """ Verify that normalizer deactivation is working. """ ln = LogNormalizer(self.normalizer_path) active_n = ln.get_active_normalizers() to_deactivate = active_n.keys()[:2] for to_d in to_deactivate: del active_n[to_d] ln.set_active_normalizers(active_n) ln.reload() self.assertEqual(len([an[0] for an in ln.get_active_normalizers().items() if an[1]]), len(ln)-2) self.assertEqual(len(ln._cache), len(ln)-2) def test_003_activate_normalizer(self): """ Verify that normalizer activation is working.
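A normalizer is first deactivated and then reactivated; both the active set and the internal cache must be restored to their full size.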
""" ln = LogNormalizer(self.normalizer_path) active_n = ln.get_active_normalizers() to_deactivate = active_n.keys()[0] to_activate = to_deactivate del active_n[to_deactivate] ln.set_active_normalizers(active_n) ln.reload() # now deactivation should be done so reactivate active_n[to_activate] = True ln.set_active_normalizers(active_n) ln.reload() self.assertEqual(len([an[0] for an in ln.get_active_normalizers() if an[1]]), len(ln)) self.assertEqual(len(ln._cache), len(ln)) def test_004_normalizer_uuid(self): """ Verify that we get at least uuid tag """ testlog = {'raw': 'a minimal log line'} ln = LogNormalizer(self.normalizer_path) ln.lognormalize(testlog) self.assertTrue('uuid' in testlog.keys()) def test_005_normalizer_test_a_syslog_log(self): """ Verify that lognormalizer extracts syslog header as tags """ testlog = {'raw': 'Jul 18 08:55:35 naruto app[3245]: body message'} ln = LogNormalizer(self.normalizer_path) ln.lognormalize(testlog) self.assertTrue('uuid' in testlog.keys()) self.assertTrue('date' in testlog.keys()) self.assertEqual(testlog['body'], 'body message') self.assertEqual(testlog['program'], 'app') self.assertEqual(testlog['pid'], '3245') def test_006_normalizer_test_a_syslog_log_with_syslog_deactivate(self): """ Verify that lognormalizer does not extract syslog header as tags when syslog normalizer is deactivated. """ testlog = {'raw': 'Jul 18 08:55:35 naruto app[3245]: body message'} ln = LogNormalizer(self.normalizer_path) active_n = ln.get_active_normalizers() to_deactivate = [n for n in active_n.keys() if n.find('syslog') >= 0] for n in to_deactivate: del active_n[n] ln.set_active_normalizers(active_n) ln.reload() ln.lognormalize(testlog) self.assertTrue('uuid' in testlog.keys()) self.assertFalse('date' in testlog.keys()) self.assertFalse('program' in testlog.keys()) def test_007_normalizer_getsource(self): """ Verify we can retreive XML source of a normalizer. """ ln = LogNormalizer(self.normalizer_path) source = ln.get_normalizer_source('syslog-0.99') self.assertEquals(XMLfromstring(source).getroottree().getroot().get('name'), 'syslog') def test_008_normalizer_multiple_paths(self): """ Verify we can can deal with multiple normalizer paths. """ fdir = tempfile.mkdtemp() sdir = tempfile.mkdtemp() for f in os.listdir(self.normalizer_path): path_f = os.path.join(self.normalizer_path, f) if os.path.isfile(path_f): shutil.copyfile(path_f, os.path.join(fdir, f)) shutil.move(os.path.join(fdir, 'postfix.xml'), os.path.join(sdir, 'postfix.xml')) ln = LogNormalizer([fdir, sdir]) source = ln.get_normalizer_source('postfix-0.99') self.assertEquals(XMLfromstring(source).getroottree().getroot().get('name'), 'postfix') self.assertTrue(ln.get_normalizer_path('postfix-0.99').startswith(sdir)) self.assertTrue(ln.get_normalizer_path('syslog-0.99').startswith(fdir)) xml_src = ln.get_normalizer_source('syslog-0.99') os.unlink(os.path.join(fdir, 'syslog.xml')) ln.reload() self.assertRaises(ValueError, ln.get_normalizer_path, 'syslog-0.99') ln.update_normalizer(xml_src, dir_path = sdir) self.assertTrue(ln.get_normalizer_path('syslog-0.99').startswith(sdir)) shutil.rmtree(fdir) shutil.rmtree(sdir) def test_009_normalizer_multiple_version(self): """ Verify we can can deal with a normalizer with more than one version. 
""" fdir = tempfile.mkdtemp() shutil.copyfile(os.path.join(self.normalizer_path, 'postfix.xml'), os.path.join(fdir, 'postfix.xml')) # Change normalizer version in fdir path xml = parse(os.path.join(fdir, 'postfix.xml')) xmln = xml.getroot() xmln.set('version', '1.0') xml.write(os.path.join(fdir, 'postfix.xml')) ln = LogNormalizer([self.normalizer_path, fdir]) self.assertEquals(XMLfromstring(ln.get_normalizer_source('postfix-0.99')).getroottree().getroot().get('version'), '0.99') self.assertEquals(XMLfromstring(ln.get_normalizer_source('postfix-1.0')).getroottree().getroot().get('version'), '1.0') shutil.rmtree(fdir) if __name__ == "__main__": unittest.main() pylogsparser-0.4/tests/test_log_samples.py0000644000175000017500000013005011710522056017263 0ustar fbofbo# -*- python -*- # -*- coding: utf-8 -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # """Testing that normalization work as excepted Here you can add samples logs to test existing or new normalizers. In addition to examples validation defined in each normalizer xml definition you should add validation tests here. In this test all normalizer definitions are loaded and therefore it is useful to detect normalization conflicts. 
""" import os import unittest from datetime import datetime from logsparser import lognormalizer normalizer_path = os.environ['NORMALIZERS_PATH'] ln = lognormalizer.LogNormalizer(normalizer_path) class Test(unittest.TestCase): def aS(self, log, subset, notexpected = ()): """Assert that the result of normalization of a given line log has the given subset.""" data = {'raw' : log, 'body' : log} ln.lognormalize(data) for key in subset: self.assertEqual(data[key], subset[key]) for key in notexpected: self.assertFalse(key in data.keys()) def test_simple_syslog(self): """Test syslog logs""" now = datetime.now() self.aS("<40>%s neo kernel: tun_wallix: Disabled Privacy Extensions" % now.strftime("%b %d %H:%M:%S"), {'body': 'tun_wallix: Disabled Privacy Extensions', 'severity': 'emerg', 'severity_code' : '0', 'facility': 'syslog', 'facility_code' : '5', 'source': 'neo', 'program': 'kernel', 'date': now.replace(microsecond=0)}) self.aS("<40>%s fbo sSMTP[8847]: Cannot open mail:25" % now.strftime("%b %d %H:%M:%S"), {'body': 'Cannot open mail:25', 'severity': 'emerg', 'severity_code' : '0', 'facility': 'syslog', 'facility_code' : '5', 'source': 'fbo', 'program': 'sSMTP', 'pid': '8847', 'date': now.replace(microsecond=0)}) self.aS("%s fbo sSMTP[8847]: Cannot open mail:25" % now.strftime("%b %d %H:%M:%S"), {'body': 'Cannot open mail:25', 'source': 'fbo', 'program': 'sSMTP', 'pid': '8847', 'date': now.replace(microsecond=0)}) now = now.replace(month=now.month%12+1, day=1) self.aS("<40>%s neo kernel: tun_wallix: Disabled Privacy Extensions" % now.strftime("%b %d %H:%M:%S"), {'date': now.replace(microsecond=0, year=now.year-1), 'body': 'tun_wallix: Disabled Privacy Extensions', 'severity': 'emerg', 'severity_code' : '0', 'facility': 'syslog', 'facility_code' : '5', 'source': 'neo', 'program': 'kernel' }) def test_postfix(self): """Test postfix logs""" self.aS("<40>Dec 21 07:49:02 hosting03 postfix/cleanup[23416]: 2BD731B4017: message-id=<20071221073237.5244419B327@paris.office.wallix.com>", {'program': 'postfix', 'component': 'cleanup', 'queue_id': '2BD731B4017', 'pid': '23416', 'message_id': '20071221073237.5244419B327@paris.office.wallix.com'}) # self.aS("<40>Dec 21 07:49:01 hosting03 postfix/anvil[32717]: statistics: max connection rate 2/60s for (smtp:64.14.54.229) at Dec 21 07:40:04", # {'program': 'postfix', # 'component': 'anvil', # 'pid': '32717'}) # self.aS("<40>Dec 21 07:49:01 hosting03 postfix/pipe[23417]: 1E83E1B4017: to=, relay=vmail, delay=0.13, delays=0.11/0/0/0.02, dsn=2.0.0, status=sent (delivered via vmail service)", {'program': 'postfix', 'component': 'pipe', 'queue_id': '1E83E1B4017', 'message_recipient': 'gloubi@wallix.com', 'relay': 'vmail', 'dest_host': 'vmail', 'status': 'sent'}) self.aS("<40>Dec 21 07:49:04 hosting03 postfix/smtpd[23446]: C43971B4019: client=paris.office.wallix.com[82.238.42.70]", {'program': 'postfix', 'component': 'smtpd', 'queue_id': 'C43971B4019', 'client': 'paris.office.wallix.com[82.238.42.70]', 'source_host': 'paris.office.wallix.com', 'source_ip': '82.238.42.70'}) # self.aS("<40>Dec 21 07:52:56 hosting03 postfix/smtpd[23485]: connect from mail.gloubi.com[65.45.12.22]", # {'program': 'postfix', # 'component': 'smtpd', # 'ip': '65.45.12.22'}) self.aS("<40>Dec 21 08:42:17 hosting03 postfix/pipe[26065]: CEFFB1B4020: to=, orig_to=, relay=vacation, delay=4.1, delays=4/0/0/0.08, dsn=2.0.0, status=sent (delivered via vacation service)", {'program': 'postfix', 'component': 'pipe', 'message_recipient': 'gloubi@wallix.com@autoreply.wallix.com', 'orig_to': 
'gloubi@wallix.com', 'relay': 'vacation', 'dest_host': 'vacation', 'status': 'sent'}) def test_squid(self): """Test squid logs""" self.aS("<40>Dec 21 07:49:02 hosting03 squid[54]: 1196341497.777 784 127.0.0.1 TCP_MISS/200 106251 GET http://fr.yahoo.com/ vbe DIRECT/217.146.186.51 text/html", { 'program': 'squid', 'date': datetime(2007, 11, 29, 13, 4, 57, 777000), 'elapsed': '784', 'source_ip': '127.0.0.1', 'event_id': 'TCP_MISS', 'status': '200', 'len': '106251', 'method': 'GET', 'url': 'http://fr.yahoo.com/', 'user': 'vbe' }) self.aS("<40>Dec 21 07:49:02 hosting03 : 1196341497.777 784 127.0.0.1 TCP_MISS/404 106251 GET http://fr.yahoo.com/gjkgf/gfgff/ - DIRECT/217.146.186.51 text/html", { 'program': 'squid', 'date': datetime(2007, 11, 29, 13, 4, 57, 777000), 'elapsed': '784', 'source_ip': '127.0.0.1', 'event_id': 'TCP_MISS', 'status': '404', 'len': '106251', 'method': 'GET', 'url': 'http://fr.yahoo.com/gjkgf/gfgff/' }) self.aS("Oct 22 01:27:16 pluto squid: 1259845087.188 10 82.238.42.70 TCP_MISS/200 13121 GET http://ak.bluestreak.com//adv/sig/%5E16238/%5E7451318/VABT.swf?url_download=&width=300&height=250&vidw=300&vidh=250&startbbanner=http://ak.bluestreak.com//adv/sig/%5E16238/%5E7451318/vdo_300x250_in.swf&endbanner=http://ak.bluestreak.com//adv/sig/%5E16238/%5E7451318/vdo_300x250_out.swf&video_hd=http://aak.bluestreak.com//adv/sig/%5E16238/%5E7451318/vdo_300x250_hd.flv&video_md=http://ak.bluestreak.com//adv/sig/%5E16238/%5E7451318/vdo_300x250_md.flv&video_bd=http://ak.bluestreak.comm//adv/sig/%5E16238/%5E7451318/vdo_300x250_bd.flv&url_tracer=http%3A//s0b.bluestreak.com/ix.e%3Fpx%26s%3D8008666%26a%3D7451318%26t%3D&start=2&duration1=3&duration2=4&duration3=5&durration4=6&duration5=7&end=8&hd=9&md=10&bd=11&gif=12&hover1=13&hover2=14&hover3=15&hover4=16&hover5=17&hover6=18&replay=19&sound_state=off&debug=0&playback_controls=off&tracking_objeect=tracking_object_8008666&url=javascript:bluestreak8008666_clic();&rnd=346.2680651591202 fbo DIRECT/92.123.65.129 application/x-shockwave-flash", {'program' : "squid", 'date' : datetime.utcfromtimestamp(float(1259845087.188)), 'elapsed' : "10", 'source_ip' : "82.238.42.70", 'event_id' : "TCP_MISS", 'status' : "200", 'len' : "13121", 'method' : "GET", 'user' : "fbo", 'peer_status' : "DIRECT", 'peer_host' : "92.123.65.129", 'mime_type' : "application/x-shockwave-flash", 'url' : "http://ak.bluestreak.com//adv/sig/%5E16238/%5E7451318/VABT.swf?url_download=&width=300&height=250&vidw=300&vidh=250&startbbanner=http://ak.bluestreak.com//adv/sig/%5E16238/%5E7451318/vdo_300x250_in.swf&endbanner=http://ak.bluestreak.com//adv/sig/%5E16238/%5E7451318/vdo_300x250_out.swf&video_hd=http://aak.bluestreak.com//adv/sig/%5E16238/%5E7451318/vdo_300x250_hd.flv&video_md=http://ak.bluestreak.com//adv/sig/%5E16238/%5E7451318/vdo_300x250_md.flv&video_bd=http://ak.bluestreak.comm//adv/sig/%5E16238/%5E7451318/vdo_300x250_bd.flv&url_tracer=http%3A//s0b.bluestreak.com/ix.e%3Fpx%26s%3D8008666%26a%3D7451318%26t%3D&start=2&duration1=3&duration2=4&duration3=5&durration4=6&duration5=7&end=8&hd=9&md=10&bd=11&gif=12&hover1=13&hover2=14&hover3=15&hover4=16&hover5=17&hover6=18&replay=19&sound_state=off&debug=0&playback_controls=off&tracking_objeect=tracking_object_8008666&url=javascript:bluestreak8008666_clic();&rnd=346.2680651591202"}) def test_netfilter(self): """Test netfilter logs""" self.aS("<40>Dec 26 09:30:07 dedibox kernel: FROM_INTERNET_DENY IN=eth0 OUT= MAC=00:40:63:e7:b2:17:00:15:fa:80:47:3f:08:00 SRC=88.252.4.37 DST=88.191.34.16 LEN=48 TOS=0x00 PREC=0x00 TTL=117 ID=56818 DF 
PROTO=TCP SPT=1184 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0", { 'program': 'netfilter', 'inbound_int': 'eth0', 'dest_mac': '00:40:63:e7:b2:17', 'source_mac': '00:15:fa:80:47:3f', 'source_ip': '88.252.4.37', 'dest_ip': '88.191.34.16', 'len': '48', 'protocol': 'TCP', 'source_port': '1184', 'prefix': 'FROM_INTERNET_DENY', 'dest_port': '445' }) self.aS("<40>Dec 26 08:45:23 dedibox kernel: TO_INTERNET_DENY IN=vif2.0 OUT=eth0 SRC=10.116.128.6 DST=82.225.197.239 LEN=121 TOS=0x00 PREC=0x00 TTL=63 ID=15592 DF PROTO=TCP SPT=993 DPT=56248 WINDOW=4006 RES=0x00 ACK PSH FIN URGP=0 ", { 'program': 'netfilter', 'inbound_int': 'vif2.0', 'outbound_int': 'eth0', 'source_ip': '10.116.128.6', 'dest_ip': '82.225.197.239', 'len': '121', 'protocol': 'TCP', 'source_port': '993', 'dest_port': '56248' }) # One malformed log self.aS("<40>Dec 26 08:45:23 dedibox kernel: TO_INTERNET_DENY IN=vif2.0 OUT=eth0 DST=82.225.197.239 LEN=121 TOS=0x00 PREC=0x00 TTL=63 ID=15592 DF PROTO=TCP SPT=993 DPT=56248 WINDOW=4006 RES=0x00 ACK PSH FIN URGP=0 ", { 'program': 'kernel' }, ('inbound_int', 'len')) self.aS("Sep 28 15:19:59 tulipe-input kernel: [1655854.311830] DROPPED: IN=eth0 OUT= MAC=32:42:cd:02:72:30:00:23:7d:c6:35:e6:08:00 SRC=10.10.4.7 DST=10.10.4.86 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=20805 DF PROTO=TCP SPT=34259 DPT=111 WINDOW=5840 RES=0x00 SYN URGP=0", {'program': 'netfilter', 'inbound_int' : "eth0", 'source_ip' : "10.10.4.7", 'dest_ip' : "10.10.4.86", 'len' : "60", 'protocol' : 'TCP', 'source_port' : '34259', 'dest_port' : '111', 'dest_mac' : '32:42:cd:02:72:30', 'source_mac' : '00:23:7d:c6:35:e6', 'prefix' : '[1655854.311830] DROPPED:' }) def test_dhcpd(self): """Test DHCPd log normalization""" self.aS("<40>Dec 25 15:00:15 gnaganok dhcpd: DHCPDISCOVER from 02:1c:25:a3:32:76 via 183.213.184.122", { 'program': 'dhcpd', 'action': 'DISCOVER', 'source_mac': '02:1c:25:a3:32:76', 'via': '183.213.184.122' }) self.aS("<40>Dec 25 15:00:15 gnaganok dhcpd: DHCPDISCOVER from 02:1c:25:a3:32:76 via vlan18.5", { 'program': 'dhcpd', 'action': 'DISCOVER', 'source_mac': '02:1c:25:a3:32:76', 'via': 'vlan18.5' }) for log in [ "DHCPOFFER on 183.231.184.122 to 00:13:ec:1c:06:5b via 183.213.184.122", "DHCPREQUEST for 183.231.184.122 from 00:13:ec:1c:06:5b via 183.213.184.122", "DHCPACK on 183.231.184.122 to 00:13:ec:1c:06:5b via 183.213.184.122", "DHCPNACK on 183.231.184.122 to 00:13:ec:1c:06:5b via 183.213.184.122", "DHCPDECLINE of 183.231.184.122 from 00:13:ec:1c:06:5b via 183.213.184.122 (bla)", "DHCPRELEASE of 183.231.184.122 from 00:13:ec:1c:06:5b via 183.213.184.122 for nonexistent lease" ]: self.aS("<40>Dec 25 15:00:15 gnaganok dhcpd: %s" % log, { 'program': 'dhcpd', 'source_ip': '183.231.184.122', 'source_mac': '00:13:ec:1c:06:5b', 'via': '183.213.184.122' }) self.aS("<40>Dec 25 15:00:15 gnaganok dhcpd: DHCPINFORM from 183.231.184.122", { 'program': 'dhcpd', 'source_ip': '183.231.184.122', 'action': 'INFORM' }) def test_sshd(self): """Test SSHd normalization""" self.aS("<40>Dec 26 10:32:40 naruto sshd[2274]: Failed password for bernat from 127.0.0.1 port 37234 ssh2", { 'program': 'sshd', 'action': 'fail', 'user': 'bernat', 'method': 'password', 'source_ip': '127.0.0.1' }) self.aS("<40>Dec 26 10:32:40 naruto sshd[2274]: Failed password for invalid user jfdghfg from 127.0.0.1 port 37234 ssh2", { 'program': 'sshd', 'action': 'fail', 'user': 'jfdghfg', 'method': 'password', 'source_ip': '127.0.0.1' }) self.aS("<40>Dec 26 10:32:40 naruto sshd[2274]: Failed none for invalid user kgjfk from 127.0.0.1 port 37233 ssh2", { 'program': 'sshd', 
'action': 'fail', 'user': 'kgjfk', 'method': 'none', 'source_ip': '127.0.0.1' }) self.aS("<40>Dec 26 10:32:40 naruto sshd[2274]: Accepted password for bernat from 127.0.0.1 port 37234 ssh2", { 'program': 'sshd', 'action': 'accept', 'user': 'bernat', 'method': 'password', 'source_ip': '127.0.0.1' }) self.aS("<40>Dec 26 10:32:40 naruto sshd[2274]: Accepted publickey for bernat from 192.168.251.2 port 60429 ssh2", { 'program': 'sshd', 'action': 'accept', 'user': 'bernat', 'method': 'publickey', 'source_ip': '192.168.251.2' }) # See http://www.ossec.net/en/attacking-loganalysis.html self.aS("<40>Dec 26 10:32:40 naruto sshd[2274]: Failed password for invalid user myfakeuser from 10.1.1.1 port 123 ssh2 from 192.168.50.65 port 34813 ssh2", { 'program': 'sshd', 'action': 'fail', 'user': 'myfakeuser from 10.1.1.1 port 123 ssh2', 'method': 'password', 'source_ip': '192.168.50.65' }) # self.aS("Aug 1 18:30:05 knight sshd[20439]: Illegal user guest from 218.49.183.17", # {'program': 'sshd', # 'source' : 'knight', # 'user' : 'guest', # 'source_ip': '218.49.183.17', # 'body' : 'Illegal user guest from 218.49.183.17', # }) def test_pam(self): """Test PAM normalization""" self.aS("<40>Dec 26 10:32:25 s_all@naruto sshd[2263]: pam_unix(ssh:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=localhost user=bernat", { 'program': 'ssh', 'component': 'pam_unix', 'type': 'auth', 'user': 'bernat' }) self.aS("<40>Dec 26 10:09:01 s_all@naruto CRON[2030]: pam_unix(cron:session): session opened for user root by (uid=0)", { 'program': 'cron', 'component': 'pam_unix', 'type': 'session', 'user': 'root' }) self.aS("<40>Dec 26 10:32:25 s_all@naruto sshd[2263]: pam_unix(ssh:auth): check pass; user unknown", { 'program': 'ssh', 'component': 'pam_unix', 'type': 'auth' }) # This one should be better handled self.aS("<40>Dec 26 10:32:25 s_all@naruto sshd[2263]: pam_unix(ssh:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=localhost", { 'program': 'ssh', 'component': 'pam_unix', 'type': 'auth' }) def test_lea(self): """Test LEA normalization""" self.aS("Oct 22 01:27:16 pluto lea: loc=7803|time=1199716450|action=accept|orig=fw1|i/f_dir=inbound|i/f_name=PCnet1|has_accounting=0|uuid=<47823861,00000253,7b040a0a,000007b6>|product=VPN-1 & FireWall-1|__policy_id_tag=product=VPN-1 & FireWall-1[db_tag={9F95C344-FE3F-4E3E-ACD8-60B5194BAAB4};mgmt=fw1;date=1199701916;policy_name=Standard]|src=naruto|s_port=36973|dst=fw1|service=941|proto=tcp|rule=1", {'program' : 'lea', 'id' : "7803", 'action' : "accept", 'source_host' : "naruto", 'source_port' : "36973", 'dest_host' : "fw1", 'dest_port' : "941", 'protocol' : "tcp", 'product' : "VPN-1 & FireWall-1", 'inbound_int' : "PCnet1"}) def test_apache(self): """Test Apache normalization""" # Test Common Log Format (CLF) "%h %l %u %t \"%r\" %>s %O" self.aS("""127.0.0.1 - - [20/Jul/2009:00:29:39 +0300] "GET /index/helper/test HTTP/1.1" 200 889""", {'program' : "apache", 'source_ip' : "127.0.0.1", 'request' : 'GET /index/helper/test HTTP/1.1', 'len' : "889", 'date' : datetime(2009, 7, 20, 0, 29, 39), 'body' : '127.0.0.1 - - [20/Jul/2009:00:29:39 +0300] "GET /index/helper/test HTTP/1.1" 200 889'}) # Test "combined" log format "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" self.aS('10.10.4.4 - - [04/Dec/2009:16:23:13 +0100] "GET /tulipe.core.persistent.persistent-module.html HTTP/1.1" 200 2937 "http://10.10.4.86/toc.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090910 Ubuntu/9.04 (jaunty) Shiretoko/3.5.3"', {'program' : 
"apache", 'source_ip' : "10.10.4.4", 'source_logname' : "-", 'user' : "-", 'date' : datetime(2009, 12, 4, 16, 23, 13), 'request' : 'GET /tulipe.core.persistent.persistent-module.html HTTP/1.1', 'status' : "200", 'len' : "2937", 'request_header_referer_contents' : "http://10.10.4.86/toc.html", 'useragent' : "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090910 Ubuntu/9.04 (jaunty) Shiretoko/3.5.3", 'body' : '10.10.4.4 - - [04/Dec/2009:16:23:13 +0100] "GET /tulipe.core.persistent.persistent-module.html HTTP/1.1" 200 2937 "http://10.10.4.86/toc.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090910 Ubuntu/9.04 (jaunty) Shiretoko/3.5.3"'}) # Test "vhost_combined" log format "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" #TODO: Update apache normalizer to handle this format. def test_bind9(self): """Test Bind9 normalization""" self.aS("Oct 22 01:27:16 pluto named: client 192.168.198.130#4532: bad zone transfer request: 'www.abc.com/IN': non-authoritative zone (NOTAUTH)", {'event_id' : "zone_transfer_bad", 'zone' : "www.abc.com", 'source_ip' : '192.168.198.130', 'class' : 'IN', 'program' : 'named'}) self.aS("Oct 22 01:27:16 pluto named: general: notice: client 10.10.4.4#39583: query: tpf.qa.ifr.lan IN SOA +", {'event_id' : "client_query", 'domain' : "tpf.qa.ifr.lan", 'category' : "general", 'severity' : "notice", 'class' : "IN", 'source_ip' : "10.10.4.4", 'program' : 'named'}) self.aS("Oct 22 01:27:16 pluto named: createfetch: 126.92.194.77.zen.spamhaus.org A", {'event_id' : "fetch_request", 'domain' : "126.92.194.77.zen.spamhaus.org", 'program' : 'named'}) def test_symantec8(self): """Test Symantec version 8 normalization""" self.aS("""200A13080122,23,2,8,TRAVEL00,SYSTEM,,,,,,,16777216,"Symantec AntiVirus Realtime Protection Loaded.",0,,0,,,,,0,,,,,,,,,,SAMPLE_COMPUTER,,,,Parent,GROUP,,8.0.93330""", {"program" : "symantec", "date" : datetime(2002, 11, 19, 8, 1, 34), "category" : "Summary", "local_host" : "TRAVEL00", "domain_name" : "GROUP", "event_logger_type" : "System", "event_id" : "GL_EVENT_RTS_LOAD", "eventblock_action" : "EB_LOG", "group_id" : "0", "operation_flags" : "0", "parent" : "SAMPLE_COMPUTER", "scan_id" : "0", "server_group" : "Parent", "user" : "SYSTEM", "version" : "8.0.93330"}) # Need to find real symantec version 9 log lines def test_symantec9(self): """Test Symantec version 9 normalization""" self.aS("""200A13080122,23,2,8,TRAVEL00,SYSTEM,,,,,,,16777216,"Symantec AntiVirus Realtime Protection Loaded.",0,,0,,,,,0,,,,,,,,,,SAMPLE_COMPUTER,,,,Parent,GROUP,,9.0.93330,,,,,,,,,,,,,,,,,,,,""", {"program" : "symantec", "date" : datetime(2002, 11, 19, 8, 1, 34), "category" : "Summary", "local_host" : "TRAVEL00", "domain_name" : "GROUP", "event_logger_type" : "System", "event_id" : "GL_EVENT_RTS_LOAD", "eventblock_action" : "EB_LOG", "group_id" : "0", "operation_flags" : "0", "parent" : "SAMPLE_COMPUTER", "scan_id" : "0", "server_group" : "Parent", "user" : "SYSTEM", "version" : "9.0.93330"}) def test_arkoonFAST360(self): """Test Arkoon FAST360 normalization""" self.aS('AKLOG-id=firewall time="2004-02-25 17:38:57" fw=myArkoon aktype=IP gmtime=1077727137 ip_log_type=ENDCONN src=10.10.192.61 dst=10.10.192.255 proto="137/udp" protocol=17 port_src=137 port_dest=137 intf_in=eth0 intf_out= pkt_len=78 nat=NO snat_addr=0 snat_port=0 dnat_addr=0 dnat_port=0 user="userName" pri=3 rule="myRule" action=DENY reason="Blocked by filter" description="dst addr received from Internet is private"', {"program" : "arkoon", "date" : datetime(2004, 02, 
25, 16, 38, 57), "event_id" : "IP", "priority" : "3", "local_host" : "myArkoon", "user" : "userName", "protocol": "udp", "dest_ip" : "10.10.192.255", "source_ip" : "10.10.192.61", "reason" : "Blocked by filter", "ip_log_type" : "ENDCONN", "body" : 'id=firewall time="2004-02-25 17:38:57" fw=myArkoon aktype=IP gmtime=1077727137 ip_log_type=ENDCONN src=10.10.192.61 dst=10.10.192.255 proto="137/udp" protocol=17 port_src=137 port_dest=137 intf_in=eth0 intf_out= pkt_len=78 nat=NO snat_addr=0 snat_port=0 dnat_addr=0 dnat_port=0 user="userName" pri=3 rule="myRule" action=DENY reason="Blocked by filter" description="dst addr received from Internet is private"'}) # Assuming this kind of log with syslog like header is typically sent over the wire. self.aS('<134>IP-Logs: AKLOG - id=firewall time="2010-10-04 10:38:37" gmtime=1286181517 fw=doberman.jurassic.ta aktype=IP ip_log_type=NEWCONN src=172.10.10.107 dst=204.13.8.181 proto="http" protocol=6 port_src=2619 port_dest=80 intf_in=eth7 intf_out=eth2 pkt_len=48 nat=HIDE snat_addr=10.10.10.199 snat_port=16176 dnat_addr=0 dnat_port=0 tcp_seq=1113958286 tcp_ack=0 tcp_flags="SYN" user="" vpn-src="" pri=6 rule="surf_normal" action=ACCEPT', {'program': 'arkoon', 'event_id': 'IP', 'rule': 'surf_normal', 'ip_log_type': 'NEWCONN'}) # This one must not match the arkoonFAST360 parser # Assuming this king of log does not exist self.aS('<40>Dec 21 08:42:17 hosting arkoon: <134>IP-Logs: AKLOG - id=firewall time="2010-10-04 10:38:37" gmtime=1286181517 fw=doberman.jurassic.ta aktype=IP ip_log_type=NEWCONN src=172.10.10.107 dst=204.13.8.181 proto="http" protocol=6 port_src=2619 port_dest=80 intf_in=eth7 intf_out=eth2 pkt_len=48 nat=HIDE snat_addr=10.10.10.199 snat_port=16176 dnat_addr=0 dnat_port=0 tcp_seq=1113958286 tcp_ack=0 tcp_flags="SYN" user="" vpn-src="" pri=6 rule="surf_normal" action=ACCEPT', {'program': 'arkoon'}, # program is set by syslog parser ('event_id', 'rule', 'ip_log_type')) def test_MSExchange2007MTL(self): """Test Exchange 2007 message tracking log normalization""" self.aS("""2010-04-19T12:29:07.390Z,10.10.14.73,WIN2K3DC,,WIN2K3DC,"MDB:ada3d2c3-6f32-45db-b1ee-a68dbcc86664, Mailbox:68cf09c1-1344-4639-b013-3c6f8a588504, Event:1440, MessageClass:IPM.Note, CreationTime:2010-04-19T12:28:51.312Z, ClientType:User",,STOREDRIVER,SUBMIT,,,,,,,,,Coucou !,user7@qa.ifr.lan,,""", {'mdb': 'ada3d2c3-6f32-45db-b1ee-a68dbcc86664', 'source_host': 'WIN2K3DC', 'source_ip': '10.10.14.73', 'client_type': 'User', 'creation_time': 'Mon Apr 19 12:28:51 2010', 'date': datetime(2010, 4, 19, 12, 29, 7, 390000), 'event': '1440', 'event_id': 'SUBMIT', 'exchange_source': 'STOREDRIVER', 'mailbox': '68cf09c1-1344-4639-b013-3c6f8a588504', 'message_class': 'IPM.Note', 'message_id': 'C6539E897AEDFA469FE34D029FB708D43495@win2k3dc.qa.ifr.lan', 'message_subject': 'Coucou !', 'program': 'MS Exchange 2007 Message Tracking', 'dest_host': 'WIN2K3DC'}) def test_S3(self): """Test Amazon S3 bucket log normalization""" self.aS("""DEADBEEF testbucket [19/Jul/2011:13:17:11 +0000] 10.194.22.16 FACEDEAD CAFEDECA REST.GET.ACL - "GET /?acl HTTP/1.1" 200 - 951 - 397 - "-" "Jakarta Commons-HttpClient/3.0" -""", {'source_ip': '10.194.22.16', 'http_method': 'GET', 'protocol': 'HTTP/1.1', 'status': '200', 'user': 'DEADBEEF', 'method': 'REST.GET.ACL', 'program': 's3'}) def test_Snare(self): """Test Snare for Windows log normalization""" self.aS(unicode("""<13> Aug 31 15:46:47 a-zA-Z0-9_ MSWinEventLog 1 System 287 ven. 
août 26 16:45:45 201 4 Virtual Disk Service Constantin N/A Information a-zA-Z0-9_ None Le service s’est arrêté. 119 """, 'utf8'), {'snare_event_log_type': 'MSWinEventLog', 'criticality': '1', 'event_log_source_name': 'System', 'snare_event_counter': '287', 'event_id': '4', 'event_log_expanded_source_name': 'Virtual Disk Service', 'user': 'Constantin', 'sid_used': 'N/A', 'event_type': 'Information', 'source_host': 'a-zA-Z0-9_', 'audit_event_category': 'None', 'program' : 'EventLog', 'body': unicode('Le service s’est arrêté. 119 ', 'utf8')}) self.aS(unicode("""<13> Aug 31 15:46:47 a-zA-Z0-9_ MSWinEventLog 0 Security 284 ven. août 26 16:42:01 201 4689 Microsoft-Windows-Security-Auditing A-ZA-Z0-9_\\clo N/A Success Audit a-zA-Z0-9_ Fin du processus Un processus est terminé. Sujet : ID de sécurité : S-1-5-21-2423214773-420032381-3839276281-1000 Nom du compte : clo Domaine du compte : A-ZA-Z0-9_ ID d’ouverture de session : 0x21211 Informations sur le processus : ID du processus : 0xb4c Nom du processus : C:\\Windows\\System32\\taskeng.exe État de fin : 0x0 138 """, 'utf8'), {'snare_event_log_type': 'MSWinEventLog', 'criticality': '0', 'event_log_source_name': 'Security', 'snare_event_counter': '284', 'event_id': '4689', 'event_log_expanded_source_name': 'Microsoft-Windows-Security-Auditing', 'user': 'A-ZA-Z0-9_\\clo', 'sid_used': 'N/A', 'event_type': 'Success Audit', 'source_host': 'a-zA-Z0-9_', 'audit_event_category': 'Fin du processus', 'program' : "EventLog", 'body': unicode('Un processus est terminé. Sujet : ID de sécurité : S-1-5-21-2423214773-420032381-3839276281-1000 Nom du compte : clo Domaine du compte : A-ZA-Z0-9_ ID d’ouverture de session : 0x21211 Informations sur le processus : ID du processus : 0xb4c Nom du processus : C:\\Windows\\System32\\taskeng.exe État de fin : 0x0 138 ', 'utf8')}) def test_vmwareESX4_ESXi4(self): """Test VMware ESX 4.x and VMware ESXi 4.x log normalization""" self.aS("""[2011-09-05 16:06:30.016 F4CD1B90 verbose 'Locale' opID=996867CC-000002A6] Default resource used for 'host.SystemIdentificationInfo.IdentifierType.ServiceTag.summary' expected in module 'enum'.""", {'date': datetime(2011, 9, 5, 16, 6, 30), 'numeric': 'F4CD1B90', 'level': 'verbose', 'alpha': 'Locale', 'body': 'Default resource used for \'host.SystemIdentificationInfo.IdentifierType.ServiceTag.summary\' expected in module \'enum\'.'}) self.aS("""sysboot: Executing 'kill -TERM 314'""", {'body': 'Executing \'kill -TERM 314\''}) # def test_mysql(self): # """Test mysql log normalization""" # self.aS("""110923 11:04:58 36 Query show databases""", # {'date': datetime(2011, 9, 23, 11, 4, 58), # 'id': '36', # 'type': 'Query', # 'event': 'show databases'}) # self.aS("""110923 10:09:11 [Note] Plugin 'FEDERATED' is disabled.""", # {'date': datetime(2011, 9, 23, 10, 9, 11), # 'component': 'Note', # 'event': 'Plugin \'FEDERATED\' is disabled.'}) def test_IIS(self): """Test IIS log normalization""" self.aS("""172.16.255.255, anonymous, 03/20/01, 23:58:11, MSFTPSVC, SALES1, 172.16.255.255, 60, 275, 0, 0, 0, PASS, /Intro.htm, -,""", {'source_ip': '172.16.255.255', 'user': 'anonymous', 'date': datetime(2001, 3, 20, 23, 58, 11), 'service': 'MSFTPSVC', 'dest_host': 'SALES1', 'dest_ip': '172.16.255.255', 'time_taken': 0.06, 'sent_bytes_number': '275', 'returned_bytes_number': '0', 'status': '0', 'windows_status_code': '0', 'method': 'PASS', 'url_path': '/Intro.htm', 'script_parameters': '-'}) self.aS("""2011-09-26 13:57:48 W3SVC1 127.0.0.1 GET /tapage.asp - 80 - 127.0.0.1 
Mozilla/4.0+(compatible;MSIE+6.0;+windows+NT5.2;+SV1;+.NET+CLR+1.1.4322) 404 0 2""", {'date': datetime(2011, 9, 26, 13, 57, 48), 'service': 'W3SVC1', 'dest_ip': '127.0.0.1', 'method': 'GET', 'url_path': '/tapage.asp', 'query': '-', 'port': '80', 'user': '-', 'source_ip': '127.0.0.1', 'useragent': 'Mozilla/4.0+(compatible;MSIE+6.0;+windows+NT5.2;+SV1;+.NET+CLR+1.1.4322)', 'status': '404', 'substatus': '0', 'win_status': '2'}) def test_fail2ban(self): """Test fail2ban ssh banishment logs""" self.aS("""2011-09-25 05:09:02,371 fail2ban.filter : INFO Log rotation detected for /var/log/auth.log""", {'program' : 'fail2ban', 'component' : 'filter', 'body' : "Log rotation detected for /var/log/auth.log", 'date' : datetime(2011,9,25,5,9,2).replace(microsecond = 371000)}) self.aS("""2011-09-25 21:59:24,304 fail2ban.actions: WARNING [ssh] Ban 219.117.199.6""", {'program' : 'fail2ban', 'component' : 'actions', 'action' : "Ban", 'protocol' : "ssh", 'source_ip' : "219.117.199.6", 'date' : datetime(2011,9,25,21,59,24).replace(microsecond = 304000)}) def test_bitdefender(self): """Test bitdefender spam.log (Mail Server for UNIX version)""" self.aS('10/20/2011 07:24:26 BDMAILD SPAM: sender: marcelo@nitex.com.br, recipients: re@corp.com, sender IP: 127.0.0.1, subject: "Lago para pesca, piscina, charrete, Hotel Fazenda", score: 1000, stamp: " v1, build 2.10.1.12405, blacklisted, total: 1000(750)", agent: Smtp Proxy 3.1.3, action: drop (move-to-quarantine;drop), header recipients: ( "cafe almoço e janta incluso" ), headers: ( "Received: from localhost [127.0.0.1] by BitDefender SMTP Proxy on localhost [127.0.0.1] for localhost [127.0.0.1]; Thu, 20 Oct 2011 07:24:26 +0200 (CEST)" "Received: from paris.office.corp.com (go.corp.lan [10.10.1.254]) by as-bd-64.ifr.lan (Postfix) with ESMTP id 4D23D1C7 for ; Thu, 20 Oct 2011 07:24:26 +0200 (CEST)" "Received: from rj50ssp.nitex.com.br (rj154ssp.nitex.com.br [177.47.99.154]) by paris.office.corp.com (Postfix) with ESMTP id 28C0D6A4891 for ; Thu, 20 Oct 2011 07:17:59 +0200 (CEST)" "Received: from rj154ssp.nitex.com.br (ced-sp.tuavitoria.com.br [177.47.99.13]) by rj50ssp.nitex.com.br (Postfix) with ESMTP id 9B867132C9E; Wed, 19 Oct 2011 22:29:20 -0200 (BRST)" ), group: "Default"', {'message_sender' : 'marcelo@nitex.com.br', 'program' : 'bitdefender', 'action' : 'drop', 'message_recipients' : 're@corp.com', 'date' : datetime(2011,10,20,07,24,26), 'reason' : 'blacklisted'}) self.aS('10/24/2011 04:31:39 BDSCAND ERROR: failed to initialize the AV core', {'program' : 'bitdefender', 'body' : 'failed to initialize the AV core', 'date' : datetime(2011,10,24,04,31,39)}) def test_simple_wabauth(self): """Test syslog logs""" self.aS("Dec 20 17:20:22 wab2 WAB(CORE)[18190]: type='session closed' username='admin' secondary='root@debian32' client_ip='10.10.4.25' src_protocol='SFTP_SESSION' dst_protocol='SFTP_SESSION' message=''", { 'account': 'root', 'client_ip': '10.10.4.25', 'date': datetime(2011, 12, 20, 17, 20, 22), 'dest_proto': 'SFTP_SESSION', 'message': '', 'pid': '18190', 'program': 'WAB(CORE)', 'resource': 'debian32', 'source': 'wab2', 'source_proto': 'SFTP_SESSION', 'type': 'session closed', 'username': 'admin'}) self.aS("Dec 20 17:19:35 wab2 WAB(CORE)[18190]: type='primary_authentication' timestamp='2011-12-20 17:19:35.621952' username='admin' client_ip='10.10.4.25' diagnostic='SUCCESS'", {'client_ip': '10.10.4.25', 'date': datetime(2011, 12, 20, 17, 19, 35), 'diagnostic': 'SUCCESS', 'pid': '18190', 'program': 'WAB(CORE)', 'source': 'wab2', 'type': 'primary_authentication', 
'username': 'admin'}) self.aS("Dec 20 17:19:35 wab2 WAB(CORE)[18190]: type='session opened' username='admin' secondary='root@debian32' client_ip='10.10.4.25' src_protocol='SFTP_SESSION' dst_protocol='SFTP_SESSION' message=''", { 'account': 'root', 'client_ip': '10.10.4.25', 'date': datetime(2011, 12, 20, 17, 19, 35), 'dest_proto': 'SFTP_SESSION', 'message': '', 'pid': '18190', 'program': 'WAB(CORE)', 'resource': 'debian32', 'source': 'wab2', 'source_proto': 'SFTP_SESSION', 'type': 'session opened', 'username': 'admin'}) def test_xferlog(self): """Testing xferlog formatted logs""" self.aS("Thu Sep 2 09:52:00 2004 50 192.168.20.10 896242 /home/test/file1.tgz b _ o r suporte ftp 0 * c ", {'transfer_time' : '50', 'source_ip' : '192.168.20.10', 'len' : '896242', 'filename' : '/home/test/file1.tgz', 'transfer_type_code' : 'b', 'special_action_flag' : '_', 'direction_code' : 'o', 'access_mode_code' : 'r', 'completion_status_code' : 'c', 'authentication_method_code' : '0', 'transfer_type' : 'binary', 'special_action' : 'none', 'direction' : 'outgoing', 'access_mode' : 'real', 'completion_status' : 'complete', 'authentication_method' : 'none', 'user' : 'suporte', 'service_name' : 'ftp', 'authenticated_user_id' : '*', 'program' : 'ftpd', 'date' : datetime(2004,9,2,9,52),}) self.aS("Tue Dec 27 11:24:23 2011 1 127.0.0.1 711074 /home/mhu/Documents/Brooks,_Max_-_World_War_Z.mobi b _ o r mhu ftp 0 * c", {'transfer_time' : '1', 'source_ip' : '127.0.0.1', 'len' : '711074', 'filename' : '/home/mhu/Documents/Brooks,_Max_-_World_War_Z.mobi', 'transfer_type_code' : 'b', 'special_action_flag' : '_', 'direction_code' : 'o', 'access_mode_code' : 'r', 'completion_status_code' : 'c', 'authentication_method_code' : '0', 'transfer_type' : 'binary', 'special_action' : 'none', 'direction' : 'outgoing', 'access_mode' : 'real', 'completion_status' : 'complete', 'authentication_method' : 'none', 'user' : 'mhu', 'service_name' : 'ftp', 'authenticated_user_id' : '*', 'program' : 'ftpd', 'date' : datetime(2011,12,27,11,24,23),}) def test_dansguardian(self): """Testing dansguardian logs""" self.aS("2011.12.13 10:41:28 10.10.42.23 10.10.42.23 http://safebrowsing.clients.google.com/safebrowsing/downloads?client=Iceweasel&appver=3.5.16&pver=2.2&wrkey=AKEgNityGqylPYNyNETvnRjDjo4mIKcwv7f-8UCJaKERjXG6cXrikbgdA0AG6J8A6zng73h9U1GoE7P5ZPn0dDLmD_t3q1csCw== *EXCEPTION* Site interdit trouv&ecute;. 
POST 491 0 2 200 - limited_access -", {'program' : 'dansguardian', 'user' : '10.10.42.23', 'source_ip' : '10.10.42.23', 'url' : 'http://safebrowsing.clients.google.com/safebrowsing/downloads?client=Iceweasel&appver=3.5.16&pver=2.2&wrkey=AKEgNityGqylPYNyNETvnRjDjo4mIKcwv7f-8UCJaKERjXG6cXrikbgdA0AG6J8A6zng73h9U1GoE7P5ZPn0dDLmD_t3q1csCw==', 'actions' : "*EXCEPTION*", 'action' : 'EXCEPTION', 'reason' : "Site interdit trouv&ecute;.", "method" : "POST", "len" : "491", "naughtiness" : "0", "filter_group_number" : "2", "status" : "200", "mime_type" : "-", "filter_group_name" : "limited_access", 'date' : datetime(2011,12,13,10,41,28),}) def test_deny_event(self): """Testing denyAll event logs""" self.aS("""224,2011-01-24 17:44:46.061903,2011-01-24 17:44:46.061903,,,192.168.219.10,127.0.0.1,,2,1,4,0,"Session opened (read-write), Forwarded for 192.168.219.1.",superadmin,gui,,{403ec510-27d9-11e0-bbe7-000c298895c5}Session,,,,,,,,,,,,,,,,,,,,""", {'alert_id': '0', 'alert_subtype': 'Access', 'alert_subtype_id': '1', 'alert_type': 'System', 'alert_type_id': '2', 'alert_value': 'Session opened (read-write), Forwarded for 192.168.219.1.', 'body': '224,2011-01-24 17:44:46.061903,2011-01-24 17:44:46.061903,,,192.168.219.10,127.0.0.1,,2,1,4,0,"Session opened (read-write), Forwarded for 192.168.219.1.",superadmin,gui,,{403ec510-27d9-11e0-bbe7-000c298895c5}Session,,,,,,,,,,,,,,,,,,,,', 'date': datetime(2011, 1, 24, 17, 44, 46), 'end_date': '2011-01-24 17:44:46.061903', 'event': 'User successful login', 'event_uid': '224', 'interface': 'gui', 'ip_device': '192.168.219.10', 'parameter_changed': '{403ec510-27d9-11e0-bbe7-000c298895c5}Session', 'raw': '224,2011-01-24 17:44:46.061903,2011-01-24 17:44:46.061903,,,192.168.219.10,127.0.0.1,,2,1,4,0,"Session opened (read-write), Forwarded for 192.168.219.1.",superadmin,gui,,{403ec510-27d9-11e0-bbe7-000c298895c5}Session,,,,,,,,,,,,,,,,,,,,', 'severity': 'Warn', 'severity_code': '4', 'source_ip': '127.0.0.1', 'user': 'superadmin'}) self.aS("""1,2011-01-20 15:09:38.130965,2011-01-20 15:09:38.130965,,,::1,,,2,2,5,0,rWeb started.,,,,,,,,,,,,,,,,,,,,,,,,""", {'alert_id': '0', 'alert_subtype': 'Device Operations', 'alert_subtype_id': '2', 'alert_type': 'System', 'alert_type_id': '2', 'alert_value': 'rWeb started.', 'body': '1,2011-01-20 15:09:38.130965,2011-01-20 15:09:38.130965,,,::1,,,2,2,5,0,rWeb started.,,,,,,,,,,,,,,,,,,,,,,,,', 'date': datetime(2011, 1, 20, 15, 9, 38), 'end_date': '2011-01-20 15:09:38.130965', 'event': 'rWeb started', 'event_uid': '1', 'ip_device': '::1', 'raw': '1,2011-01-20 15:09:38.130965,2011-01-20 15:09:38.130965,,,::1,,,2,2,5,0,rWeb started.,,,,,,,,,,,,,,,,,,,,,,,,', 'severity': 'Notice', 'severity_code': '5'} ) def test_cisco_asa(self): """Testing CISCO ASA logs""" self.aS("""<168>Mar 05 2010 11:06:12 ciscoasa : %ASA-6-305011: Built dynamic TCP translation from 14net:14.36.103.220/300 to 172net:172.18.254.146/55""", {'program': 'cisco-asa', 'severity_code': '6', 'event_id': '305011', 'date': datetime(2010, 3, 5, 11, 6, 12), 'taxonomy': 'firewall', 'outbound_int': '172net', 'dest_port': '55'}) self.aS("""<168>Jul 02 2006 07:33:45 ciscoasa : %ASA-6-302013: Built outbound TCP connection 8300517 for outside:64.156.4.191/110 (64.156.4.191/110) to inside:192.168.8.12/3109 (xxx.xxx.185.142/11310)""", {'program': 'cisco-asa', 'severity_code': '6', 'event_id': '302013', 'date': datetime(2006, 7, 2, 7, 33, 45), 'taxonomy': 'firewall', 'outbound_int': 'inside', 'dest_ip': '192.168.8.12'}) if __name__ == "__main__": unittest.main() 
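# Editor's note (a sketch, not part of the original file): these are plain
# unittest cases, so the suite can also be driven programmatically. Assuming
# the helpers above load the bundled normalizer definitions through the
# NORMALIZERS_PATH environment variable (as test_commonElements.py below
# does), and with a hypothetical module name 'test_normalizer':
#
#   import os, unittest
#   os.environ.setdefault('NORMALIZERS_PATH', './normalizers')
#   suite = unittest.TestLoader().loadTestsFromName('test_normalizer')
#   unittest.TextTestRunner(verbosity=2).run(suite)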
pylogsparser-0.4/tests/test_commonElements.py0000644000175000017500000001626411700571003017750 0ustar fbofbo# -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # import os import unittest from datetime import datetime, timedelta from logsparser.normalizer import get_generic_tagTypes from logsparser.normalizer import get_generic_callBacks def get_sensible_year(*args): """args is a list of ordered date elements, from month and day (both mandatory) down to an optional seconds value. The function gives the most sensible year for that set of values, so that the date is not set in the future.""" year = int(datetime.now().year) d = datetime(year, *args) if d > datetime.now(): return year - 1 return year def generic_time_callback_test(instance, cb): """Testing time formatting callbacks. This is boilerplate code.""" # So far only time-related callbacks were written. If that changes, list # here the unrelated functions to skip in this test. instance.assertTrue(cb in instance.cb.keys()) DATES_TO_TEST = [ datetime.utcnow() + timedelta(-1), datetime.utcnow() + timedelta(-180), datetime.utcnow() + timedelta(1), # will always be considered as in the future unless you're testing on new year's eve... ] # The pattern translation list. Order is important!
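# (The replacements are applied one after another with str.replace, so longer
# tokens must come first: "YYYY" before "YY", "DDD" before "DD", "MMM" before
# "MM"; otherwise the shorter token would eat part of the longer one and
# corrupt the resulting strftime pattern.)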
translations = [ ("YYYY", "%Y"), ("YY" , "%y"), ("DDD" , "%a"), # localized day ("DD" , "%d"), # day with optional leading 0 ("dd" , "%d"), ("MMM" , "%b"), # localized month ("MM" , "%m"), # month number with optional leading 0 ("hh" , "%H"), ("mm" , "%M"), ("ss" , "%S") ] pattern = cb for old, new in translations: pattern = pattern.replace(old, new) # special cases if pattern == "ISO8601": pattern = "%Y-%m-%dT%H:%M:%SZ" for d in DATES_TO_TEST: if pattern == "EPOCH": value = d.strftime('%s') + ".%i" % (d.microsecond/1000) expected_result = datetime.utcfromtimestamp(float(value)) else: value = d.strftime(pattern) expected_result = datetime.strptime(value, pattern) # Deal with time formats that don't define a year explicitly if "%y" not in pattern.lower(): expected_year = get_sensible_year(*expected_result.timetuple()[1:-3]) expected_result = expected_result.replace(year = expected_year) log = {} instance.cb[cb](value, log) instance.assertTrue("date" in log.keys()) instance.assertEqual(log['date'], expected_result) class TestGenericLibrary(unittest.TestCase): """Unit testing for the generic libraries""" normalizer_path = os.environ['NORMALIZERS_PATH'] tagTypes = get_generic_tagTypes(os.path.join(normalizer_path, 'common_tagTypes.xml')) cb = get_generic_callBacks(os.path.join(normalizer_path, 'common_callBacks.xml')) def test_000_availability(self): """Testing libraries' availability""" self.assertTrue( self.tagTypes != {} ) self.assertTrue( self.cb != {} ) def test_010_test_tagTypes(self): """Testing tagTypes' accuracy""" self.assertTrue(self.tagTypes['EpochTime'].compiled_regexp.match('12934824.134')) self.assertTrue(self.tagTypes['EpochTime'].compiled_regexp.match('12934824')) self.assertTrue(self.tagTypes['syslogDate'].compiled_regexp.match('Jan 23 10:23:45')) self.assertTrue(self.tagTypes['syslogDate'].compiled_regexp.match('Oct 6 23:05:10')) self.assertTrue(self.tagTypes['URL'].compiled_regexp.match('http://www.wallix.org')) self.assertTrue(self.tagTypes['URL'].compiled_regexp.match('https://mysecuresite.com/?myparam=myvalue&myotherparam=myothervalue')) self.assertTrue(self.tagTypes['Email'].compiled_regexp.match('mhu@wallix.com')) self.assertTrue(self.tagTypes['Email'].compiled_regexp.match('matthieu.huin@wallix.com')) self.assertTrue(self.tagTypes['Email'].compiled_regexp.match('John-Fitzgerald.Willis@super-duper.institution.withlotsof.subdomains.org')) self.assertTrue(self.tagTypes['IP'].compiled_regexp.match('192.168.1.1')) self.assertTrue(self.tagTypes['IP'].compiled_regexp.match('255.255.255.0')) # ideally this shouldn't match, but the current IP regex is permissive ... self.assertTrue(self.tagTypes['IP'].compiled_regexp.match('999.888.777.666')) self.assertTrue(self.tagTypes['MACAddress'].compiled_regexp.match('0e:88:6a:4b:00:ff')) self.assertTrue(self.tagTypes['ZuluTime'].compiled_regexp.match('2012-12-21')) self.assertTrue(self.tagTypes['ZuluTime'].compiled_regexp.match('2012-12-21T12:34:56.99')) # I wish there was a way to create these tests on the fly ...
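# A minimal sketch of how these tests could be generated on the fly (editor's
# illustration, not part of the original suite; the names below are
# hypothetical). Attaching closures to the class after its definition would
# let unittest discover one test per format string:
#
#   FORMATS = ["MM/dd/YYYY hh:mm:ss", "ISO8601", "EPOCH"]  # and so on
#   def _make_test(fmt):
#       def test(self):
#           generic_time_callback_test(self, fmt)
#       return test
#   for i, fmt in enumerate(FORMATS):
#       setattr(TestGenericLibrary, "test_%03d_generated" % i, _make_test(fmt))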
def test_020_test_time_callback(self): """Testing callback MM/dd/YYYY hh:mm:ss""" generic_time_callback_test(self, "MM/dd/YYYY hh:mm:ss") def test_030_test_time_callback(self): """Testing callback dd/MMM/YYYY:hh:mm:ss""" generic_time_callback_test(self, "dd/MMM/YYYY:hh:mm:ss") def test_040_test_time_callback(self): """Testing callback MMM dd hh:mm:ss""" generic_time_callback_test(self, "MMM dd hh:mm:ss") def test_050_test_time_callback(self): """Testing callback DDD MMM dd hh:mm:ss YYYY""" generic_time_callback_test(self, "DDD MMM dd hh:mm:ss YYYY") def test_060_test_time_callback(self): """Testing callback YYYY-MM-DD hh:mm:ss""" generic_time_callback_test(self, "YYYY-MM-DD hh:mm:ss") def test_070_test_time_callback(self): """Testing callback MM/DD/YY, hh:mm:ss""" generic_time_callback_test(self, "MM/DD/YY, hh:mm:ss") def test_075_test_time_callback(self): """Testing callback YYMMDD hh:mm:ss""" generic_time_callback_test(self, "YYMMDD hh:mm:ss") def test_080_test_time_callback(self): """Testing callback ISO8601""" generic_time_callback_test(self, "ISO8601") def test_090_test_time_callback(self): """Testing callback EPOCH""" generic_time_callback_test(self, "EPOCH") def test_100_test_time_callback(self): """Testing callback dd-MMM-YYYY hh:mm:ss""" generic_time_callback_test(self, "dd-MMM-YYYY hh:mm:ss") if __name__ == "__main__": unittest.main() pylogsparser-0.4/setup.py0000644000175000017500000000452511715705366013730 0ustar fbofbo# -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix SARL # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # import os import glob from distutils.core import setup # Utility function to read the README file. # Used for the long_description. It's nice, because now 1) we have a top level # README file and 2) it's easier to type in the README file than to put a raw # string in below ...
def read(fname): return open(os.path.join(os.path.dirname(__file__), fname)).read() data = glob.glob('normalizers/*.xml') data.extend(glob.glob('normalizers/*.template')) data.extend(glob.glob('normalizers/*.dtd')) fr_trans = glob.glob('logsparser/i18n/fr_FR/LC_MESSAGES/normalizer.*') setup( name = "pylogsparser", version = "0.4", author = "Wallix", author_email = "opensource@wallix.org", description = ("A log parser library packaged with a set of ready to use parsers (DHCPd, Squid, Apache, ...)"), license = "LGPL", keywords = "log parser xml library python", url = "http://www.wallix.org/pylogsparser-project/", package_dir={'logsparser.tests':'tests'}, packages=['logsparser', 'logsparser.tests', 'logsparser.extras'], data_files=[('share/logsparser/normalizers', data), ('share/logsparser/i18n/fr_FR/LC_MESSAGES/', fr_trans),], requires=['lxml', 'pytz'], long_description=read('README.rst'), # http://pypi.python.org/pypi?:action=list_classifiers classifiers=[ "Development Status :: 4 - Beta", "Topic :: System :: Logging", "Topic :: Software Development :: Libraries", "License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)", ], ) pylogsparser-0.4/logsparser/0000755000175000017500000000000011715707344014377 5ustar fbofbopylogsparser-0.4/logsparser/__init__.py0000644000175000017500000000000011627706151016473 0ustar fbofbopylogsparser-0.4/logsparser/i18n/0000755000175000017500000000000011715707344015156 5ustar fbofbopylogsparser-0.4/logsparser/i18n/fr_FR/0000755000175000017500000000000011715707344016154 5ustar fbofbopylogsparser-0.4/logsparser/i18n/fr_FR/LC_MESSAGES/0000755000175000017500000000000011715707344017741 5ustar fbofbopylogsparser-0.4/logsparser/i18n/fr_FR/LC_MESSAGES/normalizer.po0000644000175000017500000000412711705765631022471 0ustar fbofbo# SOME DESCRIPTIVE TITLE. # Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER # This file is distributed under the same license as the PACKAGE package. # FIRST AUTHOR , YEAR. 
# #, fuzzy msgid "" msgstr "" "Project-Id-Version: PACKAGE VERSION\n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2012-01-18 12:36+0100\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" "Language: \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=CHARSET\n" "Content-Transfer-Encoding: 8bit\n" #: logsparser/normalizer.py:738 #, python-format msgid "" "%(title)s\n" "\n" "**Written by**\n" "\n" "%(authors)s\n" "\n" "Description\n" ":::::::::::\n" "\n" "%(description)s %(taxonomy)s\n" "\n" "This normalizer can parse logs of the following structure(s):\n" "\n" "%(patterns)s\n" "\n" "Examples\n" "::::::::\n" "\n" "%(examples)s" msgstr "" "%(title)s\n" "\n" "**Auteur(s)**\n" "\n" "%(authors)s\n" "\n" "Description\n" ":::::::::::\n" "\n" "%(description)s\n %(taxonomy)s\n" "\n" "Ce normaliseur reconnaît les logs structurés de la façon suivante:\n" "\n" "%(patterns)s\n" "\n" "Exemples\n" "::::::::\n" "\n" "%(examples)s" #: logsparser/normalizer.py:762 logsparser/normalizer.py:773 msgid "undocumented" msgstr "non documenté" #: logsparser/normalizer.py:766 #, python-format msgid "This normalizer belongs to the category : *%s*" msgstr "Ce normaliseur appartient à la catégorie : *%s*" #: logsparser/normalizer.py:771 msgid "" ", where\n" "\n" msgstr "où\n" "\n" #: logsparser/normalizer.py:773 #, python-format msgid " * **%s** is %s " msgstr " * **%s** est %s " #: logsparser/normalizer.py:775 #, python-format msgid "(normalized as *%s*)" msgstr "(tag associé : *%s*)" #: logsparser/normalizer.py:778 msgid "" "\n" " Additionally, The following tags are automatically set:\n" "\n" msgstr "\n" " Les tags additionnels suivants sont définis automatiquement :\n" "\n" #: logsparser/normalizer.py:788 #, python-format msgid "" "* *%s*, normalized as\n" "\n" msgstr "" "* *%s*, dont les tags suivants sont extraits:\n" "\n" pylogsparser-0.4/logsparser/i18n/fr_FR/LC_MESSAGES/normalizer.mo0000644000175000017500000000255211705765631022466 0ustar fbofbo[binary file: compiled gettext catalog of the normalizer.po above; raw payload omitted] pylogsparser-0.4/logsparser/lognormalizer.py0000644000175000017500000002622411700571003017624 0ustar fbofbo# -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc.
# # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # """This module exposes the L{LogNormalizer} class that can be used for higher-level management of the normalization flow. Using this module is in no way mandatory in order to benefit from the normalization system; the C{LogNormalizer} class provides basic facilities for further integration in a wider project (web services, ...). """ import os import uuid as _UUID_ import warnings import StringIO from normalizer import Normalizer from lxml.etree import parse, DTD, fromstring as XMLfromstring class LogNormalizer(): """Basic normalization flow manager. Normalizer definitions are loaded from a path and checked against the DTD. If the definitions are syntactically correct, the normalizers are instantiated and populate the manager's cache. Normalization priority is established as follows: * Maximum priority assigned to normalizers where the "appliedTo" tag is set to "raw". They MUST be mutually exclusive. * Medium priority assigned to normalizers where the "appliedTo" tag is set to "body". * Lowest priority assigned to any remaining normalizers. Some extra treatment is also done prior to and after the log normalization: * Assignment of a unique ID, under the tag "uuid" * Conversion of date tags to UTC, if the "_timezone" tag was set prior to the normalization process.""" def __init__(self, normalizers_paths, active_normalizers = {}): """ Instantiates a flow manager. The default behavior is to activate every available normalizer. @param normalizers_paths: a list of absolute paths to the normalizer XML definitions to use, or just a single path as a str. @param active_normalizers: a dictionary of active normalizers in the form {name: [True|False]}. """ if not isinstance(normalizers_paths, (list, tuple)): normalizers_paths = [normalizers_paths,] self.normalizers_paths = normalizers_paths self.active_normalizers = active_normalizers self.dtd, self.ctt, self.ccb = None, None, None # Walk through paths for normalizer.dtd and common_tagTypes.xml # /!\ dtd file and common elements will be overridden if present in # many directories. for norm_path in self.normalizers_paths: if not os.path.isdir(norm_path): raise ValueError, "Invalid normalizer directory : %s" % norm_path dtd = os.path.join(norm_path, 'normalizer.dtd') ctt = os.path.join(norm_path, 'common_tagTypes.xml') ccb = os.path.join(norm_path, 'common_callBacks.xml') if os.path.isfile(dtd): self.dtd = DTD(open(dtd)) if os.path.isfile(ctt): self.ctt = ctt if os.path.isfile(ccb): self.ccb = ccb # Technically the common elements files should NOT be mandatory. # But many normalizers use them, so better safe than sorry.
if not self.dtd or not self.ctt or not self.ccb: raise StandardError, "Missing DTD or common library files" self._cache = [] self.reload() def reload(self): """Refreshes this instance's normalizers pool.""" self.normalizers = { 'raw' : [], 'body' : [] } for path in self.iter_normalizer(): norm = parse(open(path)) if not self.dtd.validate(norm): warnings.warn('Skipping %s : invalid DTD' % path) print 'invalid normalizer ', path else: normalizer = Normalizer(norm, self.ctt, self.ccb) normalizer.uuid = self._compute_norm_uuid(normalizer) self.normalizers.setdefault(normalizer.appliedTo, []) self.normalizers[normalizer.appliedTo].append(normalizer) self.activate_normalizers() def _compute_norm_uuid(self, normalizer): return "%s-%s" % (normalizer.name, normalizer.version) def iter_normalizer(self): """ Iterates through normalizers and returns the normalizers' paths. @return: a generator of absolute paths. """ for path in self.normalizers_paths: for root, dirs, files in os.walk(path): for name in files: if not name.startswith('common_tagTypes') and \ not name.startswith('common_callBacks') and \ name.endswith('.xml'): yield os.path.join(root, name) def __len__(self): """ Returns the amount of available normalizers. """ return len([n for n in self.iter_normalizer()]) def update_normalizer(self, raw_xml_contents, name = None, dir_path = None ): """used to add or update a normalizer. @param raw_xml_contents: XML description of normalizer as flat XML. It must comply to the DTD. @param name: if set, the XML description will be saved as name.xml. If left blank, name will be fetched from the XML description. @param dir_path: the path to the directory where to copy the given normalizer. """ path = self.normalizers_paths[0] if dir_path: if dir_path in self.normalizers_paths: path = dir_path xmlconf = XMLfromstring(raw_xml_contents).getroottree() if not self.dtd.validate(xmlconf): raise ValueError, "This definition file does not follow the normalizers DTD :\n\n%s" % \ self.dtd.error_log.filter_from_errors() if not name: name = xmlconf.getroot().get('name') if not name.endswith('.xml'): name += '.xml' xmlconf.write(open(os.path.join(path, name), 'w'), encoding = 'utf8', method = 'xml', pretty_print = True) self.reload() def get_normalizer_by_uuid(self, uuid): """Returns normalizer by uuid.""" try: norm = [ u for u in sum(self.normalizers.values(), []) if u.uuid == uuid][0] return norm except: raise ValueError, "Normalizer uuid : %s not found" % uuid def get_normalizer_source(self, uuid): """Returns the raw XML source of normalizer uuid.""" return self.get_normalizer_by_uuid(uuid).get_source() def get_normalizer_path(self, uuid): """Returns the filesystem path of a normalizer.""" return self.get_normalizer_by_uuid(uuid).sys_path def activate_normalizers(self): """Activates normalizers according to what was set by calling set_active_normalizers. If no call to the latter function has been made so far, this method activates every normalizer.""" if not self.active_normalizers: self.active_normalizers = dict([ (n.uuid, True) for n in \ sum([ v for v in self.normalizers.values()], []) ]) # fool-proof the list self.set_active_normalizers(self.active_normalizers) # build an ordered cache to speed things up self._cache = [] # First normalizers to apply are the "raw" ones. 
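# (The cache is thus built in the priority order documented on the class: "raw" normalizers first, then "body" normalizers, then everything else.)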
for norm in self.normalizers['raw']: # consider the normalizer to be inactive if not # explicitly in our list if self.active_normalizers.get(norm.uuid, False): self._cache.append(norm) # Then, apply the applicative normalization on "body": for norm in self.normalizers['body']: if self.active_normalizers.get(norm.uuid, False): self._cache.append(norm) # Then, apply everything else for norm in sum([ self.normalizers[u] for u in self.normalizers if u not in ['raw', 'body']], []): if self.active_normalizers.get(norm.uuid, False): self._cache.append(norm) def get_active_normalizers(self): """Returns a dictionary of normalizers; keys are normalizers' uuid and values are True|False according to the normalizer's activation state.""" return self.active_normalizers def set_active_normalizers(self, norms = {}): """Sets the active/inactive normalizers. Default behavior is to deactivate every normalizer. @param norms: a dictionary, similar to the one returned by get_active_normalizers.""" default = dict([ (n.uuid, False) for n in \ sum([ v for v in self.normalizers.values()], []) ]) default.update(norms) self.active_normalizers = default def lognormalize(self, data): """ This method is the entry point to normalize data (a log). data is passed through every activated normalizer and extra tagging occurs accordingly. data also receives an extra uuid tag. @param data: must be a dictionary with at least a key 'raw' or 'body' with BaseString values (preferably Unicode). Here is an example : >>> from logsparser import lognormalizer >>> from pprint import pprint >>> ln = lognormalizer.LogNormalizer('/usr/local/share/normalizers/') >>> mylog = {'raw' : 'Jul 18 15:35:01 zoo /USR/SBIN/CRON[14338]: (root) CMD (/srv/git/redmine-changesets.sh)'} >>> ln.lognormalize(mylog) >>> pprint(mylog) {'body': '(root) CMD (/srv/git/redmine-changesets.sh)', 'date': datetime.datetime(2011, 7, 18, 15, 35, 1), 'pid': '14338', 'program': '/USR/SBIN/CRON', 'raw': 'Jul 18 15:35:01 zoo /USR/SBIN/CRON[14338]: (root) CMD (/srv/git/redmine-changesets.sh)', 'source': 'zoo', 'uuid': 70851882840934161193887647073096992594L} """ data = self.uuidify(data) data = self.normalize(data) # some more functions for clarity def uuidify(self, log): """Adds a unique UID to the normalized log.""" log["uuid"] = _UUID_.uuid4().int return log def normalize(self, log): """plain normalization.""" for norm in self._cache: log = norm.normalize(log) return log def _normalize(self, log): """Used for testing only; the normalizers' tag prerequisites are deactivated.""" for norm in self._cache: log = norm.normalize(log, do_not_check_prereq = True) return log pylogsparser-0.4/logsparser/extras/0000755000175000017500000000000011715707344015705 5ustar fbofbopylogsparser-0.4/logsparser/extras/domain_parser.py0000644000175000017500000014065711715703401021105 0ustar fbofbo# -*- coding: utf-8 -*- # -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details.
# # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # """Here we define a function that can parse FQDNs that are IANA compliant.""" tld = set(("ac", "com.ac", "edu.ac", "gov.ac", "net.ac", "mil.ac", "org.ac", "ad", "nom.ad", "ae", "co.ae", "net.ae", "org.ae", "sch.ae", "ac.ae", "gov.ae", "mil.ae", "aero", "accident-investigation.aero", "accident-prevention.aero", "aerobatic.aero", "aeroclub.aero", "aerodrome.aero", "agents.aero", "aircraft.aero", "airline.aero", "airport.aero", "air-surveillance.aero", "airtraffic.aero", "air-traffic-control.aero", "ambulance.aero", "amusement.aero", "association.aero", "author.aero", "ballooning.aero", "broker.aero", "caa.aero", "cargo.aero", "catering.aero", "certification.aero", "championship.aero", "charter.aero", "civilaviation.aero", "club.aero", "conference.aero", "consultant.aero", "consulting.aero", "control.aero", "council.aero", "crew.aero", "design.aero", "dgca.aero", "educator.aero", "emergency.aero", "engine.aero", "engineer.aero", "entertainment.aero", "equipment.aero", "exchange.aero", "express.aero", "federation.aero", "flight.aero", "freight.aero", "fuel.aero", "gliding.aero", "government.aero", "groundhandling.aero", "group.aero", "hanggliding.aero", "homebuilt.aero", "insurance.aero", "journal.aero", "journalist.aero", "leasing.aero", "logistics.aero", "magazine.aero", "maintenance.aero", "marketplace.aero", "media.aero", "microlight.aero", "modelling.aero", "navigation.aero", "parachuting.aero", "paragliding.aero", "passenger-association.aero", "pilot.aero", "press.aero", "production.aero", "recreation.aero", "repbody.aero", "res.aero", "research.aero", "rotorcraft.aero", "safety.aero", "scientist.aero", "services.aero", "show.aero", "skydiving.aero", "software.aero", "student.aero", "taxi.aero", "trader.aero", "trading.aero", "trainer.aero", "union.aero", "workinggroup.aero", "works.aero", "af", "gov.af", "com.af", "org.af", "net.af", "edu.af", "ag", "com.ag", "org.ag", "net.ag", "co.ag", "nom.ag", "ai", "off.ai", "com.ai", "net.ai", "org.ai", "al", "com.al", "edu.al", "gov.al", "mil.al", "net.al", "org.al", "am", "an", "com.an", "net.an", "org.an", "edu.an", "ao", "ed.ao", "gv.ao", "og.ao", "co.ao", "pb.ao", "it.ao", "aq", "*.ar", "!congresodelalengua3.ar", "!educ.ar", "!gobiernoelectronico.ar", "!mecon.ar", "!nacion.ar", "!nic.ar", "!promocion.ar", "!retina.ar", "!uba.ar", "e164.arpa", "in-addr.arpa", "ip6.arpa", "iris.arpa", "uri.arpa", "urn.arpa", "as", "gov.as", "asia", "at", "ac.at", "co.at", "gv.at", "or.at", "biz.at", "info.at", "priv.at", "*.au", "act.edu.au", "nsw.edu.au", "nt.edu.au", "qld.edu.au", "sa.edu.au", "tas.edu.au", "vic.edu.au", "wa.edu.au", "act.gov.au", "nt.gov.au", "qld.gov.au", "sa.gov.au", "tas.gov.au", "vic.gov.au", "wa.gov.au", "aw", "com.aw", "ax", "az", "com.az", "net.az", "int.az", "gov.az", "org.az", "edu.az", "info.az", "pp.az", "mil.az", "name.az", "pro.az", "biz.az", "ba", "org.ba", "net.ba", "edu.ba", "gov.ba", "mil.ba", "unsa.ba", "unbi.ba", "co.ba", "com.ba", "rs.ba", "bb", "biz.bb", "com.bb", "edu.bb", "gov.bb", "info.bb", "net.bb", "org.bb", "store.bb", "*.bd", "be", "ac.be", "bf", "gov.bf", "bg", "a.bg", "b.bg", "c.bg", "d.bg", "e.bg", "f.bg", "g.bg", "h.bg", "i.bg", "j.bg", "k.bg", "l.bg", "m.bg", "n.bg", "o.bg", "p.bg", "q.bg", "r.bg", "s.bg", "t.bg", "u.bg", "v.bg", "w.bg", "x.bg", "y.bg", "z.bg", "0.bg", "1.bg", "2.bg", "3.bg", 
"4.bg", "5.bg", "6.bg", "7.bg", "8.bg", "9.bg", "bh", "com.bh", "bi", "co.bi", "com.bi", "edu.bi", "or.bi", "org.bi", "biz", "bj", "asso.bj", "barreau.bj", "gouv.bj", "bm", "com.bm", "edu.bm", "gov.bm", "net.bm", "org.bm", "*.bn", "bo", "com.bo", "edu.bo", "gov.bo", "gob.bo", "int.bo", "org.bo", "net.bo", "mil.bo", "tv.bo", "br", "adm.br", "adv.br", "agr.br", "am.br", "arq.br", "art.br", "ato.br", "bio.br", "blog.br", "bmd.br", "can.br", "cim.br", "cng.br", "cnt.br", "com.br", "coop.br", "ecn.br", "edu.br", "eng.br", "esp.br", "etc.br", "eti.br", "far.br", "flog.br", "fm.br", "fnd.br", "fot.br", "fst.br", "g12.br", "ggf.br", "gov.br", "imb.br", "ind.br", "inf.br", "jor.br", "jus.br", "lel.br", "mat.br", "med.br", "mil.br", "mus.br", "net.br", "nom.br", "not.br", "ntr.br", "odo.br", "org.br", "ppg.br", "pro.br", "psc.br", "psi.br", "qsl.br", "rec.br", "slg.br", "srv.br", "tmp.br", "trd.br", "tur.br", "tv.br", "vet.br", "vlog.br", "wiki.br", "zlg.br", "bs", "com.bs", "net.bs", "org.bs", "edu.bs", "gov.bs", "*.bt", "bw", "co.bw", "org.bw", "by", "gov.by", "mil.by", "com.by", "of.by", "bz", "com.bz", "net.bz", "org.bz", "edu.bz", "gov.bz", "ca", "ab.ca", "bc.ca", "mb.ca", "nb.ca", "nf.ca", "nl.ca", "ns.ca", "nt.ca", "nu.ca", "on.ca", "pe.ca", "qc.ca", "sk.ca", "yk.ca", "gc.ca", "cat", "cc", "cd", "gov.cd", "cf", "cg", "ch", "ci", "org.ci", "or.ci", "com.ci", "co.ci", "edu.ci", "ed.ci", "ac.ci", "net.ci", "go.ci", "asso.ci", "aéroport.ci", "int.ci", "presse.ci", "md.ci", "gouv.ci", "*.ck", "cl", "gov.cl", "gob.cl", "cm", "gov.cm", "cn", "ac.cn", "com.cn", "edu.cn", "gov.cn", "net.cn", "org.cn", "mil.cn", "公司.cn", "网络.cn", "網絡.cn", "ah.cn", "bj.cn", "cq.cn", "fj.cn", "gd.cn", "gs.cn", "gz.cn", "gx.cn", "ha.cn", "hb.cn", "he.cn", "hi.cn", "hl.cn", "hn.cn", "jl.cn", "js.cn", "jx.cn", "ln.cn", "nm.cn", "nx.cn", "qh.cn", "sc.cn", "sd.cn", "sh.cn", "sn.cn", "sx.cn", "tj.cn", "xj.cn", "xz.cn", "yn.cn", "zj.cn", "hk.cn", "mo.cn", "tw.cn", "co", "arts.co", "com.co", "edu.co", "firm.co", "gov.co", "info.co", "int.co", "mil.co", "net.co", "nom.co", "org.co", "rec.co", "web.co", "com", "ar.com", "br.com", "cn.com", "de.com", "eu.com", "gb.com", "hu.com", "jpn.com", "kr.com", "no.com", "qc.com", "ru.com", "sa.com", "se.com", "uk.com", "us.com", "uy.com", "za.com", "operaunite.com", "coop", "cr", "ac.cr", "co.cr", "ed.cr", "fi.cr", "go.cr", "or.cr", "sa.cr", "cu", "com.cu", "edu.cu", "org.cu", "net.cu", "gov.cu", "inf.cu", "cv", "cx", "gov.cx", "*.cy", "cz", "de", "dj", "dk", "dm", "com.dm", "net.dm", "org.dm", "edu.dm", "gov.dm", "*.do", "dz", "com.dz", "org.dz", "net.dz", "gov.dz", "edu.dz", "asso.dz", "pol.dz", "art.dz", "ec", "com.ec", "info.ec", "net.ec", "fin.ec", "k12.ec", "med.ec", "pro.ec", "org.ec", "edu.ec", "gov.ec", "mil.ec", "edu", "ee", "edu.ee", "gov.ee", "riik.ee", "lib.ee", "med.ee", "com.ee", "pri.ee", "aip.ee", "org.ee", "fie.ee", "*.eg", "*.er", "es", "com.es", "nom.es", "org.es", "gob.es", "edu.es", "*.et", "eu", "fi", "aland.fi", "iki.fi", "*.fj", "*.fk", "fm", "fo", "fr", "com.fr", "asso.fr", "nom.fr", "prd.fr", "presse.fr", "tm.fr", "aeroport.fr", "assedic.fr", "avocat.fr", "avoues.fr", "cci.fr", "chambagri.fr", "chirurgiens-dentistes.fr", "experts-comptables.fr", "geometre-expert.fr", "gouv.fr", "greta.fr", "huissier-justice.fr", "medecin.fr", "notaires.fr", "pharmacien.fr", "port.fr", "veterinaire.fr", "ga", "gd", "ge", "com.ge", "edu.ge", "gov.ge", "org.ge", "mil.ge", "net.ge", "pvt.ge", "gf", "gg", "co.gg", "org.gg", "net.gg", "sch.gg", "gov.gg", "gh", "com.gh", "edu.gh", 
"gov.gh", "org.gh", "mil.gh", "gi", "com.gi", "ltd.gi", "gov.gi", "mod.gi", "edu.gi", "org.gi", "gl", "gm", "ac.gn", "com.gn", "edu.gn", "gov.gn", "org.gn", "net.gn", "gov", "gp", "com.gp", "net.gp", "mobi.gp", "edu.gp", "org.gp", "asso.gp", "gq", "gr", "com.gr", "edu.gr", "net.gr", "org.gr", "gov.gr", "gs", "*.gt", "*.gu", "gw", "gy", "co.gy", "com.gy", "net.gy", "hk", "com.hk", "edu.hk", "gov.hk", "idv.hk", "net.hk", "org.hk", "公司.hk", "教育.hk", "敎育.hk", "政府.hk", "個人.hk", "个人.hk", "箇人.hk", "網络.hk", "网络.hk", "组織.hk", "網絡.hk", "网絡.hk", "组织.hk", "組織.hk", "組织.hk", "hm", "hn", "com.hn", "edu.hn", "org.hn", "net.hn", "mil.hn", "gob.hn", "hr", "iz.hr", "from.hr", "name.hr", "com.hr", "ht", "com.ht", "shop.ht", "firm.ht", "info.ht", "adult.ht", "net.ht", "pro.ht", "org.ht", "med.ht", "art.ht", "coop.ht", "pol.ht", "asso.ht", "edu.ht", "rel.ht", "gouv.ht", "perso.ht", "hu", "co.hu", "info.hu", "org.hu", "priv.hu", "sport.hu", "tm.hu", "2000.hu", "agrar.hu", "bolt.hu", "casino.hu", "city.hu", "erotica.hu", "erotika.hu", "film.hu", "forum.hu", "games.hu", "hotel.hu", "ingatlan.hu", "jogasz.hu", "konyvelo.hu", "lakas.hu", "media.hu", "news.hu", "reklam.hu", "sex.hu", "shop.hu", "suli.hu", "szex.hu", "tozsde.hu", "utazas.hu", "video.hu", "*.id", "ie", "gov.ie", "*.il", "im", "co.im", "ltd.co.im", "plc.co.im", "net.im", "gov.im", "org.im", "nic.im", "ac.im", "in", "co.in", "firm.in", "net.in", "org.in", "gen.in", "ind.in", "nic.in", "ac.in", "edu.in", "res.in", "gov.in", "mil.in", "info", "int", "eu.int", "io", "com.io", "iq", "gov.iq", "edu.iq", "mil.iq", "com.iq", "org.iq", "net.iq", "ir", "ac.ir", "co.ir", "gov.ir", "id.ir", "net.ir", "org.ir", "sch.ir", "is", "net.is", "com.is", "edu.is", "gov.is", "org.is", "int.is", "it", "gov.it", "edu.it", "agrigento.it", "ag.it", "alessandria.it", "al.it", "ancona.it", "an.it", "aosta.it", "aoste.it", "ao.it", "arezzo.it", "ar.it", "ascoli-piceno.it", "ascolipiceno.it", "ap.it", "asti.it", "at.it", "avellino.it", "av.it", "bari.it", "ba.it", "barlettaandriatrani.it", "barletta-andria-trani.it", "belluno.it", "bl.it", "benevento.it", "bn.it", "bergamo.it", "bg.it", "biella.it", "bi.it", "bologna.it", "bo.it", "bolzano.it", "bozen.it", "balsan.it", "alto-adige.it", "altoadige.it", "suedtirol.it", "bz.it", "brescia.it", "bs.it", "brindisi.it", "br.it", "cagliari.it", "ca.it", "caltanissetta.it", "cl.it", "campobasso.it", "cb.it", "caserta.it", "ce.it", "catania.it", "ct.it", "catanzaro.it", "cz.it", "chieti.it", "ch.it", "como.it", "co.it", "cosenza.it", "cs.it", "cremona.it", "cr.it", "crotone.it", "kr.it", "cuneo.it", "cn.it", "enna.it", "en.it", "fermo.it", "ferrara.it", "fe.it", "firenze.it", "florence.it", "fi.it", "foggia.it", "fg.it", "forli-cesena.it", "forlicesena.it", "fc.it", "frosinone.it", "fr.it", "genova.it", "genoa.it", "ge.it", "gorizia.it", "go.it", "grosseto.it", "gr.it", "imperia.it", "im.it", "isernia.it", "is.it", "laquila.it", "aquila.it", "aq.it", "la-spezia.it", "laspezia.it", "sp.it", "latina.it", "lt.it", "lecce.it", "le.it", "lecco.it", "lc.it", "livorno.it", "li.it", "lodi.it", "lo.it", "lucca.it", "lu.it", "macerata.it", "mc.it", "mantova.it", "mn.it", "massa-carrara.it", "massacarrara.it", "ms.it", "matera.it", "mt.it", "messina.it", "me.it", "milano.it", "milan.it", "mi.it", "modena.it", "mo.it", "monza.it", "napoli.it", "naples.it", "na.it", "novara.it", "no.it", "nuoro.it", "nu.it", "oristano.it", "or.it", "padova.it", "padua.it", "pd.it", "palermo.it", "pa.it", "parma.it", "pr.it", "pavia.it", "pv.it", "perugia.it", "pg.it", 
"pescara.it", "pe.it", "pesaro-urbino.it", "pesarourbino.it", "pu.it", "piacenza.it", "pc.it", "pisa.it", "pi.it", "pistoia.it", "pt.it", "pordenone.it", "pn.it", "potenza.it", "pz.it", "prato.it", "po.it", "ragusa.it", "rg.it", "ravenna.it", "ra.it", "reggio-calabria.it", "reggiocalabria.it", "rc.it", "reggio-emilia.it", "reggioemilia.it", "re.it", "rieti.it", "ri.it", "rimini.it", "rn.it", "roma.it", "rome.it", "rm.it", "rovigo.it", "ro.it", "salerno.it", "sa.it", "sassari.it", "ss.it", "savona.it", "sv.it", "siena.it", "si.it", "siracusa.it", "sr.it", "sondrio.it", "so.it", "taranto.it", "ta.it", "teramo.it", "te.it", "terni.it", "tr.it", "torino.it", "turin.it", "to.it", "trapani.it", "tp.it", "trento.it", "trentino.it", "tn.it", "treviso.it", "tv.it", "trieste.it", "ts.it", "udine.it", "ud.it", "varese.it", "va.it", "venezia.it", "venice.it", "ve.it", "verbania.it", "vb.it", "vercelli.it", "vc.it", "verona.it", "vr.it", "vibo-valentia.it", "vibovalentia.it", "vv.it", "vicenza.it", "vi.it", "viterbo.it", "vt.it", "je", "co.je", "org.je", "net.je", "sch.je", "gov.je", "*.jm", "jo", "com.jo", "org.jo", "net.jo", "edu.jo", "sch.jo", "gov.jo", "mil.jo", "name.jo", "jobs", "jp", "ac.jp", "ad.jp", "co.jp", "ed.jp", "go.jp", "gr.jp", "lg.jp", "ne.jp", "or.jp", "*.aichi.jp", "*.akita.jp", "*.aomori.jp", "*.chiba.jp", "*.ehime.jp", "*.fukui.jp", "*.fukuoka.jp", "*.fukushima.jp", "*.gifu.jp", "*.gunma.jp", "*.hiroshima.jp", "*.hokkaido.jp", "*.hyogo.jp", "*.ibaraki.jp", "*.ishikawa.jp", "*.iwate.jp", "*.kagawa.jp", "*.kagoshima.jp", "*.kanagawa.jp", "*.kawasaki.jp", "*.kitakyushu.jp", "*.kobe.jp", "*.kochi.jp", "*.kumamoto.jp", "*.kyoto.jp", "*.mie.jp", "*.miyagi.jp", "*.miyazaki.jp", "*.nagano.jp", "*.nagasaki.jp", "*.nagoya.jp", "*.nara.jp", "*.niigata.jp", "*.oita.jp", "*.okayama.jp", "*.okinawa.jp", "*.osaka.jp", "*.saga.jp", "*.saitama.jp", "*.sapporo.jp", "*.sendai.jp", "*.shiga.jp", "*.shimane.jp", "*.shizuoka.jp", "*.tochigi.jp", "*.tokushima.jp", "*.tokyo.jp", "*.tottori.jp", "*.toyama.jp", "*.wakayama.jp", "*.yamagata.jp", "*.yamaguchi.jp", "*.yamanashi.jp", "*.yokohama.jp", "!metro.tokyo.jp", "!pref.aichi.jp", "!pref.akita.jp", "!pref.aomori.jp", "!pref.chiba.jp", "!pref.ehime.jp", "!pref.fukui.jp", "!pref.fukuoka.jp", "!pref.fukushima.jp", "!pref.gifu.jp", "!pref.gunma.jp", "!pref.hiroshima.jp", "!pref.hokkaido.jp", "!pref.hyogo.jp", "!pref.ibaraki.jp", "!pref.ishikawa.jp", "!pref.iwate.jp", "!pref.kagawa.jp", "!pref.kagoshima.jp", "!pref.kanagawa.jp", "!pref.kochi.jp", "!pref.kumamoto.jp", "!pref.kyoto.jp", "!pref.mie.jp", "!pref.miyagi.jp", "!pref.miyazaki.jp", "!pref.nagano.jp", "!pref.nagasaki.jp", "!pref.nara.jp", "!pref.niigata.jp", "!pref.oita.jp", "!pref.okayama.jp", "!pref.okinawa.jp", "!pref.osaka.jp", "!pref.saga.jp", "!pref.saitama.jp", "!pref.shiga.jp", "!pref.shimane.jp", "!pref.shizuoka.jp", "!pref.tochigi.jp", "!pref.tokushima.jp", "!pref.tottori.jp", "!pref.toyama.jp", "!pref.wakayama.jp", "!pref.yamagata.jp", "!pref.yamaguchi.jp", "!pref.yamanashi.jp", "!city.chiba.jp", "!city.fukuoka.jp", "!city.hiroshima.jp", "!city.kawasaki.jp", "!city.kitakyushu.jp", "!city.kobe.jp", "!city.kyoto.jp", "!city.nagoya.jp", "!city.niigata.jp", "!city.okayama.jp", "!city.osaka.jp", "!city.saitama.jp", "!city.sapporo.jp", "!city.sendai.jp", "!city.shizuoka.jp", "!city.yokohama.jp", "*.ke", "kg", "org.kg", "net.kg", "com.kg", "edu.kg", "gov.kg", "mil.kg", "*.kh", "ki", "edu.ki", "biz.ki", "net.ki", "org.ki", "gov.ki", "info.ki", "com.ki", "km", "org.km", "nom.km", "gov.km", "prd.km", 
"tm.km", "edu.km", "mil.km", "ass.km", "com.km", "coop.km", "asso.km", "presse.km", "medecin.km", "notaires.km", "pharmaciens.km", "veterinaire.km", "gouv.km", "kn", "net.kn", "org.kn", "edu.kn", "gov.kn", "kr", "ac.kr", "co.kr", "es.kr", "go.kr", "hs.kr", "kg.kr", "mil.kr", "ms.kr", "ne.kr", "or.kr", "pe.kr", "re.kr", "sc.kr", "busan.kr", "chungbuk.kr", "chungnam.kr", "daegu.kr", "daejeon.kr", "gangwon.kr", "gwangju.kr", "gyeongbuk.kr", "gyeonggi.kr", "gyeongnam.kr", "incheon.kr", "jeju.kr", "jeonbuk.kr", "jeonnam.kr", "seoul.kr", "ulsan.kr", "*.kw", "ky", "edu.ky", "gov.ky", "com.ky", "org.ky", "net.ky", "kz", "org.kz", "edu.kz", "net.kz", "gov.kz", "mil.kz", "com.kz", "la", "int.la", "net.la", "info.la", "edu.la", "gov.la", "per.la", "com.la", "org.la", "c.la", "com.lb", "edu.lb", "gov.lb", "net.lb", "org.lb", "lc", "com.lc", "net.lc", "co.lc", "org.lc", "edu.lc", "gov.lc", "li", "lk", "gov.lk", "sch.lk", "net.lk", "int.lk", "com.lk", "org.lk", "edu.lk", "ngo.lk", "soc.lk", "web.lk", "ltd.lk", "assn.lk", "grp.lk", "hotel.lk", "local", "com.lr", "edu.lr", "gov.lr", "org.lr", "net.lr", "ls", "co.ls", "org.ls", "lt", "gov.lt", "lu", "lv", "com.lv", "edu.lv", "gov.lv", "org.lv", "mil.lv", "id.lv", "net.lv", "asn.lv", "conf.lv", "ly", "com.ly", "net.ly", "gov.ly", "plc.ly", "edu.ly", "sch.ly", "med.ly", "org.ly", "id.ly", "ma", "co.ma", "net.ma", "gov.ma", "org.ma", "ac.ma", "press.ma", "mc", "tm.mc", "asso.mc", "md", "me", "co.me", "net.me", "org.me", "edu.me", "ac.me", "gov.me", "its.me", "priv.me", "mg", "org.mg", "nom.mg", "gov.mg", "prd.mg", "tm.mg", "edu.mg", "mil.mg", "com.mg", "mh", "mil", "mk", "com.mk", "org.mk", "net.mk", "edu.mk", "gov.mk", "inf.mk", "name.mk", "ml", "com.ml", "edu.ml", "gouv.ml", "gov.ml", "net.ml", "org.ml", "presse.ml", "*.mm", "mn", "gov.mn", "edu.mn", "org.mn", "mo", "com.mo", "net.mo", "org.mo", "edu.mo", "gov.mo", "mobi", "mp", "mq", "mr", "gov.mr", "ms", "*.mt", "mu", "com.mu", "net.mu", "org.mu", "gov.mu", "ac.mu", "co.mu", "or.mu", "museum", "academy.museum", "agriculture.museum", "air.museum", "airguard.museum", "alabama.museum", "alaska.museum", "amber.museum", "ambulance.museum", "american.museum", "americana.museum", "americanantiques.museum", "americanart.museum", "amsterdam.museum", "and.museum", "annefrank.museum", "anthro.museum", "anthropology.museum", "antiques.museum", "aquarium.museum", "arboretum.museum", "archaeological.museum", "archaeology.museum", "architecture.museum", "art.museum", "artanddesign.museum", "artcenter.museum", "artdeco.museum", "arteducation.museum", "artgallery.museum", "arts.museum", "artsandcrafts.museum", "asmatart.museum", "assassination.museum", "assisi.museum", "association.museum", "astronomy.museum", "atlanta.museum", "austin.museum", "australia.museum", "automotive.museum", "aviation.museum", "axis.museum", "badajoz.museum", "baghdad.museum", "bahn.museum", "bale.museum", "baltimore.museum", "barcelona.museum", "baseball.museum", "basel.museum", "baths.museum", "bauern.museum", "beauxarts.museum", "beeldengeluid.museum", "bellevue.museum", "bergbau.museum", "berkeley.museum", "berlin.museum", "bern.museum", "bible.museum", "bilbao.museum", "bill.museum", "birdart.museum", "birthplace.museum", "bonn.museum", "boston.museum", "botanical.museum", "botanicalgarden.museum", "botanicgarden.museum", "botany.museum", "brandywinevalley.museum", "brasil.museum", "bristol.museum", "british.museum", "britishcolumbia.museum", "broadcast.museum", "brunel.museum", "brussel.museum", "brussels.museum", "bruxelles.museum", 
"building.museum", "burghof.museum", "bus.museum", "bushey.museum", "cadaques.museum", "california.museum", "cambridge.museum", "can.museum", "canada.museum", "capebreton.museum", "carrier.museum", "cartoonart.museum", "casadelamoneda.museum", "castle.museum", "castres.museum", "celtic.museum", "center.museum", "chattanooga.museum", "cheltenham.museum", "chesapeakebay.museum", "chicago.museum", "children.museum", "childrens.museum", "childrensgarden.museum", "chiropractic.museum", "chocolate.museum", "christiansburg.museum", "cincinnati.museum", "cinema.museum", "circus.museum", "civilisation.museum", "civilization.museum", "civilwar.museum", "clinton.museum", "clock.museum", "coal.museum", "coastaldefence.museum", "cody.museum", "coldwar.museum", "collection.museum", "colonialwilliamsburg.museum", "coloradoplateau.museum", "columbia.museum", "columbus.museum", "communication.museum", "communications.museum", "community.museum", "computer.museum", "computerhistory.museum", "comunicações.museum", "contemporary.museum", "contemporaryart.museum", "convent.museum", "copenhagen.museum", "corporation.museum", "correios-e-telecomunicações.museum", "corvette.museum", "costume.museum", "countryestate.museum", "county.museum", "crafts.museum", "cranbrook.museum", "creation.museum", "cultural.museum", "culturalcenter.museum", "culture.museum", "cyber.museum", "cymru.museum", "dali.museum", "dallas.museum", "database.museum", "ddr.museum", "decorativearts.museum", "delaware.museum", "delmenhorst.museum", "denmark.museum", "depot.museum", "design.museum", "detroit.museum", "dinosaur.museum", "discovery.museum", "dolls.museum", "donostia.museum", "durham.museum", "eastafrica.museum", "eastcoast.museum", "education.museum", "educational.museum", "egyptian.museum", "eisenbahn.museum", "elburg.museum", "elvendrell.museum", "embroidery.museum", "encyclopedic.museum", "england.museum", "entomology.museum", "environment.museum", "environmentalconservation.museum", "epilepsy.museum", "essex.museum", "estate.museum", "ethnology.museum", "exeter.museum", "exhibition.museum", "family.museum", "farm.museum", "farmequipment.museum", "farmers.museum", "farmstead.museum", "field.museum", "figueres.museum", "filatelia.museum", "film.museum", "fineart.museum", "finearts.museum", "finland.museum", "flanders.museum", "florida.museum", "force.museum", "fortmissoula.museum", "fortworth.museum", "foundation.museum", "francaise.museum", "frankfurt.museum", "franziskaner.museum", "freemasonry.museum", "freiburg.museum", "fribourg.museum", "frog.museum", "fundacio.museum", "furniture.museum", "gallery.museum", "garden.museum", "gateway.museum", "geelvinck.museum", "gemological.museum", "geology.museum", "georgia.museum", "giessen.museum", "glas.museum", "glass.museum", "gorge.museum", "grandrapids.museum", "graz.museum", "guernsey.museum", "halloffame.museum", "hamburg.museum", "handson.museum", "harvestcelebration.museum", "hawaii.museum", "health.museum", "heimatunduhren.museum", "hellas.museum", "helsinki.museum", "hembygdsforbund.museum", "heritage.museum", "histoire.museum", "historical.museum", "historicalsociety.museum", "historichouses.museum", "historisch.museum", "historisches.museum", "history.museum", "historyofscience.museum", "horology.museum", "house.museum", "humanities.museum", "illustration.museum", "imageandsound.museum", "indian.museum", "indiana.museum", "indianapolis.museum", "indianmarket.museum", "intelligence.museum", "interactive.museum", "iraq.museum", "iron.museum", "isleofman.museum", 
"jamison.museum", "jefferson.museum", "jerusalem.museum", "jewelry.museum", "jewish.museum", "jewishart.museum", "jfk.museum", "journalism.museum", "judaica.museum", "judygarland.museum", "juedisches.museum", "juif.museum", "karate.museum", "karikatur.museum", "kids.museum", "koebenhavn.museum", "koeln.museum", "kunst.museum", "kunstsammlung.museum", "kunstunddesign.museum", "labor.museum", "labour.museum", "lajolla.museum", "lancashire.museum", "landes.museum", "lans.museum", "läns.museum", "larsson.museum", "lewismiller.museum", "lincoln.museum", "linz.museum", "living.museum", "livinghistory.museum", "localhistory.museum", "london.museum", "losangeles.museum", "louvre.museum", "loyalist.museum", "lucerne.museum", "luxembourg.museum", "luzern.museum", "mad.museum", "madrid.museum", "mallorca.museum", "manchester.museum", "mansion.museum", "mansions.museum", "manx.museum", "marburg.museum", "maritime.museum", "maritimo.museum", "maryland.museum", "marylhurst.museum", "media.museum", "medical.museum", "medizinhistorisches.museum", "meeres.museum", "memorial.museum", "mesaverde.museum", "michigan.museum", "midatlantic.museum", "military.museum", "mill.museum", "miners.museum", "mining.museum", "minnesota.museum", "missile.museum", "missoula.museum", "modern.museum", "moma.museum", "money.museum", "monmouth.museum", "monticello.museum", "montreal.museum", "moscow.museum", "motorcycle.museum", "muenchen.museum", "muenster.museum", "mulhouse.museum", "muncie.museum", "museet.museum", "museumcenter.museum", "museumvereniging.museum", "music.museum", "national.museum", "nationalfirearms.museum", "nationalheritage.museum", "nativeamerican.museum", "naturalhistory.museum", "naturalhistorymuseum.museum", "naturalsciences.museum", "nature.museum", "naturhistorisches.museum", "natuurwetenschappen.museum", "naumburg.museum", "naval.museum", "nebraska.museum", "neues.museum", "newhampshire.museum", "newjersey.museum", "newmexico.museum", "newport.museum", "newspaper.museum", "newyork.museum", "niepce.museum", "norfolk.museum", "north.museum", "nrw.museum", "nuernberg.museum", "nuremberg.museum", "nyc.museum", "nyny.museum", "oceanographic.museum", "oceanographique.museum", "omaha.museum", "online.museum", "ontario.museum", "openair.museum", "oregon.museum", "oregontrail.museum", "otago.museum", "oxford.museum", "pacific.museum", "paderborn.museum", "palace.museum", "paleo.museum", "palmsprings.museum", "panama.museum", "paris.museum", "pasadena.museum", "pharmacy.museum", "philadelphia.museum", "philadelphiaarea.museum", "philately.museum", "phoenix.museum", "photography.museum", "pilots.museum", "pittsburgh.museum", "planetarium.museum", "plantation.museum", "plants.museum", "plaza.museum", "portal.museum", "portland.museum", "portlligat.museum", "posts-and-telecommunications.museum", "preservation.museum", "presidio.museum", "press.museum", "project.museum", "public.museum", "pubol.museum", "quebec.museum", "railroad.museum", "railway.museum", "research.museum", "resistance.museum", "riodejaneiro.museum", "rochester.museum", "rockart.museum", "roma.museum", "russia.museum", "saintlouis.museum", "salem.museum", "salvadordali.museum", "salzburg.museum", "sandiego.museum", "sanfrancisco.museum", "santabarbara.museum", "santacruz.museum", "santafe.museum", "saskatchewan.museum", "satx.museum", "savannahga.museum", "schlesisches.museum", "schoenbrunn.museum", "schokoladen.museum", "school.museum", "schweiz.museum", "science.museum", "scienceandhistory.museum", "scienceandindustry.museum", 
"sciencecenter.museum", "sciencecenters.museum", "science-fiction.museum", "sciencehistory.museum", "sciences.museum", "sciencesnaturelles.museum", "scotland.museum", "seaport.museum", "settlement.museum", "settlers.museum", "shell.museum", "sherbrooke.museum", "sibenik.museum", "silk.museum", "ski.museum", "skole.museum", "society.museum", "sologne.museum", "soundandvision.museum", "southcarolina.museum", "southwest.museum", "space.museum", "spy.museum", "square.museum", "stadt.museum", "stalbans.museum", "starnberg.museum", "state.museum", "stateofdelaware.museum", "station.museum", "steam.museum", "steiermark.museum", "stjohn.museum", "stockholm.museum", "stpetersburg.museum", "stuttgart.museum", "suisse.museum", "surgeonshall.museum", "surrey.museum", "svizzera.museum", "sweden.museum", "sydney.museum", "tank.museum", "tcm.museum", "technology.museum", "telekommunikation.museum", "television.museum", "texas.museum", "textile.museum", "theater.museum", "time.museum", "timekeeping.museum", "topology.museum", "torino.museum", "touch.museum", "town.museum", "transport.museum", "tree.museum", "trolley.museum", "trust.museum", "trustee.museum", "uhren.museum", "ulm.museum", "undersea.museum", "university.museum", "usa.museum", "usantiques.museum", "usarts.museum", "uscountryestate.museum", "usculture.museum", "usdecorativearts.museum", "usgarden.museum", "ushistory.museum", "ushuaia.museum", "uslivinghistory.museum", "utah.museum", "uvic.museum", "valley.museum", "vantaa.museum", "versailles.museum", "viking.museum", "village.museum", "virginia.museum", "virtual.museum", "virtuel.museum", "vlaanderen.museum", "volkenkunde.museum", "wales.museum", "wallonie.museum", "war.museum", "washingtondc.museum", "watchandclock.museum", "watch-and-clock.museum", "western.museum", "westfalen.museum", "whaling.museum", "wildlife.museum", "williamsburg.museum", "windmill.museum", "workshop.museum", "york.museum", "yorkshire.museum", "yosemite.museum", "youth.museum", "zoological.museum", "zoology.museum", "ירושלים.museum", "иком.museum", "mv", "aero.mv", "biz.mv", "com.mv", "coop.mv", "edu.mv", "gov.mv", "info.mv", "int.mv", "mil.mv", "museum.mv", "name.mv", "net.mv", "org.mv", "pro.mv", "mw", "ac.mw", "biz.mw", "co.mw", "com.mw", "coop.mw", "edu.mw", "gov.mw", "int.mw", "museum.mw", "net.mw", "org.mw", "mx", "com.mx", "org.mx", "gob.mx", "edu.mx", "net.mx", "my", "com.my", "net.my", "org.my", "gov.my", "edu.my", "mil.my", "name.my", "*.mz", "na", "info.na", "pro.na", "name.na", "school.na", "or.na", "dr.na", "us.na", "mx.na", "ca.na", "in.na", "cc.na", "tv.na", "ws.na", "mobi.na", "co.na", "com.na", "org.na", "name", "nc", "asso.nc", "ne", "net", "gb.net", "se.net", "uk.net", "za.net", "nf", "com.nf", "net.nf", "per.nf", "rec.nf", "web.nf", "arts.nf", "firm.nf", "info.nf", "other.nf", "store.nf", "ac.ng", "com.ng", "edu.ng", "gov.ng", "net.ng", "org.ng", "*.ni", "nl", "no", "fhs.no", "vgs.no", "fylkesbibl.no", "folkebibl.no", "museum.no", "idrett.no", "priv.no", "mil.no", "stat.no", "dep.no", "kommune.no", "herad.no", "aa.no", "ah.no", "bu.no", "fm.no", "hl.no", "hm.no", "jan-mayen.no", "mr.no", "nl.no", "nt.no", "of.no", "ol.no", "oslo.no", "rl.no", "sf.no", "st.no", "svalbard.no", "tm.no", "tr.no", "va.no", "vf.no", "gs.aa.no", "gs.ah.no", "gs.bu.no", "gs.fm.no", "gs.hl.no", "gs.hm.no", "gs.jan-mayen.no", "gs.mr.no", "gs.nl.no", "gs.nt.no", "gs.of.no", "gs.ol.no", "gs.oslo.no", "gs.rl.no", "gs.sf.no", "gs.st.no", "gs.svalbard.no", "gs.tm.no", "gs.tr.no", "gs.va.no", "gs.vf.no", "akrehamn.no", 
"åkrehamn.no", "algard.no", "ålgård.no", "arna.no", "brumunddal.no", "bryne.no", "bronnoysund.no", "brønnøysund.no", "drobak.no", "drøbak.no", "egersund.no", "fetsund.no", "floro.no", "florø.no", "fredrikstad.no", "hokksund.no", "honefoss.no", "hønefoss.no", "jessheim.no", "jorpeland.no", "jørpeland.no", "kirkenes.no", "kopervik.no", "krokstadelva.no", "langevag.no", "langevåg.no", "leirvik.no", "mjondalen.no", "mjøndalen.no", "mo-i-rana.no", "mosjoen.no", "mosjøen.no", "nesoddtangen.no", "orkanger.no", "osoyro.no", "osøyro.no", "raholt.no", "råholt.no", "sandnessjoen.no", "sandnessjøen.no", "skedsmokorset.no", "slattum.no", "spjelkavik.no", "stathelle.no", "stavern.no", "stjordalshalsen.no", "stjørdalshalsen.no", "tananger.no", "tranby.no", "vossevangen.no", "afjord.no", "åfjord.no", "agdenes.no", "al.no", "ål.no", "alesund.no", "ålesund.no", "alstahaug.no", "alta.no", "áltá.no", "alaheadju.no", "álaheadju.no", "alvdal.no", "amli.no", "åmli.no", "amot.no", "åmot.no", "andebu.no", "andoy.no", "andøy.no", "andasuolo.no", "ardal.no", "årdal.no", "aremark.no", "arendal.no", "ås.no", "aseral.no", "åseral.no", "asker.no", "askim.no", "askvoll.no", "askoy.no", "askøy.no", "asnes.no", "åsnes.no", "audnedaln.no", "aukra.no", "aure.no", "aurland.no", "aurskog-holand.no", "aurskog-høland.no", "austevoll.no", "austrheim.no", "averoy.no", "averøy.no", "balestrand.no", "ballangen.no", "balat.no", "bálát.no", "balsfjord.no", "bahccavuotna.no", "báhccavuotna.no", "bamble.no", "bardu.no", "beardu.no", "beiarn.no", "bajddar.no", "bájddar.no", "baidar.no", "báidár.no", "berg.no", "bergen.no", "berlevag.no", "berlevåg.no", "bearalvahki.no", "bearalváhki.no", "bindal.no", "birkenes.no", "bjarkoy.no", "bjarkøy.no", "bjerkreim.no", "bjugn.no", "bodo.no", "bodø.no", "badaddja.no", "bådåddjå.no", "budejju.no", "bokn.no", "bremanger.no", "bronnoy.no", "brønnøy.no", "bygland.no", "bykle.no", "barum.no", "bærum.no", "bo.telemark.no", "bø.telemark.no", "bo.nordland.no", "bø.nordland.no", "bievat.no", "bievát.no", "bomlo.no", "bømlo.no", "batsfjord.no", "båtsfjord.no", "bahcavuotna.no", "báhcavuotna.no", "dovre.no", "drammen.no", "drangedal.no", "dyroy.no", "dyrøy.no", "donna.no", "dønna.no", "eid.no", "eidfjord.no", "eidsberg.no", "eidskog.no", "eidsvoll.no", "eigersund.no", "elverum.no", "enebakk.no", "engerdal.no", "etne.no", "etnedal.no", "evenes.no", "evenassi.no", "evenášši.no", "evje-og-hornnes.no", "farsund.no", "fauske.no", "fuossko.no", "fuoisku.no", "fedje.no", "fet.no", "finnoy.no", "finnøy.no", "fitjar.no", "fjaler.no", "fjell.no", "flakstad.no", "flatanger.no", "flekkefjord.no", "flesberg.no", "flora.no", "fla.no", "flå.no", "folldal.no", "forsand.no", "fosnes.no", "frei.no", "frogn.no", "froland.no", "frosta.no", "frana.no", "fræna.no", "froya.no", "frøya.no", "fusa.no", "fyresdal.no", "forde.no", "førde.no", "gamvik.no", "gangaviika.no", "gáŋgaviika.no", "gaular.no", "gausdal.no", "gildeskal.no", "gildeskål.no", "giske.no", "gjemnes.no", "gjerdrum.no", "gjerstad.no", "gjesdal.no", "gjovik.no", "gjøvik.no", "gloppen.no", "gol.no", "gran.no", "grane.no", "granvin.no", "gratangen.no", "grimstad.no", "grong.no", "kraanghke.no", "kråanghke.no", "grue.no", "gulen.no", "hadsel.no", "halden.no", "halsa.no", "hamar.no", "hamaroy.no", "habmer.no", "hábmer.no", "hapmir.no", "hápmir.no", "hammerfest.no", "hammarfeasta.no", "hámmárfeasta.no", "haram.no", "hareid.no", "harstad.no", "hasvik.no", "aknoluokta.no", "ákŋoluokta.no", "hattfjelldal.no", "aarborte.no", "haugesund.no", "hemne.no", "hemnes.no", "hemsedal.no", 
"heroy.more-og-romsdal.no", "herøy.møre-og-romsdal.no", "heroy.nordland.no", "herøy.nordland.no", "hitra.no", "hjartdal.no", "hjelmeland.no", "hobol.no", "hobøl.no", "hof.no", "hol.no", "hole.no", "holmestrand.no", "holtalen.no", "holtålen.no", "hornindal.no", "horten.no", "hurdal.no", "hurum.no", "hvaler.no", "hyllestad.no", "hagebostad.no", "hægebostad.no", "hoyanger.no", "høyanger.no", "hoylandet.no", "høylandet.no", "ha.no", "hå.no", "ibestad.no", "inderoy.no", "inderøy.no", "iveland.no", "jevnaker.no", "jondal.no", "jolster.no", "jølster.no", "karasjok.no", "karasjohka.no", "kárášjohka.no", "karlsoy.no", "galsa.no", "gálsá.no", "karmoy.no", "karmøy.no", "kautokeino.no", "guovdageaidnu.no", "klepp.no", "klabu.no", "klæbu.no", "kongsberg.no", "kongsvinger.no", "kragero.no", "kragerø.no", "kristiansand.no", "kristiansund.no", "krodsherad.no", "krødsherad.no", "kvalsund.no", "rahkkeravju.no", "ráhkkerávju.no", "kvam.no", "kvinesdal.no", "kvinnherad.no", "kviteseid.no", "kvitsoy.no", "kvitsøy.no", "kvafjord.no", "kvæfjord.no", "giehtavuoatna.no", "kvanangen.no", "kvænangen.no", "navuotna.no", "návuotna.no", "kafjord.no", "kåfjord.no", "gaivuotna.no", "gáivuotna.no", "larvik.no", "lavangen.no", "lavagis.no", "loabat.no", "loabát.no", "lebesby.no", "davvesiida.no", "leikanger.no", "leirfjord.no", "leka.no", "leksvik.no", "lenvik.no", "leangaviika.no", "leaŋgaviika.no", "lesja.no", "levanger.no", "lier.no", "lierne.no", "lillehammer.no", "lillesand.no", "lindesnes.no", "lindas.no", "lindås.no", "lom.no", "loppa.no", "lahppi.no", "láhppi.no", "lund.no", "lunner.no", "luroy.no", "lurøy.no", "luster.no", "lyngdal.no", "lyngen.no", "ivgu.no", "lardal.no", "lerdal.no", "lærdal.no", "lodingen.no", "lødingen.no", "lorenskog.no", "lørenskog.no", "loten.no", "løten.no", "malvik.no", "masoy.no", "måsøy.no", "muosat.no", "muosát.no", "mandal.no", "marker.no", "marnardal.no", "masfjorden.no", "meland.no", "meldal.no", "melhus.no", "meloy.no", "meløy.no", "meraker.no", "meråker.no", "moareke.no", "moåreke.no", "midsund.no", "midtre-gauldal.no", "modalen.no", "modum.no", "molde.no", "moskenes.no", "moss.no", "mosvik.no", "malselv.no", "målselv.no", "malatvuopmi.no", "málatvuopmi.no", "namdalseid.no", "aejrie.no", "namsos.no", "namsskogan.no", "naamesjevuemie.no", "nååmesjevuemie.no", "laakesvuemie.no", "nannestad.no", "narvik.no", "narviika.no", "naustdal.no", "nedre-eiker.no", "nes.akershus.no", "nes.buskerud.no", "nesna.no", "nesodden.no", "nesseby.no", "unjarga.no", "unjárga.no", "nesset.no", "nissedal.no", "nittedal.no", "nord-aurdal.no", "nord-fron.no", "nord-odal.no", "norddal.no", "nordkapp.no", "davvenjarga.no", "davvenjárga.no", "nordre-land.no", "nordreisa.no", "raisa.no", "ráisa.no", "nore-og-uvdal.no", "notodden.no", "naroy.no", "nærøy.no", "notteroy.no", "nøtterøy.no", "odda.no", "oksnes.no", "øksnes.no", "oppdal.no", "oppegard.no", "oppegård.no", "orkdal.no", "orland.no", "ørland.no", "orskog.no", "ørskog.no", "orsta.no", "ørsta.no", "os.hedmark.no", "os.hordaland.no", "osen.no", "osteroy.no", "osterøy.no", "ostre-toten.no", "østre-toten.no", "overhalla.no", "ovre-eiker.no", "øvre-eiker.no", "oyer.no", "øyer.no", "oygarden.no", "øygarden.no", "oystre-slidre.no", "øystre-slidre.no", "porsanger.no", "porsangu.no", "porsáŋgu.no", "porsgrunn.no", "radoy.no", "radøy.no", "rakkestad.no", "rana.no", "ruovat.no", "randaberg.no", "rauma.no", "rendalen.no", "rennebu.no", "rennesoy.no", "rennesøy.no", "rindal.no", "ringebu.no", "ringerike.no", "ringsaker.no", "rissa.no", "risor.no", "risør.no", 
"roan.no", "rollag.no", "rygge.no", "ralingen.no", "rælingen.no", "rodoy.no", "rødøy.no", "romskog.no", "rømskog.no", "roros.no", "røros.no", "rost.no", "røst.no", "royken.no", "røyken.no", "royrvik.no", "røyrvik.no", "rade.no", "råde.no", "salangen.no", "siellak.no", "saltdal.no", "salat.no", "sálát.no", "sálat.no", "samnanger.no", "sande.more-og-romsdal.no", "sande.møre-og-romsdal.no", "sande.vestfold.no", "sandefjord.no", "sandnes.no", "sandoy.no", "sandøy.no", "sarpsborg.no", "sauda.no", "sauherad.no", "sel.no", "selbu.no", "selje.no", "seljord.no", "sigdal.no", "siljan.no", "sirdal.no", "skaun.no", "skedsmo.no", "ski.no", "skien.no", "skiptvet.no", "skjervoy.no", "skjervøy.no", "skierva.no", "skiervá.no", "skjak.no", "skjåk.no", "skodje.no", "skanland.no", "skånland.no", "skanit.no", "skánit.no", "smola.no", "smøla.no", "snillfjord.no", "snasa.no", "snåsa.no", "snoasa.no", "snaase.no", "snåase.no", "sogndal.no", "sokndal.no", "sola.no", "solund.no", "songdalen.no", "sortland.no", "spydeberg.no", "stange.no", "stavanger.no", "steigen.no", "steinkjer.no", "stjordal.no", "stjørdal.no", "stokke.no", "stor-elvdal.no", "stord.no", "stordal.no", "storfjord.no", "omasvuotna.no", "strand.no", "stranda.no", "stryn.no", "sula.no", "suldal.no", "sund.no", "sunndal.no", "surnadal.no", "sveio.no", "svelvik.no", "sykkylven.no", "sogne.no", "søgne.no", "somna.no", "sømna.no", "sondre-land.no", "søndre-land.no", "sor-aurdal.no", "sør-aurdal.no", "sor-fron.no", "sør-fron.no", "sor-odal.no", "sør-odal.no", "sor-varanger.no", "sør-varanger.no", "matta-varjjat.no", "mátta-várjjat.no", "sorfold.no", "sørfold.no", "sorreisa.no", "sørreisa.no", "sorum.no", "sørum.no", "tana.no", "deatnu.no", "time.no", "tingvoll.no", "tinn.no", "tjeldsund.no", "dielddanuorri.no", "tjome.no", "tjøme.no", "tokke.no", "tolga.no", "torsken.no", "tranoy.no", "tranøy.no", "tromso.no", "tromsø.no", "tromsa.no", "romsa.no", "trondheim.no", "troandin.no", "trysil.no", "trana.no", "træna.no", "trogstad.no", "trøgstad.no", "tvedestrand.no", "tydal.no", "tynset.no", "tysfjord.no", "divtasvuodna.no", "divttasvuotna.no", "tysnes.no", "tysvar.no", "tysvær.no", "tonsberg.no", "tønsberg.no", "ullensaker.no", "ullensvang.no", "ulvik.no", "utsira.no", "vadso.no", "vadsø.no", "cahcesuolo.no", "čáhcesuolo.no", "vaksdal.no", "valle.no", "vang.no", "vanylven.no", "vardo.no", "vardø.no", "varggat.no", "várggát.no", "vefsn.no", "vaapste.no", "vega.no", "vegarshei.no", "vegårshei.no", "vennesla.no", "verdal.no", "verran.no", "vestby.no", "vestnes.no", "vestre-slidre.no", "vestre-toten.no", "vestvagoy.no", "vestvågøy.no", "vevelstad.no", "vik.no", "vikna.no", "vindafjord.no", "volda.no", "voss.no", "varoy.no", "værøy.no", "vagan.no", "vågan.no", "voagat.no", "vagsoy.no", "vågsøy.no", "vaga.no", "vågå.no", "valer.ostfold.no", "våler.østfold.no", "valer.hedmark.no", "våler.hedmark.no", "*.np", "nr", "biz.nr", "info.nr", "gov.nr", "edu.nr", "org.nr", "net.nr", "com.nr", "nu", "*.nz", "*.om", "org", "ae.org", "za.org", "pa", "ac.pa", "gob.pa", "com.pa", "org.pa", "sld.pa", "edu.pa", "net.pa", "ing.pa", "abo.pa", "med.pa", "nom.pa", "pe", "edu.pe", "gob.pe", "nom.pe", "mil.pe", "org.pe", "com.pe", "net.pe", "pf", "com.pf", "org.pf", "edu.pf", "*.pg", "ph", "com.ph", "net.ph", "org.ph", "gov.ph", "edu.ph", "ngo.ph", "mil.ph", "i.ph", "pk", "com.pk", "net.pk", "edu.pk", "org.pk", "fam.pk", "biz.pk", "web.pk", "gov.pk", "gob.pk", "gok.pk", "gon.pk", "gop.pk", "gos.pk", "info.pk", "pl", "aid.pl", "agro.pl", "atm.pl", "auto.pl", "biz.pl", "com.pl", "edu.pl", 
"gmina.pl", "gsm.pl", "info.pl", "mail.pl", "miasta.pl", "media.pl", "mil.pl", "net.pl", "nieruchomosci.pl", "nom.pl", "org.pl", "pc.pl", "powiat.pl", "priv.pl", "realestate.pl", "rel.pl", "sex.pl", "shop.pl", "sklep.pl", "sos.pl", "szkola.pl", "targi.pl", "tm.pl", "tourism.pl", "travel.pl", "turystyka.pl", "6bone.pl", "art.pl", "mbone.pl", "gov.pl", "uw.gov.pl", "um.gov.pl", "ug.gov.pl", "upow.gov.pl", "starostwo.gov.pl", "so.gov.pl", "sr.gov.pl", "po.gov.pl", "pa.gov.pl", "ngo.pl", "irc.pl", "usenet.pl", "augustow.pl", "babia-gora.pl", "bedzin.pl", "beskidy.pl", "bialowieza.pl", "bialystok.pl", "bielawa.pl", "bieszczady.pl", "boleslawiec.pl", "bydgoszcz.pl", "bytom.pl", "cieszyn.pl", "czeladz.pl", "czest.pl", "dlugoleka.pl", "elblag.pl", "elk.pl", "glogow.pl", "gniezno.pl", "gorlice.pl", "grajewo.pl", "ilawa.pl", "jaworzno.pl", "jelenia-gora.pl", "jgora.pl", "kalisz.pl", "kazimierz-dolny.pl", "karpacz.pl", "kartuzy.pl", "kaszuby.pl", "katowice.pl", "kepno.pl", "ketrzyn.pl", "klodzko.pl", "kobierzyce.pl", "kolobrzeg.pl", "konin.pl", "konskowola.pl", "kutno.pl", "lapy.pl", "lebork.pl", "legnica.pl", "lezajsk.pl", "limanowa.pl", "lomza.pl", "lowicz.pl", "lubin.pl", "lukow.pl", "malbork.pl", "malopolska.pl", "mazowsze.pl", "mazury.pl", "mielec.pl", "mielno.pl", "mragowo.pl", "naklo.pl", "nowaruda.pl", "nysa.pl", "olawa.pl", "olecko.pl", "olkusz.pl", "olsztyn.pl", "opoczno.pl", "opole.pl", "ostroda.pl", "ostroleka.pl", "ostrowiec.pl", "ostrowwlkp.pl", "pila.pl", "pisz.pl", "podhale.pl", "podlasie.pl", "polkowice.pl", "pomorze.pl", "pomorskie.pl", "prochowice.pl", "pruszkow.pl", "przeworsk.pl", "pulawy.pl", "radom.pl", "rawa-maz.pl", "rybnik.pl", "rzeszow.pl", "sanok.pl", "sejny.pl", "siedlce.pl", "slask.pl", "slupsk.pl", "sosnowiec.pl", "stalowa-wola.pl", "skoczow.pl", "starachowice.pl", "stargard.pl", "suwalki.pl", "swidnica.pl", "swiebodzin.pl", "swinoujscie.pl", "szczecin.pl", "szczytno.pl", "tarnobrzeg.pl", "tgory.pl", "turek.pl", "tychy.pl", "ustka.pl", "walbrzych.pl", "warmia.pl", "warszawa.pl", "waw.pl", "wegrow.pl", "wielun.pl", "wlocl.pl", "wloclawek.pl", "wodzislaw.pl", "wolomin.pl", "wroclaw.pl", "zachpomor.pl", "zagan.pl", "zarow.pl", "zgora.pl", "zgorzelec.pl", "gda.pl", "gdansk.pl", "gdynia.pl", "med.pl", "sopot.pl", "gliwice.pl", "krakow.pl", "poznan.pl", "wroc.pl", "zakopane.pl", "pn", "gov.pn", "co.pn", "org.pn", "edu.pn", "net.pn", "pr", "com.pr", "net.pr", "org.pr", "gov.pr", "edu.pr", "isla.pr", "pro.pr", "biz.pr", "info.pr", "name.pr", "est.pr", "prof.pr", "ac.pr", "pro", "aca.pro", "bar.pro", "cpa.pro", "jur.pro", "law.pro", "med.pro", "eng.pro", "ps", "edu.ps", "gov.ps", "sec.ps", "plo.ps", "com.ps", "org.ps", "net.ps", "pt", "net.pt", "gov.pt", "org.pt", "edu.pt", "int.pt", "publ.pt", "com.pt", "nome.pt", "pw", "co.pw", "ne.pw", "or.pw", "ed.pw", "go.pw", "belau.pw", "*.py", "*.qa", "re", "com.re", "asso.re", "nom.re", "ro", "com.ro", "org.ro", "tm.ro", "nt.ro", "nom.ro", "info.ro", "rec.ro", "arts.ro", "firm.ro", "store.ro", "www.ro", "rs", "co.rs", "org.rs", "edu.rs", "ac.rs", "gov.rs", "in.rs", "ru", "ac.ru", "com.ru", "edu.ru", "int.ru", "net.ru", "org.ru", "pp.ru", "adygeya.ru", "altai.ru", "amur.ru", "arkhangelsk.ru", "astrakhan.ru", "bashkiria.ru", "belgorod.ru", "bir.ru", "bryansk.ru", "buryatia.ru", "cbg.ru", "chel.ru", "chelyabinsk.ru", "chita.ru", "chukotka.ru", "chuvashia.ru", "dagestan.ru", "dudinka.ru", "e-burg.ru", "grozny.ru", "irkutsk.ru", "ivanovo.ru", "izhevsk.ru", "jar.ru", "joshkar-ola.ru", "kalmykia.ru", "kaluga.ru", "kamchatka.ru", "karelia.ru", 
"kazan.ru", "kchr.ru", "kemerovo.ru", "khabarovsk.ru", "khakassia.ru", "khv.ru", "kirov.ru", "koenig.ru", "komi.ru", "kostroma.ru", "krasnoyarsk.ru", "kuban.ru", "kurgan.ru", "kursk.ru", "lipetsk.ru", "magadan.ru", "mari.ru", "mari-el.ru", "marine.ru", "mordovia.ru", "mosreg.ru", "msk.ru", "murmansk.ru", "nalchik.ru", "nnov.ru", "nov.ru", "novosibirsk.ru", "nsk.ru", "omsk.ru", "orenburg.ru", "oryol.ru", "palana.ru", "penza.ru", "perm.ru", "pskov.ru", "ptz.ru", "rnd.ru", "ryazan.ru", "sakhalin.ru", "samara.ru", "saratov.ru", "simbirsk.ru", "smolensk.ru", "spb.ru", "stavropol.ru", "stv.ru", "surgut.ru", "tambov.ru", "tatarstan.ru", "tom.ru", "tomsk.ru", "tsaritsyn.ru", "tsk.ru", "tula.ru", "tuva.ru", "tver.ru", "tyumen.ru", "udm.ru", "udmurtia.ru", "ulan-ude.ru", "vladikavkaz.ru", "vladimir.ru", "vladivostok.ru", "volgograd.ru", "vologda.ru", "voronezh.ru", "vrn.ru", "vyatka.ru", "yakutia.ru", "yamal.ru", "yaroslavl.ru", "yekaterinburg.ru", "yuzhno-sakhalinsk.ru", "amursk.ru", "baikal.ru", "cmw.ru", "fareast.ru", "jamal.ru", "kms.ru", "k-uralsk.ru", "kustanai.ru", "kuzbass.ru", "magnitka.ru", "mytis.ru", "nakhodka.ru", "nkz.ru", "norilsk.ru", "oskol.ru", "pyatigorsk.ru", "rubtsovsk.ru", "snz.ru", "syzran.ru", "vdonsk.ru", "zgrad.ru", "gov.ru", "mil.ru", "test.ru", "rw", "gov.rw", "net.rw", "edu.rw", "ac.rw", "com.rw", "co.rw", "int.rw", "mil.rw", "gouv.rw", "com.sa", "net.sa", "org.sa", "gov.sa", "med.sa", "pub.sa", "edu.sa", "sch.sa", "sb", "com.sb", "edu.sb", "gov.sb", "net.sb", "org.sb", "sc", "com.sc", "gov.sc", "net.sc", "org.sc", "edu.sc", "sd", "com.sd", "net.sd", "org.sd", "edu.sd", "med.sd", "gov.sd", "info.sd", "se", "a.se", "ac.se", "b.se", "bd.se", "brand.se", "c.se", "d.se", "e.se", "f.se", "fh.se", "fhsk.se", "fhv.se", "g.se", "h.se", "i.se", "k.se", "komforb.se", "kommunalforbund.se", "komvux.se", "l.se", "lanbib.se", "m.se", "n.se", "naturbruksgymn.se", "o.se", "org.se", "p.se", "parti.se", "pp.se", "press.se", "r.se", "s.se", "sshn.se", "t.se", "tm.se", "u.se", "w.se", "x.se", "y.se", "z.se", "sg", "com.sg", "net.sg", "org.sg", "gov.sg", "edu.sg", "per.sg", "sh", "si", "sk", "sl", "com.sl", "net.sl", "edu.sl", "gov.sl", "org.sl", "sm", "sn", "art.sn", "com.sn", "edu.sn", "gouv.sn", "org.sn", "perso.sn", "univ.sn", "sr", "st", "co.st", "com.st", "consulado.st", "edu.st", "embaixada.st", "gov.st", "mil.st", "net.st", "org.st", "principe.st", "saotome.st", "store.st", "su", "*.sv", "sy", "edu.sy", "gov.sy", "net.sy", "mil.sy", "com.sy", "org.sy", "sz", "co.sz", "ac.sz", "org.sz", "tc", "td", "tel", "tf", "tg", "th", "ac.th", "co.th", "go.th", "in.th", "mi.th", "net.th", "or.th", "tj", "ac.tj", "biz.tj", "co.tj", "com.tj", "edu.tj", "go.tj", "gov.tj", "int.tj", "mil.tj", "name.tj", "net.tj", "nic.tj", "org.tj", "test.tj", "web.tj", "tk", "tl", "gov.tl", "tm", "tn", "com.tn", "ens.tn", "fin.tn", "gov.tn", "ind.tn", "intl.tn", "nat.tn", "net.tn", "org.tn", "info.tn", "perso.tn", "tourism.tn", "edunet.tn", "rnrt.tn", "rns.tn", "rnu.tn", "mincom.tn", "agrinet.tn", "defense.tn", "turen.tn", "to", "com.to", "gov.to", "net.to", "org.to", "edu.to", "mil.to", "*.tr", "travel", "tt", "co.tt", "com.tt", "org.tt", "net.tt", "biz.tt", "info.tt", "pro.tt", "int.tt", "coop.tt", "jobs.tt", "mobi.tt", "travel.tt", "museum.tt", "aero.tt", "name.tt", "gov.tt", "edu.tt", "tv", "com.tv", "net.tv", "org.tv", "gov.tv", "tw", "edu.tw", "gov.tw", "mil.tw", "com.tw", "net.tw", "org.tw", "idv.tw", "game.tw", "ebiz.tw", "club.tw", "網路.tw", "組織.tw", "商業.tw", "ac.tz", "co.tz", "go.tz", "ne.tz", "or.tz", 
"ua", "com.ua", "edu.ua", "gov.ua", "in.ua", "net.ua", "org.ua", "cherkassy.ua", "chernigov.ua", "chernovtsy.ua", "ck.ua", "cn.ua", "crimea.ua", "cv.ua", "dn.ua", "dnepropetrovsk.ua", "donetsk.ua", "dp.ua", "if.ua", "ivano-frankivsk.ua", "kh.ua", "kharkov.ua", "kherson.ua", "khmelnitskiy.ua", "kiev.ua", "kirovograd.ua", "km.ua", "kr.ua", "ks.ua", "kv.ua", "lg.ua", "lugansk.ua", "lutsk.ua", "lviv.ua", "mk.ua", "nikolaev.ua", "od.ua", "odessa.ua", "pl.ua", "poltava.ua", "rovno.ua", "rv.ua", "sebastopol.ua", "sumy.ua", "te.ua", "ternopil.ua", "uzhgorod.ua", "vinnica.ua", "vn.ua", "zaporizhzhe.ua", "zp.ua", "zhitomir.ua", "zt.ua", "ug", "co.ug", "ac.ug", "sc.ug", "go.ug", "ne.ug", "or.ug", "*.uk", "*.sch.uk", "!bl.uk", "!british-library.uk", "!icnet.uk", "!jet.uk", "!nel.uk", "!nhs.uk", "!nls.uk", "!national-library-scotland.uk", "!parliament.uk", "us", "dni.us", "fed.us", "isa.us", "kids.us", "nsn.us", "ak.us", "al.us", "ar.us", "as.us", "az.us", "ca.us", "co.us", "ct.us", "dc.us", "de.us", "fl.us", "ga.us", "gu.us", "hi.us", "ia.us", "id.us", "il.us", "in.us", "ks.us", "ky.us", "la.us", "ma.us", "md.us", "me.us", "mi.us", "mn.us", "mo.us", "ms.us", "mt.us", "nc.us", "nd.us", "ne.us", "nh.us", "nj.us", "nm.us", "nv.us", "ny.us", "oh.us", "ok.us", "or.us", "pa.us", "pr.us", "ri.us", "sc.us", "sd.us", "tn.us", "tx.us", "ut.us", "vi.us", "vt.us", "va.us", "wa.us", "wi.us", "wv.us", "wy.us", "*.uy", "uz", "com.uz", "co.uz", "va", "vc", "com.vc", "net.vc", "org.vc", "gov.vc", "mil.vc", "edu.vc", "*.ve", "vg", "vi", "co.vi", "com.vi", "k12.vi", "net.vi", "org.vi", "vn", "com.vn", "net.vn", "org.vn", "edu.vn", "gov.vn", "int.vn", "ac.vn", "biz.vn", "info.vn", "name.vn", "pro.vn", "health.vn", "vu", "ws", "com.ws", "net.ws", "org.ws", "gov.ws", "edu.ws", "*.ye", "*.yu", "*.za", "*.zm", "*.zw",)) def get_domain(fqdn): domain_elements = fqdn.split('.') for c in range(-len(domain_elements), 0): potential_domain = ".".join(domain_elements[c:]) potential_wildcard_domain = ".".join(["*"]+domain_elements[c:][1:]) potential_exception_domain = "!" + potential_domain if (potential_exception_domain in tld): return ".".join(domain_elements[c:]) if potential_domain in tld or potential_wildcard_domain in tld: return ".".join(domain_elements[c-1:]) # couldn't find any matching TLD, maybe it's an internal domain ? if len(domain_elements) > 2: return ".".join(domain_elements[1:]) return fqdn pylogsparser-0.4/logsparser/extras/__init__.py0000644000175000017500000000162411715703401020007 0ustar fbofbo# -*- coding: utf-8 -*- # -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. 
# # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # from domain_parser import get_domain from robots import robot_regex pylogsparser-0.4/logsparser/extras/robots.py0000644000175000017500000000443211715703401017560 0ustar fbofbo# -*- coding: utf-8 -*- # -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # """In this module we define a regular expression used to fetch the most common robots.""" import re # taken from genrobotlist.pl in the awstats project : http://awstats.cvs.sourceforge.net robots = [ 'antibot', 'appie', 'architext', 'bingbot', 'bjaaland', 'digout4u', 'echo', 'fast-webcrawler', 'ferret', 'googlebot', 'gulliver', 'harvest', 'htdig', 'ia_archiver', 'askjeeves', 'jennybot', 'linkwalker', 'lycos', 'mercator', 'moget', 'muscatferret', 'myweb', 'netcraft', 'nomad', 'petersnews', 'scooter', 'slurp', 'unlost_web_crawler', 'voila', 'voyager', 'webbase', 'weblayers', 'wisenutbot', 'aport', 'awbot', 'baiduspider', 'bobby', 'boris', 'bumblebee', 'cscrawler', 'daviesbot', 'exactseek', 'ezresult', 'gigabot', 'gnodspider', 'grub', 'henrythemiragorobot', 'holmes', 'internetseer', 'justview', 'linkbot', 'metager-linkchecker', 'linkchecker', 'microsoft_url_control', 'msiecrawler', 'nagios', 'perman', 'pompos', 'rambler', 'redalert', 'shoutcast', 'slysearch', 'surveybot', 'turnitinbot', 'turtlescanner', 'turtle', 'ultraseek', 'webclipping.com', 'webcompass', 'yahoo-verticalcrawler', 'yandex', 'zealbot', 'zyborg', ] robot_regex = re.compile("|".join(robots), re.IGNORECASE) pylogsparser-0.4/logsparser/normalizer.py0000644000175000017500000010656211715703401017133 0ustar fbofbo# -*- python -*- # pylogsparser - Logs parsers python library # # Copyright (C) 2011 Wallix Inc. # # This library is free software; you can redistribute it and/or modify it # under the terms of the GNU Lesser General Public License as published by the # Free Software Foundation; either version 2.1 of the License, or (at your # option) any later version. # # This library is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more # details. # # You should have received a copy of the GNU Lesser General Public License # along with this library; if not, write to the Free Software Foundation, Inc., # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # """ Here we have everything needed to parse and use XML definition files. The only class one should ever use here is L{Normalizer}. 
The rest is used during the parsing of the definition files that is taken care of by the Normalizer class. """ import re import csv import warnings import math from lxml.etree import parse, tostring from datetime import datetime # pyflakes:ignore import urlparse # pyflakes:ignore import logsparser.extras as extras # pyflakes:ignore try: import GeoIP #pyflakes:ignore country_code_by_address = GeoIP.new(GeoIP.GEOIP_MEMORY_CACHE).country_code_by_addr except ImportError, e: country_code_by_address = lambda x: None # the following symbols and modules are allowed for use in callbacks. SAFE_SYMBOLS = ["list", "dict", "tuple", "set", "long", "float", "object", "bool", "callable", "True", "False", "dir", "frozenset", "getattr", "hasattr", "abs", "cmp", "complex", "divmod", "id", "pow", "round", "slice", "vars", "hash", "hex", "int", "isinstance", "issubclass", "len", "map", "filter", "max", "min", "oct", "chr", "ord", "range", "reduce", "repr", "str", "unicode", "basestring", "type", "zip", "xrange", "None", "Exception", "re", "datetime", "math", "urlparse", "country_code_by_address", "extras"] class Tag(object): """A tag as defined in a pattern.""" def __init__(self, name, tagtype, substitute, description = {}, callbacks = []): """@param name: the tag's name @param tagtype: the tag's type name @param substitute: the string chain representing the tag in a log pattern @param description: a dictionary holding multilingual descriptions of the tag @param callbacks: a list of optional callbacks to fire once the tag value has been extracted""" self.name = name self.tagtype = tagtype self.substitute = substitute self.description = description self.callbacks = callbacks def get_description(self, language = 'en'): """@return: The tag description""" return self.description.get(language, 'N/A') class TagType(object): """A tag type. This defines how to match a given tag.""" def __init__(self, name, ttype, regexp, description = {}, flags = re.UNICODE | re.IGNORECASE): """@param name: the tag type's name @param ttype: the expected type of the value fetched by the associated regular expression @param regexp: the regular expression (as text, not compiled) associated to this type @param description: a dictionary holding multilingual descriptions of the tag type @param flags: flags by which to compile the regular expression""" self.name = name self.ttype = ttype self.regexp = regexp self.description = description try: self.compiled_regexp = re.compile(regexp, flags) except: raise ValueError, "Invalid regular expression %s" % regexp # import the common tag types def get_generic_tagTypes(path = 'normalizers/common_tagTypes.xml'): """Imports the common tag types. @return: a dictionary of tag types.""" generic = {} try: tagTypes = parse(open(path, 'r')).getroot() for tagType in tagTypes: tt_name = tagType.get('name') tt_type = tagType.get('ttype') or 'basestring' tt_desc = {} for child in tagType: if child.tag == 'description': for desc in child: lang = desc.get('language') or 'en' # store the text of each language-specific element, not the enclosing description element tt_desc[lang] = desc.text elif child.tag == 'regexp': tt_regexp = child.text generic[tt_name] = TagType(tt_name, tt_type, tt_regexp, tt_desc) return generic except StandardError, err: warnings.warn("Could not load generic tags definition file : %s \ - generic tags will not be available." % err) return {} # import the common callbacks def get_generic_callBacks(path = 'normalizers/common_callBacks.xml'): """Imports the common callbacks.
@return: a dictionary of callbacks.""" generic = {} try: callBacks = parse(open(path, 'r')).getroot() for callBack in callBacks: cb_name = callBack.get('name') # cb_desc = {} for child in callBack: if child.tag == 'code': cb_code = child.text # descriptions are not used yet but implemented in xml and dtd files for later use # elif child.tag == 'description': # for desc in child: # lang = desc.get('language') # cb_desc[lang] = desc.text generic[cb_name] = CallbackFunction(cb_code, cb_name) return generic except StandardError, err: warnings.warn("Could not load generic callbacks definition file : %s \ - generic callbacks will not be available." % err) return {} class PatternExample(object): """Represents a log sample matching a given pattern. expected_tags is a dictionary of tag names -> values that should be obtained after the normalization of this sample.""" def __init__(self, raw_line, expected_tags = {}, description = {}): self.raw_line = raw_line self.expected_tags = expected_tags self.description = description def get_description(self, language = 'en'): """@return: An example description""" return { 'sample' : self.raw_line, 'normalization' : self.expected_tags } class Pattern(object): """A pattern, as defined in a normalizer configuration file.""" def __init__(self, name, pattern, tags = {}, description = '', commonTags = {}, examples = [] ): self.name = name self.pattern = pattern self.tags = tags self.description = description self.examples = examples self.commonTags = commonTags def normalize(self, logline): raise NotImplementedError def test_examples(self): raise NotImplementedError def get_description(self, language = 'en'): tags_desc = dict([ (tag.name, tag.get_description(language)) for tag in self.tags.values() ]) substitutes = dict([ (tag.substitute, tag.name) for tag in self.tags.values() ]) examples_desc = [ example.get_description(language) for example in self.examples ] return { 'pattern' : self.pattern, 'description' : self.description.get(language, "N/A"), 'tags' : tags_desc, 'substitutes' : substitutes, 'commonTags' : self.commonTags, 'examples' : examples_desc } class CSVPattern(object): """A pattern that handles the CSV case.""" def __init__(self, name, pattern, separator = ',', quotechar = '"', tags = {}, callBacks = [], tagTypes = {}, genericTagTypes = {}, genericCallBacks = {}, description = '', commonTags = {}, examples = []): """ @param name: the pattern name @param pattern: the CSV pattern @param separator: the CSV delimiter @param quotechar: the CSV quote character @param tags: a dict of L{Tag} instances with tag names as keys @param callBacks: a list of L{CallbackFunction} @param tagTypes: a dict of L{TagType} instances with TagType names as keys @param genericTagTypes: a dict of L{TagType} instances from the common_tags xml definition with TagType names as keys @param genericCallBacks: a dict of L{CallbackFunction} instances from the common_callbacks xml definition with callback names as keys @param description: a pattern description @param commonTags: a dict of tags to add to the final normalization @param examples: a list of L{PatternExample} """ self.name = name self.pattern = pattern self.separator = separator self.quotechar = quotechar self.tags = tags self.callBacks = callBacks self.tagTypes = tagTypes self.genericTagTypes = genericTagTypes self.genericCallBacks = genericCallBacks self.description = description self.examples = examples self.commonTags = commonTags _fields = self.pattern.split(self.separator) if self.separator != ' ': self.fields = [f.strip() for f in _fields] else:
self.fields = _fields self.check_count = len(self.fields) def postprocess(self, data): for tag in self.tags: # tagTypes defined in the conf file take precedence over the # generic ones. If nothing found either way, fall back to # Anything. tag_regexp = self.tagTypes.get(self.tags[tag].tagtype, self.genericTagTypes.get(self.tags[tag].tagtype, self.genericTagTypes['Anything'])).regexp r = re.compile(tag_regexp) field = self.tags[tag].substitute if field not in data.keys(): continue if not r.match(data[field]): # We found a field that does not match the expected regexp return None else: value = data[field] del data[field] data[tag] = value # try to apply callbacks # but do not try to apply callbacks if we do not have any value if not data[tag]: continue callbacks_names = self.tags[tag].callbacks for cbname in callbacks_names: try: # get the callback in the definition file, or look it up in the common library if not found callback = [cb for cb in self.callBacks.values() if cb.name == cbname] or\ [cb for cb in self.genericCallBacks.values() if cb.name == cbname] callback = callback[0] except: warnings.warn("Unable to find callback %s for pattern %s" % (cbname, self.name)) continue try: callback(data[tag], data) except Exception, e: raise Exception("Error on callback %s in pattern %s : %s - skipping" % (cbname, self.name, e)) # remove temporary tags temp_tags = [t for t in data.keys() if t.startswith('__')] for t in temp_tags: del data[t] empty_tags = [t for t in data.keys() if not data[t]] # remove empty tags for t in empty_tags: del data[t] return data def normalize(self, logline): # Verify logline is a basestring if not isinstance(logline, basestring): return None # Try to retrieve some fields with the csv reader try: data = [data for data in csv.reader([logline], delimiter = self.separator, quotechar = self.quotechar)][0] except: return None # Check we have something in data if not data: return None else: # Verify the csv reader has matched the expected number of fields if len(data) != self.check_count: return None # Check the expected format for fields and apply callbacks data = self.postprocess(dict(zip(self.fields, data))) # Add common tags if data: data.update(self.commonTags) return data def test_examples(self): raise NotImplementedError def get_description(self, language = 'en'): tags_desc = dict([ (tag.name, tag.get_description(language)) for tag in self.tags.values() ]) substitutes = dict([ (tag.substitute, tag.name) for tag in self.tags.values() ]) examples_desc = [ example.get_description(language) for example in self.examples ] return { 'pattern' : self.pattern, 'description' : self.description.get(language, "N/A"), 'tags' : tags_desc, 'substitutes' : substitutes, 'commonTags' : self.commonTags, 'examples' : examples_desc } class CallbackFunction(object): """This class is used to define a callback function from source code present in the XML configuration file. The function is defined in a sanitized environment (imports are disabled, for instance).
This class is inspired by this recipe : http://code.activestate.com/recipes/550804-create-a-restricted-python-function-from-a-string/ """ def __init__(self, function_body = "log['test'] = value", name = 'unknown'): source = "def __cbfunc__(value,log):\n" source += '\t' + '\n\t'.join(function_body.split('\n')) + '\n' self.__doc__ = "Callback function generated from the following code:\n\n" + source byteCode = compile(source, '<string>', 'exec') self.name = name # Setup a standard-compatible python environment builtins = dict() globs = dict() locs = dict() builtins["locals"] = lambda: locs builtins["globals"] = lambda: globs globs["__builtins__"] = builtins globs["__name__"] = "SAFE_ENV" globs["__doc__"] = source if type(__builtins__) is dict: bi_dict = __builtins__ else: bi_dict = __builtins__.__dict__ for k in SAFE_SYMBOLS: try: locs[k] = locals()[k] continue except KeyError: pass try: globs[k] = globals()[k] continue except KeyError: pass try: builtins[k] = bi_dict[k] except KeyError: pass # set the function in the safe environment eval(byteCode, globs, locs) self.cbfunction = locs["__cbfunc__"] def __call__(self, value, log): """call the instance as a function to run the callback.""" # Exceptions are caught higher up in the normalization process. self.cbfunction(value, log) return log class Normalizer(object): """Log Normalizer, based on an XML definition file.""" def __init__(self, xmlconf, genericTagTypes, genericCallBacks): """initializes the normalizer with an lxml ElementTree. @param xmlconf: lxml ElementTree normalizer definition @param genericTagTypes: path to the generic tag types definition xml file @param genericCallBacks: path to the generic callbacks definition xml file """ self.text_source = tostring(xmlconf, pretty_print = True) self.sys_path = xmlconf.docinfo.URL normalizer = xmlconf.getroot() self.genericTagTypes = get_generic_tagTypes(genericTagTypes) self.genericCallBacks = get_generic_callBacks(genericCallBacks) self.description = {} self.authors = [] self.tagTypes = {} self.callbacks = {} self.prerequisites = {} self.patterns = {} self.commonTags = {} self.finalCallbacks = [] self.name = normalizer.get('name') if not self.name: raise ValueError, "The normalizer configuration lacks a name."
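# The global attributes parsed below drive the whole engine: "version", the tag the patterns are applied to ("appliedTo", "raw" by default), the flags used to compile the regular expressions (unicode, ignorecase, multiline), and whether patterns are matched with re.match or re.search semantics.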
self.version = float(normalizer.get('version')) or 1.0 self.appliedTo = normalizer.get('appliedTo') or 'raw' self.re_flags = ( (normalizer.get('unicode') == "yes" and re.UNICODE ) or 0 ) |\ ( (normalizer.get('ignorecase') == "yes" and re.IGNORECASE ) or 0 ) |\ ( (normalizer.get('multiline') == "yes" and re.MULTILINE ) or 0 ) self.matchtype = ( normalizer.get('matchtype') == "search" and "search" ) or 'match' try: self.taxonomy = normalizer.get('taxonomy') except: self.taxonomy = None for node in normalizer: if node.tag == "description": for desc in node: self.description[desc.get('language')] = desc.text elif node.tag == "authors": for author in node: self.authors.append(author.text) elif node.tag == "tagTypes": for tagType in node: tT_description = {} tT_regexp = '' for child in tagType: if child.tag == 'description': for desc in child: tT_description[desc.get("language")] = desc.text elif child.tag == 'regexp': tT_regexp = child.text self.tagTypes[tagType.get('name')] = TagType(tagType.get('name'), tagType.get('ttype') or "basestring", tT_regexp, tT_description, self.re_flags) elif node.tag == 'callbacks': for callback in node: self.callbacks[callback.get('name')] = CallbackFunction(callback.text, callback.get('name')) elif node.tag == 'prerequisites': for prereqTag in node: self.prerequisites[prereqTag.get('name')] = prereqTag.text elif node.tag == 'patterns': self.__parse_patterns(node) elif node.tag == "commonTags": for commonTag in node: self.commonTags[commonTag.get('name')] = commonTag.text elif node.tag == "finalCallbacks": for callback in node: self.finalCallbacks.append(callback.text) # precompile regexp self.full_regexp, self.tags_translation, self.tags_to_pattern, whatever = self.get_uncompiled_regexp() self.full_regexp = re.compile(self.full_regexp, self.re_flags) def __parse_patterns(self, node): for pattern in node: p_name = pattern.get('name') p_description = {} p_tags = {} p_commonTags = {} p_examples = [] p_csv = {} for p_node in pattern: if p_node.tag == 'description': for desc in p_node: p_description[desc.get('language')] = desc.text elif p_node.tag == 'text': p_pattern = p_node.text if 'type' in p_node.attrib: p_type = p_node.get('type') if p_type == 'csv': p_csv = {'type': 'csv'} if 'separator' in p_node.attrib: p_csv['separator'] = p_node.get('separator') if 'quotechar' in p_node.attrib: p_csv['quotechar'] = p_node.get('quotechar') elif p_node.tag == 'tags': for tag in p_node: t_cb = [] t_description = {} t_name = tag.get('name') t_tagtype = tag.get('tagType') for child in tag: if child.tag == 'description': for desc in child: t_description[desc.get('language')] = desc.text if child.tag == 'substitute': t_substitute = child.text elif child.tag == 'callbacks': for cb in child: t_cb.append(cb.text) p_tags[t_name] = Tag(t_name, t_tagtype, t_substitute, t_description, t_cb) elif p_node.tag == "commonTags": for commontag in p_node: p_commonTags[commontag.get('name')] = commontag.text elif p_node.tag == 'examples': for example in p_node: e_description = {} e_expectedTags = {} for child in example: if child.tag == 'description': for desc in child: e_description[desc.get('language')] = desc.text elif child.tag == 'text': e_rawline = child.text elif child.tag == "expectedTags": for etag in child: e_expectedTags[etag.get('name')] = etag.text p_examples.append(PatternExample(e_rawline, e_expectedTags, e_description)) if not p_csv: self.patterns[p_name] = Pattern(p_name, p_pattern, p_tags, p_description, p_commonTags, p_examples) else: self.patterns[p_name] = 
CSVPattern(p_name, p_pattern, p_csv['separator'], p_csv['quotechar'], p_tags, self.callbacks, self.tagTypes, self.genericTagTypes, self.genericCallBacks, p_description, p_commonTags, p_examples) def get_description(self, language = "en"): return "%s v. %s" % (self.name, self.version) def get_long_description(self, language = 'en'): patterns_desc = [ pattern.get_description(language) for pattern in self.patterns.values() ] return { 'name' : self.name, 'version' : self.version, 'authors' : self.authors, 'description' : self.description.get(language, "N/A"), 'patterns' : patterns_desc, 'commonTags' : self.commonTags, 'taxonomy' : self.taxonomy } def get_uncompiled_regexp(self, p = None, increment = 0): """returns the uncompiled regular expression associated to the pattern named p. If p is None, all patterns are stitched together, ready for compilation. increment is the starting value to use for the generic tag names in the returned regular expression. @return: regexp, dictionary of tag names <-> tag codes, dictionary of tag codes <-> pattern the tag came from, new increment value """ patterns = p regexps = [] tags_translations = {} tags_to_pattern = {} if not patterns: # WARNING ! dictionary keys are not necessarily returned in creation order. # This is silly, as the pattern order is crucial. So we must enforce that # patterns are named in alphabetical order of precedence ... patterns = sorted(self.patterns.keys()) if isinstance(patterns, basestring): patterns = [patterns] for pattern in patterns: if isinstance(self.patterns[pattern], CSVPattern): continue regexp = self.patterns[pattern].pattern for tagname, tag in self.patterns[pattern].tags.items(): # tagTypes defined in the conf file take precedence over the # generic ones. If nothing found either way, fall back to # Anything. tag_regexp = self.tagTypes.get(tag.tagtype, self.genericTagTypes.get(tag.tagtype, self.genericTagTypes['Anything'])).regexp # each tag substitute becomes a named group such as (?P<tag0>...) named_group = '(?P<tag%i>%s)' % (increment, tag_regexp) regexp = regexp.replace(tag.substitute, named_group) tags_translations['tag%i' % increment] = tagname tags_to_pattern['tag%i' % increment] = pattern increment += 1 regexps.append("(?:%s)" % regexp) return "|".join(regexps), tags_translations, tags_to_pattern, increment def normalize(self, log, do_not_check_prereq = False): """normalization in standalone mode.
@param log: a dictionary or an object providing at least a get() method @param do_not_check_prereq: if set to True, the prerequisite tags check is skipped (debug purpose only) @return: a dictionary with updated tags if normalization was successful.""" if isinstance(log, basestring) or not hasattr(log, "get"): raise ValueError, "the normalizer expects an argument of type Dict" # Test prerequisites if all( [ re.match(value, log.get(prereq, '')) for prereq, value in self.prerequisites.items() ]) or\ do_not_check_prereq: csv_patterns = [csv_pattern for csv_pattern in self.patterns.values() if isinstance(csv_pattern, CSVPattern)] if self.appliedTo in log.keys(): m = getattr(self.full_regexp, self.matchtype)(log[self.appliedTo]) if m is not None: m = m.groupdict() if m: # this little trick makes the following line not type dependent temp_wl = dict([ (u, log[u]) for u in log.keys() ]) for tag in m: if m[tag] is not None: matched_pattern = self.patterns[self.tags_to_pattern[tag]] temp_wl[self.tags_translation[tag]] = m[tag] # apply eventual callbacks for cb in matched_pattern.tags[self.tags_translation[tag]].callbacks: # TODO it could be desirable to make sure the callback # does not try to change important preset values such as # 'raw' and 'uuid'. try: # if the callback doesn't exist in the normalizer file, it will # search in the commonCallBack file. temp_wl = self.callbacks.get(cb, self.genericCallBacks.get(cb))(m[tag], temp_wl) except Exception, e: pattern_name = self.patterns[self.tags_to_pattern[tag]].name raise Exception("Error on callback %s in pattern %s : %s - skipping" % (self.callbacks[cb].name, pattern_name, e)) # remove temporary tags if self.tags_translation[tag].startswith('__'): del temp_wl[self.tags_translation[tag]] log.update(temp_wl) # add the pattern's common Tags log.update(matched_pattern.commonTags) # then add the normalizer's common Tags log.update(self.commonTags) # then add the taxonomy if relevant if self.taxonomy: log['taxonomy'] = self.taxonomy # and finally, apply the final callbacks for cb in self.finalCallbacks: try: log.update(self.callbacks.get(cb, self.genericCallBacks.get(cb))(None, log)) except Exception, e: raise Exception("Cannot apply final callback %s : %r - skipping" % (cb, e)) elif csv_patterns: # this little trick makes the following line not type dependent temp_wl = dict([ (u, log[u]) for u in log.keys() ]) ret = None for csv_pattern in csv_patterns: ret = csv_pattern.normalize(temp_wl[self.appliedTo]) if ret: log.update(ret) # then add the normalizer's common Tags log.update(self.commonTags) # then add the taxonomy if relevant if self.taxonomy: log['taxonomy'] = self.taxonomy # and finally, apply the final callbacks for cb in self.finalCallbacks: try: log.update(self.callbacks.get(cb, self.genericCallBacks.get(cb))(None, log)) except Exception, e: raise Exception("Cannot apply final callback %s : %r - skipping" % (cb, e)) break return log def validate(self): """if the definition file comes with pattern examples, this method can be invoked to test these patterns against the examples. Note that tags not included in the "expectedTags" directives will not be checked for validation. @return: True if the normalizer is validated, raises a ValueError describing the problem otherwise. 
""" for p in self.patterns: for example in self.patterns[p].examples: w = { self.appliedTo : example.raw_line } if isinstance(self.patterns[p], Pattern): w = self.normalize(w, do_not_check_prereq = True) elif isinstance(self.patterns[p], CSVPattern): w = self.patterns[p].normalize(example.raw_line) if w: w.update(self.commonTags) if self.taxonomy: w['taxonomy'] = self.taxonomy for cb in self.finalCallbacks: try: w.update(self.callbacks.get(cb, self.genericCallBacks.get(cb))(None, w)) except Exception, e: raise Exception("Cannot apply final callback %s : %r - skipping" % (cb, e)) for expectedTag in example.expected_tags.keys(): if isinstance(w.get(expectedTag), datetime): svalue = str(w.get(expectedTag)) elif isinstance(w.get(expectedTag), int): svalue = str(w.get(expectedTag)) else: svalue = w.get(expectedTag) if svalue != example.expected_tags[expectedTag]: raise ValueError, 'Sample log "%s" does not match : expected %s -> %s, %s' % \ (example, expectedTag, example.expected_tags[expectedTag], w.get(expectedTag)) # No problem so far ? Awesome ! return True def get_source(self): """gets the raw XML source for this normalizer.""" return self.text_source def get_languages(self): """guesstimates the available languages from the description field and returns them as a list.""" return self.description.keys() # Documentation generator def doc2RST(description, gettext = None): """ Returns a RestructuredText documentation from a parser description. @param description: the long description of the parser. @param gettext: is the gettext method to use. You must configure gettext to use the domain 'normalizer' and select a language. eg. gettext.translation('normalizer', 'i18n', ['fr_FR']).ugettext """ def escape(text): if isinstance(text, basestring): for c in "*\\": text.replace(c, "\\" + c) return text if not gettext: _ = lambda x: x else: _ = gettext template = _("""%(title)s **Written by** %(authors)s Description ::::::::::: %(description)s %(taxonomy)s This normalizer can parse logs of the following structure(s): %(patterns)s Examples :::::::: %(examples)s""") d = {} d['title'] = description['name'] + ' v.' 
+ str(description['version']) d['title'] += '\n' + '-'*len(d['title']) d['authors'] = '\n'.join( ['* *%s*' % a for a in description['authors'] ] ) d['description'] = escape(description['description']) or _('undocumented') d['taxonomy'] = '' if description["taxonomy"]: d['taxonomy'] = ("\n\n" +\ (_("This normalizer belongs to the category : *%s*") % description['taxonomy']) ) d['patterns'] = '' d['examples'] = '' for p in description['patterns']: d['patterns'] +="""* **%s**""" % escape(p['pattern']) d['patterns'] += _(", where\n\n") for sub in p['substitutes']: d['patterns'] += _(" * **%s** is %s ") % (escape(sub), (p['tags'][p['substitutes'][sub]] or _('undocumented') )) if not p['substitutes'][sub].startswith('__'): d['patterns'] += _("(normalized as *%s*)") % p['substitutes'][sub] d['patterns'] += "\n" if description['commonTags'] or p['commonTags']: d['patterns'] += _("\n Additionally, The following tags are automatically set:\n\n") for name, value in sum([description['commonTags'].items(), p['commonTags'].items()], []): d['patterns'] += " * *%s* : %s\n" % (escape(name), value) d['patterns'] += "\n" if p.get('description') : d['patterns'] += "\n %s\n" % p['description'] d['patterns'] += "\n" for example in p['examples']: d['examples'] += _("* *%s*, normalized as\n\n") % escape(example['sample']) for tag, value in example['normalization'].items(): d['examples'] += " * **%s** -> %s\n" % (escape(tag), value) d['examples'] += '\n' return template % d pylogsparser-0.4/PKG-INFO0000644000175000017500000004676611715707344013336 0ustar fbofboMetadata-Version: 1.1 Name: pylogsparser Version: 0.4 Summary: A log parser library packaged with a set of ready to use parsers (DHCPd, Squid, Apache, ...) Home-page: http://www.wallix.org/pylogsparser-project/ Author: Wallix Author-email: opensource@wallix.org License: LGPL Description: LogsParser ========== Description ::::::::::: LogsParser is an opensource python library created by Wallix ( http://www.wallix.org ). It is used as the core mechanism for logs tagging and normalization by Wallix's LogBox ( http://www.wallix.com/index.php/products/wallix-logbox ). Logs come in a variety of formats. In order to parse many different types of logs, a developer used to need to write an engine based on a large list of complex regular expressions. It can become rapidly unreadable and unmaintainable. By using LogsParser, a developer can free herself from the burden of writing a log parsing engine, since the module comes in with "batteries included". Furthermore, this engine relies upon XML definition files that can be loaded at runtime. The definition files were designed to be easily readable and need very little skill in programming or regular expressions, without sacrificing powerfulness or expressiveness. Purpose ::::::: The LogsParser module uses normalization definition files in order to tag log entries. The definition files are written in XML. The definition files allow anyone with a basic understanding of regular expressions and knowledge of a specific log format to create and maintain a customized pool of parsers. Basically a definition file will consist of a list of log patterns, each composed of many keywords. A keyword is a placeholder for a notable and/or variable part in the described log line, and therefore associated to a tag name. It is paired to a tag type, e.g. a regular expression matching the expected value to assign to this tag. If the raw value extracted this way needs further processing, callback functions can be applied to this value. 
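To give an idea of how these pieces fit together, here is a minimal usage sketch built around the Normalizer class defined in logsparser/normalizer.py (the definition file paths and the sample log line are illustrative; they depend on where the package's "normalizers" directory is installed): ::

    from lxml.etree import parse
    from logsparser.normalizer import Normalizer

    # A Normalizer is built from a parsed definition file, plus the paths
    # to the common tag types and common callbacks definitions.
    n = Normalizer(parse(open('normalizers/syslog.xml')),
                   'normalizers/common_tagTypes.xml',
                   'normalizers/common_callBacks.xml')

    # normalize() expects a dictionary holding at least the tag the
    # normalizer is applied to ('raw' by default); extracted tags are
    # added to the dictionary in place.
    log = {'raw': 'Jan 23 10:23:12 myhost myprogram[3210]: hello world'}
    n.normalize(log)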
This format also makes it possible to add useful meta-data about parsed logs, such as extensive documentation about expected log patterns and log samples. Format Description ------------------ A normalization definition file must strictly follow the specifications as they are detailed in the file normalizer.dtd . A simple template is provided to help parser writers get started with their task, called normalizer.template. Most definition files will include the following sections : * Some generic documentation about the parsed logs : emitting application, application version, etc ... (non-mandatory) * the definition file's author(s) (non-mandatory) * custom tag types (non-mandatory) * callback functions (non-mandatory) * Prerequisites on tag values prior to parsing (non-mandatory) * Log pattern(s) and how they are to be parsed * Extra tags with a fixed value that should be added once the parsing is done (non-mandatory) Root .... The definition file's root must hold the following elements : * the normalizer's name. * the normalizer's version. * the flags to apply to the compilation of regular expressions associated with this parser : unicode support, multiple lines support, and ignore case. * how to match the regular expression : from the beginning of the log line (match) or from anywhere in the targeted tag (search) * the tag value to parse (raw, body...) * the service taxonomy, if relevant, of the normalizer. See the end of this document for more details. Default tag types ................. A few basic tag types are defined in the file common_tagTypes.xml . In order to use them, the file has to be loaded when instantiating the Normalizer class; see the class documentation for further information. Here is a list of default tag types shipped with this library. * Anything : any character chain of any length. * Integer * EpochTime : an EPOCH timestamp of arbitrary precision (to the second and below). * syslogDate : a date as seen in syslog formatted logs (example : Mar 12 20:13:23) * URL * MACAddress * Email * IP * ZuluTime : a "Zulu Time"-type timestamp (example : 2012-12-21T13:45:05) Custom Tag Types ................ It is always possible to define new tag types in a parser definition file, and to overwrite default ones. To define a new tag type, the following elements are needed : * a type name. This will be used as the type reference in log patterns. * the Python type of the expected result : this element is not used yet and can be safely set to anything. * a non-mandatory description. * the regular expression defining this type. Callback Functions .................. One might want to transform a raw value after it has been extracted from a pattern: the syslog normalizer converts the raw log timestamp into a Python datetime object, for example. In order to do this, the XML tag "callback" must be used to define a callback function. It requires a function name as a mandatory attribute. Its text defines the function body as in Python, meaning the PEP8 indentation rules are to be followed. When writing a callback function, the following rules must be respected : * Your callback function will take ONLY two arguments: **value** and **log**. "value" is the raw value extracted from applying the log pattern to the log, and "log" is the dictionary of the normalized log in its current state (prior to normalization induced by this parser definition file). * Your callback function can modify the "log" argument (especially assign the transformed value to the concerned tag name) and must not return anything.
* Your callback function has restricted access to the following facilities: :: "list", "dict", "tuple", "set", "long", "float", "object", "bool", "callable", "True", "False", "dir", "frozenset", "getattr", "hasattr", "abs", "cmp", "complex", "divmod", "id", "pow", "round", "slice", "vars", "hash", "hex", "int", "isinstance", "issubclass", "len", "map", "filter", "max", "min", "oct", "chr", "ord", "range", "reduce", "repr", "str", "unicode", "basestring", "type", "zip", "xrange", "None", "Exception" * Importing modules is therefore forbidden and impossible. The *re* and *datetime* modules are available for use as if the following lines were present: :: import re from datetime import datetime * In version 0.4, the "extras" package is introduced. It allows more freedom in what can be used in callbacks. It also increases execution speed in some cases; typically when you need to use a complex object in your callback, like a big set or a big regular expression. In the old approach, such an object would be created each time the function is called; by moving the object's creation into the extras package, it is created once and for all. See the modules in logsparser.extras for use cases. Default callbacks ................. As with default tag types, a few generic callbacks are defined in the file common_callBacks.xml . Currently they are meant to deal with common date formats. Therefore they will automatically set the "date" tag. In order to use them, the callbacks file has to be loaded when instantiating the Normalizer class; see the class documentation for further information. In case of name collisions, callbacks defined in a normalizer description file take precedence over common callbacks. Here is a list of default callbacks shipped with this library. * MM/dd/YYYY hh:mm:ss : parses dates such as 04/13/2010 14:23:56 * dd/MMM/YYYY:hh:mm:ss : parses dates such as 19/Jul/2009 12:02:43 * MMM dd hh:mm:ss : parses dates such as Oct 23 10:23:12 . The year is guessed so that the resulting date is the closest in the past. * DDD MMM dd hh:mm:ss YYYY : parses dates such as Mon Sep 11 09:13:54 2011 * YYYY-MM-DD hh:mm:ss : parses dates such as 2012-12-21 00:00:00 * MM/DD/YY, hh:mm:ss : parses dates such as 10/23/11, 07:24:04 . The year is assumed to be in the 21st century. * YYMMDD hh:mm:ss: parses dates such as 070811 17:23:12 . The year is assumed to be in the 21st century. * ISO8601 : converts a combined date and time in UTC expressed according to the ISO 8601 standard. Also commonly referred to as "Zulu Time". * EPOCH : parses EPOCH timestamps * dd-MMM-YYYY hh:mm:ss : parses dates such as 28-Feb-2010 23:15:54 Final callbacks ............... One might want to wait until a pattern has been fully applied before processing data : if for example you'd like to tag a log with a value made of a concatenation of other values, and so on. It is possible to specify a list of callbacks to apply at the end of the parsing with the XML tag "finalCallbacks". Such callbacks will follow the mechanics described above, with one notable change: they will be called with the argument "value" set to None. Therefore, you have to make sure your callback will work correctly that way. There are a few examples of use available : in the test_normalizer.py test code, and in the deny_all normalizer. A sketch of a complete callback definition is also shown below. Pattern definition .................. A definition file can contain as many log patterns as one sees fit.
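As promised above, here is what a complete callback definition might look like inside a normalizer definition file (a minimal sketch: the function name, the tag being set and the EPOCH conversion are illustrative, not the canonical implementations found in common_callBacks.xml): ::

    <callbacks>
        <callback name="decode_epoch">
    # "value" is the raw string captured by the pattern; the body may use
    # the datetime module as explained above, and must not return anything.
    log["date"] = datetime.utcfromtimestamp(float(value))
        </callback>
    </callbacks>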
Log patterns are simplified regular expressions and are applied in alphabetical order of their names, so it is important to name them so that the more precise patterns are tried before the more generic ones. A pattern is a "meta regular expression", which means that every syntactic rule of Python's regular expressions is to be followed when writing a pattern, especially escaping special characters. To make the patterns easier to read than an obtuse regular expression, keywords act as "macros" and correspond to a part of the log to assign to a tag. A log pattern has the following components: * A name. * A non-mandatory description of the pattern's context. * The pattern itself, under the tag "text". * The tags as they appear in the pattern, the associated name once the normalization is over, and the callback functions, if any, to call on their raw values * Non-mandatory log samples. These can be used for self-validation. If a tag name starts with __ (double underscore), this tag won't be added to the final normalized dictionary. This makes it possible to create temporary tags that will typically be used in conjunction with a series of callback functions, when the original raw value is of no actual interest. To define log patterns describing a CSV-formatted message, one must add the following attributes in the tag "text": * type="csv" * separator="," or the relevant separator character * quotechar='"' or the relevant quotation character Tags are then defined normally. Pylogsparser will deal automatically with missing fields. Best practices .............. * Order your patterns in decreasing order of specificity. Not doing so might trigger errors, as more generic patterns will match earlier. * The more precise your tagTypes' regular expressions, the more accurate your parser will be. * Use description tags liberally. The more documented a log format, the better. Examples are also invaluable. Tag naming convention ..................... The tag naming convention is lowercase, underscore separated words. It is strongly recommended to stick to that naming convention when writing new normalizers for consistency's sake. In case of dynamic fields, it is advised to make sure dynamic naming follows the convention. There's an example of this in MSExchange2007MessageTracking.xml; see the callback named "decode_MTLSourceContext". Logs contain common information such as usernames, IP addresses, information about the transport protocol... In order to ease log post-processing, a common method to name those tags must be defined, so that one does not have to deal with, for example, a series of "login, user, username, userid" all describing a user id. The alphabetical list below is a series of tag names that must be used when relevant. - action : action taken by a component such as DELETED, migrated, DROP, open. - bind_int : binding interface for a network service. - dest_host : hostname or FQDN of a destination host. - dest_ip : IP address of a destination host. - dest_mac : MAC address of a destination host. - dest_port : destination port of a network connection. - event_id : id describing an event. - inbound_int : network interface for incoming data. - len : a data size. - local_host : hostname or FQDN of the local host. - local_ip : IP address of the local host. - local_mac : MAC address of the local host. - local_port : listening port of a local service. - message_id : message or transaction id. - message_recipient : message recipient id. - message_sender : message sender id. - method : component access method such as GET, key_auth.
- outbound_int : network interface for outgoing data. - protocol : network or software protocol name or numeric id such as TCP, NTP, SMTP. - source_host : hostname or FQDN of a source host. - source_ip : IP address of a source host. - source_mac : MAC address of a source host. - source_port : source port of a network connection. - status : component status such as FAIL, success, 404. see below for a complete list. - url : an URL as defined in rfc1738. (scheme://netloc/path;parameters?query#fragment) - user : a user id. Service taxonomy ................ As of pylogsparser 0.4 a taxonomy tag is added to relevant normalizers. It helps classifying logs by service type, which can be useful for reporting among other things. Here is a list of identified services; suggestions and improvements are welcome ! +-----------+----------------------------------------+------------------------+ | Service | Description | Normalizers | +===========+========================================+========================+ | access | A service dealing with authentication | Fail2ban | | control | and/or authorization | pam | | | | sshd | | | | wabauth | +-----------+----------------------------------------+------------------------+ | antivirus | A service dealing with malware | bitdefender | | | detection and prevention | symantec | +-----------+----------------------------------------+------------------------+ | database | A database service such as mySQLd, | mysql | | | postmaster (postGRESQL), ... | | +-----------+----------------------------------------+------------------------+ | address | A service in charge of network address | dhcpd | |assignation| assignations | | +-----------+----------------------------------------+------------------------+ | name | A service in charge of network names | named | | resolution| resolutions | named-2 | +-----------+----------------------------------------+------------------------+ | firewall | A service in charge of monitoring | LEA | | | and filtering network traffic | arkoonFAST360 | | | | deny_event | | | | netfilter | +-----------+----------------------------------------+------------------------+ | file | A file transfer service | xferlog | | transfer | | | +-----------+----------------------------------------+------------------------+ | hypervisor| A virtualization platform service | VMWare_ESX4-ESXi4 | | | | | +-----------+----------------------------------------+------------------------+ | mail | A mail server | MSExchange2007- | | | | MessageTracking | | | | postfix | +-----------+----------------------------------------+------------------------+ | web proxy | A service acting as an intermediary | dansguardian | | | between clients and web resources; | deny_traffic | | | access control and content filtering | squid | | | can also occur | | +-----------+----------------------------------------+------------------------+ | web server| A service exposing web resources | IIS | | | | apache | +-----------+----------------------------------------+------------------------+ Keywords: log parser xml library python Platform: UNKNOWN Classifier: Development Status :: 4 - Beta Classifier: Topic :: System :: Logging Classifier: Topic :: Software Development :: Libraries Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL) Requires: lxml Requires: pytz