KDD99 Background Knowledge

Background Knowledge

KDD is short for Data Mining and Knowledge Discovery. KDD CUP is an annual competition organized by SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining) of the ACM (Association for Computing Machinery). The topics of past KDD CUPs are listed below:

KDD-Cup 2008, Breast cancer

KDD-Cup 2007, Consumer recommendations

KDD-Cup 2006, Pulmonary embolisms detection from image data

KDD-Cup 2005, Internet user search query categorization

KDD-Cup 2004, Particle physics; plus Protein homology prediction

KDD-Cup 2003, Network mining and usage log analysis

KDD-Cup 2002, BioMed document; plus Gene role classification

KDD-Cup 2001, Molecular bioactivity; plus Protein locale prediction

KDD-Cup 2000, Online retailer website clickstream analysis

KDD-Cup 1999, Computer network intrusion detection

KDD-Cup 1998, Direct marketing for profit optimization

KDD-Cup 1997, Direct marketing for lift curve optimization

The "KDD CUP 99 dataset" is the dataset used in the KDD competition held in 1999.

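In 1998, the U.S. Defense Advanced Research Projects Agency (DARPA) ran an intrusion detection evaluation program at MIT Lincoln Laboratory. Lincoln Laboratory built a network environment simulating a U.S. Air Force local area network and collected nine weeks of TCPdump(*) network connection and system audit data. The simulation covered a variety of user types, different kinds of network traffic, and attack techniques, so that it resembled a real network environment. The raw data captured by TCPdump were split into two parts: seven weeks of training data (**) containing roughly 5,000,000 connection records, and the remaining two weeks of test data containing roughly 2,000,000 connection records.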

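A network connection is defined as a sequence of TCP packets from start to finish within some time window, during which data flows from a source IP address to a destination IP address under a predefined protocol (such as TCP or UDP). Each connection is labeled either normal or attack. Attacks are subdivided into 4 major categories comprising 39 attack types in total: 22 of them appear in the training set, and the remaining 17 are unknown types that appear in the test set. The four attack categories are: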

1. DOS, denial-of-service: denial-of-service attacks, e.g., ping-of-death, SYN flood, smurf;

2. R2L, unauthorized access from a remote machine to a local machine: e.g., password guessing;

3. U2R, unauthorized access to local superuser privileges by a local unprivileged user: e.g., buffer overflow attacks;

4. PROBING, surveillance and probing: port monitoring or scanning, e.g., port-scan, ping-sweep.
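When working with the raw records, it is often convenient to map each attack label to one of the four categories above. The sketch below assumes the 22 training-set attack names as they are commonly listed alongside the KDD99 files; that list is not given in this text, so verify it against your own copy of the data. The names `ATTACK_CATEGORIES`, `LABEL_TO_CATEGORY`, and `categorize` are hypothetical. Labels missing from the table, such as attack types that occur only in the test set, fall through to `"unknown"`.

```python
# Grouping of the 22 training-set attack labels into the four categories.
# ASSUMPTION: names follow the list commonly distributed with KDD99;
# check them against the data you actually downloaded.
ATTACK_CATEGORIES = {
    "DOS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
    "PROBING": ["ipsweep", "nmap", "portsweep", "satan"],
}

# Invert the table: attack label -> category.
LABEL_TO_CATEGORY = {
    name: category
    for category, names in ATTACK_CATEGORIES.items()
    for name in names
}


def categorize(label: str) -> str:
    """Map a raw label such as 'portsweep.' to 'normal', a category, or 'unknown'."""
    name = label.strip().rstrip(".")  # raw labels end with a period
    if name == "normal":
        return "normal"
    # Attack types missing from the table (e.g. test-only ones) are 'unknown'.
    return LABEL_TO_CATEGORY.get(name, "unknown")


if __name__ == "__main__":
    for raw in ["normal.", "portsweep.", "smurf.", "some_new_attack."]:
        print(raw, "->", categorize(raw))
```

A dictionary inverted from the category table keeps the grouping in one readable place while still giving constant-time lookups per record.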

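Subsequently, Professor Sal Stolfo of Columbia University and Professor Wenke Lee of North Carolina State University applied data mining and related techniques to perform feature analysis and preprocessing on this data, producing a new dataset. That dataset was used in the 1999 KDD CUP competition and became the well-known KDD99 dataset. Although it is now fairly old, KDD99 remains the de facto benchmark in network intrusion detection and laid the groundwork for intrusion detection research based on computational intelligence.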

Data Feature Description

Each connection (*) in the KDD99 dataset is described by 41 features:

2, tcp, smtp, SF, 1684, 363, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 104, 66, 0.63, 0.03, 0.01, 0.00, 0.00, 0.00, 0.00, 0.00, normal.

0, tcp, private, REJ, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 38, 1, 0.00, 0.00, 1.00, 1.00, 0.03, 0.55, 0.00, 208, 1, 0.00, 0.11, 0.18, 0.00, 0.01, 0.00, 0.42, 1.00, portsweep.

0, tcp, smtp, SF, 787, 329, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 76, 117, 0.49, 0.08, 0.01, 0.02, 0.00, 0.00, 0.00, 0.00, normal.
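Because each record is a plain comma-separated line, it can be split into its 41 features and the trailing label with Python's standard `csv` module. This is a minimal sketch: `parse_records` and `N_FEATURES` are hypothetical names, `skipinitialspace=True` is used because the records above have a space after each comma, and the trailing period on the label is stripped.

```python
import csv
import io

# The three sample records shown above, copied verbatim.
SAMPLE = """2, tcp, smtp, SF, 1684, 363, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 104, 66, 0.63, 0.03, 0.01, 0.00, 0.00, 0.00, 0.00, 0.00, normal.
0, tcp, private, REJ, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 38, 1, 0.00, 0.00, 1.00, 1.00, 0.03, 0.55, 0.00, 208, 1, 0.00, 0.11, 0.18, 0.00, 0.01, 0.00, 0.42, 1.00, portsweep.
0, tcp, smtp, SF, 787, 329, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 76, 117, 0.49, 0.08, 0.01, 0.02, 0.00, 0.00, 0.00, 0.00, normal."""

N_FEATURES = 41  # features per record; the 42nd field is the label


def parse_records(text):
    """Yield (features, label) pairs from KDD99-style CSV text."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != N_FEATURES + 1:
            raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(row)}")
        features, label = row[:N_FEATURES], row[N_FEATURES]
        yield features, label.rstrip(".")


if __name__ == "__main__":
    for features, label in parse_records(SAMPLE):
        print(label, features[:4])
```

The field-count check fails loudly on truncated or malformed lines instead of silently shifting every column by one. Features are kept as strings here; converting them to numbers depends on each feature's type, described below.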

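Above are three records from the dataset, written in CSV format. Together with the final label, each record has 42 fields; the first 41 are features, which fall into 4 categories. The meaning of each feature is explained in order below:

1. Basic features of TCP connections (9 in total)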

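Basic connection features cover fundamental attributes of a connection, such as its duration, protocol type, and number of bytes transferred.

(1) duration. Length of the connection in seconds; continuous. Range: [0, 58329]. It is measured from the TCP three-way handshake that establishes the connection to the FIN/ACK that ends it; for UDP, each UDP packet is treated as a separate connection. The dataset contains many records with duration = 0 because those connections lasted less than one second.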

(2) protocol_type. Protocol type; discrete, with 3 values: TCP, UDP, ICMP.

(3) service. Network service on the destination host; discrete, with 70 values: 'aol', 'auth', 'bgp', 'courier', 'csnet_ns', 'ctf', 'daytime', 'discard', 'domain', 'domain_u', 'echo', 'eco_i', 'ecr_i', 'efs', 'exec', 'finger', 'ftp', 'ftp_data', 'gopher', 'harvest', 'hostnames', 'http', 'http_2784', 'http_443', 'http_8001', 'imap4', 'IRC', 'iso_tsap', 'klogin', 'kshell', 'ldap', 'link', 'login', 'mtp', 'name', 'netbios_dgm', 'netbios_ns', 'netbios_ssn', 'netstat', 'nnsp', 'nntp', 'ntp_u', 'other', 'pm_dump', 'pop_2', 'pop_3', 'printer', 'private', 'red_i', 'remote_job', 'rje', 'shell', 'smtp', 'sql_net', 'ssh', 'sunrpc', 'supdup', 'systat', 'telnet', 'tftp_u', 'tim_i', 'time', 'urh_i', 'urp_i', 'uucp', 'uucp_path', 'vmnet', 'whois', 'X11', 'Z39_50'.
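Since `protocol_type` and `service` are discrete while `duration` is continuous, a common preprocessing step before feeding records to a numeric model is to one-hot encode the discrete fields. The sketch below encodes the first three basic features into a single 74-element vector (1 + 3 + 70). The helper names `one_hot` and `encode_basic` are hypothetical, and the vocabulary order (protocols and services in the order listed above) is an assumption for illustration. Protocol values are lower-cased because the records themselves use `tcp`, `udp`, `icmp`; service names are matched exactly, since some of them ('IRC', 'X11', 'Z39_50') contain capitals.

```python
# Vocabularies for the discrete basic features.
# ASSUMPTION: index order is simply the order listed in the text.
PROTOCOLS = ["tcp", "udp", "icmp"]
SERVICES = [
    "aol", "auth", "bgp", "courier", "csnet_ns", "ctf", "daytime", "discard",
    "domain", "domain_u", "echo", "eco_i", "ecr_i", "efs", "exec", "finger",
    "ftp", "ftp_data", "gopher", "harvest", "hostnames", "http", "http_2784",
    "http_443", "http_8001", "imap4", "IRC", "iso_tsap", "klogin", "kshell",
    "ldap", "link", "login", "mtp", "name", "netbios_dgm", "netbios_ns",
    "netbios_ssn", "netstat", "nnsp", "nntp", "ntp_u", "other", "pm_dump",
    "pop_2", "pop_3", "printer", "private", "red_i", "remote_job", "rje",
    "shell", "smtp", "sql_net", "ssh", "sunrpc", "supdup", "systat", "telnet",
    "tftp_u", "tim_i", "time", "urh_i", "urp_i", "uucp", "uucp_path", "vmnet",
    "whois", "X11", "Z39_50",
]
DURATION_RANGE = (0, 58329)  # documented range of `duration`, in seconds


def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the position of value."""
    if value not in vocabulary:
        raise ValueError(f"unknown value: {value!r}")
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec


def encode_basic(duration, protocol_type, service):
    """Encode duration, protocol_type and service as one numeric vector."""
    d = int(duration)
    low, high = DURATION_RANGE
    if not low <= d <= high:
        raise ValueError(f"duration out of range: {d}")
    return ([d]
            + one_hot(protocol_type.lower(), PROTOCOLS)
            + one_hot(service, SERVICES))  # service names are case-sensitive


if __name__ == "__main__":
    vec = encode_basic("2", "tcp", "smtp")
    print(len(vec), vec[:4])
```

The raw `duration` value is kept unscaled here; in practice it is often normalized, since it ranges up to 58329 while the one-hot entries are 0 or 1.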