深度学习 dns tunnel检测使用统计特征全连接网络——精度99.8%

2023年4月12日下午8:01 • 深度学习

代码如下：

import numpy as np
import tflearn
from tflearn.layers.core import dropout
from tflearn.layers.normalization import batch_normalization
from tflearn.data_utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import sys


class EarlyStoppingCallback(tflearn.callbacks.Callback):
    def __init__(self, val_acc_thresh):
        """ Note: We are free to define our init function however we please. """
        # Store a validation accuracy threshold, which we can compare against
        # the current validation accuracy at, say, each epoch, each batch step, etc.
        self.val_acc_thresh = val_acc_thresh

    def on_epoch_end(self, training_state):
        """ 
        This is the final method called in trainer.py in the epoch loop. 
        We can stop training and leave without losing any information with a simple exception.  
        """
        #print dir(training_state)
        print("Terminating training at the end of epoch", training_state.epoch)
        if training_state.val_acc >= self.val_acc_thresh and training_state.acc_value >= self.val_acc_thresh:
            raise StopIteration

    def on_train_end(self, training_state):
        """
        Furthermore, tflearn will then immediately call this method after we terminate training, 
        (or when training ends regardless). This would be a good time to store any additional 
        information that tflearn doesn't store already.
        """
        print("Successfully left training! Final model accuracy:", training_state.acc_value)

cols = ["label", "flow_cnt", "len(srcip_arr)", "len(dstip_arr)", "subdomain_num", "uniq_subdomain_ratio", "np.average(dns_request_len_arr)", "np.average(dns_reply_len_arr)", "np.average(subdomain_tag_num_arr)", "np.average(subdomain_len_arr)", "np.average(subdomain_weird_len_arr)", "np.average(subdomain_entropy_arr)", "A_rr_type_ratio", "incommon_rr_type_rato", "valid_ipv4_ratio", "uniq_valid_ipv4_ratio", "request_reply_ratio", "np.max(dns_request_len_arr)", "np.max(dns_reply_len_arr)", "np.max(subdomain_tag_num_arr)", "np.max(subdomain_len_arr)", "np.max(subdomain_weird_len_arr)", "np.max(subdomain_entropy_arr)", "avg_distance", "std_distance"]
#unwanted_cols = set(["uniq_subdomain_ratio", "incommon_rr_type_rato"])
unwanted_cols = set(["uniq_subdomain_ratio", "incommon_rr_type_rato", "np.max(dns_reply_len_arr)", "request_reply_ratio", "uniq_valid_ipv4_ratio", "A_rr_type_ratio"])
wanted_cols = set(['label', 'flow_cnt', 'len(srcip_arr)', 'len(dstip_arr)',
       'subdomain_num',
       'np.average(dns_request_len_arr)', 'np.average(dns_reply_len_arr)',
       'A_rr_type_ratio',
           'valid_ipv4_ratio',
       'request_reply_ratio', 'np.max(dns_request_len_arr)',
       'np.max(dns_reply_len_arr)'])



def parse_line(s):
    s = s.replace("(", "").replace(")", "").replace("[", "").replace("]", "")
    #dat = [float(_) for i,_ in enumerate(s.split(",")) if cols[i] not in unwanted_cols]
    dat = [float(_) for i,_ in enumerate(s.split(",")) if cols[i] in wanted_cols]
    return dat


if __name__ == "__main__":
    training_data = []
    with open("feature_with_dnn_todo.dat") as f:
        training_data = [parse_line(line) for line in f]

    #sys.exit(0)

    X = training_data
    org_labels = [1 if int(x[0])==2.0 else 0 for x in X]
    labels = to_categorical(org_labels, nb_classes=2)
    data = [x[1:] for x in X]
    input_dim = len(data[0])

    X = data
    Y = labels

    print "X len:", len(X), "Y len:", len(Y)
    trainX, testX, trainY, testY = train_test_split(X, Y, test_size=0.2, random_state=42)
    print trainX[0]
    print trainY[0]
    print testX[-1]
    print testY[-1]

    # Build neural network   
    net = tflearn.input_data(shape=[None, input_dim])
    net = batch_normalization(net)
    net = tflearn.fully_connected(net, input_dim)
    net = tflearn.fully_connected(net, 128, activation='tanh')
    net = dropout(net, 0.5)
    net = tflearn.fully_connected(net, 2, activation='softmax')
    net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                     loss='categorical_crossentropy', name='target')
    # Define model
    model = tflearn.DNN(net)
    # Start training (apply gradient descent algorithm)
    # Initialize our callback with desired accuracy threshold.  
    early_stopping_cb = EarlyStoppingCallback(val_acc_thresh=0.998)
    try:
        model.fit(trainX, trainY, validation_set=(testX, testY), n_epoch=500, batch_size=8, show_metric=True, callbacks=early_stopping_cb)
    except StopIteration as e:
        print "pass"
    filename = 'tf_model/dns_tunnel2_998.tflearn'
    model.save(filename)
    model.load(filename)

    y_predict_list = model.predict(X)
    y_predict = []
    for i in y_predict_list:
        #print  i[0]
        if i[0] >= 0.5:
            y_predict.append(0)
        else:
            y_predict.append(1)

    print(classification_report(org_labels, y_predict))
    print confusion_matrix(org_labels, y_predict)

结果：

('Terminating training at the end of epoch', 175)
Training Step: 309936 | total loss: 0.00695 | time: 4.371s
| Adam | epoch: 176 | loss: 0.00695 - acc: 0.9988 | val_loss: 0.00661 - val_acc: 0.9991 -- iter: 14084/14084
--
('Terminating training at the end of epoch', 176)
('Successfully left training! Final model accuracy:', 0.9987633228302002)
pass
precision recall f1-score support

0 1.00 1.00 1.00 16529
1 0.97 0.99 0.98 1076

avg / total 1.00 1.00 1.00 17605

从混淆矩阵看，还是非常不错的！
[[16497 32]
[ 8 1068]]

输入数据样例：

(2.0,[39.0,1.0,2.0,38.0,0.974358974359,85.0,86.6666666667,3.0,30.0,0.0,3.84923785837,1.0,0.0,0.512820512821,0.025641025641,0.00150829562594,85.0,169.0,3.0,30.0,0.0,3.98989809546,2.54054054054,1.15301237879])
(2.0,[4437.0,3.0,10.0,13.0,0.00292990759522,48.554428668,45.3955375254,1.92307692308,91.3846153846,0.0,3.69230769231,0.972954699121,0.0,0.0,0.0,2.32087487699e-05,138.0,138.0,2.0,100.0,0.0,4.0,15.25,30.5753849799])
(2.0,[115.0,4.0,8.0,11.0,0.095652173913,99.2260869565,47.0347826087,2.0,74.7272727273,0.0,4.24137616275,0.0,0.0,0.0,0.0,0.000438173692052,131.0,131.0,2.0,82.0,0.0,4.3128598958,7.9,14.1594491418])
(2.0,[177.0,2.0,8.0,11.0,0.0621468926554,88.3389830508,35.6327683616,2.0,66.0,0.0,4.17962650637,0.0,0.0,0.0,0.0,0.000319774878486,115.0,115.0,2.0,66.0,0.0,4.17962650637,2.0,0.0])
(2.0,[38.0,7.0,6.0,23.0,0.605263157895,59.0263157895,120.473684211,1.0,20.5652173913,0.0,3.55684374229,0.657894736842,0.0,0.0263157894737,0.0263157894737,0.00222915737851,65.0,267.0,1.0,26.0,0.0,3.97366068969,14.7727272727,3.20414246338])
(2.0,[232.0,4.0,8.0,18.0,0.0775862068966,94.5301724138,39.9224137931,2.0,71.3333333333,0.0,4.19859571366,0.0,0.0,0.0,0.0,0.000227987779855,131.0,131.0,2.0,82.0,0.0,4.28968752349,5.47058823529,11.241298057])
(2.0,[90.0,3.0,8.0,12.0,0.133333333333,97.6,63.7222222222,2.0,74.0,0.0,4.23623035806,0.0,0.0,0.0,0.0,0.000569216757741,131.0,131.0,2.0,82.0,0.0,4.3128598958,7.36363636364,13.6066342594])
(2.0,[419.0,1.0,2.0,355.0,0.847255369928,72.9403341289,88.2816229117,3.0,30.0,0.0,3.80441789011,1.0,0.0,0.980906921241,0.00238663484487,0.000163601858517,74.0,90.0,3.0,30.0,0.0,4.05656476213,1.86440677966,0.654172884041])
(2.0,[132.0,2.0,8.0,12.0,0.0909090909091,83.446969697,38.446969697,2.0,66.0,0.0,4.15523801434,0.0,0.0,0.0,0.0,0.000453926463913,115.0,115.0,2.0,66.0,0.0,4.15523801434,2.0,0.0])
(2.0,[12399.0,9.0,8.0,48.0,0.00387127994193,131.489636261,63.534236632,2.0,86.5416666667,0.0,4.29632333151,0.92402613114,0.0,0.0,0.0,3.06684495259e-06,143.0,143.0,2.0,94.0,0.0,4.37237921923,7.34042553191,13.9897783289])
(2.0,[13659.0,11.0,11.0,55.0,0.00402664909583,131.545574347,65.8218756864,2.0,88.3272727273,0.0,4.34545972513,0.933670107621,0.0,0.0,0.0,2.78275427e-06,145.0,145.0,2.0,96.0,0.0,4.48022025041,8.31481481481,15.5072552602])
(2.0,[187.0,2.0,5.0,94.0,0.502673796791,88.1229946524,139.229946524,1.98936170213,43.9042553191,0.0,4.27189155149,0.502673796791,0.0,0.0,0.0,0.000303416469446,111.0,701.0,2.0,67.0,0.0,4.56541251219,21.5161290323,7.83926277973])
(2.0,[13651.0,11.0,8.0,50.0,0.00366273533075,131.740458574,66.4286132884,1.98,76.26,0.0,4.30942940291,0.955461138378,0.0,0.0,0.0,2.78026611595e-06,145.0,145.0,2.0,96.0,0.0,4.43135478727,11.6734693878,19.406907833])
(2.0,[13867.0,6.0,8.0,48.0,0.00346145525348,131.98341386,66.6828441624,1.97916666667,83.8541666667,0.0,4.28707673609,0.946347443571,0.0,0.0,0.0,2.73192096662e-06,143.0,143.0,2.0,94.0,0.0,4.3688366088,5.53191489362,11.7361979849])
(2.0,[12882.0,10.0,8.0,58.0,0.00450240645862,130.423381463,63.3864306785,1.96551724138,76.7068965517,0.0,3.93103448276,0.938674118926,0.0,0.0,0.0,2.97598853411e-06,143.0,143.0,2.0,94.0,0.0,4.0,8.98245614035,16.8912841929])
(2.0,[258.0,3.0,2.0,76.0,0.294573643411,77.0,80.9263565891,3.0,29.0,0.0,3.75053197533,0.492248062016,0.0,0.259689922481,0.00387596899225,0.000251686298198,77.0,630.0,3.0,29.0,0.0,3.89246375375,2.74666666667,1.87682947784])
(2.0,[14147.0,12.0,8.0,52.0,0.00367569095921,131.023397187,64.6592210363,1.96153846154,79.8461538462,0.0,4.3284491183,0.922032939846,0.0,0.0,0.0,2.69747106693e-06,143.0,143.0,2.0,94.0,0.0,4.42489759102,11.0588235294,19.7974169089])
(2.0,[13970.0,9.0,8.0,70.0,0.0050107372942,131.400501074,66.2702219041,1.98571428571,82.3571428571,0.0,4.29402071493,0.919183965641,0.0,0.0,0.0,2.72380853805e-06,143.0,143.0,2.0,94.0,0.0,4.36589057319,7.26086956522,13.9778833316])
(2.0,[13431.0,8.0,8.0,48.0,0.00357382175564,131.08234681,65.0862184499,1.97916666667,73.8958333333,0.0,4.26146070383,0.951604497059,0.0,0.0,0.0,2.83999416097e-06,133.0,133.0,2.0,84.0,0.0,4.41617048851,7.82978723404,15.2752325593])
(2.0,[13196.0,7.0,8.0,50.0,0.00378902697787,131.38898151,65.7071082146,2.0,84.28,0.0,4.3113658841,0.921718702637,0.0,0.0,0.0,2.88382399676e-06,143.0,143.0,2.0,94.0,0.0,4.39200898112,7.0612244898,12.5004622988])

本站文章如无特殊说明，均为本站原创，如若转载，请注明出处：深度学习 dns tunnel检测使用统计特征全连接网络——精度99.8% - Python技术站

深度学习

0 0 打赏

微信扫一扫

支付宝扫一扫

深度学习——无监督，自动编码器——尽管自动编码器与 PCA 很相似，but自动编码器既能表征线性变换，也能表征非线性变换；而 PCA 只能执行线性变换

上一篇 2023年4月12日

训练深度学习网络时候，出现Nan是什么原因，怎么才能避免？——我自己是因为data有nan的坏数据，clear下解决

下一篇 2023年4月12日

【深度学习】注意力机制 – 李宏毅PPT笔记

自注意力机制李宏毅ML-self attention PPT笔记任务：输入一个向量序列： [v1, v2, ……，vn] 输出一个相同长度的向量序列，如词性标注输出一个向量，如情感分析输出任意长度的向量序列， seq2seq 比如考虑序列标注问题，不能孤立看一个个输入的向量，而是要考虑整个序列。 self-attention…

深度学习 2023年4月11日
000
第四范式智能风控中台架构设计及应用

导读：风控是金融最常见的场景之一，本文将从业务和技术架构两个层面和大家探讨如何落地智能风控中台系统。分享主要围绕下面五点展开：风控中台的设计背景策略的全周期管理模型的全周期管理业务架构和能力原子化应用案例– 01 风控中台的设计背景首先大风控体系或者风控中台的建设在本质上是服务于业务的，所以我们需要构建一个以业务为核心的风控中台体系。以业务为…

深度学习 2023年4月12日
000
有关机器学习/深度学习中的英语词汇

machine learning : 机器学习　　　　　　　　　　　　 deep learning : 深度学习 image processing : 图像处理 natural language processing : 自然语言处理 algorithms : 算法 training data set : 训练数据集 facial detection : 面…

深度学习 2023年4月16日
000
跟着彭亮一起学人工智能之深度学习–零基础学人工智能

写在前面：　　最近，跟着彭亮在麦子学院学习人工智能相关的课程，课程讲的很好，所以打算一边学，一边记。这个系列就当是一个学习笔记。希望可以帮助到更多的人。　　注明：此系列课程适用人员为有一定编程基础，最好有Python的编程基础的从业人员，或相关行业的从业人员，另外所有的人工智能相关的课程，无论是深度学习，还是机器学习。或多或少的都需要一定的数学基础。关于…

深度学习 2023年4月10日
000
深度学习

[1天搞懂深度学习] 读书笔记 lecture I:Introduction of deep learning

– 通常机器学习，目的是，找到一个函数，针对任何输入：语音，图片，文字，都能够自动输出正确的结果。 – 而我们可以弄一个函数集合，这个集合针对同一个猫的图片的输入，可能有多种输出，比如猫，狗，猴子等，而我们通过提供大量的training data给这个函数集合，对集合里的各种函数组合的输出进行比对，最后选出一个能输出最佳结果(结果是猫)的组合，那么因为这个组…

2023年4月10日
000
深度学习环境搭建（ubuntu16.04+Titan Xp安装显卡驱动+Cuda9.0+cudnn+其他软件）

一、硬件环境 ubuntu 16.04LTS + windows10 双系统 NVIDIA TiTan XP 显卡（12G）二、软件环境搜狗输入法下载地址显卡驱动：LINUX X64 (AMD64/EM64T) DISPLAY DRIVER (418.56) 下载地址 CUDA：Cuda9.0 下载地址 CUDNN：cuDNN v7.5.0 …

深度学习 2023年4月13日
000
做深度学习应该如何选服务器？NLP、图像等

待补充【参考博客】【https://blog.csdn.net/mergerly/article/details/83753056】【简书的一篇博客】【简书的另一篇博客-讲组装机的】【讲如何搭配深度学习服务器的博客】【其他参考博客1】【博客2】【博客3】【https://bbs.hupu.com/23084290.html】【https://…

深度学习 2023年4月11日
000
Atitit 机器学习算法分类目录 1. 传统的机器学习算法 vs 深度学习 1 1.1. 传统的机器学习算法包括决策树、聚类、贝叶斯分类、支持向量机、EM、Adaboost等等。 2 2. 监

Atitit 机器学习算法分类目录 1. 传统的机器学习算法 vs 深度学习 1 1.1. 传统的机器学习算法包括决策树、聚类、贝叶斯分类、支持向量机、EM、Adaboost等等。 2 2. 监督学习与非监督学习 2 3. 连续型学习跳跃型学习 2 4. 根据学习方式分类 2 4.1. 包括决策树、聚类、贝叶斯分类、支持向量机、EM、Adaboo…

深度学习 2023年4月11日
000

深度学习 dns tunnel检测 使用统计特征 全连接网络——精度99.8%

相关文章

深度学习 dns tunnel检测使用统计特征全连接网络——精度99.8%