Fundamentals of Computational Forensics:
Machine Learning and Predictive Analytics
Carl Stuart Leichter PhD carl.leichter@ntnu.no NTNU Testimon Digital Forensics Group
– Malware, IDS, etc.
– Digital Forensics, Network Analysis, Big Data, Simulations, etc.
– ØKOKRIM, KRIPOS, CYFOR, etc.
– Telenor, NorSIS, mnemonic, KPMG, PWC, etc.
https://en.wikipedia.org/wiki/Montparnasse_derailment#/media/File:Train_wrec k_at_Montparnasse_1895.jpg
– Exploratory Data Analysis (EDA)
– Confirmatory Data Analysis (CDA)
The data’s origin is what we are examining; the goal of the analysis is the hidden structure we are seeking.
– HDD file space is now unallocated
– Unallocated space partially over-written
Deleted files can be recovered by searching for their headers:
– HTML files: delimited by “<“ and “>”
– JPGs: identified by their header bytes
ML methods can do more than simply match the targets:
– they build internal models of the data
– It. Depends. Upon. Your. Application. (IDUYA)
Data → (Analysis) → Information → (Interpretation) → Knowledge → (Understanding) → Wisdom
the feature space.
Example: classifying wood samples
– Ash
– Pine
We can measure optical features and use them to determine the type of wood:
1. Overall brightness
2. Wood grain prominence (peak-to-peak variation)
http://www.dannerscabinets.com/blog/m n-custom-cabinet-shop-custom-cabinets/
Brightness | Grain Prominence
0.3        | 10
1          | 7.5
If a feature space is a vector space, then all the tools of linear algebra can be utilized!
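As a minimal sketch of what "the tools of linear algebra" buys us, the two wood samples below are treated as vectors, so subtraction, norms, and inner products apply directly (the numbers and the species labels are illustrative assumptions, not real measurements):

```python
import numpy as np

# Two wood samples as vectors in the 2-D (Brightness, Grain Prominence)
# feature space; values and species assignments are hypothetical.
a = np.array([0.3, 10.0])   # e.g., a pine sample
b = np.array([1.0, 7.5])    # e.g., an ash sample

diff = a - b                         # vector subtraction
dist = np.linalg.norm(diff)          # Euclidean length of the difference
dot = a @ b                          # inner product
cos_angle = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # angle similarity
```

The distance and angle between feature vectors are exactly the quantities later classifiers build on.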
– Length (meters, inches, light years)
– Weight (grams, pounds, carats)
– Time (seconds, years)
– Money
– Number of Packets
– Number of Bytes
– Etc.
– File Size
– Data Section Size
– Data Entropy
– API Calls
Packet Structure
– Packet Size
– Data Size
– TTL Time
– ACK Sequence
– Character distribution
– Data Entropy
– Remove outliers
– Reduce noise
– Spectral Analysis
– Principal Component Analysis
– Independent Component Analysis
– Remove redundant features (CFS)
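A small sketch of Principal Component Analysis on synthetic data, using only NumPy's eigendecomposition (the data, and the choice to keep a single component, are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data with one dominant direction of variance
x = rng.normal(size=200)
data = np.column_stack([x, 0.5 * x + 0.05 * rng.normal(size=200)])

centered = data - data.mean(axis=0)       # PCA works on centered data
cov = np.cov(centered, rowvar=False)      # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # sort descending by variance
components = eigvecs[:, order]            # principal directions
projected = centered @ components[:, :1]  # keep 1 component: 2-D -> 1-D
```

Dropping the low-variance components reduces dimensionality (and noise) while retaining most of the structure.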
Training Data → Preprocessing → Feature Extraction/Selection → Learning/Adaptation → Internal Model
Testing Data → Preprocessing → Feature Extraction/Selection → Internal Model → Classification/Regression → Application Output → Evaluation
The internal model is created by the ML adaptation to the training data.
– During training, it will adapt to the hidden structure in the data
– If the data contains a good representation of the system under study (and, by implication, the structure in the system), then it will recognize the test data as new samples from the same system
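To make the train/test idea concrete, here is a hedged sketch: a nearest-centroid classifier (a deliberately simple stand-in for the "internal model") is trained on 70% of synthetic wood-like data and evaluated on the held-out 30%; all data values are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic classes in the (brightness, grain prominence) plane
class0 = rng.normal([0.3, 10.0], 0.1, size=(50, 2))
class1 = rng.normal([1.0, 7.5], 0.1, size=(50, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

# Shuffle, then hold out 30% as unseen test data
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
train, test = idx[:split], idx[split:]

# "Internal model": one centroid per class, learned from training data only
centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

# Classify test points by nearest centroid and measure generalization
pred = np.argmin(
    np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2), axis=1
)
accuracy = (pred == y[test]).mean()
```

If the training data represents the system well, the model generalizes and test accuracy stays high.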
[Scatter plots: classifying a new sample in the Brightness / Grain Prominence feature space]
[Figure: a linear decision boundary in the feature space, axes a1 (Brightness) and a2 (Grain Prominence), with intercept b]
f(x) = mx + b
wᵀ = [w1 w2], a = [a1 a2]ᵀ
f(w, β) = wᵀa + β
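The discriminant f(w, β) = wᵀa + β can be sketched in a few lines; the weight and bias values below are hypothetical:

```python
import numpy as np

# Hypothetical weights and bias for a linear decision boundary
w = np.array([2.0, -0.5])       # w = [w1, w2]
beta = 1.0                      # bias (intercept) term

def f(a, w=w, beta=beta):
    """Linear discriminant: f(w, beta) = w^T a + beta."""
    return w @ a + beta

a = np.array([1.0, 10.0])       # a = [a1, a2]^T, a point in feature space
score = f(a)                    # the sign of the score picks the class side
label = 1 if score >= 0 else 0
```

Points with f(a) = 0 lie exactly on the decision boundary; the sign of f(a) determines which side (class) a sample falls on.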
[Scatter plot: Hardness vs. Weight feature space]
It Depends Upon Your Application!
– Newtonian mechanics (relativistic mechanics not used)
– Relativity correction required
– Perspectives of the data
– Knowledge from the data
– Additional transformation
– New representation
– Rules-Based Learning
– Regression (Curve Fitting)
– Descriptive Statistics
  – Normal (Gaussian)
    » The “mean” is sometimes called “the norm”
  – Uniform
  – Etc.
– describing data samples themselves
– describing relationships between data samples
– describing relationships between data and outputs
http://people.westminstercollege.edu/faculty/ggagne/fall2014/301/chapters/chapter8/index.html
Every skier likes the snow: ∀x Skier(x) => LikesSnow(x)
All brothers are siblings: ∀x ∀y Brother(x, y) => Siblings(x, y)
– Each branch is selected by the answer to a given decision
– The descent down the tree is like a series of feature-space partitionings
– The series of decisions leads from the root to a specific leaf
[Decision tree: root tests Outlook (sunny / rain); the sunny branch tests Humidity (high → No, normal → Yes); the rain branch tests Windy (true → No, false → Yes)]
Example: (Outlook==rain) and (Windy==false). Pass it through the tree → Yes.
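A sketch of that tree as code, reconstructed from the branches visible on the slide (only the sunny and rain branches are recoverable, so only those are implemented):

```python
def play(outlook, humidity=None, windy=None):
    """Walk the weather decision tree from the slide."""
    if outlook == "sunny":
        # sunny branch tests Humidity: high -> No, normal -> Yes
        return "No" if humidity == "high" else "Yes"
    else:
        # rain branch tests Windy: true -> No, false -> Yes
        return "No" if windy else "Yes"

# The sample from the slide: (Outlook == rain) and (Windy == false)
result = play("rain", windy=False)
```

Each nested `if` is one partitioning of the feature space; the returned value is the leaf the sample lands in.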
From Alpaydin, 2010
[Figure: real-world system to be modelled vs. regression-estimated model]
This is an “Objective Function”:
It measures how well our internal model accounts for the data.
The objective function guides the learning process:
– Sum of Squares (for the regression example)
– Mean Square Error (MSE)
– Least Mean Squares (LMS)
– Statistical Measurements
– Information Theoretical Metrics
  – Negentropy
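For instance, the MSE objective for the regression example can be computed directly; the toy data points below are invented for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Square Error: average squared residual between model and data."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Toy regression: fit a line by least squares and score it with MSE
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])      # roughly y = 2x + 1, plus noise
m, b = np.polyfit(x, y, 1)              # least-squares line fit
error = mse(y, m * x + b)
```

Learning is then just minimization: the fitted line is exactly the one that makes this objective as small as possible.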
Training Data → Preprocessing → Feature Extraction/Selection → Learning/Adaptation → Internal Model
Testing Data → Preprocessing → Feature Extraction/Selection → Internal Model → Classification/Regression → Application Output → Evaluation
– Need higher dimensionality to get good class separation
[3-D scatter plot: wood samples plotted by Brightness, Grain Prominence, and a third feature]
– Euclidean Distance
– Inner Product (Vector Spaces)
– Manhattan Distance
– Maximum Norm
– Mahalanobis Distance
– Hamming Distance
– Or any metric you define over the space…
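Several of these metrics in a few lines of NumPy, applied to two hypothetical feature vectors (the covariance matrix used for the Mahalanobis distance is an assumption):

```python
import numpy as np

p = np.array([1.0, 10.0])
q = np.array([0.3, 7.5])

euclidean = np.linalg.norm(p - q)         # straight-line distance
manhattan = np.sum(np.abs(p - q))         # city-block distance
maximum = np.max(np.abs(p - q))           # maximum (Chebyshev) norm
inner = p @ q                             # inner-product similarity

# Mahalanobis: distance scaled by an (assumed) covariance structure
cov = np.array([[1.0, 0.3], [0.3, 2.0]])
d = p - q
mahalanobis = float(np.sqrt(d @ np.linalg.inv(cov) @ d))

# Hamming: number of positions where two discrete vectors differ
hamming = np.sum(np.array([1, 0, 1, 1]) != np.array([1, 1, 1, 0]))
```

The choice of metric changes which samples count as "close", and therefore changes the clusters and decision boundaries a learner finds.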
https://www.quora.com/What-is-the-difference-between-Manhattan-and-Euclidean-distance-measures
http://www.jennessent.com/arcview/mahalanobis_description.htm
http://stats.stackexchange.com/questions/62092/bottom-to-top-explanation-of-the-mahalanobis-distance
[Scatter plot: clusters in the Packet Size / Packet Data Size feature space]
– 2 optical attributes or features
– Yielded a 2-dimensional feature space
– We had SUPERVISED learning
– Because we chose our features well, we saw good clustering/separation of the different classes in the feature space
[Hierarchy from www.yahoo.com/Science: agriculture (dairy, crops, agronomy, forestry), biology (botany, evolution, cell), physics (magnetism, relativity), CS (AI, HCI, courses), space (craft, missions), … (30)]
http://www.cita.utoronto.ca/~murray/GLG130/Exercises/F2.gif
– Units of time (different units of time?)
– Units of space
– Units of mass (grams, kilograms, tonnes)
– Units of $$$
http://www.cis.hut.fi/research/som-research/worldmap.html
www.cs.hmc.edu/courses/2003/fall/cs152/slides/som.pdf
Map of labels in titles from the comp.ai.neural-nets newsgroup
– DFS
– BFS
– Can get stuck in a local optimal solution
– Avoids local optima
– Hashing techniques
– String matching (“Murder”)
– Approximate hashing
– Partial strings
– Elasticsearch
Fuzzy Logic (FL)
– (“Soft” logic?)
In contrast, classical logic is:
– Crisp
– Sharp
Fuzzy sets are described by linguistic terms.
Temperature (linguistic terms):
– Hot
– Warm
– Cold
Texture (linguistic terms):
– Watery
– Gooey
– Soft
– Firm
– Hard
– Crunchy
– Crispy
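A sketch of how linguistic terms become fuzzy membership functions; the triangular shapes and the temperature boundaries below are assumptions for illustration:

```python
def triangular(x, a, b, c):
    """Triangular membership: rises from a to peak at b, falls to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical temperature terms (degrees C); boundaries are illustrative
def cold(t): return triangular(t, -30.0, 0.0, 15.0)
def warm(t): return triangular(t, 10.0, 20.0, 30.0)
def hot(t):  return triangular(t, 25.0, 40.0, 55.0)

# Unlike crisp logic, 12 C belongs partially to BOTH "cold" and "warm"
memberships = {"cold": cold(12.0), "warm": warm(12.0), "hot": hot(12.0)}
```

The partial, overlapping memberships are exactly what makes fuzzy logic "soft" rather than crisp.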
http://sci2s.ugr.es/keel/links.php
http://ispac.diet.uniroma1.it/scarpiniti/files/NNs/Less9.pdf
[Figure: fuzzy neural network implemented as a trained ANN]
http://www.scholarpedia.org/article/Fuzzy_neural_network
http://www.scholarpedia.org/article/Fuzzy_neural_network
– Supervised learning: labelled data
– Unsupervised learning: unlabelled data
– Reinforcement learning: situational signals from the environment
The agent receives a reward signal from the environment
– Some actions carry a cost (e.g., actions that use battery power)
– The agent learns to select the actions that yield the most reward
– Learning proceeds by trial and error
Exploitation versus exploration:
– The agent must prefer actions that it has tried in the past and found to be effective in producing reward.
– But to discover such actions, it has to try actions that it has not selected before.
– The agent has to exploit what it already knows in order to obtain reward.
– But it also has to explore what it doesn’t know in order to make better action selections in the future.
– RL systems can learn to forgo an immediate reward in favour of maximizing total reward over the long term.
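The exploration/exploitation trade-off can be sketched with an ε-greedy agent on a hypothetical two-armed bandit (the arm reward means 0.3 and 0.7, and ε = 0.1, are invented for illustration):

```python
import random

random.seed(42)
# Two-armed bandit: hypothetical mean rewards per action
true_means = [0.3, 0.7]
estimates, counts = [0.0, 0.0], [0, 0]
epsilon = 0.1                    # fraction of steps spent exploring

for _ in range(5000):
    if random.random() < epsilon:
        action = random.randrange(2)               # explore: random action
    else:
        action = estimates.index(max(estimates))   # exploit: best known action
    reward = true_means[action] + random.gauss(0.0, 0.1)
    counts[action] += 1
    # Incremental running mean of observed rewards for this action
    estimates[action] += (reward - estimates[action]) / counts[action]

best = estimates.index(max(estimates))
```

Without the occasional exploratory pulls, the agent could latch onto whichever arm it tried first and never discover the better one.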
Build different “experts”, and let them vote
– The majority vote fails only if a majority of the 25 experts is wrong (13 out of 25 get it wrong):

  P(ensemble error) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^(25−i)

  where ε is the error rate of a single expert.
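Evaluating that sum numerically, under an assumed individual error rate ε (the value 0.35 is illustrative, not from the slides, and the experts are assumed to err independently):

```python
from math import comb

def ensemble_error(n=25, k=13, eps=0.35):
    """P(at least k of n independent experts are wrong),
    i.e. the probability that a majority vote of n experts fails.
    eps = 0.35 is an assumed individual error rate for illustration."""
    return sum(comb(n, i) * eps**i * (1 - eps) ** (n - i)
               for i in range(k, n + 1))

p = ensemble_error()   # far below the 0.35 error rate of a single expert
```

The vote only helps when the experts are better than chance and not all making the same mistakes; at ε = 0.5 the ensemble is exactly as bad as one expert.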
Where We Get All These Different Data Sets: Generating New Datasets by Bootstrapping

Original dataset:
x1   x2   x3   x4   x5   y
187  80   120  30   4.5
160  70   119  36   5.6
150  80   185  60   8.8  1
192  92   140  50   6.8  1
168  110  155  45   7.8  1

Each bootstrap dataset is created by drawing rows from the original dataset uniformly at random, WITH replacement, so a given row may appear several times while others are left out. Example bootstrap draws:
187  80   120  30   4.5
160  70   119  36   5.6
150  80   185  60   8.8  1
150  80   185  60   8.8  1
168  110  155  45   7.8  1
168  110  155  45   7.8  1
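A sketch of the bootstrap procedure on the slide's table (the blank labels in the first two rows are assumed to be 0 for illustration):

```python
import random

random.seed(0)
# The original dataset from the slide: (x1, x2, x3, x4, x5, y)
data = [
    (187, 80, 120, 30, 4.5, 0),
    (160, 70, 119, 36, 5.6, 0),
    (150, 80, 185, 60, 8.8, 1),
    (192, 92, 140, 50, 6.8, 1),
    (168, 110, 155, 45, 7.8, 1),
]

def bootstrap(rows):
    """One bootstrap dataset: n rows drawn uniformly WITH replacement."""
    return [random.choice(rows) for _ in range(len(rows))]

datasets = [bootstrap(data) for _ in range(3)]   # 3 new training sets
```

Training one "expert" per bootstrap dataset and letting them vote is exactly the bagging scheme the ensemble slides describe.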
A good internal model will “generalize”:
– It will converge on the hidden structure in the data
– If the data contains a good representation of the system under study (and, by implication, of the structure in the system)
Precision: what fraction of the samples labeled as positive are actually positive?
Recall: of the samples that should have been labeled positive, how many did the classifier label as positive?
Actual class \ Predicted class | C1                   | ¬C1
C1                             | True Positives (TP)  | False Negatives (FN)
¬C1                            | False Positives (FP) | True Negatives (TN)
Actual class \ Predicted class | buy_computer = yes | buy_computer = no | Total
buy_computer = yes             | 6954               | 46                | 7000
buy_computer = no              | 412                | 2588              | 3000
Total                          | 7366               | 2634              | 10000
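From that confusion matrix, the standard figures of merit follow directly:

```python
# Counts from the buy_computer confusion matrix on the slide
TP, FN = 6954, 46     # actual "yes" predicted as yes / as no
FP, TN = 412, 2588    # actual "no"  predicted as yes / as no

precision = TP / (TP + FP)   # of those labeled positive, how many really are?
recall = TP / (TP + FN)      # how many actual positives did we find?
accuracy = (TP + TN) / (TP + FN + FP + TN)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```

Here recall is high (few buyers missed) while precision is a little lower (some non-buyers flagged), which is why both numbers are reported rather than accuracy alone.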
– ML “introspection” of learning performance in training
– Used to evaluate training performance
– Used to evaluate testing performance
– BEWARE OF TRAINING BY OTHER MEANS
– File Size          <- Bytes (integer)
– Data Section Size  <- Proportion (real)
– Data Entropy       <- Dimensionless (real)
– API Calls          <- (Strings?) (Hex)
Mean (μx) of the original data
Standard deviation (σx) of the original data
– e.g.: Correlation analysis
NB: variance = σx²
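Standardizing with the mean and standard deviation (the z-score) takes only a few lines; the sample values below are illustrative:

```python
import numpy as np

x = np.array([187.0, 160.0, 150.0, 192.0, 168.0])   # illustrative feature column

mu = x.mean()        # mean (mu_x) of the original data
sigma = x.std()      # standard deviation (sigma_x); variance = sigma**2

z = (x - mu) / sigma  # standardized data: zero mean, unit variance
```

After this transform, features measured in bytes, proportions, and counts all live on a comparable scale, which distance-based methods and correlation analysis implicitly require.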
No Free Lunch (Wolpert & Macready, 1997): a good approach for solving one type of problem is not necessarily a good approach for solving other types.
– Different basic body types
– Divergent regimes of training and adaptation, designed for adaptation to execute a specific task
– Same reasons
– Simplex HC for facial biometrics
– GA for iris biometrics