Set Theory, Basics
Contents
- What is a Set?
- Cardinality (size)
- Intersections, Unions
- Medical-testing Example
- Visualizing Sets — Venn Diagrams
1. What is a Set?
> A = {1, 2, -3, 7}
> E = {Apple, monkey, Daniel}
A set is a collection of things, and those things are called its elements. In set A, the numbers 1, 2, -3 and 7 are elements.
2∈A means that 2 is an element of A.
2. Cardinality (size)
The cardinality of a set A is the number of elements in it.
> |A| = 4, |E| = 3
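As a rough illustration, membership and cardinality can be checked in base R by letting a vector of distinct values stand in for a set. This is only a sketch: base R has no dedicated set type, and A is taken here to be the four-element set {1, 2, -3, 7} that section 3 also uses.

```r
# Sketch: plain vectors standing in for sets (an assumption; base R has no set type).
A <- c(1, 2, -3, 7)
E <- c("Apple", "monkey", "Daniel")

# Membership: is a value an element of A?
two_in_A  <- 2 %in% A   # 2 is listed in A
five_in_A <- 5 %in% A   # 5 is not listed in A

# Cardinality: count the distinct elements, so a repeated value is not counted twice.
card_A <- length(unique(A))
card_E <- length(unique(E))

print(c(two_in_A = two_in_A, five_in_A = five_in_A))
print(c(card_A = card_A, card_E = card_E))
```

Calling `unique()` before `length()` matters only when the vector contains duplicates, but it keeps the count faithful to the definition of a set.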
3. Intersections and Unions
A = {1, 2, -3, 7}, B = {2, 8, -3, 10}, D = {5, 10}
1) Intersections
A∩B = {2, -3}, where ∩ denotes intersection.
B∩D = {10}
A∩D = ∅, where the cardinality of ∅ is 0.
-> A∩B = {x: x∈A and x∈B}
2) Unions
A∪B = {1, 2, -3, 7, 8, 10} = {x: x∈A or x∈B}, where ∪ denotes union.
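The intersections and unions above can be reproduced with base R's `intersect()` and `union()`. This is a minimal sketch that again uses plain vectors in place of sets; the variable names are chosen here for illustration.

```r
# The three sets from this section, written as vectors of distinct values.
A <- c(1, 2, -3, 7)
B <- c(2, 8, -3, 10)
D <- c(5, 10)

A_and_B <- intersect(A, B)   # elements that are in both A and B
B_and_D <- intersect(B, D)
A_and_D <- intersect(A, D)   # A and D share nothing, so this has length 0
A_or_B  <- union(A, B)       # elements that are in A or in B (or both)

# The set-builder form {x: x∈A and x∈B} written as a filter over A.
builder <- A[A %in% B]

# setequal() compares two vectors as sets, ignoring the order of elements.
print(setequal(A_and_B, c(2, -3)))
print(length(A_and_D))
print(sort(A_or_B))
```

Comparing with `setequal()` rather than `==` avoids depending on the order in which R happens to return the elements.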
4. Set Theory and Medical Testing
VBS: very bad symptom
Let X be the set of people in a clinical trial.
S = {x∈X: x has VBS} (genuinely have VBS)
H = {x∈X: x does not have VBS}
X = S∪H, S∩H = ∅
P = {x∈X: x tests positive for VBS} (the doctor tells you that you are positive)
N = {x∈X: x tests negative for VBS}
P∪N = X (everyone), P∩N = ∅
- S∩P : True Positives
- H∩N : True Negatives
- S∩N : False Negatives
- H∩P : False Positives
|S| / |X| = proportion of people in the study who genuinely have VBS
|H| / |X| = proportion of people in the study who do not have VBS
- |S∩P| / |S| = True positive rate
- |H∩P| / |H| = False positive rate
- |S∩N| / |S| = False negative rate
- |H∩N| / |H| = True negative rate
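To make the four rates concrete, here is a small sketch on a made-up trial of ten people. The participant numbers and the memberships of S and P are hypothetical and chosen only so the arithmetic is easy to follow; the helper `rate()` is likewise an illustrative name.

```r
# Hypothetical trial: ten participants identified by the integers 1..10 (made-up data).
X <- 1:10
S <- c(1, 2, 3, 4)       # genuinely have VBS
P <- c(1, 2, 3, 5, 6)    # test positive
H <- setdiff(X, S)       # do not have VBS
N <- setdiff(X, P)       # test negative

# Both pairs split X into two non-overlapping parts.
stopifnot(setequal(union(S, H), X), length(intersect(S, H)) == 0)
stopifnot(setequal(union(P, N), X), length(intersect(P, N)) == 0)

# Ratio of two cardinalities, e.g. |S∩P| / |S|.
rate <- function(part, whole) length(part) / length(whole)

tpr <- rate(intersect(S, P), S)   # true positive rate
fnr <- rate(intersect(S, N), S)   # false negative rate
fpr <- rate(intersect(H, P), H)   # false positive rate
tnr <- rate(intersect(H, N), H)   # true negative rate

print(c(tpr = tpr, fnr = fnr, fpr = fpr, tnr = tnr))
```

Because S∩P and S∩N split S, the true positive and false negative rates add up to 1, and the same holds for the two rates computed over H.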
5. Venn Diagrams
X = H∪S = N∪P
# 0. Setup ----------
library(sets)
library(VennDiagram)
library(RAM)
library(eulerr)
# 1. Define the sets ---------
A <- LETTERS[1:10]
B <- LETTERS[5:15]
A_set <- as.set(A)
B_set <- as.set(B)
# 2. Set size (cardinality) -------
length(A); length(B)
# 3. Basic set operations --------------------
# Union
A_set | B_set
# Intersection
A_set & B_set
# Symmetric difference (union of the two relative complements)
A_set %D% B_set
# 4. Venn diagrams ---------------------
## 4.1. Overview
draw.pairwise.venn(
  area1 = length(A_set),
  area2 = length(B_set),
  cross.area = length(A_set & B_set),
  category = c("Set A", "Set B"),
  cat.pos = c(0, 180),
  euler.d = TRUE,
  sep.dist = 0.03,
  fill = c("light blue", "pink"),
  alpha = rep(0.5, 2),
  lty = rep("blank", 2)
)
## 4.2. Show the elements as well
group.venn(list(SetA = A, SetB = B), label = TRUE,
  fill = c("orange", "blue"),
  cat.pos = c(0, 0),
  lab.cex = 1.3)
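# Cross-tabulation of test_PN (rows) by pregnant_TF (columns):
#        pregnant_TF
# test_PN  0  1
#       0 55 14
#       1 24  7
pregnant_df <- data.frame(pregnant_TF, test_PN)
## 5.2. Visualization -------------------------
pregnant_fit <- euler(pregnant_df)
# These are older eulerr argument names; newer releases may expect `legend` and `quantities` instead of `auto.key` and `counts`.
plot(pregnant_fit, auto.key = TRUE, counts = TRUE, labels = c("Type I error", "Type II error"))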
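The counts in the cross-tabulation can also be fed into the rate formulas from section 4. The sketch below copies the four counts by hand and assumes that 1 encodes "pregnant" and "tests positive" while 0 encodes the opposite; the source does not state the coding, so treat that reading as an assumption.

```r
# Counts copied from the cross-tabulation; rows are test_PN, columns are pregnant_TF.
# Assumption: 1 means "pregnant" / "tests positive", 0 means the opposite.
counts <- matrix(c(55, 14,
                   24,  7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(test_PN = c("0", "1"),
                                 pregnant_TF = c("0", "1")))

TN <- counts["0", "0"]   # test negative, not pregnant
FN <- counts["0", "1"]   # test negative, pregnant
FP <- counts["1", "0"]   # test positive, not pregnant
TP <- counts["1", "1"]   # test positive, pregnant

tpr <- TP / (TP + FN)    # |S∩P| / |S|
fpr <- FP / (FP + TN)    # |H∩P| / |H|

print(c(tpr = tpr, fpr = fpr))
```

Under this reading, `TP + FN` plays the role of |S| and `FP + TN` the role of |H|, so the two denominators together account for everyone in the table.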
6. Sigma
A prediction model built with supervised learning in data science can be expressed as a function. That is, the model is a function that maps the elements (X1, X2, ..., Xn) of its domain to the revenue R. The plot below draws two example functions, y = 2x - 1 and y = x^2.
library(tidyverse)   # loads ggplot2
library(hrbrthemes)  # provides theme_ipsum_rc()
# Uncomment the next two lines if the "NanumGothic" font still needs to be registered.
#library(extrafont)
#loadfonts()
x <- seq(from = -5, to = 5, by = 0.5)
y <- 2 * x - 1
df <- data.frame(x, y)
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_line() +                       # the line y = 2x - 1
  stat_function(fun = function(x) x^2, geom = "line", aes(colour = "square")) +  # the curve y = x^2
  theme_ipsum_rc(base_family = "NanumGothic") +
  theme(legend.position = "none",
        axis.line.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_blank()) +
  geom_vline(xintercept = 0) +        # draw the axes through the origin
  geom_hline(yintercept = 0) +
  labs(x = "", y = "") +
  annotate("text", x = 3, y = 5, vjust = -1, label = "y=2x-1", parse = FALSE) +
  annotate("text", x = 4, y = 15, vjust = -1, label = "y=x^2", parse = FALSE)
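To connect the plot back to the idea of a supervised model as a function, the sketch below fits a linear model to noise-free points on the line y = 2x - 1 and wraps the fit in a function. Using `lm()` here is an illustrative choice rather than something the source prescribes, and `f_hat` is a hypothetical name.

```r
# Noise-free points on the line y = 2x - 1, on the same grid as the plot.
x <- seq(from = -5, to = 5, by = 0.5)
y <- 2 * x - 1

# Fit a linear model; with exact data the fit recovers intercept -1 and slope 2.
fit <- lm(y ~ x)
coefs <- unname(coef(fit))

# The fitted model is itself a function that maps an input to a prediction.
f_hat <- function(x_new) predict(fit, newdata = data.frame(x = x_new))
pred <- unname(f_hat(3))   # close to 2 * 3 - 1

print(coefs)
print(pred)
```

With real data the points would not sit exactly on a line, and the learned function would only approximate the relationship between the inputs and the target.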
References
- Coursera, Data Science Math Skills, Duke University
- https://aispiration.com/statistics/math-for-data-science.html#fn1