Set Theory, Basics

Yoonseul Choi
4 min read · Sep 25, 2022

Contents

  • What is a Set?
  • Cardinality (size)
  • Intersections, Unions
  • Medical-testing Example
  • Visualizing Sets — Venn Diagrams

1. What is a Set?

> A = {1,2,-3}

> E={Apple, monkey, Daniel}

A set is a collection of things, and the things it contains are called its elements. In set A, the elements are 1, 2, and -3.

2 ∈ A means that 2 is an element of A.
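As a small illustration of membership, here is a sketch in base R. Base R has no dedicated set type, so this assumes a plain vector of distinct values is an acceptable stand-in for a set; the variable names `two_in_A`, `five_in_A`, and `same_set` are made up for this example.

```r
# A vector of distinct values stands in for a set (an assumption of this sketch).
A <- c(1, 2, -3)
E <- c("Apple", "monkey", "Daniel")

# Membership test: is 2 an element of A? Is 5?
two_in_A  <- 2 %in% A
five_in_A <- 5 %in% A

# For sets, order and repetition do not matter:
# {1, 2, -3} and {-3, 2, 1, 1} describe the same set.
same_set <- setequal(A, c(-3, 2, 1, 1))

print(c(two_in_A, five_in_A, same_set))
```

`%in%` plays the role of ∈ here, and `setequal()` compares two vectors as sets rather than as ordered sequences.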

2. Cardinality (size)

The cardinality of a set A is the number of elements in it.

> |A| = 3, |E| = 3
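Cardinality can be checked with a short base R sketch. It uses the sets A = {1, 2, -3} and E from section 1, each of which has three elements; the helper name `card` is hypothetical, and counting distinct values with `unique()` is one possible approach, not the only one.

```r
A <- c(1, 2, -3)
E <- c("Apple", "monkey", "Daniel")

# Hypothetical helper: cardinality = number of distinct elements.
card <- function(s) length(unique(s))

card_A     <- card(A)                   # 3
card_E     <- card(E)                   # 3
card_dup   <- card(c(1, 1, 2, 2, -3))   # repeats are counted once: 3
card_empty <- card(numeric(0))          # the empty set has cardinality 0

print(c(card_A, card_E, card_dup, card_empty))
```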

3. Intersections and Unions

A = {1, 2, -3, 7}, B = {2, 8, -3, 10}, D = {5, 10}

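1) Intersection

A∩B = {2, -3}, where ∩ denotes intersection: the elements that belong to both sets.

B∩D = {10}

A∩D = ∅, the empty set, whose cardinality is 0.

In general, A∩B = {x: x∈A and x∈B}

2) Union

A∪B = {1, 2, -3, 7, 8, 10} = {x: x∈A or x∈B}, where ∪ denotes union: the elements that belong to at least one of the two sets.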
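The same intersections and unions can be reproduced with base R's `intersect()` and `union()`. This is a minimal sketch that again treats vectors as sets; the result names (`A_and_B` and so on) are invented for the example, and the last line is an extra sanity check (|A∪B| = |A| + |B| - |A∩B|) that is not part of the text above.

```r
A <- c(1, 2, -3, 7)
B <- c(2, 8, -3, 10)
D <- c(5, 10)

A_and_B <- intersect(A, B)   # elements in both A and B: 2 and -3
B_and_D <- intersect(B, D)   # 10
A_and_D <- intersect(A, D)   # empty: A and D share no elements
A_or_B  <- union(A, B)       # elements in A or B: 1, 2, -3, 7, 8, 10

# Sanity check: |A ∪ B| = |A| + |B| - |A ∩ B|
sizes_agree <- length(A_or_B) == length(A) + length(B) - length(A_and_B)

print(A_and_B)
print(A_or_B)
print(length(A_and_D))
```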

4. Set Theory × Medical Testing

VBS: very bad symptom

Let X be the set of people in a clinical trial.

S = {x∈X: x has VBS} (the people who genuinely have VBS)

H = {x∈X: x does not have VBS}

X = S∪H, S∩H = ∅

P = {x∈X: x tests positive for VBS} (the people whose doctor tells them they are positive)

N = {x∈X: x tests negative for VBS}

P∪N = X, P∩N = ∅
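The two ways of splitting the trial population can be checked in code. The sketch below uses a made-up trial of eight participants (the IDs, the `has_vbs` flags, and the `tests_positive` flags are all hypothetical) and a hypothetical helper `is_partition` that tests whether two sets cover the whole population without overlapping.

```r
# Hypothetical trial data, invented purely for illustration.
X              <- paste0("p", 1:8)
has_vbs        <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
tests_positive <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

S <- X[has_vbs]           # genuinely have VBS
H <- X[!has_vbs]          # do not have VBS
P <- X[tests_positive]    # test positive
N <- X[!tests_positive]   # test negative

# Two sets partition `whole` if their union is `whole` and they are disjoint.
is_partition <- function(a, b, whole) {
  setequal(union(a, b), whole) && length(intersect(a, b)) == 0
}

print(is_partition(S, H, X))   # sick / healthy split
print(is_partition(P, N, X))   # positive / negative split
print(is_partition(S, P, X))   # S and P overlap and miss people, so not a partition
```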

  • S∩P : True Positives
  • H∩N : True Negatives
  • S∩N : False Negatives
  • H∩P : False Positives

|S| / |X| = proportion of people in the study who genuinely have VBS

|H| / |X| = proportion of people in the study who do not have VBS

  • |S∩P| / |S| = True positive rate
  • |H∩P| / |H| = False positive rate
  • |S∩N| / |S| = False negative rate
  • |H∩N| / |H| = True negative rate
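The four rates follow directly from set sizes. Here is a minimal base R sketch on a made-up trial of eight participants; the data and the helper name `rate` are hypothetical and only illustrate the definitions above.

```r
# Hypothetical trial data, invented purely for illustration.
X <- paste0("p", 1:8)
S <- c("p1", "p2", "p3")        # genuinely have VBS
H <- setdiff(X, S)              # do not have VBS
P <- c("p1", "p2", "p4")        # test positive
N <- setdiff(X, P)              # test negative

# Hypothetical helper: |part| / |whole|
rate <- function(part, whole) length(part) / length(whole)

tpr <- rate(intersect(S, P), S)   # true positive rate:  2/3
fnr <- rate(intersect(S, N), S)   # false negative rate: 1/3
fpr <- rate(intersect(H, P), H)   # false positive rate: 1/5
tnr <- rate(intersect(H, N), H)   # true negative rate:  4/5

print(c(TPR = tpr, FNR = fnr, FPR = fpr, TNR = tnr))
```

Because S splits into S∩P and S∩N, the true positive and false negative rates sum to 1; the same holds for the false positive and true negative rates over H.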

5. Venn Diagrams

X = H∪S = N∪P

# 0. Setup ----------
library(sets)         # set objects and the |, &, %D% operators
library(VennDiagram)  # draw.pairwise.venn()
library(RAM)          # group.venn()
library(eulerr)       # euler()

# 1. Define the sets ---------
A <- LETTERS[1:10]
B <- LETTERS[5:15]

A_set <- as.set(A)
B_set <- as.set(B)

# 2. Set size / cardinality -------
length(A); length(B)

# 3. Basic set operations --------------------
# Union
A_set | B_set
# Intersection
A_set & B_set
# Symmetric difference (union of the two relative complements)
A_set %D% B_set

# 4. Venn diagrams ---------------------
## 4.1. Overview

draw.pairwise.venn(
  area1      = length(A_set),
  area2      = length(B_set),
  cross.area = length(A_set & B_set),
  category   = c("Set A", "Set B"),
  cat.pos    = c(0, 180),
  euler.d    = TRUE,
  sep.dist   = 0.03,
  fill       = c("light blue", "pink"),
  alpha      = rep(0.5, 2),
  lty        = rep("blank", 2)
)

## 4.2. Show the elements as well
group.venn(list(SetA = A, SetB = B), label = TRUE,
  fill = c("orange", "blue"),
  cat.pos = c(0, 0),
  lab.cex = 1.3)
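
# pregnant_TF and test_PN are 0/1 vectors; their definitions are not shown here.
# Their cross-tabulation is:
#
#        pregnant_TF
# test_PN  0  1
#       0 55 14
#       1 24  7
pregnant_df <- data.frame(pregnant_TF, test_PN)

## 5.2. Visualization -------------------------
pregnant_fit <- euler(pregnant_df)

# Recent eulerr releases use `legend` and `quantities`;
# older releases called these `auto.key` and `counts`.
plot(pregnant_fit, legend = TRUE, quantities = TRUE, labels = c("Type I error", "Type II error"))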
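The counts in the cross-tabulation above are enough to compute the rates from section 4. The sketch below rests on two assumptions that the article does not state explicitly: rows are the test result and columns the true status, and 1 means "pregnant" or "positive". The matrix name `counts` is made up for the example.

```r
# Counts copied from the cross-tabulation of test_PN (rows) and pregnant_TF (columns).
# Assumption: 1 = pregnant / test positive, 0 = not pregnant / test negative.
counts <- matrix(c(55, 14,
                   24,  7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(test_PN = c("0", "1"), pregnant_TF = c("0", "1")))

tp <- counts["1", "1"]   # test positive, pregnant
fn <- counts["0", "1"]   # test negative, pregnant
fp <- counts["1", "0"]   # test positive, not pregnant
tn <- counts["0", "0"]   # test negative, not pregnant

tpr <- tp / (tp + fn)    # |S∩P| / |S|
fpr <- fp / (fp + tn)    # |H∩P| / |H|

print(c(total = sum(counts), TPR = tpr, FPR = fpr))
```

Under those assumptions the four cells correspond to S∩P, S∩N, H∩P, and H∩N, so the column sums play the role of |S| and |H|.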

6. Sigma

When a prediction model is developed through supervised learning in data science, the model can be expressed as a function: it maps elements (X1, X2, ..., Xn) of the domain to an output R, such as revenue. The code that follows plots two simple functions of a single input, y = 2x - 1 and y = x^2.

library(ggplot2)     # ggplot(), geom_*(), stat_function(), annotate()
library(hrbrthemes)  # theme_ipsum_rc()
# library(extrafont); loadfonts()  # only needed to register the NanumGothic font

# Points on the line y = 2x - 1
x <- seq(from = -5, to = 5, by = 0.5)
y <- 2 * x - 1

df <- data.frame(x, y)

ggplot(df, aes(x, y)) +
  geom_point() +
  geom_line() +
  # Overlay the parabola y = x^2 for comparison
  stat_function(fun = function(x) x^2, geom = "line", aes(colour = "square")) +
  # NanumGothic must be installed for this font family to take effect
  theme_ipsum_rc(base_family = "NanumGothic") +
  theme(legend.position = "none",
        axis.line.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_blank()) +
  # Draw the coordinate axes through the origin
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  labs(x = "", y = "") +
  annotate("text", 3, 5, vjust = -1, label = "y=2x-1", parse = FALSE) +
  annotate("text", 4, 15, vjust = -1, label = "y=x^2", parse = FALSE)
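
To connect the plot back to the idea of a supervised learning model as a function, here is a minimal sketch that fits a linear model to the same (x, y) points and wraps it as a function from input to predicted output. Using `lm()` is a choice made for this example, not something the original prescribes, and the name `f_hat` is hypothetical.

```r
# Same data as in the plot: points on the line y = 2x - 1
x <- seq(from = -5, to = 5, by = 0.5)
y <- 2 * x - 1
df <- data.frame(x, y)

# "Learning" step: estimate the mapping x -> y from the labelled examples.
fit <- lm(y ~ x, data = df)

# The fitted model viewed as a function from the domain to the output.
f_hat <- function(new_x) {
  unname(predict(fit, newdata = data.frame(x = new_x)))
}

print(coef(fit))   # intercept close to -1, slope close to 2
print(f_hat(3))    # close to 2 * 3 - 1 = 5
```

Because the data lie exactly on a line, the fitted function recovers y = 2x - 1 up to floating-point error; with noisy real data the learned function would only approximate the underlying relationship.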

References

Coursera, Data Science Math Skills, Duke University

https://aispiration.com/statistics/math-for-data-science.html#fn1
