Set Theory, Basics

Yoonseul Choi

Sep 25, 2022

Contents

What is a Set?
Cardinality (size)
Intersections and Unions
Medical-testing Example
Visualizing Sets: Venn Diagrams
Sigma

1. What is a Set?

> A = {1, 2, -3, 7}

> E = {Apple, monkey, Daniel}

A set is a collection of things, and the things it contains are called its elements. In set A, the elements are 1, 2, -3, and 7.

2∈A means that 2 is an element of A.

2. Cardinality (size)

The cardinality of a set A is the number of elements in it.

> |A| = 4, |E| = 3
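Membership and cardinality can be tried out in a few lines of base R. This is only a sketch: it assumes a set is stored as a plain vector without duplicates (an illustration choice, not something the course prescribes), and it takes A = {1, 2, -3, 7}, the four-element set used in the sections below. `%in%` plays the role of ∈, and `length(unique(...))` gives the cardinality.

```r
# Assumption: a set is represented as a plain vector without duplicates.
A <- c(1, 2, -3, 7)
E <- c("Apple", "monkey", "Daniel")

# Membership: is a value an element of A?
two_in_A  <- 2 %in% A   # TRUE, because 2 is an element of A
five_in_A <- 5 %in% A   # FALSE, because 5 is not an element of A

# Cardinality: the number of distinct elements.
card_A <- length(unique(A))
card_E <- length(unique(E))

print(c(card_A, card_E))  # prints: [1] 4 3
```

`unique()` is a safeguard here: a vector may repeat a value, while a set counts each element once.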

3. Intersections and Unions

A = {1, 2, -3, 7}, B = {2, 8, -3, 10}, D = {5, 10}

1) Intersection

A∩B = {2, -3}, where ∩ denotes intersection.

B∩D = {10}

A∩D = ∅, where the cardinality of ∅ is 0.

In general, A∩B = {x : x∈A and x∈B}.
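The three intersections above can be reproduced with base R's `intersect()`. The sketch below assumes sets are stored as plain vectors, and it also writes A∩B in the set-builder style "keep every x in A that is also in B" to show that both forms agree.

```r
# Assumption: sets are represented as plain vectors without duplicates.
A <- c(1, 2, -3, 7)
B <- c(2, 8, -3, 10)
D <- c(5, 10)

A_and_B <- intersect(A, B)   # elements that are in both A and B
B_and_D <- intersect(B, D)
A_and_D <- intersect(A, D)   # no common elements, so this is empty

# Set-builder form: {x : x in A and x in B}
A_and_B_builder <- A[A %in% B]

print(A_and_B)          # the elements 2 and -3
print(length(A_and_D))  # prints: [1] 0
```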

2) Union

A∪B = {1, 2, -3, 7, 8, 10} = {x : x∈A or x∈B}, where ∪ denotes union.
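A short base R sketch of the union, again assuming sets are stored as plain vectors. It also checks the counting identity |A∪B| = |A| + |B| - |A∩B|, which holds because the shared elements 2 and -3 would otherwise be counted twice.

```r
# Assumption: sets are represented as plain vectors without duplicates.
A <- c(1, 2, -3, 7)
B <- c(2, 8, -3, 10)

A_or_B <- union(A, B)   # elements that are in A or in B (or in both)

# Counting identity: |A ∪ B| = |A| + |B| - |A ∩ B|
card_union   <- length(A_or_B)
card_formula <- length(A) + length(B) - length(intersect(A, B))

print(A_or_B)                        # 1 2 -3 7 8 10
print(c(card_union, card_formula))   # prints: [1] 6 6
```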

4. Set Theory × Medical Testing

VBS: very bad symptom

Let X be the set of people in a clinical trial.

S = {x∈X : x has VBS} (people who genuinely have VBS)

H = {x∈X : x does not have VBS}

X = S∪H, S∩H = ∅

P = {x∈X : x tests positive for VBS} (the doctor says you are positive)

N = {x∈X : x tests negative for VBS}

P∪N = X (everyone), P∩N = ∅
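To make the four sets concrete, here is a small base R sketch on a made-up trial. Everything in it is hypothetical: the ten participant ids and the two logical vectors `has_vbs` and `tests_positive` are invented for illustration and do not come from the course. The point is only to check that S and H split X into two non-overlapping parts, and that P and N do the same.

```r
# Hypothetical trial: 10 people, identified by the ids 1..10.
X <- 1:10

# Hypothetical ground truth and test results (invented for illustration).
has_vbs        <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
tests_positive <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

S <- X[has_vbs]           # genuinely have VBS
H <- X[!has_vbs]          # do not have VBS
P <- X[tests_positive]    # test positive
N <- X[!tests_positive]   # test negative

# S and H cover X and do not overlap; the same holds for P and N.
sh_covers_x <- setequal(union(S, H), X)
sh_disjoint <- length(intersect(S, H)) == 0
pn_covers_x <- setequal(union(P, N), X)
pn_disjoint <- length(intersect(P, N)) == 0

print(c(sh_covers_x, sh_disjoint, pn_covers_x, pn_disjoint))
```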

S∩P : True Positives
H∩N : True Negatives
S∩N : False Negatives
H∩P : False Positives
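The four outcomes are plain intersections, so they can be computed directly. The sketch below uses hypothetical sets of participant ids (invented for illustration) and confirms that every person lands in exactly one of the four groups.

```r
# Hypothetical sets of participant ids in a 10-person trial.
X <- 1:10
S <- c(1, 2, 3)                 # genuinely have VBS
H <- c(4, 5, 6, 7, 8, 9, 10)    # do not have VBS
P <- c(1, 2, 4)                 # test positive
N <- c(3, 5, 6, 7, 8, 9, 10)    # test negative

true_pos  <- intersect(S, P)    # sick and flagged
true_neg  <- intersect(H, N)    # healthy and cleared
false_neg <- intersect(S, N)    # sick but missed
false_pos <- intersect(H, P)    # healthy but flagged

# The four groups are disjoint and together cover X,
# so their sizes add up to |X|.
counts <- c(TP = length(true_pos), TN = length(true_neg),
            FN = length(false_neg), FP = length(false_pos))
print(counts)
print(sum(counts) == length(X))
```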

|S| / |X| = proportion of people in the study who genuinely have VBS

|H| / |X| = proportion of people in the study who do not have VBS

|S∩P| / |S| = True positive rate
|H∩P| / |H| = False positive rate
|S∩N| / |S| = False negative rate
|H∩N| / |H| = True negative rate
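Each rate is a ratio of two cardinalities, which makes it easy to compute. The sketch below reuses hypothetical sets of participant ids (invented for illustration), with the true negative rate taken as |H∩N| / |H|. Because S splits into S∩P and S∩N, the true positive and false negative rates add up to 1. The same holds for the false positive and true negative rates over H.

```r
# Hypothetical sets of participant ids in a 10-person trial.
X <- 1:10
S <- c(1, 2, 3)                 # genuinely have VBS
H <- c(4, 5, 6, 7, 8, 9, 10)    # do not have VBS
P <- c(1, 2, 4)                 # test positive
N <- c(3, 5, 6, 7, 8, 9, 10)    # test negative

prevalence <- length(S) / length(X)                # |S| / |X|

tpr <- length(intersect(S, P)) / length(S)         # true positive rate
fnr <- length(intersect(S, N)) / length(S)         # false negative rate
fpr <- length(intersect(H, P)) / length(H)         # false positive rate
tnr <- length(intersect(H, N)) / length(H)         # true negative rate

print(round(c(prevalence = prevalence, TPR = tpr, FNR = fnr,
              FPR = fpr, TNR = tnr), 3))
```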

5. Venn Diagrams

X = H∪S = N∪P

# 0. Setup ----------
library(sets)
library(VennDiagram)
library(RAM)
library(eulerr)

# 1. Define the sets ---------
A <- LETTERS[1:10]
B <- LETTERS[5:15]

A_set <- as.set(A)
B_set <- as.set(B)

# 2. Set size (cardinality) -------
length(A); length(B)

# 3. Basic set operations --------------------
# Union
A_set | B_set
# Intersection
A_set & B_set
# Symmetric difference (elements in exactly one of the two sets)
A_set %D% B_set

# 4. Venn diagrams ---------------------
## 4.1. Overview
draw.pairwise.venn(
  area1 = length(A_set),
  area2 = length(B_set),
  cross.area = length(A_set & B_set),
  category = c("Set A", "Set B"),
  cat.pos = c(0, 180),
  euler.d = TRUE,
  sep.dist = 0.03,
  fill = c("light blue", "pink"),
  alpha = rep(0.5, 2),
  lty = rep("blank", 2)
)

## 4.2. Show the elements as well
group.venn(list(set_A = A, set_B = B), label = TRUE,
           fill = c("orange", "blue"),
           cat.pos = c(0, 0),
           lab.cex = 1.3)

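# pregnant_TF and test_PN are 0/1 vectors; the step that creates them is not shown here.
# Their cross-tabulation is:
#
#          pregnant_TF
# test_PN   0  1
#       0  55 14
#       1  24  7
pregnant_df <- data.frame(pregnant_TF, test_PN)

## 5.2. Visualization -------------------------
pregnant_fit <- euler(pregnant_df)

plot(pregnant_fit, auto.key = TRUE, counts = TRUE, labels = c("Type I error", "Type II error"))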

6. Sigma

When you develop a prediction model through supervised learning in data science, the model can be expressed as a function. That is, a function that maps the elements (X1, X2, ..., Xn) of its domain to a target value R, such as revenue, is a supervised learning model.
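As a rough illustration of "a model is a function", the base R sketch below defines a function `f` that maps two inputs to an output R. The function name, the two inputs, and the coefficients are all hypothetical and were picked only to make the mapping visible. A real supervised model would learn its coefficients from data.

```r
# Hypothetical model: maps the inputs (x1, x2) to an output R.
# The coefficients 10, 3, and 2 are invented for illustration.
f <- function(x1, x2) {
  10 + 3 * x1 + 2 * x2
}

# One element of the domain maps to exactly one output.
r_single <- f(1, 2)

# The function is vectorized, so several inputs can be mapped at once.
r_many <- f(c(1, 2), c(2, 0))

print(r_single)  # prints: [1] 17
print(r_many)    # prints: [1] 17 16
```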

library(tidyverse)   # provides ggplot2
library(hrbrthemes)  # provides theme_ipsum_rc()
# library(extrafont)  # only needed if the NanumGothic font must be registered
# loadfonts()

# Points on the line y = 2x - 1
x <- seq(from = -5, to = 5, by = 0.5)
y <- 2 * x - 1

df <- data.frame(x, y)

ggplot(df, aes(x, y)) +
  geom_point() +
  geom_line() +
  # Overlay the parabola y = x^2 for comparison
  stat_function(fun = function(x) x^2, geom = "line", aes(colour = "square")) +
  theme_ipsum_rc(base_family = "NanumGothic") +
  theme(legend.position = "none",
        axis.line.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_blank()) +
  # Draw both axes through the origin
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  labs(x = "", y = "") +
  # Label each curve near a point that lies on it
  annotate("text", 3, 5, vjust = -1, label = "y=2x-1", parse = FALSE) +
  annotate("text", 4, 15, vjust = -1, label = "y=x^2", parse = FALSE)

References

Coursera, Data Science Math Skills, Duke University

https://aispiration.com/statistics/math-for-data-science.html#fn1