Process API

ocwokocw 2024. 5. 15. 00:29

- Source: Operating Systems: Three Easy Pieces

- Overview

UNIX systems create processes with the fork() and exec() system calls, and a parent can use wait() to wait for a process it created to finish. Working through concrete examples shows how these system calls are used in practice.

- fork() system call

The fork() system call creates a new process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();
    if (rc < 0) { // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child (new process)
        printf("hello, I am child (pid:%d)\n", (int) getpid());
    } else { // parent goes down this path (main)
        printf("hello, I am parent of %d (pid:%d)\n", rc, (int) getpid());
    }
    return 0;
}

When fork() is called, execution continues in two processes: the parent and the child. In the child, fork() returns 0; in the parent, it returns the child's PID.

prompt> ./p1
hello world (pid:29146)
hello, I am parent of 29147 (pid:29146)
hello, I am child (pid:29147)

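The output above comes about as follows.

The process prints hello world along with its PID, then calls fork().
The newly created process (the child) is a copy of the calling program. It does not start at main(); it runs as if it had itself called fork(), which is why hello world is printed only once. The child is not an exact copy, though: it has its own address space, registers, and PC, and the value fork() returns to it differs from the value returned to the parent.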
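
The point about the child having its own address space can be checked with a small experiment. The sketch below is not from the source: `parent_value_after_child_write` is a hypothetical helper name, and the values 100 and 200 are arbitrary. The child overwrites a local variable and exits; the parent then reads its own copy.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's view of x after the child
 * has overwritten its own copy. Returns -1 if fork() or waitpid() fails. */
int parent_value_after_child_write(void) {
    int x = 100;
    pid_t rc = fork();
    if (rc < 0) {
        return -1;            /* fork failed */
    }
    if (rc == 0) {
        x = 200;              /* changes only the child's private copy */
        _exit(0);             /* leave without running the parent's code */
    }
    if (waitpid(rc, NULL, 0) < 0) {
        return -1;
    }
    return x;                 /* the parent's copy is untouched */
}
```

Called from a main(), the helper should return 100 in the parent, since the child's write lands in a separate address space.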

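However, p1 is nondeterministic. The child may run first and the parent afterward, the reverse of the output above. On a system with a single CPU, only one of the two can run at a time, so either the parent or the child has to go first. Can we predict which one runs first under given conditions?

The CPU scheduler is complex, so in general we cannot say for certain which process will run first. This kind of nondeterminism also causes unexpected problems in multi-threaded programs.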

- wait() system call

Sometimes a parent needs to wait for a child process to finish. Because wait() does not return until the child has exited, it makes the output order of the previous example deterministic.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();
    
    if (rc < 0) {         // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child (new process)
        printf("hello, I am child (pid:%d)\n", (int) getpid());
    } else {              // parent goes down this path (main)
        int wc = wait(NULL);
        printf("hello, I am parent of %d (wc:%d) (pid:%d)\n", rc, wc, (int) getpid());
    }
    
    return 0;
}

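With this modified version, the child's message is always printed before the parent's.

If the child runs first, it prints its message, and the parent runs afterward.
If the parent runs first, it immediately calls wait(), which does not return until the child has finished; only then does the parent print its message.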
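
The example passes NULL to wait(), which discards the child's termination status. As a hedged extension that is not in the source, the sketch below passes a status pointer and decodes it with the standard WIFEXITED and WEXITSTATUS macros. The helper name `child_exit_code` is made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code`, then
 * returns the exit code the parent observes, or -1 on error. */
int child_exit_code(int code) {
    pid_t rc = fork();
    if (rc < 0) {
        return -1;                  /* fork failed */
    }
    if (rc == 0) {
        _exit(code);                /* child: terminate immediately */
    }
    int status;
    pid_t wc = wait(&status);       /* blocks until a child terminates */
    if (wc != rc || !WIFEXITED(status)) {
        return -1;                  /* unexpected child or abnormal exit */
    }
    return WEXITSTATUS(status);     /* low 8 bits of the child's exit code */
}
```

This assumes the caller has no other outstanding children; with several children, waitpid() with an explicit PID is the safer choice.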

- exec() system call

The exec() system call is useful when you want to run a program different from the calling program. exec() is sometimes confused with fork(); the difference is that fork() only runs a copy of the same program.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();
    
    if (rc < 0) { // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child (new process)
        printf("hello, I am child (pid:%d)\n", (int) getpid());
        char *myargs[3];
        myargs[0] = strdup("wc"); // program: "wc" (word count)
        myargs[1] = strdup("p3.c"); // argument: file to count
        myargs[2] = NULL; // marks end of array
        execvp(myargs[0], myargs); // runs word count
        printf("this shouldn't print out");
    } else { // parent goes down this path (main)
        int wc = wait(NULL);
        printf("hello, I am parent of %d (wc:%d) (pid:%d)\n",
               rc, wc, (int) getpid());
    }
    
    return 0;
}

prompt> ./p3
hello world (pid:29383)
hello, I am child (pid:29384)
      29     107    1030 p3.c
hello, I am parent of 29384 (wc:29384) (pid:29383)
prompt>

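The above is example code and output that combines fork() and exec() to run wc, the word-counting program. To understand the example fully, we need to look more closely at how execvp() works.

Given the name of a program, execvp() runs it through the following steps.

It loads the code and static data of the program to be executed.
It overwrites the current code segment (and current static data) with them.
The heap, the stack, and the other parts of the program's memory space are re-initialized.
No new process is created; the currently running program is transformed into a different program.

That is why the printf after execvp() is never executed when execvp() succeeds.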
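
The flip side is that execvp() returns only when it fails, for instance when the program cannot be found. A minimal sketch of handling that case follows; it is an illustration rather than the source's code. `spawn_and_wait` is a hypothetical helper, and the exit code 127 follows the shell convention for "command not found" and is an assumption here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with arguments argv in a child and
 * returns its exit code, or -1 if fork/waitpid fails or the child
 * terminated abnormally. */
int spawn_and_wait(char *const argv[]) {
    pid_t rc = fork();
    if (rc < 0) {
        return -1;                 /* fork failed */
    }
    if (rc == 0) {
        execvp(argv[0], argv);     /* on success, this never returns */
        _exit(127);                /* reached only if execvp failed */
    }
    int status;
    if (waitpid(rc, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

For example, passing `{"sh", "-c", "exit 3", NULL}` should yield 3, while a program name that does not exist on PATH should yield 127.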

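- Why use this API?

Why must a task as simple as creating a process go through an interface like this?

Separating fork() and exec() is essential for the UNIX shell, because it lets the shell run code after fork() but before exec().

The shell waits for user input. When the user enters a command to run a program, the shell locates the executable, calls fork() to create a new child process, calls a variant of exec() in the child to run the program, and then calls wait() to wait for the command to finish.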
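
The shell's per-command work described above can be sketched in a few lines. This is a simplified illustration, not how any real shell is implemented: `run_line` and `MAX_ARGS` are hypothetical names, arguments are split on whitespace only (no quoting), and locating the executable is left to execvp(), which searches PATH.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 16   /* arbitrary limit for this sketch */

/* Hypothetical helper: splits `line` in place on whitespace, runs the
 * command in a child, and returns its exit code (-1 on error). */
int run_line(char *line) {
    char *argv[MAX_ARGS + 1];
    int argc = 0;
    for (char *tok = strtok(line, " \t\n");
         tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, " \t\n")) {
        argv[argc++] = tok;
    }
    argv[argc] = NULL;             /* execvp needs a NULL-terminated array */
    if (argc == 0) {
        return 0;                  /* empty line: nothing to run */
    }

    pid_t rc = fork();
    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        execvp(argv[0], argv);     /* searches PATH for the executable */
        _exit(127);                /* exec failed (assumed convention) */
    }
    int status;
    if (waitpid(rc, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

A real shell would wrap this in a loop that prints a prompt and reads a line, for example with fgets(), before each call. Note that strtok() modifies its input, so the line must be a writable buffer.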

prompt> wc p3.c > newfile.txt

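The command above redirects the output of wc to newfile.txt. It is carried out as follows.

A child process is created.
Before calling exec(), the shell closes standard output in the child.
It then opens newfile.txt. Assuming descriptor 0 is still open, the file receives the lowest free descriptor, which is now standard output, so everything wc writes to standard output goes to the file.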

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    int rc = fork();
    
    if (rc < 0) { // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child: redirect standard output to a file
        close(STDOUT_FILENO);
        open("./p4.output", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);

        // now exec "wc"...
        char *myargs[3];
        myargs[0] = strdup("wc");
        myargs[1] = strdup("p4.c");
        myargs[2] = NULL;
        execvp(myargs[0], myargs);
    } else { // parent
        int wc = wait(NULL);
    }
    
    return 0;
}

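The code above implements the redirection of wc's output to a file. As described earlier:

fork() is called.
The parent waits for the child process with the wait() system call. (Alternatively, the child may run first.)
The child closes standard output and opens a file named p4.output.
It runs the wc program with execvp().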
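
The close-then-open approach relies on open() returning the lowest free descriptor. A commonly used alternative, which is not what the source's example does, is to open the file first and then copy its descriptor onto standard output with dup2(). The sketch below assumes a hypothetical helper `run_redirected`; the file mode 0644 and the exit codes 126 and 127 are illustrative choices.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv in a child with standard output
 * redirected to `path`; returns the child's exit code or -1 on error. */
int run_redirected(char *const argv[], const char *path) {
    pid_t rc = fork();
    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) {
            _exit(126);                    /* could not open output file */
        }
        if (dup2(fd, STDOUT_FILENO) < 0) { /* make fd 1 refer to the file */
            _exit(126);
        }
        if (fd != STDOUT_FILENO) {
            close(fd);                     /* the extra descriptor is not needed */
        }
        execvp(argv[0], argv);
        _exit(127);                        /* reached only if execvp failed */
    }
    int status;
    if (waitpid(rc, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Because the redirection happens in the child between fork() and exec(), the parent's standard output is left untouched, which is exactly the window that the fork()/exec() split provides.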