Introduction to C: Reading a Line of Text

Getting Started

In the previous lessons, you learned how to write a simple C program and compile it with a Makefile. Let's move on to the next thing you're likely to want to do: read input.

The C standard library provides several ways to read input, and third-party libraries such as readline, libedit, and linenoise make it easier to write interactive programs that read input.

Lower-level functions such as read() (part of POSIX) and fread() (part of the C standard library) can read input, but they are cumbersome to use if you simply want to read text one line at a time. The getline() function, standardized in POSIX.1-2008, is designed for exactly that, so it is the focus of this lesson.

Since this function might not be available on all platforms you may want to port your software to (I haven't checked if it's available on Windows, for example), in a future lesson I'll show how to move the function for reading a line of text to a self-contained .c file, so that it can be isolated from the rest of your code.

Declaring which POSIX API version you want to use

Before using a POSIX API, you should tell the system headers which version of the API you plan to use. This is done with a feature test macro. To use the getline() function, add the following line to your .c file, before any headers are included:

#define _POSIX_C_SOURCE 200809L

This causes the headers you include afterward to declare the POSIX APIs from the version you specify. If you forget to do this and compile in strict standard C mode (as our Makefile does with -std=c11 and -Werror), you will see an error similar to the following:

error: implicit declaration of function getline

With that out of the way, let's look at how to use getline()!

The getline() API

Let's see what we need in order to call getline(). You can read its documentation by typing man getline in your terminal. The synopsis shows that you must #include <stdio.h> to use the function. We've already got that covered, so let's look at the function's declaration:

ssize_t getline(char **lineptr, size_t *n, FILE *stream);

This tells us that getline() returns a value of type ssize_t. In C, the size_t type represents the size of an object in bytes. ssize_t, which is defined by POSIX rather than by the C standard, is the signed counterpart of size_t.

A signed type is useful when a function needs to return either a size or a negative value that signals an error. getline() is a good example: it returns the number of characters read on success, and -1 on end of input or error.

The first parameter getline() needs is the char **lineptr. In C, declaring a variable with a * indicates that it is a pointer. Declaring a variable with a ** indicates that it is a pointer to a pointer. A pointer indicates where in memory the relevant value can be found.

Pointers in C are a complicated topic, so we'll discuss them in more detail in a later lesson. In this context, all you need to know is that receiving a pointer to a pointer lets getline() either use a buffer you previously allocated for it, or allocate (or enlarge) a buffer itself and update your pointer to refer to it.

The second parameter for getline() is a size_t *n. This parameter serves a dual purpose. It both allows you (the caller) to indicate to getline() how much memory was pre-allocated, and allows getline() to communicate how much memory it allocated (if it made an allocation on your behalf).

The third and final parameter is a FILE *stream. A stream represents a file or file-like object (such as the terminal or a pipe), plus bookkeeping that supports buffering and, where possible, seeking within it. The C standard library provides many functions for working with streams, such as fopen(), fclose(), fread(), fwrite(), fprintf(), and fscanf(). We'll cover those in another lesson.

Let's Use getline()

For this example, we'll change our main() function as follows:

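#define _POSIX_C_SOURCE 200809L  /* must come before any #include */
#include <stdio.h>

int main(int argc, char* argv[])
{
    char* line = NULL;       /* NULL: getline() will allocate the buffer */
    size_t buffer_size = 0;  /* receives the size of that buffer */
    ssize_t count;           /* characters read, or -1 on EOF/error */

    printf("What is your name?\n");
    count = getline(&line, &buffer_size, stdin);
    if (count != -1) {
        printf("Hello, %s!\n", line);
    }

    return 0;
}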
Declaring Variables

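In C, you'll often see variables declared at the top of a block, as we've done here. C89 required this; since C99, declarations may appear anywhere in a block, but many programmers still group them at the top for readability.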

More generally, to declare a variable in C, use the format: <type> <name> [= <initial-value>]; If you want to declare multiple variables with the same type, you can use a comma-separated list after the type.

We need the char* line pointer to store the location of the line of text we read. We set it to NULL so that getline() will allocate a buffer for us. We pass buffer_size to getline(), which stores the size of the buffer it allocates there. Finally, count stores the return value of getline(): the number of characters read, or -1 on end of input or error.

Making the Call

Now that our variables are declared, we can call getline() to read a line of text:

count = getline(&line, &buffer_size, stdin);
Address Operator

Notice the & in front of each of the first two arguments. This is the address operator.

Its use in the first argument, &line, gives the memory address of the line variable we declared. In other words, rather than passing getline() the value of line (NULL), we pass the address of that pointer. The result is the pointer to a pointer (char**) requested by the function signature.

Its use in the second argument, &buffer_size, gives the address of the size_t buffer_size variable. The expression &buffer_size has type size_t*, which is what getline() expects, and it lets getline() update buffer_size directly.

Standard Input/Output Streams

Including stdio.h also makes three standard input/output streams available (see the manual page with man stdio):

FILE *stdin;
FILE *stdout;
FILE *stderr;

We've made use of stdin (the standard input stream) in the call to getline().

Normally, stdin is used to read input (either typed by a user at the keyboard or piped in from another program). stdout is used for "normal" program output, and stderr is used for error and debug messages. Because errors go to a separate stream, they don't get mixed into the normal output; for example, if you redirect stdout to a file, error messages still appear in the terminal.

Printing the Result

Earlier we added the following line to our code:

printf("Hello, %s!\n", line);

What we'd like to happen is this: if I enter Mike and press Enter, the program prints Hello, Mike!

Format Strings

Functions such as printf() take a format string: ordinary text mixed with conversion specifications. Each specification starts with a percent sign (%) followed by a conversion code, and printf() replaces it with the value of the next argument. Some commonly used conversions include:

* %d - Prints a signed decimal integer (int)
* %s - Prints a string of characters (terminated by a '\0' character)
* %u - Prints an unsigned decimal integer (unsigned int)
* %zu - Prints a size_t value
* %zd - Prints the signed type that pairs with size_t, such as ssize_t

Wikipedia has a good article with a more complete list.

Running the Code

Let's run what we have so far.

$ make
cc -Wall -Werror -std=c11 -o main main.c
$ ./main
What is your name?
Mike
Hello, Mike
!

It looks like there's a bug: the exclamation point appears on the next line!

This happens because getline() keeps the linefeed (the \n character produced by pressing Enter) at the end of the string it returns.

Bug Fix: Removing the Linefeed

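The fix for this problem is simple enough: remove the linefeed before printing the string. To do this, we can add the following code right after the call to getline():

if (count > 0 && line[count - 1] == '\n')
    line[count - 1] = '\0';

Remember that count contains the number of characters read by getline(). As we discovered, that count includes the linefeed, so line[count - 1] is the last character read. This code replaces that linefeed with the \0 character (which marks the end of the string). Both checks matter: getline() returns -1 at end of input or on error, in which case line[count - 1] would fall outside the buffer, and the last line of a file or piped input may not end with a linefeed at all, in which case we would chop off a real character.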

Array Subscripting

Expressions of the form <variable>[<index>] are array subscripts in C. Our char* line variable points to the first character of the buffer allocated by getline(). Indexing starts at zero: the first character is line[0], the second is line[1], and so on, so the last character read is line[count - 1]. For example, if the user just presses Enter, the buffer holds a single character (\n) followed by the \0 terminator. In that case, count will be 1, and setting line[count-1] = '\0' overwrites the linefeed, leaving a zero-length string.

You might be wondering why we can use an array subscript on a variable we defined as a pointer rather than an array. In C, the subscript expression a[i] is defined as *(a + i), which is pointer arithmetic (a topic for a separate discussion). So the line we wrote is equivalent to writing *(line + count - 1) = '\0';.

Memory Allocation

Unlike many other languages, which automatically reclaim unused objects through "garbage collection", C makes the programmer responsible for managing dynamically allocated memory. The standard library functions malloc() and free() are typically used to allocate and free memory.

The getline() manual page states the following:

If *lineptr is set to NULL and *n is set 0 before the call, then getline() will allocate a buffer for storing the line. This buffer should be freed by the user program even if getline() failed.

Memory Leaks

Because we never free() the buffer, our program has a memory leak. We must free() line so that the memory can be reused. To do so, add the following line just before the return statement:

free(line);

According to the free manual page (see man 3 free), we will also need to #include <stdlib.h> in order to use the free() function. So we'll add that #include statement at the top of our file, as well.

In practice, the operating system reclaims all of a program's memory when it exits, so a single leaked buffer is not a major concern for a small program like ours. However, in a larger, longer-running program that leaks repeatedly (for example, once per line read), memory usage would grow without limit over time, eventually exhausting resources and potentially causing an unexpected failure.

Examining getline()'s Other Output

By now, we've printed the line of text we read from the user. What if we also want to print getline()'s other results, such as the number of characters read and the size of the allocated buffer? The following printf() statements do that for us:

printf("Allocated %zu characters.\n", buffer_size);
printf("Read %zd characters.\n", count);

Note: The example code I provided for this lesson contains a mistake: it uses %zu to print count rather than %zd. %zu treats the value as an unsigned size_t, so a negative count (which getline() returns at end of input or on error) is printed incorrectly; in practice, -1 shows up as a very large number.

Conclusion

By now, you've learned how to declare variables, use the address operator to create a pointer to a variable, read a line of text, modify a buffer using an array subscript operator, use basic format strings with printf(), and use free() to reclaim memory that has been allocated.

In the next lesson, we'll expand our Makefile so that we can split our code into multiple files.