String Class

In Ruby, strings are sequences of characters, but they can also be used to store binary data as sequences of bytes. A string can be created from a sequence of characters: String.new("characters")

Ruby has a complete set of methods for working with strings. Strings are basic objects in Ruby: the String class inherits from Object and includes the Comparable module, which implements the basic comparison operators: < ; > ; <= ; >= ; == ; between?.

String comparison is based on the order of characters in the ASCII sequence: digits come first, then uppercase letters, then lowercase letters. Strings are compared character by character, and the first differing character decides the result. If one string is a prefix of the other, the longer string is considered greater:

'0'<'A' => true
'A'<'a' => true
'a'<'b' => true

'azzz'<'baaa' => true # the first differing character decides
'aab'<'abb'   => true
'abc'<'abc0'  => true # the second string is longer
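
The ordering rules above can be tried out directly. The following is a minimal sketch with arbitrary sample strings; it uses the <=> operator, on which the Comparable operators are built, together with between? and sort.

```ruby
# <=> returns -1, 0 or 1; the Comparable operators are built on it
puts('abc' <=> 'abd')     # -1: 'c' sorts before 'd'
puts('abc' <=> 'abc')     # 0: the strings are equal
puts('abc0' <=> 'abc')    # 1: 'abc' is a prefix of the longer string

# between? checks that a string lies within an inclusive range
puts 'b'.between?('a', 'c')    # true

# sort relies on the same ordering: digits, uppercase, lowercase
p ['b', 'A', '0', 'a'].sort    # ["0", "A", "a", "b"]
```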

String Encoding

In Ruby each string has its own encoding, which can be obtained with the encoding method. The default source encoding is UTF-8 since Ruby 2.0 and was US-ASCII in Ruby 1.9; in Ruby 1.8 strings carried no encoding and were handled as plain sequences of bytes.

Encodings are described by the Encoding class. The class method Encoding.list returns an array of all the available encodings.
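
As a small illustration, the sketch below queries the list of encodings and looks one up by name. The variable names are only illustrative, and the number of encodings depends on the Ruby build, so no value is shown for it.

```ruby
# Encoding.list returns an array of Encoding objects
encodings = Encoding.list
puts encodings.size                         # the count depends on the Ruby build
puts encodings.include?(Encoding::UTF_8)    # true

# Encoding.find looks an encoding up by name (case-insensitive)
latin1 = Encoding.find("iso-8859-1")
puts latin1.name                            # ISO-8859-1
```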

In a source file, the encoding of the file can be specified by a comment on the first line (or on the second line, if the first is a #! line):

# coding: utf-8
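
A minimal sketch of a source file that declares its encoding and then inspects it; __ENCODING__ is the keyword that returns the encoding of the current source file.

```ruby
# coding: utf-8
# The magic comment above has to be the first line of the file.
puts __ENCODING__        # UTF-8, the source encoding
puts "abc".encoding      # UTF-8: string literals take the source encoding
```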

Some operations between strings cannot be performed if their encodings are not compatible. The encode method converts a string to another encoding; it has options to handle characters that are undefined or invalid in the new encoding, and to convert newline conventions. There is also a force_encoding method, which sets the encoding attribute of a string without changing its bytes:

'a'.encoding => #<Encoding:UTF-8>

b = 'a'.encode("ISO-8859-1")   # encode returns a re-encoded copy
b.encoding   => #<Encoding:ISO-8859-1>

b.encode!("UTF-8")             # encode! changes the string in place

b.encoding   => #<Encoding:UTF-8>

"abcd".encode("UTF-8", undef:   :replace, replace: "X") # "X" replaces undefined characters
"abcd".encode("UTF-8", invalid: :replace, replace: "X") # "X" replaces invalid characters

b.force_encoding("UTF-8")             # tells Ruby that b is a UTF-8 string,
b.encoding    => #<Encoding:UTF-8>    # but the bytes of b are not changed
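
The difference between encode and force_encoding is easier to see on a string with a non-ASCII character. The sketch below is only illustrative: the sample strings and variable names are assumptions, not part of the text above.

```ruby
utf8   = "caf\u00E9"                 # UTF-8 string, the last character takes two bytes
latin1 = utf8.encode("ISO-8859-1")   # transcoded copy, the last character takes one byte

p utf8.bytes      # [99, 97, 102, 195, 169]
p latin1.bytes    # [99, 97, 102, 233]

# Concatenating strings with incompatible encodings raises an error
begin
  utf8 + latin1
rescue Encoding::CompatibilityError => e
  puts "cannot concatenate: #{e.message}"
end

# undef: :replace substitutes characters that do not exist in the target encoding
p "\u20AC 5".encode("US-ASCII", undef: :replace, replace: "?")   # "? 5"

# force_encoding relabels the bytes without converting them
relabeled = latin1.dup.force_encoding("UTF-8")
p relabeled.bytes == latin1.bytes    # true
p relabeled.valid_encoding?          # false: a lone 0xE9 byte is not valid UTF-8
```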

Double-quoted Strings

Strings can be created from characters between double quotes:

E.g.: a="string"

An alternative syntax is %Q( string ), where the string inside the parentheses can contain double quotes and balanced pairs of parentheses; an unbalanced parenthesis must be escaped with a backslash, because the parentheses act as the string delimiters. Another non-alphanumeric character, such as | or !, can be used as the delimiter instead of parentheses; with brackets, the opening and closing delimiters must form a matching pair: {} [] () <>

a=%Q(abcd\n)       # () are used as delimiters
a=%Q| 12 ef(), |   # '|' is used as a delimiter

Double-quoted strings can extend over several lines, preserving the newline characters. A backslash at the end of a line escapes the newline, effectively joining the two lines.
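
A short sketch of these forms, with arbitrary sample text:

```ruby
# Balanced parentheses may appear inside %Q( ); double quotes need no escaping
a = %Q(say "hi" (twice))
puts a       # say "hi" (twice)

# A double-quoted string may span lines; the newline is kept
b = "first
second"
p b          # "first\nsecond"

# A backslash before the line end joins the two lines
c = "first \
second"
p c          # "first second"
```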

Between double quotes all the usual backslash substitutions are performed:

\n         end of line
\b         backspace
\e         escape
\s         space
\t \v      horizontal tab, vertical tab
\f \r      form feed, carriage return
\nnn       octal value
\xhh       hexadecimal value
\uxxxx     Unicode code point
\C-x       control-x sequence
\M-x       meta-x sequence
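
A few of these substitutions in action. This is only a sketch; the letter 'A' (code 65) is chosen here just to show three equivalent escapes.

```ruby
puts "col1\tcol2"        # a tab separates the two words
p "A\x41\101\u0041"      # "AAAA": hexadecimal, octal and Unicode escapes for 'A'
p "\e".ord               # 27, the escape character
p "\s" == " "            # true
p "\C-a".ord             # 1, control-a
p "a\nb".lines           # ["a\n", "b"]
```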

Single-quoted Strings

Strings can also be created from characters between single quotes. With single quotes only two backslash substitutions are performed: \\ (a backslash) and \' (a single quote).

E.g.: a='string'

An alternative syntax is %q( string ), where the string inside the parentheses can contain single quotes and balanced pairs of parentheses. Another non-alphanumeric character, such as | or !, can be used as the delimiter instead of parentheses; with brackets, the opening and closing delimiters must form a matching pair: {} [] () <>

a=%q(string)             # () are used as delimiters
a=%q! xyx string aad !   # '!' is used as a delimiter

A single-quoted string can extend over several lines; the line ends are kept in the string as newline characters ("\n"), and a backslash before the end of a line does not join the lines.
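
The sketch below contrasts single and double quotes on a few arbitrary sample strings; the variable name multi is only illustrative.

```ruby
p 'a\nb'.length          # 4: the backslash and 'n' are two separate characters
p "a\nb".length          # 3: "\n" is a single newline character
p 'it\'s'                # "it's"
p 'c:\\dir'.length       # 6: the doubled backslash yields a single one
p(%q(it's (nested)))     # "it's (nested)"

multi = 'one
two'
p multi                  # "one\ntwo"
```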

String Operators

The following table lists some methods of the String class.

+                                concatenates strings, returning a new string: "a"+"b" => "ab"
*                                repeats the string: "abc" * 2 => "abcabc"
<<                               appends to the string in place: "a"<<"b" => "ab"
ascii_only?                      true if the string contains only ASCII characters
empty?                           true if the string is empty
end_with?("string")              true if the string ends with the given string
include?("substring")            tests whether the substring is included: 'abc'.include?('b') => true
index("substring")               index of the first occurrence of a substring: 'abc'.index('b') => 1
rindex("substring")              index of the last occurrence of a substring (searches from the end)
insert(index,string)             inserts a string at the given index, in place: "abc".insert(1,"xx") => "axxbc"
split(pattern)                   splits into an array; the default pattern is whitespace
capitalize ; capitalize!         makes the first character uppercase and the rest lowercase
upcase ; upcase!                 converts to uppercase; upcase! changes the string in place
downcase ; downcase!             converts to lowercase
swapcase ; swapcase!             converts uppercase to lowercase and lowercase to uppercase
sub(pattern,replacement)         replaces the first occurrence of the pattern
gsub(pattern,replacement)        replaces all occurrences of the pattern
tr('old chars','new') ; tr!      translates characters, like the Unix "tr" command
center(n," ")                    centers in n characters, using the given padding character
ljust(n," ")                     left-justifies in n characters, using the given padding character
rjust(n," ")                     right-justifies in n characters, using the given padding character
lstrip ; lstrip!                 strips leading whitespace
rstrip ; rstrip!                 strips trailing whitespace
strip ; strip!                   strips leading and trailing whitespace
squeeze(characters) ; squeeze!   collapses runs of the given repeated characters into one
reverse ; reverse!               reverses the string
clear                            empties the string
replace(newstring)               replaces the contents of the string with a new one
chomp ; chomp!                   strips the final end of line, if present
encoding                         returns the string encoding
valid_encoding?                  true if the bytes are valid in the string's encoding
encode("iso-8859-1") ; encode!   re-encodes the string in the given encoding
force_encoding("utf-8")          sets the encoding without changing the bytes
to_i ; to_f                      conversion to numbers
length ; bytesize                length in characters ; length in bytes
getbyte(index)                   gets the byte at a given position
setbyte(index,byte)              sets the byte at a given position
bytes.to_a                       byte contents: "ab".bytes.to_a => [97, 98]
count("chars")                   counts the characters that belong to the given set (not substring occurrences)
count("a-c")                     counts the characters in a range
delete("chars") ; delete!        deletes the given characters
crypt(salt)                      hashes the string using the operating system's crypt function
sum                              computes a simple checksum of the string
next ; succ                      successor of the string: "a".next => "b"
ord                              code of the first character: "ab".ord => 97
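
Some of the methods listed above are shown in the following sketch. The sample strings and variable names are arbitrary; the sketch mainly illustrates the difference between methods that return a copy and methods that modify the receiver.

```ruby
s = "  Hello, World  "

p s.strip                      # "Hello, World"
p s.strip.upcase               # "HELLO, WORLD"
p s                            # "  Hello, World  " (non-bang methods return a copy)

t = s.strip
t.upcase!                      # bang methods modify the receiver
p t                            # "HELLO, WORLD"

a = "a"
b = a                          # b refers to the same object as a
a << "b"                       # << appends in place
p b                            # "ab"
p a + "c"                      # "abc", a new string; a is still "ab"

p "hello".sub("l", "L")        # "heLlo" (first occurrence only)
p "hello".gsub("l", "L")       # "heLLo" (every occurrence)
p "hello".tr("el", "ip")       # "hippo"
p "a b  c".split               # ["a", "b", "c"]
p "aaabbb".squeeze             # "ab"
p "abc".center(7, "*")         # "**abc**"
p "hello world".count("lo")    # 5: counts the characters 'l' and 'o', not the substring
p "hello".delete("l")          # "heo"
p "12abc".to_i                 # 12
p "\u00E9".length              # 1 character
p "\u00E9".bytesize            # 2 bytes in UTF-8
```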

If the argument of the "<<" operator is an integer, it is interpreted as the numeric code of a character in the encoding of the string, and the corresponding character is appended to the string:

"a"<<"b" => "ab"
"a"<<98  => "ab"