[Coco] INSTR question
Mathieu Bouchard
matju at artengine.ca
Sun Feb 19 13:23:03 EST 2017
Ruby's .index and =~ return nil when nothing matches. nil counts as false in
conditionals, but is distinct from false. nil, false and true are not of a
number type. Also, Ruby's zero counts as a true value, so a match at position
zero counts as true.
Perl's index returns -1. Perl has undef, which is similar to Ruby's nil and
false, but is not used here. You're supposed to put <0 in the condition, as in
C. Perl's =~ returns a false value (the empty string, not undef) when not
matching, and then you're not supposed to use the @- array.
Python's index throws a ValueError. A thrown value is not a return value, and to
avoid aborting the program with "ValueError: substring not found", you have to
do the equivalent of a temporary ON ERROR for just that error type or any
broader category of errors:
    try:
        print("ABC".index("Z"))
    except ValueError:
        print("not found !")
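For comparison, Python also has str.find, which follows the -1 convention instead of raising. Here is a minimal sketch of both styles; position_or_none is a name I made up for illustration, not a library function. It also shows why the -1 has to be compared explicitly rather than used as a truth value:

```python
def position_or_none(haystack, needle):
    """Return the index of needle in haystack, or None if absent.

    Hypothetical helper: wraps str.index so the caller gets a
    nil-like value instead of an exception.
    """
    try:
        return haystack.index(needle)
    except ValueError:
        return None


found = position_or_none("ABC", "A")
missing = position_or_none("ABC", "Z")

# str.find reports "not found" as -1 instead of raising.
find_missing = "ABC".find("Z")

# Pitfall: in Python 0 is falsy and -1 is truthy, so the result of
# find must be compared explicitly (>= 0), never used as a condition.
truthy_at_zero = bool("ABC".find("A"))       # match at position 0
truthy_when_absent = bool("ABC".find("Z"))   # no match, find gives -1

print(found, missing, find_missing)
```

With the helper, the test to write is "is not None"; with find, it is ">= 0".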
Python's re.search returns None, which is like Ruby's nil. If you call .start()
on it, you'll abort with "AttributeError: 'NoneType' object has no attribute
'start'". Normally you'd use an if-statement to avoid that, not a
try-statement.
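A minimal sketch of that if-statement guard; match_start is a hypothetical helper name, while re.search and .start() are the standard library calls:

```python
import re


def match_start(pattern, text):
    """Return the start offset of the first match, or None if no match."""
    m = re.search(pattern, text)
    if m is None:
        # No match: re.search returned None, so .start() must not be called.
        return None
    return m.start()


print(match_start("A", "ABC"))
print(match_start("", "ABC"))
print(match_start("Z", "ABC"))
```

Testing "m is None" rather than the truth value of the result is deliberate here, since a start offset of 0 would count as false.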
Java's indexOf returns -1. Note that indexOf is constrained to returning
integers, therefore it can't return null (equivalent of Ruby's nil).
JavaScript's indexOf returns -1 too, even though the language doesn't restrict
return types.
C's strstr is constrained to returning a pointer. You're supposed to check
whether the pointer is NULL (which is zero), before subtracting s to find the
index.
C++ STL's find is constrained to returning an unsigned integer. It returns
std::string::npos, which is equal to the largest unsigned integer, which is
really just a -1 in disguise (the return type of find doesn't allow negatives).
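The "-1 in disguise" arithmetic can be sketched as follows, in Python for consistency with the other examples. The 64-bit width is an assumption (typical for size_t on current desktop platforms, but not guaranteed):

```python
SIZE_T_BITS = 64               # assumption: size_t is 64 bits wide
MODULUS = 1 << SIZE_T_BITS     # unsigned arithmetic wraps modulo 2**64

npos = (-1) % MODULUS          # -1 wrapped into the unsigned range
largest_unsigned = MODULUS - 1

print(npos == largest_unsigned)
print(npos)
```

Both names hold the same number, which is why comparing the result of find against std::string::npos plays the same role as the <0 test in the other languages.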
Unix shell's grep is a program that produces a text stream (pseudo-file) that
lists lines that match. When there is no match, the output is empty (the stream
will start with an EOF). You can also check the exit code of grep, which is 1
when not found, 0 when found (0 is the true value, as in "no error"). The pipe
symbol is as in OS9.
Tcl's "string first" returns -1. Tcl's "regexp" returns 0, which is the false
value, and in my example, if the string is not found, the x variable is not even
set. Therefore if you try to do the next command (or whatever with a $x in it),
it will abort, saying x does not exist !
PHP's strpos returns FALSE, which is equal to zero (==0) but not identical to
zero (===0). preg_match returns 0 and sets the $m variable to an empty array
(whereas if it matched it would return 1). preg_match could also return FALSE
but only if the pattern string has a syntax error in it, so, usually, you don't
have to distinguish it from 0.
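To illustrate the point made earlier in this thread (quoted below), that an empty string is found at every position without any special case for N=0, here is a minimal sketch of a naive search. naive_index is a hypothetical name, it is not how any of the above libraries are actually implemented, and it uses 0-based positions with -1 for "not found" (INSTR is 1-based):

```python
def naive_index(haystack, needle, start=0):
    """Return the first position >= start where needle occurs, else -1."""
    n = len(needle)
    # Try every position where n characters still fit in the haystack.
    for i in range(start, len(haystack) - n + 1):
        # For n == 0 the slice is empty and equals the empty needle,
        # so the very first position tried is a match.
        if haystack[i:i + n] == needle:
            return i
    return -1


print(naive_index("ABC", "A"))
print(naive_index("ABC", ""))
print(naive_index("ABC", "", 2))
print(naive_index("ABC", "Z"))
```

Nothing in the loop mentions the empty string; the result for N=0 falls out of the general case, and the match is wherever the search begins.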
On 2017-02-19 at 10:51:00, Allen Huffman wrote:
> In the examples that return 0 or "" when matching in the first position, what do they return if no match is found?
>
>> On Feb 19, 2017, at 8:36 AM, Mathieu Bouchard <matju at artengine.ca> wrote:
>>
>>
>> I searched for real and it isn't exactly that universal. Let's start with some that are consistent :
>>
>> Ruby (both plain search & pattern matching) :
>> "ABC".index"A"
>> 0
>> "ABC".index""
>> 0
>> /A/ =~ "ABC"
>> 0
>> // =~ "ABC"
>> 0
>>
>> Perl :
>> print index("ABC","A")."\n"
>> 0
>> print index("ABC","")."\n"
>> 0
>> "ABC" =~ /A/; print "@-\n"
>> 0
>> "ABC" =~ //; print "@-\n"
>> 0
>>
>> Python :
>> "ABC".index("A")
>> 0
>> "ABC".index("")
>> 0
>> re.search("A","ABC").start()
>> 0
>> re.search("","ABC").start()
>> 0
>>
>> Java :
>> System.out.println("ABC".indexOf("A"));
>> 0
>> System.out.println("ABC".indexOf(""));
>> 0
>>
>> C (where this behaviour probably originated from) :
>> const char *s="abc"; printf("%td %td\n",strstr(s,"a")-s,strstr(s,"")-s);
>> 0 0
>>
>> C++ STL :
>> string s="abc"; printf("%zu %zu\n",s.find("a"),s.find(""));
>> 0 0
>>
>> Unix shells pattern matching :
>> echo ABC | grep -b A
>> 0:ABC
>> echo ABC | grep -b ""
>> 0:ABC
>>
>> (the list could go on)
>>
>> However, Tcl is not consistent (doesn't find empty string) :
>> string first A ABC
>> 0
>> string first "" ABC
>> -1
>>
>> PHP is not consistent either, and even issues a warning (wow !) :
>> var_export(strpos("abc","a"));
>> 0
>> var_export(strpos("abc",""));
>> PHP Warning: strpos(): Empty needle in php shell code on line 1
>> false
>>
>> But there's an alternate consistent way in Tcl, using pattern matching :
>> regexp -indices a abc x; lindex $x 0
>> 0
>> regexp -indices "" abc x; lindex $x 0
>> 0
>>
>> And in PHP too :
>> preg_match("/a/","abc",$m,PREG_OFFSET_CAPTURE); var_export($m[0][1]);
>> 0
>> preg_match("//","abc",$m,PREG_OFFSET_CAPTURE); var_export($m[0][1]);
>> 0
>>
>>
>>> On 2017-02-10 at 15:05:00, Paulo Garcia wrote:
>>>
>>> Interesting discussion. Indeed the same behaviour is found in Python and
>>> Javascript:
>>>
>>> NodeJS:
>>>
>>>> a='ABC'
>>> 'ABC'
>>>> a.indexOf('A')
>>> 0
>>>> a.indexOf('B')
>>> 1
>>>> a.indexOf('C')
>>> 2
>>>> a.indexOf('')
>>> 0
>>>>
>>>
>>> Python:
>>>
>>>>>> a='ABC'
>>>>>> a.index('B')
>>> 1
>>>>>> a.index('A')
>>> 0
>>>>>> a.index('')
>>> 0
>>>>>>
>>>
>>>
>>> Paulo
>>>
>>> On Fri, Feb 10, 2017 at 2:29 PM, Mathieu Bouchard <matju at artengine.ca>
>>> wrote:
>>>
>>>>
>>>> Nope, it's like that in probably every language that has such a search
>>>> function : an empty string is found at EVERY position in the string,
>>>> therefore the first match it finds is wherever the search begins. It's the
>>>> normal way of doing it, because it logically fits the way N characters are
>>>> searched in a string, for N=0, and the behaviour you wish would mean adding
>>>> a special case for N=0 where programmers prefer to define functions so that
>>>> they have the least possible number of cases.
>>>>
>>>> (However, in other languages, 0 is the first position in the string,
>>>> whereas "no match" is represented by another value (such as -1 or nil or
>>>> error))
>>>>
>>>>
>>>> On 2017-02-09 at 15:12:00, Allen Huffman wrote:
>>>>
>>>> ...but I noticed today it finds the empty string: ""
>>>>>
>>>>> PRINT INSTR("ABCDE", "")
>>>>> 1
>>>>>
>>>>> That seems like a bug.
>>>>> A$=""
>>>>> PRINT INSTR("ABCD", A$)
>>>>> 1
>>>>>
>>>>
>>>> ______________________________________________________________________
>>>> | Mathieu BOUCHARD --- tél: 514.623.3801, 514.383.3801 --- Montréal, QC
>>>>
>>>>
>>>> --
>>>> Coco mailing list
>>>> Coco at maltedmedia.com
>>>> https://pairlist5.pair.net/mailman/listinfo/coco
>>>>
>>>
>>>
>>>
>>> --
>>> --------------------------------------------
>>> Paulo
>>>
>>
>> ______________________________________________________________________
>> | Mathieu BOUCHARD --- tél: 514.623.3801, 514.383.3801 --- Montréal, QC
>>
>
>
______________________________________________________________________
| Mathieu BOUCHARD --- tél: 514.623.3801, 514.383.3801 --- Montréal, QC