Question

I need to split a string on spaces, but phrases inside quotes must be kept unsplit. Example:

  word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5

After preg_split, this should result in the following array:

array(
 [0] => 'word1',
 [1] => 'word2',
 [2] => 'this is a phrase',
 [3] => 'word3',
 [4] => 'word4',
 [5] => 'this is a second phrase',
 [6] => 'word5'
)

How should I construct my regular expression to do this?

PS. There is a related question, but I don't think it works in my case. The accepted answer there provides a regexp that finds words rather than whitespace.

Answer 1

With the help of user MizardX from the #regex IRC channel (irc.freenode.net), a solution was found. It even supports single quotes.

$str= 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5 word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';

$regexp = '/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/';

$arr = preg_split($regexp, $str);

print_r($arr);

The result is:

Array (
    [0] => word1
    [1] => word2
    [2] => 'this is a phrase'
    [3] => word3
    [4] => word4
    [5] => "this is a second phrase"
    [6] => word5
    [7] => word1
    [8] => word2
    [9] => "this is a phrase"
    [10] => word3
    [11] => word4
    [12] => "this is a second phrase"
    [13] => word5  
)
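Note that the result above keeps the surrounding quotes, while the question asks for the bare phrases. A minimal sketch of one way to remove them after splitting follows; strip_outer_quotes is a hypothetical helper introduced here for illustration, not part of the original answer, and the sample string is shortened.

```php
<?php
// Hypothetical helper (not from the original answer): removes one pair of
// matching outer quotes and leaves unquoted tokens untouched.
function strip_outer_quotes($token)
{
    $len = strlen($token);
    if ($len >= 2) {
        $first = $token[0];
        $last  = $token[$len - 1];
        if (($first === '"' || $first === "'") && $first === $last) {
            return substr($token, 1, -1);
        }
    }
    return $token;
}

$str = 'word1 \'this is a phrase\' word3 "this is a second phrase" word5';

// Same pattern as in the answer: split on whitespace that follows a run
// of complete tokens (requires \G and \K support in PCRE).
$regexp = '/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/';

$arr = array_map('strip_outer_quotes', preg_split($regexp, $str));
print_r($arr);
```

Doing the stripping in a separate pass keeps the splitting pattern unchanged.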

PS. The only drawback is that this regexp works only with PCRE 7.

It turned out that I don't have PCRE 7 support on the production server; only PCRE 6 is installed there. Although it is not as flexible as the previous one for PCRE 7, the following regexp (which gets rid of \G and \K) will work:

/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/

For the given input, the result is the same as above.
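The answer does not show how the PCRE 6 pattern is applied. Because it matches the tokens themselves rather than the whitespace between them, it is presumably meant for preg_match_all() rather than preg_split(). The sketch below makes that assumption and uses a shortened sample string.

```php
<?php
$str = 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5';

// Match each token (a quoted phrase or a run of non-space, non-quote
// characters) instead of splitting on the whitespace between tokens.
$regexp = '/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/';

preg_match_all($regexp, $str, $matches);

// $matches[0] holds the full matches, with the quotes still included.
$tokens = $matches[0];
print_r($tokens);
```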

Answer 2

Assuming your quotes are well formed, i.e. they come in pairs, you can explode on the quote character and loop over every second field. For example:

$str = "word1 word2 \"this is a phrase\" word3 word4 \"this is a second phrase\" word5 word6 \"lastword\"";
print $str ."\n";
$s = explode('"',$str);
for($i=1;$i<count($s);$i+=2){
    if ( strpos($s[$i] ," ")!==FALSE) {
        print "Spaces found: $s[$i]\n";
    }
}

Output:

$ php test.php
Spaces found: this is a phrase
Spaces found: this is a second phrase

No complex regular expression is needed.

Answer 3

Isn't this fairly easy using the regex from the other linked question?

<?php

$string = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';

preg_match_all( '/(\w+|"[\w\s]*")+/' , $string , $matches );

print_r( $matches[1] );

?>

Output:

Array
(
     [0] => word1
     [1] => word2
     [2] => "this is a phrase"
     [3] => word3
     [4] => word4
     [5] => "this is a second phrase"
     [6] => word5
)
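The pattern above only allows word characters and whitespace inside the quotes, and the results keep the quotes. As a hedged variation (not the answerer's code), the sketch below accepts any non-quote character inside a phrase and uses capture groups to return the phrase without its quotes. The sample string with punctuation is an assumption added for illustration.

```php
<?php
$string = 'word1 word2 "this is a phrase" word3 "second, phrase!" word5';

// Group 1: contents of a double-quoted phrase; group 2: a bare word.
preg_match_all('/"([^"]*)"|([^\s"]+)/', $string, $matches, PREG_SET_ORDER);

$result = array();
foreach ($matches as $m) {
    // Group 2 is only present when the bare-word branch matched;
    // otherwise group 1 holds the unquoted phrase.
    $result[] = isset($m[2]) ? $m[2] : $m[1];
}
print_r($result);
```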

Answer 4

Is anyone going to benchmark tokenizing vs. regex? My guess is that the explode() function is a bit too heavy to give any speed benefit. Nevertheless, here is another method:

(edited because I forgot the else case for storing the quoted string)

$str = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';

// initialize storage array
$arr = array();
// initialize count
$count = 0;
// split on quote
$tok = strtok($str, '"');
while ($tok !== false) {
    // even-numbered tokens are outside quotes; odd-numbered ones are quoted phrases
    $arr = ($count % 2 == 0) ? 
                               array_merge($arr, explode(' ', trim($tok))) :
                               array_merge($arr, array(trim($tok)));
    $tok = strtok('"');
    ++$count;
}

// output results
var_dump($arr);
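One caveat with the strtok() approach: strtok() skips empty tokens, so when the input begins with a quoted phrase the even/odd bookkeeping shifts by one. A minimal sketch of a variant that uses explode(), which keeps empty fields, is shown below. It assumes balanced double quotes, and the sample input is chosen to start with a quote.

```php
<?php
// Input that starts with a quoted phrase, the case where strtok() parity shifts.
$str = '"this is a phrase" word1 word2 "this is a second phrase" word3';

$arr = array();
// explode() keeps empty fields, so odd indexes are always inside quotes
// (assuming the quotes are balanced).
foreach (explode('"', $str) as $i => $part) {
    if ($i % 2 === 1) {
        // Inside quotes: keep the phrase as a single element.
        $arr[] = $part;
    } else {
        // Outside quotes: split on whitespace, dropping empty pieces.
        $words = preg_split('/\s+/', $part, -1, PREG_SPLIT_NO_EMPTY);
        $arr = array_merge($arr, $words);
    }
}
print_r($arr);
```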
