Last Modified: 29.03.2024

Let's review the general sequence for converting a site from CP1251 to UTF-8.

Attention! Before starting the conversion, make sure you have created backup copies of the site and its database. We strongly recommend performing test conversion runs on a separate copy of the site. Site conversion is a complex operation, and every case has its own specifics. Please exercise caution: the risk of losing important data during such operations is significant.

  General sequence of actions

You can connect to the server via SSH to edit files and apply changes.

General sequence:

  1. In the regional settings (Settings > System settings > Language parameters > Regional settings), change the encoding to UTF-8 for all languages;

  2. For main module versions prior to 20.100.0, also configure mbstring.func_overload in php.ini as required for UTF-8 mode;

  3. Set default_charset = "UTF-8" in the php.ini settings file;

    You can look up the location of php.ini beforehand on the PHP settings page in the admin section (Settings > Tools > System Administration > PHP settings), which displays the current PHP settings (see the Loaded Configuration File value), or by calling the PHP function phpinfo().

    If your site is hosted by a hosting provider, you may need to ask the provider to apply these settings.

  4. Add the following line to /bitrix/php_interface/dbconn.php:
    define("BX_UTF", true);
    

    In the same file, delete the lines that set the CP1251 encoding:

    setlocale(LC_ALL, 'en_EN.CP1251');
    mb_internal_encoding("Windows-1251");
    
  5. In the file /bitrix/.settings.php, set 'value' => true for utf_mode:
    'utf_mode' =>
        array(
            'value' => true,
            'readonly' => true,
        ),
    
  6. Re-encode the entire database to UTF-8 (see the Database section below). You will most likely need assistance from your server administrator.
  7. Configure the file /bitrix/php_interface/after_connect.php:
    $DB->Query("SET NAMES 'utf8'");
    $DB->Query('SET collation_connection = "utf8_unicode_ci"');
    
    and the file /bitrix/php_interface/after_connect_d7.php:
    $this->queryExecute("SET NAMES 'utf8'");
    $this->queryExecute('SET collation_connection = "utf8_unicode_ci"');
    // In main module versions prior to 22.0, this file uses $connection instead of $this.
  8. Add the following line to /.htaccess:
    php_value default_charset utf-8
    
  9. Re-encode all site files to UTF-8.
  10. Clear all cache.
  11. Log out and log back in to refresh the session data.
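The file-level changes from steps 4, 5 and 8 are easy to get wrong by hand. A minimal bash sketch like the one below can re-check them after editing. The function name check_utf_config is hypothetical, and the checks are plain text searches that assume the exact spellings shown in the steps above (for example, define("BX_UTF", true) with a single space), so a passing result is only a hint, not proof that the site is configured correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: greps a site root for the settings from steps 4, 5 and 8.
# It does not parse PHP; exact spellings from the article are assumed.
check_utf_config() {
  local root="$1" failed=0
  local dbconn="$root/bitrix/php_interface/dbconn.php"

  # Step 4: the BX_UTF constant must be defined.
  if ! grep -Fq 'define("BX_UTF", true)' "$dbconn" 2>/dev/null; then
    echo "dbconn.php: BX_UTF is not defined" >&2; failed=1
  fi
  # Step 4: no CP1251 locale/encoding lines may remain.
  if grep -Eiq 'cp1251|windows-1251' "$dbconn" 2>/dev/null; then
    echo "dbconn.php: CP1251 settings are still present" >&2; failed=1
  fi
  # Step 5: 'value' => true within the three lines following utf_mode.
  if ! grep -A3 'utf_mode' "$root/bitrix/.settings.php" 2>/dev/null \
       | grep -Fq "'value' => true"; then
    echo ".settings.php: utf_mode is not enabled" >&2; failed=1
  fi
  # Step 8: default_charset set to utf-8 in .htaccess.
  if ! grep -Eiq '^[[:space:]]*php_value[[:space:]]+default_charset[[:space:]]+utf-8' \
       "$root/.htaccess" 2>/dev/null; then
    echo ".htaccess: default_charset is not utf-8" >&2; failed=1
  fi
  return "$failed"
}

# Usage sketch:
#   check_utf_config /var/www/html && echo "file settings look OK"
```

Each failed check prints a message to stderr, so one run lists every missing change instead of stopping at the first one.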

  Database

To convert the database (DB), you need to change the encoding of the database itself, of all its tables, and of all text fields in those tables. DO NOT convert the database from the administrative section; use other available tools for this purpose.

In the simplest case (without serialized data), you can re-encode the database and all its tables as follows:

  • Update the encoding for the site database itself:
    ALTER DATABASE database_name charset=utf8;
    
  • Change the encoding for the database connection:
    SET NAMES 'utf8';
    
  • Set the default character set and collation for the database:
    ALTER DATABASE database_name CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    
  • Run a query that finds all tables of the database and generates an encoding update query for each of them (views are excluded, because they cannot be converted):
    SELECT CONCAT('ALTER TABLE `', t.`TABLE_SCHEMA`, '`.`', t.`TABLE_NAME`, '` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;') AS sqlcode
    FROM `information_schema`.`TABLES` t
    WHERE t.`TABLE_SCHEMA` = 'database_name'
      AND t.`TABLE_TYPE` = 'BASE TABLE'
    ORDER BY 1
    ;
    
  • The result is a list of queries of the following form:
    ALTER TABLE `database_name`.`table_name` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    
  • Execute all the generated queries. The database and its tables are now re-encoded.
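If you prefer to build the ALTER TABLE statements from the shell instead of querying information_schema, a small bash filter can produce the same statements from a list of table names. This is a sketch: the function name gen_convert_sql is hypothetical, and the usage in the comments assumes the mysql command-line client with credentials already configured (for example, in ~/.my.cnf).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads table names (one per line) on stdin and prints
# one ALTER TABLE ... CONVERT TO statement per table, like the SELECT above.
gen_convert_sql() {
  local db="$1" table
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r table || [ -n "$table" ]; do
    [ -n "$table" ] || continue   # ignore blank lines
    printf 'ALTER TABLE `%s`.`%s` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;\n' \
      "$db" "$table"
  done
}

# Usage sketch (assumes the mysql client and stored credentials):
#   mysql -N -B -e "SHOW FULL TABLES WHERE Table_type = 'BASE TABLE'" database_name \
#     | cut -f1 | gen_convert_sql database_name > convert.sql
#   less convert.sql                      # review before running
#   mysql database_name < convert.sql
```

Writing the statements to a file first lets you review them, and keep them alongside the backup, before anything is executed.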

Attention: If the database stores serialized data, the method above is not suitable for such data. Use special methods or tools to convert it.
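Serialized data needs special handling because PHP's serialize() records every string's length in bytes, for example s:6:"...";. Re-encoding changes the byte length of Cyrillic text, so the recorded length stops matching the actual string and unserialize() fails. The bash snippet below only illustrates the arithmetic; it assumes the script file itself is saved as UTF-8 and that iconv supports CP1251.

```shell
#!/usr/bin/env bash
# Illustration only: PHP's serialize() stores string lengths in BYTES,
# e.g. s:6:"...";  Assumes this file is saved as UTF-8 and iconv knows CP1251.
word='Привет'   # 6 Cyrillic letters

# Byte length of the word as it was stored on the CP1251 site.
cp1251_bytes=$(printf '%s' "$word" | iconv -f UTF-8 -t CP1251 | wc -c | tr -d ' ')
# Byte length of the same word after re-encoding to UTF-8 (2 bytes per letter).
utf8_bytes=$(printf '%s' "$word" | wc -c | tr -d ' ')

echo "serialized on the CP1251 site: s:${cp1251_bytes}:\"...\";"
echo "after conversion the string is ${utf8_bytes} bytes, so the stored length no longer matches"
```

For this word the stored prefix is s:6, while the re-encoded text is 12 bytes long, which is why such values must be unserialized, converted and serialized again rather than re-encoded byte by byte.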

  Files

In the simple case where all site files are in CP1251, re-encode them to UTF-8 by running the following command in the site root folder (for UNIX systems):

# go to the site root folder, for example:
cd /var/www/html/

# re-encode all .php files from CP1251 to UTF-8
find . -name '*.php' -type f -exec iconv -fcp1251 -tutf8 -o /tmp/tmp_file {} \; -exec mv /tmp/tmp_file {} \;

Important:
  1. This method is not suitable for sites with several localization languages, because such sites contain files in different encodings.
  2. Take into account the specifics of the Unix version you use: the example above may not work on it. In this case, adapt the command for your OS. For example:
    # re-encode the files
    find ./ -type f -name "*.php" -exec bash -c 'file="$1"; iconv -f cp1251 -t utf8 "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"' _ {} \;
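The commands above convert every matching file unconditionally, so running them twice double-encodes files that are already UTF-8, and moving a newly created temporary file into place can change the file's owner and permissions. The bash sketch below is one way to avoid both problems; the function name convert_php_tree is hypothetical. It treats any file that is already valid UTF-8 as converted, which is a heuristic: pure ASCII files are skipped harmlessly, but a CP1251 file that happens to be valid UTF-8 would be skipped too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts *.php files under a directory from CP1251
# to UTF-8 in place, skipping files that are already valid UTF-8.
convert_php_tree() {
  local root="$1" f tmp
  tmp=$(mktemp) || return 1
  while IFS= read -r -d '' f; do
    # iconv exits non-zero on invalid input, so decoding as UTF-8
    # works as a validity check; valid files are left untouched.
    if iconv -f UTF-8 -t UTF-16 "$f" >/dev/null 2>&1; then
      continue
    fi
    if iconv -f CP1251 -t UTF-8 "$f" > "$tmp"; then
      # "cat >" rewrites the existing file, keeping its owner and permissions.
      cat "$tmp" > "$f"
      echo "converted: $f"
    else
      echo "FAILED: $f" >&2
    fi
  done < <(find "$root" -type f -name '*.php' -print0)
  rm -f "$tmp"
}

# Usage sketch:
#   convert_php_tree /var/www/html
```

Because already-converted files are skipped, the function can be re-run safely after an interrupted run, and the FAILED lines point to files that need manual attention.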

Using third-party software or converting files manually

Third-party software, or manual conversion, sometimes adds a special character sequence called BOM (byte order mark, also known as the file signature) to the start of a file. A BOM is only meaningful at the very start of a file, but the final page is assembled from several PHP files, so these characters end up inside the page body as stray characters. If you convert files manually, do not save them with a BOM!
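To find and remove a BOM that an editor or conversion tool has added, you can check the first three bytes of each file, which are EF BB BF for a UTF-8 BOM. Below is a minimal bash sketch with the hypothetical function names has_bom and strip_bom; run it on a backup copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: detect and remove a UTF-8 BOM (EF BB BF) at file start.
has_bom() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

strip_bom() {
  local f="$1" tmp
  has_bom "$f" || return 0          # nothing to do
  tmp=$(mktemp) || return 1
  # Keep bytes 4..end; "cat >" preserves the file's owner and permissions.
  tail -c +4 "$f" > "$tmp" && cat "$tmp" > "$f"
  rm -f "$tmp"
}

# Usage sketch: list every .php file that still starts with a BOM
#   find . -type f -name '*.php' -print0 |
#     while IFS= read -r -d '' f; do has_bom "$f" && echo "$f"; done
```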

  Workflows

Workflow templates, together with their variables, constants and parameters, are stored in serialized and compressed form in the b_bp_workflow_template table. Changing the database encoding does not convert them, so you need to perform additional actions to update their encoding.

First, back up the b_bp_workflow_template table using one of the following methods:

  1. Copy the table using SQL queries:
    -- create a new table with the same structure as the original one
    CREATE TABLE b_bp_workflow_template_bak LIKE b_bp_workflow_template;
    -- copy the data into the new table
    INSERT INTO b_bp_workflow_template_bak SELECT * FROM b_bp_workflow_template;
    
  2. Create a full database backup.

Next, run the following script in the PHP command line to update the data encoding:

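CModule::IncludeModule("bizproc");
// $APPLICATION is a global object; declare it in case the code does not run at global scope.
global $APPLICATION;
$connection = \Bitrix\Main\Application::getConnection();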

$sql_select = "select * from b_bp_workflow_template";
$process = $connection->query($sql_select);

while ($r = $process->fetch())
{
	$gztemp = $r['TEMPLATE'];
	$gzvar = $r['VARIABLES'];
	$gzconst = $r['CONSTANTS'];
	$gzpar = $r['PARAMETERS'];
	
	// Unpack workflow data.
	$serializedTemplate = @gzuncompress($gztemp);
	$serializedVariables = @gzuncompress($gzvar);
	$serializedConstants = @gzuncompress($gzconst);
	$serializedParameters = @gzuncompress($gzpar);
	
	// Unserialize workflow data.
	$serializedTemplate = @unserialize($serializedTemplate);
	$serializedVariables = @unserialize($serializedVariables);
	$serializedConstants = @unserialize($serializedConstants);
	$serializedParameters = @unserialize($serializedParameters);
	
	if ($serializedTemplate === false) continue;
	
	// Update data encoding.
	$serializedTemplate = $APPLICATION->ConvertCharsetArray(
		$serializedTemplate,
		'windows-1251',
		'utf-8'
	);
	$serializedVariables = $APPLICATION->ConvertCharsetArray(
		$serializedVariables,
		'windows-1251',
		'utf-8'
	);
	$serializedConstants = $APPLICATION->ConvertCharsetArray(
		$serializedConstants,
		'windows-1251',
		'utf-8'
	);
	$serializedParameters = $APPLICATION->ConvertCharsetArray(
		$serializedParameters,
		'windows-1251',
		'utf-8'
	);
	
	$r["TEMPLATE"] = $serializedTemplate;
	$r["VARIABLES"] = $serializedVariables;
	$r["CONSTANTS"] = $serializedConstants;
	$r["PARAMETERS"] = $serializedParameters;
	
	// Save updated data.
	CBPWorkflowTemplateLoader::update(
		$r["ID"],
		[
			'TEMPLATE' => $r['TEMPLATE'],
			'VARIABLES' => $r['VARIABLES'],
			'CONSTANTS' => $r['CONSTANTS'],
			'PARAMETERS' => $r['PARAMETERS']
		],
		$r,
		false,
		false
	);
}

  Hints and links

The main conversion steps are now complete. If errors occur when opening the site, enable debugging mode ('debug' => true) in the file /bitrix/.settings.php. This shows which errors occurred and where.

You must run the system check (Settings > Tools > System check), which performs a comprehensive check of system parameters against the recommended minimum technical requirements for proper project operation. The check results show which issues must be addressed. Use the pop-up hints under the question mark icons to view additional details.

If issues with database tables are found, check the system check log. The log file ends with queries that can be used to fix these errors. We recommend backing up the database before starting the repair.

Attention! Starting from version 23.200.0, there is an alternative way to change the encoding: the Convert to UTF-8 wizard. It is located on the wizard list page /bitrix/admin/wizard_list.php?lang=en. Each of its steps comes with the necessary explanations.
