[rkward] rkward/rbackend: Revert most of 2d6d7f2c95531df17699b51a61f3024e9b8e123a and use a different encoder, instead.

Thomas Friedrichsmeier thomas.friedrichsmeier at ruhr-uni-bochum.de
Mon Jul 4 10:32:11 UTC 2016

Git commit 47f09cf2c7dce00ec6dbd53869c53fe319586f37 by Thomas Friedrichsmeier.
Committed on 04/07/2016 at 10:29.
Pushed by tfry into branch 'master'.

Revert most of 2d6d7f2c95531df17699b51a61f3024e9b8e123a and use a different encoder, instead.

The problem with the previous attempt was that R would understand that the command was given in UTF-8,
but would not be smart enough to flag the resulting strings as UTF-8-encoded. This fix instead simply
uses an encoder that will turn, e.g., "ä" into "\303\244", instead of "".
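
For illustration only, here is a minimal sketch of how the UTF-8 bytes of "ä" (0xC3 0xA4) map to the octal escapes "\303\244". The helper name escapeNonAsciiAsOctal is hypothetical and is not part of RKWard or Qt. The codec returned by RKGetCurrentLocaleCodec () may well work differently internally; this only shows the byte-to-escape arithmetic.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, NOT RKWard's actual codec: replaces every non-ASCII
// byte of a UTF-8 encoded string with a three-digit octal escape and copies
// ASCII bytes through unchanged.
std::string escapeNonAsciiAsOctal(const std::string &utf8) {
    std::string out;
    for (unsigned char byte : utf8) {
        if (byte < 0x80) {
            // Plain ASCII is representable in every locale; keep it as-is.
            out += static_cast<char>(byte);
        } else {
            // Backslash + three octal digits + terminating NUL = 5 chars.
            char buf[5];
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(byte));
            out += buf;
        }
    }
    return out;
}
```

Calling escapeNonAsciiAsOctal("\xC3\xA4") yields the eight characters \303\244, since 0xC3 is 303 and 0xA4 is 244 in octal. Note that this sketch does not escape backslashes already present in the input, so it is not a reversible encoding.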

M  +5    -4    rkward/rbackend/rkrbackend.cpp


diff --git a/rkward/rbackend/rkrbackend.cpp b/rkward/rbackend/rkrbackend.cpp
index a4f09e2..9550ed1 100644
--- a/rkward/rbackend/rkrbackend.cpp
+++ b/rkward/rbackend/rkrbackend.cpp
@@ -307,8 +307,6 @@ int RReadConsole (const char* prompt, unsigned char* buf, int buflen, int hist)
 					RKRBackend::repl_status.user_command_completely_transmitted = false;
 					RKRBackend::repl_status.user_command_parsed_up_to = 0;
 					RKRBackend::repl_status.user_command_successful_up_to = 0;
-					// TODO FIXME: This is a problem when sending characters which are not encodable in R's current locale. I wish we could simply tell R that the input is UTF8 (as we do in parseCommand()).
-					// Or is there? Alternatively, perhaps we can somehow hack-in UTF8 character positions?
 					RKRBackend::repl_status.user_command_buffer = RKRBackend::this_pointer->current_locale_codec->fromUnicode (command->command);
 					RKTransmitNextUserCommandChunk (buf, buflen);
 					RKRBackend::repl_status.user_command_status = RKRBackend::RKReplStatus::UserCommandTransmitted;
@@ -1029,6 +1027,9 @@ bool RKRBackend::startR () {
 	RKSignalSupport::installSignalProxies ();	// for the crash signals
 	RKSignalSupport::installSigIntAndUsrHandlers (RK_scheduleIntr);
+	RKRBackend::this_pointer->current_locale_codec = RKGetCurrentLocaleCodec ();  // Ok, why is this needed? Beats me (mostly), but the result is different from codecForLocale() used in initialization:
+	                                                                              // This one will turn non-representable characters into unicode numbers, the other one will just strip them...
+	                                                                              // KF5 TODO: Use makeEncoder() and makeDecoder() to get defined behavior on this
 // register our functions
 	R_CallMethodDef callMethods [] = {
@@ -1135,11 +1136,11 @@ SEXP parseCommand (const QString &command_qstring, RKRBackend::RKWardRError *err
 	SafeParseWrap wrap;
 	wrap.status = PARSE_NULL;
-	QByteArray localc = command_qstring.toUtf8 (); // needed so the string below does not go out of scope
+	QByteArray localc = RKRBackend::this_pointer->current_locale_codec->fromUnicode (command_qstring); // needed so the string below does not go out of scope
 	const char *command = localc.data ();
 	PROTECT(wrap.cv=Rf_allocVector(STRSXP, 1));
-	SET_STRING_ELT(wrap.cv, 0, Rf_mkCharCE(command, CE_UTF8));
+	SET_STRING_ELT(wrap.cv, 0, Rf_mkChar(command));
 	// Yes, if there is an error in the parse, R does jump back to toplevel!
 	// trying to parse list(""=1) is an example in R 3.1.1
