How do I make the weird characters in Spanish text go away? The problem persists even after setting the JDBC URL to UTF-8

I see words such as súbito, autónomo (instead of súbito, autónomo). Why aren't they displayed properly? I previously had a problem when inserting Russian text into the MySQL database via JDBC: the Russian characters appeared as ???? instead of the actual words. That was fixed when I changed the JDBC URL to specify UTF-8 encoding:

jdbc:mysql://localhost/metaphor_repository?characterEncoding=utf8

Doing the same does not fix the problem here.

public void readPatterns() throws FileNotFoundException, IOException, InstantiationException, ClassNotFoundException, IllegalAccessException, SQLException {

    //Code to initialize database and stuff
    PreparedStatement preparedStatement = null;
    String key1 = null;
    String databaseURL = "jdbc:mysql://localhost/metaphor_repository?characterEncoding=utf8";
    String databaseUser = "root";
    String databasePassword = "D0samrD9";
    String dbName = "metaphor_repository";
    Connection conn = null;
    Class.forName("com.mysql.jdbc.Driver").newInstance();
    conn = DriverManager.getConnection(databaseURL, databaseUser, databasePassword);
    System.out.println("CONNECTED");
    String insertTableSQL = "INSERT INTO source_domain_spanish_oy2_jul2014_2(filename, seed, words, frequency, type, after_before) VALUES(?,?,?,?,?,?);";


    String foldername = "/Desktop/Espana/AdjectiveBefore/";
    File Folder = new File(foldername);
    File[] ListOfFiles = Folder.listFiles();
    for (int x = 0; x < ListOfFiles.length; x++) {
        File file = new File(ListOfFiles[x].getAbsolutePath());
        InputStream in = new FileInputStream(file);
        InputStreamReader reader1 = new InputStreamReader(in);
        BufferedReader br = new BufferedReader(reader1);
        String fileData = new String();
        String filename = ListOfFiles[x].getName().toUpperCase();
        int total;
        BufferedWriter out;
        FileWriter fstream;
        BufferedWriter outLog;
        String fileName = new String("/Desktop/Espana/AdjectiveBeforeResult/" + ListOfFiles[x].getName());
        fstream = new FileWriter(fileName);
        out = new BufferedWriter(fstream);
        while ((fileData = br.readLine()) != null) {
            Map<String, Integer> sortedMapDesc = searchDatabase(fileData);
            //Code Written By Aniruth to extract some info: seed, before_after
            String seed = fileData;
            String before_after = seed.split("\\[")[0];
            seed = seed.replaceAll("\\(v.\\)", "");
            seed = seed.replaceAll("\\(n.\\)", "");
            seed = seed.substring(seed.indexOf("]") + 1, seed.indexOf("."));
            seed = seed.substring(seed.indexOf("[") + 1, seed.indexOf("]"));
            seed = seed.replaceAll("'", "");
            seed = seed.trim();
            seed = seed.toUpperCase();


            Set<String> keySet = sortedMapDesc.keySet();
            total = 0;
            Iterator<String> keyItr = keySet.iterator();
            out.write("++++++++++++++++++++++++++++++++++++++++++\n");

            if (sortedMapDesc.isEmpty()) {
                out.write(fileData + "\n");
                out.write(fileData + "returned zero results \n");
                out.flush();
            } else {
                out.write(fileData + "\n");
                int i = 1;

                String spaceString = " ";
                while (keyItr.hasNext()) {
                    key1 = keyItr.next();


                    for (int k = 0; k < 40 - key1.length(); k++) {
                        spaceString = spaceString + " ";
                    }
                    total = total + sortedMapDesc.get(key1);

                    out.write(i + ":" + "'" + filename + "'" + ":" + "'" + seed + "'" + ":" + "'" + key1.replaceAll("'", "") + "'" + ":" + sortedMapDesc.get(key1) + ":" + "'" + "ADJ" + "'" + ":" + "'" + before_after + "'" + "\n");

                    //Code to add to the databases
                    preparedStatement = conn.prepareStatement(insertTableSQL);

                    preparedStatement.setString(1, filename);
                    preparedStatement.setString(2, seed);
                    preparedStatement.setString(3, key1);

                    if (sortedMapDesc.get(key1) != null) {
                        preparedStatement.setInt(4, sortedMapDesc.get(key1));
                    } else {
                        preparedStatement.setInt(4, 0);
                    }
                    preparedStatement.setString(5, "ADJ");
                    preparedStatement.setString(6, before_after);
                    System.out.println("Checking Prepared Statement:" + preparedStatement);
                    preparedStatement.executeUpdate();
                    System.out.println("Record Inserted :| ");
                    preparedStatement.close();





                    //System.out.println(out.toString());
                    i++;
                    spaceString = " ";
                }
                out.flush();
            }


        }
    }
    conn.close();
}
Jon Skeet

Well this is probably the first problem:

InputStreamReader reader1 = new InputStreamReader(in);

That's loading the file using the platform default encoding, which may or may not be appropriate for the file in question.

Likewise later:

fstream = new FileWriter(fileName);

Again, that will use the platform default encoding.

Always be explicit about your encoding - UTF-8 is usually a good choice, if you're in a position to choose.
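
As a rough sketch of what "being explicit" could look like for the two lines quoted above: the class and method names here are hypothetical, and the sketch assumes the input files really are UTF-8, which should be checked rather than taken for granted. It passes a charset to InputStreamReader and swaps FileWriter for an OutputStreamWriter that takes one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ExplicitEncodingExample {

    // Reads all lines, decoding the bytes as UTF-8 instead of the platform default.
    // Assumption: the file on disk was actually saved as UTF-8.
    public static List<String> readLinesUtf8(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes all lines, encoding them as UTF-8 instead of the platform default.
    public static void writeLinesUtf8(String path, List<String> lines) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("encoding-example", ".txt");
        // Unicode escapes keep this source file independent of the compiler's -encoding setting.
        writeLinesUtf8(tmp.toString(), Arrays.asList("s\u00fabito", "aut\u00f3nomo"));
        List<String> roundTripped = readLinesUtf8(tmp.toString());
        System.out.println(roundTripped.get(0).equals("s\u00fabito")); // same characters back
        Files.delete(tmp);
    }
}
```

The try-with-resources blocks also close the streams, which the code in the question never does for its reader and writer.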

Next, work out where issues are actually coming up. Log the exact UTF-16 code units in your strings, as integers, and try to spot when they go from "good" to "bad" (if they're ever good in the first place). See my blog post on diagnosing this sort of issue for more details. Something like this is useful:

public static void dumpString(String text) {
    for (int i = 0; i < text.length(); i++) {
        int codeUnit = text.charAt(i);
        System.out.printf("%d: %c %04x%n", i, (char) codeUnit, codeUnit);
    }
}

(Adjust to your logging infrastructure etc, of course.)
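
To show what "good" and "bad" code units might look like for one of the words in the question, here is a small sketch. It assumes one plausible cause, UTF-8 bytes being decoded as ISO-8859-1; that is consistent with the symptoms shown but is not a confirmed diagnosis of where the data goes wrong. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class MojibakeDemo {

    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    public static String misdecodeUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Returns the UTF-16 code units of the text as space-separated 4-digit hex values.
    public static String codeUnits(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String good = "s\u00fabito"; // the intended word, written with a Unicode escape
        String bad = misdecodeUtf8AsLatin1(good);
        System.out.println(codeUnits(good)); // 0073 00fa 0062 0069 0074 006f
        System.out.println(codeUnits(bad));  // 0073 00c3 00ba 0062 0069 0074 006f
    }
}
```

If the logged strings contain the pair 00c3 00ba where a single 00fa is expected, the text was already damaged before that point, so the place to look is earlier in the pipeline (for example, where the file is read).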

See more on this question at Stackoverflow