for English
  – ASCII (American Standard Code for Information Interchange)
  – “plain text”
  – Special characters are exceptions; in the SGML versions of TEI and CES they are represented by entity references (enclosed between an ampersand and a semicolon)
    • &#163; = £
    • &#233; = &eacute; (é)
• The ISO 8859 family of 15 members
  – Complementary standardized character codes
• Unicode (unified character code)
  – Supported in XML
  – UTF-8 (8-bit Unicode Transformation Format)
  – UTF-16 (16-bit Unicode Transformation Format)
• See Unicode official website for latest updates

Character encoding
• ASCII (ANSI), GB2312, Big5, UTF-8, Unicode
  – For more details see corpora/
• WordSmith 5 is based on Unicode (16-bit)
  – Unless your corpus consists entirely of ASCII characters, WST may NOT produce reliable results until the corpus is converted into Unicode
  – WST Utilities > Text Converter
  – MLCT for conversion
• The combination of XML and Unicode is the current standard in corpus building (Xiao et al. 2021)

Text conversion

Data capture tools
• Freeware tools that help you download all the pages of a selected website in one go
  – Grab-a-Site
  – HTTrack
  – WebGetter in WST
    • WST menu > Utilities > WebGetter
    • Downloads all the pages containing the specified search word
    • But does not tidy up the data
• Multilingual Corpus Toolkit (MLCT)
  – Can download, tidy up and POS-tag the selected webpage
  – Can mark up textual organization automatically (p, s)

WST WebGetter

Using MLCT to capture web text

Transcriber
• A tool for assisting the manual annotation of speech signals
  – Segmenting long-duration speech recordings
  – Transcribing audio recordings
  – Labelling speech turns, topic changes and acoustic conditions
• Supporting multiple platforms
  – Windows XP/2k
  – Mac OS X
  – Linux
• Downloading the programme, user manual, annotation guide
  – Transcriber

Praat
• Well known and widely used (many online tutorials)
• Suitable for acoustic analysis of files that are shorter than 15 minutes

Audacity
• Recording and editing sounds
• Can work with large files
• Digitalise your cassette tapes
• Download at
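
The conversion step from the character encoding notes (turning a corpus in a legacy encoding such as GB2312 or Big5 into Unicode before loading it into a Unicode-based tool) can also be scripted. The following is a minimal Python sketch, not the method used by the WST Text Converter or MLCT: the function names `reencode`, `convert_file` and `resolve_entities` are hypothetical, and it assumes you already know the source encoding of each file. It also shows how SGML-style entity references such as `&#163;` and `&eacute;` might be resolved into the characters they stand for.

```python
import html
from pathlib import Path


def reencode(data: bytes, src_encoding: str, dst_encoding: str = "utf-16") -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as Unicode.

    Assumption: the caller knows the source encoding (e.g. "gb2312", "big5").
    A wrong guess raises UnicodeDecodeError instead of silently corrupting text.
    """
    text = data.decode(src_encoding)
    return text.encode(dst_encoding)


def convert_file(src: str, dst: str, src_encoding: str, dst_encoding: str = "utf-16") -> None:
    """Hypothetical helper: convert one corpus file on disk."""
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(reencode(raw, src_encoding, dst_encoding))


def resolve_entities(text: str) -> str:
    """Replace numeric and named entity references with the actual characters.

    html.unescape covers the HTML entity set; entities specific to a TEI or CES
    DTD are outside that set and would be left unchanged.
    """
    return html.unescape(text)
```

For example, `convert_file("news_gb.txt", "news_utf16.txt", "gb2312")` would write a UTF-16 copy of a GB2312 file (both file names are placeholders), and `resolve_entities("&#163;5 caf&eacute;")` returns `"£5 café"`. The default target is UTF-16 because the notes describe WordSmith 5 as 16-bit Unicode; pass `"utf-8"` for XML corpora stored in UTF-8.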