What is XML?
- Extensible Markup Language (XML); XML 1.0 became a W3C Recommendation in 1998
- Easier-to-use subset of SGML (Standard Generalized Markup Language)
- XML is a text-based markup language
- Standard for data interchange on the web
- Set of rules for designing semantic tags
- Meta-markup language to define other languages
- XML 1.0 Specification: http://www.w3.org/TR/REC-xml
HTML and XML
- HTML is an application of SGML
- XML is a subset of SGML
- XHTML is an application of XML
XML File Sample
<?xml version="1.0"?>
<dining-room>
<manufacturer>The Wood Shop</manufacturer>
<table type="round" wood="maple">
<price>$199.99</price>
</table>
<chair wood="maple">
<quantity>6</quantity>
<price>$39.99</price>
</chair>
</dining-room>
XML describes Structure and Semantics, Not Formatting
HTML Example
<DL>
<DT>Mambo
<DD>by Enrique Garcia
</DL>
<UL>
<LI>Producer: Enrique Garcia
<LI>Publisher: Sony Music Entertainment
<LI>Length: 3:46
<LI>Written: 1991
<LI>Artist: Azucar Moreno
</UL>
XML describes Structure and Semantics, Not Formatting (2)
XML Example
<SONG>
<TITLE>Mambo</TITLE>
<COMPOSER>Enrique Garcia</COMPOSER>
<PRODUCER>Enrique Garcia</PRODUCER>
<PUBLISHER>Sony Music Entertainment</PUBLISHER>
<LENGTH>3:46</LENGTH>
<YEAR>1991</YEAR>
<ARTIST>Azucar Moreno</ARTIST>
</SONG>
What's So Great About XML?
Easy Data Exchange
- Growth of proprietary data formats
- Conversion programs are needed between applications and versions
- Data and markup are stored as text
- Avoids storing simple data in huge files
What's So Great About XML? (2)
Customizing Markup Languages
- Banking Industry Technology Secretariat (BITS)
- Interactive Financial Exchange (IFX)
- Schools Interoperability Framework (SIF)
- Common Business Library (CBL)
- Electronic Business XML Initiative (ebXML)
- The Text Encoding Initiative (TEI)
What's So Great About XML? (3)
Self-Describing Data
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
<GREETING>Hello from XML</GREETING>
<MESSAGE>Welcome to Programming XML in Java</MESSAGE>
</DOCUMENT>
What's So Great About XML? (4)
Structured and Integrated Data
<?xml version="1.0"?>
<SCHOOL>
<CLASS type="seminar">
<CLASS_TITLE>XML In The Real World</CLASS_TITLE>
<CLASS_NUMBER>6.031</CLASS_NUMBER>
<SUBJECT>XML</SUBJECT>
<START_DATE>6/1/2002</START_DATE>
<STUDENTS>
<STUDENT status="attending">
<FIRST_NAME>Edward</FIRST_NAME>
<LAST_NAME>Samson</LAST_NAME>
</STUDENT>
<STUDENT status="withdrawn">
<FIRST_NAME>Ernestine</FIRST_NAME>
<LAST_NAME>Johnson</LAST_NAME>
</STUDENT>
</STUDENTS>
</CLASS>
</SCHOOL>
Well-Formed XML Documents
- Follow the syntax rules set out for XML by the W3C in the XML 1.0 Specification (www.w3.org/TR/REC-xml)
- Contain one or more elements
- Have a single root element that contains all the other elements
- Nest each element properly inside any enclosing elements
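The rules above can be checked mechanically by handing the text to an XML parser. The sketch below is one way to do that with the JAXP DocumentBuilder that ships with the JDK. The class name WellFormedCheck and the method isWellFormed are illustrative names chosen here, not part of any API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser hit a violation of the XML syntax rules.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problem, unrelated to the document.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b>text</b></a>")); // true
        System.out.println(isWellFormed("<a><b>text</a></b>")); // false: bad nesting
        System.out.println(isWellFormed("<a/><b/>"));           // false: two roots
    }
}
```

A non-validating parser like this one only checks well-formedness. It does not compare the document against a DTD.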
Valid XML Documents
- Are associated with a Document Type Definition (DTD)
- Comply with that DTD
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="first.css"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (GREETING, MESSAGE)>
<!ELEMENT GREETING (#PCDATA)>
<!ELEMENT MESSAGE (#PCDATA)>
]>
<DOCUMENT>
<GREETING>Hello from XML</GREETING>
<MESSAGE>Welcome to Programming XML in Java</MESSAGE>
</DOCUMENT>
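Validity can be checked in code by switching a parser into validating mode. The following is a minimal sketch using JAXP, assuming the DTD is embedded in the document as in the sample above. The class DtdValidation, its constants, and countValidityErrors are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts the validity errors a DTD-validating parser reports.
class DtdValidation {

    static final String DTD =
        "<!DOCTYPE DOCUMENT [\n"
        + "<!ELEMENT DOCUMENT (GREETING, MESSAGE)>\n"
        + "<!ELEMENT GREETING (#PCDATA)>\n"
        + "<!ELEMENT MESSAGE (#PCDATA)>\n"
        + "]>\n";

    // Complies with the DTD: GREETING followed by MESSAGE.
    static final String VALID = DTD
        + "<DOCUMENT><GREETING>Hello from XML</GREETING>"
        + "<MESSAGE>Welcome</MESSAGE></DOCUMENT>";

    // Well formed, but MESSAGE is missing, so it is not valid.
    static final String INVALID = DTD
        + "<DOCUMENT><GREETING>Hello from XML</GREETING></DOCUMENT>";

    static int countValidityErrors(String xml) {
        final int[] errors = {0};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors[0]++; // validity violations arrive as recoverable errors
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Not well formed, or the parser could not be configured.
            throw new IllegalStateException(e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        System.out.println("valid document errors:   " + countValidityErrors(VALID));
        System.out.println("invalid document errors: " + countValidityErrors(INVALID));
    }
}
```

Validity violations are reported through the error callback rather than thrown, so a document can be well formed and still fail validation.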
Related Technologies
Hypertext Markup Language
- HTML is the most common output format for XML
- Web browsers: Internet Explorer 5.0, Netscape 6.0
- A different way to design a Web site
Related Technologies (2)
Cascading Style Sheets
- Define formatting properties
- Font Size
- Font family
- Font weight
- Paragraph indentation
- Paragraph alignment
- Multiple style sheets can be applied to a single document
- Multiple styles can be applied to a single element.
Related Technologies (3)
URLs and URIs
Related Technologies (4)
The Unicode Character Set
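- American Standard Code for Information Interchange (ASCII): 7-bit codes 0-127, for example 'A' = 65; 8-bit extensions such as ISO 8859-1 cover 0-255
- XML provides full support for the Unicode character set. Its original 16-bit range (0-65,535) has since been extended to code points up to U+10FFFF: http://www.unicode.org
- XML documents can be written in:
- ASCII
- UTF-8, a variable-length encoding of Unicode (1 to 4 bytes per character; ASCII characters take a single byte)
<?xml version="1.0" encoding="UTF-8"?>
- XML defines character references to encode Unicode characters by number:
&#169; (©)  &#60; (<)  &#960; (π)
- Universal Character Set (UCS, ISO/IEC 10646)
- Up to 4 bytes per character
- UCS-2 (2 bytes) and UCS-4 (4 bytes) encodings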
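The two ideas above, numeric character references and variable-length UTF-8, can be seen in a few lines of Java. This is an illustrative sketch only. CharRefDemo, rootText, and utf8Length are names invented for the example.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo of character references and UTF-8 byte lengths.
class CharRefDemo {

    // Parses a small document and returns the text of its root element,
    // with any character references already expanded by the parser.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Decimal and hexadecimal references name the same code point.
        String text = rootText("<t>&#169; &#60; &#960; &#x3C0;</t>");
        System.out.println(text.equals("\u00A9 < \u03C0 \u03C0")); // true

        System.out.println(utf8Length("A"));            // 1 byte  (ASCII)
        System.out.println(utf8Length("\u03C0"));       // 2 bytes (Greek pi)
        System.out.println(utf8Length("\u20AC"));       // 3 bytes (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 bytes (beyond 16 bits)
    }
}
```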
How Do I Use XML?
- The XML document is parsed
- The data is manipulated
- APIs are available in Java, C, C++, Perl, and other languages
Simple API for XML - SAX
- Event-based framework for parsing XML data
- Callback methods such as startDocument() and endElement()
- Defines a set of errors and warnings
- http://www.megginson.com/SAX
- Several parsers can be plugged into the SAX API
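To make the event-based model concrete, here is a small sketch that records the callbacks a SAX parser fires while reading a document. It obtains the parser through JAXP's SAXParserFactory. The class SaxEvents and its events method are hypothetical names for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: logs SAX callbacks in the order the parser fires them.
class SaxEvents {

    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                log.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Prints one event per line, from startDocument through endDocument.
        for (String event : events("<SONG><TITLE>Mambo</TITLE></SONG>")) {
            System.out.println(event);
        }
    }
}
```

Because the handler sees one event at a time, the parser never has to hold the whole document in memory.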
Document Object Model - DOM
- Manipulation of XML Data
- Provides a representation of an XML Document as a tree.
- Reads XML Document into memory
- http://www.w3.org/DOM
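As a contrast with SAX, the sketch below loads a whole document into a DOM tree and then walks it. It uses a shortened form of the SCHOOL sample from earlier in these slides. DomDemo, SAMPLE, and lastNames are illustrative names, not part of the DOM API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: reads the document into a tree, then queries the tree.
class DomDemo {

    static final String SAMPLE =
        "<SCHOOL><CLASS type=\"seminar\"><STUDENTS>"
        + "<STUDENT status=\"attending\"><FIRST_NAME>Edward</FIRST_NAME>"
        + "<LAST_NAME>Samson</LAST_NAME></STUDENT>"
        + "<STUDENT status=\"withdrawn\"><FIRST_NAME>Ernestine</FIRST_NAME>"
        + "<LAST_NAME>Johnson</LAST_NAME></STUDENT>"
        + "</STUDENTS></CLASS></SCHOOL>";

    // Returns the last names of students with the given status, comma separated.
    static String lastNames(String xml, String status) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList students = doc.getElementsByTagName("STUDENT");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < students.getLength(); i++) {
                Element student = (Element) students.item(i);
                if (status.equals(student.getAttribute("status"))) {
                    // Look up the LAST_NAME child within this STUDENT subtree.
                    names.add(student.getElementsByTagName("LAST_NAME")
                        .item(0).getTextContent());
                }
            }
            return String.join(", ", names);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lastNames(SAMPLE, "attending")); // Samson
        System.out.println(lastNames(SAMPLE, "withdrawn")); // Johnson
    }
}
```

The tree stays in memory after parsing, which is what makes random access and in-place manipulation possible, at the cost of holding the entire document at once.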
Sun's Java API for XML Parsing - JAXP
Java and XML: A Perfect Match
- Java is portable code, XML is portable data
- Applications completely portable
- Java Virtual Machine (JVM)
- Standards-based data layer
- Java provides the most robust set of:
- APIs - JAXP
- Parsers - XP
- Processors - Saxon
- Publishing Frameworks - Cocoon
- Tools for XML - XML Pro
The Life of an XML Document
XML Editors
Create XML documents
- Text Editors - vi, emacs, notepad
- XML Editors
- Adobe FrameMaker, www.adobe.com
- XML Pro, www.vervet.com
- XML Writer, xmlwriter.net
- XML Notepad, msdn.microsoft.com/xml/notepad/intro.asp
- XMetal from SoftQuad, xmetal.com
- XML Spy, www.xmlspy.com
XML Editors (XML Spy)
More XML Spy
XML Parsers
- Read XML Document
- Verify that XML is well formed
- Verify that XML is valid
XML Validators
Verify that XML is valid
XML Browsers
Display the Data to the User
- Internet Explorer 5
- Displays XML documents directly
- Handles XML in scripting languages (JScript, VBScript)
- Binds XML to ActiveX Data Objects (ADO) database recordsets
- XML is integrated into the Office 2000 suite of applications
- Netscape Navigator 6
- Displays XML documents directly
- Handles XML in scripting languages (JavaScript 1.5)
- Supports the XML-based User Interface Language (XUL), which lets you configure the controls in the browser
- Jumbo
- Displays XML
- Uses CML to draw molecules
XML Resources
XML Applications
Languages based on XML
- Chemical Markup Language (CML)
- Mathematical Markup Language (MathML)
- Channel Definition Format (CDF)
- Synchronized Multimedia Integration Language (SMIL)
- XHTML
- Scalable Vector Graphics (SVG)
- MusicML
- VoxML